Cornellius Yudha Wijaya

Available to hire

I am a data scientist with 7+ years of experience delivering value through data science and AI projects. I analyze data and implement machine learning algorithms to gain meaningful insights, leveraging Python and SQL to turn data into action. I enjoy building scalable solutions that impact business outcomes, and I communicate complex results clearly to both technical and non-technical stakeholders.

In my leadership roles as Co-Founder and Data Lead, I guide the development of AI-powered products, dashboards, and data platforms. I collaborate with cross-functional teams to translate business needs into practical data-driven solutions, while managing AI usage and costs to align with the company vision.

Experience Level

Expert

Language

Indonesian
Fluent
English
Advanced

Work Experience

Co-Founder (Chief Product Officer and Data Lead) at GoArif Startup
January 1, 2024 - Present
Led the development of Data Science Dashboard features, including data visualization, statistical analysis, AI Agent development, Speech-to-Text, and an LLM-based chat platform with RAG capability. Produced Product Requirements Documents and steered the AI product to align with the company vision. Managed AI product usage and related costs. Collaborated with six cross-functional team members (developers, designers, and a project manager) to deliver four major features roughly 15% ahead of schedule.
AI Engineer (Contract-based) at PT. Sobat Sepadan
February 1, 2024 - June 1, 2025
Spearheaded the development of a Financial-AI application leveraging LLM services (OpenAI GPT-4, Gemini), achieving 98% accuracy in business classification and reducing manual processing time by 50%. Designed and deployed scalable AI architectures for three core product features (document classification, extraction, and summarization) in full alignment with the company vision. Ensured AI models integrated seamlessly with the existing system and translated business requirements into working solutions.
Data Scientist Assistant Manager at Allianz Life Indonesia
June 1, 2020 - December 1, 2024
Built an end-to-end data science working framework and implemented its MLOps design. Implemented and supervised ML models that improved business quality, including Propensity-to-Buy models (reaching ~150% of the annual target), a customer segmentation model with actionable insights to support Propensity-to-Buy, churn prediction that reduced churn by 3%, and a fraud detection project that reduced claim fraud by 1%. Led and mentored junior data scientists, developed MicroStrategy dashboards for business users, and implemented AWS cloud-based data science services for end-to-end use cases.

Education

Master of Science in Evolutionary Biology at Uppsala University
August 1, 2016 - June 1, 2018
Bachelor of Science in Biology at Universitas Gadjah Mada
August 1, 2011 - October 1, 2015

Industry Experience

Software & Internet, Professional Services, Financial Services

    End-to-End Machine Learning Project: Telco Customer Churn Prediction

    A beginner-friendly, business-oriented machine learning project that builds a churn prediction model from start to finish. The goal is not just “training a model”, but delivering something a company could actually use: a repeatable process that turns customer data into actionable churn risk signals.

    What we want to achieve

    • Retention impact: identify customers likely to churn so the business can prioritize outreach, offers, or service recovery.
    • Clear success measure: optimize for Recall on churners, because missing a true churner can be more costly than contacting a customer who would have stayed (a short metric illustration follows this list).
    • Repeatable workflow: a structured project that can be re-run as new data arrives.
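
    As a quick illustration of why Recall is the primary metric, here is a toy example with made-up labels (illustrative only, not project results):

    from sklearn.metrics import precision_score, recall_score

    # Toy labels: 1 = churned, 0 = stayed (made up for illustration)
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

    # Recall = share of true churners we actually caught: TP / (TP + FN)
    print("Recall:", recall_score(y_true, y_pred))        # 3 of 4 churners caught -> 0.75
    # Precision shows the trade-off: how many flagged customers truly churn
    print("Precision:", precision_score(y_true, y_pred))  # 3 of 4 flags correct -> 0.75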

    Project structure (end-to-end)

    1. Business understanding
      Define the decision the business will make (who to target) and the primary metric (Recall).

    2. Data collection and preparation
      Use the Telco Customer Churn dataset (about 7k customers) and ensure the data is clean and usable for modeling.

    3. Model build (baseline)
      Train an initial model to establish baseline performance and generate early churn risk scores (a baseline sketch follows this list).

    4. Optimization
      Improve results through better feature choices and parameter tuning, guided by validation results and business trade-offs.

    5. Deployment concept
      Package the outcome so it can be used consistently (for example: scoring new customers regularly and producing a prioritized retention list).
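
    A minimal baseline sketch covering steps 2 through 5, assuming the public IBM/Kaggle Telco CSV and its column names; the file name, model choice, and 0.5 threshold are illustrative assumptions, not the project's actual implementation:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import recall_score
    from sklearn.model_selection import train_test_split

    # Assumed file and column names from the public IBM/Kaggle Telco dataset
    df = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce").fillna(0)
    y = (df["Churn"] == "Yes").astype(int)
    X = pd.get_dummies(df.drop(columns=["customerID", "Churn"]), drop_first=True)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    # Simple, interpretable baseline before any optimization
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Customer-level churn risk scores and the primary metric
    risk = model.predict_proba(X_test)[:, 1]
    print("Recall on churners:", recall_score(y_test, risk >= 0.5))

    # Deployment concept: a prioritized retention list, highest risk first
    retention_list = (
        X_test.assign(churn_risk=risk).sort_values("churn_risk", ascending=False)
    )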

    Deliverables

    • A churn risk scoring output (customer-level risk)
    • A short performance summary with Recall emphasized
    • A documented workflow that can be extended (new features, new models, monitoring)

    Creating a Useful Voice-Activated Fully Local RAG System

    A fully local, voice-first Retrieval-Augmented Generation (RAG) assistant that runs end-to-end on your machine. The system listens for a wake word, records and transcribes speech locally, retrieves relevant context from a PDF knowledge base using vector similarity search, generates a grounded answer with a local LLM, and converts the answer into a playable audio response.

    What this project does

    1. Voice input (recording): Record microphone audio to a WAV file.
    2. Speech-to-text (local transcription): Transcribe the audio locally using Whisper (steps 1 and 2 are sketched after this list).
    3. Wake word activation: Gate the system behind a wake word, detected via embedding similarity to remain robust to transcription variations.
    4. Knowledge base preparation:
      • Load and extract text from a PDF handbook.
      • Split the text into overlapping chunks for retrieval.
      • Embed each chunk and store it in a local vector database.
    5. Retrieval: Embed the user query and retrieve the top-k most relevant chunks from the vector store.
    6. Generation (local LLM): Generate a response using a local chat model, grounded by the retrieved chunks.
    7. Text-to-speech (local TTS): Convert the generated response to speech, save it as an audio file, and play it back.
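
    A minimal sketch of steps 1 and 2, assuming the openai-whisper package and soundfile for WAV writing; the fixed 5-second capture is a simplification, not the project's actual recording logic:

    import sounddevice as sd
    import soundfile as sf
    import whisper  # openai-whisper; needs ffmpeg on PATH to load audio files

    SAMPLE_RATE = 16000  # Whisper expects 16 kHz mono audio
    DURATION_S = 5       # fixed-length capture for simplicity

    # 1. Record microphone audio to a WAV file
    audio = sd.rec(int(DURATION_S * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until recording finishes
    sf.write("input.wav", audio, SAMPLE_RATE)

    # 2. Transcribe locally with the base.en Whisper model
    model = whisper.load_model("base.en")
    text = model.transcribe("input.wav")["text"].strip()
    print("Heard:", text)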

    Core components

    • Voice Receiver and Transcription

      • Audio recording and playback
      • Local transcription for user input
      • Wake word detection using embedding similarity + cosine similarity (sketched after this list)
    • Knowledge Base

      • PDF extraction
      • Chunking with overlap
      • Local vector storage and retrieval
    • Audio File Response Generation

      • Local response generation
      • Local text-to-speech and playback
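
    A minimal sketch of the wake-word gate, using the same embedding model and cosine similarity named in the tech stack below; the wake phrase and threshold are illustrative assumptions:

    from sentence_transformers import SentenceTransformer
    from sklearn.metrics.pairwise import cosine_similarity

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    WAKE_WORD = "hey assistant"  # hypothetical wake phrase
    THRESHOLD = 0.6              # assumed cut-off; tune on real transcripts

    wake_emb = embedder.encode([WAKE_WORD])

    def is_wake_word(transcript: str) -> bool:
        """Compare the transcript to the wake phrase in embedding space, so
        variations like 'hey, assistant!' still activate the system."""
        sim = cosine_similarity(embedder.encode([transcript]), wake_emb)[0][0]
        return sim >= THRESHOLD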

    Tech stack

    • Audio I/O: sounddevice
    • Speech-to-text: Whisper (base.en)
    • Embeddings: Sentence Transformers (all-MiniLM-L6-v2)
    • Vector DB: ChromaDB (local)
    • Chunking: RecursiveCharacterTextSplitter
    • Similarity: cosine similarity (sklearn.metrics.pairwise)
    • Local LLM: Qwen/Qwen1.5-0.5B-Chat (Hugging Face transformers)
    • Text-to-speech: suno/bark-small (Hugging Face transformers; a playback sketch follows this list)
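
    A minimal sketch of local playback for step 7, assuming suno/bark-small runs through the transformers text-to-speech pipeline; field names follow that pipeline's output dict, and this is an illustration rather than the project's code:

    import sounddevice as sd
    import soundfile as sf
    from transformers import pipeline

    # Local TTS with bark-small via the transformers text-to-speech pipeline
    synthesiser = pipeline("text-to-speech", model="suno/bark-small")
    speech = synthesiser("Here is the answer from the knowledge base.")

    # Save the response as an audio file, then play it back
    sf.write("response.wav", speech["audio"].squeeze(), speech["sampling_rate"])
    sd.play(speech["audio"].squeeze(), speech["sampling_rate"])
    sd.wait()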

    Installation

    Create and activate a virtual environment:

    python -m venv rag-env-audio
    # Windows: rag-env-audio\Scripts\activate
    # macOS/Linux: source rag-env-audio/bin/activate
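
    The original setup stops at the virtual environment; a plausible dependency install, inferred from the tech stack above rather than taken from the source:

    pip install sounddevice soundfile openai-whisper sentence-transformers chromadb langchain transformers torch scikit-learn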
    

    Suggested project structure

    .
    ├─ app.py                 # voice → RAG → audio response pipeline
    ├─ dataset/
    │  └─ Insurance_Handbook_20103.pdf
    └─ chroma_db/             # persisted vector store (generated)
    

    Simple RAG Implementation With Contextual Semantic Search

    A minimal, end-to-end reference implementation of a Retrieval-Augmented Generation (RAG) pipeline that grounds LLM responses in your documents using contextual semantic search. This project demonstrates how to reduce hallucinations by retrieving the most relevant passages from a PDF knowledge base and injecting them as context at generation time.

    What this project does

    1. Ingest: Extract text from one or more PDF files.
    2. Chunk: Split raw text into overlapping, semantically meaningful chunks.
    3. Embed + Index: Convert chunks into vector embeddings and store them in a local vector database.
    4. Retrieve: For each user query, fetch the top-k most similar chunks via vector similarity search (steps 1 through 4 are sketched after this list).
    5. Generate: Provide the retrieved context to an LLM to produce a grounded answer.
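
    A minimal sketch of steps 1 through 4 under the default configuration below; the PDF file name and query are illustrative, and the rest mirrors the stack listed under Key capabilities:

    import chromadb
    from PyPDF2 import PdfReader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from sentence_transformers import SentenceTransformer

    # 1. Ingest: extract raw text from a PDF (hypothetical file name)
    reader = PdfReader("dataset/handbook.pdf")
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    # 2. Chunk: overlapping splits, matching the default configuration below
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_text(text)

    # 3. Embed + Index: persist vectors in a local Chroma collection
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    client = chromadb.PersistentClient(path="chroma_db")
    collection = client.get_or_create_collection("knowledge_base")
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
    )

    # 4. Retrieve: top-k most similar chunks for an example query
    query = "What does the handbook say about claims?"
    results = collection.query(
        query_embeddings=embedder.encode([query]).tolist(), n_results=5
    )
    context = "\n\n".join(results["documents"][0])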

    Key capabilities

    • PDF ingestion with PyPDF2
    • Chunking with LangChain RecursiveCharacterTextSplitter
    • Embedding generation with Sentence Transformers (all-MiniLM-L6-v2)
    • Vector storage and retrieval with Chroma (persistent local store for prototyping)
    • LLM integration with LiteLLM (example: Gemini gemini-1.5-flash; a generation sketch follows this list)
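
    A minimal sketch of step 5 with LiteLLM and the example Gemini model; the placeholder query and context would come from the retrieval sketch above, and the prompt wording is an assumption:

    from litellm import completion

    # Placeholders; in the full pipeline these come from the retrieval step
    query = "What does the handbook say about claims?"
    context = "...top-5 retrieved chunks joined together..."

    # Ground the answer in the retrieved context
    # (assumes GEMINI_API_KEY is set, as in the environment variables below)
    prompt = (
        "Answer using only the context below. If the answer is not in the "
        f"context, say so.\n\nContext:\n{context}\n\nQuestion: {query}"
    )
    response = completion(
        model="gemini/gemini-1.5-flash",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)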

    Default configuration

    • Chunk size: 500
    • Chunk overlap: 50
    • Retrieval: top_k = 5
    • Vector DB persistence path: chroma_db
    • Example collection name: knowledge_base

    Installation

    pip install -q chromadb pypdf2 sentence-transformers litellm langchain
    

    Environment variables

    export HUGGINGFACE_TOKEN="YOUR_TOKEN"
    export GEMINI_API_KEY="YOUR_KEY"
    

    Suggested project structure

    .
    ├─ dataset/          # PDF files used as the knowledge base
    ├─ chroma_db/        # local persisted vector store (generated)
    ├─ notebooks/        # tutorial / experiments
    └─ src/              # reusable pipeline components (optional refactor)
    

    Intended use

    • A clean baseline for learning and demos of RAG fundamentals
    • Internal knowledge-base Q&A prototypes (policies, manuals, SOPs)
    • A starting point to add reranking, evaluation, guardrails, and observability
