Kai Cao

Available to hire

I’m Kai Cao, a Machine Learning Engineer specializing in large language models, retrieval systems, and experiment-driven performance optimization. I design and evaluate LLM-powered retrieval pipelines, build benchmarking frameworks without labeled data, fine-tune embeddings and rerankers, and integrate modern ML systems into scalable cloud-native backends.

My background spans AI-powered search, NLP-driven systems, microservice architecture, event-driven data pipelines, and secure ML deployment on AWS. I work across the full lifecycle—from research ideation and prototyping to production-scale inference, observability, and continuous improvement. I thrive in fast-moving environments where ambiguity requires clear experimental thinking, rigorous evaluation, and rapid iteration.

Experience Level

Expert

Language

English
Fluent
Finnish
Intermediate
Vietnamese
Intermediate

Work Experience

Software Engineer at Datura AI
November 1, 2023 - Present
Developed retrieval pipelines that combine dense embeddings, semantic similarity models, hybrid lexical signals, and multi-stage reranking for more accurate query resolution. Designed evaluation frameworks for retrieval relevance without labeled data, leveraging contrastive heuristics, cluster-based validation, and self-supervised metrics aligned with modern RAG research. Built adaptive, agent-like query flows that switch between fast heuristics and deeper multi-hop search based on query ambiguity and confidence scoring. Fine-tuned encoder/embedding models in PyTorch to improve semantic clustering and ranking, and created microservice-based ML infrastructure on AWS (Lambda, ECS, S3, DynamoDB, EC2) for scalable indexing, feature extraction, and low-latency inference. Led end-to-end experimentation pipelines (ablation automation, dataset sampling, statistical comparisons) to guide model improvements. Integrated observability across pipelines with logs, metrics, and tracing to diagnose re
Software Engineer at RAFA AI
October 1, 2018 - October 1, 2023
Developed evaluation methods for retrieval, ranking, and LLM-driven outputs in the absence of labeled datasets using synthetic generation, heuristic scoring, and stability-based metrics. Built embeddings pipelines and vector indexing prototypes, exploring dense retrieval, hybrid search, and metadata-driven contextual ranking. Designed and deployed ML microservices (Python, Node.js, Docker) on AWS (Lambda, S3, ECS, Step Functions) to support scalable inference and data processing. Created automated data pipelines for text normalization, chunking, feature extraction, and embeddings generation for downstream retrieval and LLM fine-tuning. Ensured system reliability via automated testing, monitoring, and CI/CD deployments. Implemented IAM-based permissions, encrypted model storage, and controlled inference environments. Supported research sprints on LLM fine-tuning, instruction-following behavior, embeddings adaptation, and reranker improvements. Integrated ML components into production st
Software Engineer at Smartifik Oy
August 1, 2017 - September 1, 2018
Built NLP-driven features for conversational AI and online assistant solutions, improving intent detection, classification accuracy, and semantic matching. Designed text-processing pipelines and retrieval logic for Q&A within chatbot knowledge sources. Integrated ML capabilities into backend microservices using Django, Node.js, and C#/.NET for real-time dialogue understanding. Implemented automated data ingestion and semantic analysis workflows with Python and event-driven cloud components. Improved quality and reliability through structured logging, monitoring dashboards, and root-cause analysis of conversational failures. Collaborated with frontend teams to deliver React-based interfaces that surface AI-driven assistance. Applied Agile practices, code reviews, and automated testing for high-quality feature delivery.
Associate Software Engineer at Smartifik Oy
September 1, 2015 - August 1, 2017
Developed backend components supporting early conversational-AI and NLP-driven automation. Implemented data-processing utilities, text-normalization scripts, and foundational retrieval logic for chatbot knowledge sources. Built REST APIs and microservices using Python, C#, and JavaScript to serve NLP features and user-facing integrations. Introduced QA automation, unit tests, and performance improvements for system stability. Contributed to early experimentation on semantic text processing and question-answering behavior.

Education

Master of Computer Applications (MCA) at University of Oulu
Until August 1, 2015
Bachelor of Science (BS) at Vietnamese-German University
Until August 1, 2013

Industry Experience

Software & Internet