
Carlo Reyes


Available to hire

I am a Senior Data Scientist with 9 years of experience building AI training pipelines, data labeling workflows, and production ML systems across consulting, startups, finance, healthcare, and public-sector programs. I specialize in LLMs, Retrieval-Augmented Generation (RAG), graph-based retrieval, prompt engineering, embeddings, semantic search, and statistical analysis for model evaluation and quality scoring.
I’m known for clear written communication, attention to detail, and decisive problem-solving guided by well-documented guidelines and reviewer feedback. I have extensive experience delivering annotation QA, A/B testing, and content review processes in Python and SQL within cloud environments.


Experience Level

Expert

Language

English
Fluent

Work Experience

Senior Data Scientist at Cognizant
April 1, 2020 - Present
Led the AI training and review workflow for LLM responses using Python, SQL, and rubric-based guidelines; improved annotation QA consistency and feedback quality; resolved ambiguous cases with decisive reviewer notes. Built GraphRAG pipelines with Neo4j, embeddings, and semantic search to retrieve tax guidance from large PDF corpora, enabling faster reviewer decisions and more reliable outputs aligned with policy constraints. Designed response rating criteria and sampling plans using statistics and A/B testing to compare prompts, track drift, and communicate results to cross-functional stakeholders. Implemented document processing for complex PDFs with Python, OCR-style parsing, and hierarchy extraction, producing normalized JSON outputs that support labeling, audit trails, and downstream LLM training. Created cross-document entity linking with NER, Cypher queries, and NetworkX traversal to reduce duplicates and improve context for RAG evaluations during edge-case reviews. Shipped a multi-agent
Machine Learning Engineer at Attain Partners
May 1, 2018 - April 1, 2020
Built data labeling and QA routines for recommendation training datasets; improved consistency of session events and reduced noisy labels through audit samples and decisive reviewer feedback. Developed content-based recommenders with KNN, feature engineering, and embeddings; validated results with A/B testing to explain trade-offs for product decisions. Implemented collaborative filtering via neural matrix completion in PyTorch and optimized offline evaluation with cross-validation, ensuring repeatable scoring and clear error analysis for stakeholder reviews. Created session-based next-click prediction using Transformer models in PyTorch; documented dataset curation to ensure consistent application of definitions across experiments. Delivered healthcare IoT predictive maintenance models using gradient boosting and outlier detection; deployed ML pipelines in Azure ML and Databricks with automated training runs and monitoring to catch data quality issues early. Built an advertisement opt
Data Scientist at Maximus
June 1, 2017 - April 1, 2018
Built supervised learning models for program analytics, combining feature engineering with clear label definitions to maintain training data quality. Performed statistical analysis, hypothesis testing, and regression modeling to evaluate policy changes; produced memos explaining assumptions and giving actionable recommendations. Implemented text classification and NER baselines, conducted error analysis, and updated labeling guidance so that ambiguous language was interpreted consistently. Created repeatable ETL pipelines in Bash, Python, and PostgreSQL to improve data completeness checks and enable faster sampling for QA. Designed cross-validation and model selection routines for classification tasks; documented metrics and edge cases for consistent evaluation. Supported content review workflows by building SQL dashboards and templates for auditing records and tracing decisions back to source data. Coordinated with SMEs to resolve conflicting definitions and standardize taxonomy for labeling and rep
Junior Data Scientist at ITS
April 1, 2016 - January 31, 2017
Prepared and cleaned datasets using Python, pandas, and SQL with well-documented assumptions. Built baseline classification and regression models and communicated results clearly in writing to non-technical stakeholders. Assisted with data labeling, defined categories, and flagged edge cases early so senior staff could give decisive guidance. Maintained small ETL scripts in Bash and Python for data pulls, supporting timely sampling for QA. Ran outlier detection and data validation checks in SQL, escalating anomalies with evidence and suggested fixes. Supported model evaluation with train/test splits and metric tracking, ensuring transparent reporting. Contributed to simple REST data pulls and JSON parsing; assisted with basic indexing requests and query tuning in PostgreSQL. Created training guides for interns on labeling rules and QA checks, improving consistency and reducing rework.

Education

Master of Science in Computer Science at Stratford University
October 1, 2017 - April 10, 2026
Bachelor of Science in Computer Science at Stratford University
May 1, 2015 - April 10, 2026

Qualifications

AWS Certified Cloud Practitioner
January 11, 2030 - April 10, 2026
Microsoft Certified: Azure Data Scientist Associate
January 11, 2030 - April 10, 2026

Industry Experience

Financial Services, Government, Healthcare, Professional Services, Software & Internet