David Carter Antony

Data Scientist, AI Engineer, AI Prompt Engineer, +2





Available to hire

Hello, I’m David Carter Antony, a STEM-focused AI/ML specialist with 3+ years of experience evaluating and improving large language model outputs across mathematics, physics, computer science, and statistics. I hold a Master of Science in Machine Learning from the University of Texas at Austin and have a strong foundation in calibration theory, statistical modeling, and quantitative analysis. I work remotely and have a proven track record of auditing multi-step reasoning, detecting hallucinations, validating mathematical correctness, and delivering structured RLHF-style feedback used in model retraining workflows.

I’m known for disciplined rubric adherence, high inter-rater reliability, and precise technical writing in remote evaluation environments. I excel at surfacing edge cases in symbolic reasoning, probabilistic inference, and algorithmic complexity, and I contribute clear, actionable write-ups for model improvement and safety.

Skills

Experience Level

Expert

Expert

Expert

Expert

Language

English

Fluent

German

Fluent

Work Experience

AI Training Data Evaluator — STEM Specialist (Remote Contract) at Remotasks / Scale AI

January 1, 2023 - Present

Evaluate 700–900 LLM-generated responses per month across advanced mathematics, physics problem-solving, statistical inference, and computer science tasks using structured multi-dimensional rubrics. Audit multi-step quantitative reasoning chains, verifying algebraic transformations, calculus derivations, probabilistic reasoning, algorithmic correctness, and unit consistency. Perform blind comparative ranking for RLHF pipelines while maintaining high inter-rater agreement across evaluated output pairs, identify hallucinated theorems and other fabrications, and produce structured written justifications for model retraining. Conduct adversarial prompt testing to surface edge cases in symbolic reasoning, probabilistic inference, and algorithmic analysis. Maintain low revision rates and adapt quickly to rubric revisions.

Junior Data Scientist at DataBridge Analytics

July 1, 2022 - December 1, 2022

Designed supervised learning pipelines for structured datasets, improving churn-prediction F1 from 0.71 to 0.83 through feature engineering and hyperparameter optimization. Applied stratified cross-validation, class-imbalance handling, and threshold calibration to enhance generalization. Conducted exploratory data analysis across 15+ heterogeneous datasets, developed Python ETL workflows, and participated in Git-based reviews to ensure statistical soundness prior to deployment.

Data Annotation Specialist (Remote Contract) at Appen

September 1, 2021 - June 1, 2022

Labeled and cross-validated 5,000+ text and code samples supporting NLP model training, maintaining high agreement with gold references. Evaluated AI-generated code for algorithmic correctness, logical consistency, and language-specific constraints; flagged mathematical inaccuracies and policy violations. Contributed to the refinement of technical annotation guidelines used by a 12-person evaluation team.