Analytical AI evaluation specialist with hands-on experience in scenario-based evaluation, gold-standard definition, and reasoning analysis for large language models and autonomous AI agents.
Experienced in reviewing complex evaluation setups, identifying hidden assumptions, edge cases, and logical gaps, and translating those findings into clear evaluation criteria and actionable feedback for research and product teams.
Particularly interested in roles focused on AI alignment, agent evaluation frameworks, and human-in-the-loop quality assurance.
This repository documents my practical experience in the evaluation, alignment, and qualitative analysis of large language models (LLMs) and speech systems.
The work presented here focuses on applied evaluation rather than model training, covering tasks related to alignment, reasoning quality, linguistic robustness, and real-world language behavior.
All materials are synthetic and designed exclusively for professional portfolio purposes.