Analytical AI evaluation specialist with hands-on experience in scenario-based evaluation, gold-standard definition, and reasoning analysis for large language models and autonomous AI agents. Experienced in reviewing complex evaluation setups, identifying hidden assumptions, edge cases, and logical gaps, and translating those findings into clear evaluation criteria and actionable feedback for research and product teams. Particularly interested in roles focused on AI alignment, agent evaluation frameworks, and human-in-the-loop quality assurance.

Carles Rodríguez

Available to hire

Language

Spanish; Castilian
Fluent
Catalan; Valencian
Fluent
English
Advanced

Work Experience

AI Evaluation Specialist / Research Analyst at Barcelona Supercomputing Center (BSC)
April 1, 2025 - December 1, 2025
Designed and reviewed structured evaluation scenarios to assess LLM reasoning, instruction-following, and factual accuracy. Defined expected agent behaviors (gold standards) and scoring criteria to evaluate decision paths and task execution. Identified failure modes, inconsistencies, and implicit assumptions; translated findings into clear, actionable feedback for multidisciplinary teams to improve model behavior and evaluation design.
AI Language & Model Evaluation Specialist (Remote) at Outlier / Thoth AI / Data Force
June 1, 2024 - February 28, 2025
Evaluated AI-generated outputs in Spanish and English for logical coherence, semantic accuracy, and instruction adherence. Conducted scenario validation and edge-case analysis for LLM tasks. Applied quality guidelines, checklists, and scoring frameworks to ensure evaluation quality and reproducibility across large-scale evaluation workflows. Contributed qualitative feedback to refine evaluation logic and task design.
Quality & Policy Specialist at TELUS International
January 1, 2019 - December 31, 2023
Performed quality reviews and auditing of complex content under strict policy and decision frameworks. Participated in calibration, consistency checks, and process-improvement initiatives. Maintained high accuracy in decision-making environments involving ambiguity and risk assessment.

Education

Bachelor's Degree in Sociology at Universidad de Oriente
September 1, 2004 - June 15, 2009
Postgraduate studies in Social Communication, Advertising and New Media at Universidad de Oriente
September 1, 2009 - June 20, 2010
Natural Language Processing with Python: fundamentals, libraries and applications at Universidad de La Rioja
September 21, 2025 - September 28, 2025
Retrieval-Augmented Generation (RAG): theory and applications at Universidad de La Rioja
June 8, 2025 - June 15, 2025

Industry Experience

Software & Internet, Professional Services, Media & Entertainment
    LLM Evaluation, Alignment and Language Analysis Portfolio

    This repository documents my practical experience in the evaluation, alignment, and qualitative analysis of large language models (LLMs) and speech systems.

    The work presented here focuses on applied evaluation rather than model training, covering tasks related to alignment, reasoning quality, linguistic robustness, and real-world language behavior.

    All materials are synthetic and designed exclusively for professional portfolio purposes.
