Tosin Owadokun

Available to hire

Hi, I’m Tosin Owadokun, a Lagos-based AI Research Engineer and Physicist. I specialize in LLM evaluation and computational modeling, leveraging a strong background in experimental physics to build robust testing frameworks for AI.

I’m passionate about improving data quality for RLHF and bridging the gap between scientific simulation and AI, with experience shipping edge-optimized evaluation pipelines and adversarial testing architectures.

Language

English
Fluent

Work Experience

AI Workflows Specialist at Lexpertz AI
December 1, 2020 - Present
Overseeing operations and AI automation processes

Education

BSc Physics at University of Calabar, Nigeria
January 11, 2020 - July 1, 2025

Industry Experience

Software & Internet, Education, Professional Services, Media & Entertainment

Projects

    ReasonBench | Automated AI Logic Evaluation Framework

    The Problem

    Standard AI benchmarks often fail to detect “hallucinated logic.” A model might guess the correct final answer (“42”) but use completely wrong reasoning to get there. In high-stakes fields like Physics or Finance, this hidden error is dangerous.
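
    To make the failure mode concrete, here is a minimal, hypothetical illustration: a GSM8K-style item whose final answer matches the gold label even though the intermediate arithmetic is wrong. An answer-only grader passes it; a step-level check catches it. All names and values below are invented for illustration.

    ```python
    # Hypothetical "hallucinated logic" case: the final answer is right,
    # but the intermediate arithmetic steps are wrong.
    sample = {
        "question": "Ada buys 3 packs of 14 pens and gives away 6. How many remain?",
        "gold_answer": 36,
        "chain_of_thought": ["3 * 14 = 40",   # wrong: 3 * 14 is 42
                             "40 - 6 = 36"],  # wrong again, yet lands on the gold total
        "model_answer": 36,
    }

    def answer_only_grade(s):
        """What standard benchmarks check: final-answer equality."""
        return s["model_answer"] == s["gold_answer"]

    def step_level_grade(s):
        """Re-evaluate each 'expr = value' claim in the chain."""
        for step in s["chain_of_thought"]:
            expr, claimed = step.split("=")
            if eval(expr) != int(claimed):  # acceptable for this arithmetic-only toy
                return False
        return True

    print(answer_only_grade(sample))  # True  -> the benchmark is fooled
    print(step_level_grade(sample))   # False -> the reasoning gap is exposed
    ```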

    The Solution

    I engineered ReasonBench, an automated evaluation framework that treats AI reasoning like a rigorous laboratory experiment.

    • Adversarial Grading: It uses Google Gemini 2.5 as a “Supervisor Model” to audit the step-by-step Chain-of-Thought (CoT) logic of student models (see the sketch after this list).
    • Mobile-Edge Engineering: Unlike heavyweight cloud tooling, the entire pipeline is optimized to run in mobile-edge environments (Android/Pydroid), demonstrating that rigorous AI evaluation can be resource-efficient.
    • RLHF Data Generation: It automatically generates structured “Reasoning Gap” reports, creating high-value training data for model fine-tuning.
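
    Below is a minimal sketch of the adversarial-grading loop, assuming the google-generativeai Python SDK. The prompt wording, JSON schema, and function names are illustrative placeholders, not the shipped ReasonBench code.

    ```python
    # Sketch of supervisor-model grading, assuming the google-generativeai SDK.
    # Prompt text and the verdict schema are illustrative, not the shipped code.
    import os
    import json
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    supervisor = genai.GenerativeModel("gemini-2.5-flash")

    GRADING_PROMPT = """You are auditing a student model's chain of thought.
    Question: {question}
    Gold answer: {gold}
    Student reasoning:
    {cot}

    Reply with JSON only: {{"answer_correct": true/false,
    "reasoning_valid": true/false, "first_flawed_step": int or null,
    "explanation": "..."}}"""

    def audit(question: str, gold: str, cot: str) -> dict:
        """Grade the reasoning itself, not just the final answer."""
        response = supervisor.generate_content(
            GRADING_PROMPT.format(question=question, gold=gold, cot=cot)
        )
        # Real model output may wrap JSON in markdown fences; strip naively here.
        raw = response.text.strip().strip("`").removeprefix("json").strip()
        verdict = json.loads(raw)
        # The dangerous case ReasonBench targets: right answer, wrong logic.
        verdict["reasoning_gap"] = verdict["answer_correct"] and not verdict["reasoning_valid"]
        return verdict
    ```

    In this sketch, each verdict with reasoning_gap set to true would become one row of the “Reasoning Gap” report, i.e. the RLHF-ready artifact the pipeline emits.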

    Tech Stack

    • Core: Python 3.10+, Pandas, NumPy
    • Models & Benchmarks: Google Gemini 2.5 Flash (supervisor), GSM8K (see the loading snippet after this list)
    • Infrastructure: Mobile-Edge Deployment (Pydroid 3)
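
    As a reference point, GSM8K is publicly available through the Hugging Face datasets library; this loading snippet is an assumption about sourcing, and the actual pipeline may ingest the data differently.

    ```python
    # Load the GSM8K benchmark; assumes the Hugging Face `datasets` package,
    # which may differ from how ReasonBench actually sources the data.
    from datasets import load_dataset

    gsm8k = load_dataset("gsm8k", "main", split="test")
    print(gsm8k[0]["question"])  # natural-language word problem
    print(gsm8k[0]["answer"])    # step-by-step solution ending in "#### <n>"
    ```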
