I am an experienced Data Scientist and AI/ML Engineer whose expertise spans data engineering, machine learning, and advanced AI, developed through years of work at CERN and MIT on sophisticated analyses of large-scale data. I lead end-to-end projects, from building scalable data pipelines to deploying retrainable ML/AI systems, and I bring rigorous statistical analysis, uncertainty quantification, and practical, collaborative problem solving to deliver impact across the research, finance, real estate, and technology sectors.

I thrive in cross-disciplinary teams, translating scientific challenges into production-ready models and data systems. My work ranges from data engineering to ML research and LLM-based applications, with an emphasis on reproducibility and DevOps discipline.

Mohammad Mousavi

Available to hire

Experience Level

Expert: 13 skills
Intermediate: 3 skills
Beginner: 1 skill

Language

Persian: Fluent
English: Fluent

Work Experience

Data Scientist, Machine Learning & AI Engineer at CERN (European Organization for Nuclear Research)
January 1, 2025 - July 27, 2025
Designed and deployed end-to-end machine learning pipelines for large-scale physics datasets, handling data preparation, cleaning, and feature engineering with Azure, Snowflake, dbt, and Apache Spark. Developed models using scikit-learn, TensorFlow, and LLMs for anomaly detection and predictive analysis. Orchestrated workflows with Airflow and managed scalable training, deployment, and automated retraining on Azure ML using Azure Compute clusters.
Data Scientist, Machine Learning & AI Engineer at National Center for Nuclear Research (NCBJ)
January 1, 2025 - July 27, 2025
Conducted big data analysis using PySpark and Snowflake, preparing large datasets for machine learning applications. Built and deployed models with TensorFlow and scikit-learn, and orchestrated automated workflows with Airflow for scalable data processing and model retraining. Managed cloud resources on Azure and AWS to support production-grade ML workflows and scalable data operations.
Data Scientist, AI & ML Engineer at CERN
January 1, 2020 - Present
Built scalable ML pipelines on Azure and Snowflake to handle complex scientific data. Conducted statistical modeling, including Gaussian fitting and Monte Carlo simulations. Designed anomaly detection systems achieving over 90% accuracy. Deployed retrainable pipelines with Airflow and Azure ML. Created an internal RAG-based LLM chatbot using Hugging Face and FastAPI. Engineered batch and streaming data pipelines integrating Kafka and MongoDB. Integrated MLflow for experiment tracking and model versioning. Applied DevOps practices with Docker and GitHub Actions for CI/CD workflows.
Data Scientist, AI & ML Engineer at NCBJ
January 1, 2023 - Present
Developed predictive models on large datasets using PySpark, TensorFlow, and scikit-learn. Automated workflows for data ingestion, model training, and deployment using Airflow. Performed advanced curve fitting and statistical validation including error bars and confidence intervals. Managed scalable cloud infrastructure across Azure and AWS. Implemented real-time data processing for telemetry and event-driven streams.
Data Scientist, AI & ML Engineer at CERN
December 31, 2025 - October 16, 2025
Designed and deployed ML pipelines on Azure and Snowflake, processing 10TB+ of physics data daily for automated research workflows. Developed high-accuracy anomaly detection models (>90%) for event classification and rare data pattern recognition. Built and deployed a Retrieval-Augmented Generation (RAG) chatbot for scientific data access, reducing manual query time by 70%. Mentored MIT–CERN junior researchers in applied ML, DevOps, and AI model development best practices. Integrated MLflow for experiment tracking, model versioning, and automated retraining pipelines using Airflow.
Data Scientist, AI & ML Engineer at CERN
January 1, 2017 - January 1, 2025
Built scalable ML pipelines on Azure + Snowflake, handling complex scientific data. Conducted statistical modeling: Gaussian fitting, uncertainty propagation, Monte Carlo simulations. Designed anomaly detection systems and trained models with >90% accuracy for physics events. Deployed retrainable pipelines with Airflow and Azure ML (plus AWS & GCP alternatives). Created an internal RAG-based LLM chatbot using Hugging Face, FAISS, FastAPI. Engineered batch and streaming data pipelines integrating Kafka and MongoDB. Integrated MLflow for experiment tracking, model versioning, and reproducibility. Applied DevOps practices using Docker and GitHub Actions for CI/CD workflows and scalable deployment.

Education

Ph.D. Research Fellowship (Data Science & Physics) at Massachusetts Institute of Technology (MIT), USA
January 1, 2023 - December 31, 2025
Pre-Doctoral Fellowship (Experimental Particle Physics) at International Centre for Theoretical Physics (ICTP), Italy
January 1, 2022 - December 31, 2023
M.Sc. in Elementary Particle Physics at University of Tehran, Iran
January 1, 2016 - December 31, 2019
B.Sc. in Chemical Engineering at Petroleum University of Technology, Iran
January 1, 2010 - December 31, 2015

Qualifications

Use Generative AI for Software Development
January 1, 2023 - December 31, 2023
Machine Learning in Science and Market
January 1, 2023 - December 31, 2023
Application of Data Analysis
January 1, 2023 - December 31, 2023
C++ and Python Programming
January 1, 2023 - December 31, 2023
Git/GitHub
January 1, 2023 - December 31, 2023
Generative AI for Software Development (LinkedIn Learning)
January 11, 2030 - October 16, 2025
Machine Learning in Science and Market (MITx)
January 11, 2030 - October 16, 2025
Applied Data Analysis (Coursera)
January 11, 2030 - October 16, 2025
Git, Python, and C++ Programming (Udemy)
January 11, 2030 - October 16, 2025

Industry Experience

Life Sciences, Financial Services, Software & Internet, Professional Services, Education, Transportation & Logistics, Other, Media & Entertainment

Projects
    User Event Pipeline: GCP + BigQuery + DBT

    This project demonstrates a complete event data pipeline from Google Cloud Storage to BigQuery, using Python, DBT, and data quality testing.

    It simulates ingestion and transformation of user event logs (clicks, scrolls, purchases), handling bad data, applying Data Vault modeling, and testing via DBT tests.
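
As a rough illustration of the bad-data handling step described above, the sketch below splits raw JSON-lines event records into valid and rejected sets before any loading would happen. It is not the project's actual code: the field names (`user_id`, `event_type`, `timestamp`), the allowed event types, and the function name `split_events` are assumptions made for the example.

```python
import json

# Hypothetical schema: these field names and event types are assumptions
# made for illustration, not taken from the project itself.
REQUIRED_FIELDS = ("user_id", "event_type", "timestamp")
ALLOWED_EVENT_TYPES = {"click", "scroll", "purchase"}


def split_events(lines):
    """Parse JSON-lines event logs into (valid, rejected) lists.

    Rejected entries are (line_number, reason) tuples so that bad rows
    can be inspected later instead of being silently dropped.
    """
    valid, rejected = [], []
    for line_number, line in enumerate(lines, start=1):
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            rejected.append((line_number, "malformed JSON"))
            continue
        if not isinstance(event, dict):
            rejected.append((line_number, "not a JSON object"))
            continue
        # Treat absent, null, and empty-string values as missing.
        missing = [f for f in REQUIRED_FIELDS if event.get(f) in (None, "")]
        if missing:
            rejected.append((line_number, "missing " + ", ".join(missing)))
            continue
        event_type = event["event_type"]
        if not isinstance(event_type, str) or event_type not in ALLOWED_EVENT_TYPES:
            rejected.append((line_number, "unknown event_type"))
            continue
        valid.append(event)
    return valid, rejected
```

Keeping the rejection reason next to the line number is one possible design choice; it makes it straightforward to route bad rows to a separate quarantine table for review.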

    Driver Behavior Classification API

    This project uses sensor data to classify driver behavior as one of several categories like Normal, Aggressive, or Risky. It consists of a machine learning model trained on processed driving telemetry and served via a FastAPI endpoint.

    German NER Product Info Extraction

    This project is a full Named Entity Recognition (NER) pipeline for extracting structured product information (e.g. brand, storage, color) from messy German e-commerce texts. It includes rule-based + machine learning (hybrid) entity extraction, training, evaluation, logging, and a FastAPI service for serving predictions.
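
To give a sense of what the rule-based half of such a hybrid extractor might look like, here is a minimal sketch using only regular expressions and small lexicons. It is an assumption-laden illustration, not the project's implementation: the brand and color lists, the label names, and the function `extract_entities` are all hypothetical, and the machine-learning component is left out entirely.

```python
import re

# Hypothetical lexicons and label names, chosen only for illustration.
BRANDS = ("Samsung", "Apple", "Xiaomi")
COLORS = ("schwarz", "weiß", "blau", "rot", "silber")

# Storage sizes such as "128GB" or "1 TB".
STORAGE_RE = re.compile(r"\b(\d+)\s?(GB|TB)\b", re.IGNORECASE)


def _lexicon_pattern(terms):
    """Build a case-insensitive whole-word pattern from a list of terms."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


BRAND_RE = _lexicon_pattern(BRANDS)
COLOR_RE = _lexicon_pattern(COLORS)

RULES = (("BRAND", BRAND_RE), ("STORAGE", STORAGE_RE), ("COLOR", COLOR_RE))


def extract_entities(text):
    """Return rule-based entity spans found in text, sorted by position."""
    entities = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            entities.append(
                {
                    "label": label,
                    "text": match.group(0),
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    return sorted(entities, key=lambda entity: entity["start"])
```

In a hybrid setup, spans like these could be merged with the predictions of a trained model, for example by letting the rules take precedence for well-formatted fields such as storage size.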

    Physics RAG Assistant

    This project is a fully functional Retrieval-Augmented Generation (RAG) system designed to answer scientific questions using recent public research publications about the Higgs boson from CERN.