Hi, I’m Sherin Thomas. I’m passionate about turning complex data into practical, data-driven solutions that improve real-world outcomes. I’ve led large-scale healthcare analytics projects, built predictive models, and explored synthetic data generation to expand training sets while prioritizing privacy and HIPAA compliance. I enjoy collaborating with interdisciplinary teams to translate research into actionable interventions. My toolkit includes Python, SQL, AWS, and modern ML/NLP frameworks, and I’m driven by curiosity, robust experimentation, and delivering impact-driven results.

Sherin Thomas

Available to hire


Experience Level

Expert

Language

English
Fluent

Work Experience

Data Scientist at University of Rochester Medical Center
September 1, 2023 - November 1, 2025
Analyzed 10M+ electrocardiogram (ECG) records from 40,000+ patients across 31 hospitals using Python (spaCy, NLTK) and signal denoising, accelerating research workflows by 40%. Developed machine learning models (XGBoost, logistic regression, ensembles) on document topic vectors, achieving 94% recall. Implemented Denoising Diffusion Probabilistic Models to generate synthetic data for rare cases, increasing training data diversity. Extracted topic distributions from clinical notes with LDA to improve model accuracy. Masked PII to enable HIPAA-compliant data access. Used LLMs to extract terms, reducing manual processing by 80%. Collaborated with interdisciplinary teams to translate findings into practical healthcare interventions.
Data Scientist at UR Motion Analysis Lab
January 1, 2024 - May 1, 2024
Designed a PostgreSQL database for 1TB+ biomechanics data, automating manual workflows and saving 3+ hours per day. Developed and maintained ETL pipelines using Apache Airflow to automate data ingestion and processing for 2TB+ MATLAB motion-capture data, improving workflow efficiency by 30%. Acted as lead data engineer after the lead’s departure. Collaborated with researchers and lab technicians to optimize Streamlit interfaces that enable 6 non-technical team members to perform daily Python and SQL tasks. Set up and managed data storage and processing workflows on AWS, making datasets easier to access.
Data Scientist at Medical AI Exploration Lab, Emory University
August 1, 2024 - December 1, 2024
Implemented foundation and large language models to train domain-specific word embeddings, improving model accuracy by 15%. Conducted data wrangling and analysis using Python and R, delivering high-quality research outputs for publications. Applied advanced statistical methods for heart failure risk predictions, driving actionable healthcare strategies.
AI Engineer at Auradigm Corporation
January 1, 2025 - November 1, 2025
Improved the response speed of Xplora.ai’s Retrieval-Augmented Generation (RAG) chatbot by 10% through Milvus vector indexing and LangChain-based optimizations. Built NLP pipelines for document summarization using text extraction, topic modeling, and LLMs. Collaborated with engineers to refine code to production requirements.

Education

Master of Science in Data Science at University of Rochester, Goergen Institute for Data Science and AI
January 11, 2030 - December 1, 2024
Bachelor of Engineering in Information Technology at University of Mumbai
January 11, 2030 - May 1, 2023

Qualifications

AWS Certified Machine Learning Engineer – Associate
March 1, 2025 - January 29, 2026

Industry Experience

Healthcare, Life Sciences, Software & Internet