Rushabh Nilesh Kumar Shah

Available to hire

Hi there! I’m Rushabh Shah, a research data scientist and AI engineer focused on turning messy data into actionable insights. At NC State, I build end-to-end data pipelines, evaluate predictive models for environmental health challenges like lead contamination, and craft reproducible ML workflows with Docker, Nextflow, and Slurm to enable reliable MLOps. I enjoy experimenting with multi-source datasets and deploying models that make a real impact.

I thrive on collaborating across disciplines to translate complex data into scalable solutions, from cloud deployments to NLP-powered assistants. I’m passionate about responsible AI, continuous learning, and solving new problems in ML, data engineering, and software development, all while keeping a friendly, collaborative approach.

Language

English: Advanced
Hindi: Fluent

Work Experience

Research Data Scientist at North Carolina State University
February 1, 2024 - Present
Engineered a multi-source dataset (850K records) by integrating Census, EJScreen, water, and soil data across seven counties to enable predictive lead contamination modeling. Evaluated XGBoost, LightGBM, Random Forest, and Neural Network models for lead contamination prediction, achieving AUC-ROC scores of 0.88–0.93 and identifying key contributing factors. Built a CNN-based image classification pipeline using Nextflow (DSL2), Docker, and Slurm Tower for reproducible, scalable ML workflows. Developed a repo-aware LLM chatbot to fetch direct links to source files from GitHub.
Data Science Intern at Schneider Electric
May 1, 2024 - August 1, 2024
Architected and deployed a custom LangChain-based LLM chatbot on AWS Bedrock (Claude 3.5 Sonnet) to automate information retrieval from structured data, saving 3–4 hours of manual lookups daily. Achieved ~90% response accuracy through advanced NLP, prompt engineering, and evaluation. Optimized training schedules and resource allocation by designing a DBSCAN-based clustering algorithm on location data and S3 datasets, projecting a 15–20% reduction in travel costs.
AI Engineer Intern at Grow Scale Techno Lab
December 1, 2022 - June 1, 2023
Prepared and analyzed structured data using Python and SQL; applied feature engineering and regression techniques to improve model accuracy. Conducted statistical analysis including hypothesis testing and controlled experiments to validate model performance. Built and maintained Tableau dashboards to track KPIs and trends, reducing manual reporting effort by 30%.

Education

Master of Computer Science at North Carolina State University
August 1, 2023 - May 1, 2025
B.Tech in Computer Engineering at Pandit Deendayal Energy University
August 1, 2019 - June 1, 2023

Industry Experience

Computers & Electronics, Software & Internet, Professional Services, Education, Media & Entertainment, Healthcare, Other