Available to hire
I’m Sidhant Thakur, a data scientist focused on machine learning, Generative AI, and large-scale analytics across healthcare, pharma, and enterprise settings. I enjoy turning messy data into clear, decision-ready insights and building cloud-native data pipelines that scale.
With hands-on experience in LLMs for summarization and semantic retrieval, I streamline document-heavy workflows and shorten analysis time. I collaborate with clinical and business teams to move faster and make evidence-based decisions.
Experience Level
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Intermediate
Language
English
Fluent
Work Experience
Data Scientist at Merck
January 1, 2025 - Present
Led end-to-end data science initiatives including an XGBoost-based fraud detection model that automated processing of 10K+ claims per day and reduced manual reviews by 60 hours/month. Designed a Kafka-to-Snowflake ingestion pipeline handling 1.5M+ transactions daily for real-time drug-efficacy monitoring. Built a PyTorch anomaly-detection model to flag supply-chain inconsistencies across 20+ sites within 15 minutes, cutting incident-response time by 30%. Implemented TensorFlow time-series models forecasting demand for 100+ pharmaceutical products to optimize inventory. Delivered real-time clinical and claims insights via Power BI dashboards, and developed a SQL/Python predictive model that increased high-risk patient identification by 20%. Added LLM-based summaries to shorten document review by 45% and a lightweight embedding-based semantic search to speed record retrieval by 40%. Created a RAG-style evidence-retrieval workflow that reduced evidence gathering time by 45% and improved d
Data Scientist at Blue Cross Blue Shield
January 1, 2024 - January 1, 2025
Drove a Scikit-Learn-based fraud detection system on Python/Hadoop handling 10M claims/day with ~5-second anomaly flags, delivering substantial cost savings. Built a PyTorch-based deep-learning classifier on GCP BigQuery processing 500K+ medical records/month to improve claim approvals. Automated Spark ETL pipelines, reducing transformation time from 6 hours to 30 minutes. Created a large-scale risk-stratification model using AWS EMR and Tableau to analyze 1B+ patient interactions for data-driven coverage decisions. Implemented K-Means segmentation to identify 5 cohorts for targeted interventions, and maintained a Tableau dashboard suite with 25+ dashboards for rapid decision-making. Developed regression-based predictive analytics improving forecast accuracy by 22% and reducing fraudulent payouts by 18%. Deployed LLM-based summarization to cut document review time by 60%, and built a lightweight embedding-based search tool to halve lookup times. Introduced an LLM-powered Q&A workflow t
Data Analyst at Dell Technologies
July 1, 2019 - August 1, 2021
Developed Scikit-Learn models to predict failures across 100+ product lines, reducing warranty claims by 15%. Led pricing analysis on 3M+ transactions using R and ggplot2 to refine global pricing strategy and margins. Implemented a Random Forest classifier for customer segmentation to drive targeted campaigns. Designed an Azure-based data lake to centralize data and accelerate ML pipelines, cutting decision time by 30%. Migrated 250+ Excel reports to PostgreSQL with automated pipelines for real-time reporting, reducing reporting time by 60%. Performed anomaly detection on hardware logs using Hadoop/Spark to uncover defect patterns and improve maintenance planning. Enhanced Excel automation with macros to deliver predictive sales insights, reducing manual effort by 70%.
Data Analyst Intern at Trinity Technolabs
January 1, 2019 - June 1, 2019
Analyzed 50K+ social-media comments using NLTK and Tableau to identify sentiment patterns that boosted engagement. Optimized SQL Server validation scripts, reducing errors by 90% and reconciliation time by 50%. Built a Tableau dashboard with 10+ real-time KPIs for immediate action on customer behavior. Supported migration of Excel reports into PostgreSQL to improve analytics speed and reliability. Created an automated sales-reporting system using advanced Excel, saving 15+ hours per week.
Data Analyst at Trinity Technolabs
January 1, 2019 - June 1, 2019
Analyzed 50K+ social-media comments using NLTK and Tableau to identify sentiment patterns that boosted engagement. Optimized SQL Server validation scripts, reducing errors by 90% and reconciliation time by 50%. Built a Tableau dashboard with 10+ real-time KPIs to empower sales teams. Supported migration of Excel reports to PostgreSQL and created automated sales reporting in Excel, saving 15+ hours per week.
Education
Master of Science at DePaul University
September 1, 2021 - November 1, 2023
Bachelor of Science at SRM University
June 1, 2015 - May 1, 2019
Qualifications
Industry Experience
Healthcare, Life Sciences, Software & Internet, Professional Services