Skills
Experience Level
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Intermediate
Work Experience
Senior Data Engineer & Subject Matter Expert at McKesson Inc
November 1, 2023 - Present
Built an OCR→NLP pipeline to extract adverse events from physician reports using regex, NER, and R-based microservices orchestrated with Docker, Jenkins, and Airflow; produced regulatory-grade AE lists and reduced manual review time. Developed automated molecular report generation workflows for sequencing data used in regulatory submissions. Supported development of a radiology-based breast cancer staging model using CNN/BNN architectures and LLM-assisted image interpretation. Assisted in converting SAS code to NextGen analytical tools (DBX PySpark, R/dplyr, SQL) in a Databricks sandbox, and evaluated outcome accuracy with custom comparison tools. Built a RAG-based NLP chatbot using LangChain, transformer models (BioBERT, ClinicalBERT, GPT, BioMistral), and Dialogflow to answer trial development questions from PDFs, reducing literature review time by ~60%. Designed two production-grade Shiny dashboards with custom CSS/JavaScript and NLQ tooling. Implemented CDISC data engineering acr
Data Science Field Application Scientist (Cytobank Machine Learning SaaS) at Danaher Inc.
June 1, 2022 - November 1, 2023
Provided support to ~120 pharma/clinical users on Cytobank’s ML platform; troubleshot supervised (logistic regression, XGBoost, neural networks) and unsupervised (t-SNE, UMAP, SOM) workflows on high-dimensional biological data. Collaborated with the development team to build 3 major ML algorithms and 8 novel mini features; performed feasibility assessments and incorporated user feedback for 100 feature enhancements. Led the conversational AI assistant initiative to automate FAQs and streamline user support using Zendesk SDK, Amazon Lex, and GPT. Developed 20+ Python/R scripts leveraging Cytobank APIs for premium users. Managed Agile workflows with Jira/ServiceMax; improved KPIs by 10%. Coordinated with commercial/marketing teams using Salesforce and Google Analytics.
Sr. Data Engineer and Project Manager at Atos Inc. (Client: Daiichi Sankyo Ind.)
May 1, 2021 - May 1, 2022
Engineered data pipelines transforming SDTM datasets into ADaM and TLF formats for 30+ FDA submissions. Implemented ETL on AWS and GCP to support data pipelines; tested SDTM data consistency; performed statistical programming (t-tests, ANOVA, z-tests, regression). Applied data wrangling/preprocessing for ML workflows (feature engineering, imputation, scaling, one-hot encoding). Supported ML with SVM, logistic regression, random forest, and XGBoost; enabled 20 concurrent classical/neural models on SageMaker/DataRobot. Developed interactive TLF visualizations in RShiny, PowerBI, and Tableau. Managed a 10-engineer team with Jira/ZenHub/Asana; collaborated with DevOps to maintain APIs and data migration processes.
Data Scientist at ARUP Laboratories Inc
January 1, 2018 - May 1, 2021
Developed unsupervised ML pipelines (k-means, t-SNE, UMAP, X-shift) for CyTOF and RNA-seq data; identified candidate biomarkers. Performed biostatistical analyses and built ML-ready datasets for clinical studies; created publication-ready visualizations in R (dplyr, ggplot2, survival, caret, admiral, RShiny). Integrated ML with gating methods and EP Evaluator; established reference intervals for 500+ immunological cell populations. Supported target validation and data analysis of RNA-seq and flow cytometry data using open-source tools.
Fellow and Faculty – Johns Hopkins Medical Institute / Aab CVRI / URMC at The Johns Hopkins Medical Institute
November 1, 2004 - July 1, 2014
Fellow and faculty focusing on biomedical research, preclinical studies, and clinical trials with emphasis on data mining, computational biology, and sequence alignment. Contributed to 15 highly reviewed publications with approximately 1000 citations (2005–2014).
Clinical Trial Manager at New York Blood Center Inc
July 1, 2014 - June 1, 2016
Managed Phase IV oncology trials with clinical data workflows using SAS and R on SDTM datasets; oversaw trial logistics and data quality.
Data Scientist at Health Research Inc. – RPCI
September 1, 2016 - January 1, 2018
Conducted computational biology and high-dimensional cytometry analyses (CyTOF, IMC, RNA-seq) using R/Bioconductor and cloud-based tools; supported biomarker discovery, phenotyping, and assay development.
Intern/Associate at International Center for Genetic Engineering and Biotechnology, Trieste
July 1, 2002 - November 1, 2004
Worked on vaccine development projects using data mining and sequence alignment.
Education
PhD, Biochemistry at Institute of Medical Sciences, Banaras Hindu University (BHU), India
MSc, Biochemistry at Institute of Sciences, BHU, India
BSc, Mathematics at Institute of Sciences, BHU, India
Qualifications
Complete Generative AI Course: RAG, AI Agents & Deployment
AWS Cloud Practitioner Essentials (Certificate)
Artificial Intelligence for Breast Cancer Detection (Certificate)
Intermediate R Software Development (Certificate) - Bioconductor Open-source Society
High Impact IT Leader – Tech MBA
Professional Certificates on Classical Machine Learning for Financial Engineering: Using sklearn/python for machine learning
Industry Experience
Healthcare, Life Sciences, Professional Services, Software & Internet