Kalyan Srivastava

Experience Level

Expert
Expert
Expert
Expert
Expert
Expert
Expert
Intermediate

Work Experience

Senior Data Engineer & Subject Matter Expert at McKesson Inc
November 1, 2023 - Present
Built an OCR→NLP pipeline to extract adverse events from physician reports using regex, NER, and R-based microservices orchestrated with Docker, Jenkins, and Airflow; produced regulatory-grade AE lists and reduced manual review time. Developed automated molecular report generation workflows for sequencing data used in regulatory submissions. Supported development of a radiology-based breast cancer staging model using CNN/BNN architectures and LLM-assisted image interpretation. Assisted in converting SAS code to NextGen analytical tools (DBX PySpark, R/dplyr, SQL) in a Databricks sandbox, and evaluated outcome accuracy with custom comparison tools. Built a RAG-based NLP chatbot using LangChain, transformer models (BioBERT, ClinicalBERT, GPT, BioMistral), and Dialogflow to answer trial development questions from PDFs, reducing literature review time by ~60%. Designed two production-grade Shiny dashboards with custom CSS/JavaScript and NLQ tooling. Implemented CDISC data engineering acr
Data Science Field Application Scientist (Cytobank Machine Learning SaaS) at Danaher Inc.
June 1, 2022 - November 1, 2023
Provided support to ~120 pharma/clinical users on Cytobank’s ML platform; troubleshot supervised (logistic regression, XGBoost, neural networks) and unsupervised (t-SNE, UMAP, SOM) workflows on high-dimensional biological data. Collaborated with the development team to build 3 major ML algorithms and 8 novel mini features; performed feasibility assessments and incorporated user feedback for 100 feature enhancements. Led the conversational AI assistant initiative to automate FAQs and streamline user support using Zendesk SDK, Amazon Lex, and GPT. Developed 20+ Python/R scripts leveraging Cytobank APIs for premium users. Managed Agile workflows with Jira/ServiceMax; improved KPIs by 10%. Coordinated with commercial/marketing teams using Salesforce and Google Analytics.
Sr. Data Engineer and Project Manager at Atos Inc. (Client: Daiichi Sankyo Ind.)
May 1, 2021 - May 1, 2022
Engineered data pipelines transforming SDTM datasets into ADaM and TLF formats for 30+ FDA submissions. Implemented ETL on AWS and GCP to support data pipelines; tested SDTM data consistency; performed statistical programming (t-tests, ANOVA, z-tests, regression). Applied data wrangling/preprocessing for ML workflows (feature engineering, imputation, scaling, one-hot encoding). Supported ML with SVM, logistic regression, random forest, and XGBoost; enabled 20 concurrent classical/neural models on SageMaker/DataRobot. Developed interactive TLF visualizations in RShiny, PowerBI, and Tableau. Managed a 10-engineer team with Jira/ZenHub/Asana; collaborated with DevOps to maintain APIs and data migration processes.
Data Scientist at ARUP Laboratories Inc
January 1, 2018 - May 1, 2021
Developed unsupervised ML pipelines (k-means, t-SNE, UMAP, X-shift) for CyTOF and RNA-seq data; identified candidate biomarkers. Performed biostatistical analyses and built ML-ready datasets for clinical studies; created publication-ready visualizations in R (dplyr, ggplot2, survival, caret, admiral, RShiny). Integrated ML with gating methods and EP Evaluator; established reference intervals for 500+ immunological cell populations. Supported target validation and data analysis of RNA-seq and flow cytometry data using open-source tools.
Fellow and Faculty – Johns Hopkins Medical Institute / Aab CVRI / URMC at The Johns Hopkins Medical Institute
November 1, 2004 - July 1, 2014
Fellow and faculty focusing on biomedical research, preclinical studies, and clinical trials with emphasis on data mining, computational biology, and sequence alignment. Contributed to 15 peer-reviewed publications with approximately 1,000 citations (2005–2014).
Clinical Trial Manager at New York Blood Center Inc
July 1, 2014 - June 1, 2016
Managed Phase IV oncology trials with clinical data workflows using SAS and R on SDTM datasets; oversaw trial logistics and data quality.
Data Scientist at Health Research Inc. – RPCI
September 1, 2016 - January 1, 2018
Conducted computational biology and high-dimensional cytometry analyses (CyTOF, IMC, RNA-seq) using R/Bioconductor and cloud-based tools; supported biomarker discovery, phenotyping, and assay development.
Intern/Associate at International Center for Genetic Engineering and Biotechnology, Trieste
July 1, 2002 - November 1, 2004
Worked on vaccine development projects using data mining and sequence alignment.

Education

PhD, Biochemistry at Institute of Medical Sciences, Banaras Hindu University (BHU), India
MSc, Biochemistry at Institute of Sciences, BHU, India
BSc, Mathematics at Institute of Sciences, BHU, India

Qualifications

Complete Generative AI Course: RAG, AI Agents & Deployment
AWS Cloud Practitioner Essentials (Certificate)
Artificial Intelligence for Breast Cancer Detection (Certificate)
Intermediate R Software Development (Certificate) - Bioconductor Open-source Society
High Impact IT Leader – Tech MBA
Professional Certificates on Classical Machine Learning for Financial Engineering: Using sklearn/Python for machine learning

Industry Experience

Healthcare, Life Sciences, Professional Services, Software & Internet