Dileep Chowdary Ealapolu

Available to hire

I am a data scientist with 4+ years of experience delivering AI and machine learning solutions across healthcare and financial services. I am proficient in Python, PyTorch, TensorFlow, Scikit-learn, XGBoost, and large language models, with a strong focus on end-to-end ML lifecycle automation and scalable production systems.

I have built large-scale data pipelines processing 40M+ clinical encounter records, engineered RAG platforms across 12 clinical knowledge bases, and deployed 30+ production-grade models using MLflow. My work spans NLP, multimodal document extraction, anomaly detection, and visualization, with hands-on experience collaborating with clinical, product, and governance teams to ensure model safety, reliability, and regulatory compliance.

Experience Level

Expert

Language

English
Fluent

Work Experience

AI/ML Engineer at CVS Health
January 1, 2025 - Present
Built large-scale clinical data processing pipelines on Databricks using PySpark and Delta Lake to ingest and harmonize 40M+ encounter records for downstream ML workflows across pharmacy, benefits, and MinuteClinic operations. Fine-tuned domain-specific LLMs (Llama-3, Flan-UL2) on de-identified EMR notes with LoRA adapters, generating provider-ready summaries and reducing manual review load by 8k+ documents/week. Engineered a unified RAG platform with LangChain, Milvus, and Azure OpenAI embeddings to support fast retrieval across 12 clinical knowledge bases. Designed predictive pipelines for chronic-care management using PySpark MLlib and XGBoost, surfacing high-risk patient cohorts. Deployed end-to-end ML workflows via MLflow and Feature Store for 30 production-grade models. Implemented multimodal OCR+LLM pipelines for structured extraction from millions of scanned PDFs and orchestrated a retrieval+draft-generation pipeline to streamline clinical documentation. Collaborated with clini
Machine Learning Engineer at Zensar Technologies
January 1, 2020 - June 1, 2023
Created end-to-end ML pipelines with Airflow, Docker, and AWS Batch for automated dataset preparation, training, and drift checks, improving reliability across weekly deployments. Built high-performance predictive models (TensorFlow, PyTorch) for churn scoring, incident triage, and service-demand forecasting. Developed scalable inference services using FastAPI and TorchServe, enhancing API stability and reducing latency during peak hours. Standardized experimentation with MLflow to improve reproducibility and hyperparameter tracking. Refined transformer-based NLP models (RoBERTa, DistilBERT) for sentiment, classification, and entity extraction across multiple queues. Implemented model optimization (ONNX Runtime, pruning, quantization) for cost-efficient deployment. Advanced feature engineering with Scikit-learn, StatsModels, and NetworkX, and conducted evaluation with SHAP and permutation importance to monitor drift and bias. Built time-series forecasting pipelines using Prophet and LS

Education

Master of Science in Data Science at University of Texas at Arlington
January 11, 2030 - December 3, 2025
Bachelor of Technology in Computer Science and Engineering at National Institute of Technology Silchar
January 11, 2030 - December 3, 2025

Industry Experience

Healthcare, Financial Services, Software & Internet, Professional Services