I'm Pavan Kumar Kollipara, a results-driven Machine Learning Engineer with 6+ years of experience designing, training, and deploying scalable ML and Generative AI systems on AWS, supporting enterprise-scale data pipelines and intelligent automation across multiple domains. I’ve built 50+ distributed models for prediction, recommendation, and classification on multi-terabyte datasets, and I’m proficient with PyTorch, TensorFlow, XGBoost, and SageMaker. I’ve led MLOps automation using MLflow, Airflow, GitHub Actions, ArgoCD, and Spinnaker, automating production pipelines for training, deployment, and monitoring. I also work with LLMs and RAG pipelines using LangChain, LangFuse, and Hugging Face to enable retrieval-based responses and scalable contextual reasoning. Recently I’ve focused on building retrieval workflows, agentic reasoning systems, and multi-GPU production workloads, while collaborating with cross-functional teams to translate research into reliable, scalable solutions. I enjoy mentoring teammates, driving practical ML adoption, and delivering value through end-to-end system design, monitoring, and continuous improvement.

Pavan Kumar Kollipara

Available to hire

Experience Level

Expert

Work Experience

ML Research Assistant at UMBC
August 1, 2025 - November 18, 2025
Developed and fine-tuned LLMs using Hugging Face, LangChain, and OpenAI APIs on AWS SageMaker and EMR, integrating PySpark ETL pipelines to improve sensor data inference on 1M+ records. Built and optimized RAG pipelines with LangChain, Pinecone, and LangGraph, analyzing 500GB+ of sensor logs for faster and more accurate contextual responses. Implemented MLOps workflows automating 200+ model runs for tracking, drift detection, and reproducibility. Deployed agentic AI frameworks leveraging LangGraph, MCP, and MLflow, automating data prep, tuning, and scheduling for 30K+ records per run on multi-GPU clusters.
ML Engineer at Wipro Limited
September 1, 2021 - August 1, 2023
Developed and deployed 8+ ML models for fraud detection, traffic classification, recommendations, and engraving using PyTorch, Scikit-learn, and AWS SageMaker, automating ETL in Airflow, Glue, and Kafka for 100K+ daily transactions. Built real-time traffic filtering models using supervised and anomaly detection, blocking 250K+ bot requests daily across 15 global storefronts with AWS Step Functions and Lambda workflows. Collaborated on a CNN-based text-to-engraving model on multi-GPU (NVIDIA A100) clusters, processing 50K+ text inputs and improving engraving accuracy across 5K+ SKUs. Streamlined MLOps pipelines with MLflow, Prometheus, and Grafana, automating 250+ retraining cycles and enabling versioned deployments via SageMaker Pipelines. Optimized deployments using FastAPI, Flask, Istio, and Cilium eBPF, achieving zero downtime across 15+ Apple Store environments; built 5+ Go-based CLI tools for automation. Enhanced scalability with Terraform, Helm, and AWS SDK, cutting environment s
ML Engineer at Cipla
June 1, 2019 - September 1, 2021
Architected end-to-end ML pipelines on AWS SageMaker using PyTorch, XGBoost, and Docker, training 10+ production models and reducing latency across 1M+ daily predictions. Engineered ETL and data warehousing pipelines with PySpark, AWS EMR, and Redshift, processing 50M+ clinical records daily with automated validation and schema evolution. Automated retraining and tuning workflows with Airflow DAGs and MLflow, managing 300+ model runs across 5 teams and cutting manual effort by 40+ hours monthly. Deployed models using Canary, A/B, and blue-green strategies via CI/CD pipelines, maintaining zero downtime across 12 environments and serving 1M+ predictions daily. Implemented model monitoring and drift detection with SageMaker Model Monitor and CloudWatch, tracking feature drift and accuracy metrics for 15+ endpoints. Built Tableau and Power BI dashboards visualizing model metrics, drift, and ETL throughput, enabling 50+ stakeholders to monitor and improve R&D performance.
DevOps Engineer at Trigent
May 1, 2017 - May 1, 2019
Developed infrastructure automation using Python, shell scripting, CloudFormation, Ansible, and AWS SSM, provisioning 500+ AWS instances across 8 environments. Built CI/CD pipelines with GitHub, Jenkins, GitLab CI, SonarQube, and JFrog, automating build and deployment for 30+ applications. Managed AWS infrastructure with EC2, ECS, EKS, S3, RDS, Lambda, and CloudFormation, implementing multi-region replication and HA/DR solutions across 5 regions. Containerized and orchestrated 40+ microservices using Docker and Kubernetes, integrating AWS ElastiCache and Hazelcast for distributed caching, reducing API response time to 300 ms and supporting 10K+ concurrent daily users across production clusters. Enhanced observability and reliability with Prometheus, Grafana, ELK Stack, and AWS CloudWatch, tracking 500+ system metrics and logs to enable real-time anomaly alerts, auto-scaling triggers, and faster incident resolution (detection time under 5 minutes). Optimized deployment efficiency using

Education

Master of Professional Studies in Data Science at University of Maryland Baltimore County
August 1, 2023 - May 1, 2025

Qualifications

HashiCorp Certified: Terraform Associate (003)
July 1, 2025 - July 1, 2027

Industry Experience

Software & Internet, Healthcare, Education, Professional Services