Sri Sai Lalitha Mallika Yeturi

I'm an AIML engineer with over 4 years of hands-on experience designing, training, and deploying machine learning and generative AI systems across enterprise and healthcare domains. I'm proficient in Python, PyTorch, Airflow, and AWS. I've built predictive models with XGBoost and Random Forest, developed NLP pipelines, and optimized LLM inference with vLLM, TensorRT-LLM, and DeepSpeed. At MetLife I boosted large-language-model throughput by 42% while reducing GPU costs through optimized multi-GPU orchestration.

Available to hire

Work Experience

AIML Engineer - LLM Optimization & Inference Platform at MetLife
September 1, 2024 - November 27, 2025
Led optimization of large-language-model inference using vLLM and TensorRT-LLM, reducing latency by 42% for 3B-parameter variants. Deployed quantized INT8 GPTQ pipelines with KV-cache streaming and head pruning, cutting GPU memory use by 35% and enabling parallel serving on commodity A100 clusters. Integrated DeepSpeed MII for multi-GPU orchestration and gradient-free inference, achieving near-linear scalability on 8-GPU nodes via NCCL optimization and CUDA Graphs. Built an inference monitoring stack with Triton Server, Prometheus, and CloudWatch for real-time latency metrics and auto-scaling. Converted fine-tuned models to ONNX for ONNX Runtime in edge-ready deployments, accelerating token throughput by 1.7x. Developed modular retrieval and prompt-evaluation workflows using LangChain, aligned with compliance requirements. Automated CI/CD for model packaging and rollout via Docker and GitHub Actions, shortening release cycles from weekly to daily and boosting deployment reliability by 50%.
Machine Learning Engineer - Predictive Health Risk Modeling & ML Pipeline Automation Platform at Sage Softtech
July 1, 2023 - July 1, 2023
Designed and deployed patient readmission risk models using Random Forest and XGBoost on 50M+ historical claims and EHR records, improving early detection of high-risk patients by 18%. Built an end-to-end ML pipeline in Python and Airflow that automates data ingestion, feature generation, model training, and evaluation, reducing retraining time from 4 hours to under 45 minutes. Deployed inference APIs via FastAPI and Docker on AWS EC2, integrating real-time predictions into the care management dashboard accessed by 200+ physicians daily. Implemented model lifecycle tracking and drift monitoring with MLflow and Prometheus, enabling reproducible experiments and timely retraining. Processed unstructured physician notes with spaCy NER to extract comorbidities, medication patterns, and discharge summaries; the enriched data raised predictive AUC by 0.09 and informed chronic-care management programs.

Education

Master of Science in Artificial Intelligence at Yeshiva University
January 11, 2030 - May 1, 2025

Industry Experience

Healthcare, Software & Internet, Professional Services