Mani Varma

Available to hire

I am an AI/ML Engineer with over 4 years of experience delivering scalable, low-latency AI solutions across cloud and edge platforms. I specialize in architecting real-time AI systems that boost user engagement, accelerate decision-making, and reduce operational bottlenecks. I deploy and fine-tune large language models (LLMs) and build RAG pipelines and multi-agent AI frameworks that deliver measurable improvements in accuracy, speed, and adoption.

I am also experienced in MLOps, microservice development, and on-device AI, with a focus on privacy, compliance, and production-grade reliability in high-impact environments. I enjoy solving complex problems and building efficient AI infrastructure that drives business value in fast-paced cloud and enterprise settings.

Experience Level

Expert

Work Experience

AI/ML Engineer at McKinsey & Company
September 1, 2024 - Present
Designed an AI-powered document processing platform using LLMs and a RAG architecture, enabling real-time inference on over 100K documents with 40% faster retrieval. Delivered RAG pipelines backed by vector databases (Qdrant, Weaviate) that cut multi-modal retrieval latency by 40%, improving customer experience in high-volume environments. Developed API-driven AI/ML microservices with FastAPI and Flask, containerized with Docker and orchestrated with Kubernetes for deployment across Azure and GCP, reducing deployment times by 30%. Implemented MLOps pipelines with MLflow, Weights & Biases, and AWS SageMaker, improving model governance, enabling continuous experimentation, and raising model accuracy by 18%. Built on-device AI solutions for Android using TensorFlow Lite, MediaPipe, and PyTorch Mobile, achieving low-latency inference, with federated learning to support privacy compliance. Modernized enterprise infrastructure through Autosys migration, vulnerability management, and Java upgrades, improving release cycles by 40%.
ML Engineer at Avenir Technologies
December 31, 2022 - August 26, 2025
Automated data preprocessing pipelines in Python and PySpark, cutting data preparation time by 50% and improving feature quality for downstream models. Built predictive machine learning models with TensorFlow and Scikit-learn, increasing accuracy by 35% and driving higher customer conversions. Architected and deployed ML services with Docker and Kubernetes on AWS and Azure, enabling real-time predictions and GPU-optimized vector search with Pinecone, FAISS, and Qdrant. Engineered secure cloud-native ML infrastructure with Terraform, integrated with real-time ingestion pipelines built on Kafka, Databricks, and GCP BigQuery, improving throughput by 45%. Introduced model validation and automated testing pipelines with MLflow and pytest, ensuring 99% reproducibility and faster compliance checks.

Education

Master of Computer Science at Lamar University
January 11, 2030 - August 26, 2025

Industry Experience

Software & Internet, Financial Services, Professional Services, Healthcare, Computers & Electronics