Sai Kumar

I am an AI/ML Engineer with 5 years of experience building multi-agent systems, large-scale LLM inference platforms, and real-time financial ML solutions. I thrive on designing scalable, data-driven architectures and collaborating with cross-functional teams to optimize throughput and latency, deploying models on AWS and ensuring robust MLOps practices.

Available to hire

Language

English (Fluent)

Work Experience

ML/Data Engineer at McKinsey & Co.
June 1, 2023 - Present
Designed and developed multi-tiered enterprise data applications. Built scalable ETL pipelines in Python, SQL, and Spark to process large volumes of transactional, behavioral, and risk-related data across financial systems. Developed ingestion pipelines for unstructured data and real-time Kafka pipelines. Designed data models and analytical marts for fraud analytics, identity scoring, and risk monitoring. Integrated vector databases into enterprise search and RAG systems for internal knowledge retrieval. Automated validation, schema enforcement, and data quality checks. Containerized data services and deployed them with Kubernetes and Azure Functions. Partnered with data scientists to build feature pipelines and ML workflows, and contributed to benchmarking frameworks and model performance analysis.
ML/Data Engineer at PNC Bank
May 1, 2019 - August 1, 2021
Developed backend services for financial transaction processing. Built LLM inference and embedding pipelines on Azure. Developed ETL and text-processing workflows for PDFs, logs, and telemetry, including PDF/document extraction pipelines. Designed RAG pipelines with chunking strategies and embedding storage, and integrated vector databases for semantic search. Built high-throughput data ingestion for fraud analytics and risk scoring (30M+ daily transactions). Implemented ETL pipelines for aggregating customer, device, merchant, and geo-behavioral data. Created logging and telemetry for anomaly detection. Designed data models for fraud investigations and risk classification. Built real-time ML scoring pipelines. Implemented Airflow orchestration, data quality monitoring, and dashboards. Delivered FastAPI gateways and TensorFlow Serving endpoints to support real-time case management.
AI/ML Engineer at McKinsey & Co.
June 1, 2023 - Present
Designed and developed multi-tiered data and analytics applications that improved processing efficiency by 35% for consulting workflows. Built a data-centric multi-agent AI platform supporting trading, risk, compliance, and customer analytics on structured financial datasets. Deployed scalable agent and model-serving infrastructure on AWS (Lambda and containerized services) with sub-2-second latency for 50,000+ daily queries. Partnered with infra teams to deploy LLMs on AWS Trainium/Inferentia and tune batching/parallelism, improving throughput by 35% and optimizing resource utilization. Developed internal ML tooling to profile and debug accuracy-performance tradeoffs across hardware accelerators. Collaborated with data science teams to tune model parallelism and batching techniques for enterprise-scale inference workloads. Contributed to benchmarking frameworks measuring end-to-end model performance and resource utilization across multi-node deployments.
ML Engineer at PNC Bank
May 1, 2019 - August 1, 2021
Developed core backend services for financial transaction processing, improving system throughput by 45% and reducing latency by 30%. Built real-time fraud detection models for 30M+ daily transactions, improving precision to 98% and reducing false positives by 35%. Built ensemble fraud models combining gradient boosting (XGBoost) and deep neural networks, achieving 94% precision and 87% recall while reducing false positives by 52%. Deployed PyTorch and TensorFlow models on AWS SageMaker with autoscaling, maintaining 99.9% uptime across peak loads. Accelerated inference using quantization and pruning, cutting latency by 30–45% and lowering compute cost by 40%. Implemented end-to-end ML CI/CD with GitHub Actions, Docker, Terraform, and Helm, reducing deployment cycles from days to under three hours. Set up MLflow experiment tracking and a model registry with approval gates for risk and compliance, ensuring full lineage and auditability. Delivered production monitoring for data quality,

Education

M.S. Applied Statistics & Decision Analytics at Western Illinois University
January 11, 2030 - February 9, 2026

Industry Experience

Financial Services, Professional Services, Software & Internet