I’m a software/ML engineer with 5+ years of experience building and optimizing large language models, RAG systems, and distributed MLOps pipelines across AWS, Azure, and GCP. I enjoy turning research innovations into production-grade AI systems using PyTorch, DeepSpeed, TensorFlow, and Kubernetes to deliver scalable, reliable solutions for enterprise-scale challenges. I’m passionate about designing cloud-native AI infrastructure that bridges research with real-world applications, focusing on reliable inference, MoE routing, quantization, and efficient multi-node deployments to drive impactful AI adoption in business contexts.

Chetan Anand Panthukala

Available to hire


Experience Level

Expert

Language

English
Fluent

Work Experience

Backend Software Engineer at Perplexity AI – Perplexity Sonar (In-house Search LLM & API)
January 1, 2025 - November 18, 2025
- Led development of a next-generation LLM backend built on Llama 3.3-70B with speculative decoding, improving factual Q&A reliability and scaling inference throughput on Cerebras clusters.
- Implemented Mixture-of-Experts routing and FP8 quantization to reduce latency and improve compute efficiency.
- Architected a Retrieval-Augmented Generation stack with FAISS, BM25, and LangChain, boosting contextual precision and reducing hallucinations.
- Automated benchmark evaluation across MMLU, TruthfulQA, and GSM8K, and cut costs by optimizing AWS SageMaker workloads.
- Deployed Sonar Pro on AWS EKS with Triton Inference Server, ensuring high availability and enterprise adoption.
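To illustrate the hybrid retrieval described above (FAISS dense search fused with BM25 lexical search), here is a minimal self-contained sketch. Plain numpy cosine similarity stands in for a FAISS index, the BM25 implementation is a textbook toy, and the min-max score fusion with weight `alpha` is an illustrative assumption, not the actual Sonar scoring method.

```python
import math
import numpy as np

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Toy BM25 over pre-tokenized docs (stand-in for a real sparse index)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    scores = np.zeros(N)
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            scores[i] += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(d) / avgdl))
    return scores

def hybrid_rank(query_vec, doc_vecs, query_terms, docs, alpha=0.5):
    """Fuse dense (cosine) and sparse (BM25) scores after min-max normalization.

    In production the dense scores would come from a FAISS ANN index;
    alpha is a hypothetical fusion weight.
    """
    dense = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    sparse = bm25_scores(query_terms, docs)

    def norm(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    fused = alpha * norm(dense) + (1 - alpha) * norm(sparse)
    return np.argsort(-fused)  # document indices, best match first
```

Fusing the two signals lets lexical matches rescue queries where the embedding model misses rare terms, which is one common way such stacks reduce hallucinated citations.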
Software Engineer - Machine Learning at Meta
December 1, 2024 - December 1, 2024
- Engineered large-scale training pipelines for LLaMA 3 (70B–130B) using PyTorch 2.2, DeepSpeed, and AWS, enabling faster throughput and reduced compute costs across 12K-GPU clusters.
- Optimized multimodal architectures with FP8 quantization on AWS, enhancing cross-modal accuracy.
- Implemented RLHF + DPO fine-tuning with human-feedback curation, improving alignment safety metrics and reducing hallucinations in production.
- Contributed to the open-source LLaMA 3.2 release with deployment scripts and evaluation suites on AWS, accelerating adoption.
- Built distributed data-processing workflows on AWS EMR/S3 using Spark, Ray, PyArrow, and Hydra for trillion-token multilingual datasets.
- Deployed optimized inference with Triton Inference Server, ONNX Runtime, and TorchServe; monitored performance with Prometheus, Grafana, and internal dashboards.
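The DPO fine-tuning mentioned above optimizes a closed-form preference objective instead of training a separate reward model. A minimal numpy sketch of the standard DPO loss is below; real fine-tuning runs this in PyTorch over per-token log-probabilities from the policy and a frozen reference model, and `beta=0.1` is just a common illustrative default.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy and the frozen reference model (one entry per pair).
    """
    logits = beta * (
        (np.asarray(logp_chosen) - np.asarray(ref_chosen))
        - (np.asarray(logp_rejected) - np.asarray(ref_rejected)))
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return float(np.mean(np.logaddexp(0.0, -logits)))
```

When the policy matches the reference the loss sits at log 2; it falls as the policy widens the log-probability margin of chosen over rejected responses, which is the mechanism behind the alignment improvements claimed above.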
Software Engineer at Accenture
June 1, 2023 - June 1, 2023
- Built scalable knowledge-graph embeddings on AWS using TensorFlow and Keras, boosting link-prediction accuracy.
- Developed automated ML pipelines (training, evaluation, deployment) with MLflow and Docker, enhancing reproducibility and experimentation speed.
- Integrated AWS SageMaker, Azure ML, and Vertex AI for model lifecycle management, supporting large-scale generative AI and knowledge-graph reasoning.
- Implemented MLOps pipelines with Docker, Airflow, and MLflow, enabling robust retraining and CI/CD automation across distributed cloud environments.
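For readers unfamiliar with knowledge-graph link prediction, here is a generic sketch using TransE-style embeddings, where a triple (head, relation, tail) is scored by how close head + relation lands to tail. The actual model and TensorFlow implementation used at Accenture are not specified, so this numpy version is purely illustrative.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score: negative L2 distance ||h + r - t||.

    Scores closer to 0 mean the triple (head, relation, tail) is more
    plausible under the learned embeddings.
    """
    return -float(np.linalg.norm(h + r - t))

def predict_tail(h, r, entity_embs):
    """Link prediction for (head, relation, ?): return the index of the
    candidate entity whose embedding best completes the triple."""
    scores = -np.linalg.norm(entity_embs - (h + r), axis=1)
    return int(np.argmax(scores))
```

Ranking every entity this way yields the link-prediction metrics (hits@k, mean rank) that the accuracy improvements above would typically be measured against.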

Education

Master's in Advanced Data Analytics, University of North Texas, TX, USA
January 11, 2030 - November 18, 2025

Industry Experience

Software & Internet, Professional Services