I’m a software/ML engineer with 5+ years of experience building and optimizing large language models, RAG systems, and distributed MLOps pipelines across AWS, Azure, and GCP. I enjoy turning research innovations into production-grade AI systems using PyTorch, DeepSpeed, TensorFlow, and Kubernetes to deliver scalable, reliable solutions for enterprise-scale challenges. I’m passionate about designing cloud-native AI infrastructure that bridges research with real-world applications, focusing on reliable inference, MoE routing, quantization, and efficient multi-node deployments to drive impactful AI adoption in business contexts.

Chetan Anand Panthukala

Available to hire


Experience Level

Expert

Language

English
Fluent

Work Experience

Backend Software Engineer at Perplexity AI – Perplexity Sonar (In-house Search LLM & API)
January 1, 2025 - November 18, 2025
- Led development of a next-generation LLM backend built on Llama 3.3-70B with speculative decoding, improving factual Q&A reliability and scaling inference throughput on Cerebras clusters.
- Implemented Mixture-of-Experts routing and FP8 quantization to reduce latency and improve compute efficiency.
- Architected a Retrieval-Augmented Generation stack with FAISS, BM25, and LangChain, boosting contextual precision and reducing hallucinations.
- Automated benchmark evaluation across MMLU, TruthfulQA, and GSM8K, and cut costs by optimizing AWS SageMaker workloads.
- Deployed Sonar Pro on AWS EKS with Triton Inference Server, ensuring high availability and enterprise adoption.
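To illustrate the hybrid retrieval described above (FAISS dense search fused with BM25 lexical search), here is a minimal self-contained sketch. Plain numpy cosine similarity stands in for a FAISS index, the BM25 implementation is a textbook toy, and the min-max score fusion with weight `alpha` is an illustrative assumption, not the actual Sonar scoring method.

```python
import math
import numpy as np

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Toy BM25 over pre-tokenized docs (stand-in for a real sparse index)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    scores = np.zeros(N)
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            scores[i] += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(d) / avgdl))
    return scores

def hybrid_rank(query_vec, doc_vecs, query_terms, docs, alpha=0.5):
    """Fuse dense (cosine) and sparse (BM25) scores after min-max normalization.

    In production the dense scores would come from a FAISS ANN index;
    alpha is a hypothetical fusion weight.
    """
    dense = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    sparse = bm25_scores(query_terms, docs)

    def norm(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    fused = alpha * norm(dense) + (1 - alpha) * norm(sparse)
    return np.argsort(-fused)  # document indices, best match first
```

Fusing the two signals lets lexical matches rescue queries where the embedding model misses rare terms, which is one common way such stacks reduce hallucinated citations.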
Software Engineer - Machine Learning at Meta
December 1, 2024 - December 1, 2024
- Engineered large-scale training pipelines for LLaMA 3 (70B–130B) using PyTorch 2.2, DeepSpeed, and AWS, enabling faster throughput and reduced compute costs across 12K-GPU clusters.
- Optimized multimodal architectures with FP8 quantization on AWS, enhancing cross-modal accuracy.
- Implemented RLHF + DPO fine-tuning with human-feedback curation, improving alignment safety metrics and reducing hallucinations in production.
- Contributed to the open-source LLaMA 3.2 release with deployment scripts and evaluation suites on AWS, accelerating adoption.
- Built distributed data-processing workflows on AWS EMR/S3 using Spark, Ray, PyArrow, and Hydra for trillion-token multilingual datasets.
- Deployed optimized inference with Triton Inference Server, ONNX Runtime, and TorchServe; monitored performance with Prometheus, Grafana, and internal dashboards.
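The DPO fine-tuning mentioned above optimizes a closed-form preference objective instead of training a separate reward model. A minimal numpy sketch of the standard DPO loss is below; real fine-tuning runs this in PyTorch over per-token log-probabilities from the policy and a frozen reference model, and `beta=0.1` is just a common illustrative default.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy and the frozen reference model (one entry per pair).
    """
    logits = beta * (
        (np.asarray(logp_chosen) - np.asarray(ref_chosen))
        - (np.asarray(logp_rejected) - np.asarray(ref_rejected)))
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return float(np.mean(np.logaddexp(0.0, -logits)))
```

When the policy matches the reference the loss sits at log 2; it falls as the policy widens the log-probability margin of chosen over rejected responses, which is the mechanism behind the alignment improvements claimed above.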
Software Engineer at Accenture
June 1, 2023 - June 1, 2023
- Built scalable knowledge-graph embeddings on AWS using TensorFlow and Keras, boosting link-prediction accuracy.
- Developed automated ML pipelines (training, evaluation, deployment) with MLflow and Docker, enhancing reproducibility and experimentation speed.
- Integrated AWS SageMaker, Azure ML, and Vertex AI for model lifecycle management, supporting large-scale generative AI and knowledge-graph reasoning.
- Implemented MLOps pipelines with Docker, Airflow, and MLflow, enabling robust retraining and CI/CD automation across distributed cloud environments.
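For readers unfamiliar with knowledge-graph link prediction, here is a generic sketch using TransE-style embeddings, where a triple (head, relation, tail) is scored by how close head + relation lands to tail. The actual model and TensorFlow implementation used at Accenture are not specified, so this numpy version is purely illustrative.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility score: negative L2 distance ||h + r - t||.

    Scores closer to 0 mean the triple (head, relation, tail) is more
    plausible under the learned embeddings.
    """
    return -float(np.linalg.norm(h + r - t))

def predict_tail(h, r, entity_embs):
    """Link prediction for (head, relation, ?): return the index of the
    candidate entity whose embedding best completes the triple."""
    scores = -np.linalg.norm(entity_embs - (h + r), axis=1)
    return int(np.argmax(scores))
```

Ranking every entity this way yields the link-prediction metrics (hits@k, mean rank) that the accuracy improvements above would typically be measured against.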

Education

Master's in Advanced Data Analytics, University of North Texas, TX, USA
January 11, 2030 - November 18, 2025

Industry Experience

Software & Internet, Professional Services