

Sai Mouleeswar Reddy

AI Engineer, Data Annotator, Data Collector, +1






Available to hire

I’m Sai Mouleeswar Reddy, an AI/ML Engineer specializing in LLM quantization, compression, and inference optimization. I enjoy building deployment-ready AI pipelines and exploring scalable MLOps to bring cutting-edge models into production.

In my current role at TCS for the AMD client, I focus on post-training quantization, benchmarking, and exporting optimized models, collaborating across AI, hardware, and runtime teams to deliver efficient and robust LLM solutions.

Skills

Experience Level

Expert

Expert

Expert

Expert

Expert

Work Experience

AI/ML Engineer – LLM Quantization & Optimization at Tata Consultancy Services Ltd.

November 1, 2023 - Present

Performed post-training quantization of LLMs using AWQ, GPTQ, SmoothQuant, and rotation-based methods. Worked with models including Qwen2.5, Qwen1.5, DeepSeek, LLaMA, Meta LLaMA, Mistral 7B, Phi-3, and Phi-4. Configured quantization parameters, including weight-only 4-bit per-group schemes, group sizes, sequence lengths, and calibration datasets, to balance inference efficiency and accuracy. Ran validation and benchmarking pipelines measuring perplexity, throughput, and model stability. Exported optimized models in Hugging Face and runtime-compatible formats for deployment. Collaborated with AI developers, hardware teams, and runtime engineers to deliver deployment-ready, optimized LLM variants.
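
For readers unfamiliar with the term, the idea behind a weight-only 4-bit per-group scheme can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the pipeline used in the role above: it applies simple round-to-nearest quantization rather than AWQ, GPTQ, or SmoothQuant, and the function names (quantize_group, quantize_per_group, dequantize_per_group) and the sample weights are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, int]  # (4-bit codes, scale, zero point)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Asymmetric round-to-nearest quantization of one group of weights."""
    qmax = (1 << bits) - 1  # 15 for 4-bit codes
    # Include 0.0 in the range so that zero stays exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for an all-zero group
    zero_point = max(0, min(qmax, round(-lo / scale)))
    codes = [max(0, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def quantize_per_group(row: List[float], group_size: int, bits: int = 4) -> List[Group]:
    """Split a weight row into groups; each group gets its own scale and zero point."""
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


def dequantize_per_group(groups: List[Group]) -> List[float]:
    """Map the integer codes back to approximate floating-point weights."""
    return [(q - zp) * scale for codes, scale, zp in groups for q in codes]


# Hypothetical weights: the second group has a much wider range than the first.
row = [0.10, -0.20, 0.05, 0.40, -1.00, 2.00, 0.50, -0.75]
groups = quantize_per_group(row, group_size=4)
restored = dequantize_per_group(groups)
for original, approx in zip(row, restored):
    print(f"{original:+.2f} -> {approx:+.2f}")
```

Smaller groups give each scale a narrower range to cover, which reduces rounding error at the cost of storing more scales and zero points. That trade-off is why group size is one of the parameters tuned alongside the calibration dataset.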