
Vishwam Shukla

Experienced AI/ML Engineer with 5+ years building scalable AI solutions in Generative AI, LLMs, vector search, and recommendation systems. Skilled in Python, PyTorch, TensorFlow, FAISS, Spark, and MLOps, with hands-on expertise in LLM fine-tuning, prompt engineering, transformer architectures, GPU/CUDA acceleration, and big-data pipelines. Strong background in Docker, Kubernetes, CI/CD, Airflow/Kubeflow, and cloud platforms (AWS, GCP, Azure) for efficient model deployment and monitoring. Adept at feature engineering, model explainability (SHAP, LIME), real-time inference, and translating complex AI solutions into measurable business impact.
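The vector-search work mentioned above rests on nearest-neighbor lookup over embeddings. A minimal brute-force cosine-similarity sketch in pure Python (toy vectors and ids are hypothetical; a library such as FAISS does the same lookup at scale with optimized indexes):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product over the product of the two norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(query, index):
    # Brute-force scan: return the id of the most similar stored embedding.
    return max(index, key=lambda k: cosine(query, index[k]))

# Hypothetical 3-dimensional document embeddings.
index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.2],
    "doc_c": [0.1, 0.0, 1.0],
}
hit = nearest([1.0, 0.2, 0.0], index)  # closest to doc_a's direction
```

Real systems replace the linear scan with an approximate index (e.g. IVF or HNSW) once the corpus grows beyond a few hundred thousand vectors.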

Available to hire



Language

English
Fluent

Work Experience

AI/ML Engineer at Meta
May 1, 2023 - Present
- Led optimization of Meta AI Assistant for 2B+ users by advancing transformer inference with PyTorch 2.1, TorchScript, and FBGEMM, achieving a 37% latency reduction and a 22% increase in daily engagement.
- Architected retrieval-augmented generation pipelines with FAISS, BM25, Hugging Face, and LangChain, grounding answers in 50M+ documents and reducing hallucinations by 28%.
- Deployed LLaMA-Edge models for Ray-Ban smart glasses using 4-bit quantization and TorchScript, reducing memory usage by 45% while maintaining cross-lingual ASR accuracy.
- Directed real-time multilingual ASR+TTS with wav2vec-U and quantized HiFi-GAN for voice-first experiences across glasses and mobile.
- Orchestrated large-scale distributed training across 2,048 GPUs with DeepSpeed ZeRO-3 and NCCL, sustaining 5.7 TFLOPs per GPU to accelerate 70B-parameter LLaMA models.
- Automated ingestion and transformation of 12TB+ daily logs using Spark, PyArrow, Hive, and Airflow; built privacy-compliant RLHF corpora and feature pipelines in SageMaker.
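The retrieval-augmented generation work above pairs dense vector search with BM25 lexical ranking. A minimal, self-contained Okapi BM25 sketch in pure Python (the toy corpus, whitespace tokenization, and default k1/b values are illustrative assumptions, not the production setup):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            s += idf * num / den
        scores.append(s)
    return scores

# Hypothetical toy corpus, pre-tokenized by whitespace.
docs = [
    "the assistant answers questions".split(),
    "retrieval grounds the answers in documents".split(),
    "smart glasses run on device models".split(),
]
scores = bm25_scores("retrieval answers".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)  # index of top document
```

In a hybrid RAG pipeline, scores like these are typically fused with dense-retrieval scores (e.g. via reciprocal rank fusion) before the top passages are handed to the LLM.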
AI/ML Engineer at NVIDIA
August 1, 2022 - October 15, 2025
- Deployed real-time multilingual conversational AI systems with Python, PyTorch, Riva ASR/TTS, TensorRT, and CUDA, reducing response latency by 45% and enabling scalable cloud, edge, and on-prem deployments.
- Fine-tuned speech recognition and TTS models on domain-specific datasets, boosting accuracy by ~30% across enterprise use cases.
- Integrated NLP services, embeddings, and conversational workflows into chatbots and assistants, improving query resolution by 25% and customer satisfaction.
- Built end-to-end multimodal pipelines using NeMo, Spark, and SQL for regulated industries such as healthcare and finance.
- Optimized inference pipelines with TensorRT, NVIDIA GPUs, and MLflow, cutting latency by 40% and increasing GPU throughput by 35%.
- Led MLOps deployment with Docker, Kubernetes, and AWS SageMaker, achieving 99.9% uptime and faster release cycles.
- Conducted benchmarking, drift monitoring, and evaluation to strengthen safety and regulatory guardrails and improve GPU utilization by 60%.
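The inference-optimization work above (and the quantized models mentioned throughout) relies on mapping floating-point weights to low-bit integers. A minimal sketch of symmetric per-tensor INT8 quantization in pure Python (the toy weights are hypothetical; production stacks such as TensorRT use calibrated, per-channel variants of this idea):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: scale by max |w|, round into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; error is bounded by half a quantization step.
    return [x * scale for x in q]

w = [0.52, -1.27, 0.003, 0.81]      # hypothetical weight values
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The same scheme, narrowed to 4-bit ranges, is what drives the large memory reductions on edge devices: each weight drops from 32 bits to 8 (or 4), at the cost of a small, bounded rounding error per value.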

Education

Master of Science in Computer Software Engineering at Northeastern University, Boston, MA


Industry Experience

Software & Internet, Computers & Electronics, Media & Entertainment, Professional Services