
Vishwam Shukla

Experienced AI/ML Engineer with 5+ years building scalable AI solutions in Generative AI, LLMs, vector search, and recommendation systems. Skilled in Python, PyTorch, TensorFlow, FAISS, Spark, and MLOps, with hands-on expertise in LLM fine-tuning, prompt engineering, transformer architectures, GPU/CUDA acceleration, and big-data pipelines. Strong background in Docker, Kubernetes, CI/CD, Airflow/Kubeflow, and cloud platforms (AWS, GCP, Azure) for efficient model deployment and monitoring. Adept at feature engineering, model explainability (SHAP, LIME), real-time inference, and translating complex AI solutions into measurable business impact.
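The vector-search work mentioned above rests on nearest-neighbor lookup over embeddings. A minimal brute-force cosine-similarity sketch in pure Python (toy vectors and ids are hypothetical; a library such as FAISS does the same lookup at scale with optimized indexes):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product over the product of the two norms.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(query, index):
    # Brute-force scan: return the id of the most similar stored embedding.
    return max(index, key=lambda k: cosine(query, index[k]))

# Hypothetical 3-dimensional document embeddings.
index = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.2],
    "doc_c": [0.1, 0.0, 1.0],
}
hit = nearest([1.0, 0.2, 0.0], index)  # closest to doc_a's direction
```

Real systems replace the linear scan with an approximate index (e.g. IVF or HNSW) once the corpus grows beyond a few hundred thousand vectors.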

Available to hire



Language

English
Fluent

Work Experience

AI/ML Engineer at Meta
May 1, 2023 - Present
- Led optimization of Meta AI Assistant for 2B+ users by advancing transformer inference with PyTorch 2.1, TorchScript, and FBGEMM, achieving a 37% latency reduction and a 22% increase in daily engagement.
- Architected retrieval-augmented generation pipelines with FAISS, BM25, Hugging Face, and LangChain, grounding answers in 50M+ documents and reducing hallucinations by 28%.
- Deployed LLaMA-Edge models for Ray-Ban smart glasses using 4-bit quantization and TorchScript, reducing memory usage by 45% while maintaining cross-lingual ASR accuracy.
- Directed real-time multilingual ASR+TTS with wav2vec-U and quantized HiFi-GAN for voice-first experiences across glasses and mobile.
- Orchestrated large-scale distributed training across 2,048 GPUs with DeepSpeed ZeRO-3 and NCCL, sustaining 5.7 TFLOPs per GPU to accelerate 70B-parameter LLaMA models.
- Automated ingestion and transformation of 12TB+ daily logs using Spark, PyArrow, Hive, and Airflow; built privacy-compliant RLHF corpora and feature pipelines in SageMaker.
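The retrieval-augmented generation work above pairs dense vector search with BM25 lexical ranking. A minimal, self-contained Okapi BM25 sketch in pure Python (the toy corpus, whitespace tokenization, and default k1/b values are illustrative assumptions, not the production setup):

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            s += idf * num / den
        scores.append(s)
    return scores

# Hypothetical toy corpus, pre-tokenized by whitespace.
docs = [
    "the assistant answers questions".split(),
    "retrieval grounds the answers in documents".split(),
    "smart glasses run on device models".split(),
]
scores = bm25_scores("retrieval answers".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)  # index of top document
```

In a hybrid RAG pipeline, scores like these are typically fused with dense-retrieval scores (e.g. via reciprocal rank fusion) before the top passages are handed to the LLM.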
AI/ML Engineer at NVIDIA
August 1, 2022 - October 15, 2025
- Deployed real-time multilingual conversational AI systems with Python, PyTorch, Riva ASR/TTS, TensorRT, and CUDA, reducing response latency by 45% and enabling scalable cloud, edge, and on-prem deployments.
- Fine-tuned speech recognition and TTS models on domain-specific datasets, boosting accuracy by ~30% across enterprise use cases.
- Integrated NLP services, embeddings, and conversational workflows into chatbots and assistants, improving query resolution by 25% and customer satisfaction.
- Built end-to-end multimodal pipelines using NeMo, Spark, and SQL for regulated industries such as healthcare and finance.
- Optimized inference pipelines with TensorRT, NVIDIA GPUs, and MLflow, cutting latency by 40% and increasing GPU throughput by 35%.
- Led MLOps deployment with Docker, Kubernetes, and AWS SageMaker, achieving 99.9% uptime and faster release cycles.
- Conducted benchmarking, drift monitoring, and evaluation to strengthen safety and regulatory guardrails and improve GPU utilization by 60%.
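The inference-optimization work above (and the quantized models mentioned throughout) relies on mapping floating-point weights to low-bit integers. A minimal sketch of symmetric per-tensor INT8 quantization in pure Python (the toy weights are hypothetical; production stacks such as TensorRT use calibrated, per-channel variants of this idea):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: scale by max |w|, round into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; error is bounded by half a quantization step.
    return [x * scale for x in q]

w = [0.52, -1.27, 0.003, 0.81]      # hypothetical weight values
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The same scheme, narrowed to 4-bit ranges, is what drives the large memory reductions on edge devices: each weight drops from 32 bits to 8 (or 4), at the cost of a small, bounded rounding error per value.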

Education

Master of Science in Computer Software Engineering at Northeastern University, Boston, MA


Industry Experience

Software & Internet, Computers & Electronics, Media & Entertainment, Professional Services