Available to hire
Experienced AI/ML Engineer with 5+ years of building scalable AI solutions in Generative AI, LLMs, vector search, and
recommendation systems. Skilled in Python, PyTorch, TensorFlow, FAISS, Spark, and MLOps, with hands-on expertise in LLM
fine-tuning, prompt engineering, transformer architectures, GPU/CUDA acceleration, and big data pipelines. Strong background
in Docker, Kubernetes, CI/CD, Airflow/Kubeflow, and cloud platforms (AWS, GCP, Azure) for efficient model deployment and
monitoring. Adept at feature engineering, model explainability (SHAP, LIME), real-time inference, and translating complex AI
solutions into measurable business impact.
Skills
Experience Level: Expert (16 skills), Intermediate (3 skills)
Language
English (Fluent)
Work Experience
AI/ML Engineer at Meta
May 1, 2023 - Present
Led optimization of Meta AI Assistant for 2B+ users by advancing transformer inference with PyTorch 2.1, TorchScript, and FBGEMM, achieving a 37% latency reduction and a 22% increase in daily engagement. Architected retrieval-augmented generation pipelines with FAISS, BM25, Hugging Face, and LangChain, grounding answers in 50M+ documents and reducing hallucinations by 28%. Deployed LLaMA-Edge models for Ray-Ban smart glasses using 4-bit quantization, the Glow compiler, and TorchScript, reducing memory usage by 45% while maintaining cross-lingual ASR accuracy. Directed real-time multilingual ASR+TTS with wav2vec-U and quantized HiFi-GAN for voice-first experiences across glasses and mobile. Orchestrated large-scale distributed training across 2,048 GPUs with DeepSpeed ZeRO-3 and NCCL, sustaining 5.7 TFLOP/GPU to accelerate iteration on 70B-parameter LLaMA models. Automated ingestion and transformation of 12TB+ daily logs using Spark, PyArrow, Hive, and Airflow; built privacy-compliant RLHF corpora and feature pipelines in SageMake
AI/ML Engineer at NVIDIA
August 1, 2022 - October 15, 2025
Deployed real-time multilingual conversational AI systems with Python, PyTorch, Riva ASR/TTS, TensorRT, and CUDA, reducing response latency by 45% and enabling scalable cloud, edge, and on-prem deployments. Fine-tuned speech recognition and TTS models on domain-specific datasets, leveraging supervised learning and sequence-to-sequence architectures to boost accuracy by ~30% across enterprise use cases. Integrated NLP services, embeddings, and conversational workflows into chatbots and assistants, improving query resolution by 25% and raising customer satisfaction. Built end-to-end multimodal pipelines using NeMo, Spark, and SQL for regulated industries such as healthcare and finance. Optimized inference pipelines with TensorRT, NVIDIA GPUs, and MLflow, cutting latency by 40% and increasing GPU throughput by 35%. Led MLOps deployment with Docker, Kubernetes, and AWS SageMaker, achieving 99.9% uptime and faster release cycles. Conducted benchmarking, drift monitoring, and evaluation to strengthen safety and regulatory guardrails and improve GPU utilization by 60%.
Education
Master of Science in Computer Software Engineering at Northeastern University, Boston, MA
Qualifications
Industry Experience
Software & Internet, Computers & Electronics, Media & Entertainment, Professional Services