Sai Teja Gurrapu

AI Engineer, Data Scientist, Full Stack Developer, +4

Available to hire

Hi, I’m Sai Teja Gurrapu, an AI/ML Engineer with 5+ years focused on multimodal AI, generative AI, and large-scale ML deployments. I enjoy accelerating analytics with GPU computing, building RAG systems, and co-developing enterprise AI platforms. I thrive on turning complex data into practical business solutions and on collaborating with cross-functional teams across global industries.

My toolkit spans MLOps, cloud-native integration, and Responsible AI practices. I’ve led work on scalable data pipelines, GPU-cluster monitoring, and multi-agent AI orchestration, delivering measurable improvements in research productivity and production performance. I’m passionate about shared learning, experimentation, and building robust, auditable AI systems.

Language

English

Fluent

Work Experience

AI/ML Engineer at Meta

February 1, 2024 - Present

Scaled Meta’s GPU infrastructure powering LLMs such as LLaMA and enabled efficient distributed training, boosting AI research productivity and shortening model development cycles. Built high-throughput compute pipelines across 24,000+ H100 GPUs, accelerating AI-assisted pipelines and production deployments for global research teams. Contributed to open AI hardware standards that reduced training costs by ~20% and improved scalability. Engineered PyTorch distributed training workflows with tensor and pipeline parallelism to optimize GPU utilization during pretraining, fine-tuning, and large-scale inference. Developed AI-assisted pipelines that automate dataset preprocessing, fault recovery, and experiment tracking for models with billions of parameters. Designed GPU-cluster monitoring with AutoGen agents for intelligent workload management and anomaly detection, enabling dynamic scaling. Implemented custom schedulers that coordinate thousands of nodes for optimized resource allocation, and integrated PyTorch Distributed with AI tooling. Partnered with research teams on next-generation LLaMA training and inference, applying AutoGen multi-agent orchestration to scale both. Built Python automation scripts for multimodal dataset orchestration and reproducible experiments across Meta’s infrastructure platforms.

AI/ML Engineer at HCL Technologies

November 1, 2022 - October 9, 2025

Co-designed and deployed predictive maintenance ML models that detect anomalies in critical medical devices, improving uptime by ~30% and reducing unplanned downtime. Co-automated fault detection on edge compute and implemented anomaly alerts and health-monitoring dashboards that deliver real-time insights. Built supervised ML pipelines (SVM, Random Forest, Neural Networks) on IoT telemetry to predict equipment failures and support preventive maintenance, along with unsupervised anomaly detection (PCA, K-means) that flags unusual sensor patterns for earlier failure detection. Engineered IoT data pipelines with preprocessing steps including Fourier transforms, noise filtering, and feature engineering to turn raw signals into structured, model-ready data. Deployed models in on-premise edge environments with Flask REST APIs and Docker, integrating them with HCL’s IoT Works platform for live monitoring. Helped establish retraining and monitoring pipelines to maintain predictive accuracy as telemetry evolved. Co-prototyped computer vision and NLP proofs of concept in the Next.ai Lab using NVIDIA DGX-1 GPUs, TensorFlow, PyTorch, and Scikit-learn, and collaborated with junior engineers on GPU utilization, AI experimentation, and MLOps best practices.

AI/ML Engineer at HCL Technologies

June 1, 2019 - November 1, 2022

Contributed to predictive maintenance initiatives by developing ML models that detect anomalies in critical medical devices, improving uptime and reducing unplanned downtime. Co-automated fault detection on edge devices with real-time dashboards, boosting operational efficiency. Implemented anomaly alerts and health-monitoring dashboards to speed up decision-making and maintenance scheduling. Built supervised ML pipelines (SVM, Random Forest, Neural Networks) on IoT telemetry to predict equipment failures. Developed unsupervised anomaly detection using PCA and K-means to extend failure-detection coverage. Engineered IoT data pipelines with preprocessing (Fourier transforms, noise filtering, feature engineering) to prepare high-frequency signals for modeling. Deployed AI models in on-premise edge environments via Flask REST APIs and Docker, integrating them with HCL’s IoT Works platform for real-time monitoring. Implemented model retraining and monitoring pipelines to maintain accuracy as equipment behavior…