Monica Challa





Available to hire

Hi, I’m Monica Challa, a machine learning engineer focused on production-grade LLMs, retrieval-augmented generation, and secure, scalable MLOps on AWS. Over the past 5+ years, I’ve built end-to-end AI systems across financial services, enterprise software, and AdTech, driving measurable gains in model accuracy, latency, and adoption. I thrive in cross-functional teams and enjoy turning complex ML concepts into reliable, production-ready solutions with human-in-the-loop safety.

In my current role at Scale AI, I lead end-to-end ML deployments, optimize real-time inference, and implement secure personalization and payments integrations. Previously at Accenture, I delivered real-time ML in SynOps, boosted CTR and fraud detection, and built large-scale data pipelines. I’m passionate about robust ML engineering, secure AI systems, and scalable cloud-native architectures.

Skills

Experience Level

Expert

Expert

Expert

Expert

Expert

Expert

Expert

Expert

Language

English

Advanced

Work Experience

Machine Learning Engineer at Scale AI

January 1, 2025 - Present

Led end-to-end ML deployment for enterprise-grade LLMs, achieving a 32% increase in accuracy via RLHF fine-tuning and domain-specific retrieval pipelines. Reduced real-time inference latency by 45% through containerized deployments, GPU autoscaling, and prompt-chain compression on Kubernetes (EKS) and EC2. Increased AI Copilot adoption by 60% through secure, vector-based personalization with user embeddings. Built scalable RAG systems using FAISS and OpenSearch, integrated with Hugging Face Transformers and PyTorch for multimodal LLMs. Implemented real-time, fault-tolerant inference services with Triton Inference Server, TorchServe, and ONNX Runtime. Established CI/CD pipelines for ML deployment and blue/green rollouts. Integrated PCI-aligned Stripe and Plaid payments and AI-driven recommendations. Set up monitoring with CloudWatch, Prometheus, and Grafana. Collaborated on automated model validation with human-in-the-loop safety thresholds.

Software Engineer at Accenture

September 1, 2019 - November 1, 2023

Delivered a 41% reduction in operational processing time by deploying real-time ML models into SynOps workflows for finance and procurement. Improved ad personalization CTR by 28% with a Transformer-based recommender served via SageMaker endpoints and Redis. Increased fraud detection accuracy by 35% using gradient-boosted models and anomaly detection. Built NLP and deep learning models with PyTorch, Transformers, and TensorFlow for content routing, document parsing, and ad relevance scoring. Engineered high-volume data pipelines with Spark, Airflow, and AWS Glue handling 50M+ daily events. Implemented real-time, event-driven inference with Lambda, Kinesis, and Kafka. Managed the MLOps lifecycle with MLflow, SageMaker Pipelines, and Kubernetes. Architected multimodal systems combining Textract, SageMaker, and S3 for OCR and invoice processing. Optimized distributed training on SageMaker and EC2 GPUs. Integrated Stripe, Razorpay, and ad platforms to enable secure monetization.