

Kartheek S

AI Engineer, Data Scientist, Full Stack Developer, +2






Available to hire

Hi, I’m Kartheek S, an AI/ML engineer with 5+ years of experience designing and deploying high-performance generative AI, multimodal pipelines, and fintech systems. I specialize in LLM-powered agents, real-time fraud/risk scoring, and GPU-accelerated inference on cloud-native architectures. I thrive on reducing latency, improving engagement, and driving revenue across platforms like NVIDIA and Amazon Pay.

I lead end-to-end ML production—from data pipelines and model design to secure action execution and real-time inference—across AWS, Kubernetes, SageMaker, Triton, and GPU clusters, delivering scalable, reliable AI solutions.

Skills

Experience Level

Expert


Work Experience

AI Software Engineer (Generative AI Agents and Automation) at NVIDIA

June 1, 2024 - Present

Delivered a 38% improvement in real-time agent response latency by optimizing Triton inference pipelines, TensorRT-LLM execution graphs, and GPU memory management across multi-node DGX clusters.

Increased user engagement by 2.4× by deploying personalized multimodal AI agents that combine LLM reasoning, TTS/ASR (Riva), and dynamic media generation via Maxine and Omniverse.

Built NVIDIA NIM-based LLM microservices with tool-use orchestration, RAG retrieval, and secure action execution, deployed as GPU-backed microservices on Kubernetes using NVIDIA GPU Operator.

Implemented end-to-end RAG pipelines using Milvus/FAISS, NeMo Retriever, RedisVector, and embedding optimization to support low-latency retrieval and continuous memory for agents.

Enabled real-time generative media pipelines using Maxine (super-resolution, face alignment), NVENC/NVDEC, WebRTC, and CUDA-accelerated GStreamer for sub-100ms speech-avatar streaming.

Engineered backend AI services using Python (FastAPI), Go, and gRPC

AI/ML Software Engineer at Amazon

June 1, 2019 - July 1, 2023

Led end-to-end design and deployment of real-time fraud detection models (XGBoost + GNN-based behavioral scoring), reducing false positives by 28% and improving authorization success rate by ~7% across UPI, wallet, and card payments.

Architected a sub-50ms ML inference pipeline on AWS (SageMaker + EKS + DynamoDB feature store) that scaled to 40M+ daily transactions, directly increasing Amazon Pay revenue and lowering risk losses.

Delivered AI-driven personalized payments and reward ranking systems that boosted offer CTR by 31%, improved user retention in bill-pay/recharge flows, and contributed to multi-quarter growth in Amazon Pay monetization KPIs.

Built distributed streaming data pipelines using Kinesis, Kafka, Spark (EMR), and Glue, provisioned with Terraform IaC, powering real-time features for fraud, KYC, risk, and bill-pay personalization.

Developed and productionized ML models with PyTorch, TensorFlow, and scikit-learn, leveraging SageMaker for training, feature extraction, hyperparameter tuning