Available to hire
Hi, I’m Kartheek S, an AI/ML engineer with 5+ years of experience designing and deploying high-performance generative AI, multimodal pipelines, and fintech systems. I specialize in LLM-powered agents, real-time fraud/risk scoring, and GPU-accelerated inference on cloud-native architectures. I thrive on reducing latency, improving engagement, and driving revenue across platforms like NVIDIA and Amazon Pay.
I lead end-to-end ML production—from data pipelines and model design to secure action execution and real-time inference—across AWS, Kubernetes, SageMaker, Triton, and GPU clusters, delivering scalable, reliable AI solutions.
Skills
Experience Level
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Work Experience
AI Software Engineer (Generative AI Agents and Automation) at NVIDIA
June 1, 2024 - Present
Delivered a 38% improvement in real-time agent response latency by optimizing Triton inference pipelines, TensorRT-LLM execution graphs, and GPU memory management across multi-node DGX clusters.
Increased user engagement by 2.4× by deploying personalized multimodal AI agents that combine LLM reasoning, TTS/ASR (Riva), and dynamic media generation via Maxine and Omniverse.
Built NVIDIA NIM-based LLM microservices with tool-use orchestration, RAG retrieval, and secure action execution, deployed as GPU-backed microservices on Kubernetes using NVIDIA GPU Operator.
Implemented end-to-end RAG pipelines using Milvus/FAISS, NeMo Retriever, RedisVector, and embedding optimization to support low-latency retrieval and continuous memory for agents.
Enabled real-time generative media pipelines using Maxine (super-resolution, face alignment), NVENC/NVDEC, WebRTC, and CUDA-accelerated GStreamer for sub-100ms speech-avatar streaming.
Engineered backend AI services using Python (FastAPI), Go, and gRPC
AI/ML Software Engineer at Amazon
June 1, 2019 - July 1, 2023
Led end-to-end design and deployment of real-time fraud detection models (XGBoost + GNN-based behavioral scoring), reducing false positives by 28% and improving authorization success rate by ~7% across UPI, wallet, and card payments.
Architected a sub-50ms ML inference pipeline on AWS (SageMaker + EKS + DynamoDB feature store) that scaled to 40M+ daily transactions, directly increasing Amazon Pay revenue and lowering risk losses.
Delivered AI-driven personalized payments and reward ranking systems that boosted offer CTR by 31%, improved user retention in bill-pay/recharge flows, and contributed to multi-quarter growth in Amazon Pay monetization KPIs.
Built distributed streaming data pipelines using Kinesis, Kafka, Spark (EMR), Glue, and Terraform IaC, powering real-time features for fraud, KYC, risk, and bill-pay personalization.
Developed and productionized ML models with PyTorch, TensorFlow, and scikit-learn, leveraging SageMaker for training, feature extraction, hyperparameter tuning
Education
Master of Information Technology at University of North Carolina at Charlotte
Bachelor of Computer Science at Sri Venkateswara University College of Engineering
Qualifications
Industry Experience
Software & Internet