

Lazar Prokovic

AI Engineer, Data Scientist, Architect, +2






Available to hire

Principal AI engineer with 10+ years of experience architecting and scaling production-grade Generative AI, Multi-Modal
AI, and LLM systems across regulated industries. Specialized in advanced RAG architectures (850K+ document corpora),
agentic workflows, and real-time voice pipelines achieving sub-500ms latency at scale. End-to-end LLMOps expertise
across AWS, Azure, and GCP, with a track record of reducing inference costs by 60% while maintaining 99.7% system
reliability. Led cross-functional teams delivering AI products serving millions of users in healthcare and financial services.


Work Experience

Principal AI Engineer at PolyAI

January 1, 2023 - November 1, 2025

Led the design and deployment of a HIPAA-compliant, multi-agent voice AI assistant for clinical intake and triage, using LangGraph for reasoning orchestration and LiveKit for real-time voice interactions, achieving sub-500ms end-to-end latency. Architected the voice pipeline integrating Deepgram streaming speech-to-text (STT), Claude for dialogue management, and ElevenLabs text-to-speech (TTS), implementing semantic turn detection and interruption handling that reduced user-perceived latency by 40%. Engineered voice activity detection (VAD) tuning and barge-in handling with WebRTC transport and Twilio SIP integration for telephony deployment. Scaled a hybrid Retrieval-Augmented Generation (RAG) system using pgvector (PostgreSQL) with HNSW indexing, BM25, semantic retrieval, and a BGE cross-encoder reranker, reducing non-grounded responses by 38% and boosting retrieval precision by 45%. Implemented two-level semantic caching and domain-specific chunking for clinical documentation, established LLM evaluation framework (RAGAS)

Senior AI Engineer / Founding Engineer at Harvey AI

March 1, 2021 - December 31, 2024

Designed stateful, graph-based autonomous agent workflows using LangGraph with tool use, multi-step planning, and persistent memory for complex financial document analysis and automated report generation. Built conditional routing and parallel execution branches for multi-agent coordination, implementing fallback paths for tool failures and uncertainty detection that reduced agent hallucination in ambiguous scenarios by 52%. Integrated LangSmith for agent tracing and debugging, enabling audits of decision paths. Developed and maintained Model Context Protocol (MCP) servers exposing CRM, document management, and internal databases to AI agents with secure role-based authorization. Implemented two-level tool interfaces following granularity principles for dynamic tool discovery. Built an OCR pipeline on AWS Textract with a Tesseract/PaddleOCR fallback, normalizing output to JSON with 98.5% confidence. Implemented agent-to-agent (A2A) communication patterns for orchestrating sub-agents with structured I/O valida

Senior Machine Learning Engineer at Cohere

June 1, 2018 - February 1, 2021

Designed distributed fine-tuning pipelines for transformer models (BERT, GPT-2, T5) using adapter-based methods and transfer learning on AWS SageMaker, accelerating convergence by 42% and reducing domain-specific error rates by 35%. Built enterprise-scale embedding pipelines with vector indexing (FAISS and Elasticsearch), implementing query transformation and metadata filtering that improved retrieval accuracy for technical documentation by 55%; migrated to Pinecone in 2021. Developed an automated prompt-engineering and versioning framework with integrated A/B testing for data-driven prompt optimization. Architected MLOps infrastructure using Docker, Kubernetes, and Terraform with GPU autoscaling and safe model rollouts; operated an AWS-based ML stack (EC2, S3, EKS/ECS, Lambda, CloudWatch, SageMaker). Established model lifecycle management (versioning, drift detection, automated retraining) using MLflow and Weights & Biases. Collaborated with product, data engineers,

Machine Learning Engineer at Google

August 1, 2015 - May 1, 2018

Developed NLP models for text classification, named entity recognition, and sentiment analysis using BiLSTM, CNN-based architectures, and ELMo in TensorFlow, and was an early adopter of PyTorch. Built scalable data pipelines with Apache Kafka and Apache Spark, processing 10M+ events daily for recommendation systems. Applied model optimization techniques (quantization, pruning, knowledge distillation) to reduce inference latency by 65% while preserving ~98% accuracy. Designed RESTful APIs using Flask, integrated with PostgreSQL for data retrieval and Redis for caching.