Available to hire
Principal AI engineer with 10+ years of experience architecting and scaling production-grade Generative AI, Multi-Modal
AI, and LLM systems across regulated industries. Specialized in advanced RAG architectures (850K+ document corpora),
agentic workflows, and real-time voice pipelines achieving sub-500ms latency at scale. End-to-end LLMOps expertise
across AWS, Azure, and GCP, with a track record of reducing inference costs by 60% while maintaining 99.7% system
reliability. Led cross-functional teams delivering AI products serving millions of users in healthcare and financial services.
Skills
Experience Level
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Intermediate
Work Experience
Principal AI Engineer at PolyAI
January 1, 2023 - November 1, 2025
Led the design and deployment of a HIPAA-compliant, multi-agent Voice AI assistant for clinical intake and triage, utilizing LangGraph for reasoning orchestration with LiveKit for real-time voice interactions, achieving sub-500ms end-to-end latency. Architected the voice pipeline integrating Deepgram STT with streaming ASR, Claude for dialogue management, and ElevenLabs TTS, implementing semantic turn detection and interruption handling that reduced user-perceived latency by 40%. Engineered VAD tuning and barge-in handling systems with WebRTC transport and Twilio SIP integration for telephony deployment. Scaled a Hybrid Retrieval-Augmented Generation (RAG) system using pgvector (PostgreSQL) with HNSW indexing, BM25, semantic retrieval, and a BGE cross-encoder reranker, reducing non-grounded responses by 38% and boosting retrieval precision by 45%. Implemented two-level semantic caching and domain-specific chunking for clinical documentation, established LLM evaluation framework (RAGAS)
Senior AI Engineer / Founding Engineer at Harvey AI
March 1, 2021 - December 31, 2024
Designed stateful, graph-based autonomous agent workflows using LangGraph with tool use, multi-step planning, and persistent memory for complex financial document analysis and automated report generation. Built conditional routing and parallel execution branches for multi-agent coordination, implementing fallback paths for tool failures and uncertainty detection that reduced agent hallucination in ambiguous scenarios by 52%. Integrated LangSmith for agent tracing and debugging, enabling audits of decision paths. Developed and maintained Model Context Protocol servers exposing CRM, document management, and internal databases to AI agents with secure role-based authorization. Implemented two-level tool interfaces following granularity principles for dynamic tool discovery. Built an OCR pipeline via AWS Textract with Tesseract/PaddleOCR fallback, normalizing output to JSON with 98.5% confidence. Implemented A2A communication patterns for orchestrating sub-agents with structured I/O valida
Senior Machine Learning Engineer at Cohere
June 1, 2018 - February 1, 2021
Designed distributed fine-tuning pipelines for transformer models (BERT, GPT-2, T5) using adapter-based methods and transfer learning on AWS SageMaker, accelerating convergence by 42% and reducing domain-specific error rates by 35%. Built enterprise-scale embedding pipelines with vector indexing (FAISS and Elasticsearch), implementing query transformation and metadata filtering that improved retrieval accuracy for technical documentation by 55%; migrated to Pinecone in 2021. Developed automated prompt engineering and versioning framework with integrated A/B testing for data-driven prompt optimization. Architected MLOps infrastructure using Docker, Kubernetes, and Terraform with GPU autoscaling and safe model rollouts; operated AWS-based ML stack (EC2, S3, EKS/ECS, Lambda, CloudWatch, SageMaker) with elastic autoscaling. Established model lifecycle management (versioning, drift detection, automated retraining) using MLflow and Weights & Biases. Collaborated with product, data engineers,
Machine Learning Engineer at Google
August 1, 2015 - May 1, 2018
Developed NLP models for text classification, named entity recognition, and sentiment analysis using BiLSTM, CNN-based architectures, and ELMo with TensorFlow; early PyTorch adoption. Built scalable data pipelines with Apache Kafka and Apache Spark, processing 10M+ events daily for recommendation systems. Employed model optimization techniques (quantization, pruning, knowledge distillation) to reduce inference latency by 65% while preserving ~98% accuracy. Designed RESTful APIs using Flask and integrated with PostgreSQL and Redis for efficient data retrieval and caching.
Education
Master of Science in Computer Science at Stanford University
August 1, 2013 - June 1, 2015
Bachelor of Science in Computer Science at University of Belgrade
September 1, 2009 - June 1, 2013
Qualifications
Industry Experience
Healthcare, Financial Services, Software & Internet, Professional Services