Peter Pechinin

Available to hire

I’m Peter Pechinin, a Senior Full-Stack & AI Engineer with 8+ years delivering production-grade AI systems, real-time voice platforms, and scalable web applications. I specialize in multi-agent orchestration, low-latency streaming architectures, and unified GraphQL gateways, with a track record spanning healthcare and SaaS. I focus on building reliable pipelines, privacy-conscious safeguards, and evaluation-driven release gates to minimize errors.

I code across the stack in TypeScript and Python, design scalable data pipelines, and collaborate closely with product, clinical, and operations teams to ship AI copilots that users can trust. I value modular architectures, observable systems, and user-centric design that helps clinicians and operators work more effectively with technology.

Experience Level

Expert

Language

English
Fluent

Work Experience

Senior Full Stack Engineer at Monterey AI (Startup)
August 1, 2022 - July 31, 2025
Engineered a real-time AI-native dashboard and insights co-pilot UI using React, Next.js, TypeScript, TailwindCSS, React Query, Zustand, Recharts and D3.js; built a multi-channel ingestion pipeline (Python, FastAPI, Redis queues, SQS) to normalize thousands of messages from Slack, Zendesk, Intercom, and Discord with idempotency and schema enforcement; implemented Slack and Jira/Linear integrations to deliver real-time AI alerts and pre-filled tickets; architected a multi-hop RAG-based agent co-pilot using LangChain, GPT-4, and Milvus; fused BM25 with dense vector search to reduce irrelevant queries by ~35%; owned DevOps responsibilities (Docker, AWS EKS/Lambda/RDS/S3), CI/CD (GitHub Actions) and monitoring.
Software Development Engineer at Uber Technologies Inc.
December 1, 2019 - August 31, 2022
Designed and shipped real-time order tracking and live map visualization features for the Uber Eats Restaurant Dashboard and customer web/app using React/TypeScript, React Native, WebSockets, and Mapbox; delivered sub-second status updates via Kafka; developed replay tools for autonomy engineers to visualize sensor data and decision timelines, shortening debug cycles by 60%; architected a high-throughput real-time tracking pipeline ingesting millions of GPS/location events per minute and enabling AI features like dynamic ETA; built Terraform modules and IaC for provisioning multi-region PostgreSQL read replicas, Redis clusters, and Kafka brokers, reducing manual ops by 90% and improving disaster-recovery readiness.
Junior Software Developer at LivaNova
August 1, 2017 - October 31, 2019
Built data processing modules with JavaScript/Node.js for internal reporting; integrated device logs to support R&D for heart monitoring products; implemented supervised learning in Python for anomaly detection, reducing false positives by 30% and aiding compliance in medical software development; built internal tools using Vue.js and Python backend, facilitating data entry for trials and increasing cross-functional collaboration.
Senior Full Stack Engineer at Monterey AI
August 1, 2022 - July 1, 2025
Engineered a real-time AI-native dashboard and insights co-pilot UI using React, Next.js, TypeScript, TailwindCSS, React Query, Zustand, Recharts and D3.js; enhanced UX with streaming AI responses, state management optimization, and evidence-linked insight displays. Built a resilient multi-channel ingestion pipeline using Python, FastAPI, Redis queues, SQS, and Airflow-like scheduled jobs to normalize thousands of Slack, Zendesk, Intercom, and Discord messages per day with strong idempotency, deduplication, and schema enforcement. Implemented Slack and Jira/Linear integrations in Node.js/TypeScript, using webhooks and bot APIs to deliver real-time AI alerts on trending issues and to auto-generate pre-filled tickets directly from user feedback clusters. Architected and productionized a multi-hop RAG-based Agentic AI Copilot using LangChain, LangGraph, OpenAI GPT-4, and Milvus; enabled users to query 50k+ customer feedback records with natural language, powering evidence-backed insights
Senior AI Backend Engineer at Simform
October 1, 2024 - January 1, 2026
Designed and implemented production-grade LangGraph-based multi-agent orchestration using Google ADK to route clinical intents (triage, coding, note drafting) through a planner-retriever-critic loop, reducing ungrounded hallucinations by managing state across parallel agents. Built a low-latency voice session architecture with LiveKit as the WebRTC SFU, integrating streaming Speech-to-Text for real-time transcript chunking and bi-directional audio processing, enabling clinician-facing suggestions within 800 ms (p95) of speech end. Implemented a RAG pipeline using pgvector with Vertex AI Search and FHIR R4: hybrid search, clinical chunking, sub-300 ms retrieval latency, offline hit-rate/MRR evaluation, and citation guardrails before model generation. Standardized EHR writebacks and knowledge-base queries through Model Context Protocol (MCP) servers, replacing brittle point-to-point integrations with a reusable, auditable tool-calling layer. Enabled cross-framework agent collaboration using
Senior Software Engineer at Monterey AI
August 1, 2022 - September 1, 2025
Led development of an analyst-facing copilot UI in React/Next.js to visualize cited evidence chunks and agent reasoning traces, with interactive filtering by date, source, and sentiment. Built real-time ETL pipelines (Python/FastAPI, Redis, SQS, Airflow, Node.js) to ingest 1,500+ sources, implemented semantic chunking with rich metadata for citation lineage, and delivered a scalable GraphQL federation gateway (NestJS + Apollo Federation) to unify data across orchestration, retrieval, and tenant systems. Implemented enterprise SSO (Okta) and identity mapping to enforce secure, org-scoped access. Instrumented LangSmith tracing across multi-agent lifecycles and deployed an LLM-as-a-judge pipeline (GPT-4) with custom rubrics to auto-score production responses, reducing manual review by 85% and lowering hallucinated citations by 30%. Architected a multi-hop RAG-based Agentic AI Copilot using LangChain/LangGraph/OpenAI/GPT-4 and Milvus, enabling access to 50k+ customer feedback records with
Senior Software Engineer at Monterey AI (Startup)
August 1, 2022 - September 30, 2025
Built an analyst-facing copilot UI in React/Next.js and real-time ETL pipelines to ingest unstructured feedback from 1,500+ sources; developed semantic chunking with rich metadata to preserve citation lineage. Created a scalable GraphQL federation gateway unifying data from the agent orchestration layer, vector retrieval service, and tenant management; implemented directive-based authorization and resolver-level caching, reducing over-fetching and lowering complex query latency. Integrated enterprise SSO (Okta) and OAuth/OpenID Connect, propagating identity securely across a multi-tenant SaaS platform. Instrumented LangSmith tracing and an LLM-as-a-judge pipeline (GPT-4) with custom rubrics to auto-score production responses, cutting manual review by 85% and reducing hallucinated citations by 30%.

Education

Master of Science (MSc) in Machine Learning at University College London (UCL)
October 1, 2017 - September 1, 2018
Bachelor of Science (BSc) in Computer Science at University College London (UCL)
September 1, 2013 - June 1, 2016

Industry Experience

Software & Internet, Transportation & Logistics, Healthcare, Professional Services, Other, Life Sciences, Media & Entertainment

    Multi-Agent Insight Copilot with Hybrid RAG

    Designed and built a production-grade, multi-agent LLM copilot that answers complex product questions by grounding responses in millions of customer feedback entries using a hybrid retrieval pipeline.

    Implemented a retrieval strategy combining pgvector semantic search, keyword search, and metadata filters, followed by a cross-encoder reranker to deliver the most relevant evidence to GPT-4/Claude for final response composition.
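The retrieval strategy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the production implementation: the profile does not say how the semantic and keyword results were combined, so reciprocal rank fusion is swapped in here as one common choice, and the `Doc` type, the `source` metadata field, the function names, and the constant `k=60` are all hypothetical. The cross-encoder reranking step is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    source: str  # hypothetical metadata field used for filtering


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of doc ids; higher ranks contribute more."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by id so output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_retrieve(
    semantic: list[str],
    keyword: list[str],
    docs: dict[str, Doc],
    allowed_sources: set[str],
    top_n: int = 3,
) -> list[str]:
    """Fuse semantic and keyword rankings, then apply a metadata filter."""
    fused = reciprocal_rank_fusion([semantic, keyword])
    kept = [d for d in fused if docs[d].source in allowed_sources]
    return kept[:top_n]
```

In a fuller pipeline, the ids returned by `hybrid_retrieve` would be the candidates handed to the reranker before the final evidence set is passed to the model.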

    Securely integrated live tool access (Jira, Slack, GitHub) via an MCP server, allowing the agent to perform actions and fetch real-time context beyond static documents.

    Ensured answer trust and traceability by building a citation grounding layer and using LangSmith for full agent observability, debugging, and performance evaluation.
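One simple way to picture a citation grounding check is shown below. It is a sketch only: the `[doc:ID]` citation format, the function names, and the rule that an answer with no citations counts as ungrounded are assumptions made for illustration, and no LangSmith API is used.

```python
import re

# Hypothetical inline citation format: "[doc:<id>]"
CITATION_PATTERN = re.compile(r"\[doc:([A-Za-z0-9_-]+)\]")


def ungrounded_citations(answer: str, evidence_ids: set[str]) -> list[str]:
    """Return cited ids that do not appear in the retrieved evidence set."""
    cited = CITATION_PATTERN.findall(answer)
    return sorted({c for c in cited if c not in evidence_ids})


def is_grounded(answer: str, evidence_ids: set[str]) -> bool:
    """An answer passes only if it cites something and every citation resolves."""
    cited = CITATION_PATTERN.findall(answer)
    return bool(cited) and all(c in evidence_ids for c in cited)
```

A check like this only verifies that each citation points at retrieved evidence; whether the cited passage actually supports the claim needs a separate evaluation step.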

    OrthoAssist – Voice-First Clinical Copilot for Orthopedic Urgent Care

    At Simform, I led the backend development of OrthoAssist, a real‑time, voice‑first AI copilot designed for orthopedic urgent care clinics. The system listens to clinician‑patient conversations, drafts structured SOAP notes and orders in real time, suggests follow‑up questions, and retrieves relevant patient history from the EHR—all while maintaining strict HIPAA and PHI compliance.

    This solved two critical problems for the clinic network: documentation overhead (clinicians spending hours after each shift on manual charting) and AI adoption skepticism (fears around hallucinations, workflow disruption, and data privacy). By building a retrieval‑first, clinician‑in‑the‑loop system, we reduced note completion time and achieved zero critical PHI violations across pilot clinics.

    Key technical challenges I owned included: orchestrating a multi‑agent runtime with real‑time streaming transcription, grounding all model outputs in FHIR‑based patient data to eliminate hallucination risk, standardizing EHR and imaging tool calls via the Model Context Protocol (MCP), and building an evaluation harness with regression gates to ensure clinical safety before every deployment.
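A regression gate of the kind mentioned above might look roughly like this. The sketch assumes retrieval quality is tracked with hit rate and mean reciprocal rank over a fixed, non-empty evaluation set; the function names, the metric dictionary keys, and the `max_drop` tolerance are hypothetical and not taken from the actual harness.

```python
def hit_rate(results: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant doc in the top k."""
    hits = sum(
        1
        for ranked, rel in zip(results, relevant)
        if any(doc_id in rel for doc_id in ranked[:k])
    )
    return hits / len(results)


def mean_reciprocal_rank(results: list[list[str]], relevant: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant doc (0 when none is found)."""
    total = 0.0
    for ranked, rel in zip(results, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(results)


def passes_gate(
    candidate: dict[str, float],
    baseline: dict[str, float],
    max_drop: float = 0.02,  # assumed tolerance, not a documented threshold
) -> bool:
    """Block a release if any baseline metric regresses by more than max_drop."""
    return all(candidate[name] >= baseline[name] - max_drop for name in baseline)
```

In a CI job, the candidate metrics would be computed on the evaluation set for the new build and compared with the stored baseline, with the deployment step running only when `passes_gate` returns `True`.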