Eric Xiao

Available to hire

I’m a Senior Software Engineer focused on ML and data, passionate about turning real-world problems into dependable systems that people actually enjoy using. I work across ranking and vision models, classical ML, and fraud/risk systems, and I thrive on translating complex data signals into reliable, user-friendly solutions.

I started on the data side building pipelines and signals, then grew to own end-to-end ML workflows from training to deployment. I value ownership, fast iteration, clear communication, and creating an environment where teammates do their best work.

Experience Level

Expert

Language

English
Fluent

Work Experience

Software Engineer – ML at Google
May 1, 2022 - November 12, 2025
Contributed to a cross-functional team building a clinician-facing AI agent that delivers real-time, patient-scoped search and summarization across EHR data. Owned the retrieval and summarization pipeline, combining FHIR-based data search with Gemini-powered language models to produce contextual clinical insights. Designed and maintained data ingestion and normalization flows using the Cloud Healthcare API and Healthcare Data Engine patterns to unify FHIR, HL7v2, and DICOM records. Integrated Vertex AI Search for Healthcare to support semantic search across structured and unstructured data, adding grounding for explainability and traceability. Deployed the system using Google’s Agent Development Kit (ADK) and Vertex AI Agent Builder, with a focus on orchestration and observability. Partnered with security and compliance teams to enforce HIPAA-aligned access controls and full audit logging. Collaborated with the Document AI team to parse scanned clinical forms into structured FHIR Observations, expanding data coverage.
Software Engineer – ML at Google
May 1, 2022 - November 12, 2025
Worked on Ads Safety initiatives, building multimodal safety models with CLIP-like image encoders and transformer text encoders in PyTorch/TensorFlow to detect risky ad content with limited labeled data. Implemented retrieval-augmented generation (RAG) pipelines in which LLMs reviewed ads using context retrieved through approximate nearest neighbor (ANN) vector search over image-text embeddings. Maintained large ingestion and feature pipelines for ad text, creatives, landing-page content, OCR outputs, and metadata. Retrained gradient-boosted and neural models for spam, misleading claims, and sensitive categories, and integrated them with enforcement services via stable APIs. Added clustering pipelines that group similar ads so an LLM can review representative examples; designed LLM-driven reviewer workflows that coordinate retrieval, reasoning, and ticketing APIs; and set up CI/CD with canary releases and automated rollbacks. Built dashboards to monitor precision/recall, false positives by region, token usage, latency, and drift.
Data Scientist at Antenna
February 1, 2020 - April 1, 2022
Built core subscription-metrics pipelines on Databricks (Spark, Delta) and AWS S3, used by major streaming clients to track sign-ups, churn, and market share. Modeled KPIs such as adds, cancels, churn, survival curves, and plan/distributor mix as dbt/SQL models with tests for counts, duplicates, and business rules. Designed bronze/silver/gold data layers for receipts, bank/app-store feeds, and distributor reports. Developed daily pipelines with Airflow and Databricks jobs (SLAs, retries, and alerting for data freshness and compute issues), and published feature tables for churn and LTV models, coordinating schemas and refresh cadences with data science teams. Optimized heavy dbt/Spark workloads via partitioning, clustering, caching, and autoscaling to reduce runtime and cost as data volumes grew. Built experiment logic comparing exposure vs. control groups across geography, platform, and campaign; joined subscription events with title-level viewership data to identify drivers of acquisition and retention; and added guardrails such as minimum sample sizes and configuration validation. Partnered with insights and client t
Machine Learning Engineer at SimpleBet
January 1, 2019 - January 1, 2020
Built real-time microbet models for next-play markets across major US sports. Processed large play-by-play datasets using Spark on Databricks/AWS to produce reusable feature tables. Trained gradient-boosting, logistic regression, and (selectively) deep learning models in Python (scikit-learn/PyTorch) and deployed them as services on Docker/Kubernetes. Integrated model services with Kafka/RabbitMQ feeds to update odds under strict latency budgets and publish results to sportsbooks. Developed anomaly-detection and rule-based systems for suspicious betting patterns that surfaced alerts on integrity dashboards. Implemented safeguards such as smoothed odds updates, probability guardrails, and automatic kill switches triggered when model metrics drifted.
Data Scientist at ClimaCell
August 1, 2018 - January 1, 2019
Built models predicting near-term flight delay and cancellation risk from hyper-local radar, cell-tower, IoT, and schedule data. Developed an AWS S3 + Spark/Databricks pipeline for ingesting sensor and schedule feeds. Engineered time-series and route-level features and trained GBM/XGBoost models to produce airport-level risk scores. Added per-airport monitoring of data freshness and feature distributions to detect stalled or shifted feeds.
Software Engineer at IBM Watson
July 1, 2017 - July 1, 2018
Built and deployed a Watson NLU-based ticket/document classifier as a Java/Python microservice on IBM Cloud that automatically routed support tickets to triage teams, reducing manual handling and speeding up response times.
Intern at Charles River Analytics
August 1, 2016 - August 1, 2016
Helped implement a DARPA prototype that detects objects in aerial imagery using early deep-learning computer-vision models in Python.
Intern at The Credit Junction
August 1, 2015 - August 1, 2015
Wrote Python and SQL ETL scripts to load financial statements and bank transactions into a Postgres data warehouse for internal credit-risk analysis.

Education

B.S. Math & Computer Science at Brown University
January 1, 2017
B.A. Applied Math at Brown University
January 1, 2017

Industry Experience

Software & Internet, Healthcare, Financial Services, Media & Entertainment, Professional Services, Education