Architected batch and real-time data pipelines using Python, PySpark, Kafka, and AWS Kinesis for Chewy’s pet health care ecosystem (Vet Care clinics, pharmacy, telehealth), reducing data latency by 60% and enabling near real-time analytics. Built a lakehouse architecture using Apache Iceberg on S3, improving query performance and reducing data processing costs by 30%+ across health care and e-commerce datasets. Delivered feature-ready datasets for ML models and collaborated with cross-functional teams to deliver production-ready data solutions.

Senior Data Engineer at Pinterest

October 1, 2020 - March 1, 2023

Designed scalable distributed ETL pipelines using Apache Spark, Python, Hive, and Airflow on AWS S3, processing 5+ TB of Ads and user engagement data daily to support Ads performance analytics and user insights dashboards. Led the Ads and user engagement analytics platform, including data ingestion, transformation, and aggregation pipelines, enabling real-time reporting and ML feature generation for personalized recommendations and ad targeting systems. Built monitoring, alerting, and orchestration frameworks across pipelines to ensure 99%+ data availability, reducing failures and improving SLA adherence for critical business KPIs. Collaborated with product, analytics, and ML teams to define data models and tracking schemas using Hive Metastore / AWS Glue Catalog, supporting experimentation frameworks (A/B testing) and campaign performance analysis. Delivered analytics datasets and workflows for conversion funnels, campaign tracking, and user engagement insights using Spark SQL and

Data Engineer at Lyft

February 1, 2017 - October 1, 2020

Developed and maintained scalable ETL pipelines using Apache Spark (batch and Structured Streaming), Python, Java, and Apache Airflow on AWS (S3, Glue), processing 1+ TB/day of ride, ads, and user engagement data to support marketplace analytics. Built data pipelines leveraging Delta Lake / Apache Iceberg on S3 to enable efficient data storage, schema evolution, and reliable downstream consumption for analytics and reporting. Implemented near real-time ingestion and processing using Apache Spark and Kafka/Kinesis, supporting low-latency dashboards and ML-driven features such as personalization and ad targeting. Built monitoring and alerting systems using Airflow, CloudWatch, and Datadog, improving pipeline reliability and achieving 99%+ data availability for business-critical metrics. Collaborated with data science and product teams to design data models and tracking schemas using Hive Metastore / AWS Glue Catalog, supporting experimentation frameworks (A/B testing) and campaign performa