Led ROI-focused cloud cost optimization for Global Supply Chain and MES teams. Rightsized EC2 and RDS Oracle instances using Compute Optimizer, TCS, and DBCSI, reducing monthly costs by over 60%. Automated stopping and starting EC2 and RDS instances around shift schedules using Python, Lambda, and EventBridge. Secured over $1M in RDS cost savings through 3-year partial-upfront Reserved Instances. Used PySpark to compute spend per site and to group assets via regex-based naming conventions.

Senior Data Engineer at Ontada/McKesson

October 1, 2023 - December 1, 2023

Built, tested, and orchestrated PySpark ETL pipelines using Databricks Workflows. Implemented data cleaning at pipeline start, logic checks post-transform, and QA at end. Invoked ML libraries inside ETL to classify sentiment of medical record attachments. Used GitHub Actions for CI/CD unit tests.

Senior Data Engineer at Signify Health

February 1, 2023 - June 1, 2023

Implemented data freshness checks before ETL execution; compared star vs snowflake schema for Redshift; archived older tables with time-based naming; troubleshot Tableau production using Performance Recorder to validate scenarios.

Senior Data Engineer at Zillow Group

December 1, 2022 - June 1, 2023

Maintained production ETL pipelines with Airflow; resolved Workday API data-source configuration issues; troubleshot Tableau production with Performance Recorder to validate scenarios.

Data Engineer at UnitedHealth Group/Optum

March 1, 2020 - May 1, 2022

Prototyped Airflow-based ETL orchestration; built an Avro-encoded Kafka producer/consumer for ML data integration with the field-support system; developed a Python data-cleaning tool to detect and repair data anomalies.

Data Engineer at AT&T/DirecTV

August 1, 2019 - February 1, 2020

Built Cloudera Hadoop clusters with YARN and PySpark on Docker and VMware; evaluated AWS, GCP, and Azure against Snowflake for data warehousing; AWS Redshift was selected for its ancillary functionality and future opportunities.

Data Engineer at Episource

August 1, 2018 - May 1, 2019

Utilized PySpark RDDs with Lambda-based processing; ran PySpark jobs on AWS EMR with logs stored in S3.

Junior Data Engineer at Hart

December 1, 2015 - April 1, 2017

Automated software testing with a CI/CD pipeline; ported a 12-page SQL Server stored procedure to an Apache stack on AWS, reducing runtime from one week to roughly two hours.