JA-YUAN PENDLEY

I’m a data & cloud engineer with 11+ years of experience designing, modernizing, and scaling enterprise-grade data platforms across AWS, Azure, and GCP. I specialize in building high-performance pipelines using Kafka, Hive, Spark, PySpark, Python, Databricks, and Airflow, and I align architecture with business strategy to improve reliability and governance.

I lead cross-functional teams to deliver secure, compliant, and scalable data ecosystems, embedding ML inference, implementing data contracts, and driving observability and cost optimization while supporting regulatory requirements.


Work Experience

Lead Big Data Engineer (AWS) at Pfizer
August 1, 2023 - Present
Delivered an end-to-end clinical-trial data platform on AWS, leveraging Databricks on AWS, S3, Glue, and Redshift/Athena to enable advanced analytics, regulatory reporting, and near real-time data availability. Implemented Delta Lake with Bronze/Silver/Gold, streaming ingestion via MSK/Kinesis, governance with Lake Formation and Glue Data Catalog, and CI/CD with CodePipeline/CodeBuild. Established PySpark validation with Great Expectations, end-to-end observability with CloudWatch, and optimized compute for cost and performance. Ensured regulatory traceability and audit readiness with 21 CFR Part 11 compliant change management.
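The Great Expectations validation gate described above follows a common pattern: run a suite of expectations against a batch and promote it to the next Delta Lake layer only if every check passes. A minimal, dependency-free sketch of that idea (all function, column, and data names here are hypothetical, not the actual Great Expectations API):

```python
def expect_not_null(records, column):
    # Great-Expectations-style check: every record must carry a non-null value.
    failures = [r for r in records if r.get(column) is None]
    return {"success": not failures, "unexpected_count": len(failures)}

def expect_between(records, column, low, high):
    # Range check; nulls are the job of expect_not_null, so they are skipped here.
    failures = [r for r in records
                if r.get(column) is not None and not (low <= r[column] <= high)]
    return {"success": not failures, "unexpected_count": len(failures)}

def run_suite(records, suite):
    # A batch is promoted from Bronze to Silver only if every expectation passes.
    results = [check(records, **kwargs) for check, kwargs in suite]
    return all(r["success"] for r in results), results

batch = [
    {"subject_id": "S001", "age": 54},
    {"subject_id": None, "age": 61},
]
suite = [
    (expect_not_null, {"column": "subject_id"}),
    (expect_between, {"column": "age", "low": 0, "high": 120}),
]
ok, results = run_suite(batch, suite)
print(ok)  # False: one null subject_id, so the batch is quarantined
```

In the real platform the same suite would run inside a PySpark job, with failed batches held back for review to preserve 21 CFR Part 11 traceability.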
Senior AWS Data Engineer at Credit Suisse
September 1, 2021 - July 31, 2023
Delivered end-to-end market-risk and compliance data platform on AWS using Glue, S3, and Redshift; integrated Kafka streams, Hive governance, and Python automation to improve data freshness, consistency, and audit readiness. Optimized ETL orchestration and governance with Lake Formation, and implemented Lambda/Step Functions for data quality checks and exception handling. Deployed Terraform modules for reproducible infrastructure.
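The Lambda/Step Functions quality-and-exception pattern above boils down to a gate function: classify each record, and let the state machine branch on the returned status. A hedged sketch with made-up field names (trade_id, notional), not the production code:

```python
def dq_gate(event):
    # Lambda-style data-quality gate invoked from a Step Functions state:
    # records failing basic checks are routed to an exception-handling branch.
    valid, exceptions = [], []
    for rec in event.get("records", []):
        if rec.get("trade_id") and rec.get("notional", -1) >= 0:
            valid.append(rec)
        else:
            exceptions.append(rec)
    return {
        "status": "PASS" if not exceptions else "EXCEPTIONS",
        "valid_count": len(valid),
        "exception_count": len(exceptions),
        "exceptions": exceptions,
    }

result = dq_gate({"records": [
    {"trade_id": "T-1", "notional": 5_000_000},
    {"trade_id": "", "notional": 250_000},  # missing trade id -> exception
]})
print(result["status"])  # EXCEPTIONS
```

A Step Functions Choice state would then route `PASS` batches onward and `EXCEPTIONS` batches to remediation, which is what keeps the audit trail clean.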
Senior Data Engineer (GCP) at Home Depot
March 1, 2020 - August 31, 2021
Delivered a real-time retail data platform on Google Cloud Platform to process high-volume POS, e-commerce, and inventory events with low latency. Built a governed lakehouse on GCS + Databricks, centralized schema management, and automated Python-based data validation. Improved ingestion stability with Pub/Sub and Kafka, and orchestrated pipelines with Cloud Composer and Databricks Workflows; developed Looker/BigQuery reports and dashboards.
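Centralized schema management for Pub/Sub events means rejecting malformed messages before they reach the lake. A simplified illustration of that contract check (POS_SCHEMA and its fields are hypothetical; real schemas lived in a registry):

```python
# Hypothetical central schema for POS events.
POS_SCHEMA = {"store_id": str, "sku": str, "qty": int, "ts": str}

def validate_event(event, schema):
    # Reject events with missing fields or wrong types before ingestion.
    missing = [k for k in schema if k not in event]
    bad_type = [k for k, t in schema.items()
                if k in event and not isinstance(event[k], t)]
    return {"ok": not missing and not bad_type,
            "missing": missing, "bad_type": bad_type}

good = validate_event(
    {"store_id": "0042", "sku": "SKU-9", "qty": 3, "ts": "2021-05-01T10:00:00Z"},
    POS_SCHEMA,
)
bad = validate_event({"store_id": "0042", "qty": "3"}, POS_SCHEMA)
print(bad)  # {'ok': False, 'missing': ['sku', 'ts'], 'bad_type': ['qty']}
```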
Senior Cloud Data Engineer (AWS) at Adidas
November 1, 2018 - February 29, 2020
Delivered a scalable AWS-based retail data platform to process POS, e-commerce, and inventory data with high reliability. Implemented governed S3 lake zones, Glue Catalogs, and EMR Spark/Scala transformations; automated end-to-end ETL with Lambda, Step Functions, and CloudWatch; cut storage and query costs by roughly 22% through Parquet storage and partition pruning; delivered daily merchandising and supply-chain dashboards via Athena and QuickSight.
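The partition-pruning savings above come from the Hive-style directory layout: because partition values are encoded in the S3 path, engines like Athena and Spark skip whole directories that cannot match a filter. A toy sketch of the layout and the pruning it enables (bucket and column names are illustrative):

```python
from datetime import date

def partition_path(base, d, region):
    # Hive-style layout; Athena/Spark read partition values straight from the path.
    return f"{base}/year={d.year}/month={d.month:02d}/day={d.day:02d}/region={region}"

def prune(paths, year, month):
    # What the engine does for "WHERE year = ... AND month = ...":
    # only matching partition directories are ever scanned.
    key = f"/year={year}/month={month:02d}/"
    return [p for p in paths if key in p]

paths = [partition_path("s3://retail-lake/sales", date(2019, m, 1), "emea")
         for m in (1, 2, 3)]
print(prune(paths, 2019, 2))
# ['s3://retail-lake/sales/year=2019/month=02/day=01/region=emea']
```

Combined with columnar Parquet files, this is where the bulk of the cost reduction comes from: less data scanned per query.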
Senior Data Engineer (GCP) at Rivian Automotive
March 1, 2017 - October 31, 2018
Built a scalable GCP-based telemetry data platform to ingest, process, and analyze real-time vehicle telemetry, charging signals, and diagnostics. Implemented low-latency ETL/ELT pipelines with Dataflow and Dataproc (Spark), Pub/Sub-based streaming ingestion, and BigQuery data modeling; established governance with IAM and CMEK, built Looker visualizations, and tuned cost-optimized autoscaling across Dataflow and Dataproc.
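The streaming telemetry aggregation above is, at its core, windowed grouping: bucket events by vehicle and time window, then aggregate per bucket. A minimal tumbling-window sketch in plain Python (field names like vin and battery_temp_c are hypothetical; Dataflow expresses the same logic with windowing plus a grouped mean):

```python
from collections import defaultdict

def tumbling_avg(events, window_s=60):
    # Average battery temperature per vehicle per tumbling window.
    buckets = defaultdict(list)
    for e in events:
        # Integer division assigns each event to exactly one window.
        buckets[(e["vin"], e["ts"] // window_s)].append(e["battery_temp_c"])
    return {k: sum(v) / len(v) for k, v in buckets.items()}

events = [
    {"vin": "R1T-001", "ts": 5,  "battery_temp_c": 30.0},
    {"vin": "R1T-001", "ts": 50, "battery_temp_c": 34.0},
    {"vin": "R1T-001", "ts": 65, "battery_temp_c": 31.0},
]
print(tumbling_avg(events))
# {('R1T-001', 0): 32.0, ('R1T-001', 1): 31.0}
```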
Hadoop Data Engineer at Viddy
January 1, 2015 - February 28, 2017
Built a large-scale Hadoop analytics platform to process ad performance, content insights, and user engagement data. Developed Hive, Pig, and MapReduce pipelines; orchestrated with Airflow/Oozie; strengthened data quality and metadata governance; enabled self-service reporting for BI and product teams.
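The Hive/Pig/MapReduce pipelines above all share the same computational shape: a map phase emits key-value pairs, a shuffle sorts them by key, and a reduce phase aggregates each group. A toy illustration of that shape (record fields are invented for the example):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # Mapper: emit one (content_id, views) pair per engagement record.
    for r in records:
        yield (r["content_id"], r["views"])

def reduce_phase(pairs):
    # Shuffle/sort by key, then reduce: sum views per content_id,
    # mirroring what the MapReduce framework does between phases.
    pairs = sorted(pairs, key=itemgetter(0))
    return {k: sum(v for _, v in grp)
            for k, grp in groupby(pairs, key=itemgetter(0))}

records = [
    {"content_id": "vid-a", "views": 3},
    {"content_id": "vid-b", "views": 1},
    {"content_id": "vid-a", "views": 2},
]
print(reduce_phase(map_phase(records)))  # {'vid-a': 5, 'vid-b': 1}
```

A Hive `GROUP BY` or Pig `FOREACH ... GENERATE` compiles down to exactly this pattern, distributed across the cluster.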
ETL Developer at The Cigna Group
January 1, 2014 - December 31, 2014
Built enterprise ETL workflows to migrate, validate, and process large volumes of claims and policy data for regulatory and operational reporting. Developed PL/SQL-driven transformations, automated data quality controls, and delivered standardized reporting dashboards; optimized SQL workflows and provided nightly production support.
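A core automated quality control in claims migration is reconciliation: confirm every source claim landed in the target, and nothing appeared that was never in the source. A simplified sketch of that post-load check (key and row names are illustrative, not Cigna's actual schema):

```python
def reconcile(source_rows, target_rows, key="claim_id"):
    # Post-load control: every source claim must land in the target, and
    # nothing may appear in the target that was never in the source.
    src = {r[key] for r in source_rows}
    tgt = {r[key] for r in target_rows}
    return {"missing_in_target": sorted(src - tgt),
            "unexpected_in_target": sorted(tgt - src),
            "match": src == tgt}

report = reconcile(
    [{"claim_id": "C1"}, {"claim_id": "C2"}],
    [{"claim_id": "C1"}, {"claim_id": "C3"}],
)
print(report)
# {'missing_in_target': ['C2'], 'unexpected_in_target': ['C3'], 'match': False}
```

In the nightly production runs this kind of check, written as PL/SQL set operations, gated whether a load was certified for regulatory reporting.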

Education

Bachelor of Technology in Software Development at New York Institute of Technology
January 11, 2030 - January 13, 2026
Master of Science in Data Science at New York Institute of Technology
January 11, 2030 - January 13, 2026

Qualifications

Microsoft Certified: Azure Data Engineer Associate
January 11, 2030 - January 13, 2026
AWS Certified Data Analytics – Specialty
January 11, 2030 - January 13, 2026
Google Cloud Professional Data Engineer
January 11, 2030 - January 13, 2026

Industry Experience

Healthcare, Financial Services, Retail, Transportation & Logistics, Manufacturing, Media & Entertainment, Software & Internet, Professional Services