JA-YUAN PENDLEY

I’m a data & cloud engineer with 11+ years of experience designing, modernizing, and scaling enterprise-grade data platforms across AWS, Azure, and GCP. I specialize in building high-performance pipelines using Kafka, Hive, Spark, PySpark, Python, Databricks, and Airflow, and I align architecture with business strategy to improve reliability and governance.

I lead cross-functional teams to deliver secure, compliant, and scalable data ecosystems, embedding ML inference, implementing data contracts, and driving observability and cost optimization while supporting regulatory requirements.


Work Experience

Lead Big Data Engineer (AWS) at Pfizer
August 1, 2023 - Present
Delivered an end-to-end clinical-trial data platform on AWS, leveraging Databricks on AWS, S3, Glue, and Redshift/Athena to enable advanced analytics, regulatory reporting, and near real-time data availability. Implemented Delta Lake with Bronze/Silver/Gold, streaming ingestion via MSK/Kinesis, governance with Lake Formation and Glue Data Catalog, and CI/CD with CodePipeline/CodeBuild. Established PySpark validation with Great Expectations, end-to-end observability with CloudWatch, and optimized compute for cost and performance. Ensured regulatory traceability and audit readiness with 21 CFR Part 11 compliant change management.
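The Great Expectations validation gate described above follows a common pattern: run a suite of expectations against a batch and promote it to the next Delta Lake layer only if every check passes. A minimal, dependency-free sketch of that idea (all function, column, and data names here are hypothetical, not the actual Great Expectations API):

```python
def expect_not_null(records, column):
    # Great-Expectations-style check: every record must carry a non-null value.
    failures = [r for r in records if r.get(column) is None]
    return {"success": not failures, "unexpected_count": len(failures)}

def expect_between(records, column, low, high):
    # Range check; nulls are the job of expect_not_null, so they are skipped here.
    failures = [r for r in records
                if r.get(column) is not None and not (low <= r[column] <= high)]
    return {"success": not failures, "unexpected_count": len(failures)}

def run_suite(records, suite):
    # A batch is promoted from Bronze to Silver only if every expectation passes.
    results = [check(records, **kwargs) for check, kwargs in suite]
    return all(r["success"] for r in results), results

batch = [
    {"subject_id": "S001", "age": 54},
    {"subject_id": None, "age": 61},
]
suite = [
    (expect_not_null, {"column": "subject_id"}),
    (expect_between, {"column": "age", "low": 0, "high": 120}),
]
ok, results = run_suite(batch, suite)
print(ok)  # False: one null subject_id, so the batch is quarantined
```

In the real platform the same suite would run inside a PySpark job, with failed batches held back for review to preserve 21 CFR Part 11 traceability.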
Senior AWS Data Engineer at Credit Suisse
September 1, 2021 - July 31, 2023
Delivered end-to-end market-risk and compliance data platform on AWS using Glue, S3, and Redshift; integrated Kafka streams, Hive governance, and Python automation to improve data freshness, consistency, and audit readiness. Optimized ETL orchestration and governance with Lake Formation, and implemented Lambda/Step Functions for data quality checks and exception handling. Deployed Terraform modules for reproducible infrastructure.
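The Lambda/Step Functions quality-and-exception pattern above boils down to a gate function: classify each record, and let the state machine branch on the returned status. A hedged sketch with made-up field names (trade_id, notional), not the production code:

```python
def dq_gate(event):
    # Lambda-style data-quality gate invoked from a Step Functions state:
    # records failing basic checks are routed to an exception-handling branch.
    valid, exceptions = [], []
    for rec in event.get("records", []):
        if rec.get("trade_id") and rec.get("notional", -1) >= 0:
            valid.append(rec)
        else:
            exceptions.append(rec)
    return {
        "status": "PASS" if not exceptions else "EXCEPTIONS",
        "valid_count": len(valid),
        "exception_count": len(exceptions),
        "exceptions": exceptions,
    }

result = dq_gate({"records": [
    {"trade_id": "T-1", "notional": 5_000_000},
    {"trade_id": "", "notional": 250_000},  # missing trade id -> exception
]})
print(result["status"])  # EXCEPTIONS
```

A Step Functions Choice state would then route `PASS` batches onward and `EXCEPTIONS` batches to remediation, which is what keeps the audit trail clean.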
Senior Data Engineer (GCP) at Home Depot
March 1, 2020 - August 31, 2021
Delivered a real-time retail data platform on Google Cloud Platform to process high-volume POS, e-commerce, and inventory events with low latency. Built a governed lakehouse on GCS + Databricks, centralized schema management, and automated Python-based data validation. Improved ingestion stability with Pub/Sub and Kafka, and orchestrated pipelines with Cloud Composer and Databricks Workflows; developed Looker/BigQuery reports and dashboards.
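Centralized schema management for Pub/Sub events means rejecting malformed messages before they reach the lake. A simplified illustration of that contract check (POS_SCHEMA and its fields are hypothetical; real schemas lived in a registry):

```python
# Hypothetical central schema for POS events.
POS_SCHEMA = {"store_id": str, "sku": str, "qty": int, "ts": str}

def validate_event(event, schema):
    # Reject events with missing fields or wrong types before ingestion.
    missing = [k for k in schema if k not in event]
    bad_type = [k for k, t in schema.items()
                if k in event and not isinstance(event[k], t)]
    return {"ok": not missing and not bad_type,
            "missing": missing, "bad_type": bad_type}

good = validate_event(
    {"store_id": "0042", "sku": "SKU-9", "qty": 3, "ts": "2021-05-01T10:00:00Z"},
    POS_SCHEMA,
)
bad = validate_event({"store_id": "0042", "qty": "3"}, POS_SCHEMA)
print(bad)  # {'ok': False, 'missing': ['sku', 'ts'], 'bad_type': ['qty']}
```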
Senior Cloud Data Engineer (AWS) at Adidas
November 1, 2018 - February 29, 2020
Delivered a scalable AWS-based retail data platform to process POS, e-commerce, and inventory data with high reliability. Implemented governed S3 lake zones, Glue Catalogs, and EMR Spark/Scala transformations; automated end-to-end ETL with Lambda, Step Functions, and CloudWatch; cut storage and query costs by roughly 22% through Parquet storage and partition pruning; delivered daily merchandising and supply-chain dashboards via Athena and QuickSight.
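The partition-pruning savings above come from the Hive-style directory layout: because partition values are encoded in the S3 path, engines like Athena and Spark skip whole directories that cannot match a filter. A toy sketch of the layout and the pruning it enables (bucket and column names are illustrative):

```python
from datetime import date

def partition_path(base, d, region):
    # Hive-style layout; Athena/Spark read partition values straight from the path.
    return f"{base}/year={d.year}/month={d.month:02d}/day={d.day:02d}/region={region}"

def prune(paths, year, month):
    # What the engine does for "WHERE year = ... AND month = ...":
    # only matching partition directories are ever scanned.
    key = f"/year={year}/month={month:02d}/"
    return [p for p in paths if key in p]

paths = [partition_path("s3://retail-lake/sales", date(2019, m, 1), "emea")
         for m in (1, 2, 3)]
print(prune(paths, 2019, 2))
# ['s3://retail-lake/sales/year=2019/month=02/day=01/region=emea']
```

Combined with columnar Parquet files, this is where the bulk of the cost reduction comes from: less data scanned per query.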
Senior Data Engineer (GCP) at Rivian Automotive
March 1, 2017 - October 31, 2018
Built a scalable GCP-based telemetry data platform to ingest, process, and analyze real-time vehicle telemetry, charging signals, and diagnostics. Implemented low-latency ETL/ELT pipelines with Dataflow and Dataproc (Spark), Pub/Sub-based streaming ingestion, and BigQuery data modeling; established governance with IAM and CMEK, built Looker visualizations, and tuned cost-optimized autoscaling across Dataflow and Dataproc.
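The streaming telemetry aggregation above is, at its core, windowed grouping: bucket events by vehicle and time window, then aggregate per bucket. A minimal tumbling-window sketch in plain Python (field names like vin and battery_temp_c are hypothetical; Dataflow expresses the same logic with windowing plus a grouped mean):

```python
from collections import defaultdict

def tumbling_avg(events, window_s=60):
    # Average battery temperature per vehicle per tumbling window.
    buckets = defaultdict(list)
    for e in events:
        # Integer division assigns each event to exactly one window.
        buckets[(e["vin"], e["ts"] // window_s)].append(e["battery_temp_c"])
    return {k: sum(v) / len(v) for k, v in buckets.items()}

events = [
    {"vin": "R1T-001", "ts": 5,  "battery_temp_c": 30.0},
    {"vin": "R1T-001", "ts": 50, "battery_temp_c": 34.0},
    {"vin": "R1T-001", "ts": 65, "battery_temp_c": 31.0},
]
print(tumbling_avg(events))
# {('R1T-001', 0): 32.0, ('R1T-001', 1): 31.0}
```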
Hadoop Data Engineer at Viddy
January 1, 2015 - February 28, 2017
Built a large-scale Hadoop analytics platform to process ad performance, content insights, and user engagement data. Developed Hive, Pig, and MapReduce pipelines; orchestrated with Airflow/Oozie; strengthened data quality and metadata governance; enabled self-service reporting for BI and product teams.
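The Hive/Pig/MapReduce pipelines above all share the same computational shape: a map phase emits key-value pairs, a shuffle sorts them by key, and a reduce phase aggregates each group. A toy illustration of that shape (record fields are invented for the example):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # Mapper: emit one (content_id, views) pair per engagement record.
    for r in records:
        yield (r["content_id"], r["views"])

def reduce_phase(pairs):
    # Shuffle/sort by key, then reduce: sum views per content_id,
    # mirroring what the MapReduce framework does between phases.
    pairs = sorted(pairs, key=itemgetter(0))
    return {k: sum(v for _, v in grp)
            for k, grp in groupby(pairs, key=itemgetter(0))}

records = [
    {"content_id": "vid-a", "views": 3},
    {"content_id": "vid-b", "views": 1},
    {"content_id": "vid-a", "views": 2},
]
print(reduce_phase(map_phase(records)))  # {'vid-a': 5, 'vid-b': 1}
```

A Hive `GROUP BY` or Pig `FOREACH ... GENERATE` compiles down to exactly this pattern, distributed across the cluster.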
ETL Developer at The Cigna Group
January 1, 2014 - December 31, 2014
Built enterprise ETL workflows to migrate, validate, and process large volumes of claims and policy data for regulatory and operational reporting. Developed PL/SQL-driven transformations, automated data quality controls, and delivered standardized reporting dashboards; optimized SQL workflows and provided nightly production support.
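A core automated quality control in claims migration is reconciliation: confirm every source claim landed in the target, and nothing appeared that was never in the source. A simplified sketch of that post-load check (key and row names are illustrative, not Cigna's actual schema):

```python
def reconcile(source_rows, target_rows, key="claim_id"):
    # Post-load control: every source claim must land in the target, and
    # nothing may appear in the target that was never in the source.
    src = {r[key] for r in source_rows}
    tgt = {r[key] for r in target_rows}
    return {"missing_in_target": sorted(src - tgt),
            "unexpected_in_target": sorted(tgt - src),
            "match": src == tgt}

report = reconcile(
    [{"claim_id": "C1"}, {"claim_id": "C2"}],
    [{"claim_id": "C1"}, {"claim_id": "C3"}],
)
print(report)
# {'missing_in_target': ['C2'], 'unexpected_in_target': ['C3'], 'match': False}
```

In the nightly production runs this kind of check, written as PL/SQL set operations, gated whether a load was certified for regulatory reporting.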

Education

Bachelor of Technology in Software Development at New York Institute of Technology
January 11, 2030 - January 13, 2026
Master of Science in Data Science at New York Institute of Technology
January 11, 2030 - January 13, 2026

Qualifications

Microsoft Certified: Azure Data Engineer Associate
January 11, 2030 - January 13, 2026
AWS Certified Data Analytics – Specialty
January 11, 2030 - January 13, 2026
Google Cloud Professional Data Engineer
January 11, 2030 - January 13, 2026

Industry Experience

Healthcare, Financial Services, Retail, Transportation & Logistics, Manufacturing, Media & Entertainment, Software & Internet, Professional Services