AWS-focused Data Engineer with 5+ years of experience building scalable, real-time data pipelines and analytics platforms across SaaS and fintech. Skilled in PySpark, SQL, and event-driven architectures on AWS (Kinesis, Glue, EMR, Redshift). Proven track record of improving data reliability, cutting compute costs, and delivering production-grade data systems that support faster business insights.

Murtaza Ziya

Available to hire

Experience Level

Expert

Work Experience

Analytics Engineer at AUZMOR
August 1, 2024 - November 6, 2025
Designed and deployed ETL pipelines using AWS Glue, Lambda, and S3 to transform raw transactional data into structured fact tables; improved data refresh latency from 2 hours to under 20 minutes. Built and optimized a Redshift data warehouse integrating multiple data sources via Kinesis Firehose; reduced ad-hoc query times by 40% and supported near-real-time analytics for internal dashboards. Automated data quality checks with CloudWatch and SNS alerts that flag anomalies and schema drift across ingestion layers; cut manual QA effort by 60% and improved the reliability of daily reports. Implemented cost-optimization strategies across Redshift and S3 using compression, partition pruning, and lifecycle policies, lowering storage and compute spend by roughly $75K annually.
AWS Data Engineer at INTUIT
July 31, 2024 - July 31, 2024
Engineered a real-time event streaming platform using AWS Kinesis, Lambda, and Firehose, capturing telemetry from QuickBooks services; scaled ingestion to 25M+ daily events with under 1-second lag. Refactored legacy ETL into PySpark jobs on AWS EMR, optimizing data joins and partitioning; reduced transformation runtime by 72% and saved ~$240K in annual compute cost. Implemented Lake Formation governance and access policies across S3 buckets, ensuring GDPR compliance and eliminating cross-team data duplication. Automated Athena query optimization and schema evolution via Glue Crawlers and Step Functions, cutting analyst query errors by 80% and enabling near-real-time refresh in dashboards. Partnered with data scientists to productionize SageMaker pipelines for credit-risk and churn prediction models, improving feature freshness and model accuracy by 9%.
Associate Data Engineer at VIVMA Software Inc.
August 1, 2022 - August 1, 2022
Developed serverless ingestion pipelines using AWS Lambda, API Gateway, and DynamoDB Streams to capture subscription, billing, and usage events from SaaS clients; scaled to 8M requests/day with zero downtime. Built daily PySpark aggregation jobs on EMR to compute ARR, churn, and engagement metrics for product teams; runtime dropped from 2 hours to 25 minutes after partition tuning and I/O optimization. Implemented Kinesis Data Analytics for real-time trend detection in user activity, enabling proactive retention campaigns that cut voluntary churn by 11%. Set up automated CI/CD with CodeBuild and CodePipeline for ETL deployments, ensuring consistent schema versioning and rollback safety across staging and production. Orchestrated Airflow-based workflows to sync Redshift marts with downstream BI and ML pipelines, maintaining 99.9% data-delivery SLAs across regional tenants.

Education

Master of Science in Data Analytics and Visualization at Stevens Institute of Technology
January 11, 2030 - May 1, 2024

Industry Experience

Software & Internet, Professional Services, Financial Services