
Supriya Bodapati


Available to hire

I am Supriya Bodapati, a data engineer with 5+ years of experience building cloud-based data pipelines and analytics platforms using Spark, SQL, and cloud-native services across AWS, Azure, and GCP. I enable reliable reporting and data-driven decision-making for enterprise stakeholders by delivering scalable, analytics-ready datasets and governance-friendly pipelines, with a focus on reducing latency and improving data quality.

I collaborate closely with analytics, BI, product, and business teams to translate requirements into governed datasets and metrics, implement DataOps practices, and optimize performance across batch and near real-time pipelines. My goal is to enable faster insights and consistent KPI reporting across multiple business domains while maintaining strong security and governance standards.


Experience Level

Expert

Work Experience

Senior Data Platform Engineer at PayPal
June 1, 2024 - Present
- Architected scalable batch and near real-time data pipelines using Azure Databricks, PySpark, and Delta Lake, delivering analytics-ready datasets that supported fraud, risk, and operational reporting across multiple product domains and high-volume transaction flows.
- Designed standardized ELT ingestion frameworks with Azure Data Factory and ADLS Gen2, consolidating data from transactional systems and event-driven sources while reducing downstream reconciliation effort and manual data corrections for analytics teams by ~30%.
- Implemented a Lakehouse architecture (Bronze-Silver-Gold) on Databricks to separate raw, refined, and curated datasets, strengthening data quality controls and accelerating SQL-based analytics consumption for BI and reporting users by ~40%.
- Developed Spark Structured Streaming pipelines to process high-volume event data, supporting near real-time monitoring use cases and reducing end-to-end data availability latency from several minutes to under 60 seconds across cri
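The Bronze-Silver-Gold (medallion) layering described above can be sketched in plain Python — an illustrative model of the idea only, not the actual Databricks/Delta Lake implementation; the record fields and function names here are hypothetical.

```python
from collections import defaultdict

# Bronze: raw events land as-is, including duplicates and malformed rows.
bronze = [
    {"txn_id": "t1", "amount": "120.50", "country": "US"},
    {"txn_id": "t1", "amount": "120.50", "country": "US"},  # duplicate
    {"txn_id": "t2", "amount": "bad",    "country": "DE"},  # malformed
    {"txn_id": "t3", "amount": "75.00",  "country": "US"},
]

def to_silver(records):
    """Silver: deduplicate on txn_id and drop rows that fail type checks."""
    seen, silver = set(), []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine malformed rows
        if r["txn_id"] in seen:
            continue
        seen.add(r["txn_id"])
        silver.append({**r, "amount": amount})
    return silver

def to_gold(records):
    """Gold: a curated aggregate ready for BI - total amount per country."""
    totals = defaultdict(float)
    for r in records:
        totals[r["country"]] += r["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'US': 195.5}
```

Each layer only reads from the one before it, which is what gives the pattern its quality-control value: bad rows are filtered once, at the Bronze-to-Silver boundary, instead of in every downstream report.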
Cloud Data Engineer at Microsoft
November 1, 2023 - June 1, 2024
- Built cloud-native batch data pipelines using Azure Data Factory, Databricks, and ADLS Gen2, migrating and processing 40+ TB of enterprise data to support analytics, finance, and operational reporting across multiple internal business units.
- Implemented Delta Lake-based data models on Azure Databricks, improving Power BI query performance by ~55% through optimized storage layouts, caching strategies, query pruning, and efficient data access patterns.
- Designed Spark-based transformation pipelines to convert raw operational data into curated analytical datasets, enabling consistent reporting and downstream consumption by finance, operations, and enterprise analytics teams.
- Automated workflow orchestration and dependency handling using Apache Airflow, reducing manual intervention and lowering recurring pipeline failure rates by ~30% across scheduled batch and incremental data workloads.
- Applied advanced SQL tuning techniques, including partitioning and clustering strategies, to ensure pre
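Incremental batch workloads like those mentioned above commonly use a high-watermark pattern: each run picks up only rows modified since the last successful run. A minimal pure-Python sketch, with a hypothetical source table and timestamp column:

```python
from datetime import datetime, timezone

# Hypothetical source rows carrying a last-modified timestamp column.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]

def incremental_load(rows, watermark):
    """Return rows newer than the watermark, plus the advanced watermark."""
    batch = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in batch), default=watermark)
    return batch, new_watermark

# First run: everything after Jan 1 is picked up.
wm = datetime(2024, 1, 1, tzinfo=timezone.utc)
batch, wm = incremental_load(source, wm)
print([r["id"] for r in batch])  # [2, 3]

# Second run with no new data: empty batch, watermark unchanged.
batch, wm = incremental_load(source, wm)
print([r["id"] for r in batch])  # []
```

In an orchestrated setup the watermark would be persisted (e.g. in a control table) and only advanced after the run succeeds, which is what keeps reruns idempotent.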
Data Engineer at Amazon
August 1, 2020 - November 1, 2022
- Engineered scalable ETL pipelines using AWS Glue, Amazon S3, and Redshift, processing 10+ TB/day of structured and semi-structured data to support operational analytics, enterprise reporting, and recurring business performance analysis.
- Designed partitioned data lake structures on Amazon S3 using optimized file formats and layout strategies, reducing query scan costs by ~60% for Athena-based analytics and frequent ad-hoc business queries.
- Built near real-time ingestion pipelines with AWS Lambda and Kinesis, improving dashboard refresh latency by ~50% and enabling faster insight delivery for internal analytics teams and downstream business stakeholders.
- Developed Spark-based batch processing workflows on AWS EMR, achieving a sustained ~90% production job success rate across large-scale data transformation, enrichment, and aggregation workloads.
- Applied SQL optimization and data modeling best practices to improve Redshift query performance, supporting complex aggregations,
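Partitioned S3 layouts like the one described above typically follow Hive-style key=value directory conventions, which is what lets Athena prune partitions and scan only matching prefixes. A small sketch of the path scheme (bucket, table, and column names are hypothetical):

```python
def partition_path(bucket, table, record, partition_keys):
    """Build a Hive-style partitioned object prefix, e.g.
    s3://bucket/table/dt=2024-01-01/region=us/"""
    parts = "/".join(f"{k}={record[k]}" for k in partition_keys)
    return f"s3://{bucket}/{table}/{parts}/"

rec = {"dt": "2024-01-01", "region": "us", "order_id": 42}
print(partition_path("analytics-lake", "orders", rec, ["dt", "region"]))
# s3://analytics-lake/orders/dt=2024-01-01/region=us/
```

A query filtering on `dt` then only reads objects under the matching `dt=` prefixes instead of the whole table, which is where the scan-cost reduction comes from.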
Associate Data Engineer at Wipro
May 1, 2019 - August 1, 2020
- Developed ETL pipelines using Python, SQL, and Apache Spark to process high-volume datasets, supporting enterprise reporting, analytics, and recurring business intelligence use cases across multiple client-facing applications and internal systems.
- Assisted in building Kafka-based ingestion pipelines to capture streaming data from upstream systems, enabling faster data availability and improved data freshness for downstream analytics and reporting teams.
- Designed and optimized relational database schemas, improving SQL query execution time by ~40% for frequently accessed reporting and analytics tables used by business users and operational teams.
- Implemented data validation and consistency checks during ingestion to improve data accuracy, proactively identifying anomalies and reducing downstream reconciliation effort, rework cycles, and reporting discrepancies.
- Supported senior engineers in deploying cloud-based data workflows using AWS S3 and AWS Glue, ensuring scalable, reliable, and
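Ingestion-time validation of the kind described above can be sketched as a set of row-level rules — an illustrative pure-Python version; a production pipeline would run equivalent checks in Spark or a validation framework, and the field names and rules here are hypothetical.

```python
def validate(row):
    """Return a list of rule violations for one ingested row."""
    errors = []
    if row.get("customer_id") is None:
        errors.append("customer_id is null")
    amt = row.get("amount")
    if not isinstance(amt, (int, float)) or amt < 0:
        errors.append("amount must be a non-negative number")
    if row.get("currency") not in {"USD", "EUR", "INR"}:
        errors.append("unknown currency code")
    return errors

rows = [
    {"customer_id": "c1", "amount": 10.0, "currency": "USD"},
    {"customer_id": None, "amount": -5, "currency": "XXX"},
]
valid = [r for r in rows if not validate(r)]
rejected = [(r, validate(r)) for r in rows if validate(r)]
print(len(valid), len(rejected))  # 1 1
```

Rejected rows and their violation lists are kept rather than silently dropped, so anomalies surface to the team instead of turning into downstream reconciliation work.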

Education

Master's in Computer Science at Northwest Missouri State University
Bachelor's in Computer Science at Jawaharlal Nehru Technological University

Qualifications

IBM Data Engineering Professional Certificate
Google Cloud Data Engineering, Big Data, and Machine Learning Fundamentals
Data Warehousing for Business Intelligence
Building Batch Data Pipelines on Google Cloud
Learning Apache Spark
Data Engineering Foundations

Industry Experience

Software & Internet, Professional Services, Financial Services, Other