
Supriya Bodapati


Available to hire

I am Supriya Bodapati, a data engineer with 5+ years of experience building cloud-based data pipelines and analytics platforms using Spark, SQL, and cloud-native services across AWS, Azure, and GCP. I enable reliable reporting and data-driven decision-making for enterprise stakeholders by delivering scalable, analytics-ready datasets and governance-friendly pipelines, with a focus on reducing latency and improving data quality.

I collaborate closely with analytics, BI, product, and business teams to translate requirements into governed datasets and metrics, implement DataOps practices, and optimize performance across batch and near real-time pipelines. My goal is to enable faster insights and consistent KPI reporting across multiple business domains while maintaining strong security and governance standards.


Experience Level

Expert

Work Experience

Senior Data Platform Engineer at PayPal
June 1, 2024 - Present
- Architected scalable batch and near real-time data pipelines using Azure Databricks, PySpark, and Delta Lake, delivering analytics-ready datasets that supported fraud, risk, and operational reporting across multiple product domains and high-volume transaction flows.
- Designed standardized ELT ingestion frameworks with Azure Data Factory and ADLS Gen2, consolidating data from transactional systems and event-driven sources while reducing downstream reconciliation effort and manual data corrections for analytics teams by ~30%.
- Implemented a Lakehouse architecture (Bronze-Silver-Gold) on Databricks to separate raw, refined, and curated datasets, strengthening data quality controls and accelerating SQL-based analytics consumption for BI and reporting users by ~40%.
- Developed Spark Structured Streaming pipelines to process high-volume event data, supporting near real-time monitoring use cases and reducing end-to-end data availability latency from several minutes to under 60 seconds across cri
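The Bronze-Silver-Gold (medallion) layering described above can be sketched in plain Python — an illustrative model of the idea only, not the actual Databricks/Delta Lake implementation; the record fields and function names here are hypothetical.

```python
from collections import defaultdict

# Bronze: raw events land as-is, including duplicates and malformed rows.
bronze = [
    {"txn_id": "t1", "amount": "120.50", "country": "US"},
    {"txn_id": "t1", "amount": "120.50", "country": "US"},  # duplicate
    {"txn_id": "t2", "amount": "bad",    "country": "DE"},  # malformed
    {"txn_id": "t3", "amount": "75.00",  "country": "US"},
]

def to_silver(records):
    """Silver: deduplicate on txn_id and drop rows that fail type checks."""
    seen, silver = set(), []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine malformed rows
        if r["txn_id"] in seen:
            continue
        seen.add(r["txn_id"])
        silver.append({**r, "amount": amount})
    return silver

def to_gold(records):
    """Gold: a curated aggregate ready for BI - total amount per country."""
    totals = defaultdict(float)
    for r in records:
        totals[r["country"]] += r["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'US': 195.5}
```

Each layer only reads from the one before it, which is what gives the pattern its quality-control value: bad rows are filtered once, at the Bronze-to-Silver boundary, instead of in every downstream report.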
Cloud Data Engineer at Microsoft
November 1, 2023 - June 1, 2024
- Built cloud-native batch data pipelines using Azure Data Factory, Databricks, and ADLS Gen2, migrating and processing 40+ TB of enterprise data to support analytics, finance, and operational reporting across multiple internal business units.
- Implemented Delta Lake-based data models on Azure Databricks, improving Power BI query performance by ~55% through optimized storage layouts, caching strategies, query pruning, and efficient data access patterns.
- Designed Spark-based transformation pipelines to convert raw operational data into curated analytical datasets, enabling consistent reporting and downstream consumption by finance, operations, and enterprise analytics teams.
- Automated workflow orchestration and dependency handling using Apache Airflow, reducing manual intervention and lowering recurring pipeline failure rates by ~30% across scheduled batch and incremental data workloads.
- Applied advanced SQL tuning techniques, including partitioning and clustering strategies, to ensure pre
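Incremental batch workloads like those mentioned above commonly use a high-watermark pattern: each run picks up only rows modified since the last successful run. A minimal pure-Python sketch, with a hypothetical source table and timestamp column:

```python
from datetime import datetime, timezone

# Hypothetical source rows carrying a last-modified timestamp column.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
]

def incremental_load(rows, watermark):
    """Return rows newer than the watermark, plus the advanced watermark."""
    batch = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in batch), default=watermark)
    return batch, new_watermark

# First run: everything after Jan 1 is picked up.
wm = datetime(2024, 1, 1, tzinfo=timezone.utc)
batch, wm = incremental_load(source, wm)
print([r["id"] for r in batch])  # [2, 3]

# Second run with no new data: empty batch, watermark unchanged.
batch, wm = incremental_load(source, wm)
print([r["id"] for r in batch])  # []
```

In an orchestrated setup the watermark would be persisted (e.g. in a control table) and only advanced after the run succeeds, which is what keeps reruns idempotent.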
Data Engineer at Amazon
August 1, 2020 - November 1, 2022
- Engineered scalable ETL pipelines using AWS Glue, Amazon S3, and Redshift, processing 10+ TB/day of structured and semi-structured data to support operational analytics, enterprise reporting, and recurring business performance analysis.
- Designed partitioned data lake structures on Amazon S3 using optimized file formats and layout strategies, reducing query scan costs by ~60% for Athena-based analytics and frequent ad-hoc business queries.
- Built near real-time ingestion pipelines with AWS Lambda and Kinesis, improving dashboard refresh latency by ~50% and enabling faster insight delivery for internal analytics teams and downstream business stakeholders.
- Developed Spark-based batch processing workflows on AWS EMR, achieving a sustained ~90% production job success rate across large-scale data transformation, enrichment, and aggregation workloads.
- Applied SQL optimization and data modeling best practices to improve Redshift query performance, supporting complex aggregations,
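Partitioned S3 layouts like the one described above typically follow Hive-style key=value directory conventions, which is what lets Athena prune partitions and scan only matching prefixes. A small sketch of the path scheme (bucket, table, and column names are hypothetical):

```python
def partition_path(bucket, table, record, partition_keys):
    """Build a Hive-style partitioned object prefix, e.g.
    s3://bucket/table/dt=2024-01-01/region=us/"""
    parts = "/".join(f"{k}={record[k]}" for k in partition_keys)
    return f"s3://{bucket}/{table}/{parts}/"

rec = {"dt": "2024-01-01", "region": "us", "order_id": 42}
print(partition_path("analytics-lake", "orders", rec, ["dt", "region"]))
# s3://analytics-lake/orders/dt=2024-01-01/region=us/
```

A query filtering on `dt` then only reads objects under the matching `dt=` prefixes instead of the whole table, which is where the scan-cost reduction comes from.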
Associate Data Engineer at Wipro
May 1, 2019 - August 1, 2020
- Developed ETL pipelines using Python, SQL, and Apache Spark to process high-volume datasets, supporting enterprise reporting, analytics, and recurring business intelligence use cases across multiple client-facing applications and internal systems.
- Assisted in building Kafka-based ingestion pipelines to capture streaming data from upstream systems, enabling faster data availability and improved data freshness for downstream analytics and reporting teams.
- Designed and optimized relational database schemas, improving SQL query execution time by ~40% for frequently accessed reporting and analytics tables used by business users and operational teams.
- Implemented data validation and consistency checks during ingestion to improve data accuracy, proactively identifying anomalies and reducing downstream reconciliation effort, rework cycles, and reporting discrepancies.
- Supported senior engineers in deploying cloud-based data workflows using AWS S3 and AWS Glue, ensuring scalable, reliable, and
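Ingestion-time validation of the kind described above can be sketched as a set of row-level rules — an illustrative pure-Python version; a production pipeline would run equivalent checks in Spark or a validation framework, and the field names and rules here are hypothetical.

```python
def validate(row):
    """Return a list of rule violations for one ingested row."""
    errors = []
    if row.get("customer_id") is None:
        errors.append("customer_id is null")
    amt = row.get("amount")
    if not isinstance(amt, (int, float)) or amt < 0:
        errors.append("amount must be a non-negative number")
    if row.get("currency") not in {"USD", "EUR", "INR"}:
        errors.append("unknown currency code")
    return errors

rows = [
    {"customer_id": "c1", "amount": 10.0, "currency": "USD"},
    {"customer_id": None, "amount": -5, "currency": "XXX"},
]
valid = [r for r in rows if not validate(r)]
rejected = [(r, validate(r)) for r in rows if validate(r)]
print(len(valid), len(rejected))  # 1 1
```

Rejected rows and their violation lists are kept rather than silently dropped, so anomalies surface to the team instead of turning into downstream reconciliation work.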

Education

Master's in Computer Science at Northwest Missouri State University
Bachelor's in Computer Science at Jawaharlal Nehru Technological University

Qualifications

IBM Data Engineering Professional Certificate
Google Cloud Data Engineering, Big Data, and Machine Learning Fundamentals
Data Warehousing for Business Intelligence
Building Batch Data Pipelines on Google Cloud
Learning Apache Spark
Data Engineering Foundations

Industry Experience

Software & Internet, Professional Services, Financial Services, Other