With 4+ years of data engineering experience, I design and optimize large-scale data solutions across AWS, Azure, and Snowflake. I specialize in building end-to-end ETL/ELT pipelines, real-time streaming platforms, and data lakehouse architectures using Python, SQL, Spark, Kafka, and Airflow.

I excel at data modeling, performance optimization, and reducing latency to deliver reliable datasets for analytics, BI, and ML workloads. I thrive in Agile environments, collaborating with cross-functional teams to deliver scalable data products that drive measurable business outcomes.

Bhagya Sree Akula

Available to hire

Experience Level

Expert

Work Experience

Data Engineer at Cisco Systems
December 1, 2024 - November 6, 2025
Built batch and real-time data pipelines in Python to ingest ~60M records weekly from APIs and log feeds into Snowflake and S3, enabling 5-minute telemetry analytics. Reduced ETL runtime from 8 hours to 90 minutes and cut Snowflake credits by 35% through incremental CDC loads using Snowpipe/Streams with partitioning and clustering. Engineered PySpark transformations and schema normalization on Databricks, improving downstream query performance by 40% and accelerating reporting for operations and customer success. Created reusable dbt models (staging/core/marts) with lineage, tests, and standardization, accelerating model rollout time by 60%. Designed Snowflake RBAC, masking policies, and object tagging to meet compliance. Implemented data validation with Great Expectations, auto-remediating 1.2K anomalies per month and improving data integrity by 38%. Optimized BI workloads by tuning SQL (views, clustering, pruning), reducing BI query time from 14 minutes to under 5 minutes for monthly
Data Engineer at Citius Tech
July 1, 2023 - July 1, 2023
Orchestrated NiFi ingestion pipelines across REST, SFTP, and database sources, with schema registry and back-pressure, processing 25K+ records per day and reducing manual effort by 72%. Built dbt staging layers and curated marts with tests and documentation, reducing schema drift incidents by 50% and accelerating cross-team development. Migrated 1.2 TB Hadoop/HDFS logs nightly to Amazon Redshift using Python with parallelized COPY operations, improving query speed by 35%. Optimized Redshift schema design (distribution/sort keys, time-based partitioning), cutting KPI query times from 14 minutes to under 5 minutes. Implemented data quality frameworks with Great Expectations, remediating 3,400+ anomalies and improving integrity across completeness, uniqueness, and referential checks. Developed AWS Glue ETL jobs in PySpark to unify clinical and insurance data, processing 3M+ records/day and reducing integration latency by 48%. Automated data lineage and observability with dbt exposures, NiFi provenanc
Data Engineer at Hexaware Technologies
October 1, 2021 - October 1, 2021
Migrated transactional logs from on-prem SQL Server to Azure Data Lake Storage Gen2 using Azure Data Factory and parallelized copy operations, moving 800M+ rows during weekend windows and reducing the ingestion window by 65%. Engineered PySpark jobs on Azure Databricks to aggregate 50 million user events daily, delivering bronze/silver/gold layers for downstream analytics and reporting. Implemented CDC pipelines with Kafka Connect (SQL Server source) and custom Python consumers, processing 2M change events per day with end-to-end latency under 5 minutes. Designed Parquet storage layouts with optimized partitioning and file sizing (target file sizes, compression), cutting BI query times by 42% and storage costs by 25%. Orchestrated nightly transformations using Apache Airflow (DAGs, retries, SLAs, task dependencies), coordinating 20+ tasks and delivering production datasets by 5 AM each business day. Enforced schema contracts via JSON specifications and Python unit tests (pydantic/p
Data Analyst at Hexaware Technologies
June 1, 2021 - June 1, 2021
Built SQL-based analytical reports and ad hoc queries on Azure SQL and Databricks SQL to track user behavior, funnel drop-offs, and product adoption trends. Developed interactive dashboards in Power BI for weekly executive reviews, visualizing KPIs such as active users, session frequency, latency SLAs, and incident counts. Performed data profiling, cohort analysis, and A/B test readouts; defined data quality acceptance criteria with stakeholders and formalized metric definitions in a shared data dictionary. Optimized BI queries through aggregation tables, materialized views, and partition pruning, reducing dashboard load times by 35%. Collaborated with product and operations teams to translate business questions into analytical datasets and scheduled insights to improve decision lead time for quarterly planning.

Education

Master of Science in Computer Science at Wichita State University
August 1, 2023 - May 1, 2025
Bachelor of Technology in Computer Science Engineering at Vignan Institute of Information Technology
June 1, 2017 - June 1, 2021

Qualifications

Power Platform Fundamentals PL900
January 11, 2030 - November 6, 2025
Basics of Python Programming
January 11, 2030 - November 6, 2025
Data Analytics with Python
January 11, 2030 - November 6, 2025

Industry Experience

Software & Internet, Professional Services, Education