Sanjay Gurrapu

Available to hire

I am a Data Engineer with over 8 years of experience building reliable, scalable data solutions on cloud platforms such as AWS and Azure. I specialize in real-time and batch data pipelines built with tools such as Spark, Kafka, and Databricks. I focus on migrating large datasets to the cloud, improving system performance, and reducing costs.

I am comfortable working with both structured and unstructured data and prioritize making data accessible, secure, and valuable for analytics and reporting. As a strong team player, I enjoy solving complex problems and helping businesses make smarter data-driven decisions.


Experience Level

Expert

Work Experience

Senior Data Engineer at Truist
February 1, 2024 - Present
Led the migration from legacy Hadoop systems to AWS, focusing on real-time fraud detection and analytics with Kinesis and Redshift. Migrated large Hadoop/Hive datasets to S3 and Redshift using Glue and DMS. Built real-time fraud detection pipelines with Spark Streaming and Kinesis. Designed an S3-based data lake with the Glue Data Catalog to enable self-service analytics. Automated infrastructure provisioning for the Redshift and Snowflake integrations with Terraform. Integrated Snowflake for analytics and implemented change data capture (CDC) with AWS DMS and Lambda to keep Oracle and SQL Server sources in sync. Optimized Redshift queries and secured pipelines with IAM, KMS, and Lake Formation. Containerized Spark streaming jobs with Docker, deployed them on Kubernetes and ECS, and monitored systems with CloudWatch and Datadog.
Data Engineer at PG&E
December 31, 2023 - August 26, 2025
Built scalable serverless ETL and real-time ingestion pipelines for utility tracking using Glue, Lambda, Kinesis, and DynamoDB. Developed Lambda orchestrations for Kinesis ingestion, anomaly alerts, and Glue job retries. Automated data quality checks with Glue DataBrew and Deequ. Migrated Teradata data warehouse workloads to Amazon Redshift and optimized ETL with stored procedures. Deployed ETL scripts in Docker containers managed by Kubernetes. Built CI/CD pipelines with Jenkins and Terraform for Docker and Snowflake deployments. Created Tableau dashboards for analytics and scheduled batch jobs with Airflow. Secured data sharing with VPC peering and fine-grained Lake Formation permissions, and implemented logging, monitoring, and operational alerts with CloudWatch and SNS.
Azure Data Engineer at Corefront Technologies Pvt Ltd
July 31, 2022 - August 26, 2025
Designed and implemented a secure, HIPAA-compliant healthcare analytics platform on Azure. Built ADF pipelines ingesting data into Synapse and integrated Snowflake for audit-trail reporting. Developed Dockerized PySpark transformation jobs deployed on AKS and used Databricks for data cleansing. Optimized queries on ADLS Gen2 with Delta Lake and Z-ordering. Built event-triggered data enrichment with Azure Functions. Automated Synapse deployments with ARM templates and Azure DevOps. Designed Power BI dashboards with row-level security and configured alerting with Azure Monitor and Event Grid. Managed Cosmos DB for unstructured records and used Azure Repos for version control.
Big Data Engineer at Gigabyte Infocomm Pvt Ltd
December 31, 2020 - August 26, 2025
Migrated legacy ETL jobs to PySpark on Hadoop clusters. Improved Hive query performance with partitioning, bucketing, vectorization, and the Tez execution engine. Integrated Sqoop and Flume for data ingestion and live data streaming. Automated workflows with Oozie and scheduled batch jobs with cron. Secured the cluster with Kerberos and tuned YARN configurations. Modeled HBase tables for high-volume transaction data. Monitored cluster health with Ambari and deployed Hadoop ecosystem components with Cloudera Manager.

Education

Master's in Computing and Information Systems at Youngstown State University
Bachelor's in Electronics Engineering and Technology at Geethanjali College of Engineering and Technology

Qualifications

Microsoft Certified: Azure Data Engineer Associate (DP-203)

Industry Experience

Financial Services, Energy & Utilities, Healthcare, Software & Internet, Professional Services