Sowmya Cherukuri

Available to hire

I’m a data engineering professional with 8 years of experience in Azure Cloud, Big Data, and data warehousing. I design and optimize end-to-end ETL/ELT pipelines using Azure Data Factory, Databricks, Snowflake, and a wide range of data technologies to enable reliable, scalable analytics. I enjoy collaborating with cross-functional teams to solve complex data problems, improve data governance, and drive performance through thoughtful architecture and automation.

Outside of work, I love exploring new data tech, sharing learnings through documentation and automation, and contributing to open data projects whenever possible. I bring a pragmatic, user-focused mindset to data platforms, ensuring trusted analytics while keeping security and compliance top of mind.


Experience Level

Expert

Language

English
Fluent

Work Experience

Senior Data Engineer at Goldman Sachs
June 1, 2024 - November 7, 2025
Led development of end-to-end ETL/ELT pipelines using Azure Data Factory, Azure Databricks, and Snowflake over large-scale datasets, integrating on-premises and cloud sources (MySQL, Cassandra, Blob Storage, and Azure SQL DB). Designed and optimized SQL with indexes, views, stored procedures, functions, and packages to boost query performance. Applied data warehousing techniques in Snowflake, including cleansing, Slowly Changing Dimensions (SCD), surrogate keys, and change data capture. Built and tuned ETL transformations with Spark SQL and Spark DataFrames in Databricks, orchestrated via ADF pipelines. Developed Python and SnowSQL scripts for data movement and processing. Leveraged Azure Functions, Logic Apps, and Service Bus for automation, data integration, and pipeline monitoring. Built real-time streaming pipelines using Apache Kafka, Spark Streaming, and Hive to ingest, transform, and analyze streaming data. Maintained Hadoop and RDBMS data integration programs with Hive on Spark and Spark SQL.
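
As a flavor of the change-data-capture staging described above, here is a minimal PySpark sketch; the table layout and column names (customer_id, updated_at) are invented for illustration and are not taken from the actual project.

    # Illustrative sketch only; names and sample values are hypothetical.
    from pyspark.sql import SparkSession, functions as F, Window

    spark = SparkSession.builder.appName("cdc_latest_record_sketch").getOrCreate()

    # Stand-in for change records landed by ADF into a Databricks staging table.
    changes = spark.createDataFrame(
        [(1, "alice@old.com", "2024-06-01"),
         (1, "alice@new.com", "2024-06-15"),
         (2, "bob@mail.com",  "2024-06-10")],
        ["customer_id", "email", "updated_at"],
    )

    # Keep only the most recent change per business key before merging downstream.
    latest = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
    current = (changes
               .withColumn("rn", F.row_number().over(latest))
               .filter("rn = 1")
               .drop("rn"))
    current.show()

In practice the deduplicated output would feed a merge into the Snowflake dimension rather than a console display.
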
Senior Data Engineer at Mayo Clinic
November 1, 2022 - May 1, 2024
Handled large-scale data engineering initiatives leveraging Hadoop Cloudera and Microsoft Azure. Ingested data from diverse sources into Azure Data Lake, Azure Storage, Azure SQL, and Azure Synapse Analytics; processed with Databricks. Migrated on-premises Oracle ETL processes and SQL workloads to Azure using Data Factory for batch, near-real-time, and event-driven data movement. Built scalable batch and real-time data pipelines using Apache Kafka, Spark Streaming, Hive, and Databricks for ingestion, transformation, and analytics. Processed structured, semi-structured, and unstructured healthcare data (HL7, FHIR) with Scala, PySpark, Spark SQL, and Hive, implementing UDFs, partitioning, and bucketing. Designed lakehouse architectures on Azure Data Lake Storage Gen2 with hierarchical namespace for HIPAA-compliant storage. Developed DBT models encoding business logic, data masking, dimensional modeling, and slowly changing dimensions. Orchestrated workflows with Apache Airflow, automating ETL pipelines, data quality checks, and failure recovery. Optimized ADF pipelines for performance.
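
To illustrate the Airflow-based data quality and failure-recovery orchestration mentioned above, here is a minimal sketch; the DAG id, task name, and check logic are hypothetical, and it assumes Airflow 2.4+ (older versions use schedule_interval instead of schedule).

    # Illustrative sketch only; DAG id, task names, and the check are placeholders.
    from datetime import datetime, timedelta
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_quality_check(**_):
        # Placeholder for a row-count or null-rate check against the curated layer;
        # raising here fails the task and triggers the retry/alerting path.
        row_count = 42  # stand-in value
        if row_count == 0:
            raise ValueError("Curated table is empty")

    with DAG(
        dag_id="curated_layer_quality_check",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
    ) as dag:
        quality_check = PythonOperator(
            task_id="quality_check",
            python_callable=run_quality_check,
        )
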
Big Data Developer at Lowe's Companies, Inc.
January 1, 2021 - October 1, 2022
Imported data from MySQL to HDFS using Sqoop; migrated data from Oracle to Hadoop. Performed ETL and aggregations using Apache Spark (Scala, PySpark, Spark SQL); loaded results into Hive. Designed pipelines using Flume, Sqoop, Kafka, Spark, and Hive for ingesting and analyzing customer and streaming data. Implemented CI/CD pipelines for deployments in Hadoop environments; used Zookeeper for coordination and Oozie for workflow scheduling. Worked with AWS EC2, Python, Shell scripting, and Ambari for cluster management. Developed UNIX scripts and YAML automation for job scheduling and builds. Architected an AWS data lake using S3, AWS Glue, and Redshift; designed a GCP data lake using Cloud Storage (GCS) and BigQuery; developed pipelines using GCP Dataflow (Apache Beam) for batch and streaming, and used GCP Dataproc with BigQuery for scalable analytics. Used Infrastructure as Code (IaC) with Terraform for AWS and GCP. Focused on cloud cost optimization through resource rightsizing and reserved instances.
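
The Dataflow work above can be sketched as a small Apache Beam pipeline in Python; the bucket paths and the sales record layout are made up for illustration, and the pipeline runs locally on the DirectRunner unless Dataflow runner options are supplied.

    # Illustrative sketch only; bucket paths and field positions are hypothetical.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_sale(line):
        # Expecting CSV lines of the form "store_id,amount,...".
        store_id, amount = line.split(",")[:2]
        return store_id, float(amount)

    # On GCP the same pipeline would be submitted with --runner=DataflowRunner
    # plus project, region, and temp_location options.
    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | "Read"   >> beam.io.ReadFromText("gs://example-bucket/sales/*.csv")
         | "Parse"  >> beam.Map(parse_sale)
         | "Sum"    >> beam.CombinePerKey(sum)
         | "Format" >> beam.MapTuple(lambda store, total: f"{store},{total}")
         | "Write"  >> beam.io.WriteToText("gs://example-bucket/output/daily_totals"))
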
Hadoop Developer at KeyBank
February 1, 2019 - December 1, 2020
Maintained source code in Git and GitHub; prepared an ETL framework using Sqoop, Pig, and Hive for data ingestion. Processed HDFS data, created external Hive tables, and built reusable ingestion and repair scripts. Developed ETL jobs using Spark-Scala (RDDs, DataFrames, Spark SQL) for migration and reporting; built Spark Streaming applications for real-time analytics. Analyzed source data and modified data types for efficient handling. Designed PySpark solutions for SQL script analysis. Extracted data using Sqoop; performed transformations using Hive and MapReduce. Implemented automation for deployments using YAML scripts. Worked with Hive, Pig, HBase, Spark, Zookeeper, Flume, Kafka, and Sqoop; deployed on AWS (S3, DMS, RDS) with Git-based workflows. Implemented data classification algorithms using MapReduce, optimized with partitioning, combiners, and distributed cache. Created technical documentation for ETL processes, data flows, and system architecture.
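
A minimal Structured Streaming sketch of the Kafka-to-Spark real-time analytics described above follows; the broker address and topic name are placeholders, and the job assumes the Spark-Kafka connector package is available on the cluster.

    # Illustrative sketch only; broker, topic, and windowing choices are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("txn_stream_sketch").getOrCreate()

    # Read raw events from Kafka; the value column arrives as bytes and is cast here.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "transactions")
              .load()
              .selectExpr("CAST(value AS STRING) AS payload", "timestamp"))

    # Count events per one-minute window as a simple real-time aggregate.
    counts = events.groupBy(F.window("timestamp", "1 minute")).count()

    query = (counts.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()
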
Data Warehouse Developer at Atlantis Architects
September 1, 2017 - November 1, 2018
Worked as a SQL Server Analyst/Developer/DBA on SQL Server 2012-2016. Managed and updated Erwin models (logical and physical) for CDS, ADM, and Reference DB. Tracked source control and deployments using TFS across environments. Wrote triggers, stored procedures, and functions, and maintained structures using T-SQL. Managed files and filegroups, table/index associations, and query and performance tuning. Implemented ETL processes using SSIS packages for data extraction, transformation, and loading. Optimized SQL performance with indexing, reducing report generation time by 25%. Built data validation and cleansing routines in SSIS to ensure data quality. Implemented slowly changing dimensions (SCD Type 1 and 2) in the data warehouse. Created aggregate and summary tables to enhance reporting performance.
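
For the SCD Type 2 handling mentioned above, here is a simplified two-step pattern expressed as T-SQL driven from Python; the dimension and staging table names, tracked column, and connection string are hypothetical, and a production version would typically live in an SSIS package or stored procedure instead.

    # Illustrative sketch only; table names, columns, and DSN are placeholders.
    import pyodbc

    SCD2_CLOSE_AND_INSERT = """
    -- Step 1: close out current rows whose tracked attribute changed.
    UPDATE d
    SET d.EndDate = GETDATE(), d.IsCurrent = 0
    FROM dbo.DimCustomer AS d
    JOIN staging.Customer AS s
      ON s.CustomerID = d.CustomerID
    WHERE d.IsCurrent = 1 AND s.Email <> d.Email;

    -- Step 2: insert a new current version for changed or brand-new customers.
    INSERT INTO dbo.DimCustomer (CustomerID, Email, StartDate, EndDate, IsCurrent)
    SELECT s.CustomerID, s.Email, GETDATE(), NULL, 1
    FROM staging.Customer AS s
    LEFT JOIN dbo.DimCustomer AS d
      ON d.CustomerID = s.CustomerID AND d.IsCurrent = 1
    WHERE d.CustomerID IS NULL;
    """

    conn = pyodbc.connect("DSN=warehouse;Trusted_Connection=yes")
    cursor = conn.cursor()
    cursor.execute(SCD2_CLOSE_AND_INSERT)
    conn.commit()
    conn.close()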

Education

Bachelor of Pharmacy (B.Pharm) at Malla Reddy College of Pharmacy, Hyderabad, India
January 11, 2030 - January 1, 2017

Industry Experience

Software & Internet, Healthcare, Professional Services, Financial Services, Computers & Electronics