Hi, I’m Viveka Gangaraboina, a Senior Data Engineer with 8+ years of experience designing and scaling cloud-native data platforms, data lakehouse architectures, and distributed data pipelines for large-scale analytics. I specialize in AWS and Azure ecosystems, building high-performance data solutions with Snowflake, Databricks, Redshift, and Synapse Analytics to enable fast, reliable decision-making. I enjoy turning complex data into scalable, governance-driven analytics. My focus includes real-time streaming, ETL/ELT pipelines using PySpark and SQL, ML pipelines with MLflow and SageMaker, and data governance and security. I thrive in cross-functional teams, automate deployments with CI/CD and IaC, and keep data quality and performance at the forefront of every project.

Viveka Gangaraboina

Available to hire

Experience Level

Expert

Work Experience

Data Engineer at Conagra Brands
January 1, 2026 - Present
Implemented CDC-based data pipelines using AWS DMS and Snowflake Streams to achieve sub-minute data synchronization across order, warehouse, and logistics systems. Built and deployed end-to-end ML pipelines with Databricks, MLflow, and AWS SageMaker, reducing model deployment time by 40% and enabling scalable batch inference. Developed robust data movement pipelines using Sqoop for RDBMS-to-Hadoop transfers, Flume for streaming ingestion, and Oozie for dependable execution of large-scale processes. Designed Hadoop-based pipelines on AWS EMR (HDFS, MapReduce, Hive, Spark) and implemented Azure Data Factory/Databricks pipelines for multi-terabyte datasets. Worked with NoSQL stores (MongoDB, DynamoDB) to support semi-structured data. Led the adoption of Delta Lake to build an ACID-compliant data lakehouse, ensuring reliable batch and streaming processing. Built MLOps workflows with Jenkins, GitLab CI, and Kubernetes for continuous training, validation, and deployment. Modernized legacy SQ
Data Engineer at Silicon Valley Bank
April 1, 2023 - December 1, 2025
Designed and implemented a scalable AWS-based data lake architecture (S3, Snowflake, Redshift) housing 200+ TB of financial data for real-time and batch analytics. Built high-performance ETL/ELT pipelines using AWS Glue, PySpark, and Snowflake SQL, enabling automated ingestion from Oracle, SQL Server, and DynamoDB. Performed large-scale data processing and anomaly detection with Databricks, PySpark, and EMR to strengthen compliance and risk monitoring. Implemented event-driven architectures using Lambda, Step Functions, and SNS to support real-time alerts and automated workflows. Engineered Kafka-based streaming integrations with exactly-once delivery across heterogeneous data sources. Optimized Spark and Glue workloads on EMR auto-scaling clusters to balance performance and cost. Developed PySpark-based transformations for Dataproc/GCP environments, focusing on partitioning, caching, and broadcast joins. Created curated analytical datasets and data marts to standardize business defini
Data Engineer at SM Energy
January 1, 2021 - March 1, 2023
Led a hybrid cloud data platform on AWS, integrating on-premises systems to manage sensitive energy production data. Built ETL pipelines with AWS Glue to process 5+ TB per day of drilling, well-log, and production data into scalable cloud storage. Leveraged Snowflake as the enterprise data warehouse, applying optimization techniques and S3 integration for fast analytics. Moved legacy ETL from on-premises systems to Azure Data Lake using Informatica, ensuring uninterrupted access during modernization. Developed Python scripts for data quality validation across telemetry, production, and seismic datasets. Implemented end-to-end ELT pipelines with Fivetran and dbt to ingest ERP, SCADA, and drilling data into Snowflake and Redshift for analytics. Deployed serverless Lambda preprocessing for streaming IoT sensor data with real-time alerts via SNS/SQS. Used Hadoop, Hive, and HDFS for specific workloads with partitioning and bucketing. Orchestrated Airflow-based workflows and built real-time streaming with Kafka and Kinesis.
Data Engineer at EPIC
February 1, 2018 - December 1, 2020
Built real-time data pipelines with AWS Glue and Apache Spark to process HL7/FHIR event data for HIPAA-compliant near real-time insights into admissions, discharges, and transfers. Engineered a Hadoop-based data lake with HDFS to consolidate EHR, claims, and operational datasets, enabling faster access and downstream analytics. Created PySpark and Spark SQL jobs in test environments for patient and clinical transformations, validating outputs prior to production deployment. Developed modular Python frameworks for cleaning, validation, and transformation across HBase, Hive, and Cassandra, improving reliability of healthcare data sources. Configured Snowflake objects and SnowSQL scripts to standardize ingestion and transformation of healthcare datasets, supporting advanced analytics and BI. Implemented MapReduce programs for large-scale healthcare data processing and migrated relational datasets into Hadoop using Sqoop with Oozie for incremental loads. Streamlined query performance in SQ

Education

Master of Science in Advanced Data Analytics at University of North Texas
January 11, 2030 - May 20, 2026

Industry Experience

Healthcare, Financial Services, Energy & Utilities, Retail, Software & Internet, Media & Entertainment