VINAY KUMAR

I am an experienced Data Engineer with 8+ years designing, building, and optimizing data lakehouse pipelines across AWS and Azure. I lead end-to-end healthcare data pipelines, integrating claims, clinical, and rebate datasets while maintaining HIPAA compliance.

I collaborate with business, compliance, and analytics teams to translate requirements into governed, auditable data solutions, and I continually optimize performance and cost through partitioning, compression, and robust monitoring.

Available to hire

Experience Level

Expert

Work Experience

Data Engineer at CVS Health
July 1, 2023 - November 6, 2025
- Designed and developed scalable ETL/ELT pipelines on AWS EMR with PySpark to process high-volume claims, clinical, and rebate data from multiple healthcare systems.
- Implemented serverless data orchestration using AWS Lambda and Step Functions for event-driven ingestion and transformation, improving reliability and reducing manual effort by 40%.
- Leveraged AWS Glue for metadata management, schema drift handling, and seamless integration with Amazon Athena and Snowflake for downstream analytics.
- Optimized Athena queries on partitioned and compressed S3 datasets for faster, cost-efficient interactive reporting.
- Automated provisioning with AWS CloudFormation and Terraform to ensure repeatable, secure deployments.
- Integrated Snowflake downstream as a data warehouse and developed PySpark-based data quality checks to enforce HIPAA compliance.
- Implemented monitoring and alerting using CloudWatch and SNS to detect and resolve EMR and Lambda workflow issues.
- Collaborated with business and data go
Data Engineer at UBS
December 1, 2021 - December 1, 2021
- Designed and implemented scalable ETL pipelines using Azure Data Factory, Databricks, and ADLS Gen2, reducing daily data integration time by 60% and enabling near real-time availability of customer and sales data.
- Applied Medallion Architecture to enforce layered data quality, modular transformations, and auditable pipelines for sensitive banking data.
- Created a customer 360 view by joining and enriching gold-layer datasets, contributing to a 25% lift in cross-sell campaign targeting.
- Automated orchestration with ADF triggers and parameterized pipelines, saving ~10 hours/week for the analytics teams.
- Secured pipelines with Azure Key Vault, RBAC, and Managed Identities in compliance with banking audit controls.
- Queried and validated curated data in Synapse SQL to power real-time dashboards monitoring branch KPIs, loan performance, and product sales trends.
- Delivered self-service curated datasets to risk and compliance teams via Synapse views, reducing reporting duplication.
Big Data Engineer at Vodafone Group
January 1, 2021 - January 1, 2021
- Designed and deployed end-to-end ETL pipelines using Azure Data Factory, HDInsight (Spark), and Synapse, reducing data processing time by 60% and delivering near real-time call and service data to analytics teams.
- Built job scheduling and monitoring with the ADF Python SDK, decreasing pipeline downtime and enabling real-time alerting.
- Developed Spark-based Python UDFs to flatten and normalize nested JSON structures from APIs and events, improving transformation speed by 40% and reducing parsing errors.
- Integrated Azure Synapse with Event Hubs, Blob Storage, and Key Vault for secure data access and scalable querying.
- Migrated legacy on-prem HDFS data to Azure Data Lake Storage Gen2 via a secure self-hosted IR, reducing data silos and improving access latency by ~70%.
- Optimized streaming pipelines with Apache Kafka, NiFi, and Spark Streaming for sub-minute latency and faster anomaly detection.
- Automated CI/CD workflows using Azure DevOps with Git integration, reducing deployment time by
Hadoop Developer at Sterlite Technologies
March 1, 2018 - March 1, 2018
- Ingested high-volume datasets using Apache Sqoop to transfer data between RDBMS, HDFS, and Hive, enabling seamless cross-platform integration.
- Optimized HiveQL queries and implemented partitioning strategies, reducing compute time and resource utilization by 30% for critical network analytics jobs.
- Developed and orchestrated Oozie workflows to automate Hive, Pig, and HDFS batch jobs, improving pipeline efficiency and maintainability while reducing manual intervention.
- Created MapReduce and Pig scripts for data transformations on structured and semi-structured data, supporting product lifecycle tracking and customer event processing.
- Performed data profiling and quality validations, ensuring clean datasets for downstream BI tools and engineering analytics dashboards.
- Prototyped Spark SQL-based jobs to demonstrate significant performance improvements (2–5x faster) over legacy Hive workloads, supporting internal migration planning.

Education

Master’s in Computer Technology at Eastern Illinois University
January 11, 2020 - January 1, 2023

Industry Experience

Healthcare, Financial Services, Telecommunications, Software & Internet, Professional Services

