Shiva Kumar

Available to hire

Hi, I’m Shiva Kumar. I design and build scalable data platforms and pipelines that empower analytics and decision-making across healthcare, financial services, and transportation. I enjoy turning complex, messy data into trusted datasets that enable near real-time insights and governance.

I bring 6+ years of experience across AWS, Azure, and on-prem environments. I collaborate with data scientists, BI teams, and product partners to deliver business-driven data solutions while focusing on performance, cost efficiency, and data governance.

Work Experience

Cloud Data Engineer at Calian Group
May 1, 2023 - October 31, 2025
Building a scalable and secure AWS-based data platform to consolidate healthcare data sources, enabling analytics, compliance reporting, and near real-time insights for decision-making. Design and develop data pipelines using AWS Glue, Lambda, and Step Functions to ingest and process structured and unstructured healthcare data. Build and manage data lakes on Amazon S3 and integrate with Redshift for analytics and reporting. Implement PySpark-based ETL jobs on EMR for large-scale transformations and data cleansing. Apply data governance and compliance standards (HIPAA, PIPEDA) by enforcing encryption, access control, and data masking. Optimize pipeline performance and cost using AWS cost monitoring tools and partitioning strategies. Collaborate with data scientists to deliver curated datasets for ML and AI-driven use cases. Develop automated monitoring and alerting for pipeline health using CloudWatch.
Data Engineer at Manulife
April 1, 2023 - April 1, 2023
Enterprise Data Lake Migration on Azure: Migrated a legacy on-premises data warehouse to Azure Data Lake and Synapse Analytics, enabling enterprise-wide access to a single source of truth for financial and customer data. Developed ETL pipelines using Azure Data Factory to ingest financial and claims data from multiple sources (SQL Server, Oracle, flat files). Implemented PySpark and Databricks notebooks for data transformations, validations, and aggregations. Modeled data in Azure Synapse Analytics to support BI and financial reporting. Aligned ADLS Gen2 storage with data governance standards. Automated workflows with ADF triggers and pipelines for real-time and batch ingestion. Collaborated with BI teams to deliver Power BI dashboards from Synapse datasets. Tuned SQL queries and optimized storage formats (Parquet, Delta Lake) for performance and cost.
ETL Developer / Data Engineer at Lyft
January 1, 2021 - January 1, 2021
Built and maintained ETL workflows to support data-driven decision-making for rider engagement, pricing models, and operational insights. Designed and implemented ETL pipelines using Informatica and Python to integrate rider and driver data from multiple sources. Worked with Kafka streams to capture real-time events (trips, payments, user activity). Optimized large-scale queries and developed analytics-ready tables in Snowflake and Hive. Partnered with analysts and product managers to deliver insights on rider engagement and fraud detection. Migrated legacy SQL-based ETL workflows to scalable Spark/PySpark jobs. Developed monitoring solutions for data quality and lineage tracking.
Cloud Data Engineer at Calian Group
May 1, 2023 - November 21, 2025
Designed and implemented a scalable, secure AWS-based data platform that consolidates healthcare data sources to enable advanced analytics, compliance reporting, and near real-time decision support. Built data lakes on S3 and integrated them with Redshift for analytics; developed PySpark ETL on EMR for large-scale transformations; applied HIPAA and PIPEDA governance through encryption, access control, and data masking. Optimized pipelines for performance and cost using partitioning and AWS Cost Explorer; collaborated with data scientists to provision curated datasets for ML/AI; established automated monitoring and alerting with CloudWatch.

Industry Experience

Healthcare, Financial Services, Software & Internet, Transportation & Logistics, Professional Services