Ritishek Kumar Karkal

Available to hire

Hi there! I am a results-driven Big Data Engineer and ETL Specialist with 5+ years of IT experience across Healthcare, Finance, Banking, and Investment Management. I design and implement scalable data platforms on AWS, Azure, and GCP, leveraging Hadoop ecosystems and modern ETL/ELT tooling to deliver reliable, high-quality data pipelines at scale. In my work, I focus on performance, data integrity, and governance to support analytics and data-driven decision-making.

I enjoy collaborating with cross-functional teams to translate complex business requirements into robust data architectures. My experience spans building end-to-end data ingestion, processing, and analytics solutions using Databricks, Delta Lake, dbt, Airflow, and CI/CD, with a track record of delivering measurable improvements in data quality and operational efficiency.

Work Experience

Senior Data Engineer at CIBC
September 1, 2024 - October 31, 2025
Designed and optimized T-SQL objects (tables, views, triggers, stored procedures) and tuned queries using indexing and execution plans. Created and maintained complex data models and metadata repositories in ERWIN. Built and maintained ETL workflows using SSIS, Azure Data Factory (V1/V2), Hive, Pig, Spark SQL, and U-SQL. Migrated on-prem data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage using ADF pipelines. Implemented Slowly Changing Dimensions (SCD). Developed Delta Live Tables pipelines for incremental ingestion. Authored modular Python automation scripts for data validation and API-based ETL workflows, and automated reconciliation and reporting. Enabled governance with data lineage and Unity Catalog. Built dbt models, unit-tested Python modules, and integrated CI/CD via Git/Bitbucket. Created Power BI dashboards. Tuned Spark/SQL workloads with Delta caching and implemented cost optimization strategies in Databricks.
Senior Data Engineer at Deutsche Bank
April 1, 2023 - April 1, 2023
Implemented Apache Airflow for authoring, scheduling, and monitoring complex ETL pipelines; deployed multiple DAGs to automate data workflows across Finance and Risk domains. Built ML workflows on GCP (TensorFlow, Dataproc) and leveraged AWS services (EC2, S3, Redshift, Data Pipeline, Lambda, Glue) for ingestion and analytics. Built streaming ingestion frameworks (Kafka → Databricks → Delta Lake) with fault-tolerance and exactly-once semantics. Orchestrated DAGs with BigQuery, Dataflow, and Dataproc processing over 2TB daily. Automated end-to-end ETL with Airflow and dbt; configured Data Pipeline to load from S3 into Redshift; integrated heterogeneous sources (Access, Excel, CSV, Oracle, flat files). Designed resilient pipelines with retry, checkpointing, and SLA monitoring; built GCP infrastructure from scratch using Terraform; deployed serverless AWS solutions (Glue, Lambda, S3) for near real-time ingestion into Redshift and GCP BigQuery; implemented cross-region cost/performance
Data Engineer at Acko
July 1, 2022 - July 1, 2022
Engineered Spark applications (PySpark, Spark SQL) for robust ETL and analytics; processed CSV/JSON/Parquet/COBOL formats to reveal usage patterns. Implemented Spark Streaming for real-time data quality checks; optimized SSIS packages for Azure integration; migrated Spark Streaming workloads to Azure Databricks DLT; developed end-to-end Python ETL scripts; established data governance and validation layers with Spark/NiFi; built DataStage jobs and parallelized processing; introduced Unity Catalog and Delta Sharing for secure data collaboration; designed serverless batch/streaming pipelines on AWS (Lambda, Glue, DynamoDB) for customer analytics; migrated Git workflows to Bitbucket; optimized Glue runtimes; integrated Databricks Repos with Azure DevOps; orchestrated ETL with SSIS, NiFi, Python, Spark transferring data to Hive/HBase/S3; integrated Bitbucket CI/CD with Jenkins and Airflow; designed dimensional schemas (Star/Snowflake); implemented data governance checks and Python operators
Senior Data Engineer at CIBC
September 1, 2024 - November 21, 2025
Designed and developed T-SQL objects (tables, views, triggers, stored procedures) for business reporting and operations; optimized queries with indexes and execution plan tuning; built and maintained data models and metadata repositories using ERWIN; designed and executed ETL workflows using SSIS, Azure Data Factory (ADF v1/v2), Spark SQL, Hive, Pig, and U-SQL; migrated on-prem data sources (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Storage using ADF pipelines; implemented Slowly Changing Dimensions (SCD) to maintain historical data; developed and maintained Delta Live Tables (DLT) pipelines in Azure Databricks for batch and streaming ingestion; translated business requirements into scalable GCP data models; authored modular Python scripts for automation, data validation, and API-based ETL workflows; automated data governance, lineage, and schema evolution; integrated Unity Catalog for centralized governance; built dbt transformation models; implemented test automa

Education

Post Graduation in Business Insights & Analytics at Humber College Institute of Technology and Advanced Learning
January 11, 2030 - October 31, 2025
Bachelor of Engineering in Instrumentation & Control Engineering at Netaji Subhas Institute of Technology, University of Delhi
January 11, 2030 - October 31, 2025

Industry Experience

Financial Services, Healthcare, Software & Internet, Professional Services, Education