I am a results-driven AWS Data Engineer with 6+ years of experience designing and implementing scalable data pipelines, cloud ETL frameworks, and analytical platforms using AWS Glue, AWS Lambda, Redshift, EMR, Spark, Python, and SQL. I thrive on building robust data platforms and embracing cloud-native architectures that drive insights and business value. I specialize in developing serverless ETL, optimizing SQL queries, and managing large-scale data processing across S3, Redshift, DynamoDB, and EMR. I have hands-on experience building Glue jobs, Step Functions workflows, and Kinesis/Fargate streaming pipelines, and orchestrating batch and real-time workflows. I frequently use Glue Studio, CI/CD pipelines on AWS CodePipeline and GitHub Actions, and infrastructure as code with Terraform/CloudFormation to deliver production-grade data solutions while maintaining security and cost efficiency.

Karan Ksheersagar

Available to hire

Experience Level

Expert

Work Experience

Sr Data Engineer at TD Bank
February 1, 2024 - Present
Designed and developed large-scale ETL pipelines using AWS Glue (PySpark + Python) to ingest, transform, and curate billions of credit card transactions into Amazon S3 and Redshift, ensuring high performance and reliability across daily and hourly loads. Built serverless ingestion workflows using Lambda, EventBridge, and S3 event triggers, automating extraction from APIs, RDS sources, and on-prem systems via secure VPC endpoints. Leveraged Spark on EMR to build scalable batch/real-time processing frameworks with optimized partitioning, caching, and fault tolerance to support downstream fraud models. Designed Redshift star/snowflake schemas, leveraging sort keys, dist keys, late-binding views, Spectrum external tables, and incremental load logic to support analytics, compliance reporting, and BI dashboards. Implemented Kinesis Firehose → S3 → Redshift streaming pipelines to deliver near real-time transaction data with schema-aware transformations. Built QuickSight dashboards to visu
Sr Data Engineer at Keeper AI
October 1, 2023 - January 31, 2024
Built end-to-end ETL/ELT pipelines using AWS Glue Jobs and Glue Workflows, ingesting structured and semi-structured datasets into S3 data lake zones and Redshift fact/dimension layers. Developed Spark transformations using PySpark on EMR, enabling fast processing of user behavior data, clickstreams, chat interactions, and engagement logs across distributed clusters. Automated event-driven ingestion using EventBridge + Lambda, capturing real-time interactions and storing them in S3 for downstream processing. Created QuickSight dashboards integrated with Redshift to analyze engagement metrics, behavioral signals, feature usage, churn indicators, and user journey flows. Integrated Python ETL code with Boto3 to extract data from REST APIs, third-party tools, and cloud storage sources, transforming them into analytics-ready datasets. Designed DynamoDB tables for high-throughput ingestion of user interactions, powering real-time recommendation logic and graph-based queries. Implemented end-t
Data Engineer at Pimco
January 1, 2023 - September 30, 2023
Designed and optimized a daily-refresh AWS Glue pipeline delivering curated CRM, market, and sales datasets using S3, Glue, EMR, Lambda, and Redshift, achieving a 35% reduction in refresh time and ensuring on-time daily delivery. Built scalable daily ETL pipelines using AWS Glue (PySpark) to ingest CRM and market data from multiple sources into S3-based raw, refined, and curated layers. Engineered Python/Spark ETL logic on EMR to support low-latency transformations with optimized join strategies, broadcast hints, bucketing, and parallelism tuning. Automated workflows using Step Functions, orchestrating Glue Jobs, Lambda triggers, and Redshift queries with detailed audit logging and failure recovery logic. Built partitioned S3 storage layouts using Hive-style folder structures, Glue bookmarks, and incremental load patterns to reduce processing time and costs. Integrated real-time CRM events using API Gateway + Lambda, capturing updates and appending them to curated datasets. Developed Q
Data Engineer at Bosch
February 1, 2021 - August 31, 2022
Built ingestion pipelines using AWS Glue + Lambda to extract machine logs, IoT data, and production files (JSON/CSV/Excel) into S3 with automated error-handled workflows. Developed distributed PySpark transformations on EMR, converting raw machine signals into analytics-ready datasets for predictive maintenance and quality monitoring. Migrated legacy Hive SQL + on-prem ETL logic into Spark/Glue equivalents, significantly improving runtime and pipeline maintainability. Created real-time dashboards using QuickSight, integrating Redshift and S3 datasets to visualize machine health, production KPIs, defect patterns, and utilization metrics. Orchestrated batch and streaming workflows using Apache Airflow + Step Functions, managing dependencies and execution logic across multiple business divisions. Implemented infrastructure-as-code using Terraform to deploy S3 buckets, EMR clusters, Glue Jobs, IAM roles, and KMS encryption setups. Optimized SQL transformations in Redshift using distributio
Jr Data Engineer at HSBC
July 1, 2019 - January 31, 2021
Assisted senior engineers in developing foundational ETL pipelines using AWS Glue + Python, enabling automated ingestion from on-prem SQL servers into S3 and Redshift. Wrote SQL queries on Redshift and RDS to support data validation, profiling, reconciliation, and reporting tasks. Developed Python scripts using Boto3 for data cleaning, formatting, and transformation of financial datasets prior to Redshift loading. Supported the creation of QuickSight dashboards to monitor operational KPIs, customer activity, and data quality indicators. Participated in Agile standups, sprint planning, and documentation of pipeline logic using Git and Confluence.

Education

Master of Science in Data Science at Pace University
January 11, 2030 - January 7, 2026
Bachelor of Engineering at Nagpur University
January 11, 2030 - January 7, 2026

Industry Experience

Financial Services, Software & Internet, Professional Services