Skills
Work Experience
Big Data Engineer at IBM/Truist Bank
November 1, 2024 - October 1, 2025

- Built and operated large-scale data pipelines on AWS using S3, Glue, EMR, Spark, and Snowflake, processing 2–3 million records daily and improving analytics data readiness by nearly 30%.
- Developed batch and incremental ETL workflows using PySpark, Informatica IICS, and SSIS, integrating data across Oracle, Teradata, HDFS, and Snowflake while stabilizing Autosys job schedules.
- Designed scalable dimensional and relational data models in Snowflake and Redshift using star and snowflake schemas, improving dashboard query performance by 35% for business and risk analytics teams.
- Wrote optimized SQL, T-SQL, PL/SQL, and Python scripts to support high-volume loads of 50–80 GB per cycle with SQL*Loader, Teradata FastLoad, and MultiLoad.
- Tuned Spark and Informatica workloads through partitioning, lookup optimization, and parallel processing, shrinking ETL processing windows by 25–30%.
- Implemented data quality checks, SCD/CDC logic, and monitoring using CloudWatch, keeping warehouse layers o…
Data Engineer at Capital One
February 1, 2024 - October 1, 2024

- Architected and maintained AWS-based data pipelines using Python, Spark, and Kafka to process 4–6 TB of daily transactions, delivering cleaner data to risk and fraud teams within minutes.
- Migrated ELT pipelines to Snowflake and S3 with Python and Spark, reducing manual prep time by 25% and saving 50+ hours weekly.
- Collaborated with data scientists and platform engineers to deploy ML-driven fraud features, contributing to 8–12% accuracy improvements.
- Implemented data profiling and quality checks across 300+ tables to detect schema changes and sensitive fields, supporting audits and AML compliance.
- Integrated lineage and alerting to improve reliability and reduce operational noise.
Data Engineer at Nanthealth
August 1, 2023 - November 1, 2023

- Coordinated with DBAs to validate new tables and metadata across DB2, SQL Server, and Oracle, enabling smooth migrations to AWS Aurora and reducing release issues by 20%.
- Migrated legacy workloads to AWS by configuring EC2, S3, RDS, and Glue jobs; automated ingestion with Lambda, Kinesis, and SQS, cutting manual effort by ~35%.
- Built PySpark, dbt, and Snowflake pipelines (Snowpipe/Streams) for JSON, CSV, and Parquet files, improving daily loads by 30%.
- Designed and tuned Qlik and QuickSight dashboards, managing refreshes and addressing performance issues to cut load times by ~25%.
- Developed Python scripts and API utilities to move data between S3 and SQL Server, and supported data modeling in ERwin.
Data Engineer at Amazon Development Centre
August 1, 2020 - December 1, 2021

- Built and coordinated data pipelines across AWS Glue, EMR, Athena, Redshift, and S3, processing nearly 2–3 TB of daily data and boosting data availability for analytics teams by 30%.
- Created ELT workflows using dbt, PySpark, and SQL to load and transform datasets into clean dimensional models, reducing downstream report refresh times by 25–28%.
- Set up CI/CD automation with Jenkins, GitHub, Terraform, and Docker, reducing manual deployment effort by 40% and enabling pipeline changes in hours.
- Migrated older ETL infrastructure to cloud-based AWS services, resulting in 35% faster ETL processing and 15% cost reductions.
- Built Talend, Airflow, and PySpark jobs to load millions of records into S3 and Redshift, adding quality checks and CloudWatch monitoring that cut data issues by 15%.
SQL Developer at Byju’s Think & Learn Pvt Ltd
December 1, 2018 - May 1, 2020

- Managed daily SQL and PL/SQL development for student, course, and content data used by analytics teams and 1,500+ internal users.
- Structured relational models and optimized queries, indexes, and stored procedures, reducing average report load times by 25–35%.
- Built ETL workflows with SSIS, Informatica IICS, and Teradata tools to move and validate several million records per day across Oracle, SQL Server, AWS, and HDFS.
- Automated recurring data loads and quality checks using SQL*Loader, UNIX shell scripts, and Autosys, cutting manual intervention by 40% and improving refresh consistency.
- Initiated schema documentation, SCD logic, and CDC updates, collaborating with analysts and developers to resolve data gaps and improve KPI reporting by ~18%.
Education
Master of Science in Business Analytics at University of Texas at Dallas, Richardson, TX, USA
December 1, 2023

Bachelor of Technology in Mechanical Engineering at VIT University, Vellore, Tamil Nadu, India
May 1, 2017

Qualifications
AWS Certified Cloud Practitioner
Valid through January 29, 2026

Industry Experience
Financial Services, Healthcare, Education, Professional Services, Software & Internet