Skills
Work Experience
Big Data Engineer at IBM/Truist Bank
November 1, 2024 - October 1, 2025

- Built and operated large-scale data pipelines on AWS using S3, Glue, EMR, Spark, and Snowflake, processing 2–3 million records daily and improving analytics data readiness by nearly 30%.
- Developed batch and incremental ETL workflows using PySpark, Informatica IICS, and SSIS, integrating data across Oracle, Teradata, HDFS, and Snowflake while stabilizing Autosys job schedules.
- Designed scalable dimensional and relational data models in Snowflake and Redshift using star and snowflake schemas, improving dashboard query performance by 35% for business and risk analytics teams.
- Wrote optimized SQL, T-SQL, PL/SQL, and Python scripts to support high-volume loads of 50–80 GB per cycle with SQL*Loader, Teradata FastLoad, and MultiLoad.
- Tuned Spark and Informatica workloads through partitioning, lookup optimization, and parallel processing, shrinking ETL processing windows by 25–30%.
- Implemented data quality checks, SCD/CDC logic, and monitoring using CloudWatch, keeping warehouse layers o…
Data Engineer at Capital One
February 1, 2024 - October 1, 2024

- Architected and maintained AWS-based data pipelines using Python, Spark, and Kafka to process 4–6 TB of daily transactions, delivering cleaner data to risk and fraud teams within minutes.
- Migrated ELT pipelines to Snowflake and S3 with Python and Spark, reducing manual prep time by 25% and saving 50+ hours weekly.
- Collaborated with data scientists and platform engineers to deploy ML-driven fraud features, contributing to 8–12% accuracy improvements.
- Implemented data profiling and quality checks across 300+ tables to detect schema changes and sensitive fields, supporting audits and AML compliance.
- Integrated lineage and alerting to improve reliability and reduce operational noise.
Data Engineer at Nanthealth
August 1, 2023 - November 1, 2023

- Coordinated with DBAs to validate new tables and metadata across DB2, SQL Server, and Oracle, enabling smooth migrations to AWS Aurora and reducing release issues by 20%.
- Migrated legacy workloads to AWS by configuring EC2, S3, RDS, and Glue jobs; automated ingestion with Lambda, Kinesis, and SQS, cutting manual effort by ~35%.
- Built PySpark, dbt, and Snowflake pipelines (Snowpipe/Streams) for JSON, CSV, and Parquet files, improving daily loads by 30%.
- Designed and tuned Qlik and QuickSight dashboards, managing refreshes and addressing performance issues to cut load times by ~25%.
- Developed Python scripts and API utilities to move data between S3 and SQL Server, and supported data modeling in ERwin.
Data Engineer at Amazon Development Centre
August 1, 2020 - December 1, 2021

- Built and coordinated data pipelines across AWS Glue, EMR, Athena, Redshift, and S3, processing nearly 2–3 TB of daily data and boosting data availability for analytics teams by 30%.
- Created ELT workflows using dbt, PySpark, and SQL to load and transform datasets into clean dimensional models, reducing downstream report refresh times by 25–28%.
- Set up CI/CD automation with Jenkins, GitHub, Terraform, and Docker, reducing manual deployment effort by 40% and enabling pipeline changes in hours.
- Migrated older ETL infrastructure to cloud-based AWS services, resulting in 35% faster ETL processing and 15% cost reductions.
- Built Talend, Airflow, and PySpark jobs to load millions of records into S3 and Redshift, adding quality checks and CloudWatch monitoring that cut data issues by 15%.
SQL Developer at Byju’s Think & Learn Pvt Ltd
December 1, 2018 - May 1, 2020

- Managed daily SQL and PL/SQL development for student, course, and content data used by analytics teams and 1,500+ internal users.
- Structured relational models and optimized queries, indexes, and stored procedures, reducing average report load times by 25–35%.
- Built ETL workflows with SSIS, Informatica IICS, and Teradata tools to move and validate several million records per day across Oracle, SQL Server, AWS, and HDFS.
- Automated recurring data loads and quality checks using SQL*Loader, UNIX shell scripts, and Autosys, cutting manual intervention by 40% and improving refresh consistency.
- Initiated schema documentation, SCD logic, and CDC updates, collaborating with analysts and developers to resolve data gaps and improve KPI reporting by ~18%.
Education
Master of Science in Business Analytics at University of Texas at Dallas, Richardson, TX, USA
December 1, 2023

Bachelor of Technology in Mechanical Engineering at VIT University, Vellore, Tamil Nadu, India
May 1, 2017

Qualifications
AWS Certified Cloud Practitioner
Valid through January 29, 2026

Industry Experience
Financial Services, Healthcare, Education, Professional Services, Software & Internet