Involved in all phases of the SDLC. Developed and deployed scalable ETL pipelines in Databricks using PySpark and Delta Lake to process and transform large-scale data on AWS S3 and Azure Data Lake. Configured and managed Databricks clusters, including job scheduling, autoscaling, and performance tuning for cost optimization. Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple sources. Implemented Hive UDFs and created external and staging tables with static and dynamic partitions and bucketing. Migrated a client data warehouse from on-premises to Azure. Integrated streaming with Kafka. Built end-to-end ETL and data movement solutions using Azure Data Factory and SSIS. Created Oozie workflows to orchestrate Hive, Pig, Sqoop, and Spark jobs. Implemented autoscaling for fault-tolerant systems. Worked with HBase to load large semi-structured datasets.

Data Engineer/Spark at Lowe's

June 1, 2022 - August 1, 2024

Developed Spark jobs in Scala and Python on YARN/MRv2 for interactive and batch analysis; migrated data pipelines from on-premises to Azure; evaluated Spark performance on genomic data; built a NiFi workflow to ingest data from SFTP into Kafka; used Python libraries (NumPy, Pandas) and NLP tooling for ETL and downstream NLP analyses; migrated MapReduce programs to the Spark RDD/DataFrame APIs; designed data flows using Azure Data Lake and Data Factory, with dashboards in Power BI; implemented real-time streaming with Kafka.

Data Engineer at Fifth Third Bank

February 1, 2019 - May 1, 2022

Administered Cloudera Hadoop clusters; analyzed the Hadoop stack and tools (Pig, Hive, HBase, Sqoop); built Python-based visualizations with Tableau; implemented MapReduce-driven ETL across 20+ sources; moved data between HDFS and AWS Redshift; created Hive external tables and HiveQL queries; validated data via custom MapReduce cleansing; facilitated data movement using SSIS and SSRS; managed cluster health with ZooKeeper.

Data Engineer at Cybage Software Private Limited

July 1, 2017 - November 1, 2018

Developed MapReduce jobs in Java for data cleaning and pre-processing; installed and maintained Flume, Pig, Hive, and HBase; collaborated with the BI team on reporting; applied text mining to fraud patterns; used Sqoop to import and export data; built Hive queries and data loading pipelines; automated data loading with Oozie; supported data mining and clustering initiatives.

Junior Data Engineer at Dhruvsoft Services Private Limited

March 1, 2016 - June 1, 2017

Assisted in setting up a small Hadoop cluster; supported MapReduce log parsing and data transformations; wrote Hive and Pig scripts for semi-structured web logs; imported data from MySQL into HDFS using Sqoop; monitored jobs and documented pipelines; contributed to data validation and reporting.