Integrated Azure Data Factory into the data engineering ecosystem, orchestrating and automating data movement, transformations, and workflows between on-premises sources, Azure storage, and Snowflake. Leveraged Azure Databricks and PySpark for complex data transformations, wrangling, and processing within the Azure cloud. Developed and optimized PySpark jobs for data cleansing, aggregation, and enrichment, ensuring data quality before loading into Snowflake's data warehouse. Designed and implemented database schemas, tables, views, and stored procedures to support data modeling and analysis. Conducted data quality assessments, data cleansing, and validation. Participated in database migration projects, upgrading SQL Server and migrating data to cloud-based PostgreSQL. Provided guidance to junior team members on SQL best practices and query optimization. Enhanced data pipelines by integrating ADF, Databricks, and PySpark; established robust, scalable pipelines and documentation.

Data Engineer at NTT Data

September 30, 2023 - September 30, 2023

Led migration of native Spark workloads to Databricks for Wells Fargo's data assets in a cloud-first environment. Implemented data governance via Databricks Catalog, metadata management, and data lineage. Applied Auto Compaction, Ordering, and Vacuuming to improve Spark SQL query response times and storage efficiency. Designed cost-effective infrastructure configurations in Databricks to optimize resource use and reduce cost. Used massively parallel processing (MPP) in Spark SQL and PySpark to accelerate queries. Built an orchestration layer with Databricks Jobs and Azure Data Factory to automate data workflows, scheduling, and dependency management. Established CI/CD pipelines with Azure DevOps for Databricks code. Implemented near real-time data pipelines using Delta Lake and Structured Streaming. Enforced security in Delta Lake with access controls and data masking. Ingested data from on-premises systems, SAP, and Azure Data Lake Storage via ADF and PySpark. Enhanced metadata management an

Data Engineer at NTT Data

September 1, 2023 - September 1, 2023

Led migration of native Spark workloads to Databricks and implemented data governance, including Databricks Catalog, metadata management, and data lineage. Ensured regulatory compliance and governance across Wells Fargo assets. Implemented optimization techniques in Databricks (Auto Compaction, Ordering, Vacuum) to enhance data usability, storage efficiency, and query performance. Designed cost-effective infrastructure configurations in Databricks, optimizing resource utilization and minimizing costs. Utilized massively parallel processing (MPP) in Spark SQL and PySpark. Developed orchestration with Databricks Jobs and Azure Data Factory for automated data workflows, scheduling, and dependency management. Implemented CI/CD pipelines for Databricks using Azure DevOps, near real-time data processing with Delta Lake and Structured Streaming, and robust security in Delta Lake. Integrated data from on-premises systems, SAP, and cloud storage into Databricks using ADF and PySpark. Strengthened data governance

Data Engineer at Wipro

February 28, 2019 - February 28, 2019

Migrated data from on-premises systems to AWS S3 using AWS Transfer Family and RDS. Developed data pipelines to ingest data into Apache Druid from multiple data sources. Converted Hive/SQL queries into Spark transformations using Spark SQL and DataFrames. Deployed Hadoop clusters and Spark on Kubernetes. Created Python-based tools for ad-hoc data analysis and reporting; developed dashboards and reports. Integrated Python with big data frameworks for large-scale processing; developed PySpark scripts. Used the HDFS Sink Connector to transfer data from Kafka topics to files on HDFS clusters. Built and administered a 10-node Kafka cluster and implemented Kafka MirrorMaker. Used Flink streaming for real-time processing and deployed new APIs. Created automation scripts to onboard data applications into CI/CD pipelines with Jenkins and scheduled tasks with Airflow.