Venkat Sainath

Available to hire

Experience Level

Expert
Intermediate

Language

English
Fluent

Work Experience

Azure Data Engineer at IBM
October 1, 2023 - November 21, 2025
Integrated Azure Data Factory (ADF) into the data engineering ecosystem to orchestrate and automate data movement, transformations, and workflows across on-premises sources, Azure storage, and Snowflake. Used Azure Databricks and PySpark for complex data transformation, wrangling, and processing in the Azure cloud. Developed and optimized PySpark jobs for data cleansing, aggregation, and enrichment, ensuring data quality before loading into the Snowflake data warehouse. Designed and implemented database schemas, tables, views, and stored procedures to support data modeling and analysis. Conducted data quality assessments and data cleansing and validation. Participated in database migration projects, upgrading SQL Server and migrating data to cloud-based PostgreSQL. Guided junior team members on SQL best practices and query optimization. Enhanced data pipelines by integrating ADF, Databricks, and PySpark, and established robust, scalable pipelines with supporting documentation.
Data Engineer at NTT Data
September 1, 2023 - September 1, 2023
Led the migration of native Spark workloads to Databricks. Implemented data governance, including Databricks Catalog, metadata management, and data lineage. Ensured regulatory compliance and governance across Wells Fargo data assets. Applied Databricks optimization techniques (Auto Compaction, Ordering, Vacuum) to improve data usability, storage efficiency, and query performance. Designed cost-effective infrastructure configurations in Databricks, optimizing resource utilization and minimizing costs. Used Massively Parallel Processing (MPP) in Spark SQL and PySpark. Developed orchestration with Databricks Jobs and Azure Data Factory for automated data workflows, scheduling, and dependency management. Implemented CI/CD pipelines for Databricks using Azure DevOps, near real-time data processing with Delta Lake and Structured Streaming, and robust security in Delta Lake. Integrated data from on-premises systems, SAP, and cloud storage into Databricks using ADF and PySpark. Strengthened data governance
Data Engineer at Wipro
February 28, 2019 - February 28, 2019
Migrated data from on-premises systems to AWS S3 using AWS Transfer Family and RDS. Developed data pipelines to ingest data into Apache Druid from multiple data sources. Converted Hive/SQL queries into Spark transformations using Spark SQL and DataFrames. Deployed Hadoop clusters and Spark on Kubernetes. Created Python-based tools for ad hoc data analysis and reporting, and developed dashboards and reports. Integrated Python with big data frameworks for large-scale processing and developed PySpark scripts. Used the HDFS Sink Connector to transfer data from Kafka topics to files on HDFS clusters. Built and administered a 10-node Kafka cluster and implemented Kafka MirrorMaker. Used Flink Streaming for real-time processing and deployed new APIs. Created automation scripts to onboard data applications into Jenkins CI/CD pipelines, and scheduled tasks with Airflow.
Data Engineer at NTT Data
September 30, 2023 - September 30, 2023
Led the migration of native Spark workloads to Databricks for Wells Fargo's data assets in a cloud-first environment. Implemented data governance through Databricks Catalog, metadata management, and data lineage. Applied Auto Compaction, Ordering, and Vacuuming to improve Spark SQL query response times and storage efficiency. Designed cost-effective infrastructure configurations in Databricks to optimize resources and reduce cost. Used Massively Parallel Processing (MPP) in Spark SQL and PySpark to accelerate queries. Built an orchestration layer with Databricks Jobs and Azure Data Factory to automate data workflows, scheduling, and dependency management. Established CI/CD pipelines with Azure DevOps for Databricks code. Implemented near real-time data pipelines using Delta Lake and Structured Streaming. Enforced security in Delta Lake with access controls and data masking. Ingested data from on-premises systems, SAP, and Azure Data Lake Storage via ADF and PySpark. Enhanced metadata management an

Education

Master's in IT & Systems (Australia)
January 11, 2030 - November 21, 2025

Qualifications

Bachelor's in Electronics & Communication Engineering
January 11, 2030 - November 21, 2025
Master's in IT & Systems
January 11, 2030 - November 21, 2025

Industry Experience

Software & Internet, Professional Services, Financial Services