Tarun Reddy

Available to hire

I'm Tarun Reddy, a Data Engineer with 7+ years of hands-on experience in the analysis, design, development, testing, deployment, and maintenance of IT data projects. I have strong expertise in the Hadoop ecosystem (HDFS, MapReduce, Pig, Hive, HBase), Spark (Scala and PySpark), and cloud platforms (AWS and Azure). I design and implement scalable data pipelines, perform data profiling and cleansing, and build data models using star and snowflake schemas for data warehouses and BI dashboards.

I enjoy turning complex data into actionable insights through Tableau and Power BI, automating pipelines with Airflow and Azure Data Factory, and collaborating in Agile teams. I leverage containerization (Docker) and CI/CD practices to deliver reliable solutions, and I continuously learn to stay ahead of evolving data technologies.
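As an illustration of the star-schema modeling mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_date) are hypothetical examples, not taken from any actual project.

```python
import sqlite3

# A star schema: one central fact table joined to denormalized dimension tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_date VALUES (10, 2023, 5), (11, 2023, 6);
INSERT INTO fact_sales VALUES (1, 10, 100.0), (2, 10, 50.0), (1, 11, 75.0);
""")

# A typical BI query: aggregate the fact table by a dimension attribute.
cur.execute("""
SELECT d.year, d.month, SUM(f.amount)
FROM fact_sales f JOIN dim_date d ON f.date_id = d.date_id
GROUP BY d.year, d.month ORDER BY d.month
""")
rows = cur.fetchall()
print(rows)  # -> [(2023, 5, 150.0), (2023, 6, 75.0)]
```

A snowflake schema would further normalize the dimensions (e.g. splitting category out of dim_product into its own table) at the cost of extra joins.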


Experience Level

Expert (10 skills), Intermediate (3 skills)

Language

English
Fluent

Work Experience

Data Engineer at American Airlines
May 1, 2023 - Present
- Led development of Spark-based data transformations and loads into HDFS using RDDs, DataFrames, and Datasets; developed Spark UDFs to deduplicate columns and identify alphanumeric variables.
- Designed and implemented Scala workflows to pull data from cloud-based systems and apply transformations.
- Built a Spark Streaming application to ingest data from the cloud into Hive, and shell scripts to ingest data into HDFS with Hive partitioning.
- Optimized Spark SQL queries and DataFrame operations; integrated data from Oracle, Netezza, and Teradata.
- Built data pipelines feeding Tableau visuals; performed data analysis and profiling using SQL across Oracle, Netezza, and Teradata.
- Implemented data ingestion into Oracle via Airflow.
- Integrated Azure Data Factory with Blob Storage to move data through Databricks into Data Lake Storage and Azure SQL Data Warehouse; created Azure Data Factory pipelines to extract, transform, and load data from multiple sources.
- Performed data pre-processing and cleansing.
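The column-deduplication and alphanumeric-detection UDFs described above can be sketched in plain Python. The function names, delimiter, and sample values below are illustrative assumptions, not the original code; in PySpark the same logic would typically be registered as a UDF and applied to a DataFrame column.

```python
def dedupe_column(value, sep=","):
    """Remove duplicate tokens from a delimited column value, preserving order."""
    if value is None:
        return None
    seen = []
    for token in value.split(sep):
        token = token.strip()
        if token and token not in seen:
            seen.append(token)
    return sep.join(seen)

def is_alphanumeric(value):
    """Flag values that mix letters and digits (e.g. codes like 'AA123')."""
    return (value is not None
            and any(c.isalpha() for c in value)
            and any(c.isdigit() for c in value))

print(dedupe_column("AA123, B7, AA123, cargo"))  # -> AA123,B7,cargo
print(is_alphanumeric("AA123"))                  # -> True
```

Keeping the logic as a plain function makes it unit-testable independently of Spark before it is wrapped in a UDF.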
Data Engineer at ADP
November 1, 2022 - April 1, 2023
- Gathered requirements with business users to capture evolving analytics needs.
- Developed Spark applications in Scala to enrich clickstream data merged with user-profile data.
- Designed Spark workflows to pull data from AWS S3 and Snowflake, applying transformations and loading into Spark RDDs for in-memory processing.
- Built simple-to-complex MapReduce jobs via Hive and designed data pipelines to support analytics.
- Implemented Apache Airflow DAGs to automate data ingestion and retrieval.
- Profiled structured, semi-structured, and unstructured data across sources to identify patterns, and implemented data-quality metrics with queries and Python scripts.
- Created Hive queries to process large datasets and stored results in managed/external tables.
- Prepared Tableau dashboards summarizing configuration, quote, order, and e-commerce data.
- Implemented Databricks notebooks in PySpark for data transformation, landed data into Delta Lake storage, and orchestrated data movement.
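The data-quality metrics mentioned above (implemented with queries and Python scripts) can be sketched as a small profiling helper. The metric names and record layout here are illustrative assumptions, not the original implementation.

```python
def profile_column(records, column):
    """Compute simple data-quality metrics for one column of a list of dicts:
    row count, null rate, and distinct-value count."""
    values = [r.get(column) for r in records]
    total = len(values)
    nulls = sum(1 for v in values if v is None or v == "")
    distinct = len({v for v in values if v not in (None, "")})
    return {
        "count": total,
        "null_rate": round(nulls / total, 3) if total else 0.0,
        "distinct": distinct,
    }

rows = [
    {"order_id": "O1", "region": "NA"},
    {"order_id": "O2", "region": None},
    {"order_id": "O3", "region": "NA"},
    {"order_id": "O4", "region": "EU"},
]
print(profile_column(rows, "region"))  # -> {'count': 4, 'null_rate': 0.25, 'distinct': 2}
```

The same metrics map directly onto SQL (COUNT(*), COUNT of NULLs, COUNT(DISTINCT col)) when profiling tables in place rather than in Python.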
Data Engineer at TransKwik, India
July 1, 2018 - July 1, 2022
- Gathered business requirements and designed data products.
- Developed Spark applications for large-scale transformations and denormalization of relational datasets.
- Converted Hive/SQL queries into Spark transformations using Scala and Python, and implemented UDFs for JSON/XML data handling.
- Wrote Terraform scripts to automate AWS services (ELB, CloudFront, RDS, EC2, S3) and deployed AWS Lambda via Terraform and CloudFormation.
- Ingested data from AWS S3 into Spark RDDs, performed transformations and actions, and created MapReduce equivalents for semi-structured and unstructured data (XML, JSON, log files).
- Built NoSQL data models in HBase and integrated with S3, DynamoDB, and Snowflake.
- Created Tableau dashboards and participated in weekly SCRUMs in an Agile environment.
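The JSON-handling UDFs mentioned above can be illustrated in plain Python with the standard json module. The function name and field names are hypothetical; in Spark the same function would be wrapped as a UDF and applied to a string column of raw records.

```python
import json

def extract_field(raw_json, field, default=None):
    """Safely pull one field out of a semi-structured JSON string,
    returning a default on malformed input (common in log ingestion)."""
    try:
        return json.loads(raw_json).get(field, default)
    except (json.JSONDecodeError, TypeError, AttributeError):
        return default

print(extract_field('{"event": "click", "user": "u42"}', "user"))  # -> u42
print(extract_field('not-json', "user", default="unknown"))        # -> unknown
```

Swallowing parse errors into a default (rather than failing the job) is a deliberate choice for dirty log data; bad records can then be filtered or quarantined downstream.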


Industry Experience

Software & Internet, Professional Services, Media & Entertainment

