Mamatha Reddy

Available to hire

Hi, I’m Mamatha Reddy. I’m a detail-oriented Data Engineer with 5+ years of experience designing and implementing big data and analytics solutions across AWS, Azure, and GCP. I specialize in Databricks Lakehouse architecture, Delta Lake, Unity Catalog, Auto Loader, and Structured Streaming, with deep hands-on expertise in Spark, Hadoop, and Python for building scalable data pipelines and reliable data platforms.

Working across the healthcare and financial services domains, I focus on data quality, governance, and performance optimization, delivering near real-time analytics and governance-ready data assets. I’m adept at automating end-to-end data workflows, implementing secure CI/CD pipelines, and enabling proactive monitoring and observability to minimize downtime and maintain regulatory compliance.
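
For a concrete flavor of the Auto Loader and Structured Streaming work mentioned above, here is a minimal sketch of incremental file ingestion into a Delta table. It assumes a Databricks runtime, and every path and table name is an illustrative placeholder rather than an artifact of a real project.

```python
# Minimal Databricks Auto Loader sketch: incrementally pick up new files
# from cloud storage and stream them into a Delta table.
# All paths and table names are hypothetical placeholders; `spark` is the
# session provided by a Databricks notebook or job.
(spark.readStream
    .format("cloudFiles")                                    # Auto Loader source
    .option("cloudFiles.format", "json")                     # raw landing format
    .option("cloudFiles.schemaLocation", "/mnt/chk/claims/schema")
    .load("/mnt/landing/claims/")                            # directory to watch
    .writeStream
    .option("checkpointLocation", "/mnt/chk/claims/stream")  # progress tracking
    .trigger(availableNow=True)                              # drain backlog, then stop
    .toTable("bronze.claims_raw"))                           # bronze-layer Delta table
```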

Language

English: Fluent

Work Experience

Data Engineer at Molina Healthcare
January 1, 2024 - November 7, 2025
- Designed and implemented end-to-end Azure Data Factory (ADF) pipelines to ingest large-scale claims, enrollment, and provider datasets from on-prem and SaaS systems into Azure Data Lake Storage Gen2 and Snowflake, reducing manual data movement by 70%.
- Built a medallion-architecture lakehouse in Azure Databricks integrating structured SQL Server and Snowflake data with unstructured healthcare data, enabling unified access for analytics and compliance reporting.
- Developed PySpark-based ETL frameworks to transform multi-million-record member and claims feeds into curated gold layers, improving Azure Synapse query performance by 45% and enabling near real-time Power BI dashboards.
- Implemented incremental CDC ingestion pipelines using ADF Mapping Data Flows and Delta Lake, reducing latency across 15+ domains (a MERGE-based sketch follows below).
- Implemented metadata-driven ADF templates with dynamic datasets, reducing redundancy across 200+ workflows.
- Optimized Spark cluster performance and Synapse queries.
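
As a hedged illustration of the incremental CDC pattern in the bullets above, the snippet below upserts a change batch into a Delta table with MERGE. It assumes a Databricks/Delta environment; the table names, join key, and the op change-flag column are hypothetical.

```python
# Sketch of a CDC upsert into a Delta silver table via MERGE.
# Table names, the join key, and the `op` flag are hypothetical.
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "silver.members")
changes = spark.read.table("bronze.members_cdc")    # latest CDC micro-batch

(target.alias("t")
    .merge(changes.alias("s"), "t.member_id = s.member_id")
    .whenMatchedDelete(condition="s.op = 'D'")      # apply CDC deletes
    .whenMatchedUpdateAll(condition="s.op = 'U'")   # apply updates
    .whenNotMatchedInsertAll()                      # apply inserts
    .execute())
```
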
Data Engineer at Edward Jones
December 31, 2023 - December 31, 2023
- Developed and optimized large-scale data pipelines on Google Cloud Platform using Cloud Composer, Dataflow, and BigQuery to process financial datasets exceeding several terabytes, improving data availability by 65%.
- Orchestrated real-time ETL workflows with Pub/Sub, Dataflow, and the BigQuery Streaming API to handle live trades and client orders, achieving sub-minute latency for dashboards (see the sketch below).
- Built incremental ingestion frameworks with Apache Beam, dbt, and BigQuery for daily portfolio valuations and risk metrics, reducing execution time by 40%.
- Created parameterized Dataflow templates for ingesting data from CRM, brokerage feeds, and custodial APIs, reducing code duplication by 50%.
- Implemented data quality validation with Cloud Functions, Python, and BigQuery to detect schema drift, duplicates, and reconciliation mismatches, improving accuracy by 35%.
- Built analytical data marts in BigQuery and dbt to power near real-time KPIs such as AUM growth, trade execution speed, and client retention.
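
To make the streaming ETL bullet above concrete, here is a minimal Apache Beam sketch that reads trade events from Pub/Sub and streams them into BigQuery. The project, topic, table, and schema are hypothetical placeholders, not details from the actual engagement.

```python
# Sketch of a streaming Beam pipeline: Pub/Sub trade events into BigQuery.
# The topic, table, and schema below are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

opts = PipelineOptions(streaming=True)  # run in streaming mode

with beam.Pipeline(options=opts) as p:
    (p
     | "ReadTrades" >> beam.io.ReadFromPubSub(topic="projects/demo/topics/trades")
     | "Parse"      >> beam.Map(json.loads)  # bytes -> dict (valid JSON assumed)
     | "WriteBQ"    >> beam.io.WriteToBigQuery(
           "demo:markets.trades",
           schema="trade_id:STRING,symbol:STRING,qty:INTEGER,price:FLOAT",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```
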
Data Engineer at Chubb
July 31, 2022 - July 31, 2022
Client: Chubb (Hyderabad, India)
- Developed automated ETL frameworks on AWS Glue integrated with Amazon S3 and Snowflake, enabling ingestion of high-volume policy and claims data and accelerating analytics (a Glue job skeleton follows below).
- Built a serverless data lake and warehouse using AWS Lambda, S3, the AWS Glue Data Catalog, and Snowflake to centralize actuarial and claims datasets.
- Created PySpark transformations in AWS Glue Studio and orchestrated Snowflake data loads, cleansing and enriching 10M+ daily transactions and improving Redshift and Snowflake query performance by 40%.
- Engineered incremental CDC pipelines using AWS DMS, Kinesis Data Streams, and Snowflake Streams & Tasks to achieve near-zero-latency ingestion from on-prem SQL systems.
- Partitioned, clustered, and compressed data in Redshift, S3, and Snowflake to cut storage costs by 30%.
- Integrated AWS Step Functions and Snowflake Tasks to orchestrate data movement across services.
- Enforced data quality validation with Python and SQL using CloudWatch and Snowpipe.
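
The Glue-based ETL work above follows the usual shape of a Glue job; below is a hedged skeleton of such a job, reading a catalog table, applying a light cleanse, and writing partitioned Parquet to S3. The database, table, bucket, and field names are invented for illustration.

```python
# Skeleton of an AWS Glue PySpark job: catalog read -> cleanse -> S3 write.
# All database, table, bucket, and field names are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue = GlueContext(SparkContext.getOrCreate())
job = Job(glue)
job.init(args["JOB_NAME"], args)

# Read raw claims from the Glue Data Catalog.
claims = glue.create_dynamic_frame.from_catalog(
    database="insurance_raw", table_name="claims")

# Drop records Spark could not parse; a real job would do richer cleansing.
cleaned = claims.drop_fields(["_corrupt_record"])

# Write curated, partitioned Parquet back to S3.
glue.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://demo-curated/claims/",
                        "partitionKeys": ["policy_year"]},
    format="parquet")

job.commit()
```
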
Data Engineer at Cigniti Technologies
April 30, 2020 - April 30, 2020
- Implemented data ingestion pipelines using AWS Glue, Lambda, and Kinesis to process real-time streaming data from multiple sources with minimal latency.
- Developed and optimized ETL workflows with AWS Glue Studio and PySpark for data transformation, cleansing, and schema standardization across large datasets.
- Managed data storage on Amazon S3 with cost-efficient lifecycle management, partitioning, and version control for scalable data lakes.
- Built automated data validation and quality checks with AWS Lambda and CloudWatch, improving data reliability (see the sketch below).
- Integrated Amazon Redshift for analytical workloads, optimizing SQL queries and configuring distribution keys to enhance performance.
- Collaborated with cross-functional teams to deploy data solutions via AWS CodePipeline and CloudFormation, ensuring reproducibility and adherence to DevOps best practices.
- Implemented secure IAM roles, KMS encryption, and VPC configurations to maintain data governance and security standards.
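
As one possible shape for the automated validation bullet above, here is a small Lambda-style quality check in Python that flags schema drift and duplicate keys, then publishes a CloudWatch metric. The event fields, expected columns, and metric names are all assumptions.

```python
# Sketch of a Lambda-style data quality check: schema-drift and duplicate
# detection over a Parquet batch, with a CloudWatch metric for monitoring.
# Event fields, expected columns, and metric names are hypothetical;
# reading s3:// paths with pandas requires the s3fs package.
import boto3
import pandas as pd

EXPECTED_COLUMNS = {"txn_id", "amount", "event_ts"}

def handler(event, context):
    df = pd.read_parquet(f"s3://{event['bucket']}/{event['key']}")

    missing = EXPECTED_COLUMNS - set(df.columns)      # schema drift check
    if missing:
        raise ValueError(f"Schema drift: missing columns {missing}")

    dupes = int(df["txn_id"].duplicated().sum())      # duplicate key check
    boto3.client("cloudwatch").put_metric_data(
        Namespace="DataQuality",
        MetricData=[{"MetricName": "DuplicateRows", "Value": dupes}])

    if dupes:
        raise ValueError(f"Quality check failed: {dupes} duplicate rows")
    return {"rows": len(df)}
```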

Qualifications

Databricks Certified Data Engineer Associate Cert Prep: Data Governance
Fundamentals of Data Transformation for Data Engineering
Snowflake SnowPro Core Cert Prep

Industry Experience

Healthcare, Financial Services, Professional Services, Software & Internet, Other