Hello! I’m a results-oriented Data Engineer with 4+ years of experience designing and optimizing scalable ETL/ELT pipelines, cloud-native data lakes, and data warehouses across AWS, GCP, and Azure. I excel in Python, SQL, and Spark for real-time and batch processing, cloud data architecture, and distributed computing. I’ve led data governance, quality assurance, and observability initiatives, mentored engineers, and collaborated with data scientists and analysts to deliver ML feature stores and cross-domain BI solutions. I thrive in fast-paced, remote-first teams and enjoy building robust platforms with tools like Datadog, Prometheus, and Apache Ranger.

Sai Venkata Anil Thota

Available to hire

Experience Level

Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Expert
Intermediate

Language

English
Fluent

Work Experience

Data Engineer at Molina Healthcare
September 1, 2024 - October 31, 2025
Designed and maintained robust, scalable ETL/ELT pipelines in Apache Spark and Python on AWS Glue and Redshift, integrating structured and unstructured healthcare claims, EHR, and third-party API data for real-time analytics and reporting. Built and optimized cloud-based data lakes and warehouses for performance and cost efficiency, migrating legacy batch frameworks to streaming architectures on AWS, decreasing pipeline runtime by 70%. Engineered real-time streaming systems using Kafka and Pub/Sub, enabling analytics-ready datasets and feature stores for predictive modeling. Automated data quality assurance workflows in Apache Airflow, improving reliability and reducing manual effort by 30%. Implemented data governance and lineage tracking using Apache Ranger and Atlas, ensuring regulatory compliance and robust data observability. Developed and managed transformation workflows and orchestration in Airflow, standardizing integration of diverse datasets. Monitored workflows and SLAs via
Data Engineer at Snit Solutions Pvt. Ltd
June 1, 2023 - June 1, 2023
Developed scalable ETL pipelines in Snowflake and Python using Airflow, Prefect, and dbt, automating ingestion and transformation and increasing data throughput by 20%. Architected and optimized cloud data warehouses, transforming semi-structured (JSON, CSV) and batch/streaming datasets for global operational analytics. Built and managed data models, including fact and dimension tables in star and snowflake schemas, supporting analytical product delivery. Automated data quality validation and monitoring using Datadog and Prometheus, improving system observability and reliability. Orchestrated SCD workflows for historical tracking and improved audit transparency. Reduced pipeline workflow runtime by 25% through SQL optimization and distributed computing. Integrated real-time streaming systems and developed scalable APIs for external reporting. Authored documentation and reusable templates, and conducted best-practice workshops for junior engineers.
Data Analyst at Legato Health Technologies
January 1, 2020 - January 1, 2020
Automated operational analytics pipelines in Python and SQL, cutting manual effort by 40% and accelerating data deliveries by 20%. Developed interactive dashboards in Power BI and Excel to visualize and track major metrics for customer analytics. Cleaned, transformed, and modeled large multi-source datasets, improving data reliability by 35%. Optimized SQL queries for scalable reporting and analytical efficiency. Collaborated on enterprise warehouse integrations, contributing to foundational ETL design and data modeling.

Education

Master of Science in Computer Science at Texas A&M University - Corpus Christi
January 11, 2030 - October 31, 2025

Qualifications

Machine Learning — Coursera | Stanford Online (Andrew Ng)
January 11, 2030 - October 31, 2025
IBM Data Engineering Professional Certificate — Coursera (In Progress)
January 11, 2030 - October 31, 2025

Industry Experience

Healthcare, Software & Internet, Professional Services