
Sai Hosuru


Available to hire

Hi, I’m Sai Hosuru, a Data Engineer based in the Los Angeles area with 4 years of experience delivering large-scale, cloud-based data solutions across AWS, Azure, and GCP. I specialize in building scalable ETL/ELT pipelines, real-time streaming solutions, and cloud data warehouses using Databricks, Snowflake, BigQuery, Spark, and a broad cloud toolkit. I’ve led initiatives that power fraud detection, credit risk modeling, and real-time customer insights, all while optimizing performance and cost.

Beyond engineering, I bring hands-on experience in modern data architectures—data lakes, Delta/Medallion frameworks, and distributed processing—using Airflow, ADF, Cloud Composer, Terraform, and orchestration tooling. I enjoy collaborating with cross-functional teams to enforce data governance and security, automate pipelines, and deliver production-ready data platforms that meet business needs and enable timely analytics.
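The Medallion layering mentioned above can be sketched in plain Python. This is an illustrative toy only, with hypothetical field names; the actual work used Databricks and Delta Lake tables, not this code.

```python
def to_silver(bronze_rows):
    """Silver layer: clean and deduplicate raw bronze records.
    Field names ('id', 'amount') are hypothetical examples."""
    seen = set()
    silver = []
    for row in bronze_rows:
        key = row.get("id")
        if key is None or key in seen:
            continue  # drop malformed and duplicate records
        seen.add(key)
        silver.append({"id": key, "amount": float(row["amount"])})
    return silver

def to_gold(silver_rows):
    """Gold layer: a business-level aggregate over cleaned records."""
    return {"total_amount": sum(r["amount"] for r in silver_rows)}

# Raw (bronze) records with a duplicate and a malformed row.
bronze = [
    {"id": 1, "amount": "10.5"},
    {"id": 1, "amount": "10.5"},   # duplicate, dropped in silver
    {"id": None, "amount": "3"},   # malformed, dropped in silver
    {"id": 2, "amount": "4.5"},
]
gold = to_gold(to_silver(bronze))  # {"total_amount": 15.0}
```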


Language

English
Fluent

Work Experience

Data Engineer at BTIS
March 1, 2024 - July 1, 2025
Project: Commercial Auto. Migrated legacy financial and operational data pipelines into modern cloud data warehouses (Databricks, Snowflake, AWS Redshift, Delta Lake). Built real-time ingestion with Apache Spark, PySpark, and Kafka; designed scalable ETL/ELT pipelines across AWS, Azure, and GCP; and automated orchestration with Airflow and Cloud Composer. Optimized ETL throughput through partitioning, clustering, and caching, and implemented data governance, security controls, and data quality checks using dbt, validation rules, and metadata management. Delivered near-real-time reporting dashboards in Power BI and Tableau for finance and operations teams, supporting fraud detection, credit risk modeling, and real-time customer analytics.
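Data quality checks like those described above can be sketched in plain Python, analogous to dbt's built-in not_null and unique schema tests. Illustrative only; the actual pipelines used dbt tests, and the field names here are hypothetical.

```python
def check_not_null(rows, column):
    """Return indices of rows where `column` is missing or None
    (analogous to a dbt not_null test)."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]

def check_unique(rows, column):
    """Return values of `column` that appear more than once
    (analogous to a dbt unique test)."""
    seen, dupes = set(), set()
    for row in rows:
        value = row.get(column)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)

# Hypothetical policy records with one null premium and a duplicated id.
policies = [
    {"policy_id": "P-1", "premium": 1200},
    {"policy_id": "P-2", "premium": None},
    {"policy_id": "P-2", "premium": 900},
]

null_premiums = check_not_null(policies, "premium")   # [1]
duplicate_ids = check_unique(policies, "policy_id")   # ["P-2"]
```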
Data Engineer at BTIS
August 1, 2025 - Present
Led the GLAUG 2025 initiative to deliver a scalable, cross-cloud data platform (AWS, Azure, GCP) with end-to-end pipelines using Databricks, Spark, Snowflake, and Delta Lake. Automated near-real-time ingestion with Airflow and Cloud Composer on Terraform-managed infrastructure, and optimized data processing and governance with Lake Formation and dbt. Enabled fraud detection and real-time customer analytics at scale.
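The micro-batching model behind near-real-time ingestion can be sketched in plain Python as tumbling windows over timestamped events. This is a simplified illustration; the production pipelines used Spark Structured Streaming, not this code.

```python
from collections import defaultdict

def tumbling_windows(events, window_seconds):
    """Group (timestamp, payload) events into fixed-size, non-overlapping
    (tumbling) windows keyed by window start time."""
    windows = defaultdict(list)
    for ts, payload in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start].append(payload)
    return dict(sorted(windows.items()))

# Events at seconds 0, 3, 5, and 11, grouped into 5-second windows.
events = [(0, "a"), (3, "b"), (5, "c"), (11, "d")]
batches = tumbling_windows(events, 5)
# {0: ["a", "b"], 5: ["c"], 10: ["d"]}
```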

Education

Bachelor of Science in Biology (Specialization: Bioinformatics) at University of California, San Diego
January 11, 2030 - January 7, 2026


Industry Experience

Software & Internet, Financial Services, Professional Services, Media & Entertainment, Education