Anjit Subedi

Available to hire

I’m Anjit Subedi, a Senior Data Engineer who designs and scales cloud-native data platforms to support healthcare and insurance use cases. With 6+ years of hands-on experience across AWS, Azure, and GCP, I architect modern lakehouses, real-time streaming, and ML-ready data pipelines that accelerate analytics and empower data science.

I thrive on building observable, governed data products, optimizing costs, and ensuring compliance (HIPAA, GDPR, SOX, GxP). I enjoy mentoring teams and turning complex data challenges into repeatable, automated solutions.

Language

English
Fluent

Work Experience

Senior Data Engineer at Collective Health
December 1, 2022 - Present
Architected a multi-zone Medallion Lakehouse on AWS S3 with Delta Lake and Snowflake, enabling same-day curated datasets and delivering analytics 3 to 4 times faster for clinical and population-health use cases. Built streaming and batch pipelines with Kafka, Apache Flink, AWS Glue, Step Functions, and Apache Airflow to power near-real-time alerts for care management, alongside high-throughput Spark pipelines on Databricks for ML training, validation, and inference. Designed observable, governed data products with Lake Formation, IAM, and Great Expectations-style data quality checks, and implemented end-to-end lineage and governance with Unity Catalog and Microsoft Purview, reducing data discovery effort and increasing analyst productivity. Standardized dbt transformations across the claims, provider, and member domains and automated CI/CD for data pipelines with Terraform, GitHub Actions, and Docker. Onboarded 60+ data sources via a metadata-driven ingestion framework and created population-health KPI dashboards in QuickSight and Tableau to drive targeted interventions.
Data Engineer at Texas Mutual Insurance Company
February 1, 2020 - September 30, 2022
Led migration of Azure- and GCP-based data workloads for financial and clinical analytics, cutting infrastructure costs by ~20% while improving reliability and scalability for high-volume pipelines. Engineered distributed pipelines with Spark on Databricks, Azure Data Factory, and Google Dataflow, and integrated Kafka and Apache Flink streaming, reducing end-to-end processing time by ~35% and increasing pipeline reliability by ~40%. Delivered governed data marts on Azure Synapse, Google BigQuery, and Amazon Redshift to support clinical operations and finance reporting. Onboarded 60+ data sources through metadata-driven ingestion, established centralized governance and metadata management, and implemented data quality checks with Great Expectations. Built CI/CD pipelines with Azure DevOps, Terraform, Docker, and GitHub Actions; added centralized alerting and automated failover to reduce manual intervention; and tuned Spark SQL to lower data latency and improve readiness for ML training and inference.
Data Engineer at Texas Mutual Insurance Company
January 1, 2019 - January 1, 2020
Built ETL pipelines on AWS Glue and Apache Spark for claims and underwriting data, increasing processing speed by 30% and enabling timely actuarial and risk analytics. Automated ingestion from legacy systems into S3 with resilient patterns, reducing integration time by 35% and lowering support effort, with nightly batch reliability improved and maintenance costs reduced. Published self-service data marts in Snowflake and Redshift governed by role-based access control, shortening analysts' cycle times; enforced encryption at rest and in transit via IAM and KMS. Codified data contracts and acceptance tests with dbt and Great Expectations, reducing post-release defects and improving data trust, and automated data lineage to improve audit readiness.

Education

Master of Science in Data Science and Analytics at Grand Valley State University, Allendale, MI

Qualifications

AWS Certified Data Engineer - Associate
Databricks Certified Data Engineer Associate
SnowPro Advanced: Data Engineer

Industry Experience

Healthcare, Financial Services, Professional Services, Software & Internet, Other