Hi, I’m Niharika Che, an Azure Data Engineer with 5+ years of experience designing and implementing scalable, cloud-native data pipelines and analytics platforms on Azure. I specialize in building end-to-end ETL/ELT workflows using Azure Data Factory, Azure Databricks, Azure Synapse Analytics, ADLS Gen2, and serverless components that support data-driven decision-making.
I also focus on data governance, security, and collaboration across product, engineering, and analytics teams, delivering measurable improvements in data quality and reporting performance while enabling self-serve analytics for business users.
Skills
Work Experience
Data Engineer at Instacart
March 1, 2025 - Present
- Architected and maintained scalable Azure-based retail data pipelines, integrating 15+ data sources into Azure Data Lake Storage (ADLS) to improve data accessibility and analytical turnaround by 40%.
- Built end-to-end ELT pipelines using Azure Data Factory, Azure Databricks (PySpark), and Azure Integration Runtime, with robust error handling and end-to-end logging.
- Implemented the Medallion Architecture (Bronze/Silver/Gold) on ADLS and Databricks to enforce schema validation, deduplication, and transformation standards, achieving 99% data reliability.
- Developed modular PySpark transformations to aggregate sales, manage product hierarchies, and segment customers, reducing redundant code by 30% and improving pipeline scalability.
- Modeled relational schemas in Azure SQL Database and Azure Synapse Analytics to support Power BI dashboards and downstream ML workloads.
- Automated and orchestrated workflows using ADF triggers and Apache Airflow, improving SLA compliance by 45%.
Data Engineer at Molina Healthcare
April 1, 2022 - February 1, 2025
- Led modernization of legacy healthcare systems by migrating data to a secure, cloud-based Azure data warehouse, improving accessibility and scalability while reducing infrastructure costs by 30%.
- Designed and implemented real-time and batch ingestion using Apache Kafka with Azure services to integrate EHR, claims, and provider data at 99.9% reliability.
- Optimized ETL workflows in PySpark and Databricks to load multiple sources into a unified ADLS Gen2 data lake, improving data quality by 45% and reducing reporting time by 25%.
- Built scalable Databricks Spark pipelines to cleanse, standardize, and aggregate healthcare data, decreasing manual intervention.
- Created a centralized data lake on ADLS Gen2, integrated with Azure SQL/Synapse for analytics and Tableau/Power BI reporting, enabling a single source of truth for actuarial, clinical, and financial teams.
- Implemented data quality checks and automated rule-based validation to detect schema issues, duplicates, and missing values.
Data Engineer at PNC Financials
January 1, 2022 - March 1, 2022
- Transitioned PowerCenter mappings into PySpark jobs, enabling near real-time processing for a cloud data warehouse across 10TB+ datasets and multiple domains.
- Orchestrated ETL pipelines using Python and Snowflake SnowSQL, integrating with Azure Synapse Pipelines and Azure Data Factory to enable cross-region data movement.
- Designed a scalable Azure architecture with Microsoft Purview DLP and robust monitoring via Azure Monitor and Log Analytics, supporting 5+ enterprise projects.
- Built custom processing pipelines using Apache Beam on Azure Batch to optimize storage and processing by 20%.
- Developed machine learning workflows with Azure ML and PySpark, improving customer behavior analysis for marketing insights, deployed with monitoring and 99.9% uptime.
- Implemented storage optimizations on ADLS Gen2 with Apache Iceberg, reducing cloud storage costs by 20%.
- Integrated ML workflows with PySpark for predictive analytics, improving decision-making for customer retention strategies.
Data Engineer at C-Edge Technologies
September 1, 2019 - December 1, 2020
- Designed and implemented a big data platform using Hadoop and Spark, processing over 1 TB/day with 99.5% uptime on scalable Azure compute and storage.
- Developed ETL pipelines that improved data extraction by 35% and loading by 28%.
- Automated deployment of data infrastructure with Ansible and Terraform on Azure; deployed microservices on Kubernetes (AKS) with Docker for repeatable environments.
- Optimized data warehouse queries and indexes, achieving a 45% performance improvement and a 25% storage cost reduction.
- Implemented data quality monitoring with Azure-based tooling, reducing data anomalies by 60% and improving reporting accuracy.
- Built a self-service analytics platform using Tableau for 80+ business users, enabling a 15% increase in data-driven decisions.
- Documented pipelines and schemas; supported Agile practices and CI/CD with Azure DevOps, improving team efficiency.
- Ensured security and compliance with encryption and data masking; adopted unified governance across data assets.
Education
Master's degree at University of North Texas
January 11, 2030 - January 8, 2026
Qualifications
AWS Certified Solutions Architect – Associate
January 11, 2030 - January 8, 2026
Python for Data Science
January 11, 2030 - January 8, 2026
Microsoft Certified Azure Fundamentals
January 11, 2030 - January 8, 2026
Industry Experience
Software & Internet, Healthcare, Financial Services, Professional Services