Anjana Jha

Available to hire

Hi, I’m Anjana Jha. I’m a Lead Big Data Engineer with 12+ years of experience delivering end-to-end data platforms that combine streaming, batch, ML, and analytics. I’ve helped financial services, healthcare, energy, and telecom organizations modernize legacy systems and migrate workloads to the cloud.

I thrive on turning complex requirements into secure, scalable solutions using cloud-native, hybrid architectures across AWS, Azure, GCP, and on-prem Hadoop. I enjoy mentoring teams, collaborating with analysts and data scientists, and continuously improving data governance, security, and operational excellence.

Language

English
Fluent

Work Experience

Lead Cloud & Data Engineer at Exelon Corporation
January 1, 2023 - Present
Architected secure, high-throughput streaming pipelines using Apache Kafka (on-prem) and Azure Event Hubs to ingest real-time meter and grid telemetry data (>150K events/sec) with minimal latency. Implemented end-to-end encryption and access controls (TLS, SASL, and Azure RBAC). Built a centralized data lake on Azure Data Lake Storage Gen2 and curated warehouse layers in Azure Synapse, enabling sub-second queries and a <5-minute SLA for analytics. Optimized Databricks Spark ETL to process streaming data into analytics-ready tables, cutting batch times ~40% while handling >1 PB of historical data. Orchestrated pipelines with Airflow and Azure Data Factory; implemented data lineage; performed large-scale backfills on autoscaling Databricks clusters with spot instances, reducing compute costs by ~50%. Enforced governance with Azure AD RBAC and NSGs; codified infrastructure with Terraform and ARM templates; monitored end-to-end with Azure Monitor. Implemented serverless automation with Azure Functions.
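For context, a minimal sketch of the kind of streaming ingest described in this role: Spark Structured Streaming reading telemetry from a secured Kafka topic and landing it in a Delta table on ADLS Gen2. The broker, topic, schema, and storage paths below are hypothetical placeholders, not the production code.

# Illustrative sketch only; broker, topic, schema, and paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("meter-telemetry-ingest").getOrCreate()

schema = (StructType()
          .add("meter_id", StringType())
          .add("reading_kwh", DoubleType())
          .add("event_time", TimestampType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9093")   # hypothetical broker
       .option("kafka.security.protocol", "SASL_SSL")       # TLS + SASL, as in the role
       .option("subscribe", "meter-telemetry")               # hypothetical topic
       .load())

# Parse the JSON payload into typed columns.
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# Append to a Delta table on ADLS Gen2 with checkpointing for exactly-once sinks.
(events.writeStream
 .format("delta")
 .option("checkpointLocation", "abfss://lake@account.dfs.core.windows.net/_chk/telemetry")
 .outputMode("append")
 .start("abfss://lake@account.dfs.core.windows.net/bronze/telemetry"))
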
Senior Data & Cloud Engineer at Ameriprise Financial
May 1, 2020 - December 1, 2022
Designed end-to-end ETL pipelines using Azure Data Factory and Azure Databricks (PySpark/Scala) to ingest, cleanse, and standardize data into an Azure Data Lake. Optimized Spark jobs with partitioning and caching, reducing daily processing times by >40%. Modeled data warehouses in Azure Synapse Analytics, later integrating with Snowflake hosted on AWS for advanced analytics. Implemented secure data access policies across clouds (Azure RBAC and AWS IAM/KMS) with encryption in transit and at rest; automated policy audits via CI/CD. Established proactive monitoring with Azure Monitor and AWS CloudWatch; automated deployments with Azure DevOps, Terraform, and ARM; trained teams on Databricks and Snowflake, boosting release velocity by ~60%.
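A small sketch of the partitioning-and-caching pattern credited with the >40% runtime reduction above, assuming hypothetical transaction and account tables; the paths and columns are illustrative only.

# Illustrative sketch; table names, columns, and mount paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-standardize").getOrCreate()

txns = spark.read.parquet("/mnt/raw/transactions")   # hypothetical path
accounts = spark.read.parquet("/mnt/raw/accounts")

# Repartition on the join key so the shuffle happens once, and cache the
# cleansed frame because two downstream aggregates reuse it.
clean = (txns.dropDuplicates(["txn_id"])
             .withColumn("txn_date", F.to_date("txn_ts"))
             .repartition("account_id")
             .cache())

enriched = clean.join(accounts, "account_id", "left")

daily = enriched.groupBy("txn_date").agg(F.sum("amount").alias("total_amount"))
by_acct = enriched.groupBy("account_id").agg(F.count("*").alias("txn_count"))

# Write analytics-ready tables partitioned by date for pruning in Synapse/Snowflake.
daily.write.mode("overwrite").partitionBy("txn_date").parquet("/mnt/curated/daily_totals")
by_acct.write.mode("overwrite").parquet("/mnt/curated/account_counts")
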
Senior Cloud & Data Engineer at US Cellular
February 1, 2018 - April 1, 2020
Designed ETL pipelines with AWS Glue, EMR, and PySpark/Scala to transform TBs of telecom data daily. Built real-time streaming pipelines with Kafka + Spark Streaming on EMR for low-latency insights. Designed data lakes on S3 with partitioning; enabled efficient querying via Athena and Redshift Spectrum; modeled enterprise data warehouses in Redshift. Implemented AWS Step Functions and Data Pipeline for orchestration with robust error handling. Monitored Spark jobs, crawlers, and query performance with CloudWatch; delivered Power BI dashboards backed by Redshift and Athena. Authored runbooks and best practices; achieved ~50% ETL runtime reduction and supported telecom compliance.
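To illustrate the S3 partitioning pattern mentioned here (curated Parquet that Athena and Redshift Spectrum can prune by partition), a minimal PySpark sketch follows; the bucket names and columns are hypothetical.

# Illustrative sketch; buckets and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdr-daily-load").getOrCreate()

cdrs = spark.read.json("s3://example-raw/cdr/2020/02/01/")   # hypothetical bucket

curated = (cdrs.withColumn("call_date", F.to_date("call_start_ts"))
               .withColumn("region", F.upper(F.col("region"))))

# Partition by date and region so Athena only scans the slices a query touches.
(curated.write
 .mode("append")
 .partitionBy("call_date", "region")
 .parquet("s3://example-lake/cdr_curated/"))
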
Data Engineer at Amgen
February 1, 2015 - January 1, 2018
Designed scalable ingestion pipelines with Kafka and Pub/Sub; used PySpark on Dataproc and Dataflow to clean and transform petabyte-scale genomic and clinical trial data. Used Cloud Dataprep for pre-processing and anomaly detection. Integrated on-prem HDFS/Hive/NiFi with GCP services during phased cloud migration. Modeled schemas in BigQuery for low-latency multi-terabyte queries; deployed Bigtable and HBase for scalable semi-structured storage; implemented Cloud Functions for event-driven transformations. Wrote Spark SQL queries for financial and clinical trial data extraction; implemented CI/CD with Jenkins and Git. Tuned Kafka clusters for throughput and exactly-once ingestion. Ensured data governance with IAM and VPC Service Controls; delivered BI dashboards in Tableau/Power BI for compliance and research.
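A brief sketch of the Dataproc-to-BigQuery loading pattern referenced above, assuming the spark-bigquery connector is on the cluster; the project, dataset, and bucket names are hypothetical.

# Illustrative sketch; requires the spark-bigquery connector jar on the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("trial-results-to-bq").getOrCreate()

results = spark.read.parquet("gs://example-curated/trial_results/")   # hypothetical bucket

# Append curated results to a BigQuery table, staging through a GCS bucket.
(results.write
 .format("bigquery")
 .option("table", "example_project.clinical.trial_results")   # hypothetical table
 .option("temporaryGcsBucket", "example-bq-staging")
 .mode("append")
 .save())
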
Hadoop & Big Data Engineer at Nike
January 1, 2014 - January 1, 2015
Administered Hadoop clusters (Cloudera CDH, Hortonworks HDP) and migrated MapReduce to Spark (PySpark/Scala) for retail analytics. Built high-throughput Kafka pipelines for real-time e-commerce events with secure ACLs. Designed data lakes on S3 with partitioned schemas; queried via Athena and Redshift Spectrum. Modeled warehouses in Redshift; delivered dashboards in Power BI. Implemented Oozie-based ETL workflows and used Hive and Impala for interactive analytics; built disaster recovery and data quality checks; established CI/CD pipelines with Jenkins and Git; performed data migrations to ensure resilience and scalability.
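As a sketch of the MapReduce-to-Spark migration pattern noted here, an aggregation formerly written as a MapReduce job can be expressed as Spark SQL over existing Hive tables; the table and column names below are hypothetical.

# Illustrative sketch; Hive table and columns are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("orders-by-sku")
         .enableHiveSupport()   # reuse the existing Hive metastore tables
         .getOrCreate())

daily_sku = spark.sql("""
    SELECT order_date, sku, COUNT(*) AS orders, SUM(quantity) AS units
    FROM retail.order_lines          -- hypothetical Hive table
    GROUP BY order_date, sku
""")

daily_sku.write.mode("overwrite").saveAsTable("retail.daily_sku_sales")
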
BI Developer at Cosm Inc.
March 1, 2013 - December 1, 2013
BI Developer supporting a major financial services client. Collected, cleaned, and integrated data from multiple sources; built Tableau dashboards; performed data analysis to identify trends and opportunities; developed predictive models in Python to forecast churn and assess credit risk.
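A minimal sketch of the kind of churn-forecasting model mentioned here, using scikit-learn; the input file and feature set are hypothetical, not the client's data.

# Illustrative sketch; file name and features are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customers.csv")   # hypothetical extract
X = df[["tenure_months", "balance", "num_products", "late_payments"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
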

Qualifications

Master of Science in Data Science and Computational Intelligence
Bachelor of Computer Applications (BCA)
Cloudera Certified Associate (CCA) Data Analyst
Confluent Certified Developer for Apache Kafka (CCDAK) Associate

Industry Experience

Energy & Utilities, Financial Services, Healthcare, Retail, Telecommunications, Life Sciences, Software & Internet, Professional Services, Manufacturing