Available to hire
I’m a Data Engineer with over 5 years of experience building data pipelines, cloud platforms, and real-time analytics. I love solving data problems and turning complex datasets into reliable, scalable solutions.
I enjoy working with Python, PySpark, and Databricks, and have hands-on experience across AWS, Azure, and GCP. I’ve designed and optimized ETL workflows with Airflow, DBT, and Terraform, and worked with Snowflake, BigQuery, and SQL for modeling and reporting. I’m familiar with Kafka, Flink, and Kinesis for streaming data, and I focus on creating clean, maintainable data systems that empower analytics and machine learning.
Skills
Experience Level: Expert (22 skills), Intermediate (3 skills)
Language
English
Fluent
Work Experience
Data Engineer at Johnson & Johnson
September 1, 2023 - Present
Designed scalable ETL pipelines using PySpark and Databricks, processing over 5TB of healthcare data daily from medical sources. Implemented real-time ingestion and analytics pipelines leveraging Flink, Kafka, and Kinesis; containerized microservices with Docker and Kubernetes; automated infrastructure provisioning with Terraform and CloudFormation. Ensured data security via IAM/RBAC and encryption. Built modular data pipelines with Airflow, DBT, and data-quality tooling to enable rapid data product delivery and governance across distributed health systems.
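For illustration, a stripped-down version of this kind of PySpark batch job might look like the sketch below; the bucket paths, dataset, and column names are hypothetical stand-ins, not the actual pipeline.

```python
# Hypothetical sketch only: paths, dataset, and columns are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("healthcare_batch_etl").getOrCreate()

# Read a day's worth of raw records landed by an upstream ingestion job.
raw = spark.read.json("s3://example-raw-zone/claims/2024-01-01/")

# Basic cleansing: drop duplicate claims, normalise the event timestamp,
# and keep only rows with a positive amount.
cleaned = (
    raw.dropDuplicates(["claim_id"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .filter(F.col("claim_amount") > 0)
)

# Write curated output partitioned by day for downstream analytics and ML.
(
    cleaned.withColumn("event_date", F.to_date("event_ts"))
           .write.mode("overwrite")
           .partitionBy("event_date")
           .parquet("s3://example-curated-zone/claims/")
)
```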
Data Engineer at Prudential Financial
October 1, 2021 - August 1, 2023
Designed scalable ETL pipelines using Azure Data Factory, Matillion, and Apache Airflow, automating ingestion from RDBMS sources and APIs into Azure Data Lake and BigQuery. Built modular data processing workflows with Databricks notebooks (PySpark) to process insurance and financial data across 10+ business domains, and migrated legacy batch jobs to Spark-based workflows. Migrated on-prem workloads (SQL Server, Oracle, MongoDB) to cloud-native stores (Snowflake, BigQuery), reducing infrastructure costs by roughly 40%. Supported hybrid cloud adoption by integrating existing Azure pipelines with new datasets in GCP and event triggers via Cloud Pub/Sub. Developed data quality rules with Informatica Data Quality, automated validation checks with PySpark, and applied RBAC and data masking to support HIPAA and GDPR compliance. Modeled data using Star and Snowflake schemas in Snowflake and Azure Synapse Analytics for KPI reporting. Enabled data mesh practices with Azure Data Factory, DBT, and Power BI; built dashboards in Power BI and Looker; integrated Kafka and Azure Event Hubs for hybrid streaming; standardized SQL transformations with DBT; and scheduled batch and incremental jobs with Matillion and Airflow.
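As a rough sketch of the kind of PySpark-based validation checks mentioned above (the table location, column names, and rules are assumptions for illustration, not the production rule set):

```python
# Hedged sketch of simple PySpark data quality checks; dataset and rules are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

# Load the curated table to validate (path is an assumption).
policies = spark.read.parquet("abfss://curated@example.dfs.core.windows.net/policies/")

failures = []

# Rule 1: the primary key must be unique.
dupes = policies.groupBy("policy_id").count().filter(F.col("count") > 1).count()
if dupes > 0:
    failures.append(f"{dupes} duplicate policy_id values")

# Rule 2: mandatory fields must not be null.
for col_name in ["policy_id", "customer_id", "effective_date"]:
    nulls = policies.filter(F.col(col_name).isNull()).count()
    if nulls > 0:
        failures.append(f"{nulls} null values in {col_name}")

# Fail the pipeline step if any rule is violated, so downstream loads do not run.
if failures:
    raise ValueError("Data quality checks failed: " + "; ".join(failures))
```

In practice a check like this would typically run as a gating task in the orchestrator (for example an Airflow task) ahead of the downstream loads.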
Data Engineer at MetLife
August 1, 2020 - August 31, 2021
Built scalable ETL pipelines using PySpark and AWS Glue to ingest and transform healthcare data from EHR systems, eligibility databases, and claim feeds. Implemented batch and streaming data workflows with Kafka, Spark Structured Streaming, and Kinesis Firehose for real-time claims ingestion. Migrated legacy ETL scripts to modern Spark-based pipelines and performed performance tuning. Wrote transformation logic in dbt for analytics layers in Snowflake and Redshift. Created dashboards in Tableau and Power BI, and automated data quality checks with Great Expectations and Python-based validations. Participated in CI/CD with Jenkins, GitHub Actions, and Terraform; managed schema evolution with the AWS Glue Data Catalog and used tagging for lineage visibility. Built basic API integrations with Flask and containerized services with Docker. Monitored workloads with CloudWatch and Grafana, contributed to data governance initiatives (RBAC, IAM, KMS encryption), and supported SageMaker model integration.
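A minimal sketch of the Kafka-to-Spark Structured Streaming pattern used for real-time claims ingestion follows; the broker address, topic name, schema, and output paths are illustrative assumptions rather than the actual deployment.

```python
# Hypothetical sketch of streaming claims ingestion; brokers, topic, schema, and paths are assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("claims_streaming").getOrCreate()

# Expected shape of a claim event payload (illustrative).
claim_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("claim_amount", DoubleType()),
    StructField("submitted_at", StringType()),
])

# Consume raw claim events from a Kafka topic.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "claims-events")
         .load()
)

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
claims = (
    events.select(F.from_json(F.col("value").cast("string"), claim_schema).alias("claim"))
          .select("claim.*")
          .withColumn("submitted_at", F.to_timestamp("submitted_at"))
)

# Append parsed claims to a curated location, with checkpointing for fault tolerance.
query = (
    claims.writeStream.format("parquet")
          .option("path", "s3://example-curated-zone/claims_stream/")
          .option("checkpointLocation", "s3://example-curated-zone/_checkpoints/claims_stream/")
          .outputMode("append")
          .start()
)
query.awaitTermination()
```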
Education
Master of Science: Business Analytics at University of New Haven
Qualifications
Azure Data Engineer Associate
AWS Certified Data Engineer - Associate
Industry Experience
Healthcare, Life Sciences, Financial Services, Software & Internet, Professional Services, Education