
Snigdha Chanda

I’m Snigdha Chanda, a Sr. Data Engineer with around 10 years of IT experience specializing in Spark, Hadoop, and cloud-based data pipelines. I have hands-on expertise in Python, Scala, and Java, building scalable data solutions using Spark, Hive, Sqoop, Oozie, Kafka, NiFi, and more. I have designed and implemented data architectures, ETL processes, and real-time analytics across AWS, GCP, and Azure.

I enjoy collaborating with analytics and product teams to translate business requirements into robust data pipelines and dashboards, leveraging Databricks and Snowflake for advanced analytics, and automating workflows with Airflow and Oozie. I am comfortable in Agile environments and have experience spinning up EMR clusters and working across on-prem and cloud platforms.

Available to hire



Work Experience

Sr. Data Engineer at Nike
January 1, 2025 - November 4, 2025
- Built and optimized data pipelines, architectures, and datasets using PySpark.
- Developed multiple ETL processes, including complex transformations such as joins and aggregations, to populate analytical tables.
- Integrated Azure Databricks with Azure Data Lake Storage for scalable data processing and advanced analytics.
- Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake SnowSQL, writing SQL queries against Snowflake.
- Collaborated with analytics, visualization, and product teams to build and optimize data products.
- Utilized Azure Databricks for real-time data integration and processing, leveraging Apache Spark for distributed computing.
- Exported data from Hive into Snowflake to support data visualization.
- Configured and maintained Amazon S3 buckets, including bucket policies; leveraged S3 and Glacier for data backup and archival.
- Optimized ETL processes in Azure Databricks using Spark optimization techniques.
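As an illustration of the join-and-aggregation pattern used to populate analytical tables, here is a minimal, stdlib-only Python sketch. The table and column names are hypothetical stand-ins; the production work described above used PySpark DataFrames.

```python
from collections import defaultdict

# Hypothetical fact and dimension rows, standing in for PySpark DataFrames.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 120.0},
    {"order_id": 2, "customer_id": "c1", "amount": 80.0},
    {"order_id": 3, "customer_id": "c2", "amount": 50.0},
]
customers = [
    {"customer_id": "c1", "region": "EMEA"},
    {"customer_id": "c2", "region": "APAC"},
]

def revenue_by_region(orders, customers):
    """Join orders to customers on customer_id, then sum revenue per region."""
    region_of = {c["customer_id"]: c["region"] for c in customers}  # join-key lookup
    totals = defaultdict(float)
    for o in orders:  # inner join + SUM(amount) GROUP BY region
        totals[region_of[o["customer_id"]]] += o["amount"]
    return dict(totals)

print(revenue_by_region(orders, customers))  # → {'EMEA': 200.0, 'APAC': 50.0}
```

In PySpark the same shape is a `join` followed by `groupBy(...).agg(...)`; the sketch only shows the data movement.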
Data Engineer at Walmart
December 1, 2024 - December 1, 2024
- Built real-time data ingestion pipelines using NiFi.
- Designed and implemented scalable data flow architectures using Google Cloud Dataflow and Pub/Sub, enabling seamless batch and real-time processing across distributed systems.
- Built CI/CD pipelines in Jenkins for deploying PySpark scripts and DAG files.
- Created pipelines to ingest data from Hadoop into GCP BigQuery.
- Designed and built Spark data pipelines supporting multiple lines of business to generate key operational metrics.
- Formulated business metric queries on GCP BigQuery and validated them across multiple datasets.
- Orchestrated complex data flow pipelines in Cloud Composer (Apache Airflow), automating data ingestion, transformation, and delivery across GCP services such as BigQuery, Cloud Storage, and Data Catalog.
- Developed Hive and Presto scripts to generate weekly and monthly performance reports, and leveraged Spark SQL/DataFrames for ad-hoc analysis on Hive tables.
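The orchestration work above boils down to running tasks in dependency order, which Airflow models as a DAG. A stdlib-only sketch of that ordering, using hypothetical task names (in Cloud Composer these would be Airflow operators with `>>` dependencies):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline DAG: each task maps to the upstream tasks it depends on.
deps = {
    "ingest_gcs": set(),
    "load_bigquery": {"ingest_gcs"},
    "transform": {"load_bigquery"},
    "publish_metrics": {"transform"},
    "update_catalog": {"transform"},
}

# static_order() yields a valid execution order respecting all dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

An orchestrator like Airflow adds scheduling, retries, and parallel execution of independent tasks (here, `publish_metrics` and `update_catalog`) on top of exactly this ordering.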
Hadoop Developer at OTSI Global
March 1, 2022 - March 1, 2022
- Determined the viability of business problems for Big Data solutions.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Imported millions of structured records from relational databases using Sqoop for processing.
- Developed Spark code in Scala and Spark SQL for faster testing and data processing.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed mappings using the data processor transformation to load data from Word/PDF documents into HDFS.
- Resolved performance issues in Hive scripts through careful use of joins, groupings, and aggregations.
- Employed Azure Databricks for advanced data modeling and transformation of sales data, creating predictive models to forecast sales trends and customer behavior.
- Fetched HQL results into CSV files and handed them over to the reporting team.
- Collaborated with team engineers to produce high-quality code using Agile software development.
- Worked on a POC for streaming.
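The SQL-to-Spark conversion mentioned above can be sketched in plain Python: a hypothetical query like `SELECT dept, COUNT(*) FROM emp WHERE salary > 50000 GROUP BY dept` becomes a filter/map/reduce chain, analogous to RDD `filter`, `map`, and `reduceByKey` (the actual work used Scala and Spark RDDs):

```python
from functools import reduce

# Hypothetical employee rows standing in for an RDD of records.
emp = [
    ("alice", "eng", 90000),
    ("bob", "eng", 45000),
    ("carol", "sales", 70000),
    ("dave", "sales", 65000),
]

# WHERE salary > 50000   -> filter
high_paid = filter(lambda r: r[2] > 50000, emp)
# SELECT dept, 1         -> map to (key, 1) pairs
pairs = map(lambda r: (r[1], 1), high_paid)
# GROUP BY dept + COUNT  -> reduceByKey-style fold into a dict
counts = reduce(lambda acc, kv: {**acc, kv[0]: acc.get(kv[0], 0) + kv[1]}, pairs, {})
print(counts)  # → {'eng': 1, 'sales': 2}
```

The value of the Spark version is that each stage runs distributed across partitions; the local sketch only shows the transformation shape.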
ETL Developer at Amity Soft Technologies Pvt Ltd
July 1, 2018 - July 1, 2018
- Developed batch jobs in AWS Glue and executed workflows on EMR.
- Built a data lake on S3 for structured and semi-structured customer data.
- Created Athena queries for ad-hoc access to raw logs and marketing datasets.
- Built real-time ingestion streams using Kinesis and transformed data with Lambda.
- Designed a Redshift schema optimized for business reporting.
- Extracted data from RDBMS sources using Sqoop and loaded it into Hive.
- Developed and maintained Cassandra and MongoDB instances.
- Integrated third-party APIs for transactional data ingestion.
- Scheduled workflows using Oozie and Airflow.
- Created and published dashboards in Tableau.
- Migrated data workflows from on-prem SSIS to AWS-native tools.
- Built automated tests for transformation rules in Python.
- Improved query performance by 40% using partition pruning and compression.
- Implemented data encryption and VPC access controls to secure data pipelines.
- Built CI/CD pipelines using Jenkins for data workflows.
- Conducted unit and UAT testing for ETL pipelines.
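The partition-pruning idea behind the query speedup above can be sketched with plain Python: instead of scanning every partition, only partitions whose key matches the filter predicate are read. Partition names and row values here are hypothetical, mimicking a Hive/S3 `dt=YYYY-MM-DD` layout:

```python
# Hypothetical date-partitioned layout, as used in Hive tables on S3.
partitions = {
    "dt=2018-07-01": [("a", 1), ("b", 2)],
    "dt=2018-07-02": [("c", 3)],
    "dt=2018-07-03": [("d", 4)],
}

def scan(partitions, start, end):
    """Read only partitions whose date falls inside [start, end]."""
    rows, scanned = [], 0
    for name, data in partitions.items():
        dt = name.split("=")[1]
        if start <= dt <= end:  # prune: skip partitions outside the filter range
            rows.extend(data)
            scanned += 1
    return rows, scanned

rows, scanned = scan(partitions, "2018-07-02", "2018-07-03")
print(rows, scanned)  # → [('c', 3), ('d', 4)] 2
```

Query engines like Athena and Hive apply the same idea at the metadata level, so pruned partitions incur no I/O at all.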


Qualifications

Databricks Certified Data Engineer
January 11, 2030 - November 4, 2025

Industry Experience

Software & Internet, Professional Services