Nikhil Muneshwar

Available to hire

I am a data professional with 2+ years of industry experience designing and maintaining scalable data pipelines that process over 5 TB per week in enterprise environments.

I am proficient in Python, Apache Spark, Airflow, Kafka, and AWS for building reliable ETL workflows, distributed systems, and cloud-based platforms. I have delivered ML solutions for OCR-based document processing, computer vision analytics, and RAG-driven LLM applications using LangChain and Bedrock, enabling real-time insights and production deployment. I have a strong background in data modeling, geospatial engineering, and end-to-end ML lifecycle management.

Experience Level

Expert (8 skills)
Intermediate (5 skills)

Language

English
Fluent

Work Experience

Data Engineer at RMSI
November 1, 2025 - Present
Processed HM Land Registry geospatial datasets, extracting and transforming data for integration into multiple GIS systems. Built automated ETL workflows for shapefiles, GML, and GeoJSON, ingesting 2+ TB of spatial archives. Engineered batch geoprocessing pipelines with schema mapping, CRS transformations, and attribute normalization across large cadastral datasets. Applied ML to 3,000+ scanned geospatial documents to extract map-based textual features. Designed scalable AWS-based data architectures for multi-GB to TB-scale spatial datasets and automated data transformations with AWS Glue. Implemented QA/QC across 20+ spatial attributes to align source records with GIS layers.
Data Engineer at ATDev
August 1, 2022 - December 1, 2023
Architected and maintained ETL pipelines processing over 5 TB of structured and semi-structured data weekly using Apache Airflow, Spark, and Kafka. Integrated 15+ data sources, consolidating data into centralized storage for analytics and predictive modeling. Optimized Spark transformations for multi-terabyte datasets with partitioning and caching. Built cloud-based data infrastructure using AWS S3, Redshift, and CloudWatch for scalable storage, workload scheduling, and monitoring. Tuned Redshift distribution keys and query plans for large analytical workloads. Containerized deployment workflows with Docker for consistent environments. Implemented automated data validation with schema enforcement across 20+ relational tables.
Data Engineer at Trinity Technolabs
June 1, 2021 - July 1, 2022
Designed and optimized ETL workflows for structured data ingestion, processing 3+ TB weekly using Apache Airflow, Spark, and Kafka. Centralized outputs from 15+ data sources into a data warehouse for reporting and analytics. Implemented real-time streaming pipelines using Kafka. Configured AWS infrastructure (S3, Redshift, CloudWatch) for scalable storage and monitoring. Improved Spark transformation efficiency through partitioning, memory optimization, and join tuning. Established data validation protocols with schema enforcement across 20+ relational tables.

Education

MSc in Data Science at Kingston University, UK
January 1, 2024 - July 1, 2025
B.Tech in Information Technology at Vishwakarma Institute of Technology, India
August 1, 2017 - May 1, 2021

Qualifications

Deloitte - Data Analytics Job Simulation
British Airways - Data Science Job Simulation

Industry Experience

Software & Internet, Professional Services, Education
