Nikhil Muneshwar

Data Scientist, Data Analyst, Full Stack Developer





Available to hire

I am a data professional with 2+ years of industry experience designing and maintaining scalable data pipelines that process over 5 TB per week in enterprise environments.

I am proficient in Python, Apache Spark, Airflow, Kafka, and AWS for building reliable ETL workflows, distributed systems, and cloud-based platforms. I have delivered ML solutions for OCR-based document processing, computer vision analytics, and RAG-driven LLM applications using LangChain and Bedrock, enabling real-time insights and production deployment. I have a strong background in data modeling, geospatial engineering, and end-to-end ML lifecycle management.

Skills

Experience Level

Expert

Expert

Expert

Expert

Expert

Expert

Expert

Expert

Expert

Expert

Intermediate

Intermediate

Intermediate

Intermediate

Intermediate

Language

English

Fluent

Work Experience

Data Engineer at RMSI

November 1, 2025 - Present

Processed HM Land Registry geospatial datasets, extracting and transforming data for integration into multiple GIS systems. Built automated ETL workflows for shapefiles, GML, and GeoJSON, ingesting 2+ TB of spatial archives. Engineered batch geoprocessing pipelines with schema mapping, CRS transformations, and attribute normalization across large cadastral datasets. Applied ML to 3,000+ scanned geospatial documents to extract map-based textual features. Designed scalable AWS-based data architectures for multi-GB to TB-scale spatial datasets and automated data transformations with AWS Glue. Implemented QA/QC across 20+ spatial attributes to align source records with GIS layers.

Data Engineer at ATDev

August 1, 2022 - December 1, 2023

Architected and maintained ETL pipelines processing over 5 TB of structured and semi-structured data weekly using Apache Airflow, Spark, and Kafka. Integrated 15+ data sources, consolidating data into centralized storage for analytics and predictive modeling. Optimized Spark transformations for multi-terabyte datasets with partitioning and caching. Built cloud-based data infrastructure using AWS S3, Redshift, and CloudWatch for scalable storage, workload scheduling, and monitoring. Tuned Redshift distribution keys and query plans for large analytical workloads. Containerized deployment workflows with Docker for consistent environments. Implemented automated data validation with schema enforcement across 20+ relational tables.

Data Engineer at Trinity Technolabs

June 1, 2021 - July 1, 2022

Designed and optimized ETL workflows for structured data ingestion, processing 3+ TB weekly using Apache Airflow, Spark, and Kafka. Centralized outputs from 15+ data sources into a data warehouse for reporting and analytics. Implemented real-time streaming pipelines using Kafka. Configured AWS infrastructure (S3, Redshift, CloudWatch) for scalable storage and monitoring. Improved Spark transformation efficiency through partitioning, memory tuning, and join optimization. Established data validation protocols with schema enforcement across 20+ relational tables.