

Arun M. Prasad

Data Scientist, Data Analyst, Full Stack Developer, +5






Available to hire

I am Arun M. Prasad, a Senior Data Engineer with 10 years of experience delivering enterprise-scale ETL/ELT pipelines, data lakes, AI/ML workflows, and MLOps across financial services, banking, government, telecom, and hospitality. I design scalable, secure data platforms that accelerate business insights and AI initiatives.

I thrive on turning complex data into reliable, governed assets and collaborating with cross-functional teams to enable data-driven decisions. My hands-on expertise spans streaming pipelines (Kafka/Confluent), Spark, AWS, and data governance, with a strong focus on regulatory compliance and production-grade MLOps.

Skills

Experience Level: Expert (11 skills), Intermediate (1 skill)

Language

English

Fluent

Work Experience

Senior Data Engineer at NTT DATA BUSINESS SOLUTIONS

November 1, 2023 - November 1, 2023

Designed and implemented an ETL pipeline using AWS Glue and PySpark to extract structured and semi-structured data from multiple source systems, cleanse and normalize it, and load curated datasets into Amazon S3 data lake zones (Raw → Clean → Product). Automated large-scale data transformations on AWS EMR clusters, optimizing Spark jobs through partitioning and Parquet compression. Integrated the AWS Glue Data Catalog with Athena for ad-hoc SQL queries and developed interactive dashboards in Amazon QuickSight. Implemented secure data governance with S3 bucket policies, Glue role-based access, and Athena query auditing through CloudTrail and IAM.

AI / Data Engineer at Monetary Authority of Singapore (MAS)

November 1, 2023 - November 3, 2025

Architected end-to-end AI/ML pipelines on Cloudera AI, enabling secure, collaborative access to data for enterprise teams. Built MLOps pipelines with CI/CD using GitHub Actions, Docker, and Kubernetes for scalable model deployment. Developed a PySpark Structured Streaming pipeline to process real-time data from Kafka topics with robust schema enforcement. Designed PySpark ETL pipelines that ingest multi-source data into the enterprise data lake, and created Hive tables over the cleaned Parquet data to accelerate AI/ML workflows. Used Denodo to provide virtualized, unified data views across heterogeneous systems.

Senior Data Engineer at Marina Bay Sands (MBS)

January 31, 2023 - January 31, 2023

Upgraded the Hadoop cluster from CDH 5.14.4 to CDP 7.1.7. Designed and deployed a secure Confluent Kafka 7.2 cluster using Ansible, implementing SSL/TLS encryption, SASL/PLAIN authentication, and RBAC to safeguard data in motion and platform access. Built a real-time event streaming pipeline from Microsoft SQL Server to Confluent Kafka using Debezium CDC, transforming events with ksqlDB and delivering processed data to a SQL Data Warehouse via Kafka Sink connectors. Troubleshot and resolved QA and Production issues to improve platform reliability and data ingestion and processing.

Senior Data Engineer at Marina Bay Sands (MBS)

January 1, 2023 - January 1, 2023

Upgraded the Hadoop cluster from CDH 5.14.4 to CDP 7.1.7. Designed and deployed a secure Confluent Kafka 7.2 cluster using Ansible, implementing SSL/TLS encryption, SASL/PLAIN authentication, and Role-Based Access Control (RBAC) to safeguard data in motion and platform access. Built a real-time event streaming pipeline from Microsoft SQL Server to Confluent Kafka using Debezium CDC, transforming events with ksqlDB and delivering processed data to a SQL Data Warehouse via Kafka Sink connectors. Troubleshot and resolved issues in QA and Production environments, improving platform reliability across data ingestion, streaming, and processing.

Senior Data Engineer at GOVERNMENT SKILLSFUTURE SINGAPORE (SSG)

December 1, 2021 - December 1, 2021

Upgraded the Hadoop cluster from CDH 5.14.4 to CDH 6.3.0, ensuring platform stability, compatibility, and improved performance across the data ecosystem. Built and automated ETL data pipeline frameworks using Spark SQL, Sqoop, and Hive to extract, clean, transform, and load data from Oracle into the enterprise data lake. Led data lake implementation, migration, and end-to-end data flow automation, enabling data science teams to efficiently access curated datasets for model development and analytics.

Data Engineer at DBS, SINGAPORE

August 1, 2021 - August 1, 2021

Created Spark Streaming ingestion jobs (PySpark) to read data from Kafka topics, transform it, and load it into Hive. Developed dashboards to monitor mobile and internet banking feature health in real time. Reduced the Hadoop small file count from 7.5 million to 2 million, and worked with the SRE team to optimize long-running queries and improve overall cluster performance. Extracted MAS regulatory data and generated reusable data assets for the bank using AWS CloudFormation, Lambda, and S3. Implemented data flow automation, data lake migration, and CI/CD pipelines for banking operations.

Data Engineer at CIMB BANK, MALAYSIA

July 1, 2019 - July 1, 2019

Managed and monitored a CDH 5.13 Hadoop cluster and all of its services, along with Spark jobs and ETL pipelines for financial datasets, and resolved ongoing issues in the data center environment. Built predictive models to identify credit card defaulters using PySpark and optimized Hive queries. Migrated hundreds of TBs to AWS S3, improving cluster performance and reducing small file issues.

Data Engineer at VODAFONE TELECOMMUNICATION, INDIA

September 1, 2017 - September 1, 2017

Migrated large volumes of archival data from Oracle DB into the Hadoop environment. Designed data models and optimized queries on Hive and Spark SQL. Implemented partitioning and bucketing techniques in Hive to improve performance.

Software Consultant at STARHUB TELECOMMUNICATION, SINGAPORE

December 31, 2016 - December 31, 2016

Delivered data and ETL solutions on software consulting assignments, and helped design scalable data processing pipelines and analytics.

Software Consultant at STARHUB TELECOMMUNICATION, SINGAPORE

December 1, 2016 - December 1, 2016

Provided software consultancy on data and analytics initiatives, contributing to data integration, reporting, and platform enhancements across telecom projects.

Application Analyst at MANAGEMENT DEVELOPMENT INSTITUTE OF SINGAPORE

February 1, 2013 - February 1, 2013

Held analytical and data-focused roles supporting enterprise applications and data workflows within the institute.

Siebel Administrator and EIM Developer at TATA CONSULTANCY SERVICES, INDIA

April 1, 2010 - April 1, 2010

Performed Siebel administration and EIM development in support of data integration and master data management initiatives.