I am Arun M. Prasad, a Senior Data Engineer with 10 years of experience delivering enterprise-scale ETL/ELT pipelines, data lakes, AI/ML workflows, and MLOps across financial services, banking, government, telecom, and hospitality. I design scalable, secure data platforms that accelerate business insights and AI initiatives. I thrive on turning complex data into reliable, governed assets and collaborating with cross-functional teams to enable data-driven decisions. My hands-on expertise spans streaming pipelines (Kafka/Confluent), Spark, AWS, and data governance, with a strong focus on regulatory compliance and production-grade MLOps.

Arun M. Prasad

Available to hire

Language

English
Fluent

Work Experience

AI / Data Engineer at Monetary Authority of Singapore (MAS)
November 1, 2023 - November 3, 2025
Architected end-to-end AI/ML pipelines on Cloudera AI, enabling secure, collaborative access to data for enterprise teams. Built MLOps pipelines with CI/CD using GitHub Actions, Docker, and Kubernetes for scalable model deployment. Developed a PySpark Structured Streaming pipeline to process real-time data from Kafka topics with robust schema enforcement. Designed PySpark ETL pipelines that ingest multi-source data into the Enterprise Data Lake, and created Hive tables over the cleaned Parquet data to accelerate AI/ML workflows. Used Denodo to provide virtualized, unified data views across heterogeneous systems.
Senior Data Engineer at NTT DATA BUSINESS SOLUTIONS
November 1, 2023 - November 1, 2023
Designed and implemented an ETL pipeline using AWS Glue and PySpark to extract structured and semi-structured data from multiple source systems, cleanse and normalize it, and load curated datasets into Amazon S3 Data Lake zones (Raw → Clean → Product). Automated large-scale data transformations on AWS EMR clusters, optimizing Spark jobs through partitioning and Parquet compression. Integrated the AWS Glue Data Catalog with Athena for ad-hoc SQL queries. Developed interactive dashboards in Amazon QuickSight. Implemented secure data governance with S3 bucket policies, Glue role-based access, and Athena query auditing through CloudTrail and IAM.
Senior Data Engineer at Marina Bay Sands (MBS)
January 1, 2023 - January 1, 2023
Upgraded the Hadoop cluster from CDH 5.14.4 to CDP 7.1.7. Designed and deployed a secure Confluent Kafka 7.2 cluster using Ansible, implementing SSL/TLS encryption, SASL/PLAIN authentication, and Role-Based Access Control (RBAC) to safeguard data in motion and platform access. Built a real-time event streaming pipeline from Microsoft SQL Server to Confluent Kafka using Debezium CDC, transforming events with ksqlDB and delivering processed data to a SQL Data Warehouse via Kafka Sink connectors. Troubleshot and resolved issues in QA and Production environments, enhancing platform reliability and ensuring seamless data ingestion, streaming, and processing.
Senior Data Engineer at GOVERNMENT SKILLSFUTURE SINGAPORE (SSG)
December 1, 2021 - December 1, 2021
Upgraded the Hadoop cluster from CDH 5.14.4 to CDH 6.3.0, ensuring platform stability, compatibility, and improved performance across the data ecosystem. Built and automated ETL data pipeline frameworks using Spark SQL, Sqoop, and Hive to extract, clean, transform, and load data from Oracle into the enterprise data lake. Led data lake implementation, migration, and end-to-end data flow automation, enabling data science teams to efficiently access curated datasets for model development and analytics.
Data Engineer at DBS, SINGAPORE
August 1, 2021 - August 1, 2021
Created Spark Streaming ingestion jobs (PySpark) to read data from Kafka topics, transform it, and load it into Hive. Developed dashboards to monitor mobile and internet banking feature health in real time. Reduced the Hadoop small file count from 7.5 million to 2 million and collaborated with the SRE team to optimize long-running queries and improve overall cluster performance. Extracted MAS regulatory data and generated reusable data assets for the bank using AWS CloudFormation, Lambda, and S3. Implemented data flow automation, data lake migration, and CI/CD pipelines for banking operations.
Data Engineer at CIMB BANK, MALAYSIA
July 1, 2019 - July 1, 2019
Managed and monitored a CDH 5.13 Hadoop cluster and all of its services, resolving ongoing issues in the data center environment. Maintained Spark jobs and ETL pipelines for financial datasets. Built predictive models for credit card defaulters using PySpark and optimized Hive queries. Migrated hundreds of TBs to AWS S3, improving cluster performance and reducing small file issues.
Data Engineer at VODAFONE TELECOMMUNICATION, INDIA
September 1, 2017 - September 1, 2017
Migrated large volumes of archival data from Oracle DB into the Hadoop environment. Designed data models and optimized queries on Hive and Spark SQL. Implemented partitioning and bucketing techniques in Hive to improve performance.
Software Consultant at STARHUB TELECOMMUNICATION, SINGAPORE
December 1, 2016 - December 1, 2016
Provided software consultancy on data and analytics initiatives, contributing to data integration, reporting, and platform enhancements across telecom projects.
Application Analyst at MANAGEMENT DEVELOPMENT INSTITUTE OF SINGAPORE
February 1, 2013 - February 1, 2013
Supported enterprise applications and data workflows within the institute in an analytical, data-focused role.
Siebel Administrator and EIM Developer at TATA CONSULTANCY SERVICES, INDIA
April 1, 2010 - April 1, 2010
Performed Siebel administration and EIM development in support of data integration and master data management initiatives.

Education

Master of Technology (M.Tech) at SASTRA University, School of Computing, Thanjavur, India
January 1, 2005 - January 1, 2007

Qualifications

Predicting with MLOps on Cloudera AI
January 11, 2030 - November 3, 2025
Cloudera Data Engineer
January 11, 2030 - November 3, 2025
Cloudera Certified Administrator for Apache Hadoop
January 11, 2030 - November 3, 2025
AWS Data Analytics & ML
January 11, 2030 - November 3, 2025
Denodo Certified Architect Associate
January 11, 2030 - November 3, 2025
Google SRE Skills & Technologies
January 11, 2030 - November 3, 2025

Industry Experience

Financial Services, Government, Telecommunications, Retail, Travel & Hospitality