Ritwik Raj

Available to hire

I am a data engineer and AI/LLM specialist with 6 years of experience designing, optimizing, and deploying large-scale data systems, real-time streaming architectures, and AI-driven solutions.

I have built production-grade pipelines handling petabyte-scale data, engineered end-to-end workflows with Spark, Kafka, Hadoop, SQL, and CI/CD, and led cloud-native infrastructure improvements that cut processing time and costs. I enjoy turning complex data challenges into scalable solutions with business impact.

Experience Level

Expert
Expert
Expert
Expert
Expert
Expert
Intermediate
Intermediate
Intermediate

Language

English (Advanced)

Work Experience

Senior Data Engineer II at Apple
May 1, 2025 - May 1, 2025
Designed, implemented, and optimized production-grade data pipelines and real-time streaming architectures to support AI-driven analytics. Led end-to-end data processing with Kafka and Spark Streaming; integrated Apache Iceberg for scalable storage; leveraged AWS Glue for ETL; improved data reliability and system resilience. Drove cloud-native architecture and MLOps readiness, enabling scalable data processing on petabyte-scale datasets and close collaboration with product teams.
Software Engineer - Data Platform at Ola Cabs
November 1, 2022 - November 1, 2022
Built microservices to transfer real-time data from Kafka and MySQL to a data lake (S3), deployed on Kubernetes for scalable and efficient data storage. Led a proof-of-concept to deploy Apache Pinot and Trino on a Kubernetes cluster, enabling sub-second query performance on high-throughput Kafka topic data and reducing analytics latency by 50%.
Cloud Data Engineer at Amazon Web Services
July 1, 2021 - July 1, 2021
Developed EMR-based data processing pipelines and debugging tools to efficiently analyze and troubleshoot jobs running on EMR clusters; reduced debugging time and improved system reliability. Leveraged AWS services (DMS, S3, EMR, Glue, Redshift, Athena) to streamline big data processing and built end-to-end ETL workflows in collaboration with data science teams.

Education

B.E. (CSE) at B.M.S. Institute of Technology
January 11, 2030 - June 1, 2020

Qualifications

AWS Certified Solutions Architect - Associate
January 11, 2030 - November 4, 2025

Industry Experience

Computers & Electronics, Software & Internet, Media & Entertainment