Ruthvik Kumar Yadav

Available to hire

I am a cloud data engineer who designs and builds scalable batch and real-time data pipelines on Google Cloud Platform (GCP) and with other modern tools. I enjoy solving complex data problems, automating workflows, and delivering actionable analytics through dashboards. I thrive in cross-functional teams and love turning data into insights that drive business impact.

In my work, I combine practical engineering with a focus on performance and maintainability, always aiming for robust, scalable solutions and clear communication with data and infra teams.

Experience Level

Expert
Expert
Expert
Expert
Intermediate

Work Experience

Cloud Data Engineer at Independent Projects (GCP, Kafka, Spark)
November 1, 2025 - November 1, 2025
Designed and deployed an end-to-end batch ETL pipeline for 2.7M+ NYC Taxi records using GCS → Dataproc (PySpark) → BigQuery. Automated ingestion of raw CSVs into GCS with Python scripts and Airflow DAGs for daily processing. Implemented PySpark jobs to clean, transform, and partition the data for analytics. Loaded the optimized, partitioned data into BigQuery and created analytical views for revenue trends, trip behavior, and KPIs. Built an interactive Looker Studio dashboard to present these insights. Automating ingestion, transformation, and orchestration reduced manual workload by 80%.

Also built a real-time streaming ETL pipeline that ingests taxi trip events via Apache Kafka and processes them with Spark Structured Streaming: configured brokers and topics, parsed JSON events, normalized timestamps, and wrote continuous micro-batch output with checkpointing.

Education

Master of Science in Computer & Information Sciences at Lewis University
Completed May 1, 2023
Bachelor of Technology in Information Technology at St. Martin's Engineering College
Completed June 1, 2019

Industry Experience

Software & Internet, Professional Services