Lehtinen Eero

Available to hire

I’m a dynamic data engineer with a passion for building cloud-native, event-driven data systems that power real-time analytics and AI workflows. I thrive on solving complex data problems and delivering reliable, observable data platforms for production use.

Over the years, I’ve built scalable ETL and streaming architectures on AWS and Kubernetes, improved observability and automation, and collaborated with product, security, and data science teams to deliver reliable data and features for AI initiatives.

Experience Level

Expert

Language

Javanese
Advanced

Work Experience

Senior Data Engineer at Scissero (Remote)
August 1, 2023 - September 1, 2025
Designed and led the implementation of a modular, high-throughput data pipeline for a finance platform, ingesting market and transactional feeds with Spark Structured Streaming, Kafka, and AWS managed queues to enable sub-second data availability for downstream analytics and reporting. Built resilient ETL and event-processing layers on AWS using S3, Lambda, and containerized services on EKS, applying Apache Iceberg schema-evolution patterns and partitioning strategies to reduce storage costs and improve query performance on large financial datasets. Implemented data validation, lineage, and monitoring pipelines with automated alerting through CloudWatch and custom observability tooling, reducing the mean time to detect (MTTD) production incidents by 35% across critical financial ingestion and processing pipelines. Optimized real-time enrichment and feature-computation workflows by tuning Spark jobs, right-sizing cluster resources, and applying adaptive query planning and state-store improvements to lowe…

Data Engineer at Capgemini (Remote)
June 1, 2021 - August 1, 2023
Designed and implemented production-grade event-driven architectures for a cloud-native analytics product using Scala, Spark, and Kafka, reliably processing streaming transactions and audit events at multi-GB throughput with strict ordering guarantees. Built microservices and stream processors deployed to Amazon EKS, using Bash scripts and CI pipelines to automate container builds and rollouts, which improved release consistency and reduced manual deployment errors across engineering squads. Implemented persistence layers using DynamoDB for low-latency key-value access and RDS for transactional workloads, modeling hybrid schemas and eventual-consistency patterns to keep data correct during burst traffic and failover events. Optimized batch and streaming ETL in Spark, tuning Structured Streaming checkpointing, watermarking, and state management to achieve deterministic processing and shorten recovery windows after job restarts in production. Integrated message queuing and ma…

Software Engineer at Amazon Web Services (Onsite)
August 1, 2017 - June 1, 2021
Developed core subsystems for Amazon Redshift Spectrum, implementing query planning, metadata management, and connector integrations to enable direct S3 querying and reduce the need for full data ingestion into warehouse nodes. Implemented high-performance data access layers in Java and contributed C++ optimizations for vectorized execution paths, improving throughput and lowering CPU overhead for mixed analytical workloads across customer clusters. Designed connectors and integrations with AWS Glue and external metastores to enable schema discovery, partition pruning, and compatibility with columnar formats such as Parquet and ORC, increasing query efficiency for large data lakes. Implemented security and governance features using AWS IAM controls, fine-grained access policies, and CloudWatch-based auditing pipelines to provide enterprise-grade compliance and traceability for customer queries and data access. Automated performance regression tests and observability dashboards, authore

Education

Bachelor's Degree, University of Helsinki
January 1, 2013 - January 1, 2017

Industry Experience

Software & Internet, Professional Services, Financial Services