Luis Paraguay

Available to hire

I’m a Senior Data Engineer with over 9 years of experience designing and scaling cloud-native data platforms on AWS and GCP. I specialize in analytics, applied machine learning (NLP and computer vision), and building robust data architectures.

Currently, I focus on end-to-end data engineering and actively participate in Data Science and Generative AI initiatives that experiment with data to unlock business value. I bridge engineering, analytics, and business with scalable, sustainable data solutions.

Experience Level

Expert
Expert
Expert
Expert
Expert
Intermediate
Intermediate

Language

Spanish (Castilian): Fluent
Portuguese: Advanced
English: Advanced
French: Intermediate

Work Experience

Senior Data Engineer (Contractor) at Carrefour
May 1, 2025 - Present
Design and maintain end-to-end data pipelines on Google Cloud Platform (GCP) using Apache Airflow for orchestration. Develop batch and streaming processing with Spark/PySpark on Dataproc, managing datasets with Apache Hudi. Build transformations using dbt to integrate multiple sources (e-commerce, travel) into a Lakehouse. Build data flows to extract Customer domain data and publish vector embeddings to ChromaDB. Implement IaC with Terraform/OpenTofu and HCP Terraform for infrastructure provisioning. Implement CI/CD pipelines in GitLab for automated deployment of data solutions. Participate in PoCs for LLM agents using Google ADK.
Senior Data Engineer at Pragma for Nequi
July 1, 2025 - October 1, 2025
Designed, developed, and deployed data products on AWS using DynamoDB, Athena, S3, Glue, and Airflow (MWAA). Developed batch processing pipelines in PySpark using Apache Hudi and Iceberg, creating tables in AWS Glue Data Catalog and synchronizing them with Metabase. Implemented dbt transformations to integrate multiple sources from AWS Data Catalog under a Data Mesh approach. Applied Infrastructure as Code with Terraform for automated AWS resource provisioning. Developed CI/CD pipelines using GitHub and Azure DevOps for automated deployment of data solutions.
Data Technical Lead (BI & Analytics) at La Positiva Seguros y Reaseguros
April 1, 2023 - April 1, 2025
Technically led the implementation and evolution of an AWS Lakehouse with Databricks under the Medallion architecture, including data replication from SAP ERP to S3. Defined tasks, estimated effort, and supervised Big Data projects. Designed and executed batch and near real-time ingestion processes using Qlik Replicate, integrating Oracle, Informix, and SAP sources into the lakehouse. Implemented and optimized PySpark pipelines, transforming data into Delta Lake and cataloging it in Databricks Unity Catalog. Ensured data quality, integrity, and security, aligning developments with data architecture and governance standards. Implemented DevOps-based deployments using CloudFormation, Git, and Jenkins.
Senior Data Engineer (Contractor) at Fot
July 1, 2022 - December 1, 2023
Designed the data architecture for transactional systems, a data warehouse, and a data lake, and developed ETL and ELT processes using AWS services, APIs, and Pentaho. Implemented data ingestion processes on AWS via APIs and web scraping in Docker containers, using Python/PySpark and applying data quality rules, deployed via IaC (CloudFormation). Built commercial dashboards in Power BI using DAX, connected to Amazon Redshift. Developed document management automation using Azure Logic Apps, Azure Functions, and Blob Storage, deployed with Terraform and Jenkins. Implemented a real-time sentiment analysis pipeline using Scikit-Learn and deployed it on AWS ECS for internal consumption.
Solutions Architect at Kushki Peru
July 1, 2021 - June 1, 2022
Coordinated the design of products and features, gathering and analyzing technical and functional requirements with Product, UX/UI, and Architecture teams. Defined and documented technical solutions for digital and data products, producing flow diagrams, component maps, functional specifications, and actor definitions. Supported product analysts and development teams in defining and estimating user stories. Collaborated in designing AWS-based data architectures for analytics use cases, using services such as Athena and EMR.
Senior Data Engineer / BI Engineer at Rimac Seguros y Reaseguros
June 1, 2019 - June 1, 2021
Developed commercial databases for automotive and life annuities lines, integrating them with Salesforce and CRM platforms. Designed data extraction and processing architectures using AWS (Glue, Lambda, Step Functions, Redshift, Athena) and GCP (BigQuery). Implemented ELT processes and data pipelines, including ingestion from Salesforce, Adobe Analytics, lead databases, and core systems into the data lake using Python/PySpark. Built BI dashboards in QlikView and Power BI. Implemented data governance and privacy controls. Developed an OCR computer vision model (TensorFlow/PyTorch) exposed via FastAPI/Flask as an inference API.
ETL Developer / Data Architecture at NTT Data
September 1, 2016 - March 1, 2019
Participated in the design of OLTP and OLAP data architectures for corporate clients (Repsol and Claro) using SQL Server and Oracle. Led CI/CD for database objects and implemented ETL processes with Informatica PowerCenter and Oracle PL/SQL for data lake and analytics projects. Optimized SQL queries and stored procedures to improve performance.

Education

Master’s Degree in IT Management at ESAN, Peru
January 1, 2023
Master’s Degree in IT Management at La Salle, Spain
January 1, 2022
Systems Engineering at Universidad César Vallejo, Peru
January 1, 2015

Qualifications

Introduction to Microsoft Azure Cloud Services
January 1, 2025
IBM Data Engineering Professional Certificate
January 1, 2025
Scrum Master Certified (SMC)
January 1, 2018
ITIL Fundamentals
January 1, 2017
Business Intelligence with Microsoft SQL Server
January 1, 2014
SQL and Data Modeling
January 1, 2013

Industry Experience

Software & Internet, Professional Services, Financial Services, Media & Entertainment, Retail
