Zakaria Moulay Taj

Available to hire

I’m a Data Engineer who specializes in building scalable big data ecosystems, real-time analytics platforms, and cloud-based data infrastructure. Based in Morocco, I have over four years of experience helping organizations modernize their data environments and unlock the full potential of their data through automation, cloud migration, and advanced analytics.

My expertise spans cloud platforms (AWS, Azure, IBM Cloud, GCP), data processing frameworks (PySpark, Kafka, Airflow, DBT), and modern data platforms like Databricks and Snowflake. What differentiates me from other developers is my ability to bridge the gap between data engineering and business impact — transforming raw data into accessible, reliable, and actionable insights. I thrive in designing end-to-end data architectures, optimizing data pipelines for performance and scalability, and delivering solutions that empower cross-functional teams to make data-driven decisions.

Employment and project experience

Data Engineer — IBM
Built large-scale data lake and analytics ecosystems for enterprise clients

Designed and automated data ingestion from APIs, BigQuery, and structured sources into a cloud-based data lake.

Migrated on-premise frameworks to serverless PySpark pipelines on IBM Cloud.

Engineered real-time data processing with Apache Kafka and implemented monitoring and alerting systems.

Developed reconciliation mechanisms and CDC processes to ensure data quality and consistency.
Tools: Python, PySpark, IBM Cloud, MongoDB, SQL, Kafka.
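The CDC work above can be sketched, in deliberately simplified form, as a snapshot diff between two keyed copies of a table. This is an illustrative sketch only, not the production implementation (which ran on PySpark against the cloud data lake); all names here are invented.

```python
# Minimal snapshot-based CDC sketch: compare two keyed snapshots of a
# table and emit insert/update/delete events. Illustrative only.
def capture_changes(previous: dict, current: dict) -> list[tuple[str, str]]:
    """Return (operation, key) pairs describing how `current` differs
    from `previous`. Snapshots map primary key -> row payload."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key))
        elif previous[key] != row:
            changes.append(("update", key))
    for key in previous:
        if key not in current:
            changes.append(("delete", key))
    return changes
```

For example, diffing `{"a": 1, "b": 2}` against `{"a": 1, "b": 3, "c": 4}` yields an update for `b` and an insert for `c`.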

Data Engineer — Freelance
Developed and deployed data platforms for financial and logistics clients

Built a Snowflake data warehouse integrating multiple data sources for financial reporting.

Created ETL pipelines in Airflow for automated data ingestion and transformation.

Designed AWS Glue jobs for Snowflake-to-S3 data migration.
Tools: Snowflake, Python, SQL, Airflow, PostgreSQL, AWS Glue.

Energy Data Hub Project — IBM Asset Team
Architected a reusable analytics solution for the utilities industry

Built modular backend services with AWS Lambda (Python, Node.js) and Terraform automation.

Integrated CI/CD deployment pipelines using GitHub API and EC2 automation.

Designed multi-workspace governance models inspired by Unity Catalog.
Tools: AWS, Python, Node.js, Terraform, PostgreSQL.

AMI 2.0 PoC — Utility Sector
Designed a real-time energy analytics and outage prediction system

Developed Kafka-based real-time processing pipelines for smart meter data.

Deployed ML models for outage prediction at the edge and integrated with REST APIs.
Tools: AWS, Python, PySpark, Kafka, Docker.
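As a hedged illustration of the kind of per-meter stream logic such a pipeline runs (the Kafka consumer wiring is omitted, and the window size and threshold are hypothetical, not values from the actual PoC):

```python
from collections import deque

# Toy sliding-window check over smart-meter readings: flag a meter when
# its latest reading drops far below the recent average, a crude proxy
# for the outage signals the real models consumed. Threshold is made up.
class MeterWindow:
    def __init__(self, size: int = 5, drop_ratio: float = 0.5):
        self.readings = deque(maxlen=size)
        self.drop_ratio = drop_ratio

    def observe(self, kwh: float) -> bool:
        """Record a reading; return True if it looks like an outage."""
        suspicious = (
            len(self.readings) == self.readings.maxlen
            and kwh < self.drop_ratio * (sum(self.readings) / len(self.readings))
        )
        self.readings.append(kwh)
        return suspicious
```

A window of steady ~10 kWh readings followed by a 1 kWh reading would be flagged; in the PoC this kind of signal fed the edge-deployed ML models via REST.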

Language

English: Fluent
Arabic: Fluent
French: Intermediate

Work Experience

Data Engineer at IBM
February 7, 2022 - Present

Education

Baccalaureate at Hassan II High School
January 1, 2016 - January 1, 2016
Data Engineer at National School of Applied Sciences of Al Hoceima
January 1, 2017 - January 1, 2022
Bachelor’s Degree in Computer Science - Business Intelligence at National School of Applied Sciences of Al Hoceima
January 1, 2017 - January 1, 2022
Physical Science at Hassan II High School

Qualifications

Snowflake SnowPro Core Certification
AWS Certified Data Analytics – Specialty
Microsoft Azure Fundamentals
MongoDB Associate Developer
Databricks Certified Data Engineer Associate

Industry Experience

Software & Internet, Professional Services, Media & Entertainment, Energy & Utilities, Manufacturing
Azure Data Analytics Platform

Designed and implemented a scalable, low-latency data analytics platform using Azure and Databricks to support cross-functional teams with real-time data access and advanced analytics. The platform significantly improved data-driven decision-making, streamlined workflows, and enhanced operational agility across the organization.

Built Azure Data Factory (ADF) pipelines to extract, transform, and load data from OracleDB and Azure Data Lake Storage, ensuring reliable and efficient data flows.

Automated real-time data ingestion using Databricks Auto Loader from Azure Data Lake into raw storage layers.

Developed Delta Live Tables (DLT) and Databricks SQL queries to transform raw data into optimized star schema models for reporting and analytics.

Engineered advanced data transformation jobs in PySpark, integrated with ADF for orchestration and scheduling of end-to-end data pipelines.
Tools: Microsoft Azure, Databricks, SQL, PySpark, Delta Lake.
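The star-schema modeling step can be illustrated with a tiny, framework-free sketch. In the actual platform this was done with Delta Live Tables and SQL; the field names below are invented for the example.

```python
# Tiny star-schema sketch: normalize flat event records into a customer
# dimension plus a fact table referencing it by surrogate key.
# Field names are illustrative, not the actual platform schema.
def build_star(records: list[dict]) -> tuple[dict, list[dict]]:
    dim_customer: dict[str, int] = {}   # natural key -> surrogate key
    facts = []
    for rec in records:
        key = rec["customer"]
        if key not in dim_customer:
            dim_customer[key] = len(dim_customer) + 1
        facts.append({"customer_sk": dim_customer[key], "amount": rec["amount"]})
    return dim_customer, facts
```

Splitting descriptive attributes into dimensions and keeping a narrow fact table is what makes the downstream reporting queries cheap to join and aggregate.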

Data Lake Project

Built a data lake ecosystem for the client, including automated processes to load data from various sources (APIs, structured and semi-structured data, BigQuery, etc.) and deliver it to different teams.

Managed the data lake by implementing monitoring mechanisms to oversee these processes and an alerting system to ensure a quick response to any issues.

Transformed the project architecture into a serverless architecture by building PySpark ETLs that replaced the old Python framework, handling large files with serverless services provided by IBM Cloud.

Built processes for real-time data ingestion into the data lake using Apache Kafka.

Provided daily support to all teams working with the data lake to ensure system availability and data integrity, and adapted the pipelines to any changes coming from the sources.

Built a reconciliation mechanism for data validation.

Performed data analytics on semi-structured data stored in MongoDB.

Built a CDC mechanism to capture data changes in relational datasets.
Tools: Python, PySpark, IBM Cloud, MongoDB, SQL, Linux, Kafka.
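The reconciliation mechanism mentioned above can be sketched as a count-and-checksum comparison between a source extract and its landed copy. This is a simplified, order-independent sketch under assumed string-row inputs, not the production validation code.

```python
import hashlib

# Simplified reconciliation: compare row counts and an order-independent
# checksum between a source extract and its landed copy in the lake.
# Illustrative only; names and structure are not the production code.
def reconcile(source_rows: list[str], target_rows: list[str]) -> dict:
    def checksum(rows: list[str]) -> str:
        digests = sorted(hashlib.sha256(r.encode()).hexdigest() for r in rows)
        return hashlib.sha256("".join(digests).encode()).hexdigest()

    return {
        "count_match": len(source_rows) == len(target_rows),
        "checksum_match": checksum(source_rows) == checksum(target_rows),
    }
```

Sorting the per-row digests before the final hash makes the comparison insensitive to row order, which matters when source and target systems return rows in different orders.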