I’m a Data Engineer who specializes in building scalable big data ecosystems, real-time analytics platforms, and cloud-based data infrastructure. Based in Morocco, I have over four years of experience helping organizations modernize their data environments and unlock the full potential of their data through automation, cloud migration, and advanced analytics.
My expertise spans cloud platforms (AWS, Azure, IBM Cloud, GCP), data processing frameworks (PySpark, Kafka, Airflow, dbt), and modern data platforms like Databricks and Snowflake. What differentiates me from other developers is my ability to bridge the gap between data engineering and business impact — transforming raw data into accessible, reliable, and actionable insights. I excel at designing end-to-end data architectures, optimizing data pipelines for performance and scalability, and delivering solutions that empower cross-functional teams to make data-driven decisions.
Employment and project experience
Data Engineer — IBM
Built large-scale data lake and analytics ecosystems for enterprise clients
Designed and automated data ingestion from APIs, BigQuery, and structured sources into a cloud-based data lake.
Migrated on-premises frameworks to serverless PySpark pipelines on IBM Cloud.
Engineered real-time data processing with Apache Kafka and implemented monitoring and alerting systems (see the streaming sketch after this project's tool list).
Developed reconciliation mechanisms and CDC processes to ensure data quality and consistency.
Tools: Python, PySpark, IBM Cloud, MongoDB, SQL, Kafka.
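As an illustration of the Kafka-to-data-lake pattern above, here is a minimal PySpark Structured Streaming sketch; the broker address, topic name, event schema, and lake paths are placeholders for illustration, not the client's actual configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-lake-ingest").getOrCreate()

# Hypothetical event schema; the real payload shape will differ per source.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "events")                      # hypothetical topic
       .load())

# Kafka delivers bytes; decode and parse into typed columns.
parsed = (raw
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write continuously into the lake's raw zone; checkpointing makes it restartable.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "s3a://lake/raw/events")          # placeholder lake path
         .option("checkpointLocation", "s3a://lake/_chk/events")
         .trigger(processingTime="1 minute")
         .start())
```

The checkpoint location is also where monitoring and alerting typically hook in, since lag and failed batches surface there first.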
Data Engineer — Freelance
Developed and deployed data platforms for financial and logistics clients
Built a Snowflake data warehouse integrating multiple data sources for financial reporting.
Created ETL pipelines in Airflow for automated data ingestion and transformation (see the DAG sketch below).
Designed AWS Glue jobs for Snowflake-to-S3 data migration.
Tools: Snowflake, Python, SQL, Airflow, PostgreSQL, AWS Glue.
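A minimal sketch of what such an Airflow ingestion DAG can look like, assuming Airflow 2's TaskFlow API and the standard Snowflake Python connector; the DAG name, table, and credentials are hypothetical.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def finance_ingest():
    @task
    def extract():
        # Placeholder for an API or database read; returns plain rows via XCom.
        return [{"invoice_id": 1, "amount": 120.5}]

    @task
    def load(rows):
        # Load into Snowflake with the standard connector; credentials would
        # normally come from an Airflow connection or secrets backend.
        import snowflake.connector
        conn = snowflake.connector.connect(
            account="my_account",   # placeholder credentials
            user="etl_user",
            password="***",
            warehouse="ETL_WH",
            database="FINANCE",
            schema="RAW",
        )
        cur = conn.cursor()
        for r in rows:
            cur.execute(
                "INSERT INTO invoices (invoice_id, amount) VALUES (%s, %s)",
                (r["invoice_id"], r["amount"]),
            )
        conn.close()

    load(extract())

finance_ingest()
```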
Energy Data Hub Project — IBM Asset Team
Architected a reusable analytics solution for the utilities industry
Built modular backend services with AWS Lambda (Python, Node.js) and Terraform automation (see the handler sketch below).
Integrated CI/CD deployment pipelines using GitHub API and EC2 automation.
Designed multi-workspace governance models inspired by Unity Catalog.
Tools: AWS, Python, Node.js, Terraform, PostgreSQL.
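A sketch of one such modular Lambda backend service, assuming Python with psycopg2 bundled as a layer; the table, environment variable names, and route shape are illustrative, not the asset's actual contract.

```python
import json
import os
import psycopg2  # assumes a psycopg2 Lambda layer or bundled dependency

def handler(event, context):
    """Minimal API-Gateway-style handler: fetch one asset row from PostgreSQL."""
    asset_id = (event.get("pathParameters") or {}).get("asset_id")
    if not asset_id:
        return {"statusCode": 400, "body": json.dumps({"error": "asset_id required"})}

    # Connection settings injected through Terraform-managed environment variables.
    conn = psycopg2.connect(
        host=os.environ["DB_HOST"],
        dbname=os.environ.get("DB_NAME", "energy_hub"),  # hypothetical database
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT name, status FROM assets WHERE asset_id = %s", (asset_id,))
            row = cur.fetchone()
    finally:
        conn.close()

    if row is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps({"name": row[0], "status": row[1]})}
```

Keeping each handler this small is what makes the services composable: Terraform wires the same module pattern to different tables and routes.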
AMI 2.0 PoC — Utility Sector
Designed real-time energy analytics and outage prediction system
Developed Kafka-based real-time processing pipelines for smart meter data.
Deployed ML models for outage prediction at the edge and exposed them via REST APIs (see the serving sketch below).
Tools: AWS, Python, PySpark, Kafka, Docker.
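One common way to put an outage model behind a REST API is a small containerized service; FastAPI, the model artifact name, and the feature set below are assumptions for illustration, not the PoC's actual interface.

```python
from fastapi import FastAPI
from pydantic import BaseModel
import joblib  # assumes the trained model was serialized with joblib

app = FastAPI()
model = joblib.load("outage_model.joblib")  # hypothetical artifact name

class MeterReading(BaseModel):
    voltage: float
    current: float
    frequency: float

@app.post("/predict")
def predict(reading: MeterReading):
    # Score one smart-meter reading; the model contract is assumed:
    # predict_proba over [no_outage, outage] classes.
    features = [[reading.voltage, reading.current, reading.frequency]]
    prob = float(model.predict_proba(features)[0][1])
    return {"outage_risk": prob, "alert": prob > 0.8}
```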
Data Analytics Platform — Azure & Databricks
Designed and implemented a scalable, low-latency data analytics platform for cross-functional teams
The platform provided real-time data access and advanced analytics, significantly improving data-driven decision-making, streamlining workflows, and enhancing operational agility across the organization.
Built Azure Data Factory (ADF) pipelines to extract, transform, and load data from OracleDB and Azure Data Lake Storage, ensuring reliable and efficient data flows.
Automated real-time data ingestion with Databricks Auto Loader from Azure Data Lake into raw storage layers (see the Auto Loader sketch below).
Developed Delta Live Tables (DLT) and Databricks SQL queries to transform raw data into optimized star-schema models for reporting and analytics.
Engineered advanced PySpark data transformation jobs, integrated with ADF for orchestration and scheduling of end-to-end data pipelines.
Tools: Microsoft Azure, Databricks, SQL, PySpark, Delta Lake.
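A minimal Databricks Auto Loader sketch matching the ingestion step above; the ADLS container names and paths are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Hypothetical ADLS paths; real container and landing-zone names will differ.
source_path = "abfss://landing@mystorage.dfs.core.windows.net/meters/"
raw_path = "abfss://lake@mystorage.dfs.core.windows.net/raw/meters/"
checkpoint = "abfss://lake@mystorage.dfs.core.windows.net/_checkpoints/meters/"

# Auto Loader ("cloudFiles" source) incrementally discovers newly landed files
# and tracks schema in the checkpoint location.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", checkpoint)
          .load(source_path)
          .withColumn("_ingested_at", F.current_timestamp()))

# Land the data in the Delta raw layer; availableNow runs it as an
# incremental batch that stops when caught up.
(stream.writeStream
 .format("delta")
 .option("checkpointLocation", checkpoint)
 .trigger(availableNow=True)
 .start(raw_path))
```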
Data Lake Ecosystem — Enterprise Client
Built and operated a cloud data lake serving multiple delivery teams
Automated loading of data from varied sources (APIs, structured and semi-structured data, BigQuery, etc.) into the data lake and delivered it to different teams.
Managed the data lake by implementing monitoring of these processes and an alerting system to ensure a quick response to any issue.
Transformed the project into a serverless architecture by building PySpark ETLs on serverless IBM Cloud services, replacing the old Python framework for handling large files.
Built real-time data processing into the data lake using Apache Kafka.
Provided daily support to all teams working with the data lake to ensure system availability and data integrity, and adapted our pipelines to any changes coming from the sources.
Built a reconciliation mechanism for data validation.
Performed data analytics on semi-structured data stored in MongoDB.
Built a CDC mechanism to capture data changes in a relational dataset (see the CDC sketch below).
Tools: Python, PySpark, IBM Cloud, MongoDB, SQL, Linux, Kafka.
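A watermark-based CDC sketch in the spirit of the last bullet, assuming the relational source exposes an updated_at column; the table, DSN, and state file are hypothetical, and capturing hard deletes would need a log- or trigger-based design instead.

```python
import json
import psycopg2  # source illustrated as PostgreSQL; the real RDBMS may differ

STATE_FILE = "cdc_watermark.json"  # hypothetical local state store

def read_watermark():
    # Start from the epoch on the very first run.
    try:
        with open(STATE_FILE) as f:
            return json.load(f)["last_seen"]
    except FileNotFoundError:
        return "1970-01-01 00:00:00"

def write_watermark(ts):
    with open(STATE_FILE, "w") as f:
        json.dump({"last_seen": str(ts)}, f)

def capture_changes(conn):
    """Pull rows modified since the last run, ordered so the new
    watermark is simply the final row's timestamp."""
    last_seen = read_watermark()
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, payload, updated_at FROM source_table "
            "WHERE updated_at > %s ORDER BY updated_at",
            (last_seen,),
        )
        rows = cur.fetchall()
    if rows:
        write_watermark(rows[-1][2])
    return rows  # hand off to the lake ingestion step

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=app user=etl")  # placeholder DSN
    changed = capture_changes(conn)
    print(f"captured {len(changed)} changed rows")
```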