Context:
Within the StayInCharm project, the team needed a reliable infrastructure to centralize, transform, and analyze data to support report creation for analysts.
Action:
I designed and automated a complete ETL pipeline in a fully containerized Docker environment. I configured the PostgreSQL database as well as the Superset visualization tool, ensuring smooth integration within the data ecosystem.
Result:
The analytics team now has access to a robust and automated data platform, enabling them to create reports and dashboards independently, with reliable and always up-to-date data.
In Vietnam, rice cultivation is one of the activities with the greatest environmental impact. An alternative to the traditional growing method, designed to reduce carbon emissions, has been developed. Determining where each method is used requires monitoring whether rice fields are flooded or non-flooded. This monitoring relies on satellite image analysis, but imagery in the visible wavelengths becomes ineffective in cloudy weather or once the plants reach a certain growth stage, so radar satellite imagery is used instead.
In this context, I collaborated with the company Globéo to develop machine learning models aimed at predicting the flooding status of fields in the Mekong Delta.
Working directly with the researchers in charge of the project, I built ML models on multi-frequency radar data, first on the data in tabular form and later with neural networks.
Through data preprocessing techniques and hyperparameter tuning, we improved accuracy from around 60% for the simplest models to 80%, and then to 95% after aggregating results by field/zone.
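As an illustration of the last step, aggregating per-pixel predictions by field can be sketched as a simple majority vote. This is a minimal sketch, not the project's actual code: the function name `aggregate_by_field`, the field identifiers, and the choice of majority voting are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def aggregate_by_field(field_ids, pixel_predictions):
    """Return one label per field: the most frequent pixel-level prediction.

    Assumption: aggregation is a plain majority vote. On a tie, the label
    that was seen first within that field wins.
    """
    votes = defaultdict(Counter)
    for field_id, label in zip(field_ids, pixel_predictions):
        votes[field_id][label] += 1
    return {fid: counts.most_common(1)[0][0] for fid, counts in votes.items()}


# Hypothetical data: 1 = flooded, 0 = non-flooded
field_ids = ["A", "A", "A", "B", "B", "B"]
pixel_predictions = [1, 1, 0, 0, 0, 1]
field_labels = aggregate_by_field(field_ids, pixel_predictions)
```

Voting at field level lets isolated pixel errors cancel out, which is one plausible reason accuracy rises after aggregation.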
Project carried out in collaboration with Globéo, based on several years of geolocated sowing dates in the Mekong Delta. Objective: automatically identify sowing seasons and their interannual shifts, despite strong spatial variability and the circular nature of the calendar (day of year).
Circular encoding of the day of year (sin/cos) and normalization to enable robust learning.
Development of a local clustering pipeline using sliding spatial windows (70% overlap) to reduce the excessive geographical bias of global models.
KMeans clustering (k=3) with consistent label alignment (Hungarian method) and soft-voting aggregation to stabilize results.
Full automation of data processing and resolution of an MKL/KMeans bug on Windows.
Reliable segmentation of sowing seasons across the entire Mekong Delta, independent of climatic gradients.
Stable and robust classification in mixed or transitional areas.
Methodological foundation for analyzing seasonal drifts and interannual trends.
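The circular encoding, label alignment, and soft-voting steps can be sketched with the standard library alone. This is a hedged illustration under stated assumptions, not the project's pipeline: all function names are hypothetical, cluster centers are represented as days of year, and the alignment uses a brute-force search over permutations in place of the Hungarian method (for k=3 there are only 6 permutations, and both approaches find the same minimum-cost assignment).

```python
import math
from itertools import permutations


def encode_day_of_year(day, period=365.25):
    """Map a day of year onto the unit circle so late December sits next to early January."""
    angle = 2.0 * math.pi * day / period
    return (math.sin(angle), math.cos(angle))


def circular_distance(day_a, day_b, period=365.25):
    """Shortest gap in days between two dates on the circular calendar."""
    gap = abs(day_a - day_b) % period
    return min(gap, period - gap)


def align_labels(reference_centers, window_centers):
    """Return mapping where mapping[i] is the reference label for window cluster i.

    Brute-force stand-in for the Hungarian method: tries every permutation
    and keeps the one with the smallest total circular distance.
    """
    k = len(reference_centers)
    best_cost, best_perm = None, None
    for perm in permutations(range(k)):
        cost = sum(
            circular_distance(window_centers[i], reference_centers[perm[i]])
            for i in range(k)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm)


def reorder(probabilities, mapping):
    """Rearrange one window's class probabilities into reference label order."""
    aligned = [0.0] * len(probabilities)
    for window_label, reference_label in enumerate(mapping):
        aligned[reference_label] = probabilities[window_label]
    return aligned


def soft_vote(aligned_rows):
    """Average already-aligned class probabilities from overlapping windows."""
    n = len(aligned_rows)
    k = len(aligned_rows[0])
    return [sum(row[c] for row in aligned_rows) / n for c in range(k)]


# Hypothetical cluster centers, expressed as days of year
reference_centers = [20, 140, 260]
window_centers = [255, 15, 150]
mapping = align_labels(reference_centers, window_centers)
```

Without the alignment step, cluster 0 in one window and cluster 0 in its neighbor could refer to different sowing seasons, so averaging their probabilities would be meaningless.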