Hi, I’m Firas Tlili Gafsa, an AI and machine learning engineer based in Gabes, Tunisia. With 7+ years of experience developing cutting-edge deep learning models, ML pipelines, and end-to-end AI solutions across medical imaging, agriculture, and multimedia, I bring hands-on expertise in Python, PyTorch, TensorFlow, OpenCV, and cloud platforms (AWS, Google Cloud, Azure). I thrive in cross-functional teams and hybrid academic-industry settings, delivering scalable, production-ready systems using Docker, Kubernetes, MLflow, and DVC. My work spans computer vision, natural language processing, and generative AI, and I enjoy turning complex problems into tangible impact—from automating medical image analysis to enabling real-time video analytics for sports and security.

Firas Tlili

Available to hire

Languages

English
Fluent
French
Fluent
Arabic
Fluent

Work Experience

Machine Learning Engineer at Omdena
September 1, 2022 - Present
Omdena is a global platform where organizations build AI solutions to real-world problems.
• Led an Omdena Local Chapter in Tunisia, organizing and managing a team of volunteers collaborating on social-impact projects and mentoring individuals in data science and machine learning.
• Developed deep learning models in Python and TensorFlow to detect olive leaf disease with 95% accuracy, helping local farmers cut expenses by 20% through early intervention strategies.
• Developed computer vision models for red blood cell classification to diagnose sickle cell disease, increasing predictive accuracy by 25% using Python, TensorFlow, and OpenCV, in close collaboration with data scientists and engineers from Benin.
Deep Learning Research Engineer at Intelligent Machines Lab, University of Gabes
January 1, 2022 - December 31, 2022
• Automated soccer highlight extraction using deep learning and a custom YOLOv7 model fine-tuned with PyTorch, achieving a 90% increase in detection accuracy for key events and reducing video processing time by 90%.
• Developed an image extraction pipeline (Python), a video compression tool (OpenCV, FFMPEG), and a full-stack web app (Django, React) to condense 90-minute matches into 5-minute summaries, cutting editing time by 80% and enabling real-time access to highlights.
• Drove cross-functional collaboration in a hybrid academic-industry setting, aligning research goals with technical execution to accelerate innovation in sports video analysis.
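The condensing step above can be sketched as turning detected event timestamps into FFMPEG trim commands. This is an illustrative stand-in, not the project's actual code: the interval format, padding value, and output file names are assumptions.

```python
# Sketch: turn detected highlight timestamps into ffmpeg trim commands.
# Interval format, padding, and file names are illustrative assumptions.

def build_clip_commands(source, highlights, pad=2.0):
    """highlights: list of (start_sec, end_sec) detected by the event model."""
    commands = []
    for i, (start, end) in enumerate(highlights):
        begin = max(0.0, start - pad)          # keep a little context before the event
        duration = (end - start) + 2 * pad
        commands.append([
            "ffmpeg", "-ss", f"{begin:.2f}",   # seek to just before the event
            "-i", source,
            "-t", f"{duration:.2f}",           # clip length including padding
            "-c", "copy",                      # stream copy: no re-encode
            f"clip_{i:03d}.mp4",
        ])
    return commands

cmds = build_clip_commands("match.mp4", [(120.0, 130.0), (2400.5, 2412.0)])
```

The generated clips can then be concatenated into the final summary, so only seconds around detected events are ever re-encoded or reviewed.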

Education

Master of Automatic Electrical Engineering at University of Gabes
September 1, 2019 - December 26, 2022
Coursework: Artificial Intelligence, Computer Vision, Deep Learning, Data Research and Optimization, Probability and Statistics, Linear Programming, Data Structures and Processing, Signal Processing, Architecture and Programming of Embedded Software Systems, Robotics, Neural Network Applications, Machine Learning.

Qualifications

Google Cloud Professional Machine Learning Engineer Certification
January 11, 2024 - April 18, 2026
Microsoft Certified: Azure Data Scientist Associate
January 11, 2024 - April 18, 2026
Google Cloud Professional Data Engineer Certification
January 11, 2024 - April 18, 2026
Google Cloud Professional Cloud Architect Certification
January 12, 2024 - April 18, 2026
Google Cloud Associate Cloud Engineer Certification
January 11, 2022 - April 18, 2026
Stanford University Machine Learning Specialization
January 6, 2022 - April 18, 2026
GANs Specialist Certification
January 11, 2022 - April 18, 2026
Huawei Certified ICT Associate: AI
January 11, 2020 - April 18, 2026

Industry Experience

Media & Entertainment, Healthcare, Software & Internet, Education, Agriculture & Mining, Energy & Utilities, Manufacturing, Non-Profit Organization, Professional Services
    End-to-End Kidney Disease Classification with MLflow, DVC, and Cloud Deployment

    Project Description

    End-to-End Kidney Disease Classification with MLflow, DVC, and Cloud Deployment is a fully productionized machine learning pipeline designed to deliver accurate and scalable predictions for kidney disease diagnosis.

    The project implements a complete ML lifecycle, starting from data ingestion and preprocessing to model training, evaluation, and deployment. Using Scikit-learn and Pandas, the system processes clinical data to build a reliable classification model capable of identifying kidney disease patterns with high consistency.

    A key focus of the project is reproducibility and experiment management. MLflow is used to track experiments, log model parameters, and compare performance metrics, enabling systematic model optimization. DVC (Data Version Control) ensures proper versioning of datasets and pipeline stages, allowing seamless collaboration and reproducible workflows across different environments.
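The DVC side of this workflow can be pictured as a `dvc.yaml` pipeline file; the stage names, paths, and parameter keys below are illustrative assumptions, not the project's actual configuration.

```yaml
# Hypothetical dvc.yaml sketch: each stage declares its command, inputs,
# and outputs, so DVC can re-run only what changed and version every artifact.
stages:
  train:
    cmd: python src/train.py
    deps:
      - data/processed
      - src/train.py
    params:
      - train.epochs
      - train.learning_rate
    outs:
      - artifacts/model
  evaluate:
    cmd: python src/evaluate.py
    deps:
      - artifacts/model
    metrics:
      - scores.json:
          cache: false
```

With this in place, `dvc repro` rebuilds the pipeline deterministically on any machine, while MLflow records the parameters and metrics of each run for comparison.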

    The deployment pipeline is fully automated using GitHub Actions, enabling continuous integration and delivery (CI/CD). The application is containerized with Docker and deployed on AWS infrastructure, specifically EC2 for compute and ECR for container registry management, ensuring scalability and reliability in production.

    A lightweight Flask API serves the trained model, allowing real-time inference through REST endpoints. This makes the system easily accessible for integration into external applications or healthcare platforms.

    Security and infrastructure management are handled using AWS IAM, ensuring controlled access and safe deployment practices. The modular design of the pipeline allows easy updates, retraining, and scaling as new data becomes available.

    Overall, this project demonstrates how modern MLOps tools and cloud technologies can be combined to build a robust, reproducible, and scalable machine learning system for healthcare applications.

    Multimodal AI Agent for Enhanced Content Understanding

    Project Description

    Multimodal AI Agent for Enhanced Content Understanding is an advanced Retrieval-Augmented Generation (RAG) system designed to enable intelligent interaction with both textual and visual data. The project integrates Large Language Models (LLMs) and Vision-Language Models (VLMs) to provide a unified solution for understanding complex, multimodal content.

    The system leverages LlamaIndex to orchestrate data ingestion, indexing, and retrieval across multiple document types, including PDFs, PowerPoint presentations, and images. NVIDIA NIM microservices are used to efficiently serve optimized AI models, ensuring high-performance inference for both text and visual tasks.

    At its core, the platform uses Milvus as a high-performance vector database to store and retrieve embeddings for semantic search. This enables real-time querying of large-scale multimodal datasets, where user inputs are matched with the most relevant contextual information before being processed by the underlying models.
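The retrieval step can be sketched in miniature with plain NumPy: Milvus performs the same nearest-neighbour lookup over stored embeddings at scale, with indexing and persistence. The embeddings below are random placeholders; a real system would produce them with a text or vision encoder.

```python
import numpy as np

# In-memory stand-in for the semantic-search step that Milvus provides
# at scale. Embeddings are random placeholders for illustration.

def top_k(query_vec, index_vecs, k=3):
    """Return indices of the k most similar vectors by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = m @ q                       # cosine similarity against every row
    return np.argsort(scores)[::-1][:k]  # indices of the best matches first

rng = np.random.default_rng(0)
index = rng.normal(size=(100, 64))               # 100 stored chunk embeddings
query = index[42] + 0.01 * rng.normal(size=64)   # near-duplicate of chunk 42
hits = top_k(query, index, k=3)
```

The retrieved chunks (here, row 42 and its nearest neighbours) are what gets passed as context to the LLM or VLM in the next stage.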

    A key feature of the system is its ability to interpret and analyze visual data alongside text. By incorporating models like DePlot, the agent can extract structured insights from images such as charts, diagrams, and visual documents, enhancing the depth and accuracy of responses.

    The user interface is built with Streamlit, providing an interactive chat-based experience where users can upload documents or images and ask natural language questions. The system responds with context-aware answers that combine textual understanding and visual reasoning.

    Designed with modularity and scalability in mind, the architecture supports seamless integration of additional models and data sources. This makes it adaptable for various use cases, including document analysis, business intelligence, research assistance, and knowledge management.

    Overall, this project demonstrates the power of multimodal AI systems in bridging the gap between text and visual information, delivering a more comprehensive and intuitive approach to content understanding.

    End-to-End Medical Chatbot with LLMs, LangChain, Pinecone, and LLMOps

    Project Description

    End-to-End Medical Chatbot with LLMs, LangChain, Pinecone, and LLMOps is a production-ready intelligent assistant designed to deliver accurate, context-aware medical information through a scalable and secure architecture.

    The system is built around a Retrieval-Augmented Generation (RAG) pipeline, combining Large Language Models (LLMs) with a vector database to ensure responses are grounded in reliable medical data. User queries are first processed and enriched using LangChain, then matched against indexed medical knowledge stored in Pinecone. Relevant context is retrieved and passed to the LLM to generate precise and informative answers.
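The retrieve-then-generate flow can be sketched as follows. This is a minimal stand-in: the real pipeline uses LangChain for orchestration and Pinecone for vector retrieval, whereas here retrieval is faked with naive word overlap so the control flow stays visible.

```python
# Minimal RAG control-flow sketch. The document set and queries are
# illustrative; retrieval by word overlap stands in for Pinecone.

KNOWLEDGE = [
    "Ibuprofen is a nonsteroidal anti-inflammatory drug (NSAID).",
    "Hypertension is persistently elevated arterial blood pressure.",
    "Insulin regulates blood glucose levels.",
]

def retrieve(query, docs, k=1):
    """Rank documents by naive word overlap (stand-in for vector search)."""
    qwords = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(qwords & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query, docs):
    """Ground the LLM: retrieved context goes into the prompt, then the query."""
    context = "\n".join(retrieve(query, docs))
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

prompt = build_prompt("What regulates blood glucose?", KNOWLEDGE)
```

The assembled prompt, rather than the raw user query, is what the LLM sees, which is what keeps answers grounded in the indexed medical knowledge.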

    A strong emphasis is placed on clinical relevance, safety, and privacy. The chatbot integrates validation layers to filter and verify outputs, reducing the risk of misleading or unsafe medical advice. This makes the system suitable for healthcare-related applications where trust and accuracy are critical.

    The backend is implemented using Flask, providing lightweight and efficient API endpoints for real-time interaction. The application is containerized with Docker and deployed on AWS EC2, ensuring scalability and reliability. Continuous integration and deployment (CI/CD) pipelines are managed using GitHub Actions, enabling automated testing and streamlined updates.

    The system is designed following LLMOps best practices, including modular architecture, reproducible workflows, and efficient model integration. It supports real-time query handling while maintaining performance and responsiveness.

    Overall, this project demonstrates how modern LLM technologies can be combined with retrieval systems and cloud infrastructure to build a robust, production-grade medical chatbot capable of delivering trustworthy and contextually accurate health information.

    AI Football Video Analysis System

    Project Description

    AI Football Video Analysis System is a comprehensive computer vision solution designed to extract real-time insights from football match footage. The system combines advanced object detection, tracking, and spatial analysis techniques to monitor player behavior, estimate ball possession, and generate meaningful performance analytics.

    At its core, the system uses a YOLOv11 model (Ultralytics) to accurately detect players, referees, and the ball in each frame. These detections are enhanced with tracking and motion analysis techniques to maintain consistent identities and capture dynamic interactions throughout the match.

    To enable deeper analysis, the project integrates KMeans clustering to group players by team based on visual features such as jersey color. Optical flow is used to estimate movement patterns and player trajectories, providing insights into speed, positioning, and overall activity on the field.
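The team-assignment step can be sketched with scikit-learn's KMeans: cluster each player's mean jersey colour into two groups. The RGB values below are synthetic placeholders; the real system extracts them from detected player crops.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch: cluster players' mean jersey colours (RGB) into two teams.
# Colour values are synthetic stand-ins for features from player crops.

jersey_colors = np.array([
    [220, 30, 40], [210, 25, 50], [230, 40, 35],   # reddish kit (team A)
    [30, 40, 200], [25, 55, 210], [40, 35, 220],   # bluish kit (team B)
], dtype=float)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(jersey_colors)
labels = km.labels_   # same label -> same predicted team
```

Because the two kits are well separated in colour space, the cluster labels line up with team membership; the tracker can then carry each player's team label across frames.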

    A key component of the system is perspective transformation, which maps the broadcast camera view into a top-down representation of the pitch. This allows for more accurate spatial reasoning, including player positioning, distance measurements, and zone-based analysis.
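The perspective transform reduces to solving for a 3×3 homography from four point correspondences. OpenCV's `cv2.getPerspectiveTransform` does this directly; the NumPy version below shows the underlying math. The camera coordinates are made-up example values; the 105 × 68 m target is a standard pitch size.

```python
import numpy as np

# Sketch of the homography behind the top-down pitch mapping.
# Four camera-view points -> four pitch-plane points define the transform.

def homography(src, dst):
    """src, dst: four (x, y) point pairs; returns the 3x3 mapping matrix."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)   # fix h33 = 1

def warp_point(H, pt):
    """Apply the homography in homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# Example: camera-view pitch corners mapped onto a 105 x 68 m plane.
cam = [(100, 200), (1180, 210), (1260, 700), (20, 690)]
pitch = [(0, 0), (105, 0), (105, 68), (0, 68)]
H = homography(cam, pitch)
```

Once every detection is warped into pitch coordinates, distances and zone occupancy are computed in metres rather than pixels, which is what makes the possession and positioning statistics meaningful.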

    The system generates real-time analytics such as ball possession statistics, player movement tracking, and match dynamics visualization. Results can be visualized through overlays on the video as well as plotted using data analysis libraries like Matplotlib and Pandas.

    Developed in Python using tools such as OpenCV, NumPy, and Scikit-learn, the project is structured for flexibility and experimentation, with development carried out in environments like Jupyter Notebook and VS Code. Version control is maintained using Git for reproducibility and collaboration.

    Overall, this project demonstrates how modern AI and computer vision techniques can transform raw sports footage into actionable insights, enabling performance analysis, tactical evaluation, and enhanced understanding of the game.

    Detection Transformers Fine-Tuning for Custom Object Detection

    Project Description

    Detection Transformers Fine-Tuning for Custom Object Detection is an end-to-end deep learning project focused on building a high-precision bone fracture detection system using Transformer-based architectures.

    The project leverages DETR (Detection Transformer) models, fine-tuned on a custom dataset of approximately 1,200 COCO-formatted X-ray images. The system is trained to detect and classify five distinct types of bone fractures, enabling accurate and automated analysis of medical imaging data.
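A COCO-formatted dataset like this one is a JSON file pairing images, annotations, and categories; the sketch below shows the indexing step a fine-tuning loader typically performs. The category id and name are hypothetical, not the project's real fracture labels.

```python
import json

# Illustrative COCO-format snippet (ids/names are hypothetical stand-ins
# for the five fracture classes in the real ~1,200-image dataset).

coco = json.loads("""{
  "images": [{"id": 1, "file_name": "xray_0001.png", "width": 640, "height": 640}],
  "annotations": [{"id": 10, "image_id": 1, "category_id": 2,
                   "bbox": [120.0, 80.0, 60.0, 45.0], "area": 2700.0}],
  "categories": [{"id": 2, "name": "transverse_fracture"}]
}""")

# Index annotations per image, the shape detection loaders expect.
by_image = {}
for ann in coco["annotations"]:
    by_image.setdefault(ann["image_id"], []).append(ann)

names = {c["id"]: c["name"] for c in coco["categories"]}
label = names[by_image[1][0]["category_id"]]
```

Note that COCO boxes are `[x_min, y_min, width, height]` in pixels; keeping the dataset in this standard shape is what makes it drop-in compatible with DETR-style training pipelines and tools like Roboflow.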

    Built with PyTorch Lightning and Hugging Face Transformers, the training pipeline is modular, scalable, and optimized for experimentation. It integrates data preprocessing, model training, validation, and evaluation into a streamlined workflow, ensuring reproducibility and efficient development cycles.

    To enhance usability and interpretability, the system incorporates OpenCV and Supervision for visual validation. Predictions are displayed alongside ground truth annotations, allowing side-by-side comparison that supports clinical insight and model transparency.

    The model achieves over 90% precision while maintaining fast inference speeds (under 50ms per image), making it suitable for real-time or near-real-time diagnostic support applications. Visualization tools, including Matplotlib, are used to monitor training performance and analyze results.

    The dataset preparation and annotation process follows the COCO standard, ensuring compatibility with modern detection frameworks and tools such as Roboflow. Development and experimentation are conducted in environments like Google Colab and Jupyter Notebook, with version control managed through Git.

    Overall, this project demonstrates the effectiveness of Transformer-based object detection models in the medical imaging domain, delivering a reliable and efficient solution for automated fracture detection with strong potential for real-world clinical applications.

    Fine-Tuning Large Language Models (LLMs) Efficiently with Unsloth + LoRA

    Project Description

    Fine-Tuning Large Language Models (LLMs) Efficiently with Unsloth + LoRA is a high-performance pipeline designed to enable scalable and cost-effective customization of multi-billion parameter language models using limited computational resources.

    This project focuses on parameter-efficient fine-tuning techniques, combining Unsloth optimization with LoRA (Low-Rank Adaptation) to significantly reduce memory usage and training time. By leveraging 4-bit quantization and mixed precision, the pipeline allows large models to be fine-tuned on a single GPU without compromising performance.
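The core LoRA idea can be shown in a few lines of NumPy: instead of updating a full weight matrix W, train two small factors B and A and add their scaled low-rank product. The dimensions and scaling are toy values for illustration, not the project's actual configuration.

```python
import numpy as np

# LoRA in miniature: W stays frozen; only the low-rank factors train.
d, r, alpha = 1024, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection (zero init)

# Effective weight at inference: W + (alpha / r) * B A.
# With B zero-initialised, training starts exactly at the pretrained model.
W_eff = W + (alpha / r) * (B @ A)

full_params = d * d                  # what full fine-tuning would update
lora_params = d * r + r * d          # what LoRA actually trains
```

Here the trainable parameter count is under 2% of the full matrix, which is the source of the memory and speed savings; 4-bit quantization of the frozen W compounds them further.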

    The workflow covers the entire fine-tuning lifecycle. It begins with structured dataset preparation in chat format, ensuring compatibility with conversational models. Training is performed using Hugging Face Transformers and TRL’s SFTTrainer, integrated with PEFT for efficient adaptation of model weights. The pipeline is optimized for stability and speed, making it suitable for rapid experimentation and iteration.

    In addition to training, the project includes evaluation and validation steps to assess model performance, as well as flexible export options for deployment across different environments. Models can be saved and reused in multiple formats, supporting integration into real-world applications.

    Designed with reproducibility and modularity in mind, the pipeline is implemented using Python and runs seamlessly in environments like Google Colab and Jupyter Notebook. Version control with Git ensures organized experimentation and collaboration.

    Overall, this project demonstrates how modern optimization techniques can make advanced LLM fine-tuning accessible, enabling developers to adapt powerful language models efficiently for domain-specific tasks without requiring extensive hardware infrastructure.

    Real-Time Human Activity Recognition Video Data Annotation Tool

    Project Description

    Real-Time Human Activity Recognition Video Data Annotation Tool is a professional desktop application designed to streamline the creation of high-quality annotated datasets for human activity recognition tasks in security and surveillance domains.

    Built using PyQt5, the application provides an intuitive graphical interface that enables users to annotate multi-person video data efficiently in real time. It integrates YOLOv11 pose estimation models with OpenCV to detect and track human body keypoints across frames, allowing precise activity labeling with minimal manual effort.

    The system is powered by a scalable, multi-threaded video processing pipeline that ensures smooth performance even with high-resolution or long-duration video streams. This architecture separates video decoding, model inference, and UI rendering, resulting in responsive and efficient annotation workflows.
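The decode → inference → render split can be sketched with standard-library threads and bounded queues; frames are stand-in integers and the "detection" is faked, so only the pipeline shape is shown, not the tool's real implementation.

```python
import queue
import threading

# Sketch of the staged pipeline: each stage runs on its own thread and
# hands work on via bounded queues, so a slow stage cannot block the UI.
# None is used as the shutdown signal between stages.

frames_q = queue.Queue(maxsize=8)
results_q = queue.Queue(maxsize=8)
rendered = []

def decoder(n_frames):
    for f in range(n_frames):          # stand-in for video decoding
        frames_q.put(f)
    frames_q.put(None)

def inference():
    while (f := frames_q.get()) is not None:
        results_q.put((f, f % 2 == 0))  # fake "person detected" flag
    results_q.put(None)

def renderer():
    while (r := results_q.get()) is not None:
        rendered.append(r)              # stand-in for drawing overlays

threads = [threading.Thread(target=decoder, args=(10,)),
           threading.Thread(target=inference),
           threading.Thread(target=renderer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The bounded queues provide backpressure: if inference falls behind, the decoder simply waits instead of ballooning memory, and the UI thread keeps servicing events.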

    A key strength of the tool lies in its robust data management capabilities. It supports COCO-compliant annotation structures, ensuring compatibility with modern deep learning pipelines. Automatic backup mechanisms safeguard project data, while a modular design allows easy extension and maintenance.

    To maximize usability across different machine learning workflows, the tool offers multi-format export options, including YOLO, Pascal VOC (XML), and CSV. This flexibility enables seamless integration with various training frameworks and platforms such as Ultralytics and Roboflow.
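One of these export paths is a small coordinate conversion: COCO boxes are `[x_min, y_min, width, height]` in pixels, while YOLO labels are `class x_center y_center width height` normalised to the image size. A sketch (the helper name is ours, not the tool's):

```python
# Sketch of the COCO -> YOLO label conversion used in format export.

def coco_to_yolo(bbox, img_w, img_h, class_id):
    """bbox: COCO [x_min, y_min, w, h] in pixels -> one YOLO label line."""
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2          # YOLO uses the box centre
    return (f"{class_id} {cx / img_w:.6f} {cy / img_h:.6f} "
            f"{w / img_w:.6f} {h / img_h:.6f}")

line = coco_to_yolo([100, 50, 200, 100], img_w=400, img_h=200, class_id=0)
```

Pascal VOC export is the inverse kind of mapping (absolute corner coordinates in XML), which is why keeping one canonical COCO representation internally makes all three formats cheap to emit.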

    The application also features a complete project-based workflow, allowing users to organize datasets, manage annotations, and maintain consistency across large-scale labeling tasks. By combining automation with user control, it significantly reduces the time and effort required to build training datasets for activity recognition models.

    Overall, this project demonstrates a practical and scalable solution for real-time video annotation, bridging the gap between raw video data and structured datasets required for advanced human activity recognition systems.

    End-to-End Pre-Harvest Apple Quality Grading System Using Computer Vision and Deep Learning

    Project Description

    Apple Grading System is an end-to-end intelligent solution for automated apple quality assessment before harvest. It leverages computer vision and deep learning to detect apples in images, classify their health condition, extract visual features, and assign standardized quality grades through a user-friendly web interface.

    The system follows a multi-stage pipeline where input images are processed using a fine-tuned YOLOv8 model for apple detection. Each detected apple is cropped and passed to a ResNet18-based classifier to identify diseases such as blotch, rot, and scab, or determine if the fruit is healthy. Additional image processing techniques are applied to extract key visual features like color quality and size estimation.

    A rule-based grading engine combines classification results with extracted features (e.g., redness, uniformity, and size) to assign grades (A, B, or C), reflecting the overall quality of each apple. The final output includes annotated images, detailed per-apple analysis, and aggregated grading summaries.
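A rule engine like this can be sketched as a small decision function; the thresholds and feature names below are hypothetical stand-ins for the real engine's calibrated values.

```python
# Sketch of the rule-based grading step. Thresholds are illustrative
# assumptions, not the production system's calibrated values.

def grade_apple(condition, redness, size_mm):
    """condition: classifier output; redness in [0, 1]; size in millimetres."""
    if condition != "healthy":
        return "C"                        # any detected disease caps the grade
    if redness >= 0.7 and size_mm >= 70:
        return "A"
    if redness >= 0.5 and size_mm >= 60:
        return "B"
    return "C"

grades = [grade_apple("healthy", 0.85, 75),
          grade_apple("healthy", 0.55, 65),
          grade_apple("blotch", 0.90, 80)]
```

Keeping the grading logic as explicit rules, rather than folding it into the classifier, makes the thresholds auditable and easy to retune per orchard or season.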

    The backend is built with FastAPI, providing RESTful endpoints for single and batch image analysis, while efficiently handling asynchronous uploads and model inference. Results are stored in a SQLite database using SQLAlchemy, enabling persistent history tracking and quick retrieval of past analyses.

    On the frontend, a React-based single-page application offers an interactive experience with features such as drag-and-drop uploads, batch processing, visual result dashboards, history management, and PDF export capabilities.

    The system is designed for scalability and deployment flexibility, with trained models exported in multiple formats including PyTorch, ONNX, TensorFlow, and TFLite for cross-platform compatibility.

    Overall, this project demonstrates a complete AI-powered pipeline that integrates machine learning, backend services, and modern frontend development to deliver a practical agricultural quality control solution.
