Ilias Koutsakis

I'm an ML/Data Engineer specializing in NLP, Generative AI, and scalable data platforms. I lead the delivery of LLM/agent workflows and high-throughput ETL to improve data quality, reliability, and time-to-insight across research and enterprise. I'm a cross-functional operator with a track record in governance and FAIR data at Nestlé R&D, CERN, Elsevier, and UvA.

Available to hire

Language

English
Fluent

Work Experience

Senior Data Specialist at Nestlé Research & Development
July 1, 2023 - Present
Led core data engineering efforts: streamlined weekly data deliveries and ingestion for the primary Food Safety database, built end-to-end pipelines for processing and alignment, and consolidated the results in Snowflake data lakes. Designed, deployed, and scaled Generative AI solutions (LangChain, LLMs, RAG, NER; LLM pipelines with FAISS) processing millions of data points to advance automation for ETL, data analysis, and reporting, boosting critical data quality metrics. Drove strategic data management and governance initiatives (standardization, Microsoft Purview), including the correction and enhancement of over 20 years of legacy data for a major digital transformation program. Collaborated with stakeholders across cross-functional teams (business, data science, IT) to scale analytics projects globally.
AI Research Engineer / Data Steward at University of Amsterdam (IRLab)
February 1, 2023 - September 5, 2025
Supported advanced NLP/LLM research and provided data stewardship for university labs within the Information Retrieval Lab (IRLab). Built and deployed end-to-end ML pipelines (Transformers, PyTorch, Docker) providing infrastructure for analysis, visualization, and experiments used by researchers. Established improved methods for research data management (preservation, reusability - FAIR principles) as Data Steward, and provided technical guidance and code reviews to students/researchers.
Data Engineer at CERN
August 1, 2021 - September 5, 2025
Contributed to the development of data policies and tools for Open Science, with a primary focus on CERN Analysis Preservation, a FAIR research data preservation platform. Implemented integrations with external services such as GitHub and Zenodo, used Docker and microservices for deployment, and created data models that accommodated CERN’s experimental requirements while keeping the platform and its APIs stable. Worked closely with CERN experiments and platform users to gather and specify requirements for their data management needs. Became the technical owner for SCOAP³, a HEP Open Access initiative, managing deployment, maintenance, and enhancement while working closely with platform administrators and keeping all involved parties informed.
Machine Learning Engineer at Elsevier
August 1, 2018 - September 5, 2025
Processed and analyzed large textual datasets within the scientific publishing industry; designed, developed, and deployed internal tools and PoCs for document classification, scientific document OCR, and named entity recognition. Delivered regular data-driven insights on Scopus, evaluating data quality and completeness, and contributed to Knovel by adding custom NLP-based plugins for improved semantic search; also developed and deployed custom applications for data quality control. Extracted and analyzed large datasets using Scala, Spark, and the Databricks platform. Collaborated with domain experts and stakeholders, and mentored new hires and interns in the Content & Innovation department.
NLP (Natural Language Processing) Research Intern at Elsevier
June 1, 2017 - September 5, 2025
Conducted research on novel document classification techniques using word embeddings in the domain of scientific literature as part of my MSc thesis. Also developed Classy, an automated tool for comparing and evaluating classification algorithms on text data, resulting in a paper presented at the ACM CIKM 2017 conference in Singapore.
Summer of Code Student at European Space Agency
September 1, 2016 - September 5, 2025
Selected for ESA’s 'Summer of Code in Space' program, where I collaborated with the Italian Mars Society on a project focused on solar data extraction, analysis, and developing a solar storm prediction system.
Software Developer at CERN
May 1, 2016 - September 5, 2025
Enhanced InspireHEP, a leading digital repository for High Energy Physics serving over 50,000 users worldwide, by implementing both frontend and backend features, and collaborated with institutions such as DESY and Stanford University to ensure the platform met the needs of a diverse user base. Developed a curation tool for HEP papers, using semi-supervised machine learning in Python to classify papers and manage end-to-end data workflows from extraction to storage. Implemented 'Clusterix' as my BSc thesis project: an ETL-like tool for real-time data preprocessing, clustering, and visualization, aimed at creating a recommendation system for canonical affiliation/institution names using NLP and paper metadata; presented a workshop on the project at IEEE VDS 2016.
Software Developer Intern at Atlassian
April 1, 2014 - September 5, 2025
Enhanced JIRA, Atlassian’s flagship product, by implementing new functionalities, improving knowledge base search efficiency for Sales and Support teams, and automating issue creation through integration with external form APIs, resulting in increased productivity for support engineers.
Senior Data Specialist at Nestlé R&D
July 1, 2023 - Present
Led delivery of intelligent agent workflows (LangChain, OpenAI, local LLMs, RAG, NER) to automate complex data processing; raised correctness/completeness from ~30% to ~78% on critical datasets. Engineered and operationalized LLM pipelines (FAISS + OpenAI embeddings) processing millions of data points to advance automation in analytics and reporting. Streamlined weekly ingestion & cleaning for the primary Food Safety database, improving reliability for downstream analytics. Initiated and managed remediation for 20 years of historical data; drove governance (standards, docs) with Microsoft Purview; authored technical reports and scaled projects with data science & IT partners.
AI Research Engineer / Data Steward at University of Amsterdam (IRLab)
April 1, 2023 - September 5, 2025
Built end-to-end ML pipelines (Transformers, PyTorch, Docker) used by researchers for analysis, visualization, and experiments. Established FAIR-aligned research data management methods; served as Data Steward across labs; provided code reviews and guidance.
Data Engineer at CERN
August 1, 2021 - September 5, 2025
Developed data policies and tools for Open Science, focusing on CERN Analysis Preservation (FAIR research data preservation platform). Integrated external services (GitHub, Zenodo) with Dockerized microservices; delivered stable, experiment-ready APIs and data models. Served as primary technical owner for SCOAP³ (Open Access): deployment, maintenance, upgrades, stakeholder coordination.
Machine Learning Engineer at Elsevier
August 1, 2018 - September 5, 2025
Delivered internal tools/PoCs for document classification, OCR, and NER; accelerated document-processing workflows. Enhanced Knovel with semantic search; produced Scopus data quality insights using Scala, Spark, and Databricks; mentored interns.
NLP Research Intern at Elsevier
June 1, 2017 - September 5, 2025
Researched document classification with word embeddings (MSc thesis); developed 'Classy' tool; work published at ACM CIKM 2017.
Summer of Code Student at European Space Agency
September 1, 2016 - September 5, 2025
Built an ML pipeline for solar storm prediction with data extraction, preprocessing, modeling and visualization.
Software Developer at CERN
May 1, 2016 - September 5, 2025
Enhanced InspireHEP (50k+ users) with front- and back-end features; collaborated with DESY and Stanford. Developed a curation tool using semi-supervised ML and Python; authored BSc thesis 'Clusterix' (visual analytics for clustering).
Software Developer Intern at Atlassian
April 1, 2014 - September 5, 2025
Shipped JIRA features; improved knowledge-base search; integrated Wufoo → JIRA for automated issue creation.

Education

MSc, Data Science at University of Amsterdam
January 1, 2015 - January 1, 2017
BSc, Computer Science at University of Athens
January 1, 2013 - January 1, 2016

Industry Experience

Software & Internet, Media & Entertainment, Education, Professional Services, Government, Consumer Goods, Life Sciences, Other