

Mahani Aljunied






Available to hire

Solution-oriented, theoretically grounded AI/NLP linguist who thrives on turning complex multilingual data into practical AI solutions. I’ve worked for over 7 years as a lead linguist at Alibaba, and for more than 10 years at the national R&D institute I2R (A*STAR). There, I helped build and mentor core annotation teams, shaping data curation across languages, modalities, and systems, including machine translation engines, models for NLP tasks such as sentiment analysis and chunking, and data for foundational LLMs for Southeast Asian languages.

My work concentrates on advancing AI usage for under-represented languages and under-skilled domains. At Alibaba, I was the key contributor to designing and building these LLM benchmarks: SeaExam, M3Exam, SeaBench, and AudioBench. I’ve reviewed the quality of both training data and open-source tuning data, and curated new tuning data. Beyond foundational LLMs, I’ve also designed data annotation projects for training sentiment analysis models, improved pronunciation dictionaries, and written rubrics for evaluating model output across different task categories. I combine hands-on data work with coaching linguists, working with the rest of the team to improve model coverage, safety, and real-world impact.

Skills (Experience Level)

AI Data Labelling: Expert
Data Science: Beginner
Python: Beginner

Languages (Proficiency)

Malay: Fluent
English: Fluent
Indonesian: Advanced
Thai: Beginner
Vietnamese: Beginner
Bengali: Beginner
Urdu: Beginner
Arabic: Beginner

Work Experience

Senior Data Engineer, Lead Linguist at DAMO Academy, Alibaba

July 1, 2025 - July 1, 2025

Led multilingual NLP data initiatives for multimodal LLMs (Lingshu). Responsibilities included pretraining data support, domain term adaptation, harm filtering, labeling verification, and building data pipelines. Designed the benchmarks SeaExam, M3Exam, and SeaBench; coordinated data curation across SEA languages; supervised a core internal team and vendor annotators across multiple countries; evaluated model outputs and guided data selection.

Contract Engineer / Senior Research Engineer II/III at Institute for Infocomm Research, A*STAR

December 31, 2017 - December 31, 2017

Customized EN-ID MT for maritime reports; curated and assessed the quality of ID-EN social media MT data; worked on Malay text normalization, Thai POS tagging for a joint lab, and Malay TTS data creation; contributed to speech translation; built a grapheme-to-phoneme converter; copy-edited Advanced Chinese Spoken Language Processing materials.

English Phonetics / Phonology Trainer at Linguistic Systems

December 31, 2006 - December 31, 2006

Designed and delivered IPA-based English phonetics training; developed pedagogy and materials; taught pronunciation and phonology to local teachers.

Freelance Linguist at Independent / Freelance

July 31, 2004 - July 31, 2004

Transcribed non-standard English speech into phonemes; contributed to Malay speech dataset collection; translated and edited government-related Malay content; adapted a rule-based English analyzer to process emails.

Operations Lead, Translation Service Unit at Kent Ridge Digital Labs / ISS

February 7, 1995 - December 31, 2000

Led translation service operations; recruited, trained, and supervised translators and post-editors; produced marketing materials and handled client engagement and quality assurance for MT-related services; helped establish multilingual name variations and language-specific processing.