Mahani Aljunied

Available to hire

Solution-oriented, theoretically grounded AI/NLP linguist who thrives on turning complex multilingual data into practical AI solutions. I've worked for over 7 years as a lead linguist at Alibaba, and for more than 10 years at the national R&D institute I2R (A*STAR). There, I helped to build and mentor core annotation teams and shaped data curation across languages, modalities, and systems, including machine translation engines, models for NLP tasks such as sentiment analysis and chunking, and data for foundational LLMs for Southeast Asian languages.

My work concentrates on advancing AI usage for under-represented languages and under-skilled domains. At Alibaba, I was the key contributor to designing and building these LLM benchmarks: SeaExam, M3Exam, SeaBench, and AudioBench. I've reviewed the quality of both training data and open-source tuning data, and curated new tuning data. Beyond foundational LLMs, I've also designed data annotation projects for training sentiment analysis models, improved pronunciation dictionaries, and written rubrics for evaluating model output across different task categories. I combine hands-on data work with coaching linguists, working with the rest of the team to improve model coverage, safety, and real-world impact.

Language

Malay: Fluent
English: Fluent
Indonesian: Advanced
Thai: Beginner
Vietnamese: Beginner
Bengali: Beginner
Urdu: Beginner
Arabic: Beginner

Work Experience

Senior Data Engineer, Lead Linguist at Damo Academy, Alibaba
July 1, 2025 - July 1, 2025
Led multilingual NLP data initiatives for multimodal LLMs (Lingshu). Responsibilities included pretraining data support, domain term adaptation, harm-filtering, labeling verification, and building data pipelines; designed benchmarks SeaExam, M3Exam, SeaBench; coordinated data curation across SEA languages; supervised a core internal team and vendor annotators across multiple countries; evaluated model outputs and guided data selection.
Contract Engineer / Senior Research Engineer II/III at Institute for Infocomm Research, A*STAR
December 31, 2017 - December 31, 2017
Customized EN-ID MT for maritime reports; ID-EN social media MT data curation and quality assessment; Malay text normalization; Thai POS tagging for joint-lab; Malay TTS data creation; speech translation contributions; grapheme-to-phoneme converter; copy editing of Advanced Chinese Spoken Language Processing materials.
English Phonetics / Phonology Trainer at Linguistic Systems
December 31, 2006 - December 31, 2006
Designed and delivered IPA-based English phonetics training; developed pedagogy and materials; taught pronunciation and phonology to local teachers.
Freelance Linguist at Independent / Freelance
July 31, 2004 - July 31, 2004
Transcribed non-standard English speech into phonemes; contributed to Malay speech dataset collection; translated and edited government-related Malay content; adapted a rule-based English analyzer to process emails.
Operations Lead, Translation Service Unit at Kent Ridge Digital Labs / ISS
February 7, 1995 - December 31, 2000
Led translation service operations; recruited, trained, and supervised translators and post-editors; produced marketing materials and handled client engagement and quality assurance for MT-related services; helped establish multilingual name variations and language-specific processing.

Education

B.A. (Hons.) in English Language & Linguistics at National University of Singapore (NUS)
January 1, 1990 - December 31, 1994
B.A. in Sociology (Social Theory, Ethnic Relations, Social Psychology) & English Language (double major) at National University of Singapore (NUS)
January 1, 1993 - December 31, 1993
Certificate IV in TESOL (Teaching English as a 2nd Language) at CA International College
January 1, 2013 - December 31, 2013
Conversational Thai Language at NUS Extramural Studies
January 1, 1997 - December 31, 1997

Qualifications

Best English Language & Linguistics Student
January 1, 1993 - December 31, 1993
Certificate IV in TESOL
January 1, 2013 - December 31, 2013

Industry Experience

Software & Internet, Education, Other
