Solution-oriented, theoretically grounded AI/NLP linguist who thrives on turning complex multilingual data into practical AI solutions. I’ve worked for over 7 years as a lead linguist at Alibaba and for more than 10 years at the national R&D institute I2R (A*STAR). In these roles, I helped build and mentor core annotation teams and shaped data curation across languages, modalities, and systems, including machine translation engines, models for NLP tasks such as sentiment analysis and chunking, and data for foundational LLMs for Southeast Asian languages.
My work concentrates on advancing AI usage for under-represented languages and under-skilled domains. At Alibaba, I was the key contributor to designing and building the LLM benchmarks SeaExam, M3Exam, SeaBench, and AudioBench. I’ve also reviewed the quality of both training data and open-source tuning data, and curated new tuning data. Beyond foundational LLMs, I’ve designed data annotation projects for training sentiment analysis models, improved pronunciation dictionaries, and written rubrics for evaluating model output across different task categories. I combine hands-on data work with coaching linguists, working with the rest of the team to improve model coverage, safety, and real-world impact.
Hire a Data Annotator
We have the best data annotator experts on Twine. Hire a data annotator in Singapore today.