Available to hire

Hi, I’m Yuhe Fan, a machine learning researcher focused on scalable language models and multi-GPU distributed training. I enjoy turning complex ideas into efficient systems, from hybrid model architectures with sparse attention to rigorous evaluation protocols that validate general LM capabilities.

I completed my MSc in Applied Computing at the University of Toronto and a BSc in Computer Science at McGill University, where I earned Dean’s List honours and scholarships. I thrive in collaborative, fast-paced environments like Noah’s Ark Lab, and I’m always eager to tackle new ML challenges in data-rich settings.

Work Experience

LLM Researcher at Noah's Ark Lab
February 1, 2025 - January 31, 2026
Architected novel hybrid models that integrate linear mixers (Mamba, GDN, SWA) with customized sparse attention to break the quadratic complexity bottleneck of Transformers; a minimal sketch of such a block appears after this section. Orchestrated large-scale multi-GPU distributed pre-training for 1B-parameter models; optimized data pre-processing and training pipelines to handle 100B-token datasets. Designed comprehensive evaluation protocols to validate general LM capabilities, retrieval, and long-context reasoning; rigorous latency/memory profiling guided optimizations that achieved 3x throughput and 40% lower peak memory at 32k context length.
Research Intern at Noah's Ark Lab
May 1, 2024 - December 1, 2024
Researched adaptive computation by implementing and extending the Mixture-of-Depths (MoD) architecture; introduced threshold-based routers and lightweight computation pathways to optimize compute allocation (see the routing sketch after this section). Developed a joint post-training framework (SFT + knowledge distillation) that recovered general LM performance to near-Transformer levels (≤1% gap) on LM-Harness benchmarks after MoD integration.
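
A minimal sketch of the kind of hybrid block described in the LLM Researcher role, assuming PyTorch: a sliding-window attention layer for local context interleaved with a gated linear recurrence standing in for mixers like Mamba or GDN. All module names, dimensions, and the window size are illustrative assumptions, not the lab's actual code.

```python
import torch
import torch.nn as nn

class SlidingWindowAttention(nn.Module):
    """Causal attention restricted to a local window, so useful attention
    entries grow with T * window rather than T^2."""
    def __init__(self, d_model: int, n_heads: int, window: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        i = torch.arange(T, device=x.device)
        # True = position may NOT be attended: block the future and anything
        # farther back than `window` tokens.
        mask = (i[None, :] > i[:, None]) | (i[:, None] - i[None, :] >= self.window)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

class GatedLinearMixer(nn.Module):
    """Linear-time token mixer: a per-channel gated recurrence, O(T)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(x))   # per-token decay in (0, 1)
        v = self.proj(x)
        h = torch.zeros_like(v[:, 0])
        outs = []
        for t in range(x.size(1)):        # sequential scan; real kernels fuse this
            h = g[:, t] * h + (1 - g[:, t]) * v[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)

class HybridBlock(nn.Module):
    """Pre-norm residual block: local windowed attention + global linear mixing."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, window: int = 128):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.swa = SlidingWindowAttention(d_model, n_heads, window)
        self.mixer = GatedLinearMixer(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.swa(self.norm1(x))
        x = x + self.mixer(self.norm2(x))
        return x

x = torch.randn(2, 512, 256)
print(HybridBlock()(x).shape)  # torch.Size([2, 512, 256])
```

The point of the pairing is that the windowed layer keeps sharp local detail while the recurrent mixer carries unbounded context at linear cost, which is what lets such stacks escape quadratic attention.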
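
A similar sketch for the Research Intern work: threshold-routed Mixture-of-Depths, where a learned router decides per token whether to spend a block's compute, plus the joint SFT + knowledge-distillation objective. The router design, the score-scaled residual, and the loss weights (alpha, tau) are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoDLayer(nn.Module):
    """Mixture-of-Depths wrapper: tokens whose router score clears a threshold
    go through `block`; the rest ride the residual path unchanged. For clarity
    the block is applied densely here; a real implementation gathers only the
    routed tokens to actually save compute."""
    def __init__(self, block: nn.Module, d_model: int, threshold: float = 0.5):
        super().__init__()
        self.block = block
        self.router = nn.Linear(d_model, 1)
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = torch.sigmoid(self.router(x))      # (B, T, 1) routing weights
        keep = (scores > self.threshold).float()    # hard routing decision
        # Multiplying by `scores` keeps a gradient path to the router.
        return x + keep * scores * self.block(x)

def sft_kd_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=2.0):
    """Joint post-training objective: cross-entropy on labels (SFT) plus
    temperature-scaled KL to the teacher's distribution (KD)."""
    ce = F.cross_entropy(student_logits.flatten(0, 1), labels.flatten())
    s = F.log_softmax(student_logits.flatten(0, 1) / tau, dim=-1)
    t = F.log_softmax(teacher_logits.flatten(0, 1) / tau, dim=-1)
    kd = F.kl_div(s, t, log_target=True, reduction="batchmean") * tau ** 2
    return alpha * ce + (1 - alpha) * kd
```

Distilling from the original dense Transformer is what lets the routed model claw back general capability once layers start skipping tokens, consistent with the ≤1% LM-Harness gap noted above.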

Education

Master of Science in Applied Computing at University of Toronto
September 1, 2023 - June 1, 2025
Bachelor of Science in Computer Science at McGill University
September 1, 2019 - May 1, 2023

Industry Experience

Software & Internet, Computers & Electronics