

Masoud Karami

AI Engineer, Data Scientist, Developer, +4






Available to hire

I am Masoud Karami, a PhD candidate in Computer Engineering at Polytechnique Montréal. I develop cognitively inspired evaluation frameworks for Large Language Models (LLMs), focusing on robustness, reasoning, and human alignment. I have designed a Python-based LLM Serial Memory Task pipeline to evaluate recall accuracy, forgetting dynamics, and distractor handling, and I have integrated it into real-world software engineering tasks such as automated code review.

My work benchmarks over 20 foundation models (e.g., LLaMA-2, LLaMA-3, CodeLlama, Mistral, Phi, Qwen, Claude, GPT-3.5, GPT-4) using large-scale HPC workflows on Compute Canada. I investigate RLHF and prompt engineering (chain-of-thought, step-back prompting) and deliver reproducible, high-performance evaluation workflows.