Malte Ro Buchwald

PRO

Data Scientist, AI Engineer, AI Developer, +2





Available to hire

Copenhagen, Denmark

I am a highly driven researcher with a background in philosophy and a strong theoretical curiosity about reinforcement learning (RL). My path in machine learning and data science has drawn me toward RL, particularly reinforcement learning from human feedback (RLHF), because it sits at the intersection of environment modeling and human judgment. During my studies I explored large parts of the RL literature, including Sutton and Barto, and followed online coursework on dynamic programming (DP) methods for solving Markov decision processes (MDPs). I am eager for an opportunity to deepen my knowledge of RLHF and apply it to the needs of a team or project.

In practice, I have worked with human annotations, dataset curation, and retrieval-augmented generation (RAG). At SODAS (University of Copenhagen) I created LLM-based annotations for a study on cooperation behavior, mapping plain speech to labels in {0, 1, -1}. My BA thesis examined evaluation metrics for machine translation, highlighting how high-quality human annotations drive model robustness and how human preferences can guide model outputs. I also augmented an existing dataset with AI-generated paraphrases to study data generation and labeling scrutiny, and I built a small FAISS-based vector store to explore RAG workflows using my own writings.

Skills

Python

PyTorch

Data Science

Experience Level

Python: Expert

PyTorch: Expert

Data Science: Expert

Intermediate

SQL: Intermediate

Language

Danish: Fluent

English: Advanced

French: Intermediate

Work Experience

Student Assistant at SODAS, Københavns Universitet

January 1, 2024 - Present

Created LLM-based annotations for a dataset used in a study of cooperation behavior, mapping plain speech to labels in {0, 1, -1}. Also contributed to fine-tuning an LLM in this role.