I am a highly driven researcher with a background in philosophy and a strong theoretical curiosity about reinforcement learning (RL). My path into machine learning and data science has drawn me toward RL, particularly reinforcement learning from human feedback (RLHF), because it sits at the intersection of environment modeling and human judgment. During my studies I explored large parts of the RL literature, including Sutton and Barto's Reinforcement Learning: An Introduction, and worked through online coursework on dynamic programming (DP) methods for solving Markov decision processes (MDPs). I am eager for an opportunity to deepen my expertise in RLHF and to apply that knowledge to the needs of a team or project.
In practice, I have worked with human annotation, dataset curation, and retrieval-augmented generation (RAG). At SODAS (University of Copenhagen), I created LLM-based annotations for a study on cooperative behavior, mapping plain-language statements to the labels {-1, 0, 1}. My BA thesis examined evaluation metrics for machine translation, highlighting how high-quality human annotations drive model robustness and how human preferences can guide model outputs. I also augmented an existing dataset with AI-generated paraphrases to study data generation and the scrutiny that generated data requires during labeling, and I built a small FAISS-based vector store to explore RAG workflows over my own writing.
Hire Malte Ro Buchwald today