I'm Kunhong Yu, a data scientist and multimodal ML researcher with a track record of turning messy data into actionable insights. I design scalable data pipelines, build end-to-end ML systems, and explore vision-language and NLP problems across fintech, e-commerce, and research settings, collaborating with cross-functional teams worldwide. I enjoy translating complex data into practical solutions, deploying models to production, and mentoring others on best practices for experimentation, reproducibility, and responsible AI. I’m excited to apply my skills to real-world challenges in data engineering, ML research, and product development.

Kunhong Yu

Available to hire

Experience Level

Expert

Language

English: Fluent
Polish: Beginner

Work Experience

Web Scraping Data Engineer at Point72 Poland
September 1, 2025 - September 1, 2025
Developed and optimized web scraping pipelines using Python, BeautifulSoup, and Playwright to extract data from JavaScript-heavy pages and APIs, ensuring compliance with legal guidelines. Led migration of CI/CD pipelines from Jenkins to Airflow, designed DAGs, and built custom operators to integrate scraping tasks with analytics systems. Incorporated large-language-model-based processing to extract and structure data from images, PDFs, and long texts.
Deep Learning Researcher at VSPN
February 1, 2021 - February 1, 2021
Detected main actions in phone screen images with YOLOv3 and performed text recognition with CRNN OCR. Converted the trained model to ONNX for production deployment.
Machine Learning Researcher/Data Scientist at Kingstar Fintech
December 1, 2020 - December 1, 2020
Developed end-to-end CRNN OCR models to detect bounding boxes and recognize text on receipts. Improved text clarity with an autoencoder, used distillation to produce smaller models, and applied distributed training. Applied conditional GANs to generate augmented datasets for image CAPTCHA recognition; built multi-task models using GoogLeNet and MobileNetV2.
Machine Learning Engineer Intern at PhoenixNet
September 1, 2018 - September 1, 2018
Cleaned and preprocessed user clickstream data; tuned hyperparameters of deep learning recommendation models; improved training efficiency.
Web Scraping Data Engineer (Project: Japanese Job Posting) at Point72
May 1, 2024 - May 1, 2024
Japanese Job Posting project: built scrapers and reusable frameworks to collect and process large-scale data from multiple Japanese job posting sites, with CI/CD pipelines using Jenkins and Airflow. Leveraged PySpark for large-scale analysis with data quality checks and time-series visualization.
Core Developer at Point72
May 1, 2025 - May 1, 2025
LLM Image and PDF Data Parsing: parsed PDFs from diverse sources using an internal GPT client to extract data from images, plain text, and image-like tables. Improved accuracy iteratively with chain-of-thought prompting, in-context learning, and reflection mechanisms.
Core Developer at UW
April 1, 2023 - April 1, 2023
Prompt Tuning by Context Template Pool Optimisation for Vision-Language Model: reviewed state-of-the-art literature, generated ideas with advisors, and developed a novel two-step algorithm for CLIP model prompt-tuning. Validated with large-scale experiments and disseminated results.
Core Developer at UW
June 1, 2022 - June 1, 2022
FullyLight: Keras-like neural network classifier implemented in R with a vectorized forward-backward pass; interactive visualization of training dynamics via Shiny.
Core Developer at Kingstar Fintech
May 1, 2020 - May 1, 2020
Anomaly Detection in Financial Transactions: preprocessed data with Python/SQL (Pandas/Sklearn/Matplotlib/Seaborn); evaluated multiple ML algorithms (logistic regression, Random Forest, AutoEncoder-based, XGBoost, CatBoost) to detect fraudulent activity; documented architecture and data handling in Markdown/Wiki.

Education

PhD in Computer Science (Multimodal Machine Learning) at Auckland University of Technology (AUT)
October 1, 2025 - November 8, 2025
Master’s Degree in Data Science for Business Analytics (DSBA) at University of Warsaw (UW)
October 1, 2021 - June 1, 2023
Master’s Degree in Computer Science at Beijing Jiaotong University (BJTU)
September 1, 2016 - June 1, 2019
Bachelor’s Degree in Internet of Things (IoT) at Anhui Normal University
September 1, 2012 - June 1, 2016

Industry Experience

Software & Internet, Professional Services, Media & Entertainment, Financial Services, Education