Computer Vision

Off-the-Shelf Datasets

We also offer video, audio, image, and text datasets.

VoxCeleb

VoxCeleb is a large-scale audio-visual speech dataset built from YouTube interview clips, widely used to train and benchmark deep speaker recognition models for speaker verification, speaker identification, and robust “in-the-wild” voice AI.

Video Datasets

Speech Recognition

Casual Conversations Dataset

Casual Conversations is a large-scale multimodal (video + audio) benchmark dataset built to evaluate and audit computer vision and speech models for accuracy across diverse ages, genders, apparent skin tones, and lighting conditions.

We have many more datasets
