Face Recognition

Off-the-Shelf Datasets

Check out our off-the-shelf face datasets. Need a dataset of your own? Contact us for a free quote on custom data.

We also offer video, audio, image, and text datasets.

Audio Datasets

Speech Recognition

VoxCeleb

VoxCeleb is a large-scale audio-visual speech dataset built from YouTube interview clips, widely used to train and benchmark deep speaker recognition models for speaker verification, speaker identification, and robust “in-the-wild” voice AI.

Video Datasets

Speech Recognition

Audio-visual speech with multiple speakers

Large-scale audio-visual dataset comprising speech clips with no interfering background signals.

Video Datasets

Activity Detection

Audio-visual emotion recognition

These expressions are produced at two levels of emotional intensity (regular and strong), except for the neutral emotion, which is produced only at regular intensity.

Video Datasets

Activity Detection

Instructional cooking videos

Each video contains a number of procedure steps that together fulfill a recipe. All procedure segments are temporally localized in the video with a start time and an end time. The distributions of 1) video duration, 2) number of recipe steps per video, 3) recipe segment duration, and 4) number of words per sentence are shown below.

Video Datasets

Biometrics

Audio-visual recordings of sign language

This corpus contains 15 spontaneous dialogues and multi-participant conversations by deaf signers: 10 were recorded in authentic settings such as a deaf club and a bar, and 5 were recorded in the lab.

Video Datasets

Activity Detection

A dataset for lipreading using sequences of video frames

Human lipreading performance increases for longer words, indicating the importance of features capturing temporal context in an ambiguous communication channel.

Video Datasets

Biometrics

A dataset of videos of talking faces with transcriptions

Data were collected from 100 subjects, yielding over a thousand instances of synchronized data.

Video Datasets

Biometrics

Lip Reading in the Wild (LRW)

The package, including the videos and the metadata, is available for non-commercial academic research.

Face Recognition

Off-the-Shelf Datasets

VoxCeleb

Audio-visual speech with multiple speakers

Audio-visual emotion recognition

Instructional cooking videos

Audio-visual recordings of sign language

A dataset for lipreading using sequences of video frames

A dataset of videos of talking faces with transcriptions

Lip Reading in the Wild (LRW)

Brazil - Images Multi-pose Face Data

Sri Lanka - Images Multi-pose Face Data

Pakistan - Images Multi-pose Face Data

Nepal - Images Multi-pose Face Data

Netherlands - Images Multi-pose Face Data

Egypt - Images Multi-pose Face Data

Kenya - Images Multi-pose Face Data

South Africa - Images Multi-pose Face Data

Italy - Images Multi-pose Face Data

Poland - Images Multi-pose Face Data

Germany - Images Multi-pose Face Data

France - Images Multi-pose Face Data

Portugal - Images Multi-pose Face Data

Spain - Images Multi-pose Face Data

UAE - Images Multi-pose Face Data

Malaysia - Images Multi-pose Face Data

Hong Kong - Images Multi-pose Face Data

Indonesia - Images Multi-pose Face Data

Singapore - Images Multi-pose Face Data

Japan - Images Multi-pose Face Data

China - Images Multi-pose Face Data

USA - Images Multi-pose Face Data

India - Images Multi-pose Face Data

UK - Images Multi-pose Face Data

Driver Behavior Collection Data

Indoor Scene Recognition

UK Face Dataset

Face Dataset (Age Training)

We have many more datasets
