Casual Conversations Dataset

Video Datasets

Casual Conversations is a large-scale multimodal (video + audio) benchmark dataset built to evaluate and audit the accuracy of computer vision and speech models across diverse ages, genders, apparent skin tones, and lighting conditions.

Files: 45,186
Size: 15 GB
Format: MP4
Duration: ~1 minute per video on average
Country: USA
Participants: 3,011
Languages:
Updated: December 11, 2025

Description

The Casual Conversations dataset contains 45,186 videos from 3,011 consenting participants and was created specifically to evaluate the performance and reliability of pre-trained AI models in computer vision and audio applications. Participants were paid, agreed to take part in the project, and provided their own age and gender labels. The videos were recorded in the U.S. with a diverse set of adults spanning a range of age, gender, and apparent skin tone groups. Trained annotators labeled each participant's apparent skin tone using the Fitzpatrick scale and marked whether each video was recorded in low ambient lighting.

To support multimodal and speech research, all spoken content is manually transcribed by human annotators and is available with the dataset. The dataset is intended for use under the permitted purposes defined in the data user agreement.

Labels
Age (self-provided): 3,011
Gender (self-provided): 3,011
Skin Tone (human labelled): 3,011
Lighting (human labelled): 45,186
Speech Transcriptions (human labelled): 45,186
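
The labels above are meant for auditing, so a typical use is to break a model's accuracy down by label group and compare the groups. The following is a minimal sketch of that idea, not the dataset's official tooling. The record layout and the field names (`fitzpatrick`, `low_light`, `prediction`, `target`) are assumptions made for illustration, and the toy records are invented rather than taken from the dataset. It only evaluates predictions against labels, in line with the license summary.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group value: fraction of correct predictions} for one label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        total[group] += 1
        correct[group] += int(rec["prediction"] == rec["target"])
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Invented toy records; the field names are hypothetical, not the dataset schema.
# "fitzpatrick" holds a Fitzpatrick skin type (1-6); "low_light" is a boolean flag.
records = [
    {"fitzpatrick": 2, "low_light": False, "prediction": "a", "target": "a"},
    {"fitzpatrick": 2, "low_light": True, "prediction": "a", "target": "b"},
    {"fitzpatrick": 5, "low_light": False, "prediction": "b", "target": "b"},
    {"fitzpatrick": 5, "low_light": True, "prediction": "b", "target": "b"},
]

by_tone = accuracy_by_group(records, "fitzpatrick")
by_light = accuracy_by_group(records, "low_light")
print("accuracy by skin type:", by_tone)
print("accuracy by lighting:", by_light)
print("largest gap across skin types:", accuracy_gap(by_tone))
```

Reporting the per-group values alongside the gap keeps a single aggregate accuracy figure from hiding a weak subgroup.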

Dataset Technical Specification

Number of files: 45,186
Total dataset size: 15 GB
Duration: ~1 minute per video on average
Format: MP4
Sample rate:
Resolution:

Dataset Demographics

📍 Country: USA
🧍 Gender:
📅 Age:
👥 Number of participants: 3,011

🛡️ Consent & Compliance

Participants agreed to participate in the project and explicitly provided their age and gender labels themselves.

License: Limited; see the full license language for permitted use.

Summary of license permissions

- You can evaluate models on the provided labels

- You cannot train any model with the provided labels

