
The Casual Conversations dataset contains 45,000+ videos from 3,011 consented participants and was created specifically to evaluate the performance and reliability of pre-trained AI models in computer vision and audio applications. The participants were paid, agreed to take part in the project, and explicitly provided their own age and gender labels. The videos were recorded in the U.S. with a diverse set of adults spanning various age, gender, and apparent skin tone groups. A group of trained annotators labeled the participants’ apparent skin tone using the Fitzpatrick scale and also annotated whether each video was recorded in low ambient lighting conditions.
To support multimodal and speech research, all spoken content is manually transcribed by human annotators and is available with the dataset. The dataset is intended for use under the permitted purposes defined in the data user agreement.
Labels
Age (self-provided): 3,011
Gender (self-provided): 3,011
Skin Tone (human labelled): 3,011
Lighting (human labelled): 45,186
Speech Transcriptions (human labelled): 45,186
License: Limited; see full license language for use
Summary of license permissions
- You can evaluate models on the provided labels
- You cannot train any model with the provided labels
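
Because the labels are provided for evaluation only, a typical use is to break a model's results down by subgroup. The following is a minimal sketch of that idea in Python. The record fields (`video_id`, `skin_tone`, `low_light`, `correct`) and the sample values are hypothetical placeholders, not the dataset's actual annotation schema, and `accuracy_by_group` is an illustrative helper rather than part of any released tooling.

```python
from collections import defaultdict

# Hypothetical per-video results: field names and values are illustrative only
# and do not reflect the dataset's real annotation format.
records = [
    {"video_id": "v1", "skin_tone": "II", "low_light": False, "correct": True},
    {"video_id": "v2", "skin_tone": "II", "low_light": True, "correct": False},
    {"video_id": "v3", "skin_tone": "V", "low_light": False, "correct": True},
    {"video_id": "v4", "skin_tone": "V", "low_light": True, "correct": True},
]


def accuracy_by_group(rows, key):
    """Return {group value: fraction of rows where the model was correct}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        hits[row[key]] += int(row["correct"])
    return {group: hits[group] / totals[group] for group in totals}


# Evaluation only: the labels are used to slice results, never to fit a model.
by_tone = accuracy_by_group(records, "skin_tone")
by_light = accuracy_by_group(records, "low_light")
print(by_tone)
print(by_light)
```

Comparing the per-group values side by side is one simple way to spot performance gaps across skin tone or lighting conditions. No label is used to fit or tune a model here, which keeps the sketch on the evaluation side of the license summary above.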