A dataset of videos of talking faces with transcriptions

Data were collected from 100 subjects, yielding over a thousand instances of synchronized data.
Files: 1000
Size:
Format: wav

A large-scale multimodal dataset developed to support machine learning research that combines thermal, visual, and audio data streams. Example applications include human–computer interaction, biometric authentication, recognition systems, domain transfer, and speech recognition.

Dataset Technical Specification

Number of files: 1000
Total dataset size:
Duration:
Format: wav
Sample rate:
Resolution:
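
Several fields in the specification above (for example duration and sample rate) are left blank in this listing. The following is a minimal sketch of how such values could be read from a local copy of the audio files, using only Python's standard `wave` module. It assumes the files are uncompressed PCM `.wav` files; the function name `summarize_wav_dir` and the directory path in the usage note are hypothetical and not part of the dataset's documentation.

```python
import wave
from pathlib import Path


def summarize_wav_dir(root):
    """Summarize the PCM .wav files found under `root` (searched recursively).

    Returns a dict with the file count, the total duration in seconds,
    and the sets of sample rates (Hz) and bit depths found.
    """
    count = 0
    total_seconds = 0.0
    sample_rates = set()
    bit_depths = set()

    for path in sorted(Path(root).rglob("*.wav")):
        # wave.open only handles uncompressed PCM; other encodings raise wave.Error.
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            total_seconds += wav.getnframes() / rate
            sample_rates.add(rate)
            bit_depths.add(wav.getsampwidth() * 8)  # bytes per sample -> bits
        count += 1

    return {
        "files": count,
        "total_seconds": total_seconds,
        "sample_rates": sample_rates,
        "bit_depths": bit_depths,
    }
```

Calling `summarize_wav_dir("path/to/dataset")` on a downloaded copy (the path is a placeholder) would report the measured values, which can then be compared with the stated file count of 1000.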

Dataset Demographics

Country: Worldwide
Gender: 50% male, 50% female
Age: 18-55
Number of participants: 100