Audio-visual speech with multiple speakers

Large-scale audio-visual dataset comprising speech clips with no interfering background signals.

Files: 100

Size: 4 GB

Format: wav

Duration: 8 hours

Country: Worldwide

Participants:

Languages: English

Updated: December 8, 2025

Description

Segments range from 3 to 10 seconds in length. In each clip, the only visible face in the video and the only audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 20 hours of video segments from approximately 40 distinct speakers, spanning a wide variety of people, languages and face poses.

Dataset Technical Specification

Number of files: 100

Total dataset size: 4 GB

Duration: 8 hours

Format: wav

Sample rate: 48 kHz

Resolution: N/A
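After downloading, the audio files can be checked against the figures listed above. The following is a minimal sketch, not an official tool: it assumes the files are uncompressed PCM WAV (the only kind Python's standard `wave` module reads) and that they sit in a single folder. The names `inspect_wav` and `summarize` and the folder path are hypothetical, and the 3 to 10 second range is taken from the description.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 48_000               # "Sample rate: 48 kHz" in the specification
MIN_SECONDS, MAX_SECONDS = 3.0, 10.0    # segment length range given in the description


def inspect_wav(path):
    """Read one PCM WAV file and compare it with the listed specification."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    duration = frames / rate  # seconds
    return {
        "file": Path(path).name,
        "sample_rate_hz": rate,
        "duration_s": duration,
        "rate_ok": rate == EXPECTED_RATE_HZ,
        "length_ok": MIN_SECONDS <= duration <= MAX_SECONDS,
    }


def summarize(folder):
    """Inspect every .wav file in a folder and total the duration in hours."""
    reports = [inspect_wav(p) for p in sorted(Path(folder).glob("*.wav"))]
    total_hours = sum(r["duration_s"] for r in reports) / 3600
    return reports, total_hours
```

Calling `reports, hours = summarize("dataset/")` (a hypothetical path) returns one report per file plus the total duration, which can then be compared with the listed number of files and duration.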

Dataset Demographics

📍 Country: Worldwide

🧍 Gender: 60% male, 40% female

📅 Age: 18-55

👥 Number of participants:

🛡️ Consent & Compliance

✅ GDPR Compliant

Consent Summary:

All participants provided informed consent for data collection and usage in AI training applications.

Data Collection:

Data was collected through the Boom app phone conversation module with full participant awareness and agreement.

Ethical Considerations:

All data has been anonymized and privacy-preserving measures have been implemented to protect participant identities.

Related Datasets

Audio-visual speech with multiple speakers

Lip Reading in the Wild (LRW)

A dataset of videos of talking faces with transcriptions

A dataset for lipreading using sequences of video frames

Instructional cooking videos

Casual Conversations Dataset