Browse our off-the-shelf audio data sets below, including wake words and voice commands, soundscape recordings, and more. Don’t see the data you need? Contact us for a free quote.
VoxCeleb is a large-scale audio-visual speech dataset built from YouTube interview clips, widely used to train and benchmark deep speaker recognition models for speaker verification, speaker identification, and robust “in-the-wild” voice AI.
Recordings of the wake word "Alexa" in European Spanish (es_ES), e.g., "Eh, Alexa, cuéntame un chiste." Each participant recorded 70 utterances on average (minimum 50, maximum 75). This data set contains the voice command only.