Voice Commands

Off-the-Shelf Datasets

Check out our off-the-shelf voice command datasets. They contain the voice commands used by the popular voice assistants Siri, Alexa, Bixby, and Google Assistant. Need voice commands of your own? Contact us for a free quote on custom data.

Audio-visual emotion recognition
These expressions are produced at two levels of emotional intensities (regular and strong) except for the neutral emotion that only contains regular intensity.
Instructional cooking videos
Each video contains a number of procedure steps that together fulfill a recipe. All the procedure segments are temporally localized in the video with a start time and an end time. The distributions of 1) video duration, 2) number of recipe steps per video, 3) recipe segment duration and 4) number of words per sentence are shown below.
Audio-visual recordings of sign language
This corpus contains 15 spontaneous dialogues and multi-participant conversations by deaf signers; 10 were recorded in authentic settings such as a deaf club and a bar, and 5 were recorded in the lab.
A dataset of videos of talking faces with transcriptions
Data were collected from 100 subjects, yielding over a thousand instances of synchronized data.
Lip Reading in the Wild (LRW)
The package including the videos and the metadata is available for non-commercial, academic research.
Mandarin (Shanghai) (China) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Mandarin spoken in Shanghai, China
Romanian (Romania) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Romanian spoken in Romania
Polish (Poland) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Polish spoken in Poland
Panjabi (Pakistan) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Panjabi spoken in Pakistan
Mongolian (Mongolia) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Mongolian spoken in Mongolia
Mandarin (Traditional) (Taiwan) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Mandarin (Traditional) spoken in Taiwan
Mandarin (Simplified) (China) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Mandarin (simplified) spoken in China
Lao (Laos) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Lao spoken in Laos
Kannada (India) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Kannada spoken in India
Greek (Greece) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Greek spoken in Greece
German (Switzerland) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, German spoken in Switzerland
French (Algeria) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, French spoken in Algeria
Farsi/Persian (Iran) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Farsi/Persian spoken in Iran
English (UAE) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, English spoken in the UAE
English (Philippines) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, English spoken in the Philippines
English (Hong Kong) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, English spoken in Hong Kong
English (Australia) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, English spoken in Australia
Dutch (Netherlands) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Dutch spoken in the Netherlands
Dutch (Belgium) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Dutch spoken in Belgium
Spanish (Mexico) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Spanish spoken in Mexico
Spanish (ESP) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Spanish spoken in Spain
Catalan (Spain) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Catalan spoken in Catalonia, Spain.
Sinhalese (Sri Lanka) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Sinhalese spoken in Sri Lanka
Vietnamese General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Vietnamese spoken in Vietnam
Tamil General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Tamil spoken in India
Singaporean-English General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Singaporean-English spoken in Singapore
Punjabi General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Punjabi spoken in India
Malay General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Malay spoken in Malaysia
Bahasa (Indonesia) General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Bahasa spoken in Indonesia
Thai General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Thai spoken in Thailand
Gujarati General Conversation data
Unscripted conversation between two people. Approx. Audio Duration (Range) - 15-60 minutes, Gujarati spoken in India.
UK Voice Commands Dataset
Voice Commands in the English Language
Alexa Wake Words in Canadian French (Adults)
This data set contains recordings of the wake word "Alexa" in Canadian French (fr_CA) (e.g., "Alexa, raconte-moi une blague.").
Alexa Voice Commands in EU Spanish (Adults)
Voice commands with the wake word "Alexa" in EU Spanish (es_ES) (e.g., "Eh, Alexa, cuéntame un chiste."). Each participant recorded 70 utterances on average (minimum 50, maximum 75). This data set contains the voice command only.
Alexa Wake Words in Mexican Spanish (Adults)
This data set contains recordings of the wake word "Alexa" in Mexican Spanish (es_MX) used in voice commands (e.g., "Oye Alexa, cuéntame un chiste.").
Siri Wake Words and Voice Commands in US English
US English voice commands including the wake word "Hey Siri" from 103 participants aged 19-68.
Wake Words and Voice Commands in Korean with Seoul Dialect
Korean voice commands including the wake word "Hi Bixby" from 52 participants in Seoul.
Google Wake Words and Voice Commands in US English
US English voice commands including the wake word "OK Google" from 103 participants aged 19-68.