Off-the-Shelf Datasets

These off-the-shelf video datasets can be used for computer vision applications such as facial recognition, object detection, and other visual recognition use cases. Don’t see the video data you need? Contact us for a free quote.
Audio-visual speech with multiple speakers
Large-scale audio-visual dataset comprising speech clips with no interfering background signals.
Audio-visual emotion recognition
These expressions are produced at two levels of emotional intensity (regular and strong), except for the neutral emotion, which is produced only at regular intensity.
Instructional cooking videos
Each video contains a number of procedure steps that together complete a recipe. All procedure segments are temporally localized in the video with a start time and an end time. The distributions of 1) video duration, 2) number of recipe steps per video, 3) recipe segment duration, and 4) number of words per sentence are shown below.
Audio-visual recordings of sign language
This corpus contains 15 spontaneous dialogues and multi-participant conversations by deaf signers, 10 of which were recorded in authentic settings such as a deaf club and a bar, and 5 of which were recorded in the lab.
A dataset for lipreading using sequences of video frames
Human lipreading performance increases for longer words, indicating the importance of features capturing temporal context in an ambiguous communication channel.
A dataset of videos of talking faces with transcriptions
Data were collected from 100 subjects, yielding over a thousand instances of synchronized data.
Lip Reading in the Wild (LRW)
The package including the videos and the metadata is available for non-commercial, academic research.
Fire Videos Data
No one was asked to light a fire for this dataset. The videos come from people's own observations of fires.
European License Plate Recognition
License plate recognition training model in Europe