The videos total 2 hours in duration. The dataset was shot with several different cameras, and the footage covers both daytime and nighttime scenes. The dataset can be used for tasks such as fire detection.
This corpus contains 15 spontaneous dialogues and multi-participant conversations by deaf signers. Ten were recorded in authentic settings such as a deaf club and a bar, and five were recorded in the lab.
Casual Conversations is a large-scale multimodal (video + audio) benchmark dataset built to evaluate and audit computer vision and speech models for accuracy across diverse ages, genders, apparent skin tones, and lighting conditions.