A dataset of video clips with spoken and visual attributes

Published:
January 27, 2023

The dataset is designed for activity classification across 31 activities. The videos were clipped per activity, yielding a total of 2,000 short RGB+D video samples. Participants performed the activities in a natural manner. As a result, the dataset poses a unique combination of challenges: high intra-class variation, high class imbalance, activities with similar motion, and high variance in duration. Example activities: make coffee, pour water, add sugar.
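One common way to handle the class imbalance mentioned above is to weight each activity by the inverse of its frequency during training. The following is a minimal sketch of that idea, not part of the dataset tooling: the function name `inverse_frequency_weights` and the label list are hypothetical and chosen only for illustration, and the counts are not taken from the dataset.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    Weights are scaled so that a perfectly balanced label list
    gives a weight of 1.0 for every class.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels for illustration only (not real dataset statistics).
labels = ["make coffee"] * 6 + ["pour water"] * 3 + ["add sugar"] * 1
weights = inverse_frequency_weights(labels)

# Rarer activities receive larger weights.
for activity, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{activity}: {weight:.3f}")
```

The resulting weights could be passed to a weighted loss function so that rare activities contribute as much to training as frequent ones. Whether this is appropriate depends on the actual per-activity counts, which are not listed here.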

Dataset Technical Specification

Number of files:
2000
Total dataset size:
Duration:
Format:
wav
Sample rate:
Resolution:

Dataset Demographics

Country:
Worldwide
Gender:
M/F 50-50%
Age:
18-55
Number of participants:
100