Cityscapes is a large-scale urban street-scene dataset with stereo video and high-quality pixel-level annotations, built for benchmarking semantic segmentation, instance segmentation, and panoptic scene understanding for autonomous driving and smart-city computer vision.
These expressions are produced at two levels of emotional intensities (regular and strong) except for the neutral emotion that only contains regular intensity.
Dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.