
The Cityscapes Dataset provides a diverse collection of stereo video sequences captured across 50 cities, representing real-world urban variability in architecture, traffic density, weather, and lighting. It includes 5,000 finely annotated frames with pixel-accurate labels and an additional 20,000 weakly (coarsely) annotated frames, supporting both high-fidelity evaluation and large-scale training.
Cityscapes is purpose-built for semantic urban scene understanding. It supports core vision tasks such as pixel-level semantic labeling, instance-level segmentation, and panoptic segmentation, which are key building blocks for autonomous driving perception stacks, robot navigation, and road-scene analytics. Because it combines fine and weak annotations, the dataset is also well suited to research on semi-supervised learning, weak supervision, and self-training, and to testing how well deep neural networks generalize to complex street environments.
The Cityscapes Dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation.