Top Object Recognition Video Datasets of 2022

Object recognition, a term often used interchangeably with object detection, is used within AI/ML models to help organizations identify objects in images and video, often in real time. Training these models well usually requires a large amount of annotated data.

For those interested in Object Recognition video datasets, Twine has brought together our top selection – so you don’t have to go looking.

Are you ready?

Let’s dive into our list of the best Object Recognition video datasets in 2022.

Do you want to build a custom dataset? We specialize in helping companies create high-quality custom audio and video datasets. Find out more here

Here are our top picks for Object Recognition video datasets:

1. Largest Open Object Recognition Video Dataset

The BDD100K dataset is used for driving research and, in particular, multitask learning. It supports multi-object segmentation tracking, image tagging, road object detection, semantic segmentation, lane detection, drivable area segmentation, instance segmentation, multi-object detection tracking, domain adaptation, and imitation learning.


  • 100k 40-second high-definition videos
  • 1000 hours of driving experience with more than 100 million frames
  • Geographic, environmental, and weather diversity
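
As a quick sanity check, the clip count and clip length listed above can be multiplied out to see how they relate to the quoted hours and frame totals. This is only a minimal sketch: the 30 frames-per-second rate is an assumption made for illustration (it is not stated in this list), and `footage_totals` is a hypothetical helper name.

```python
def footage_totals(num_clips, clip_seconds, fps):
    """Return (total hours, total frames) for a set of equal-length clips."""
    total_seconds = num_clips * clip_seconds
    return total_seconds / 3600, int(total_seconds * fps)


# Clip count and length come from the list above; fps=30 is an assumed frame rate.
hours, frames = footage_totals(num_clips=100_000, clip_seconds=40, fps=30)
print(f"{hours:,.0f} hours, {frames:,} frames")
```

Under that assumption the totals come to roughly 1,111 hours and 120 million frames, which is in line with the rounded figures quoted above.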

Access the dataset

Not quite your style? Check out these alternatives:

  • The BOLD Dataset (Detecting Biological Locomotion in Video) aims to support a computational approach to detecting biolocomotion in any unprocessed video. Contains 1,348 videos, with objects including humans, terrestrial quadrupeds, birds, reptiles, cetaceans, seals, fish, stingrays, eels, sea snakes, insects, spiders, scorpions, lobsters, balls, cars, trains, motorbikes, submarines, airplanes, and helicopters.
  • The DroneCrowd Dataset has 112 video clips with 33,600 HD frames in various scenarios. Includes 20,800 people trajectories with 4.8 million heads and several video-level attributes.

2. Best Action Recognition Video Dataset 

Something-something-v2 is an action recognition dataset of short videos in which crowd workers perform predefined basic actions with everyday objects. It is one of the largest and most widely used datasets in the research community for benchmarking state-of-the-art video action recognition models.


  • 220,847 short trimmed videos from 174 action categories
  • The total download size is 19.4 GB
  • Contains WebM-files using the VP9 codec

Access the dataset


  • The CAMEL Dataset provides a benchmark for visual-infrared object detection, the kind of sensing that lets autonomous vehicles equipped with infrared cameras detect objects in adverse conditions. Contains 30 video sequences with 1K+ annotations.
  • The CRCV Real-world Anomaly Detection Dataset consists of 1900 long, untrimmed real-world surveillance videos covering 13 realistic anomalies, such as fighting, road accidents, burglary, and robbery, as well as normal activities.

3. Best Sign Detection Video Dataset

The LISA (Laboratory for Intelligent & Safe Automobiles) Traffic Sign Dataset is a set of annotated frames and videos of US traffic signs. It is published in two versions: one with pictures only, and one with both videos and pictures.


  • Images collected from different cameras
  • 47 US sign types
  • 7855 annotations on 6610 frames

Access the dataset


  • The EgoHands Hand Segmentation Dataset contains 48 Google Glass videos of complex, first-person interactions between two people. Includes 15,053 ground-truth labeled hands.
  • The Linkopings Traffic Sign Dataset, similar to the LISA Dataset, helps to detect traffic signs in images and video. Contains 3488 traffic signs, with sequences from highways and cities recorded from more than 350 km of Swedish roads.

4. Best Bounding Box Annotation Video Dataset

YouTube-BoundingBoxes is a large-scale dataset of videos with densely sampled, high-quality, single-object bounding box annotations. The dataset uses publicly visible YouTube videos, automatically selected to feature objects in natural settings without editing or post-processing, often recorded using a phone.


  • Extracted from 240,000 different publicly visible YouTube videos
  • 380,000 video segments, each 15 to 20 seconds long
  • 23 types of objects

Access the dataset


  • The Objectron Dataset has 15k annotated video clips with over 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes. Includes manually annotated 3D bounding boxes for each object.
  • MovieNet is a holistic dataset for movie understanding. Covers over 1,100 movies and 1.1M characters with bounding boxes and identities.

5. Best Face Recognition Video Dataset

VPCD is a dataset that contains multi-modal annotations (face, body, and voice) for all primary and secondary characters from a range of diverse TV shows and movies. Face-tracks for all primary and secondary characters are annotated and labeled with identity.


  • 35,000+ face tracks
  • 23+ hours, variable clip lengths
  • 300+ characters, from 6 diverse TV shows

Access the dataset


  • CondensedMovies is a story-based retrieval dataset with contextual embeddings. It includes 33,976 captioned clips from 3,605 movies, 400K+ face-tracks, 8K+ labeled characters, 20K+ subtitles, and densely pre-extracted features for each clip (RGB, Motion, Face, Subtitles, Scene).
  • APES (Audiovisual Person Search) is a dataset containing untrimmed videos of faces that are densely annotated. Over 1.9K identities are labeled across 36 hours of video, with dense temporal annotations that link faces to speech segments of the same identity.

Wrapping up

To conclude, here are our top picks for the best Object Recognition video datasets for your projects:

  1. Largest Open Object Recognition Video Dataset: BDD100K Dataset
  2. Best Action Recognition Video Dataset: Something-something-v2 Dataset
  3. Best Sign Detection Video Dataset: LISA Traffic Sign Dataset
  4. Best Bounding Box Annotation Video Dataset: YouTube-BoundingBoxes Dataset
  5. Best Face Recognition Video Dataset: VPCD Dataset

We hope that this list has either helped you find a dataset for your project or shown you the myriad options available to you.

If there are any datasets you would like us to add to the list then please let us know here.

If you would like to find out more about how we could help build a custom dataset for your project then please don’t hesitate to contact us!

Let us help you do the math – check our AI dataset project calculator.

Ready to learn more? Check out our Dataset Archives:

Twine AI

Harness Twine’s established global community of over 400,000 freelancers from 190+ countries to scale your dataset collection quickly. We have systems to record, annotate and verify custom video datasets at an order of magnitude lower cost than existing methods.