13+ Image Classification Datasets for Machine Learning

Image classification, the cornerstone of computer vision, unlocks a world of possibilities. Imagine AI systems that diagnose diseases from medical scans, robots navigating environments seamlessly, or self-driving cars recognising traffic signs. These marvels rely on robust datasets, the training ground for AI models. But with countless options available, which dataset is right for you?

This curated list explores 13+ diverse image classification datasets, catering to various project needs and complexities. Whether you need images of everyday objects, people, nature scenes or something more niche, you’ll find some helpful training data here.

Popular Image Classification Datasets


1. MNIST

The MNIST database of handwritten digits is one of the classic machine learning datasets. With 60,000 training images and 10,000 test images of the digits 0-9 (10 classes), MNIST is a common benchmark for image classification models. It is ideal for testing basic algorithms and understanding image classification fundamentals.
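The original MNIST files are distributed in the simple IDX binary format: a big-endian header followed by raw unsigned bytes. Below is a minimal sketch of a parser for an image file, assuming the standard magic number 2051 for image files. The function name `parse_idx_images` and the tiny synthetic payload are illustrative and not part of any library.

```python
import struct


def parse_idx_images(data: bytes) -> list[list[int]]:
    """Parse an IDX image buffer into a list of flat pixel lists (values 0-255)."""
    # Header: four big-endian 32-bit unsigned ints (magic, count, rows, cols).
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError(f"unexpected magic number {magic}, expected 2051")
    size = rows * cols
    expected = 16 + count * size
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    # Each image is rows * cols unsigned bytes stored row by row.
    return [list(data[16 + i * size : 16 + (i + 1) * size]) for i in range(count)]


# Synthetic stand-in for a real file: two 2x2 "images" (real MNIST images are 28x28).
sample = struct.pack(">IIII", 2051, 2, 2, 2) + bytes([0, 255, 128, 64, 1, 2, 3, 4])
images = parse_idx_images(sample)
print(len(images), images[0])
```

With a real download, the same function would be fed the decompressed bytes of the training or test image file.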

Official dataset page

2. CIFAR-10/100

CIFAR-10 is known for its manageability: it consists of 60,000 32×32 color images divided evenly into 10 classes, with 6,000 images per class. Of these, 50,000 form the training set and the remaining 10,000 are reserved for testing. CIFAR-10's moderate size makes it well suited to experiments where computational resources are limited, and to benchmarking and exploring convolutional neural networks (CNNs).
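The 32×32 color images can be read without any framework. Here is a sketch, assuming the binary version of CIFAR-10, in which each record is one label byte followed by 3,072 pixel bytes (1,024 red, then 1,024 green, then 1,024 blue, each plane in row-major order). `parse_cifar10_record` is a hypothetical helper name, and the record below is synthetic.

```python
RECORD_BYTES = 1 + 3 * 32 * 32  # label byte + three 32x32 color planes


def parse_cifar10_record(record: bytes) -> tuple[int, list[list[tuple[int, int, int]]]]:
    """Return (label, image), where image[row][col] is an (r, g, b) tuple."""
    if len(record) != RECORD_BYTES:
        raise ValueError(f"expected {RECORD_BYTES} bytes, got {len(record)}")
    label = record[0]
    plane = 32 * 32
    red = record[1 : 1 + plane]
    green = record[1 + plane : 1 + 2 * plane]
    blue = record[1 + 2 * plane :]
    # Re-interleave the three planes into per-pixel (r, g, b) tuples.
    image = [
        [(red[r * 32 + c], green[r * 32 + c], blue[r * 32 + c]) for c in range(32)]
        for r in range(32)
    ]
    return label, image


# Synthetic record: label 3 and a pure red image.
fake = bytes([3]) + bytes([255]) * 1024 + bytes(2048)
label, image = parse_cifar10_record(fake)
print(label, image[0][0])
```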

The dataset's ten classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

Its sibling, CIFAR-100, ramps up complexity with 100 classes of 600 images each, grouped into 20 superclasses. That breaks down to 500 training images and 100 test images per class. The larger number of categories makes for a more challenging benchmark for classification algorithms. The superclasses and their classes are listed below:

  • aquatic mammals: beaver, dolphin, otter, seal, whale
  • fish: aquarium fish, flatfish, ray, shark, trout
  • flowers: orchids, poppies, roses, sunflowers, tulips
  • food containers: bottles, bowls, cans, cups, plates
  • fruit and vegetables: apples, mushrooms, oranges, pears, sweet peppers
  • household electrical devices: clock, computer keyboard, lamp, telephone, television
  • household furniture: bed, chair, couch, table, wardrobe
  • insects: bee, beetle, butterfly, caterpillar, cockroach
  • large carnivores: bear, leopard, lion, tiger, wolf
  • large man-made outdoor things: bridge, castle, house, road, skyscraper
  • large natural outdoor scenes: cloud, forest, mountain, plain, sea
  • large omnivores and herbivores: camel, cattle, chimpanzee, elephant, kangaroo
  • medium-sized mammals: fox, porcupine, possum, raccoon, skunk
  • non-insect invertebrates: crab, lobster, snail, spider, worm
  • people: baby, boy, girl, man, woman
  • reptiles: crocodile, dinosaur, lizard, snake, turtle
  • small mammals: hamster, mouse, rabbit, shrew, squirrel
  • trees: maple, oak, palm, pine, willow
  • vehicles 1: bicycle, bus, motorcycle, pickup truck, train
  • vehicles 2: lawn-mower, rocket, streetcar, tank, tractor
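The CIFAR-100 counts follow from 20 superclasses of 5 classes each, with 600 images per class. A quick arithmetic check of those figures (the variable names are only illustrative):

```python
SUPERCLASSES = 20
CLASSES_PER_SUPERCLASS = 5
TRAIN_PER_CLASS = 500
TEST_PER_CLASS = 100

num_classes = SUPERCLASSES * CLASSES_PER_SUPERCLASS
train_total = num_classes * TRAIN_PER_CLASS
test_total = num_classes * TEST_PER_CLASS
# Each superclass pools the images of its five member classes.
images_per_superclass = CLASSES_PER_SUPERCLASS * (TRAIN_PER_CLASS + TEST_PER_CLASS)

print(num_classes, train_total, test_total, images_per_superclass)
```

The totals match CIFAR-10's 50,000/10,000 split, only spread over ten times as many classes.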

Official dataset page

3. ImageNet

ImageNet is the behemoth of image classification, with more than 14 million hand-annotated images across thousands of categories. Its scale and depth make it a rigorous benchmark for image classification models, and it is ideal for large-scale training and pushing the boundaries of AI.

  • Total number of non-empty WordNet synsets: 21,841
  • Total number of images: 14,197,122
  • Number of images with bounding box annotations: 1,034,908
  • Number of synsets with SIFT features: 1,000
  • Number of images with SIFT features: 1.2 million

Official dataset page

4. ObjectNet

The ObjectNet test set was collected through crowdsourcing. It stands out because it shows objects in unusual positions and against realistic, cluttered backgrounds, conditions under which object recognition algorithms tend to struggle. That makes it well suited to assessing the robustness of a model trained via transfer learning.

Unlike ImageNet and CIFAR-100, ObjectNet is intended only for testing computer vision systems, not for training them.

Details of the dataset:

  • 50,000 test images with controls for viewpoint, rotation, and background
  • 313 distinct object classes, 113 of which overlap with ImageNet
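Because only some ObjectNet classes have an ImageNet counterpart, a model trained on ImageNet can only be scored on that overlapping subset. Below is a rough sketch of the filtering step. The class names and predictions are made up, and `overlap_accuracy` is a hypothetical helper rather than part of any ObjectNet tooling.

```python
def overlap_accuracy(true_labels, predicted_labels, overlapping_classes):
    """Top-1 accuracy restricted to samples whose true class is in the overlap."""
    pairs = [
        (t, p)
        for t, p in zip(true_labels, predicted_labels)
        if t in overlapping_classes
    ]
    if not pairs:
        raise ValueError("no samples fall in the overlapping classes")
    correct = sum(1 for t, p in pairs if t == p)
    return correct / len(pairs)


# Invented example: "dust pan" stands in for a class with no ImageNet match.
overlap = {"banana", "backpack", "chair"}
y_true = ["banana", "chair", "dust pan", "backpack", "chair"]
y_pred = ["banana", "table", "broom", "backpack", "chair"]
accuracy = overlap_accuracy(y_true, y_pred, overlap)
print(f"{accuracy:.2f}")
```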

Official dataset page

5. Scene Understanding (SUN) dataset

This dataset, released by Princeton University, was used to create the Scene Categorisation benchmark.

Details of the dataset:

  • 108,753 images in total, of which 76,128 are training images
  • 10,875 validation images
  • 21,750 test images
  • 397 scene categories
  • At least 100 JPEG images per category
  • A maximum of 120,000 pixels per image
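The split sizes above work out to roughly a 70/10/20 partition. A quick check using only the figures quoted in this list:

```python
splits = {"train": 76_128, "validation": 10_875, "test": 21_750}
total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```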

Official dataset page

6. Intel Image Classification dataset 

The Intel Image Classification dataset, initially compiled by Intel, contains approximately 25,000 images of natural scenes from around the world. The images are divided into categories such as mountains, glaciers, seas, forests, buildings, and streets.

Details of the dataset:

  • About 25,000 images in six categories: buildings, forests, glaciers, mountains, seas, and streets
  • About 14,000 training images, 3,000 test images, and 7,000 unlabelled prediction images

Official dataset page

Image Classification Datasets for Specialised Domains

7. Open Images V7

Embrace diversity with ~9 million images annotated with object bounding boxes, object segmentation masks, visual relationships, and localised narratives.

It is described as the largest existing dataset with object location annotations, with a total of 16 million bounding boxes for 600 object classes on 1.9 million images.

Official dataset page

8. Food-101 

Craving image recognition deliciousness? This dataset consists of 101,000 images of diverse dishes across 101 food categories, suited to restaurant recommendation systems or dietary analysis. Each category has 750 training and 250 test images. The labels for the test images have been manually cleaned, while the training set still contains some noise.

Official dataset page

9. Fashion-MNIST 

Dress your AI with 70,000 fashion images, divided into a training set of 60,000 images and a test set of 10,000. Each example is a 28×28 pixel grayscale image associated with a label from 10 classes. Perfect for e-commerce applications or personal style recommendations.
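The 10 integer labels map to clothing categories. Below is a small lookup sketch using the class names from the dataset's documentation. `decode_labels` is an illustrative helper, not a library function.

```python
# Index in this tuple corresponds to the integer label (0-9).
FASHION_MNIST_CLASSES = (
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
)


def decode_labels(labels):
    """Translate integer labels into human-readable class names."""
    return [FASHION_MNIST_CLASSES[i] for i in labels]


names = decode_labels([9, 0, 3])
print(names)
```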

Official dataset page

Image Classification Datasets for Real-World Challenges

10. COCO (Microsoft Common Objects in Context) 

Immerse yourself in 330,000 images annotated across 80 object categories, with 5 captions describing each scene. Ideal for object detection, segmentation, captioning tasks, and training models to understand complex visual relationships.
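COCO's detection annotations ship as JSON with `images`, `annotations`, and `categories` lists, where each annotation carries an `image_id`, a `category_id`, and a `bbox` in `[x, y, width, height]` form. The sketch below counts instances per category on a tiny hand-written stand-in for a real annotation file. The entries are invented for illustration, and `instances_per_category` is a hypothetical helper.

```python
import json
from collections import Counter

# Hand-written miniature of a COCO-style annotation file (not real data).
raw = json.dumps({
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "categories": [{"id": 1, "name": "person"}, {"id": 18, "name": "dog"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [10, 20, 50, 80]},
        {"id": 11, "image_id": 1, "category_id": 18, "bbox": [60, 40, 30, 30]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [5, 5, 40, 90]},
    ],
})


def instances_per_category(coco: dict) -> dict[str, int]:
    """Count annotated object instances for each category name."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return dict(counts)


counts = instances_per_category(json.loads(raw))
print(counts)
```

For a classification-style task, the same counts give a first look at class imbalance before any training starts.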

Official dataset page

11. Places365  

Places365 is a scene recognition dataset drawn from a collection of 10 million images spanning 434 scene classes. It comes in two versions: Places365-Standard, which has 1.8 million training and 36,000 validation images from 365 scene classes, and Places365-Challenge-2016, which adds 6.2 million training images and 69 new scene classes (for a total of 8 million training images from 434 scene classes).

Official dataset page

12. CelebA 

This dataset is ideal for training and testing facial recognition, emotion detection, and demographic analysis models, especially those that identify facial attributes such as brown hair, smiles, and eyeglasses.


  • 202,599 face images of various celebrities
  • 10,177 unique identities, although the names are not given
  • 40 binary attribute annotations per image
  • 5 landmark locations per image
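The binary attributes are commonly distributed as a text file in which each value is 1 (present) or -1 (absent). Below is a parsing sketch under that assumption. The layout modelled here (a count line, a header line of attribute names, then one row per image) and the three-attribute sample are illustrative stand-ins, since the real file has 40 attributes. `parse_attributes` is a hypothetical helper.

```python
def parse_attributes(text: str) -> dict[str, dict[str, bool]]:
    """Parse an attribute table into {filename: {attribute: present}}."""
    lines = text.strip().splitlines()
    declared = int(lines[0])   # first line: number of images
    names = lines[1].split()   # second line: attribute names
    table = {}
    for row in lines[2:]:
        filename, *values = row.split()
        if len(values) != len(names):
            raise ValueError(f"{filename}: expected {len(names)} values")
        # "1" marks an attribute as present, "-1" as absent.
        table[filename] = {n: v == "1" for n, v in zip(names, values)}
    if len(table) != declared:
        raise ValueError(f"declared {declared} rows, parsed {len(table)}")
    return table


# Invented two-image sample with three attributes.
sample = """2
Brown_Hair Eyeglasses Smiling
000001.jpg  1 -1  1
000002.jpg -1  1 -1
"""
attrs = parse_attributes(sample)
smiling = [name for name, a in attrs.items() if a["Smiling"]]
print(smiling)
```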

Official dataset page

Other datasets for Image Classification

13. DeepGlobe 

DeepGlobe is a large-scale geographic dataset created for deep learning research on automated feature extraction from satellite imagery. Details:

  • It contains over 1.17 million geographic images extracted from DigitalGlobe satellites covering rural areas, urban areas, mountains, roads, water bodies, and forests across the globe.
  • The images are high-resolution, at 30 cm per pixel, and in RGB-NIR format with 4 channels: red, green, blue, and near-infrared.
  • The dataset covers three main challenges: road extraction, building detection, and land cover classification.
  • Road extraction: 8,579 images with pixel-level annotations of roads.
  • Building detection: 24,586 building footprint polygon annotations across 1,146 image chips.
  • Land cover classification: 1.17 million images categorised into 7 land cover classes – urban, agriculture, rangeland, forest, water, barren and unknown.

Official dataset page

14. FGVC Aircraft 

This dataset targets fine-grained classification of aircraft types. It contains 10,200 images spanning 102 different aircraft variants, such as the Boeing 747-400 and Airbus A320. The images are colour images of varying sizes. It is commonly used to evaluate fine-grained visual categorisation (FGVC) models, which aim to distinguish between sub-categories within a broader category such as aircraft or birds.

Official dataset page

To help make model-building easier, we have put together a list of over 150 Open Audio and Video Datasets.

Remember, the perfect dataset doesn’t exist. Consider factors like task complexity, image quality, annotation detail, and computational resources before diving in.

And if you can’t find the ideal dataset? We’ve got you covered! At Twine AI, we specialise in creating custom image classification datasets tailored to your specific needs. Contact us today to discuss your unique project and unlock the power of custom-built data!
