100+ Open Audio and Video Datasets

At Twine, we specialize in helping AI companies create high-quality custom audio and video AI datasets.  

We also get asked a lot about which good off-the-shelf audio and video datasets are out there. When we started searching for lists of them, we realized how limited those lists were.

That’s why we decided to do something about it. 

In this article, we’ve put together a list of 100+ open audio and video datasets, so you no longer have to struggle to find them.

Not only have we included them in an easy listicle format, but where the information is available, we’ve also listed the number of recordings in each dataset, the number of participants involved, the languages of the speech content, the file size, and the file type.
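If you want to work with this list programmatically, each entry can be read as a small record. The following is a minimal sketch in Python, assuming entries keep the "Label: value" layout used in this article; the `DatasetEntry` class, the `FIELD_LABELS` mapping, and the `parse_entry` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from the labels used in this list to attribute names.
FIELD_LABELS = {
    "No. Recordings": "recordings",
    "No. Participants": "participants",
    "File Size": "file_size",
    "Filetype": "filetype",
    "Language(s)": "languages",
    "Description": "description",
}


@dataclass
class DatasetEntry:
    """One dataset from the list; any field may be missing."""
    name: str
    recordings: Optional[str] = None
    participants: Optional[str] = None
    file_size: Optional[str] = None
    filetype: Optional[str] = None
    languages: Optional[str] = None
    description: Optional[str] = None
    url: Optional[str] = None


def parse_entry(block: str) -> DatasetEntry:
    """Parse one entry: a name line, 'Label: value' lines, and a URL line."""
    lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
    entry = DatasetEntry(name=lines[0])
    for line in lines[1:]:
        # URL lines contain a colon too, so handle them before splitting.
        if line.startswith(("http://", "https://")):
            entry.url = line
            continue
        label, sep, value = line.partition(":")
        attr = FIELD_LABELS.get(label.strip())
        if sep and attr:
            setattr(entry, attr, value.strip())
    return entry


example = """Free Spoken digit dataset
No. Recordings: 3000
No. Participants: 6
Filetype: WAV
Description: Recordings of spoken English digits
https://github.com/Jakobovski/free-spoken-digit-dataset"""

entry = parse_entry(example)
print(entry.name, entry.recordings, entry.file_size)
```

Values are kept as strings on purpose, because the counts and sizes in the list use mixed formats (for example "8732" and "75,879"), and fields that an entry omits simply stay `None`.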

Audio Datasets

Urban Sound 8K dataset

No. Recordings: 8732

File Size: 13.84KB

Filetype: .WAV/.CSV

Language(s): US English

Description: Contains urban sounds from 10 classes, such as air conditioner, dog bark, drilling, siren, and street music.

https://urbansounddataset.weebly.com/urbansound8k.html

Mozilla Common Voice

No. Recordings: 75,879

File Size: 63GB

Filetype: MP3

Language(s): Multiple

Description: An open-source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

https://commonvoice.mozilla.org/en/datasets

HiEve

No. Recordings: 1,000,000

Filetype: MP4

Language(s): US English

Description: The largest collection of poses which focuses on very challenging and realistic tasks of human-centric analysis in various crowds & complex events, including subway getting on/off, collision, fighting, and earthquake escape

http://humaninevents.org/

Voices Obscured in Complex Environmental Settings (VOICES) Dataset

No. Recordings: 3,903

File Size: 1.3GB

Filetype: MP3

Language(s): US English

Description: A creative commons speech dataset targeting acoustically challenging and reverberant environments with robust labels and truth data for transcription, denoising, and speaker identification.

https://iqtlabs.github.io/voices/

Free Spoken digit dataset

No. Recordings: 3000

No. Participants: 6

File Size: 10MB

Filetype: WAV

Language(s): US English

Description: A simple speech dataset consisting of recordings of spoken English digits

https://github.com/Jakobovski/free-spoken-digit-dataset
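As a quick illustration of how such a dataset is typically used, here is a minimal Python sketch for reading labels from file names. It assumes the naming scheme described in the repository README at the time of writing, `{digitLabel}_{speakerName}_{index}.wav` (for example `7_jackson_32.wav`); check the README before relying on it. `DigitClip` and `parse_clip_name` are hypothetical helper names, not part of the dataset.

```python
from pathlib import Path
from typing import NamedTuple


class DigitClip(NamedTuple):
    digit: int     # spoken digit, 0-9
    speaker: str   # speaker name as it appears in the file name
    index: int     # repetition index for this digit and speaker


def parse_clip_name(path: str) -> DigitClip:
    """Split '<digit>_<speaker>_<index>.wav' into its three parts.

    Assumes the speaker name itself contains no underscore.
    """
    digit, speaker, index = Path(path).stem.split("_")
    return DigitClip(int(digit), speaker, int(index))


clip = parse_clip_name("recordings/7_jackson_32.wav")
print(clip)
```

Because the label lives in the file name, no separate annotation file is needed to build a training set under this assumption.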

The Stereo Human Pose Estimation Dataset

No. Recordings: 630

No. Participants: 26

File Size: 197.8MB

Filetype: JPEG

Language(s): US English

Description: A dataset of stereo image pairs suited for stereo pose estimation of people’s upper bodies.

https://www.uco.es/investiga/grupos/ava/node/47

The Spoken Wikipedia Corpora

No. Recordings: 5,397

No. Participants: 879

File Size: 23GB

Filetype: MP3

Language(s): English, German, and Dutch

Description: This is a corpus of aligned spoken Wikipedia articles from the English, German, and Dutch Wikipedia

https://nats.gitlab.io/swc/

TED-LIUM

No. Recordings: 1,495

Language(s): US English

Description: 1,495 audio recordings of TED talks, along with full-text transcriptions of those recordings

http://www.openslr.org/51/

Speech Commands Dataset

No. Recordings: 65,000

Language(s): US English

Description: 65,000 one-second long utterances of 30 short words, by thousands of different people

http://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html

Persian Consonant Vowel Combination (PCVC) Speech Dataset

No. Recordings: 30,000

No. Participants: 217

Filetype: MAT

Language(s): Persian

Description: This dataset contains 23 Persian consonants and 6 vowels. The sound samples are all possible combinations of vowels and consonants (138 samples for each speaker) with a length of 30000 data samples.

https://github.com/S-Malek/PCVC
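The figure of 138 samples per speaker follows directly from pairing every consonant with every vowel (23 × 6 = 138). A small Python sketch of that enumeration, using placeholder labels such as `C1` and `V1` rather than the actual Persian phonemes (the labels are hypothetical and only illustrate the count):

```python
from itertools import product

N_CONSONANTS = 23  # Persian consonants in the dataset, per the description
N_VOWELS = 6       # Persian vowels in the dataset, per the description

# Placeholder labels; the real dataset uses actual phonemes.
consonants = [f"C{i}" for i in range(1, N_CONSONANTS + 1)]
vowels = [f"V{i}" for i in range(1, N_VOWELS + 1)]

# Every consonant-vowel pairing, one sample per pairing per speaker.
combinations = list(product(consonants, vowels))
print(len(combinations))  # 138
```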

Arabic Speech Corpus

No. Recordings: 5439

Filetype: WAV

Language(s): Arabic

Description: Phonetic and orthographic transcriptions of more than 3.7 hours of MSA speech, aligned with the recorded speech at the phoneme level

http://en.arabicspeechcorpus.com/

TIMIT

No. Recordings: 6,300

No. Participants: 630

Filetype: WAV

Language(s): US English

Description: Recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences

https://github.com/philipperemy/timit/blob/master/README.md

Mivia Audio Events Dataset

No. Recordings: 6,000

Filetype: WAV

Language(s): US English

Description: 6,000 audio events relevant to surveillance applications, namely glass breaking, gunshots, and screams

Urban Sound Dataset

No. Recordings: 1,302

Filetype: WAV

Language(s): US English

Description: 1,302 labeled sound recordings. Each recording is labeled with the start and end times of sound events from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music

https://urbansounddataset.weebly.com/urbansound.html

Clotho Dataset

No. Recordings: 4,981

Filetype: MP3

Language(s): US English

Description: A novel audio captioning dataset consisting of 4,981 audio samples, each with five captions

https://zenodo.org/record/3490684#.YQEqHVNKg-R

FSD50K

No. Recordings: 51,197

Filetype: WAV

Language(s): US English

Description: An open dataset of human-labeled sound events containing Freesound clips unequally distributed in 200 classes

https://zenodo.org/record/4060432#.X3xrgi8RqL4

Vocal Imitation Set v1.1.3

File Size: 7.6GB

Filetype: WAV

Language(s): US English

Description: A collection of crowd-sourced vocal imitations of a large set of diverse sounds collected from Freesound

https://zenodo.org/record/1340763#.Xlj1By2ZN24

Google Audio set

No. Recordings: 2,084,320

Filetype: WAV

Language(s): US English

Description: 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos

https://research.google.com/audioset

CALLHOME American English Speech

No. Recordings: 120

No. Participants: 240

Language(s): US English

Description: 120 unscripted 30-minute telephone conversations between native speakers of English

https://catalog.ldc.upenn.edu/LDC97S42

LibriSpeech ASR Corpus

No. Recordings: 1,000 hours of audio

Filetype: MP3

Language(s): US English

Description: 1,000 hours of 16kHz read English speech

https://www.openslr.org/12

Speech Accent Archive

No. Recordings: 2,140

File Size: 907MB

Filetype: MP3

Language(s): US English

Description: Parallel English speech samples from 177 countries

https://www.kaggle.com/rtatman/speech-accent-archive

Phone Conversation Data Sample

No. Recordings: 1,822

Filetype: WAV

Language(s): Dutch, Japanese, and Irish English

Description: Conversations in Dutch, Japanese, and Irish English

https://summalinguae.com/data-sets/phone-conversation-data/

Alexa Wake Word Voice Samples

No. Recordings: 24

Filetype: WAV

Language(s): US English

Description: Sample of 24 Alexa wake word recordings in four languages

https://summalinguae.com/data-sets/alexa-wake-word-data/

The LJ Speech Dataset

No. Recordings: 13,100

File Size: 2.6GB

Filetype: CSV

Language(s): US English

Description: Public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books

https://keithito.com/LJ-Speech-Dataset/

AISHELL-2

No. Recordings: 1,000,000

No. Participants: 1,991

Language(s): Mandarin

Description: The largest free speech corpus available for Mandarin ASR research

https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell2

AESDD

No. Recordings: 500

No. Participants: 5

Language(s): US English

Description: 500 utterances by a diverse group of actors (over 5 actors) simulating various emotions

http://m3c.web.auth.gr/research/aesdd-speech-emotion-recognition/

ANAD

No. Recordings: 1,384

No. Participants: 8

File Size: 2GB

Filetype: WAV

Language(s): Arabic

Description: 1,384 recordings by multiple speakers; 3 emotions: angry, happy, surprised

https://www.kaggle.com/suso172/arabic-natural-audio-dataset

AudioMNIST

No. Recordings: 30,000

No. Participants: 60

Filetype: MP3

Language(s): US English

Description: Consists of 30000 audio samples of spoken digits (0-9) of 60 different speakers

https://github.com/soerenab/AudioMNIST

BAVED

No. Recordings: 1,935

No. Participants: 61

File Size: 97.8MB

Filetype: WAV

Language(s): Arabic

Description: 1,935 recordings by 61 speakers (45 male and 16 female).

https://www.kaggle.com/a13x10/basic-arabic-vocal-emotions-dataset

CMU-MOSEI

No. Participants: 1,000

Language(s): US English

Description: 65 hours of annotated video from more than 1000 speakers and 250 topics; 6 Emotions (happiness, sadness, anger, fear, disgust, surprise) + Likert scale.

https://www.amir-zadeh.com/datasets

CMU-MOSI

No. Recordings: 2,199

Language(s): US English

Description: 2,199 opinion utterances with annotated sentiment; sentiment is annotated from very negative to very positive in seven Likert steps

https://www.amir-zadeh.com/datasets

CMU Wilderness

No. Participants: 699

Filetype: MP3

Language(s): US English

Description: Speech dataset with voice actors of many accents reciting passages from the Bible

http://festvox.org/cmu_wilderness/

CREMA-D

No. Recordings: 7,442

No. Participants: 91

File Size: 163MB

Filetype: GIT-LFS

Language(s): US English

Description: 7,442 original clips from 91 actors. These clips were from 48 male and 43 female actors between the ages of 20 and 74 coming from a variety of races and ethnicities

https://github.com/CheyneyComputerScience/CREMA-D

DAPS Dataset

No. Recordings: 100

No. Participants: 20

Language(s): US English

Description: 20 speakers (10 female and 10 male) reading 5 excerpts each from public domain books

https://archive.org/details/daps_dataset

Deep Clustering Dataset

File Size: 12Mb

Filetype: WAV/MP3/OGG

Language(s): US English

Description: Training deep discriminative embeddings to solve the cocktail party problem

https://www.merl.com/demos/deep-clustering

DEMoS

No. Recordings: 9697

No. Participants: 68

Language(s): US English

Description: 9,365 emotional and 332 neutral samples produced by 68 native speakers

https://zenodo.org/record/2544829

EEKK

No. Recordings: 1234

No. Participants: 10

Filetype: MP3

Language(s): US English

Description: 26 text passages read by 10 speakers; 4 main emotions: joy, sadness, anger, and neutral

https://metashare.ut.ee/repository/download/4d42d7a8463411e2a6e4005056b40024a19021a316b54b7fb707757d43d1a889/

Emo-DB

No. Recordings: 500

No. Participants: 10

Language(s): US English

Description: 800 recordings spoken by 10 actors (5 males and 5 females); 7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust

http://emodb.bilderbar.info/index-1280.html

EmoFilm

No. Recordings: 1115

Filetype: WAV

Language(s): US English

Description: 1,115 audio clips of sentences extracted from various films

https://zenodo.org/record/1326428

Emotional Voice dataset – Nature

No. Recordings: 2519

No. Participants: 100

Language(s): US English

Description: 2,519 speech samples produced by 100 actors from 5 cultures

https://www.nature.com/articles/s41562-019-0533-6

Emov-DB

No. Recordings: 

No. Participants: 4

File Size: 1.58GB

Language(s): US English

Description: Recordings of 4 speakers (2 male and 2 female); the emotional styles are neutral, sleepiness, anger, disgust, and amused

https://mega.nz/#F!KBp32apT!gLIgyWf9iQ-yqnWFUFuUHg!mYwUnI4K

EMOVO

No. Recordings: 84

No. Participants: 6

Language(s): US English

Description: 6 actors who played 14 sentences; 6 emotions: disgust, fear, anger, joy, surprise, sadness

http://voice.fub.it/activities/corpora/emovo/index.html

eNTERFACE05

No. Participants: 42

File Size: 801MB

Language(s): US English

Description: Videos by 42 subjects, coming from 14 different nationalities; 6 emotions: anger, fear, surprise, happiness, sadness and disgust

http://www.enterface.net/enterface05/docs/results/databases/project2_database.zip

GEMEP corpus

No. Recordings: 145

No. Participants: 10

Filetype: MP3

Language(s): US English

Description: 10 actors portraying 10 different emotional states

https://www.unige.ch/cisa/gemep

IEMOCAP

No. Participants: 10

Filetype: WAV

Language(s): US English

Description: 12 hours of audiovisual data by 10 actors; 5 emotions: happiness, anger, sadness, frustration, and neutral

https://sail.usc.edu/iemocap/iemocap_release.htm

Keio-ESD

Filetype: WAV

Language(s): Japanese

Description: A set of human speech with vocal emotion spoken by a Japanese male speaker; 47 emotions including angry, joy, disgusting, downgrading, funny, worried, gentle, relief, indignation, shame, etc.

http://research.nii.ac.jp/src/en/Keio-ESD.html

MSP-IMPROV

No. Recordings: 8,438

No. Participants: 12

Language(s): US English

Description: 20 sentences by 12 actors; 4 emotions (angry, sad, happy, neutral), plus “other” and “without agreement” labels

https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Improv.html

MSP Podcast Corpus

No. Recordings: 62140

No. Participants: 3260

Language(s): US English

Description: 100 hours by over 100 speakers – annotated with emotional labels using attribute-based descriptors

https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html

NISQA Speech Quality Corpus

No. Recordings: 14,000

No. Participants: 3,260

Language(s): US English

Description: Includes 14k speech samples with simulated (codecs, packet-loss, background noise) and live (mobile phone, Zoom, Skype, WhatsApp) voice call degradation conditions

https://github.com/gabrielmittag/NISQA/wiki/NISQA-Corpus

OGVC

No. Recordings: 9114 

No. Participants: 4

Language(s): US English

Description: 9114 spontaneous utterances and 2656 acted utterances by 4 professional actors

https://sites.google.com/site/ogcorpus/home/en

RECOLA

No. Participants: 46

Language(s): US English

Description: 3.8 hours of recordings by 46 participants; negative and positive sentiment (valence and arousal)

https://diuf.unifr.ch/main/diva/recola/download.html

The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)

No. Recordings: 7,356

No. Participants: 24

File Size: 24.8GB

Filetype: WAV

Language(s): US English

Description: 7356 files (total size: 24.8 GB). The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent

https://zenodo.org/record/1188976#.XrC7a5NKjOR

SAVEE Dataset

No. Recordings: 480

No. Participants: 4

Filetype: MP4

Language(s): British English

Description: 4 male actors in 7 different emotions, 480 British English utterances in total

http://kahlan.eps.surrey.ac.uk/savee/

SEMAINE

No. Recordings: 95

No. Participants: 21

Language(s): US English

Description: 95 dyadic conversations from 21 subjects. Each subject converses with another playing one of four characters with emotions

https://semaine-db.eu/

ShEMO 3000 

No. Recordings: 3,000

No. Participants: 87

Filetype: WAV

Language(s): Persian

Description: Semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays by 87 native-Persian speakers

https://github.com/mansourehk/ShEMO

Spoken Commands dataset

No. Recordings: 10,000,000

File Size: 10MB per word

Language(s): US English

Description: A testbed for voice activity detection algorithms and for recognition of syllables (single-word commands). 3 speakers, 1,500 recordings (50 of each digit per speaker), English pronunciations

https://github.com/JohannesBuchner/spoken-command-recognition

TESS

No. Recordings: 2,800

No. Participants: 2

Filetype: WAV

Language(s): US English

Description: 2,800 recordings by 2 actresses; 7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutrality.

https://tspace.library.utoronto.ca/handle/1807/24487

Thorsten dataset

No. Recordings: 22668

Filetype: WAV

Language(s): German

Description: German language dataset, 22,668 recorded phrases, 23 hours of audio, phrase length 52 characters on average.

https://github.com/thorstenMueller/deep-learning-german-tts/

URDU-Dataset

No. Recordings: 400

No. Participants: 38

Filetype: WAV

Language(s): Urdu

Description: 400 utterances by 38 speakers (27 male and 11 female); 4 emotions: angry, happy, neutral, and sad.

https://github.com/siddiquelatif/urdu-dataset

VCTK dataset

No. Recordings: 44,000

No. Participants: 110

File Size: 10.94GB

Filetype: TXT

Language(s): US English

Description: 110 English speakers with various accents; each speaker reads out about 400 sentences. Samples are mostly 2–6 s long, at 48 kHz 16 bits, for a total dataset size of ~10 GiB.

https://datashare.is.ed.ac.uk/handle/10283/3443

VIVAE

No. Recordings: 1,085

No. Participants: 12

File Size: 93.5MB

Filetype: VIVAE

Language(s): US English

Description: 1,085 non-speech audio files by ~12 speakers; 6 emotions: achievement, anger, fear, pain, pleasure, and surprise, at 4 emotional intensities (low, moderate, strong, peak).

https://zenodo.org/record/4066235

VoxPopuli

No. Recordings: 400,000

File Size: 6.4TB

Filetype: WAV

Language(s): Multiple (23 languages)

Description: 100K hours of unlabelled speech data for 23 languages, 1.8K hours of transcribed speech data for 16 languages, and 17.3K hours of speech-to-speech interpretation data for 16×15 directions.

https://github.com/facebookresearch/voxpopuli

Video Datasets

Twenty Billion Neurons Crowd Acting video dataset collection

No. Recordings: 220847

File Size: 19.4GB

Filetype: WEBM

Language(s): US English

Description: Large-scale Human-centric Video Analysis in Complex Events

https://20bn.com/products/datasets

The VIRAT Video Dataset

No. Recordings: 262

File Size: 12MB

Filetype: PDF

Language(s): US English

Description: The VIRAT Video Dataset is designed to be more realistic, natural, and challenging for video surveillance domains than existing action recognition datasets in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories

https://viratdata.org/

The WebVid-10M Dataset

No. Recordings: 10700000

File Size: 2.5MB

Filetype: MP4

Language(s): US English

Description: A large-scale dataset of short videos with textual descriptions sourced from the web

https://m-bain.github.io/webvid-dataset/

The MECCANO Dataset

No. Recordings: 73206

No. Participants: 93

File Size: 32.3GB

Filetype: MP4

Language(s): US English

Description: The first dataset of egocentric videos to study human-object interactions in industrial-like settings.

https://iplab.dmi.unict.it/MECCANO/

Moments In Time

No. Recordings: 1,000,000

File Size: 150MB

Filetype: MP4

Language(s): US English

Description: A large-scale dataset for recognizing and understanding action in videos

http://moments.csail.mit.edu/

Something Something Dataset

No. Recordings: 220847

File Size: 19.4GB

Filetype: WEBM

Language(s): US English

Description: A large collection of labeled video clips that show humans performing pre-defined basic actions with everyday objects

https://20bn.com/datasets/something-something

BDD100K

No. Recordings: 100000

File Size: 3.9GB

Filetype: MP4

Language(s): US English

Description: Comprises ten tasks and 100K videos to estimate the progress of image recognition algorithms on autonomous driving

https://github.com/bdd100k/bdd100k

Kinetics-700

No. Recordings: 650,000

File Size: 24.3MB

Filetype: MP4

Language(s): US English

Description: A large, high-quality video dataset of URL links to approximately 650,000 YouTube video clips that cover 700 human action classes.

https://deepmind.com/research/open-source/kinetics

Casual Conversations Dataset

No. Recordings: 45,186

No. Participants: 3011

File Size: 15GB

Filetype: MP4

Language(s): US English

Description: 45,000 videos (3,011 participants) and intended to be used for assessing the performance of already trained models in computer vision and audio applications

https://ai.facebook.com/datasets/casual-conversations-dataset/

VoxCeleb

No. Recordings: 1,000,000

No. Participants: 7,000

File Size: 133MB

Filetype: MP4

Language(s): US English

Description: An audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube

https://www.robots.ox.ac.uk/~vgg/data/voxceleb/

TV Human Interaction Dataset

No. Recordings: 300

File Size: 156MB

Filetype: MP4

Language(s): US English

Description: 300+ videos from 20 different TV shows for predicting social actions: handshake, high five, hug, kiss

http://www.robots.ox.ac.uk/~alonso/tv_human_interactions.html

THUMOS Dataset

No. Recordings: 25,000,000

File Size: 385KB

Filetype: MP4

Language(s): US English

Description: A large collection of video clips of different kinds; the dataset can be used for action classification

https://www.crcv.ucf.edu/THUMOS14/home.html

50 Salads Dataset

No. Participants: 25

File Size: 31GB

Filetype: RGB

Language(s): US English

Description: Fully annotated 4.5 hour dataset of RGB-D video + accelerometer data, capturing 25 people preparing two mixed salads each.

http://cvip.computing.dundee.ac.uk/datasets/foodpreparation/50salads/

YouTube Faces

No. Recordings: 3425

No. Participants: 1595

Filetype: MP4

Language(s): US English

Description: A database of face videos designed for studying the problem of unconstrained face recognition in videos.

http://www.cs.tau.ac.il/~wolf/ytfaces/

PaSc

No. Recordings: 9376

No. Participants: 293

Language(s): US English

Description: A face recognition dataset of 9,376 still images and 2,802 videos of 293 people

https://www.nist.gov/publications/challenge-face-recognition-digital-point-and-shoot-cameras

iQIYI-VID

No. Recordings: 600000

No. Participants: 5000

Filetype: MP4

Language(s): US English

Description: The largest video dataset for multi-modal person identification. It is composed of 600K video clips of 5,000 celebrities.

https://arxiv.org/pdf/1811.07548.pdf

COIN

No. Recordings: 11827

File Size: 8.47MB

Filetype: JSON

Language(s): US English

Description: 11,827 videos related to 180 different tasks, which were all collected from YouTube

https://coin-dataset.github.io/

CityScapes

No. Recordings: 25000

File Size: 51.92GB

Filetype: JPG

Language(s): US English

Description: A large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities

AVA-Kinetics Dataset

No. Recordings: 3650000

No. Participants: 39000

File Size: 7.7MB

Filetype: CSV

Language(s): US English

Description: AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity.

https://research.google.com/ava/index.html

Activity Net

No. Recordings: 20,194

File Size: 600GB

Filetype: JSON

Language(s): US English

Description: A Large-Scale Video Benchmark for Human Activity Understanding

http://activity-net.org/

Kinetics

No. Recordings: 650000

File Size: 24.3MB

Filetype: MP4

Language(s): US English

Description: A collection of large-scale, high-quality datasets of URL links of up to 650,000 video clips that cover 400/600/700 human action classes. The videos include human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging.

https://deepmind.com/research/open-source/kinetics

Yahoo-Flickr Creative Commons 100 Million Dataset

No. Recordings: 100000000

File Size: 15GB

Filetype: MP4

Language(s): US English

Description: The YFCC100M is the largest publicly and freely usable multimedia collection, containing around 99.2 million photos and 0.8 million videos from Flickr, all of which were shared under one of the various Creative Commons licenses

http://multimediacommons.org/

UMDFaces

No. Recordings: 4067888

No. Participants: 11377

File Size: 173MB

Filetype: MP4

Language(s): US English

Description: UMDFaces is a face dataset divided into two parts: Still Images – 367,888 face annotations for 8,277 subjects and Video Frames – Over 3.7 million annotated video frames from over 22,000 videos of 3100 subjects.

https://www.umdfaces.io/

Condensed Movies

No. Recordings: 462,000

File Size: 250GB

Filetype: MP4

Language(s): US English

Description: A large-scale video dataset, featuring clips from movies with detailed captions

https://www.robots.ox.ac.uk/~vgg/research/condensed-movies/

AVSpeech

No. Recordings: 290,000

File Size: 128MB

Filetype: MP4

Language(s): US English

Description: AVSpeech is a new, large-scale audio-visual dataset comprising speech video clips with no interfering background noises

https://looking-to-listen.github.io/avspeech/

EyeC3D

No. Participants: 21

File Size: 3.9GB

Language(s): US English

Description: 3D video eye tracking dataset

https://www.epfl.ch/labs/mmspg/downloads/eyec3d/

MoVi

No. Recordings: 1890

No. Participants: 90

File Size: 1.3MB

Filetype: MP4

Language(s): US English

Description: A large multi-purpose human motion and video dataset

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253157

Thör

No. Recordings: 22668

File Size: WAV

Language(s): US English

Description: A public dataset of human motion trajectories, recorded in a controlled indoor experiment.

http://thor.oru.se/

SEWA

No. Participants: 398

Filetype: WAV

Language(s): US English

Description: More than 2000 minutes of audio-visual data of 398 people (201 male and 197 female) coming from 6 cultures; emotions are characterized using valence and arousal.

https://db.sewaproject.eu/

Conclusion

We hope you found the right dataset to kickstart your project – after all, machine learning starts with the right data!

If you didn’t manage to find something you were looking for – don’t panic. Twine has got your back. We’ve worked with several AI companies all over the world, so we know what we’re talking about when it comes to creating AI datasets. 

If you’d like a further discussion with our team and want to learn more about how we can help, please don’t hesitate to reach out here. We’re dedicated to helping source quality information for you. 

Twine

Twine's platform curates the best quality creative freelancers to grow your business, saving time and money whilst ensuring quality results on your projects.