Get speech datasets for voice recognition.

With our community of over 400,000 skilled professionals, we can provide both voice datasets and transcription services.

Worldwide languages

Twine AI covers 153 languages and thousands of dialects. Our community of freelancers provides us with an unparalleled global resource for talent.

Participants from a variety of backgrounds, languages, dialects, and cultures can assist in creating a suitable dataset for you. No matter the requirements, we will be able to find and source the right speech dataset for your project.
Contact us

Accurate transcription

For voice datasets, we can work with global data collectors using our speech recording software, or with professional voice actors who record vocal data in their own studios.

For transcription, we can use our network of human transcribers to convert speech to text, producing data for training ML models.
Get started

Benefits of working with Twine AI

Quality

High-quality audio recordings in uncompressed WAV format (44 kHz, 16-bit), ideal for training speech recognition models, whether for on-chip or cloud-based software.
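If you want to confirm that delivered files match the stated format, a quick header check is usually enough. The sketch below is a minimal illustration using Python's standard `wave` module, not part of any Twine tooling. It assumes that "44kHz" refers to the standard 44.1 kHz rate (44,100 Hz); the names `check_wav_format` and `make_silent_wav` are hypothetical helpers introduced only for this example.

```python
import io
import wave

# Assumption: "44kHz" on this page means the standard 44.1 kHz sample rate.
EXPECTED_RATE_HZ = 44100
# 16-bit audio stores each sample in 2 bytes.
EXPECTED_SAMPLE_WIDTH_BYTES = 2


def check_wav_format(source,
                     expected_rate=EXPECTED_RATE_HZ,
                     expected_width=EXPECTED_SAMPLE_WIDTH_BYTES):
    """Return a list of mismatches; an empty list means the header matches.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if width != expected_width:
        problems.append(f"sample width is {width * 8}-bit, expected {expected_width * 8}-bit")
    return problems


def make_silent_wav(rate, width, n_frames=441):
    """Build a short mono WAV of silence in memory (hypothetical test helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (n_frames * width))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_silent_wav(44100, 2)))  # matching file
    print(check_wav_format(make_silent_wav(16000, 1)))  # mismatched file
```

In practice you would pass the path of each delivered file to `check_wav_format` and flag any file that returns a non-empty list. The `wave` module reads uncompressed PCM WAV only, so a compressed file would raise `wave.Error` rather than return a mismatch.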

Scalability

Twine specializes in building voice datasets in major global languages with thousands of dialects. From complex scripts to simple ones, our participants are vetted through our QA process to help train your models.

Customizable

Participants can follow a script or record conversational audio. You can set requirements for gender balance, accent variety, and specific languages.

Audio Formats

Files can be delivered as your project requires, whether split or consolidated. Whether you prefer compressed or uncompressed audio, we can deliver what you need with the relevant metadata.

Project Manager

Have your data collection project run by an experienced Project Manager who can ensure all participants are following instructions and work with you to improve the collection process.

Payments

We take away the headache of paying participants from all over the world with our integrated payment solution.

Here are the steps to start working together

1. Project Planning

Establish priorities and define deliverables with your Twine Project Manager. They'll work to develop a custom solution to meet your project objectives and timeline.

2. Production

We vet and onboard collectors of your preferred demographics, set them the collection task, and pay them securely once you've approved.

3. Delivery

Training data is packaged and formatted to your specification, then shared for your final approval.

How do I find out more about Twine AI?

Other kinds of data we provide:

Vision data:
We can work with long-range biometrics, meaning we create video datasets with participants at long distances from the camera. These can cover a wide range of demographics, including gender, ethnicity, age, and body size. Alternatively, we can cover facial biometrics by working with participants at close range. For these video datasets, the demographics we can cover include gender, ethnicity, age, and facial distinctions (eye colour, glasses, etc.).
Our other AI resources:
We like to keep our audience well informed about everything data-related. Our Twine Blog has its own AI category, which includes listicles of the highest-quality open-source datasets available right now. We have articles on 100+ Open Audio and Video Datasets and 100+ Speech Datasets, plus listicles of datasets in almost every language you can think of!

We also have an AI Newsletter, which we send out to our AI/ML audience, providing them with the latest industry news.

Want to be in the loop on LinkedIn? Check out our Twine AI LinkedIn page, where we post our latest dataset listicles and other exciting articles and media from the AI/ML space.

Need to build datasets?

Datasets for speech recognition
Audio scene analysis
Single-person or multi-person conversation content
Multi-language capabilities

Datasets for object tracking or detection
Human action recognition and biometrics
Human facial recognition
Drone video datasets

Contact Us
