Ethical data collection and data labeling to train better models.

We can provide both image and video data for long-range biometrics, facial biometrics services, and many more vision categories.
Ethical collection
Global participants
Data labeling and annotation services

Datasets and labelling

Use Twine AI's crowd to get quality datasets at unprecedented scale

Off-the-Shelf Datasets

Audio-visual speech with multiple speakers
Large-scale audio-visual dataset comprising speech clips with no interfering background signals.
Audio-visual emotion recognition
These expressions are produced at two levels of emotional intensity (regular and strong), except for the neutral emotion, which is produced only at regular intensity.
Instructional cooking videos
Each video contains a number of procedure steps to fulfill a recipe. All the procedure segments are temporally localized in the video with start and end times. The distributions of 1) video duration, 2) number of recipe steps per video, 3) recipe segment duration and 4) number of words per sentence are shown below.
Visual question-answering tasks
Dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer.
Audio-visual recordings of sign language
This corpus contains 15 spontaneous dialogues and multi-participant conversations by deaf signers, 10 of which were recorded in authentic settings like a deaf club and a bar, 5 were recorded in the lab.
A dataset for lipreading using sequences of video frames
Human lipreading performance increases for longer words, indicating the importance of features capturing temporal context in an ambiguous communication channel.

Benefits of

Six benefits of working with Twine AI

Ethically Sourced Data

Each participant opts in and is given instructions they must agree to. Participants sign their consent to the use of their IP or image. We do not scrape data.

Skilled + Unskilled

Our network consists of expert freelancers such as ML consultants, creatives who make content such as voiceover artists, and freelancers who work on unskilled collections at scale.

Our freelancers include data scientists, engineers, voiceover artists, videographers, and actors, across 200+ freelancer roles.

Tagging + Labelling

Ensure you get the most from your data with context added. We have systems to record, annotate and verify custom video datasets at an order of magnitude lower cost than existing methods.

Project Manager

Have your data collection project run by an experienced Project Manager who can ensure all participants are following instructions and work with you to improve the collection process.

Feedback Loop

Your Project Manager will host meetings weekly, or on a schedule that works for you. They will gather your feedback on the collected data so we can optimise and improve the collection pipeline.

Speed + Scale

Harness Twine’s established global community of over 400,000 freelancers from 190+ countries to scale your dataset collection quickly.

Consolidated Payments

We take away the headache of paying participants from all over the world with our integrated payment solution.

Data Storage

Store datasets at excellent pricing. Visualise your data diversity, connect data to your applications and control access. You can also enable third-party data auditing to prove how ethical your data is.

Why should I get ethical data?

Ethical data collection is becoming more important to data scientists


Bias in data collection is a distortion that makes the data unrepresentative of the people or situations your model is meant to handle. Engineer your dataset to minimise bias.
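As a rough illustration of what checking for bias can look like in practice, the sketch below compares each demographic group's share of a collection against a target share and lists the groups that fall short. The field name `age_band`, the target shares, and the 5% tolerance are all hypothetical values chosen for the example; they are not a description of Twine AI's own tooling.

```python
from collections import Counter


def group_shares(records, field):
    """Return each group's share of the dataset for one demographic field."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(records, field, targets, tolerance=0.05):
    """List groups whose share is more than `tolerance` below their target."""
    shares = group_shares(records, field)
    return sorted(
        group
        for group, target in targets.items()
        if shares.get(group, 0.0) < target - tolerance
    )


# Hypothetical participant metadata, for illustration only.
participants = [
    {"id": 1, "age_band": "18-29"},
    {"id": 2, "age_band": "18-29"},
    {"id": 3, "age_band": "18-29"},
    {"id": 4, "age_band": "30-49"},
]

# Hypothetical target: an even split across three age bands.
targets = {"18-29": 1 / 3, "30-49": 1 / 3, "50+": 1 / 3}

gaps = underrepresented(participants, "age_band", targets)
print(gaps)  # ['30-49', '50+']
```

A check like this only covers the attributes you record, so the choice of which demographics to track is itself part of engineering the dataset.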


Consent means ensuring all participants have agreed to provide the data and understand the implications. We will ensure all participants have given their consent.

Data Provenance

Knowing where the data has come from is essential to confirm that all the checks and balances have taken place. Without it, you risk tainting your algorithm.

Here's what you get whichever plan you pick


Project Planning

Establish priorities and define deliverables with your Twine Project Manager. They'll work to develop a custom solution to meet your project objectives and timeline.


We vet and onboard collectors of your preferred demographics, set them the collection task, and pay them securely once you've approved.


Training data is packaged and formatted to your specification, then shared for your final approval.

Customer Stories

Frequently asked questions

Still feeling unsure? More questions? These might help!
What is “training data” in machine learning?

Training data is data that has been collected and fed to a machine learning model so that the model can learn patterns from it.

Training data can take various forms, including images, voice, text, or features, depending on the machine learning model being used and the task at hand. It can be annotated or unannotated.

When training data is annotated, the corresponding label is referred to as ground truth.
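To make the distinction concrete, here is a minimal sketch of how annotated and unannotated examples might be represented side by side. The `Example` class, its field names, and the sample texts are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """One training example; `label` holds the ground truth when annotated."""
    text: str
    label: Optional[str] = None  # None means the example is unannotated

    @property
    def is_annotated(self) -> bool:
        return self.label is not None


def split_by_annotation(examples):
    """Separate examples that carry a ground-truth label from those that do not."""
    annotated = [e for e in examples if e.is_annotated]
    unannotated = [e for e in examples if not e.is_annotated]
    return annotated, unannotated


# Hypothetical examples, for illustration only.
examples = [
    Example("The delivery was fast", label="positive"),
    Example("The package arrived damaged", label="negative"),
    Example("Where is my order?"),  # no label yet
]

annotated, unannotated = split_by_annotation(examples)
print(len(annotated), len(unannotated))  # 2 1
```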

What do you mean by customised data?

We mean that your dataset will be produced by participants according to your specifications. There are no generalised datasets here at Twine AI, unless you specifically want one.

What is ethical data collection?

By securing clear and informed consent from participants for use of their data, and giving insight into how this data might be used, we ensure ethical data collection.

What is data labeling?

Data labeling refers to the annotation process of adding tags or labels to raw data such as images, videos, text, and audio.

These tags represent the class of objects the data belongs to and help a machine learning model learn to identify that class of objects when it encounters untagged data.

Why is data labeling important?

Before an AI system can identify images or analyze text on its own, it must be “trained” with hand-labeled examples. In the case of self-driving cars, that means manually labeling millions of images and videos.

Let’s imagine you want to train a sentiment analysis model. You’ll need to feed the AI model labeled examples (or “training data”) of positive, negative, and neutral sentiment. And beyond that, you’ll need to include sometimes ambiguous phrases that demonstrate human language at its most complex level, like sarcasm and irony – some of the most difficult sentiments for a machine, or even humans, to detect.

Good quality training data is key to determining the success of AI tools. It must be relevant, free from noise (like errors, duplicates, and irrelevant data) and it must be labeled correctly. Get your training data and labels in order and you’ll be able to rely on this information to improve your products, services, and everyday processes.
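As a small sketch of what "free from noise" and "labeled correctly" can mean for the sentiment example above, the function below drops empty texts, rows with labels outside the expected set, and duplicate texts. The label set, the normalisation rule (lowercasing and collapsing whitespace), and the sample rows are assumptions made for illustration; a real pipeline would usually route rejected rows back for review and relabeling instead of discarding them silently.

```python
ALLOWED_LABELS = {"positive", "negative", "neutral"}


def clean_training_data(rows):
    """Drop empty texts, unknown labels and duplicate texts.

    `rows` is an iterable of (text, label) pairs. When the same text
    appears more than once, only the first occurrence is kept.
    """
    seen = set()
    cleaned = []
    for text, label in rows:
        # Collapse whitespace and lowercase so near-identical texts match.
        normalised = " ".join(text.split()).lower()
        if not normalised:
            continue  # empty or whitespace-only text
        if label not in ALLOWED_LABELS:
            continue  # mislabeled or misspelled label
        if normalised in seen:
            continue  # duplicate text
        seen.add(normalised)
        cleaned.append((text.strip(), label))
    return cleaned


# Hypothetical raw rows, for illustration only.
raw_rows = [
    ("Great service!", "positive"),
    ("great  service!", "positive"),                # duplicate after normalising
    ("", "neutral"),                                # empty text
    ("Oh, wonderful, another delay.", "negative"),  # sarcasm, labeled by a human
    ("It arrived on Tuesday.", "netural"),          # misspelled label
]

cleaned_rows = clean_training_data(raw_rows)
print(cleaned_rows)
```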

How is the data delivered?

You determine the mode and speed of delivery. We will provide instructions to each participant regarding the type of recording you require and what metadata, if any, needs to be provided. We instruct contributors based on your specification.

What can I expect from my Project Manager?

Our experienced team takes your project specifications and creates custom procedures designed to maximise success. Your Project Manager is responsible for running the project: writing out the labeling instructions, ensuring the labeling quality is consistent and sourcing expert labelers.

They will be your point person for updates and the achievement of milestones.

How much does it cost and how do we pay?

The cost is based on the number of unique participants you require and how much work they need to do.

We work with clients on a rolling monthly subscription, so you can cancel or pause at any point. Payments can be made by credit card or invoice depending on the size of the project.

Do you have examples?

We can create samples if you contact us. We also maintain a collection of over 100 open voice and visual datasets.


Ethical datasets made easy

Datasets for speech recognition
Audio scene analysis
Single person or multi-person conversation content
Multi-language capabilities

Datasets for object tracking or detection
Human action recognition and biometrics
Human facial recognition
Drone video datasets

How do I find out more about Twine AI?

Other kinds of data we provide:

Audio datasets:
We can create speech datasets across a wide range of demographics including gender, language, location, dialect, accent, and age. We can also create speech datasets using professional voice actors who have their own recording studios. If you’d like to use our transcription services, we can convert speech to text to train your ML models.

Video datasets:
We can work with long-range biometrics, meaning we create video datasets with participants at long distances from the camera. This can be across a wide range of demographics including gender, ethnicity, age, and body size. Alternatively, we can look at facial biometrics by working with participants at close range. When creating these types of video datasets, the demographics we can look at include gender, ethnicity, age, and facial distinctions (eye colour, glasses, etc.).
Our other AI resources:

We like to keep our audience well informed on everything regarding data. Our Twine Blog has its own AI category, and within it, we have listicles of the highest-quality open-source datasets out there right now. We have an article on 100+ Open Audio and Video Datasets, 100+ Speech Datasets, and listicles of datasets in almost every language you can think of!

We also have an AI Newsletter, which we send out to our AI/ML audience, providing them with the latest industry news.

Want to be in the loop on LinkedIn? Check out our Twine AI LinkedIn page, where we post our latest dataset listicles, and other exciting articles + media from the AI/ML space.

Contact Us
