The Best Health Datasets of 2022

Health data is highly sought after for training machine learning models. That said, good health datasets are not always easy to find.

That’s why we’ve done the tricky bit for you. We’ve searched high and low here at Twine to find the best health datasets.

Are you ready?

Let’s dive in.


Here are our top picks for Health Datasets:

Centers for Disease Control (CDC) Dataset

The CDC Dataset provides data on a wide variety of health-related topics, such as diabetes, life expectancy, cancer, and obesity. The CDC also provides other resources you can use to find more data, including data on COVID-19 and on death and mortality rates.

Access the dataset

Drugs and FDA Dataset

The FDA provides data about which drugs are currently approved in the US, only in database or CSV form. The dataset is aimed at users who are familiar with working with databases or spreadsheets. All fields are separated by tab delimiters, and each table’s primary key, data types, field lengths, and nullable fields are listed in the accompanying documentation. The data file is updated once per week, on Tuesday.
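Since the fields are tab-delimited, the files can be read with a standard delimited-text parser. Here is a minimal sketch using Python's built-in csv module. The sample rows and the column names (ApplNo, DrugName, Form) are made up for illustration and are not taken from the actual files, so check the dataset documentation for the real table layouts.

```python
import csv
import io

# Made-up sample in tab-delimited form. The column names and values are
# illustrative assumptions, not the real schema of the FDA files.
SAMPLE = (
    "ApplNo\tDrugName\tForm\n"
    "000001\tEXAMPLE DRUG A\tTABLET;ORAL\n"
    "000002\tEXAMPLE DRUG B\t\n"
)


def read_tab_delimited(handle):
    """Return each row as a dict, turning empty fields into None."""
    # QUOTE_NONE treats quote characters as ordinary text, which suits
    # plain tab-delimited files that do not use quoting.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]


rows = read_tab_delimited(io.StringIO(SAMPLE))
for row in rows:
    print(row["ApplNo"], row["DrugName"], row["Form"])
```

To read a downloaded file instead of the inline sample, you could pass an open file handle (for example, one opened with newline="") to the same function. Values are kept as strings so that identifiers with leading zeros are not altered.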

Access the dataset

World Health Organization (WHO) Dataset

The WHO dataset provides data on a range of health-related topics, from road safety, water, and sanitation to mental health. In this portal, you will find the most up-to-date global health data, including regional and country data organized separately across a variety of health-centered areas. The data can be visualized on charts and maps, which you can download.

Access the dataset

1000 Genomes Dataset

The 1000 Genomes Dataset comes from an international collaboration that has established the most detailed catalog of human genetic variation, including SNPs, structural variants, and their haplotype context.

The final phase of the project sequenced more than 2,500 individuals from 26 different populations around the world and produced an integrated set of phased haplotypes with more than 80 million variants for these individuals.
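Variant data of this kind is commonly distributed in the VCF text format, where a phased genotype such as 0|1 records which allele sits on each of an individual's two haplotypes. Here is a minimal sketch of reading phased genotypes from VCF-style lines, assuming that format. The sample names and records are made up for illustration and are not real 1000 Genomes data.

```python
# Made-up VCF-style text (tab-separated); not real 1000 Genomes records.
SAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B\n"
    "1\t1000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n"
    "1\t2000\t.\tC\tT\t.\tPASS\t.\tGT\t0|0\t1|0\n"
)


def parse_phased_genotypes(text):
    """Return one dict per variant with each sample's phased alleles."""
    samples = []
    variants = []
    for line in text.splitlines():
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        gt_index = fields[8].split(":").index("GT")
        haplotypes = {}
        for name, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            if "|" not in genotype or "." in genotype:
                continue  # skip unphased or missing calls
            haplotypes[name] = tuple(int(a) for a in genotype.split("|"))
        variants.append({
            "chrom": fields[0],
            "pos": int(fields[1]),
            "ref": fields[3],
            "alt": fields[4],
            "haplotypes": haplotypes,
        })
    return variants


def alt_allele_count(variant):
    """Count non-reference alleles across all phased haplotypes."""
    return sum(
        1 for alleles in variant["haplotypes"].values() for a in alleles if a > 0
    )


variants = parse_phased_genotypes(SAMPLE_VCF)
for v in variants:
    print(v["chrom"], v["pos"], v["ref"], v["alt"], alt_allele_count(v))
```

For files at the scale of the full dataset, a dedicated VCF library and compressed, indexed files would be a more practical choice than line-by-line parsing like this.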

Access the dataset


Wrapping up

To conclude, here are the top picks for the best health datasets for your projects:

  1. Centers for Disease Control (CDC) Dataset
  2. Drugs and FDA Dataset
  3. World Health Organization (WHO) Dataset
  4. 1000 Genomes Dataset

We hope that this list has helped you find a dataset for your project or realize the myriad options available.

Please let us know if there are any datasets you would like us to add to the list.

If you want to learn more about how we could help build a custom dataset for your project, don’t hesitate to contact us!

Let us help you do the math – check our AI dataset project calculator.
