More and more biomedical data is becoming available as researchers openly share their results.


The problem

Though these data could be reused in other analyses, the datasets are often difficult to discover. They are scattered across general-purpose data repositories, specialized biomedical data sites, data aggregators, and the primary literature. Moreover, the metadata used to describe these datasets isn't standardized, which makes it harder to find data relevant to a particular research question.

The NIAID Data Portal aggregates these data sources into a single searchable platform, making it easier to find datasets.

We also standardize dataset metadata to a common format, which makes these datasets easier to find. Schema.org provides a widely accepted format that major search engines and data portals recognize.

While some dataset providers already use this format to describe their datasets, others provide structured metadata in their own format. Learn more about our efforts to encapsulate non-standard dataset metadata in schema.org's format.
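To make the idea of re-expressing non-standard metadata concrete, here is a minimal sketch of mapping one provider's record onto a schema.org Dataset in JSON-LD. It is not the portal's actual conversion code: the source field names (accession, title, summary, released, tags) and the FIELD_MAP and to_schema_org names are hypothetical, chosen only for illustration. The target properties (identifier, name, description, datePublished, keywords) are standard schema.org Dataset properties.

```python
import json

# Hypothetical record as a repository might expose it in its own,
# non-standard format (field names are invented for illustration).
source_record = {
    "accession": "EXAMPLE-0001",
    "title": "Example transcriptomics study",
    "summary": "RNA-seq of infected and uninfected samples.",
    "released": "2021-06-01",
    "tags": ["RNA-seq", "infection"],
}

# Hypothetical mapping from source fields to schema.org Dataset properties.
FIELD_MAP = {
    "accession": "identifier",
    "title": "name",
    "summary": "description",
    "released": "datePublished",
    "tags": "keywords",
}


def to_schema_org(record: dict) -> dict:
    """Re-express a source record as a schema.org Dataset (JSON-LD dict)."""
    dataset = {"@context": "https://schema.org", "@type": "Dataset"}
    for source_key, schema_key in FIELD_MAP.items():
        if source_key in record:  # skip fields this source does not provide
            dataset[schema_key] = record[source_key]
    return dataset


print(json.dumps(to_schema_org(source_record), indent=2))
```

Keeping the mapping in a table rather than in code means each new source mainly needs its own FIELD_MAP; real sources usually also need value-level cleanup (dates, names, identifiers) that this sketch leaves out.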


Our solution


The future

Aggregating open datasets that have structured metadata is only the first step. Although all the datasets harvested and aggregated on this site organize their metadata in the same format, they use the schema.org standard to different extents. Additionally, since the metadata is provided by the data generators, it varies widely in how much information about each dataset is given and how standardized that information is. Learn more about how the data sources use the schema.org standard
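One simple way to see how fully a record uses the standard is to check which of a chosen set of schema.org properties it actually fills. The sketch below assumes a hypothetical list, CHECKED_PROPERTIES, and a hypothetical completeness function. It is not an official NIAID or schema.org metric, only an illustration of comparing records by the share of properties they populate.

```python
# Properties chosen for illustration; this list is an assumption, not an
# official NIAID or schema.org completeness criterion.
CHECKED_PROPERTIES = [
    "name", "description", "identifier", "creator", "funder",
    "license", "datePublished", "keywords", "distribution",
]


def completeness(dataset: dict) -> float:
    """Fraction of CHECKED_PROPERTIES with a non-empty value.

    Missing keys and falsy values (None, "", [], {}) count as absent.
    """
    filled = sum(1 for prop in CHECKED_PROPERTIES if dataset.get(prop))
    return filled / len(CHECKED_PROPERTIES)


sparse = {"@type": "Dataset", "name": "A", "description": "B", "keywords": []}
print(f"{completeness(sparse):.2f}")
```

A presence check like this says nothing about whether values use controlled vocabularies, which is the harder standardization problem discussed next.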

More work by the community will be needed to standardize how we apply the schema.org format and the vocabularies we use to describe things like diseases, species, and funding institutions.

As a first step, the NIAID Data Dissemination Working Group has created a schema for describing biological datasets, building on the schema.org standard. Register a dataset using the NIAID schema

Data sources

We assemble the datasets from the following repositories:


Suggest a new source