Dataset schema in NIAID Data Portal

A critical part of assembling datasets from different sources is making sure that every source describes its datasets with a common schema.

We rely on schema.org's Dataset schema, but each source uses the standard differently.

There are 119 schema.org Dataset properties, but only some of them are used by any of our dataset repositories.

[Interactive figure: prevalence of each property in each source, on a scale from 0 (not used) to 100%.]

As part of the NIAID Data Portal project, we coerce the metadata for each dataset pulled from the repositories into a common format based on schema.org's Dataset schema.
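As a rough illustration of what this coercion involves, the sketch below renames the fields of one source record to schema.org Dataset property names. The source field names (`title`, `summary`, `pubs`, `internal_id`) and the mapping itself are hypothetical, made up for this example; this is not the portal's actual pipeline.

```python
# Hypothetical mapping from one source's field names to schema.org
# Dataset property names. Real sources each need their own mapping.
FIELD_MAP = {
    "title": "name",
    "summary": "description",
    "pubs": "citation",
}


def coerce_record(raw: dict, field_map: dict = FIELD_MAP) -> dict:
    """Rename source-specific fields to schema.org Dataset properties.

    Fields without a mapping, and fields with empty values, are dropped.
    """
    dataset = {"@context": "https://schema.org", "@type": "Dataset"}
    for source_field, value in raw.items():
        target = field_map.get(source_field)
        if target is not None and value not in (None, "", []):
            dataset[target] = value
    return dataset


# Made-up source record, for illustration only.
raw = {"title": "Example assay", "summary": "An illustrative record.", "internal_id": 7}
record = coerce_record(raw)
```

Here `internal_id` has no counterpart in the mapping, so it does not appear in the coerced record.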

This schema is intended to be descriptive and flexible, encompassing 119 different properties. But not all of these properties are used equally within the repositories, and some are not used at all.

Of those, only description and name are nearly universally used. In other cases, a property is required in some data sources but absent from the others, like variableMeasured in omicsdi. For some properties, there seems to be a consensus on which synonym to use: citation is commonly used, even though a similar property, publication, is available in the schema.
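One way to measure this kind of prevalence within a single source is to count, for each property, the share of records that carry a non-empty value. The sketch below assumes the records have already been coerced to schema.org property names; the function name `property_prevalence` and the sample records are hypothetical.

```python
from collections import Counter


def property_prevalence(records: list[dict]) -> dict[str, float]:
    """Return, per property, the percentage of records with a non-empty value."""
    if not records:
        return {}
    counts = Counter()
    for record in records:
        for prop, value in record.items():
            if value not in (None, "", [], {}):
                counts[prop] += 1
    return {prop: 100 * n / len(records) for prop, n in counts.items()}


# Made-up records from one imaginary source, for illustration only.
sample = [
    {"name": "A", "description": "first", "variableMeasured": "gene expression"},
    {"name": "B", "description": "second", "citation": "placeholder citation"},
    {"name": "C", "description": "", "citation": "placeholder citation"},
    {"name": "D", "description": "fourth"},
]
prevalence = property_prevalence(sample)

# A property that never appears is simply absent from the result,
# so it is reported here as 0% ("not used").
publication_share = prevalence.get("publication", 0.0)
```

In this sample, name is present in every record, citation in half of them, and publication in none, which mirrors the citation versus publication pattern described above.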

To promote greater uniformity in how biological datasets are described in their metadata, the NIAID Data Dissemination Working Group has developed a NIAID-specific dataset schema. This schema focuses on a minimal set of metadata in the schema.org Dataset standard that we consider essential to describe and find biological datasets. Additionally, the schema extends the schema.org schema to add a few new properties specific to biological datasets.
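A minimal schema of this kind lends itself to a simple completeness check. The sketch below reports which required properties a record lacks. The `REQUIRED` tuple is an assumption for illustration only: it lists just name and description, and the actual set of required properties is defined by the NIAID schema itself.

```python
# Assumed for illustration only: the real required properties are
# defined by the NIAID schema, not by this tuple.
REQUIRED = ("name", "description")


def missing_required(record: dict, required: tuple = REQUIRED) -> list:
    """List the required properties that are absent or empty in a record."""
    return [prop for prop in required if record.get(prop) in (None, "", [], {})]


# Made-up records, for illustration only.
complete = {"name": "Example assay", "description": "An illustrative record."}
incomplete = {"name": "Example assay", "description": ""}

complete_gaps = missing_required(complete)
incomplete_gaps = missing_required(incomplete)
```

An empty list means the record satisfies the assumed minimal set; otherwise the list names the properties still to be filled in.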
