FAQ for N3C PPRL Datasets

VIRAL VARIANTS

  • Are there technical specifications available?

    Yes, detailed technical specifications can be found here.

  • Who do I contact for help?

    If you have questions, please submit a ticket to:

  • Do I need to send both the summary and sequence data?

    No, you do not. The pandemic is a public health crisis, and we appreciate any information you are willing to send. N3C has taken a phased approach to collecting viral variant information.

    Phase I requires only a simple CSV file with four fields (Specimen ID, N3C pseudo-ID, Specimen Date, and Viral Variant Summary), which you place in your organization’s existing N3C sFTP folder.

    Examples of how N3C curates submitted Viral Variant Summary values:

    Value Submitted    | Pango Sub_Lineage | Pango Lineage                | WHO Lineage
    "DELTA"            | NULL              | NULL                         | DELTA
    "B.1.617.2"        | NULL              | B.1.617.2                    | DELTA (imputed)
    "AY.4"             | AY.4              | B.1.617.2 (imputed)          | DELTA (imputed)
    "DELTA, B.1.617.2" | NULL              | B.1.617.2 (parsed from text) | DELTA (parsed from text)
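
    A minimal sketch of producing the Phase I file with Python's standard csv module follows. The four column names come from the field list above, but the exact header spelling, the date format, and the identifier values shown here are assumptions for illustration; the technical specifications are authoritative.

```python
import csv
import io

# Column headers taken from the FAQ's field list; the exact spelling N3C
# expects is an assumption (check the technical specifications).
PHASE1_FIELDS = ["Specimen ID", "N3C pseudo-ID", "Specimen Date", "Viral Variant Summary"]

def write_phase1_csv(rows, out):
    """Write Phase I rows (dicts keyed by PHASE1_FIELDS) to a text stream as CSV."""
    writer = csv.DictWriter(out, fieldnames=PHASE1_FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Made-up identifiers and an assumed ISO date format, for illustration only.
buffer = io.StringIO()
write_phase1_csv(
    [
        {"Specimen ID": "S-0001", "N3C pseudo-ID": "P-0001",
         "Specimen Date": "2021-07-13", "Viral Variant Summary": "AY.4"},
        {"Specimen ID": "S-0002", "N3C pseudo-ID": "P-0002",
         "Specimen Date": "2021-07-14", "Viral Variant Summary": "DELTA, B.1.617.2"},
    ],
    buffer,
)
print(buffer.getvalue())
```

    A summary that contains a comma, such as "DELTA, B.1.617.2", is quoted automatically, so the delimiter between values survives for N3C's parser. When writing to a real file, open it with newline="" as the csv module documentation recommends.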

    Phase II consists of submitting the PPRL file to the honest broker and submitting the viral variant sequence to NCBI.

  • How can I get access to the linked viral variant sequence data and N3C medical history?

    Access to the integrated N3C data and NCBI viral variant sequence data will require an “Interconnect Agreement” between two or more NIH data enclaves at different institutes. The NIH Interconnect Agreement is currently being developed, but no timeframe for its availability has been established.

  • Do you want information on every viral variant?

    Initially, we are curating only a subset of variants, the Variants Being Monitored (VBM), which can be found here.

  • What are WHO Lineage and PANGO Lineage?

    The World Health Organization (WHO) Lineage is the “common name” of a variant and is always a Greek letter. The WHO introduced these labels to facilitate communication about viral variants (see below).

    The WHO Lineage is a very high-level concept, or summary label, for a viral variant. Other nomenclatures, such as PANGO, are ontologies with thousands of rows that delineate anything from the subtype of a viral variant to the earliest date recorded. In addition to the WHO classification, the CDC classification can be found here.

    World Health Organization (WHO), “Naming SARS-CoV-2 variants”:

    “The established nomenclature systems for naming and tracking SARS-CoV-2 genetic lineages by GISAID, Nextstrain and Pango are currently and will remain in use by scientists and in scientific research. To assist with public discussions of variants, WHO convened a group of scientists from the WHO Virus Evolution Working Group (now called the Technical Advisory Group on Virus Evolution), the WHO COVID-19 reference laboratory network, representatives from GISAID, Nextstrain, Pango and additional experts in virological, microbial nomenclature and communication from several countries and agencies to consider easy-to-pronounce and non-stigmatizing labels for VOI and VOC. At the present time, this expert group convened by WHO has recommended using letters of the Greek Alphabet, i.e., Alpha, Beta, Gamma, Delta which will be easier and more practical to be discussed by non-scientific audiences”.

  • Do I need to parse the Viral Variant Summary data?

    No. We want to make this as easy as possible and will accept any combination of WHO Lineage, PANGO parent lineage, or PANGO subtype values. As long as the file includes delimiters between values, N3C will parse and curate this information before it is made available to investigators in N3C.
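
    The curation shown in the example table in the Phase I answer can be sketched as a small parser. This is an illustration, not N3C's actual curation code: the delimiter set (comma, semicolon, pipe) is an assumption, and the hypothetical lookup tables WHO_NAMES, PANGO_TO_WHO, and ALIAS_TO_PARENT cover only the Delta examples from this FAQ.

```python
import re

# Hypothetical lookup tables covering only the Delta examples in this FAQ;
# N3C's real reference lists and curation rules are not shown here.
WHO_NAMES = {"ALPHA", "BETA", "GAMMA", "DELTA"}
PANGO_TO_WHO = {"B.1.617.2": "DELTA"}
ALIAS_TO_PARENT = {"AY": "B.1.617.2"}  # AY.* sub-lineages descend from B.1.617.2

def curate(summary):
    """Split a delimited Viral Variant Summary and fill the three lineage fields.

    Field names listed in "imputed" were inferred rather than parsed from the text.
    """
    result = {"sub_lineage": None, "lineage": None, "who": None, "imputed": []}
    # Assumed delimiters: comma, semicolon, or pipe.
    for raw in re.split(r"[,;|]", summary):
        token = raw.strip().strip('"').upper()
        if not token:
            continue
        if token in WHO_NAMES:
            result["who"] = token
        elif token in PANGO_TO_WHO:
            result["lineage"] = token
        elif token.split(".")[0] in ALIAS_TO_PARENT:
            result["sub_lineage"] = token
        # Unrecognized tokens are ignored in this sketch.
    # Impute the parent PANGO lineage from a sub-lineage alias.
    if result["sub_lineage"] and result["lineage"] is None:
        result["lineage"] = ALIAS_TO_PARENT[result["sub_lineage"].split(".")[0]]
        result["imputed"].append("lineage")
    # Impute the WHO label from the PANGO lineage.
    if result["lineage"] and result["who"] is None:
        who = PANGO_TO_WHO.get(result["lineage"])
        if who is not None:
            result["who"] = who
            result["imputed"].append("who")
    return result

for value in ['"DELTA"', '"B.1.617.2"', '"AY.4"', "DELTA, B.1.617.2"]:
    print(value, curate(value))
```

    The four calls reproduce the four rows of the example table, with the "imputed" list standing in for the table's “(imputed)” annotations.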

  • What information will be available for investigators?

    Name | Description | Data Example
    specimen_who_lineage | WHO Lineage name. This is a Greek letter (e.g., Alpha, Beta, Gamma, Delta), which is easier and more practical for non-scientific audiences to discuss. | Delta
    specimen_PANGO_lineage | From the PANGO website: “The PANGO nomenclature is being used by researchers and public health agencies worldwide to track the transmission and spread of SARS-CoV-2, including variants of concern. This website documents all current PANGO lineages and their spread, as well as various software tools which can be used by researchers to perform analyses on SARS-COV-2 sequence data” | B.1.617.2
    specimen_Pango Sub_Lineage | PANGO sub-lineage (see specimen_PANGO_lineage above) | AY
    specimen_designation | Variant designation (e.g., VOC), a classification based on attributes such as transmissibility, disease severity, risk of reinfection, and impacts on diagnostics and vaccine performance | VOC
    specimen_date | The date the specimen was collected (mm/dd/yy) | 07/13/21
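
    The example specimen_date value 07/13/21 can only be read as month/day/two-digit year, since there is no 13th month, but many other values (such as 07/06/21) are ambiguous between day-first and month-first layouts. A minimal sketch for checking which layouts a value fits, using only Python's datetime.strptime; the list of candidate formats is an assumption, and the delivered format should be confirmed in the technical specifications.

```python
from datetime import datetime

# Candidate layouts are assumptions for illustration; the technical
# specifications define the format actually delivered.
CANDIDATE_FORMATS = ["%m/%d/%y", "%d/%m/%y", "%m/%d/%Y", "%d/%m/%Y"]

def matching_formats(value):
    """Return every candidate format that strptime accepts for value."""
    matches = []
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue
        matches.append(fmt)
    return matches

print(matching_formats("07/13/21"))  # ['%m/%d/%y']
print(matching_formats("07/06/21"))  # ['%m/%d/%y', '%d/%m/%y'] -> ambiguous
```

    Returning every matching format, rather than the first one that parses, makes ambiguous values visible instead of silently picking a day/month order.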

MORTALITY

  • Can I tell if the cause of death was COVID?

    The supplemental PPRL Mortality data do not indicate cause of death. The harmonized OMOP CDM data may record a cause of death, which could show that the death was due to COVID. However, the Government, Private Obituary, and ObituaryData.com sources do not provide a cause of death.

  • Why is there a higher incidence of deaths reported on the first day of the month or the 15th day of the month?

    The Government source does not know the exact date of death for all reported deaths. If they know only the month of death (e.g., November 2021), they will provide the date of death as the first day of that month (e.g., 11/1/2021). If they know only the year of death (e.g., 2021), they will provide the date of death as the first day of that year (e.g., 1/1/2021).

    Likewise, the Private Obituary source does not know the exact date for all reported deaths. If they know only the month of death (e.g., November 2021), they will provide the date of death as either the first day of that month or the middle of that month (e.g., 11/1/2021 or 11/15/2021). If they know only the year of death (e.g., 2021), they will provide the date of death as the first day of that year (e.g., 1/1/2021).
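
    Because of these placeholder conventions, analyses that depend on the exact day of death may want to flag dates that could be imputed. A minimal sketch, assuming the source labels are stored exactly as "Government" and "Private Obituary" (the actual column values are an assumption); the FAQ does not describe the ObituaryData.com convention, so this sketch never flags that source.

```python
from datetime import date

# Source labels follow the FAQ wording; the exact values stored in the
# dataset's source column are an assumption.
FIRST_OF_MONTH_SOURCES = {"Government", "Private Obituary"}
MID_MONTH_SOURCES = {"Private Obituary"}

def may_be_imputed(death_date: date, source: str) -> bool:
    """True if death_date matches a placeholder the FAQ describes for this source.

    A True result does not prove the date was imputed; real deaths also
    occur on the 1st and the 15th.
    """
    if source in FIRST_OF_MONTH_SOURCES and death_date.day == 1:
        return True  # month-only (1st of month) or year-only (January 1)
    if source in MID_MONTH_SOURCES and death_date.day == 15:
        return True  # month-only reported as the middle of the month
    return False

records = [
    (date(2021, 11, 1), "Government"),
    (date(2021, 11, 15), "Government"),
    (date(2021, 11, 15), "Private Obituary"),
    (date(2021, 11, 23), "Private Obituary"),
]
flags = [may_be_imputed(d, s) for d, s in records]
print(flags)  # [True, False, True, False]
```

    Flagging rather than dropping these records keeps them available for analyses where month- or year-level precision is sufficient.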

  • Are there any training or educational resources for this dataset?

    Several resources are available within the Enclave to learn more about both PPRL and the Mortality dataset. The N3C PPRL Module contains introductory information on both PPRL and the Mortality dataset. There is also an N3C PPRL Mortality Data Guide, which contains more specific information about the Mortality dataset and how to use it.