2022 Feb 16;11:giac003. doi: 10.1093/gigascience/giac003

Table 3. Minimal (required) contextual data fields

| Field name¹ | Definition | Guidance |
|---|---|---|
| specimen collector sample ID | The user-defined name for the sample | Every sample ID from a single submitter must be unique. It can have any format, but we suggest that you make it concise, unique, and consistent within your laboratory, and as informative as possible |
| sample collected by | The name of the agency that collected the original sample | The name of the agency should be written out in full (with minor exceptions) and be consistent across multiple submissions |
| sequence submitted by | The name of the agency that generated the sequence | The name of the agency should be written out in full (with minor exceptions) and be consistent across multiple submissions |
| sample collection date | The date on which the sample was collected | Record the collection date accurately in the template. Required granularity includes year, month, and day. Before sharing these data, ensure that this date is not considered identifiable information. If this date is considered identifiable, it is acceptable to add “jitter” to the collection date by adding or subtracting calendar days. Do not change the collection date in your original records. Alternatively, “received date” may be used as a substitute in the data you share. The date should be provided in ISO 8601 standard format “YYYY-MM-DD” |
| geo_loc name (country) | Country of origin of the sample | Provide the country name from the pick list in the template |
| geo_loc name (state/province/region) | State/province/region of origin of the sample | Provide the state/province/region name from the GAZ geography ontology. Search for geography terms at https://www.ebi.ac.uk/ols/ontologies/gaz |
| Organism | Taxonomic name of the organism | Use “Severe acute respiratory syndrome coronavirus 2” |
| Isolate | Identifier of the specific isolate | This identifier should be a unique, indexed, alphanumeric ID within your laboratory. If submitted to the INSDC, the “isolate” name is propagated throughout different databases. As such, structure the “isolate” name to be ICTV/INSDC compliant in the following format: “SARS-CoV-2/host/country/sampleID/date” |
| host (scientific name) | The taxonomic, or scientific, name of the host | A common name or scientific name is required if there was a host. Scientific name example: Homo sapiens. Select a value from the pick list. If the sample was environmental, put “not applicable.” |
| host disease | The name of the disease experienced by the host | This field is only required if there was a host. If the host was a human, select COVID-19 from the pick list. “COVID-19” should still be provided if the patient is asymptomatic; the asymptomatic status can be recorded under “host health state details.” If the host is not human, and the disease state is not known or the host appears healthy, put “not applicable.” |
| purpose of sequencing | The reason that the sample was sequenced | The reason why a sample was originally collected may differ from the reason why it was selected for sequencing. The reason a sample was sequenced may provide information about potential biases in sequencing strategy. Provide the purpose of sequencing from the pick list in the template. The reason for sample collection should be indicated in the “purpose of sampling” field |
| sequencing instrument | The model of the sequencing instrument used | Select a sequencing instrument from the pick list provided in the template |
| consensus sequence software name | The name of software used to generate the consensus sequence | Provide the name of the software used to generate the consensus sequence |
| consensus sequence software version | The version of the software used to generate the consensus sequence | Provide the version of the software used to generate the consensus sequence |
¹ Through consultation and consensus, 14 fields were prioritized for SARS-CoV-2 surveillance and are considered required in the specification. Field names, definitions, and guidance are presented.
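
The guidance in Table 3 can be checked mechanically before data are shared. The following is a minimal Python sketch, not part of the specification or its template: the function names (`missing_fields`, `is_iso_date`, `jitter_date`, `isolate_name`) are hypothetical, a record is assumed to be a plain dictionary keyed by the field names as printed in the table, and the sample values in the usage lines are invented for illustration. It does not validate pick-list or GAZ ontology terms.

```python
import re
from datetime import date, timedelta

# The 14 required field names, as printed in Table 3.
REQUIRED_FIELDS = [
    "specimen collector sample ID",
    "sample collected by",
    "sequence submitted by",
    "sample collection date",
    "geo_loc name (country)",
    "geo_loc name (state/province/region)",
    "Organism",
    "Isolate",
    "host (scientific name)",
    "host disease",
    "purpose of sequencing",
    "sequencing instrument",
    "consensus sequence software name",
    "consensus sequence software version",
]


def missing_fields(record):
    """Return the required fields that are absent or blank in a record (dict)."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f, "")).strip()]


def is_iso_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", value)
    if match is None:
        return False
    try:
        date(int(match[1]), int(match[2]), int(match[3]))
    except ValueError:  # e.g. month 13 or 30 February
        return False
    return True


def jitter_date(value, days):
    """Shift a YYYY-MM-DD date by a signed number of calendar days.

    Apply this only to the copy of the data being shared; the table's
    guidance is to leave the date in the original records unchanged.
    """
    return (date.fromisoformat(value) + timedelta(days=days)).isoformat()


def isolate_name(host, country, sample_id, date_part):
    """Join the parts in the order SARS-CoV-2/host/country/sampleID/date."""
    return "/".join(["SARS-CoV-2", host, country, sample_id, date_part])


# Usage with invented values (hypothetical sample ID and country).
record = {"specimen collector sample ID": "ABC-123",
          "sample collection date": "2021-03-01"}
print(len(missing_fields(record)))        # 12 of the 14 fields are still missing
print(is_iso_date("2021-03-01"))          # True
print(jitter_date("2021-03-01", -2))      # 2021-02-27
print(isolate_name("human", "Canada", "ABC-123", "2021"))
```

How many days of jitter to apply, and how the host and date parts of the isolate name should be written, are left open here because the table does not specify them.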