2020 Oct 19;2:20. doi: 10.1186/s42522-020-00026-3

Table 2.

The minimum set of metadata fields recommended by GenomeTrakr for BioSample submission of bacterial pathogens. Consult the “Populating the NCBI Pathogen metadata template protocol” [32] for expanded, up-to-date guidance.

| Required field | Description |
| --- | --- |
| strain | This is the authoritative ID used within NCBI Pathogen Detection and for the PulseNet/GenomeTrakr networks. Although the Strain ID can have any format, we suggest that it be unique, concise, and consistent within your laboratory (e.g. CFSAN123456). There are downstream advantages to the name being entirely alphanumeric, so avoid special characters if possible. |
| sample_name | Sample Name is another unique identifier for the pure culture isolate and is required by NCBI for BioSample submission (it cannot be left blank). It can have any format, but we suggest that it be the same as the strain name or contain another identifier important to the isolate or submitting laboratory. NCBI validates this attribute for uniqueness, so you cannot use “missing” or “not collected”. This identifier is NOT available in NCBI-PD. |
| organism | The organism name should include the most descriptive information you have at the time of submission, adhering to proper nomenclature in the NCBI taxonomy database: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser. Check spelling carefully! |
| collected_by | Name of the laboratory that sequenced the isolate (or the institute that collected the sample). Abbreviations are OK if they are well known in the community (e.g. FDA or CDC). |
| attribute_package | This field provides the pathogen type (or “isolation type”). Allowed values are “Pathogen.cl” (for human clinical pathogens) or “Pathogen.env” (for environmental, food, or animal clinical isolates). The value provided in this field drives validation of other fields and cannot be left blank. |
| collection_date | Date of sampling in ISO 8601 standard: “YYYY-mm-dd”, “YYYY-mm” or “YYYY” (e.g., 1990-10-30, 1990-10, or 1990). |
| geo_loc_name | Geographical origin of the sample using controlled vocabulary: http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, e.g., “Canada: Vancouver”. Country and state are required for GenomeTrakr isolates from the US, e.g. “USA: CA”. |
| isolation_source | Describes the physical, environmental and/or local geographical sample from which the organism was derived. Avoid generic terms such as patient isolate, sample, food, surface, clinical, product, source, environment. |
| host (a) | For Pathogen.cl only: “Homo sapiens” if clinical isolate. |
| host_disease (a) | For Pathogen.cl only: Name of relevant disease, e.g., Salmonella gastroenteritis. This field must use controlled vocabulary provided at: http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh. Label this field “not collected” if unknown for clinical isolates. Leave blank for all Pathogen.env isolates. |
| bioproject_accession | The accession number of the BioProject(s) to which the BioSample belongs (PRJNAxxxxxx). |
| lat_lon | Provide latitude and longitude to support “geo_loc_name”. NCBI requires this field to be populated. However, if this level of detail is not available, GenomeTrakr recommends including “missing” or “not collected” here. |

(a) “For Pathogen.cl only”: These fields are mandatory ONLY if the isolate is from a human clinical sample. If the isolate was collected from food/water/env or animal sources, these fields should be left blank.
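
Several of the rules in Table 2 are mechanical enough to check before submission. The following Python sketch is one possible pre-submission check, not part of the GenomeTrakr protocol or any NCBI tool. The function name `check_biosample_row`, the dictionary layout, and the message wording are assumptions made for illustration, as is the example isolation source. It checks only what the table states (blank fields, allowed package values, date format, the US state rule, and the Pathogen.cl-only fields). It does not validate controlled vocabularies, organism names, or sample_name uniqueness, and NCBI may accept values that this sketch flags.

```python
import re

# Fields from Table 2 that apply to every isolate.
REQUIRED = (
    "strain", "sample_name", "organism", "collected_by", "attribute_package",
    "collection_date", "geo_loc_name", "isolation_source",
    "bioproject_accession", "lat_lon",
)
PACKAGES = {"Pathogen.cl", "Pathogen.env"}
PLACEHOLDERS = {"missing", "not collected"}
# YYYY, YYYY-mm, or YYYY-mm-dd (format only; calendar validity is not checked).
DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")
ALNUM_RE = re.compile(r"[A-Za-z0-9]+")


def check_biosample_row(row):
    """Return a list of problems found in one metadata row (hypothetical helper)."""
    problems = []
    fields = (*REQUIRED, "host", "host_disease")
    value = {k: (row.get(k) or "").strip() for k in fields}

    for field in REQUIRED:
        if not value[field]:
            problems.append(f"{field}: must not be blank")

    # Alphanumeric strain IDs are a suggestion in Table 2, so this is advisory.
    if value["strain"] and not ALNUM_RE.fullmatch(value["strain"]):
        problems.append("strain: advisory, contains non-alphanumeric characters")

    if value["sample_name"].lower() in PLACEHOLDERS:
        problems.append("sample_name: placeholders are not unique identifiers")

    package = value["attribute_package"]
    if package and package not in PACKAGES:
        problems.append("attribute_package: expected Pathogen.cl or Pathogen.env")

    date = value["collection_date"]
    if date and not DATE_RE.fullmatch(date):
        problems.append("collection_date: expected YYYY, YYYY-mm, or YYYY-mm-dd")

    # Text before the colon is the country or ocean; the rest is local detail.
    country, _, detail = value["geo_loc_name"].partition(":")
    if country.strip() == "USA" and not detail.strip():
        problems.append("geo_loc_name: US isolates need a state, e.g. 'USA: CA'")

    if package == "Pathogen.cl":
        if value["host"] != "Homo sapiens":
            problems.append("host: expected 'Homo sapiens' for Pathogen.cl")
        if not value["host_disease"]:
            problems.append("host_disease: give a disease name or 'not collected'")
    elif package == "Pathogen.env":
        for field in ("host", "host_disease"):
            if value[field]:
                problems.append(f"{field}: leave blank for Pathogen.env")
    return problems


# Illustrative row; the isolation source is invented, and the BioProject
# accession is the placeholder pattern from Table 2, not a real accession.
example = {
    "strain": "CFSAN123456",
    "sample_name": "CFSAN123456",
    "organism": "Salmonella enterica",
    "collected_by": "FDA",
    "attribute_package": "Pathogen.env",
    "collection_date": "1990-10",
    "geo_loc_name": "USA: CA",
    "isolation_source": "romaine lettuce",
    "bioproject_accession": "PRJNAxxxxxx",
    "lat_lon": "missing",
}
print(check_biosample_row(example))  # prints []
```

Returning a list of messages, instead of raising on the first failure, lets a laboratory review every problem in a row at once before correcting the template.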