Table 2.
Required fields | Description |
---|---|
strain | This is the authoritative ID used within NCBI Pathogen Detection and for the PulseNet/GenomeTrakr networks. Although the Strain ID can have any format, we suggest that it be unique, concise, and consistent within your laboratory (e.g. CFSAN123456). There are downstream advantages to the name being entirely alpha-numeric, so avoid special characters if possible. |
sample_name | Sample Name is another unique identifier for the pure culture isolate and is required by NCBI for BioSample submission (it cannot be left blank). It can have any format, but we suggest that it be the same as the strain name or contain another identifier important to the isolate or submitting laboratory. NCBI validates this attribute for uniqueness, so you cannot use “missing” or “not collected”. This identifier is NOT available in NCBI-PD. |
organism | The organism name should include the most descriptive information you have at the time of submission, adhering to proper nomenclature in the NCBI Taxonomy database: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser. Check spelling carefully! |
collected_by | Name of laboratory that sequenced the isolate (or institute that collected the sample). Abbreviations are ok if they are well-known in the community (e.g. FDA or CDC). |
attribute_package | This field provides the pathogen type (or “isolation type”). Allowed values are “Pathogen.cl” (for human clinical pathogens) or “Pathogen.env” (for environmental, food, or animal clinical isolates). The value provided in this field drives validation of other fields and cannot be left blank. |
collection_date | Date of sampling in ISO 8601 standard: “YYYY-mm-dd”, “YYYY-mm” or “YYYY” (e.g., 1990-10-30, 1990-10, or 1990). |
geo_loc_name | Geographical origin of the sample using controlled vocabulary: http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, e.g., “Canada: Vancouver”. Country and state are required for GenomeTrakr isolates from the US, e.g. “USA: CA”. |
isolation_source | Describes the physical, environmental and/or local geographical sample from which the organism was derived. Avoid generic terms such as patient isolate, sample, food, surface, clinical, product, source, environment. |
host | ᵃ For Pathogen.cl only: “Homo sapiens” if clinical isolate. |
host_disease | ᵃ For Pathogen.cl only: Name of relevant disease, e.g., Salmonella gastroenteritis. This field must use controlled vocabulary provided at: http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh. Label this field “not collected” if unknown for clinical isolates. Leave blank for all Pathogen.env isolates. |
bioproject_accession | The accession number of the BioProject(s) to which the BioSample belongs (PRJNAxxxxxx). |
lat_lon | Provide latitude and longitude to support “geo_loc_name”. NCBI requires this field to be populated. However, if this level of detail is not available, GenomeTrakr recommends entering “missing” or “not collected” here. |
ᵃ “For Pathogen.cl only”: These fields are mandatory ONLY if the isolate is from a human clinical sample. If the isolate was collected from food, water, environmental, or animal sources, these fields should be left blank.
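
Several of the rules in Table 2 are mechanical enough to check before a metadata sheet is submitted. The following Python sketch is one possible pre-submission check, not an NCBI or GenomeTrakr tool: the function name `check_record`, the representation of a row as a dictionary of strings, and the example values are all assumptions made for illustration. It encodes only rules stated in the table (blank required fields, allowed `attribute_package` values, placeholder sample names, date shape, state for US isolates, and the Pathogen.cl-only fields); it does not check uniqueness across rows or the controlled vocabularies.

```python
import re

# Fields from Table 2 that must be populated for every isolate.
ALWAYS_REQUIRED = [
    "strain", "sample_name", "organism", "collected_by", "attribute_package",
    "collection_date", "geo_loc_name", "isolation_source",
    "bioproject_accession", "lat_lon",
]
# Fields marked "For Pathogen.cl only" in Table 2.
CLINICAL_ONLY = ["host", "host_disease"]
ALLOWED_PACKAGES = {"Pathogen.cl", "Pathogen.env"}
PLACEHOLDERS = {"missing", "not collected"}
# Accepts YYYY, YYYY-mm, or YYYY-mm-dd (shape only, not calendar validity).
DATE_RE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def check_record(record):
    """Return a list of problems found in one metadata row (dict of str -> str)."""
    problems = []
    for field in ALWAYS_REQUIRED:
        if not record.get(field, "").strip():
            problems.append(f"{field}: required but blank")

    package = record.get("attribute_package", "").strip()
    if package and package not in ALLOWED_PACKAGES:
        problems.append("attribute_package: must be Pathogen.cl or Pathogen.env")

    # sample_name must be a unique identifier, so placeholders are rejected.
    if record.get("sample_name", "").strip().lower() in PLACEHOLDERS:
        problems.append("sample_name: placeholder is not a unique identifier")

    date = record.get("collection_date", "").strip()
    if date and not DATE_RE.match(date):
        problems.append("collection_date: expected YYYY, YYYY-mm, or YYYY-mm-dd")

    # GenomeTrakr isolates from the US need a state after the colon.
    geo = record.get("geo_loc_name", "").strip()
    if geo.startswith("USA") and not re.match(r"^USA:\s*\S", geo):
        problems.append("geo_loc_name: US isolates need a state, e.g. USA: CA")

    for field in CLINICAL_ONLY:
        value = record.get(field, "").strip()
        if package == "Pathogen.cl" and not value:
            problems.append(f"{field}: required for Pathogen.cl")
        if package == "Pathogen.env" and value:
            problems.append(f"{field}: leave blank for Pathogen.env")
    return problems


# Illustrative row only; the values are made up for this example.
example = {
    "strain": "CFSAN123456",
    "sample_name": "CFSAN123456",
    "organism": "Salmonella enterica",
    "collected_by": "FDA",
    "attribute_package": "Pathogen.env",
    "collection_date": "1990-10-30",
    "geo_loc_name": "USA: CA",
    "isolation_source": "alfalfa sprouts",
    "bioproject_accession": "PRJNAxxxxxx",
    "lat_lon": "missing",
}
print(check_record(example))  # → []
```

Returning a list of messages rather than raising on the first failure lets a submitter see every problem in a row at once, which is convenient when correcting a spreadsheet in bulk.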