2024 May 31;9(6):e01415-23. doi: 10.1128/msystems.01415-23

TABLE 1.

Additional metadata attributes created for this project^a

Additional metadata attributes Definition and guidance for GenomeTrakr laboratories
BioSample
 collection_time^b For grab samples: the time of day the sample was collected, in your local time zone, expressed on a 12-hour clock (1–12 AM or 1–12 PM).
 specimen_processing Replicate and/or pooling information, critical for interpreting results.
 specimen_processing_id Identifier used to track replicates and/or pooled samples
 specimen_processing_details Description of the experimental design, describing the technical or biological replicates and/or pooling design.
 collection_site_id^b ID that uniquely identifies the sample collection site among other sample collection sites in this BioProject. It must be unique at the level of the submitter’s data BioProject. Where possible, and with agreement from the facility, include the full name of the wastewater treatment plant. If anonymity is requested, create a masking ID to use for all samples collected at this site (e.g., AL-plant-1).
 project_name A concise name that describes the overall project or name of the coordinated sequencing effort from which the sequencing was organized.
 collection_volume^b The volume of the sample collected, in mL.
 concentration_method^b The method used to concentrate a target organism, nucleic acid, or organelle within a sample.
 extraction_method^b The protocol used to extract nucleic acids (DNA, RNA, or TNA) from a sample.
 extraction_control^b Organism (or nucleic acid) used in the extraction protocol to determine successful extraction.
 instantaneous_flow^b The rate of flow past the meter at a given moment in time, expressed in a standard unit such as million gallons per day (MGD) or liters per day (L/D). For this project, the time of the measurement should correspond to when the grab sample was taken, and the value should be reported in liters per day.
Sequence read archive
 enrichment_kit^b Method used to enrich the target pathogen(s).
 amplicon_PCR_primer_scheme Name and version of the primer scheme used to generate the amplicons for sequencing.
 library_preparation_kit Library preparation method used to convert a set of amplicons into a library ready for sequencing.
 quality_control_method Name of the method or pipeline used to evaluate sequence quality, often called "QC pipeline."
 quality_control_method_version Version number of the quality control pipeline or method used.
 quality_control_determination Result of the quality control assessment. Leave blank if pass/fail thresholds have not been established, or flag an issue if one is known.
 quality_control_issues If there’s a known or suspected quality control issue present in the sequence, choose from the available picklist to flag the issue, or create your own.
 quality_control_details Free text space to include additional description of the flagged quality control issue.
 dehosting_method The method used to remove host reads from the raw sequencing file.
 sequence_submitter_contact_email Email contact for the lab that sequenced the isolate.
 raw_sequence_data_processing_method The method used for raw data processing such as removing barcodes, adapter trimming, filtering, etc.
^a Contextual data attributes describing the wastewater site and local conditions, specimen replicate and pooling information, and laboratory methods employed through the nucleotide extraction process were added to NCBI’s BioSample template. Contextual data attributes describing the methods employed for sequencing SARS-CoV-2, sequence quality control assessment, and any automated data processing steps were added to the SRA metadata template. Where possible, we re-used existing NCBI attributes.

^b New custom attributes created specifically for this project.
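
The guidance for the instantaneous flow attribute asks for values in liters per day, while flow meters may report million gallons per day (MGD). A minimal sketch of that conversion follows. It assumes MGD refers to US liquid gallons (exactly 3.785411784 L each); the function name `mgd_to_liters_per_day` is hypothetical and not part of any NCBI or GenomeTrakr tooling.

```python
# Assumption: "gallon" means the US liquid gallon, defined as exactly
# 3.785411784 liters. Confirm the meter's unit before converting.
LITERS_PER_US_GALLON = 3.785411784


def mgd_to_liters_per_day(flow_mgd: float) -> float:
    """Convert a flow in million US gallons per day (MGD) to liters per day."""
    if flow_mgd < 0:
        raise ValueError("flow must be non-negative")
    return flow_mgd * 1_000_000 * LITERS_PER_US_GALLON


if __name__ == "__main__":
    # Illustrative reading only; 2.5 MGD is not a value from the source.
    liters_per_day = mgd_to_liters_per_day(2.5)
    print(f"{liters_per_day:,.0f} L/day")  # prints "9,463,529 L/day"
```

Keeping the conversion factor as a named constant makes the US-gallon assumption explicit and easy to change if a facility reports in a different unit.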