2024 May 31;9(6):e01415-23. doi: 10.1128/msystems.01415-23

TABLE 1.

Additional metadata attributes created for this project^a

Additional metadata attributes	Definition and guidance for GenomeTrakr laboratories
BioSample
collection_time^b	For grab samples: the time of day at which the sample was collected, in your local time zone and in 12-hour format (1–12 AM or 1–12 PM).
specimen_processing	Replicate and/or pooling information, critical for interpreting results.
specimen_processing_id	Identifier used to track replicates and/or pooled samples.
specimen_processing_details	Description of the experimental design, including any technical or biological replicates and/or the pooling scheme.
collection_site_id^b	ID that uniquely identifies the sample collection site among other sample collection sites in this BioProject. It must be unique at the level of the submitter’s data BioProject. Where possible, and with agreement from the facility, include the full name of the wastewater treatment plant. If anonymity is requested, create a masking ID to use for all samples collected at this site (e.g., AL-plant-1).
project_name	A concise name for the overall project or for the coordinated sequencing effort under which the sequencing was organized.
collection_volume^b	The volume of the sample collected, in mL.
concentration_method^b	The method used to concentrate a target organism, nucleic acid, or organelle within a sample.
extraction_method^b	The protocol used to extract nucleic acids (DNA, RNA, or TNA) from a sample.
extraction_control^b	Organism (or nucleic acid) used in the extraction protocol to determine successful extraction.
instantaneous_flow^b	The rate of flow past the meter at a given moment, converted to a standard unit such as million gallons per day (MGD) or liters per day (L/day). For this project, the measurement should be taken at the time the grab sample was collected and reported in liters per day.
Sequence Read Archive
enrichment_kit^b	Method used to enrich the target pathogen(s).
amplicon_PCR_primer_scheme	Name and version of the primer scheme used to generate the amplicons for sequencing.
library_preparation_kit	Library preparation method used to convert a set of amplicons into a library ready for sequencing.
quality_control_method	Name of the method or pipeline used to evaluate sequence quality, often called "QC pipeline."
quality_control_method_version	Version number of the quality control pipeline or method used.
quality_control_determination	Result of the quality control assessment. Leave blank if pass/fail thresholds have not been established, or flag an issue if one is known.
quality_control_issues	If a quality control issue is known or suspected in the sequence, select it from the available picklist or enter your own.
quality_control_details	Free text space to include additional description of the flagged quality control issue.
dehosting_method	The method used to remove host reads from the raw sequencing file.
sequence_submitter_contact_email	Email contact for the lab that sequenced the isolate.
raw_sequence_data_processing_method	The method used for raw data processing, such as barcode removal, adapter trimming, and filtering.

^a

Contextual data attributes describing the wastewater site and local conditions, specimen replicate and pooling information, and the laboratory methods used up to and including nucleic acid extraction were added to NCBI’s BioSample template. Contextual data attributes describing the methods used to sequence SARS-CoV-2, the sequence quality control assessment, and any automated data processing steps were added to the SRA metadata template. Where possible, we reused existing NCBI attributes.

^b

New custom attributes created specifically for this project.
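As a minimal sketch of how a submitting laboratory might prepare a few of these values before upload, the Python below converts an instantaneous_flow reading from MGD to liters per day (one US gallon is exactly 3.785411784 L, so 1 MGD is about 3,785,411.8 L/day), renders a grab-sample collection_time in 12-hour form, and lists which of the custom BioSample attributes marked ^b are still blank in a record. The function names, the "H:MM AM/PM" layout, and the example record values are hypothetical illustrations, not NCBI's validation logic or the project's own tooling.

```python
from datetime import time

# One US gallon is defined as exactly 3.785411784 liters.
LITERS_PER_US_GALLON = 3.785411784

# Custom BioSample attributes marked ^b in Table 1.
CUSTOM_BIOSAMPLE_ATTRIBUTES = (
    "collection_time",
    "collection_site_id",
    "collection_volume",
    "concentration_method",
    "extraction_method",
    "extraction_control",
    "instantaneous_flow",
)


def mgd_to_liters_per_day(mgd: float) -> float:
    """Convert a flow rate in million US gallons per day (MGD) to liters per day."""
    return mgd * 1_000_000 * LITERS_PER_US_GALLON


def format_collection_time(t: time) -> str:
    """Render a local clock time in 12-hour form, e.g. time(14, 5) -> '2:05 PM'.

    The 'H:MM AM/PM' layout is an assumption; the table only asks for a
    12-hour time in the collector's local time zone.
    """
    hour12 = t.hour % 12 or 12  # hour 0 -> 12 AM, hour 12 -> 12 PM, hour 13 -> 1 PM
    suffix = "AM" if t.hour < 12 else "PM"
    return f"{hour12}:{t.minute:02d} {suffix}"


def missing_custom_attributes(record: dict) -> list:
    """List the custom BioSample attributes that are absent or blank in a record."""
    return [
        name
        for name in CUSTOM_BIOSAMPLE_ATTRIBUTES
        if str(record.get(name, "")).strip() == ""
    ]


# Hypothetical record for one grab sample; the site ID follows the masking
# example given for collection_site_id.
record = {
    "collection_site_id": "AL-plant-1",
    "collection_time": format_collection_time(time(9, 30)),  # "9:30 AM"
    "collection_volume": 500,  # mL
    "instantaneous_flow": round(mgd_to_liters_per_day(2.5)),  # L/day
}

print(record["instantaneous_flow"])  # 9463529
print(missing_custom_attributes(record))
# ['concentration_method', 'extraction_method', 'extraction_control']
```

Storing flow in a single unit (liters per day) at record-creation time means downstream users never need to know which unit a given treatment plant's meter reported.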