mSystems. 2024 May 31;9(6):e01415-23. doi: 10.1128/msystems.01415-23

SARS-CoV-2 wastewater variant surveillance: pandemic response leveraging FDA’s GenomeTrakr network

Ruth E Timme 1, Jacquelina Woods 2, Jessica L Jones 2, Kevin R Calci 2, Rachel Rodriguez 2, Candace Barnes 2, Elizabeth Leard 2, Mark Craven 3, Haifeng Chen 3, Cameron Boerner 3, Christopher Grim 1, Amanda M Windsor 1, Padmini Ramachandran 1, Tim Muruvanda 4, Hugh Rand 4, Bereket Tesfaldet 4, Jasmine Amirzadegan 5, Tunc Kayikcioglu 6, Tamara Walsky 5, Marc Allard 1, Maria Balkey 1, C Hope Bias 5, Eric Brown 1, Kathryn Judy 1, Tina Pfefer 1, Sandra M Tallent 1, Maria Hoffmann 1; The GenomeTrakr Laboratory consortium, James Pettengill 4
Editor: Christopher W Marshall 7
PMCID: PMC11326115  PMID: 38819130

ABSTRACT

Wastewater surveillance has emerged as a crucial public health tool for population-level pathogen surveillance. Supported by funding from the American Rescue Plan Act of 2021, the FDA’s genomic epidemiology program, GenomeTrakr, was leveraged to sequence SARS-CoV-2 from wastewater sites across the United States. This initiative required the evaluation, optimization, development, and publication of new methods and analytical tools spanning sample collection through variant analyses. Version-controlled protocols for each step of the process were developed and published on protocols.io. A custom data analysis tool and a publicly accessible dashboard were built to facilitate real-time visualization of the collected data, focusing on the relative abundance of SARS-CoV-2 variants and sub-lineages across different samples and sites throughout the project. From September 2021 through June 2023, a total of 3,389 wastewater samples were collected, with 2,517 undergoing sequencing and submission to NCBI under the umbrella BioProject, PRJNA757291. Sequence data were released with explicit quality control (QC) tags on all sequence records, communicating our confidence in the quality of data. Variant analysis revealed wide circulation of Delta in the fall of 2021 and captured the sweep of Omicron and subsequent diversification of this lineage through the end of the sampling period. This project successfully achieved two important goals for the FDA’s GenomeTrakr program: first, contributing timely genomic data for the SARS-CoV-2 pandemic response, and second, establishing both capacity and best practices for culture-independent, population-level environmental surveillance for other pathogens of interest to the FDA.

IMPORTANCE

This paper serves two primary objectives. First, it summarizes the genomic and contextual data collected during a COVID-19 pandemic response project, which utilized the FDA’s laboratory network, traditionally employed for sequencing foodborne pathogens, for sequencing SARS-CoV-2 from wastewater samples. Second, it outlines best practices for gathering and organizing population-level next-generation sequencing (NGS) data collected for culture-free surveillance of pathogens sourced from environmental samples.

KEYWORDS: SARS-CoV-2, wastewater surveillance, data structures, FAIR data, data standards, pathogen genomic surveillance, wastewater based epidemiology, covid-19, GenomeTrakr

INTRODUCTION

All viruses, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), evolve over time, accumulating random mutations within their genomes that result in new variants and lineages. Although tracking the early spread of SARS-CoV-2 was primarily done through PCR tests, sequencing the entire genome facilitates the identification and tracking of new mutations and lineages. This is especially important when those mutations alter clinical characteristics, for example, by allowing a variant to replicate faster than others, cause different symptoms or severity of disease, or elude vaccines or therapeutic treatments. In early 2021, the first “variants of concern” started to emerge from SARS-CoV-2 (1), for example, Alpha (B.1.1.7), Beta (B.1.351), and Gamma (P.1). Suddenly, merely testing for the presence of the virus was not sufficient to track the pandemic. The full-genome sequence became necessary to identify new mutations, emerging variants, and sub-lineages.

The U.S. Food and Drug Administration’s (FDA) GenomeTrakr Program (2), a pathogen genomic surveillance network led by the FDA Center for Food Safety and Applied Nutrition (CFSAN), has been collaborating with other U.S. government and state public health agencies (3) to use whole-genome sequence data to ensure food safety and assist with epidemiological investigations of foodborne pathogens since 2012. This laboratory network comprises 31 federal and state public health laboratories, each equipped with the instrumentation and trained personnel required for pathogen sequencing and data submission to the NIH’s National Center for Biotechnology Information (NCBI). By design, the network is focused on sequencing pathogens from food samples, food facilities, the farm environment, and adjacent waterways. Resulting genomic data informs regulatory decisions around foodborne disease outbreaks or food production environments. A dedicated funding model supports these activities, which include submitting raw sequence data along with a minimum set of contextual data to the publicly accessible NCBI database in real time (4). This model, while good general practice for a publicly funded pathogen surveillance network, is also an ideal model for the rapid sharing of pathogen genome sequence data during a global pandemic.

Although SARS-CoV-2 does not cause foodborne illness, several factors led to GenomeTrakr being tapped to leverage its laboratory network for sequencing SARS-CoV-2 genomes and to assist efforts of the U.S. government to better monitor the spread of new SARS-CoV-2 variants and mutations. Funding for this work came from the American Rescue Plan Act of 2021, which included public health funding for pandemic response. Wastewater was chosen as a surveillance tool for multiple reasons. It is optimal for acquiring timely population-level sequence data because it contains the full suite of circulating and emerging mutations, which are valuable for independent validation and verification of FDA-approved therapeutics, diagnostics, and vaccines. New lineages of SARS-CoV-2 can be identified in wastewater samples up to a week prior to being detected in healthcare-seeking individuals from the same population (5–7). Routine wastewater samples also provide a relatively unbiased capture of genomic variation from the entire sewage catchment area, whereas clinical samples from a given population provide limited information on circulating variants. These samples may also reveal cryptic lineages not seen in the clinical sequence database (8). Furthermore, site locations can be targeted for monitoring specific types of populations (e.g., food production and agriculture workers). Choosing wastewater sites that captured circulating SARS-CoV-2 among these populations would meet these goals for the FDA and complement efforts by the Centers for Disease Control and Prevention (CDC) and regional partners, which were initially focused on urban sewersheds.

Our goal here is to provide an overview of FDA’s efforts to perform timely wastewater surveillance for SARS-CoV-2 by leveraging the existing GenomeTrakr laboratory network, as well as provide some lessons learned in this endeavor. We also give an overview of the sequence data collected throughout this project, identify which laboratory methods yielded high quality data, and describe our best practices for implementing these methods within a public health setting.

MATERIALS AND METHODS

Five major steps were necessary to build capacity for timely SARS-CoV-2 wastewater surveillance by the GenomeTrakr sequencing laboratories: (i) fund GenomeTrakr laboratories recruited for this project; (ii) test, optimize, develop, and publish new laboratory methods for sequencing population-level SARS-CoV-2 from wastewater samples; (iii) develop and publish data analysis methods that assessed the sequence quality of raw data and predicted proportions of SARS-CoV-2 variants within each sample; (iv) develop and publish protocols for timely data submission to NCBI; and (v) create a public dashboard to visualize variant data from those routine data submissions across the network, providing timely data release and data analysis for public health applications.

Laboratory funding and site selection

GenomeTrakr laboratories are supported by the FDA Laboratory Flexible Funding Model (9). In the spring of 2021, with additional funding provided by the American Rescue Plan Act of 2021, GenomeTrakr laboratories were invited to apply to participate in this special pandemic response wastewater project. Participating labs were required to select a minimum of two regional wastewater sites for routine sample collection, which involved sampling 1–2 times per week over a period of at least 6 months. The regional wastewater sites were chosen in an attempt to capture areas within each respective state that had higher populations of food and agriculture workers, assisted by county-level maps generated within FDA’s 21 FORWARD (10) data platform. The SARS-CoV-2 RNA in each sample would be sequenced, following RT-qPCR detection, and labs would submit both their sequencing data and a suite of rich contextual data to the NCBI as soon as possible.

Laboratory method development

In 2021, methods for enriching and sequencing SARS-CoV-2 from wastewater samples were in the early stages of being developed, with most laboratories focused on adapting targeted amplification panels used for clinical sequencing (11–13) to wastewater samples and a few also exploring oligo-capture approaches (14). A comprehensive set of standardized procedures for the entire wastewater processing workflow was needed. This workflow included sample collection, detection and quantification of SARS-CoV-2, SARS-CoV-2 sequencing, analysis, data submission, and visualization. Existing methods covering this workflow were tested, optimized, and published within the GenomeTrakr workspace on protocols.io (15). This platform facilitated real-time communication with version control to our laboratories and to the broader community. In total, 16 new protocols were drafted and published including wastewater sample collection (16), concentration and nucleic acid extraction (17–20), SARS-CoV-2 detection by RT-qPCR (21–23), and SARS-CoV-2-targeted amplification and sequencing (24–27). As long as the participating laboratories adhered to the tiled amplicon + short read sequencing approach established for this project, they had the option of adopting our methods and following our protocols or using different methods of their choice.

Quality control

At the start of this project in 2021, quality control (QC) checkpoints in the laboratory workflow as well as final QC thresholds for sequence data of SARS-CoV-2 from a mixed population sample had not yet been defined. One project objective was to identify those crucial QC checkpoints within the laboratory workflow and define thresholds for pass/fail at each of these steps that would yield data of sufficient quality to calculate relative abundance of circulating variants within a given sample.

NCBI data structure

Raw sequence data plus an extensive suite of contextual data describing the wastewater catchment area, site location information, and the methods for sampling, nucleic acid extraction, and sequencing the target pathogen all needed to be structured and standardized so that data could be compared within our study and, most importantly, among studies. To ensure our data were findable, accessible, interoperable, and reusable (FAIR) (28), we defined a standard data structure, or “data object model” (DOM), for pathogen-targeted sequence data from environmental sources. To accomplish this, we modified an existing DOM widely used for genomic pathogen surveillance (4, 29). This environmental pathogen DOM is a standard data structure that provides interoperability across public and private data repositories for population-level pathogen sequence data collected from environmental sources (wastewater, water, soil, air, etc.) (Fig. 1). This data structure includes a BioProject describing the scope of study (for our study, one BioProject per lab). Linked to the BioProject is a set of BioSamples defined at the nucleic acid extraction level. These BioSample records include a wide variety of sample attributes, including the geographic location where the water was collected, specific site information, and the sampling, concentration, and nucleic acid extraction methods. Lastly, raw sequence data along with contextual data describing the experimental sequencing methods, filtering, and QC assessment are linked to the BioSample records.

Fig 1.

NCBI data structure for population-level pathogen surveillance, or environmental pathogen data object model (DOM). This Env pathogen DOM has sample and sequence contextual data required for analyzing wastewater sequence data with a single target pathogen, SARS-CoV-2. The flag on the BioProject represents the automated human-read scrubbing by NCBI for all data submissions linked to this project.
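The three-level structure described above (BioProject, BioSample, sequence record) can be sketched as nested records. This is a minimal illustration only, not NCBI's schema or any tool used in the project: the class names, the `unsequenced_samples` helper, and the accession strings are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SraRun:
    """Raw sequence record plus sequencing and QC contextual data."""
    accession: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class BioSample:
    """One nucleic acid extraction, with site and method attributes."""
    accession: str
    attributes: Dict[str, str] = field(default_factory=dict)
    runs: List[SraRun] = field(default_factory=list)


@dataclass
class BioProject:
    """Scope of study (one BioProject per laboratory in this project)."""
    accession: str
    samples: List[BioSample] = field(default_factory=list)

    def unsequenced_samples(self) -> List[BioSample]:
        """Return BioSamples that have no linked sequence data."""
        return [s for s in self.samples if not s.runs]


# Placeholder accessions and values, not real NCBI records.
project = BioProject("PRJNA_EXAMPLE")
sequenced = BioSample("SAMN_EXAMPLE_1", {"collection_site_id": "AL-plant-1"})
sequenced.runs.append(SraRun("SRR_EXAMPLE_1", {"enrichment_kit": "example kit"}))
not_sequenced = BioSample("SAMN_EXAMPLE_2", {"collection_site_id": "AL-plant-1"})
project.samples.extend([sequenced, not_sequenced])

print([s.accession for s in project.unsequenced_samples()])
```

Keeping the sequence records nested under the sample makes it straightforward to list samples that were collected but never sequenced.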

Our data package needed to include several key pieces of contextual data not included in Version 1 of NCBI’s BioSample SARS-CoV-2 wastewater template (30) or in their generic Sequence Read Archive (SRA) metadata template. To fill this gap, we re-used fields from other packages where possible (Table 1), including (i) sample-level pooling and replicate information, (ii) sequence-level methods used for the targeted amplification of SARS-CoV-2 (31), and (iii) known QC information as determined by the submitter (32). New custom attributes were created where needed (Table 1) to capture sample collection information (collection_time, collection_volume, instantaneous_flow, and collection_site_id) and laboratory methods for sequencing (enrichment_kit). After the data structure was defined, we published an NCBI submission protocol adhering to this structure that included the custom BioSample and SRA metadata templates, capturing the full suite of contextual data needed for this project (33).

TABLE 1.

Additional metadata attributes created for this project [a]

| Additional metadata attribute | Definition and guidance for GenomeTrakr laboratories |
|---|---|
| BioSample | |
| collection_time [b] | For grab samples: the time of day the sample was collected in your time zone, 1–12 AM to 1–12 PM. |
| specimen_processing | Replicate and/or pooling information, critical for interpreting results |
| specimen_processing_id | Identifier used to track replicates and/or pooled samples |
| specimen_processing_details | Description of the experimental design, describing the technical or biological replicates and/or pooling design. |
| collection_site_id [b] | ID that uniquely identifies the sample collection site among other sample collection sites in this BioProject. It must be unique at the level of the submitter’s data BioProject. Where possible, and with agreement from the facility, include the full name of the wastewater treatment plant. If anonymity is requested, create a masking ID to use for all samples collected at this site (e.g., AL-plant-1). |
| project_name | A concise name that describes the overall project or name of the coordinated sequencing effort from which the sequencing was organized. |
| collection_volume [b] | The volume of the sample collected, in mL |
| concentration_method [b] | The method used to concentrate a target organism, nucleic acid, or organelle within a sample. |
| extraction_method [b] | The protocol used to extract nucleic acids (DNA, RNA, or TNA) from a sample. |
| extraction_control [b] | Organism (or nucleic acid) used in the extraction protocol to determine successful extraction. |
| instantaneous_flow [b] | The rate of flow past the meter at a given moment in time, converted into a standard MGD or L/D. For our project, the time of this measurement should correspond to when the grab sample was taken, and should be reported in units of liters per day. |
| Sequence Read Archive | |
| enrichment_kit [b] | Method used to enrich the target pathogen(s). |
| amplicon_PCR_primer_scheme | Name and version of the primer scheme used to generate the amplicons for sequencing. |
| library_preparation_kit | Library preparation method used to convert a set of amplicons into a library ready for sequencing. |
| quality_control_method | Name of the method or pipeline used to evaluate sequence quality, often called "QC pipeline." |
| quality_control_method_version | Version number of the quality control pipeline or method used. |
| quality_control_determination | Result of the quality control assessment. Leave blank if pass/fail thresholds have not been established or choose to flag an issue if known. |
| quality_control_issues | If there’s a known or suspected quality control issue present in the sequence, choose from the available picklist to flag the issue, or create your own. |
| quality_control_details | Free text space to include additional description of the flagged quality control issue. |
| dehosting_method | The method used to remove host reads from the raw sequencing file. |
| sequence_submitter_contact_email | Email contact for the lab that sequenced the isolate. |
| raw_sequence_data_processing_method | The method used for raw data processing such as removing barcodes, adapter trimming, filtering, etc. |

[a] Contextual data attributes describing the wastewater site and local conditions, specimen replicate and pooling information, and laboratory methods employed through the nucleotide extraction process were added to NCBI’s BioSample template. Contextual data attributes describing the methods employed for sequencing SARS-CoV-2, sequence quality control assessment, and any automated data processing steps were added to the SRA metadata template. Where possible, we re-used existing NCBI attributes.

[b] New custom attributes created specifically for this project.
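A submission template like this lends itself to a simple pre-submission check. The sketch below is illustrative and is not part of the published NCBI submission protocol: the attribute names come from Table 1, but the choice of which attributes to treat as mandatory, and the helper names `missing_attributes` and `mgd_to_liters_per_day`, are assumptions made for this example. The flow conversion assumes US gallons (1 gallon = 3.785411784 L).

```python
# Attribute names are from Table 1; treating these four as mandatory
# is an assumption made for this sketch.
REQUIRED_BIOSAMPLE_ATTRIBUTES = (
    "collection_site_id",
    "collection_volume",
    "concentration_method",
    "extraction_method",
)

LITERS_PER_US_GALLON = 3.785411784


def mgd_to_liters_per_day(mgd: float) -> float:
    """Convert million US gallons per day (MGD) to liters per day."""
    return mgd * 1_000_000 * LITERS_PER_US_GALLON


def missing_attributes(record: dict) -> list:
    """List required attributes that are absent or blank in a record."""
    return [
        name
        for name in REQUIRED_BIOSAMPLE_ATTRIBUTES
        if not str(record.get(name, "")).strip()
    ]


# Hypothetical record; the values are placeholders, not project data.
record = {
    "collection_site_id": "AL-plant-1",
    "collection_volume": "800",
    "concentration_method": "",
    "instantaneous_flow": str(mgd_to_liters_per_day(2.5)),
}
print(missing_attributes(record))
```

Here the blank `concentration_method` and the absent `extraction_method` are both reported, so the record can be corrected before submission.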

Data flow and visualization

To effectively communicate and visualize the evolving landscape of SARS-CoV-2 variants detected in wastewater sites throughout the course of our project, we constructed an interactive dashboard in Tableau Desktop (Tableau Software LLC, Seattle, WA). Tableau offered a user-friendly interface, a broad set of dashboard design and development features, and could easily integrate multiple data sources, such as cloud queries of NCBI tables and output files from variant analysis pipelines.

The public dashboard needed to integrate several sources of data to present an informative snapshot of the project’s progress (Fig. 2). NCBI Entrez queries summarized BioSample records without sequence data. Amazon Web Services (AWS) Athena queries of the SRA metadata table summarized metadata attached to raw sequence (SRA) and BioSample records. New sequence submissions under the BioProject PRJNA757291 were downloaded daily and analyzed with CFSAN’s Wastewater Analysis Pipeline (C-WAP) (34), both to compute QC metrics and to infer relative abundances of SARS-CoV-2 lineages in each sample using the Freyja method (8). C-WAP has since been repackaged as Aquascope (https://github.com/CDCgov/aquascope), but the underlying algorithm remains the same: it reports QC metrics and uses Freyja to infer relative lineage abundances (35). A static list of BioProjects, laboratory names, and wastewater sites served to organize the records recovered through NCBI and aid with the final dashboard visualizations.

Fig 2.

Data sources for the public dashboard summarizing wastewater surveillance for SARS-CoV-2 variants. The compilation of information for the public dashboard involved two distinct NCBI queries and a sequence analysis pipeline. Daily queries were executed to capture new submissions, and the newly obtained summary data were incorporated into the public dashboard guided by information in the static file. Raw data for the dashboard are available for download here: https://github.com/CFSAN-Biostatistics/WW-SC2-variant-estimations.

An important goal of the dashboard was to identify key aspects of the project that would be important to public health, such as geographic regions, stakeholders, temporality, sampling and sequencing progress, and variant calling. A map was used to display geography, stakeholders, and sampling and sequencing progress aspects to help communicate the scale of the project and the number of participating labs. A bar graph was used to show which SARS-CoV-2 variants were detected week to week, along with their relative abundances. Finally, a Gantt chart displayed the progress of participating labs in sampling and sequencing their samples over time.
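The week-to-week bar graph implies grouping per-sample lineage abundances by collection week. The sketch below shows one way this could be done and is not the dashboard's actual Tableau logic: the function name `weekly_mean_abundance`, the use of ISO weeks, and the choice to take an unweighted mean across samples are all assumptions for illustration, and the input values are made up.

```python
from collections import defaultdict
from datetime import date


def weekly_mean_abundance(samples):
    """Average lineage abundances per ISO week.

    samples: iterable of (collection_date, {lineage: relative_abundance}).
    Returns {(iso_year, iso_week): {lineage: mean_abundance}}.
    A lineage missing from a sample counts as zero for that sample.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for collected, abundances in samples:
        iso = collected.isocalendar()
        week = (iso[0], iso[1])
        counts[week] += 1
        for lineage, value in abundances.items():
            totals[week][lineage] += value
    return {
        week: {lin: total / counts[week] for lin, total in lineages.items()}
        for week, lineages in totals.items()
    }


# Made-up abundances for three hypothetical samples.
samples = [
    (date(2021, 12, 13), {"Delta": 0.75, "Omicron": 0.25}),
    (date(2021, 12, 16), {"Delta": 0.5, "Omicron": 0.5}),
    (date(2021, 12, 20), {"Omicron": 1.0}),
]
weekly = weekly_mean_abundance(samples)
for week in sorted(weekly):
    print(week, weekly[week])
```

An unweighted mean treats every sample equally; weighting by catchment population or sequencing depth would be a different, equally defensible choice.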

Users were encouraged to explore the data visualizations by using filters to select which details they most wanted to see. Users could filter the dashboard by state, laboratory, and wastewater collection site. A quality control filter was added to the dashboard on the public-facing webpage for users to filter data by % genome uncovered.

Protocol pilot exercise

As this project entailed building methods to support expanded wastewater surveillance for state public health laboratories, it was important to establish consistency in analyses performed across participating laboratories. At the start of this project, FDA distributed a set of raw wastewater samples to each funded laboratory. These samples served two purposes: (i) they provided an early, standardized set of samples laboratories could use to test new methods and (ii) sequence data collected from each laboratory helped FDA identify which methods met the quality control requirements for this project. FDA collected four large volumes of wastewater (Table 2), comprising two samples taken about a month apart, each with two pseudo-replicates (grab samples taken back-to-back from the same location at the wastewater treatment plant [WWTP]). Each large-volume sample was then aliquoted into 800 mL samples. The October 2021 samples (WPP-sample_SA-1.01, WPP-sample_SA-2.01) were then spiked with 10^6 copies of wild-type SARS-CoV-2 reference RNA (ATCC Heat Inactivated 2019 Novel Coronavirus strain nCoV/USA/WA-1/2020 Part #VR-1986HK). All samples were frozen at −80°C and then shipped to the laboratories on dry ice (four 800 mL samples in each shipment). Each laboratory was asked to sequence the population of SARS-CoV-2 from each of these samples using methods of their choice and then to submit their resulting raw sequence and contextual data to NCBI.

TABLE 2.

Wastewater protocol pilot exercise samples

| BioSample “sample_name” | Collection date | WWTP location | Treatment |
|---|---|---|---|
| WPP-sample_B.01 | 20 September 2021 | Mobile, AL | Raw wastewater |
| WPP-sample_C.01 | 20 September 2021 | Mobile, AL | Raw wastewater |
| WPP-sample_SA-1.01 | 21 October 2021 | Pascagoula, MS | Raw wastewater spiked with wt SARS-CoV-2, 10^6 copies/800 mL |
| WPP-sample_SA-2.01 | 21 October 2021 | Pascagoula, MS | Raw wastewater spiked with wt SARS-CoV-2, 10^6 copies/800 mL |

RESULTS

Participating laboratories

Twenty GenomeTrakr laboratories plus the FDA-CFSAN laboratories received funding for this special project (Table 3). Each laboratory identified at least two wastewater sites (Table S1) for routine sampling (1–2 times a week) for a minimum of 6 months. In total, samples from 81 sites were included in this project. Where feasible, sites were in counties with a higher relative percentage of food and agriculture workers. Sites included both municipal wastewater treatment plants and direct wastewater lines from food processing facilities—spanning both urban and rural populations.

TABLE 3.

List of participating laboratories

Laboratory names
Arizona State Department of Health Services, T-Gen North
California Department of Public Health
Indiana State Department of Health
Kentucky State Cabinet for Health and Family Services
Massachusetts State Department of Public Health
Nevada State Public Health Laboratory, University of Nevada—Reno
New Jersey State Department of Agriculture
New Jersey Department of Health
New Mexico State University—Las Cruces
North Carolina State University—Raleigh
Ohio State Department of Agriculture
Pennsylvania State University—University Park
Rhode Island Department of Health, State Health Laboratory
South Carolina Department of Health and Environmental Control
South Dakota State University
Texas Department of State Health Services
Virginia Division of Consolidated Laboratory Services
Washington State Department of Agriculture
Washington State Department of Health
West Virginia Department of Agriculture
FDA-Center for Food Safety and Applied Nutrition
a

Twenty GenomeTrakr laboratories plus FDA-CFSAN were funded for this project.

Wastewater protocol pilot exercise

Ten laboratories participated in a pilot exercise to assess the different laboratory methods being utilized; nine labs provided sequence data from all four distributed samples (Table 2). Those nine laboratories successfully amplified and sequenced SARS-CoV-2 from the four WPP samples (Table S2) and submitted their resulting sequences to NCBI (BioProject, PRJNA767800). These submissions were obtained by a diverse array of methods: seven extraction methods, six concentration methods, four enrichment strategies, seven primer schemes, and seven library preparation methods. The remaining lab opted out of the sequencing portion of the exercise, as they had encountered issues with their ddPCR (droplet digital PCR) method.

The cumulative submissions from multiple laboratories totaled 17 data sets for the four samples. Although the limited number of replicates precludes drawing definitive conclusions about individual or combined methods, several overarching trends emerged that informed our subsequent decisions for real-time sampling. In particular, the “Percent reads aligned” metric confirmed the robust specificity of three different enrichment methods for the SARS-CoV-2 virus: QIAseq DIRECT SARS-CoV-2, NEBNext ARTIC SARS-CoV-2, and Illumina COVIDSeq. The “Percent SARS-CoV-2 genome covered” metric demonstrated the strong performance of most primer schemes assessed in this exercise. Furthermore, across all tested methods, the variant analyses were largely consistent across samples. Specifically, the sequences for “WPP-sample_B.01” and “WPP-sample_C.01” showed mostly Delta variants, while “WPP-sample_SA-1.01” and “WPP-sample_SA-2.01” revealed strong wild-type signal originating from the spiked-in synthetic virus (Fig. S1).

Quality control thresholds

We established initial QC thresholds for our sequence data after a thorough review of data collected from both the protocol pilot exercise and the first couple of months of sequencing efforts. Important considerations for setting these thresholds included determining the percentage of the SARS-CoV-2 genome coverage required to confidently identify population-level variants and sub-lineages, the necessary depth of coverage to capture most of the circulating lineages, the percentage of SARS-CoV-2 in the raw sequence data, and the identification and removal of human sequencing reads prior to public release. We described four QC bins that capture major categories of sequence quality (Table 4) and proposed QC thresholds for three metrics we identified as important for determining high-quality data: % SARS-CoV-2 reads, % SARS-CoV-2 genome uncovered, and average genome coverage depth. These thresholds, deemed appropriate based on early data collection, served as a preliminary benchmark. However, we acknowledge the need for a rigorous validation process to fine-tune these thresholds to suit specific applications, recognizing that different use cases (such as general population variant tracking vs the validation of a new diagnostic kit) may require different QC thresholds.

TABLE 4.

Quality control (QC) for SARS-CoV-2 sequencing data

| QC bin | QC bin description | % genome uncovered (<10×) | Average coverage | Other observations | Submit to NCBI | Tag for SRA attribute: “quality_control_determination” | Included in FDA dashboard |
|---|---|---|---|---|---|---|---|
| A | No QC issues evident | <5% | >1,000× | >50% reads are SARS-CoV-2 | Yes | No quality control issues identified | Yes |
| B | Some QC issues, but variant calling likely OK | 6%–40% | 100×–1,000× | | Yes | Minor quality control issues identified | Yes |
| C | Insufficient data for confidence in variant calling | 40%–95% | 10×–100× | Low fraction of lineage-specific mutations (C-WAP reports) | Yes | Sequence flagged for potential quality control issues | No |
| F | Significant QC and/or study design issues | >95% | <10× | <5% reads SARS-CoV-2, suspected contamination (SNR low), low sequence quality, etc. | No | Sequence flagged for significant quality control issues | No |
a

Four QC categories, or bins, were established based on thresholds set for various QC metrics. Sequence tags were developed to communicate these categories directly on the sequence file in an attribute called “quality_control_determination.” Only data with QC tagged in the A or B bin was visualized on the public FDA variant analysis dashboard.
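The binning in Table 4 can be expressed as a small decision function. The following is a sketch under stated assumptions rather than the logic implemented in C-WAP or SSQuAWK4: Table 4 does not say how to resolve metrics that fall in different bins, so this example assigns the worst bin among the metrics, treats the range boundaries as contiguous, and caps a sample at bin B when 50% or fewer of its reads are SARS-CoV-2. The function and variable names are hypothetical.

```python
BIN_ORDER = "ABCF"  # best to worst

# Tag text and dashboard inclusion are taken from Table 4.
QC_TAGS = {
    "A": "No quality control issues identified",
    "B": "Minor quality control issues identified",
    "C": "Sequence flagged for potential quality control issues",
    "F": "Sequence flagged for significant quality control issues",
}
DASHBOARD_BINS = {"A", "B"}


def bin_from_uncovered(pct_uncovered: float) -> str:
    """Bin by % of genome with <10x depth (boundaries assumed contiguous)."""
    if pct_uncovered < 5:
        return "A"
    if pct_uncovered <= 40:
        return "B"
    if pct_uncovered <= 95:
        return "C"
    return "F"


def bin_from_depth(avg_depth: float) -> str:
    """Bin by average coverage depth."""
    if avg_depth > 1000:
        return "A"
    if avg_depth >= 100:
        return "B"
    if avg_depth >= 10:
        return "C"
    return "F"


def assign_qc_bin(pct_uncovered, avg_depth, pct_sc2_reads=None) -> str:
    """Assumption: the overall bin is the worst bin among the metrics."""
    bins = [bin_from_uncovered(pct_uncovered), bin_from_depth(avg_depth)]
    if pct_sc2_reads is not None:
        if pct_sc2_reads < 5:
            bins.append("F")
        elif pct_sc2_reads <= 50:
            bins.append("B")  # assumed cap; Table 4 lists >50% only for bin A
    return max(bins, key=BIN_ORDER.index)


qc_bin = assign_qc_bin(pct_uncovered=20.0, avg_depth=1500.0, pct_sc2_reads=80.0)
print(qc_bin, QC_TAGS[qc_bin], qc_bin in DASHBOARD_BINS)
```

Taking the worst bin is the conservative choice; a laboratory validating thresholds for a specific application might weight the metrics differently.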

Once we had a QC target for sequence data, we identified three critical QC checkpoints in the laboratory workflow (Fig. 3). For QC check #1, samples containing no detectable SARS-CoV-2 RNA were deemed to have failed QC and were not processed further. However, samples that contained any level of target RNA, even at very low levels, were considered “passing” and sent on to the cDNA synthesis step for amplification. For QC check #2, to determine whether investing time in library preparation and sequencing was justifiable in terms of cost and effort, a thorough quality assessment of the PCR product (targeted enrichment) was conducted using the Qubit HS kit and a fragment size analyzer, such as the Agilent TapeStation or Bioanalyzer (24, 26). Samples passing this QC step were selected for sequencing. QC check #3, the final major QC check, involved a rigorous evaluation of the raw sequencing data. For this purpose, we developed SSQuAWK4 (36) to automate the QC process for our laboratories; this tool was then made publicly accessible through a custom Galaxy instance, GalaxyTrakr (37). We also used a thorough QC evaluation and variant calling pipeline, CFSAN Wastewater Analysis Pipeline (C-WAP), via command line interface (34). The reports from both tools included key summary metrics, such as percentage of total reads aligned to the SARS-CoV-2 reference genome, average depth of coverage, and percentage of the SARS-CoV-2 genome uncovered (<10×).

Fig 3.

Sample collection and sequencing workflow. SARS-CoV-2 wastewater surveillance sample analysis process and critical quality control checkpoints recommended for this project.
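The summary metrics reported at QC check #3 can be computed from per-position read depths and read counts. This is a minimal sketch of those definitions, not the SSQuAWK4 or C-WAP implementation: the function names are hypothetical, and the input is assumed to be a plain list of depths, one per reference position, as could be parsed from a depth report.

```python
def coverage_metrics(depths, min_depth=10):
    """Summarize per-position depths over the reference genome.

    depths: one integer depth per reference position.
    Positions with depth below min_depth count as uncovered (<10x by default).
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must contain at least one position")
    uncovered = sum(1 for d in depths if d < min_depth)
    return {
        "pct_genome_uncovered": 100.0 * uncovered / n,
        "average_depth": sum(depths) / n,
    }


def pct_reads_aligned(aligned_reads: int, total_reads: int) -> float:
    """Percentage of all reads that aligned to the reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * aligned_reads / total_reads


# Toy four-position "genome" with made-up depths.
metrics = coverage_metrics([0, 5, 10, 20])
print(metrics)
print(pct_reads_aligned(aligned_reads=900, total_reads=1000))
```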

Based on that QC assessment, each sequence was assigned a QC bin (A, B, C, or F). While we performed QC assessments from the very beginning of data collection, there was some uncertainty about what would qualify as a “high” or “low” quality sequencing run. Therefore, we submitted almost all sequence data to SRA, along with metadata and QC evaluations (32). Then, to guide future laboratory practices, we devised a decision matrix (Table 4), which became instrumental for downstream selection of the samples that merited being featured on the public-facing dashboards.

Summary of data collected

Routine, systematic wastewater sample collection for this project was initiated in September 2021 and ended by June 2023, with contributing laboratories submitting sequences in staggered 6-month time periods (Fig. S2). When detectable levels of SARS-CoV-2 were present, as determined by quantification of SARS-CoV-2-specific RT-qPCR or ddPCR targets in a first-phase screening, targeted amplicon approaches were used to sequence the SARS-CoV-2 RNA in the sample. In total, 3,406 wastewater samples were collected, of which 2,517 were subjected to sequencing. The resulting raw sequence data and comprehensive set of standard contextual data were submitted to laboratory-specific NCBI BioProjects, nested under the umbrella BioProject PRJNA757291. Every sample collected and tested for this project has a BioSample entry, even the ones that were not sequenced, thereby providing a unique data set within NCBI that includes both positive and negative samples.

Standard terminology describing sample collection and sequencing methods was included as attributes on both the sample record (BioSample) and the experiment record (SRA submission). Sample processing methods varied across the project (Table 5). Both composite and grab samples were well represented, n = 2,023 (59.7%) and n = 1,366 (40.3%), respectively. Most labs collected raw wastewater, n = 3,305 (97.5%), with a few primary effluent and post-grit removal samples also included. Among the concentration methods used, 90% of samples were concentrated using one of five methods: Ceres Nanotrap (n = 1,212), Innovaprep ultrafiltration (n = 604), Promega large volume TNA capture kit (n = 455), PEG (polyethylene glycol) precipitation + ultracentrifugation (n = 420), and Centricon 100k (n = 357). For nucleic acid extraction, the most commonly employed method was the Qiagen MagMAX Viral Kit (n = 939, 28%), followed by a variety of similar Promega extraction kits (n = 627, 19%).

TABLE 5.

Sample methods included on NCBI’s public BioSample records

Method # of biosamples # of labs % of total
Sample type
 Composite 2,023 18 59.7
 Grab 1,366 8 40.3
Sample matrix
 Raw wastewater 3,305 19 97.5
 Post-grit removal 79 1 2.3
 Primary effluent 4 1 0.1
 Missing 1 1 <0.1
Viral concentration method
 Ceres Nanotrap 1,212 8 35.8
 Innovaprep ultrafiltration 604 5 17.8
 Promega wastewater large volume TNA capture kit 455 5 13.4
 Peg precipitation + ultracentrifugation 420 4 12.4
 Centricon 100k 357 1 10.5
 Zymo water concentration buffer 155 1 4.6
 Skim milk flocculation 99 1 2.9
 Membrane filtration with acidification and MgCl2 64 2 1.9
 Innovaprep CP select 19 1 0.5
 Backflushed raw ww using rexseed filters; Promega wastewater large column TNA capture kit 4 1 0.1
Nucleotide extraction method
 Qiagen MagMAX Viral Kit 939 2 27.7
 Promega Extraction Kits 627 5 18.5
 Qiagen AllPrep Powerviral DNA/RNA kit 424 4 12.5
 Qiagen Rneasy Powerwater Kit 324 4 9.6
 Zymo quick-rna viral kit 345 2 10.2
 Zymo Environ Water RNA Kit (R2042) 262 1 7.7
 QIAamp Viral RNA mini kit 201 5 5.9
 Neb monarch total rna miniprep kit + zymo onestep pcr inhibitor removal kit 155 1 4.6
 Macherey-Nagel nucleomag DNA/RNA water kit 108 1 3.2
 Ceres Nanotrap 4 2 0.1
(a) Methods cover type of sample collection, wastewater sample matrix, viral concentration method, and nucleotide extraction method.

Sequencing methods were captured at the experiment level, attached to the raw sequence data (Table 6). Target enrichment methods encompassed broad categories of tiled amplicon approaches, with 87% of submissions choosing QIAseq DIRECT (n = 923, 37%), NEBNext ARTIC (n = 825, 33%), or the Illumina COVIDSeq Assay (n = 433, 17%). PCR primer schemes for these enrichment approaches evolved alongside the virus; in total, 10 different primer schemes were used across the project. Seven different library preparation kits were utilized to prepare the SARS-CoV-2 amplicons for sequencing, and eight different sequencing platforms were used to generate sequence data. Illumina instruments accounted for 90% of the sequences submitted (n = 2,257), followed by Oxford Nanopore Technologies (ONT) (n = 256, 10%) and, finally, a small number sequenced on a PacBio instrument (n = 4).

TABLE 6.

Sequencing methods included on NCBI’s public SRA records (a)

Method # of sequences # of labs % of total
Enrichment kit
 QIAseq DIRECT SARS-CoV-2 923 3 36.7
 NEBNext ARTIC SARS-CoV-2 RT-PCR Module 825 12 32.8
 Illumina COVIDSeq Assay 433 1 17.2
 Swift Normalase Amplicon SARS-COV-2 Panels 199 1 7.9
 Not applicable 137 4 5.4
Amplicon PCR primer scheme
 ARTIC V3 125 1 5.0
 ARTIC V4 308 1 12.2
 ARTIC V4.1 131 4 5.2
 NEB VarSkip 1a Long 18 1 0.7
 NEB VarSkip 1a Short 125 5 5.0
 NEB VarSkip 2a Short 307 8 12.2
 NEB VarSkip 2b Short 369 7 14.7
 QIAseq DIRECT SARS-CoV-2—Boosted 97 2 3.9
 QIAseq DIRECT SARS-CoV-2 primers 838 3 33.3
 SARS-CoV-2 SNAP primer pool 199 1 7.9
Library preparation kit
 Amplicon sequencing kit (PacBio) 4 1 0.2
 Illumina DNA Prep 513 8 20.4
 Ligation sequencing kit 286 3 11.4
 NEBNext ARTIC SARS-CoV-2 Library Prep Kit (Illumina) 514 5 20.4
 NEBNext Ultra II FS DNA Library Prep for Illumina 6 1 0.2
 QIAseq DIRECT Unique Dual Index Prep 995 4 39.5
 Swift Normalase Amplicon SARS-CoV-2 Panels 199 1 7.9
Sequencing instrument
 Illumina iSeq 100 55 2 2.2
 Illumina MiniSeq 440 4 17.5
 Illumina MiSeq 1,622 14 64.4
 Illumina NovaSeq 6000 83 1 3.3
 NextSeq 550 57 1 2.3
 GridION 8 1 0.3
 MinION 248 1 9.9
 Sequel II 4 1 0.2
(a) Methods cover enrichment kit (general approach for enriching the target pathogen), amplicon PCR primer scheme, library preparation kit, and sequencing instrument name.

Quality of sequence data

As could be expected from a multi-laboratory project using various field sampling approaches and performing simultaneous method development and data collection, the quality of sequences submitted for this project varied widely, from exceptional to very low quality, based on the predefined thresholds outlined in Table 4. Of the 2,255 Illumina short-read sequences, 1,381 (61%) were categorized as having “no quality control issues” (A bin), 219 (10%) as having “minor quality control issues” (B bin), 633 (28%) as having “potential quality control issues” (C bin), and 22 (<1%) as having “significant quality control issues” (F bin). For the average depth of SARS-CoV-2 genome coverage, we targeted 1,000× as an ideal, while considering 100× as the minimum. Coverage across our dataset ranged widely from less than 10× to over 135,000×, with a notable concentration of sequences below these thresholds flagged as low quality (F bin) (Fig. 4a). Percent of genome uncovered (i.e., percent of the SARS-CoV-2 genome not sequenced with at least 10× coverage) showed a similar pattern, with 72% of submissions meeting our threshold of no more than 40% uncovered (Fig. 4b). Sequences for which more than 40% of the genome had not been sequenced were predominantly tagged with C and F QC bins (Fig. 4b). Conversely, submissions under this threshold, for which more of the genome had been successfully sequenced, were mostly assigned an A or B bin, indicating that the coverage of the SARS-CoV-2 genome was suitable for variant analysis. Finally, for each submission, we computed the percentage of raw sequence reads that mapped to the SARS-CoV-2 genome. We saw a general trend of submissions with high percentages of SARS-CoV-2 reads being higher quality (Fig. 4c) and, conversely, submissions with lower percentages of SARS-CoV-2 reads having a lower QC assessment, although there was no obvious inflection point at 50%, which was our target. Many sequences were categorized as high quality (A bin) even though only a small fraction of their reads were identified as SARS-CoV-2.

Fig 4.

QC summary metrics for short-read Illumina data. Three panels summarize the quality of population-level SARS-CoV-2 sequence data collected and submitted for this project: (a) average depth of coverage across the SARS-CoV-2 genome (average coverage), (b) percent of the SARS-CoV-2 genome uncovered at <10×, and (c) percent of raw sequence reads that aligned to the SARS-CoV-2 genome. Quality control determinations made by the submitter (QC bins A, B, C, or F) are also summarized in each panel.
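The thresholds quoted in this section (1,000× ideal and 100× minimum average depth, no more than 40% of the genome uncovered, and a 50% target for reads aligned) can be checked against a sample's metrics with a short Python sketch. It deliberately does not assign the A, B, C, or F bins, because the decision matrix in Table 4 is not reproduced here; the function name, the returned labels, and the treatment of values exactly at a threshold are assumptions.

```python
from typing import Dict, Union

# Thresholds as stated in the text of this section.
IDEAL_AVERAGE_DEPTH = 1000.0      # targeted ideal average depth (x)
MINIMUM_AVERAGE_DEPTH = 100.0     # minimum acceptable average depth (x)
MAX_PERCENT_UNCOVERED = 40.0      # percent of genome below 10x coverage
TARGET_PERCENT_ALIGNED = 50.0     # percent of raw reads aligned to SARS-CoV-2


def check_thresholds(
    average_depth: float,
    percent_uncovered: float,
    percent_aligned: float,
) -> Dict[str, Union[str, bool]]:
    """Compare one submission's metrics with the stated thresholds.

    Hypothetical helper: it reports each metric separately and does not
    combine them into a QC bin.
    """
    if average_depth >= IDEAL_AVERAGE_DEPTH:
        depth_status = "ideal"
    elif average_depth >= MINIMUM_AVERAGE_DEPTH:
        depth_status = "acceptable"
    else:
        depth_status = "below_minimum"
    return {
        "depth": depth_status,
        # Assumption: exactly 40% uncovered still meets the threshold.
        "genome_covered_ok": percent_uncovered <= MAX_PERCENT_UNCOVERED,
        "aligned_target_met": percent_aligned >= TARGET_PERCENT_ALIGNED,
    }
```

Reporting the three checks separately mirrors the observation above that a submission can miss the 50% aligned-reads target and still be suitable for variant analysis.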

Variant analysis

To visualize the variants and sub-lineages in wastewater samples over time, we plotted their relative abundance against week of collection, from 12 September 2021 through 4 June 2023 (Fig. 5) (38). The dashboard was updated when new submissions appeared at NCBI, at most once per day, to keep the displayed data current. Analysts also continuously monitored public health news for mentions of new and clinically important variants that should be added to the dashboard’s legend. Sub-lineages that were not of public health importance or did not contribute more than 1% relative abundance within each sample were collectively categorized as “Others” for the purposes of the public dashboard.

Fig 5.

Relative abundances of variants and sub-lineages over time. Stacked bar chart showing the average variant and sub-lineage proportions for samples collected during each week. For the sake of visibility, only sub-lineages with a relative abundance of ≥5% for at least one week are displayed. The rest, regardless of their designated interest to the WHO or CDC, were treated as part of their parent lineage until a sub-lineage had sufficient relative abundance to meet the ≥5% threshold.
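The weekly averaging and the “Others” grouping described above can be sketched in Python as follows. This is an illustrative sketch, not the dashboard's code: the input format, the function names, and the choice of Sunday as the start of the week (12 September 2021 and 4 June 2023 are both Sundays) are assumptions, and the grouping rule is a literal reading of the 1% criterion in the text.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def week_start(day: date) -> date:
    """Return the Sunday that begins the week containing `day`."""
    # date.weekday() gives Monday=0 ... Sunday=6.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def weekly_average(
    samples: Iterable[Tuple[date, Dict[str, float]]]
) -> Dict[date, Dict[str, float]]:
    """Average lineage proportions over all samples collected in each week."""
    totals: Dict[date, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[date, int] = defaultdict(int)
    for collected, proportions in samples:
        week = week_start(collected)
        counts[week] += 1
        for lineage, proportion in proportions.items():
            totals[week][lineage] += proportion
    return {
        week: {lineage: total / counts[week] for lineage, total in lineages.items()}
        for week, lineages in totals.items()
    }


def collapse_to_others(
    proportions: Dict[str, float],
    important: Set[str],
    min_abundance: float = 0.01,
) -> Dict[str, float]:
    """Group lineages that are not flagged as important, or that do not
    exceed `min_abundance` in this sample, under the label "Others"."""
    kept: Dict[str, float] = {}
    others = 0.0
    for lineage, proportion in proportions.items():
        if lineage in important and proportion > min_abundance:
            kept[lineage] = proportion
        else:
            others += proportion
    if others > 0:
        kept["Others"] = others
    return kept
```

Averaging within a week divides by the number of samples collected that week, so a lineage absent from some samples contributes zero for those samples.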

Across the contributing laboratories, most of the sampling occurred in 2022 (Fig. S2), resulting in a few dashboard gaps in late 2022 and 2023 (Fig. 5). Samples collected from September 2021 through early December 2021 all belonged to Delta sub-lineages. Omicron BA.1 made its initial appearance during the week of 12 December 2021, swiftly replacing nearly all circulating Delta lineages within the subsequent month. Following this, Omicron BA.2 was identified in our samples in mid-March 2022, taking over from BA.1 by the end of April 2022. In early May, Omicron BA.4 emerged and circulated until October 2022 although it never reached dominance. In late April 2022, Omicron BA.5 was detected and became the predominant circulating sub-lineage until October 2022 when Omicron BQ lineages started appearing. The first widely circulating hybrid Omicron lineage, XBB, emerged in November 2022 and maintained dominance through June 2023.

Turnaround time

In line with our project’s primary objective of delivering timely pandemic sequence data for public health purposes, we evaluated the turnaround time (TAT) as the number of days from sample collection to NCBI data release for each participating laboratory. Our analysis revealed two distinct categories of laboratories based on their approach to sample processing (Fig. 6). The first category consisted of five laboratories, including FDA, that processed samples as they were collected, resulting in an average TAT range of approximately 15–30 days. The second category included 12 laboratories that initially collected samples but processed them at a later date due to various factors, including supply-chain delays for reagents and instruments, hesitancy within state public health laboratories to publicly release data that were collected using non-validated methods (e.g., Lab P), and staffing shortages due to pandemic response burden. Within this category, the average TAT varied widely, ranging from 60 to 410 days between sample collection and data submission.

Fig 6.

Turnaround time from sample collection to NCBI data release. Box and whisker plot showing number of days between sample collection date and NCBI release date for each participating laboratory.
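Turnaround time as defined above (days from sample collection to NCBI data release) is a simple date difference. The Python sketch below shows one way to compute it and average it per laboratory; the record layout and function names are assumptions, and the laboratory labels in the test data are invented.

```python
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def turnaround_days(collected: date, released: date) -> int:
    """Days between sample collection and public release of the data."""
    if released < collected:
        raise ValueError("release date precedes collection date")
    return (released - collected).days


def mean_tat_by_lab(
    records: Iterable[Tuple[str, date, date]]
) -> Dict[str, float]:
    """Average turnaround time per laboratory.

    Each record is a hypothetical tuple of
    (laboratory label, collection date, release date).
    """
    days_by_lab: Dict[str, List[int]] = {}
    for lab, collected, released in records:
        days_by_lab.setdefault(lab, []).append(turnaround_days(collected, released))
    return {lab: mean(days) for lab, days in days_by_lab.items()}
```

A box-and-whisker summary such as Fig. 6 would use the per-sample values from `turnaround_days` rather than only the per-laboratory means.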

DISCUSSION

This project represents the first nation-wide, culture-free, population-level surveillance of a pathogen, with the intention to make data publicly available as it was collected. Within 6 months of funding acquisition, the GenomeTrakr program successfully implemented surveillance for a novel pathogen, sourced from a new-to-the-program sample origin, despite the requirement for developing new sample collection and preparation methods, optimization of novel sequencing and analysis methods, and need for novel contextual data fields. We demonstrated that these methods work and multiple U.S. local public health, agriculture, and academic laboratories within our network are now equipped and trained to execute these methods when requested at short notice. Data generated through these accomplishments underscore the enduring potential of wastewater sampling as an emerging surveillance tool.

Though largely successful, we encountered several challenges inherent to the targeted amplicon approach chosen for sequencing SARS-CoV-2 in the samples. Our initial primer sets for the targeted amplicons had been designed on previously circulating lineages of SARS-CoV-2; however, the ongoing evolution of the SARS-CoV-2 genome during multiple Omicron waves (BA.2, BA.4, BA.5) (39) resulted in periodic dropouts in coverage, or primer pairs that would suddenly stop working. Minor updates to the primer schemes were released in response (40, 41); however, these needed to be verified internally to ensure they worked before we recommended their adoption across our network of laboratories. Keeping pace proved demanding, necessitating continuous evolution of protocols and metadata template updates alongside our routine surveillance efforts.

Because the bioinformatic analyses rely intrinsically on external sources that catalog all SARS-CoV-2 variants reported in the literature, similar adaptations were required to keep the analyses robust and the data analysis pipelines current. Each time a new variant or sub-lineage of significance was named, the variant database needed to be updated and the entire data set feeding the dashboard required re-analysis. This dynamic stands in stark contrast to the WGS protocol employed over the past decade (42), where a consistent protocol works reliably for all enteric bacterial pathogens, and updates to that protocol are infrequent.

As a direct result of constantly updating laboratory methods, there were periods of time when we were not confident in the variant calling until we were sure participant labs had implemented the primer updates. For example, as the virus mutated further in early 2022, multiple “Omicron” lineages were co-circulating, resulting in some lineages only differing by a few loci, further compounded by multiple other mutations evolving under convergent evolution (43). If the sequencing missed one or more of these diagnostic loci due to a now-suboptimal experimental design, we would expect an over-representation of parent lineages, mirrored by under-representation of the true variant(s). To address this problem, we attempted to use the QC flags to communicate how confident we were with our sequencing data.

Despite those challenges, wastewater is an ideal environmental sample to target for this project because it captures pathogen shedding at the population or subpopulation level within a spatially explicit geographic region (sewershed or subsewershed). Unlike well-established WGS-based surveillance systems (3, 4), the SARS-CoV-2 amplicon-based sequencing approach requires no culturing step, shaving days to weeks off the turnaround time from sample collection to acquiring sequencing results. For this reason, this project met two important goals for FDA’s GenomeTrakr program: (i) to contribute timely genomic data for SARS-CoV-2 pandemic response and (ii) to develop capacity and best practices for culture-independent, population-level, environmental surveillance for other pathogens of interest to the FDA, namely, enteric pathogens central to our food safety mission. Incorporating a signal provided through wastewater sampling into the existing U.S. surveillance strategies for enteric pathogens would provide a more complete picture of where pathogens are and are not circulating across the country, enabling more precise scoping of foodborne outbreaks (44–47). The potential for this expansion is currently being explored within the framework of the US National Wastewater Surveillance System (48, 49).

Drawing from our success in managing a laboratory network funded to sequence pure-culture enteric pathogens isolated from environmental and other non-human sources, with NCBI serving as our primary repository (4), we propose the following best practices for employing a comparable distributed laboratory model and utilizing NCBI as the primary repository for the implementation of culture-independent, population-level sequencing of a pathogen from wastewater. (i) Establish a standard data structure, or data object model (DOM), within NCBI (or other repository within the International Nucleotide Sequence Database Collaboration [INSDC]) to capture the sequence data and large suite of contextual data. (ii) Create a custom FAIR contextual data standard that captures relevant sample and sequence metadata, maps to the DOM, and is interoperable with existing INSDC standards. (iii) Define the critical steps within the methods for assessing QC and set thresholds for determining next steps. (iv) Publish version-controlled protocols that cover delineation of sewersheds/subsewersheds, sample collection, laboratory methods, quality control assessment, analysis, and INSDC data submission. (v) Process, sequence, and upload sample data to support timely public health actions (not entirely met by our project, but recommended for future efforts). Lastly, (vi) develop a public dashboard to visualize current data collection and analysis results to serve the needs of the project.

ACKNOWLEDGMENTS

This project was supported, in part, by funding from the American Rescue Plan Act of 2021 and an appointment to the Research Participation Program at the U.S. Food and Drug Administration administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.

We would like to acknowledge John Callahan and CFSAN senior leadership for support on this project; Lili Velez for scientific editing; Sebastian Cianci and the web team for help getting the dashboards published; Justin Payne for advice on maintenance and scalability of software; Amy Kirby and Rory Welsh at CDC’s National Wastewater Surveillance System for collaboration; Arvind Varsan at Arizona State University for early discussions on sequencing methods; Rose Kantor and Stacia Wyman at UC Berkeley for early discussions on sequencing methods; Jay Garland at EPA and Seth A. Faith at Ohio State University for early discussions on sequencing strategy; Volodymyr Tryndyak and Camila Silva from FDA’s National Center for Toxicological Research for collaborative discussions; FDA’s 21Forward team for help with maps to help choose the wastewater sites; Josh Levy at the Scripps Research Institute: La Jolla, CA, for advice using Freyja; and Rick Lapoint, John Anderson, and the NCBI SRA and BioSample teams for handling all our curation requests.

Contributor Information

Ruth E. Timme, Email: Ruth.Timme@fda.hhs.gov.

Christopher W. Marshall, Marquette University, Milwaukee, Wisconsin, USA

The GenomeTrakr Laboratory consortium:

Ward Jacox, Dave Engelthaler, Michael Valentine, Crystal Hepp, David Kiang, Zhirong Li, Ryan Gentry, Mary Ann Hagerman, Mary Robinson, Jesse Knibbs, Madi Asbell, Beth Johnson, Logan Burns, Ashley Aurand-Cravens, Joshua Stacy, Tracy Stiles, Esther Fortes, Matthew Doucette, Brandon Sabina, Luc Gagne, Kelly Binns, Mark Pandori, Andrew Gorzalski, Lauryn Massic, Sarmila Dasgupta, Amar Patil, Apryle Panyi, Edward Acheampong, Thomas Kirn, Nicholas Palmateer, Willis Fedio, Yatziri Preciado, Srikanth Paladugu, Siddhartha Thakur, Lyndy Harden-Plumley, Luke Raymond, Melanie Prarat, Ashley Sawyer, Jonah Perkins, Edward Dudley, Jasna Kovac, Nkuchia M. M’ikanatha, Erin M. Nawrocki, Yezhi Fu, Nyduta Mbogo, Kristin Carpenter-Azevedo, Richard C. Huard, Sean Sierra-Patev, Megan Davis, Laura M. Lane, Christy A. Jeffcoat, Gregory Goodwin, Gabrielle Godfrey, Andrew Smith, Chukwuemika N. Aroh, Kirsti R. Gilmore, Jessica Freeman, Joy Scaria, Jane Hennings, Eric Nelson, Yan Sun, Bonnie Oh, Michael Jost, Bryan Brooks, Laura Langan, Lauren Turner, Stephanie Dela Cruz, Jessica Maitland, Shelby Bennett, Logan Fink, Mary Toothman, Hyunsook Moon, Yong Liu, Mychal Hendrickson, Darren Lucas, Phillip Dykema, Roxanne Meek, Geoff Melly, Paige Sickles, Breanna McArdle, Anneke Jansen, Megan Young, Josh Arbaugh, Zachary Kuhl, and Ewa King

SUPPLEMENTAL MATERIAL

The following material is available online at https://doi.org/10.1128/msystems.01415-23.

File S1. msystems.01415-23-s0001.xlsx.

Custom wastewater BioSample template.

DOI: 10.1128/msystems.01415-23.SuF1
File S2. msystems.01415-23-s0002.xlsx.

Custom SRA metadata template.

DOI: 10.1128/msystems.01415-23.SuF2
Figure S1. msystems.01415-23-s0003.pdf.

Wastewater Protocol Pilot exercise results.

DOI: 10.1128/msystems.01415-23.SuF3
Figure S2. msystems.01415-23-s0004.pdf.

Gantt chart showing the sampling times for each laboratory.

DOI: 10.1128/msystems.01415-23.SuF4
Supplemental legends. msystems.01415-23-s0005.docx.

Legends for Files S1 and S2.

DOI: 10.1128/msystems.01415-23.SuF5
Table S1. msystems.01415-23-s0006.xlsx.

List of wastewater sites sampled in this project.

DOI: 10.1128/msystems.01415-23.SuF6
Table S2. msystems.01415-23-s0007.xlsx.

Wastewater Protocol Pilot exercise results.

DOI: 10.1128/msystems.01415-23.SuF7
OPEN PEER REVIEW. reviewer-comments.pdf.

An accounting of the reviewer comments and feedback.

reviewer-comments.pdf (435KB, pdf)
DOI: 10.1128/msystems.01415-23.SuF8

ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.

REFERENCES

  • 1. World Health Organization. 2021. Weekly epidemiological update - 2 February 2021. Available from: https://www.who.int/publications/m/item/weekly-epidemiological-update---2-february-2021. Retrieved 24 Nov 2022.
  • 2. Allard MW, Strain E, Melka D, Bunning K, Musser SM, Brown EW, Timme RE. 2016. Practical value of food pathogen traceability through building a whole-genome sequencing network and database. J Clin Microbiol 54:1975–1983. doi: 10.1128/JCM.00081-16
  • 3. Stevens EL, Carleton HA, Beal J, Tillman GE, Lindsey RL, Lauer AC, Pightling A, Jarvis KG, Ottesen A, Ramachandran P, et al. 2022. The use of whole-genome sequencing by the Federal interagency collaboration for genomics for food and feed safety in the United States. J Food Prot 85:755–772. doi: 10.4315/JFP-21-437
  • 4. Timme RE, Wolfgang WJ, Balkey M, Venkata SLG, Randolph R, Allard M, Strain E. 2020. Optimizing open data to support one health: best practices to ensure interoperability of genomic data from bacterial pathogens. One health outlook 2:20. doi: 10.1186/s42522-020-00026-3
  • 5. Bivins A, North D, Ahmad A, Ahmed W, Alm E, Been F, Bhattacharya P, Bijlsma L, Boehm AB, Brown J, et al. 2020. Wastewater-based epidemiology: global collaborative to maximize contributions in the fight against COVID-19. Environ Sci Technol 54:7754–7757. doi: 10.1021/acs.est.0c02388
  • 6. Medema G, Heijnen L, Elsinga G, Italiaander R, Brouwer A. 2020. Presence of SARS-Coronavirus-2 RNA in sewage and correlation with reported COVID-19 prevalence in the early stage of the epidemic in the Netherlands. Environ Sci Technol Lett 7:511–516. doi: 10.1021/acs.estlett.0c00357
  • 7. Gerrity D, Papp K, Stoker M, Sims A, Frehner W. 2021. Early-pandemic wastewater surveillance of SARS-CoV-2 in Southern Nevada: methodology, occurrence, and incidence/prevalence considerations. Water Res X 10:100086. doi: 10.1016/j.wroa.2020.100086
  • 8. Karthikeyan S, Levy JI, De Hoff P, Humphrey G, Birmingham A, Jepsen K, Farmer S, Tubb HM, Valles T, Tribelhorn CE, et al. 2022. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609:101–108. doi: 10.1038/s41586-022-05049-6
  • 9. FDA-ORA. 2019. Laboratory flexible funding model cooperative agreement program. Available from: https://grants.nih.gov/grants/guide/pa-files/PAR-20-105.html.
  • 10. FDA. 2021. 21 FORWARD: unleashing the power of FDA data to support COVID-19 vaccine distribution to food and agriculture workers. 21 FORWARD. FDA. Available from: https://www.fda.gov/news-events/fda-voices/unleashing-power-fda-data-support-covid-19-vaccine-distribution-food-and-agriculture-workers
  • 11. Fontenele RS, Kraberger S, Hadfield J, Driver EM, Bowes D, Holland LA, Faleye TOC, Adhikari S, Kumar R, Inchausti R, et al. 2021. High-throughput sequencing of SARS-Cov-2 in wastewater provides insights into circulating variants. MedRxiv Prepr Serv Health Sci. doi: 10.1101/2021.01.22.21250320
  • 12. Izquierdo-Lara R, Elsinga G, Heijnen L, Munnink BBO, Schapendonk CME, Nieuwenhuijse D, Kon M, Lu L, Aarestrup FM, Lycett S, Medema G, Koopmans MPG, de Graaf M. 2021. Monitoring SARS-CoV-2 circulation and diversity through community wastewater sequencing, the Netherlands and Belgium. Emerg Infect Dis 27:1405–1415. doi: 10.3201/eid2705.204410
  • 13. Spurbeck RR, Minard-Smith AT, Catlin LA. 2021. Applicability of neighborhood and building scale wastewater-based genomic epidemiology to track the SARS-CoV-2 pandemic and other pathogens. medRxiv. doi: 10.1101/2021.02.18.21251939
  • 14. Crits-Christoph A, Kantor RS, Olm MR, Whitney ON, Al-Shayeb B, Lou YC, Flamholz A, Kennedy LC, Greenwald H, Hinkle A, Hetzel J, Spitzer S, Koble J, Tan A, Hyde F, Schroth G, Kuersten S, Banfield JF, Nelson KL. 2021. Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants. mBio 12:e02703-20. doi: 10.1128/mBio.02703-20
  • 15. Timme R. Wastewater protocols within the protocols.io GenomeTrakr workspace. protocols.io. Available from: https://www.protocols.io/workspaces/genometrakr1. Retrieved 11 Dec 2023.
  • 16. Calci K. 2021. Collection from wastewater treatment plant, transportation, and storage of raw wastewater. protocols.io. Available from: 10.17504/protocols.io.bycepste
  • 17. Woods J, Rodrigues R. 2021. Virus concentration from wastewater using PEG precipitation and ultracentrifugation. protocols.io. Available from: 10.17504/protocols.io.bx9ipr4e
  • 18. Walsky T, Ramachandran P, Windsor A, Hoffmann M, Grim C. 2022. Extraction of total nucleic acid from wastewater using the promega wizard enviro total nucleic acid kit. protocols.io. Available from: 10.17504/protocols.io.4r3l2oebxv1y/v1
  • 19. Walsky T, Ramachandran P, Windsor A, Hoffmann M, Grim C. 2022. RNA extraction and quality assessment targeting SARS-CoV-2 from wastewater concentrates using zymo environ water RNA kit. protocols.io. Available from: 10.17504/protocols.io.b3inqkde
  • 20. Woods J, Rodrigues R. 2021. RNA extraction from wastewater concentrates using RNeasy and zymo kits. protocols.io. Available from: 10.17504/protocols.io.bygvptw6
  • 21. Windsor A, Walsky T, Ramachandran P, Grim C, Hoffmann M. 2022. Rtqpcr of SARS-Cov-2 N1 target on ABI 7500 fast using Promega Gotaq Enviro wastewater SARS-Cov-2 system V1. Protocols.io. Available from: 10.17504/protocols.io.rm7vzy52xlx1/v1
  • 22. Woods J, Rodrigues R. 2021. RT-qPCR detection of SARS-CoV-2 from wastewater using the AB 7500 V.2. Protocols.io. Available from: 10.17504/protocols.io.6qpvrdj4bgmk/v2
  • 23. Woods J, Rodrigues R. 2021. RT-qPCR detection of process controls (murine norovirus and crAssphage) from wastewater using AB 7500 V.2. Protocols.io. Available from: 10.17504/protocols.io.kqdg36j9pg25/v2
  • 24. Ramachandran P, Walsky T, Windsor A, Hoffmann M, Grim C. 2022. Enhanced QIAseq DIRECT SARS-CoV-2 kit for Illumina MiSeq V.4. Protocols.io. Available from: 10.17504/protocols.io.rm7vzy39rlx1/v4
  • 25. Ramachandran P, Walsky T, Windsor A, Hoffmann M, Grim C. 2021. Modified NEBNext® VarSkip short SARS-CoV-2 library prep kit for Illumina platforms - adapted for wastewater samples V.3. Protocols.io. Available from: 10.17504/protocols.io.5jyl89n26v2w/v3
  • 26. Ramachandran P, Walsky T, Windsor A, Grim C, Hoffmann M. 2022. Modified NEBNext® VarSkip short SARS-CoV-2 enrichment and library prep for Oxford Nanopore technologies- adapted for wastewater samples V.2. Protocols.io. Available from: 10.17504/protocols.io.3byl4bwervo5/v2
  • 27. Ramachandran P, Walsky T, Windsor A, Hoffmann M, Grim C. 2022. Modified Illumina DNA prep (M) tagmentation library preparation for cDNA amplicons from wastewater. Protocols.io. Available from: 10.17504/protocols.io.b34rqqv6
  • 28. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, et al. 2016. The FAIR guiding principles for scientific data management and stewardship. Sci Data 3:160018. doi: 10.1038/sdata.2016.18
  • 29. Timme RE, Karsch-Mizrachi I, Waheed Z, Arita M, MacCannell D, Maguire F, Petit Iii R, Page AJ, Mendes CI, Nasar MI, Oluniyi P, Tyler AD, Raphenya AR, Guthrie JL, Olawoye I, Rinck G, O’Cathail C, Lees J, Cochrane G, Cummins C, Brister JR, Klimke W, Feldgarden M, Griffiths E. 2023. Putting everything in its place: using the INSDC compliant pathogen data object model to better structure genomic data submitted for public health applications. Microb Genom 9:001145. doi: 10.1099/mgen.0.001145
  • 30. NCBI. 2021. BioSample Package: SARS-CoV-2 wastewater surveillance, version 1.0. Available from: https://submit.ncbi.nlm.nih.gov/biosample/template/?organism-organism_name=&organism-taxonomy_id=&package-0=SARS-CoV-2.wwsurv.1.0&action=definition. Retrieved 22 Aug 2023.
  • 31. Griffiths EJ, Timme RE, Mendes CI, Page AJ, Alikhan N-F, Fornika D, Maguire F, Campos J, Park D, Olawoye IB, et al. 2022. Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package. GigaScience 11:giac003. doi: 10.1093/gigascience/giac003
  • 32. Griffiths E, Mendes CI, Maguire F, Guthrie J, Chindelevitch L, Karsch-Mizrachi I, Waheed Z, Cameron R, Holt K, Katz L, Petit III R, MacCannell D, Dave M, Oluniyi P, Nasar MI, Raphenya A, Hsiao W, Timme R. 2023. PHA4GE quality control contextual data tags: standardized annotations for sharing public health sequence datasets with known quality issues to facilitate testing and training. Life sciences. doi: 10.20944/preprints202303.0037.v1
  • 33. Timme R, Bias C, Balkey M. 2021. NCBI submission protocol for SARS-CoV-2 wastewater data: SRA, BioSample, and BioProject. V10. Protocols.Io. Available from: 10.17504/protocols.io.ewov14w27vr2/v10
  • 34. Kayikcioglu T, Amirzadegan J, Rand H, Tesfaldet B, Timme RE, Pettengill JB. 2023. Performance of methods for SARS-CoV-2 variant detection and abundance estimation within mixed population samples. PeerJ 11:e14596. doi: 10.7717/peerj.14596
  • 35. CDC. 2023. Aquascope (1.0.0). Nextflow. Centers for Disease Control and Prevention.
  • 36. Amirzadegan J, Kayikcioglu T, Rand H, Timme R, Balkey M. 2022. Wastewater QC workflow in GalaxyTrakr (Ssquawk4) V.9. Protocols.io. Available from: 10.17504/protocols.io.kxygxzk5dv8j/v9
  • 37. Gangiredla J, Rand H, Benisatto D, Payne J, Strittmatter C, Sanders J, Wolfgang WJ, Libuit K, Herrick JB, Prarat M, Toro M, Farrell T, Strain E. 2021. GalaxyTrakr: a distributed analysis tool for public health whole genome sequence data accessible to non-bioinformaticians. BMC Genomics 22:114. doi: 10.1186/s12864-021-07405-8
  • 38. Kayikcioglu T, Amirzadegan J. 2021. Variant abundance estimations from wastewater surveillance study by FDA/CFSAN. FDA-CFSAN (Center for Food Safety and Applied Nutrition).
  • 39. Tegally H, Moir M, Everatt J, Giovanetti M, Scheepers C, Wilkinson E, Subramoney K, Makatini Z, Moyo S, Amoako DG, et al. 2022. Emergence of SARS-CoV-2 Omicron lineages BA.4 and BA.5 in South Africa. Nat Med 28:1785–1790. doi: 10.1038/s41591-022-01911-2
  • 40. New England Biolabs Inc. 2023. VarSkip multiplex PCR designs for SARS-CoV-2 sequencing. New England Biolabs Inc.
  • 41. ARTIC Network. 2023. ARTIC network - SARS-CoV-2 version 5.3.2 primer scheme release. ARTIC real-time genomic surveill. https://community.artic.network/t/sars-cov-2-version-5-3-2-scheme-release/462.
  • 42. Pfeifer T, Haendiges J, Balkey M, Timme R. 2022. GenomeTrakr WGS protocol collection and workflow for MiSeq V.2. Protocols.io. Available from: 10.17504/protocols.io.3byl4bwyjvo5/v2
  • 43. Focosi D, Quiroga R, McConnell S, Johnson MC, Casadevall A. 2023. Convergent evolution in SARS-CoV-2 spike creates a variant soup from which new COVID-19 waves emerge. Int J Mol Sci 24:2264. doi: 10.3390/ijms24032264
  • 44. Harvey RWS, Price TH. 1970. Sewer and drain swabbing as a means of investigating salmonellosis. Epidemiol Infect 68:611–624. doi: 10.1017/S0022172400042546
  • 45. Diemert S, Yan T. 2019. Clinically unreported salmonellosis outbreak detected via comparative genomic analysis of municipal wastewater salmonella isolates. Appl Environ Microbiol 85:e00139-19. doi: 10.1128/AEM.00139-19
  • 46. Sahlström L, de Jong B, Aspan A. 2006. Salmonella isolated in sewage sludge traced back to human cases of salmonellosis. Lett Appl Microbiol 43:46–52. doi: 10.1111/j.1472-765X.2006.01911.x
  • 47. Goldblum ZS, M’ikanatha NM, Nawrocki EM, Cesari N, Kovac J, Dudley EG. 2024. Salmonella Senftenberg isolated from wastewater is linked to a 2022 multistate outbreak. medRxiv. doi: 10.1101/2024.02.20.24302949
  • 48. Adams C, Bias M, Welsh RM, Webb J, Reese H, Delgado S, Person J, West R, Shin S, Kirby A. 2024. The national wastewater surveillance system (NWSS): from inception to widespread coverage, 2020–2022, United States. Sci Total Environ 924:171566. doi: 10.1016/j.scitotenv.2024.171566
  • 49. CDC . 2022. Wastewater surveillance: a new frontier for public health. Available from: https://www.cdc.gov/amd/whats-new/wastewater-surveillance.html. Retrieved 18 Mar 2024.

Associated Data

Supplementary Materials

File S1. msystems.01415-23-s0001.xlsx.

Custom wastewater BioSample template.

DOI: 10.1128/msystems.01415-23.SuF1
File S2. msystems.01415-23-s0002.xlsx.

Custom SRA metadata template.

DOI: 10.1128/msystems.01415-23.SuF2
Figure S1. msystems.01415-23-s0003.pdf.

Wastewater Protocol Pilot exercise results.

DOI: 10.1128/msystems.01415-23.SuF3
Figure S2. msystems.01415-23-s0004.pdf.

Gantt chart showing the sampling times for each laboratory.

DOI: 10.1128/msystems.01415-23.SuF4
Supplemental legends. msystems.01415-23-s0005.docx.

Legends for Files S1 and S2.

DOI: 10.1128/msystems.01415-23.SuF5
Table S1. msystems.01415-23-s0006.xlsx.

List of wastewater sites sampled in this project.

DOI: 10.1128/msystems.01415-23.SuF6
Table S2. msystems.01415-23-s0007.xlsx.

Wastewater Protocol Pilot exercise results.

DOI: 10.1128/msystems.01415-23.SuF7
OPEN PEER REVIEW. reviewer-comments.pdf.

An accounting of the reviewer comments and feedback.

reviewer-comments.pdf (435KB, pdf)
DOI: 10.1128/msystems.01415-23.SuF8

Articles from mSystems are provided here courtesy of American Society for Microbiology (ASM)
