ABSTRACT
Wastewater surveillance has emerged as a crucial public health tool for population-level pathogen surveillance. Supported by funding from the American Rescue Plan Act of 2021, the FDA’s genomic epidemiology program, GenomeTrakr, was leveraged to sequence SARS-CoV-2 from wastewater sites across the United States. This initiative required the evaluation, optimization, development, and publication of new methods and analytical tools spanning sample collection through variant analyses. Version-controlled protocols for each step of the process were developed and published on protocols.io. A custom data analysis tool and a publicly accessible dashboard were built to facilitate real-time visualization of the collected data, focusing on the relative abundance of SARS-CoV-2 variants and sub-lineages across different samples and sites throughout the project. From September 2021 through June 2023, a total of 3,389 wastewater samples were collected, with 2,517 undergoing sequencing and submission to NCBI under the umbrella BioProject, PRJNA757291. Sequence data were released with explicit quality control (QC) tags on all sequence records, communicating our confidence in the quality of data. Variant analysis revealed wide circulation of Delta in the fall of 2021 and captured the sweep of Omicron and subsequent diversification of this lineage through the end of the sampling period. This project successfully achieved two important goals for the FDA’s GenomeTrakr program: first, contributing timely genomic data for the SARS-CoV-2 pandemic response, and second, establishing both capacity and best practices for culture-independent, population-level environmental surveillance for other pathogens of interest to the FDA.
IMPORTANCE
This paper serves two primary objectives. First, it summarizes the genomic and contextual data collected during a COVID-19 pandemic response project, which utilized the FDA’s laboratory network, traditionally employed for sequencing foodborne pathogens, for sequencing SARS-CoV-2 from wastewater samples. Second, it outlines best practices for gathering and organizing population-level next-generation sequencing (NGS) data collected for culture-free surveillance of pathogens sourced from environmental samples.
KEYWORDS: SARS-CoV-2, wastewater surveillance, data structures, FAIR data, data standards, pathogen genomic surveillance, wastewater based epidemiology, covid-19, GenomeTrakr
INTRODUCTION
All viruses, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), evolve over time, accumulating random mutations within their genomes that result in new variants and lineages. Although tracking the early spread of SARS-CoV-2 was primarily done through PCR tests, sequencing the entire genome facilitates the identification and tracking of new mutations and lineages. This is especially important when those mutations alter viral or clinical characteristics, for example by enabling faster replication, causing different symptoms or severity of disease, or eluding vaccines or therapeutic treatments. In early 2021, the first “variants of concern” started to emerge from SARS-CoV-2 (1), for example, Alpha (B.1.1.7), Beta (B.1.351), and Gamma (P.1). Suddenly, merely testing for the presence of the virus was not sufficient to track the pandemic. The full-genome sequence became necessary to identify new mutations, emerging variants, and sub-lineages.
The U.S. Food and Drug Administration’s (FDA) GenomeTrakr Program (2), a pathogen genomic surveillance network led by the FDA Center for Food Safety and Applied Nutrition (CFSAN), has been collaborating with other U.S. government and state public health agencies (3) to use whole-genome sequence data to ensure food safety and assist with epidemiological investigations of foodborne pathogens since 2012. This laboratory network comprises 31 federal and state public health laboratories, each equipped with the instrumentation and trained personnel required for pathogen sequencing and data submission to the NIH’s National Center for Biotechnology Information (NCBI). By design, the network is focused on sequencing pathogens from food samples, food facilities, the farm environment, and adjacent waterways. Resulting genomic data informs regulatory decisions around foodborne disease outbreaks or food production environments. A dedicated funding model supports these activities, which include submitting raw sequence data along with a minimum set of contextual data to the publicly accessible NCBI database in real time (4). This model, while good general practice for a publicly funded pathogen surveillance network, is also an ideal model for the rapid sharing of pathogen genome sequence data during a global pandemic.
Although SARS-CoV-2 is not a virus that causes foodborne illness, several factors led to the GenomeTrakr laboratory network being tapped to sequence SARS-CoV-2 genomes and assist efforts of the U.S. government to better monitor the spread of new SARS-CoV-2 variants and mutations. Funding for this work came from the American Rescue Plan Act of 2021, which included public health funding for pandemic response. Wastewater was chosen as a surveillance tool for multiple reasons. It is optimal for acquiring timely population-level sequence data because it contains the full suite of circulating and emerging mutations, which are valuable for independent validation and verification of FDA-approved therapeutics, diagnostics, and vaccines. New lineages of SARS-CoV-2 can be identified in wastewater samples up to a week prior to being detected in health-care seeking individuals from the same population (5–7). Routine wastewater samples also provide a relatively unbiased capture of genomic variation from the entire sewage catchment area, whereas clinical samples from a given population provide limited information on circulating variants. These samples may also reveal cryptic lineages not seen in the clinical sequence database (8). Furthermore, site locations can be targeted for monitoring specific types of populations (e.g., food production and agriculture workers). Choosing wastewater sites that captured circulating SARS-CoV-2 among these populations would meet these goals for the FDA and complement efforts by the Centers for Disease Control and Prevention (CDC) and regional partners, which were initially focused on urban sewer sheds.
Our goal here is to provide an overview of FDA’s efforts to perform timely wastewater surveillance for SARS-CoV-2 by leveraging the existing GenomeTrakr laboratory network, as well as provide some lessons learned in this endeavor. We also give an overview of the sequence data collected throughout this project, identify which laboratory methods yielded high quality data, and describe our best practices for implementing these methods within a public health setting.
MATERIALS AND METHODS
Five major steps were necessary to build capacity for timely SARS-CoV-2 wastewater surveillance by the GenomeTrakr sequencing laboratories: (i) fund GenomeTrakr laboratories recruited for this project; (ii) test, optimize, develop, and publish new laboratory methods for sequencing population-level SARS-CoV-2 from wastewater samples; (iii) develop and publish data analysis methods that assessed the sequence quality of raw data and predicted proportions of SARS-CoV-2 variants within each sample; (iv) develop and publish protocols for timely data submission to NCBI; and (v) create a public dashboard to visualize variant data from those routine data submissions across the network, providing timely data release and data analysis for public health applications.
Laboratory funding and site selection
GenomeTrakr laboratories are supported by the FDA Laboratory Flexible Funding Model (9). In the spring of 2021, with additional funding provided by the American Rescue Plan Act of 2021, GenomeTrakr laboratories were invited to apply to this special pandemic response wastewater project. Participating labs were required to select a minimum of two regional wastewater sites for routine sample collection, sampling 1–2 times per week over a period of at least 6 months. Sites were chosen to capture areas within each respective state that had higher populations of food and agriculture workers, assisted by county-level maps generated within FDA’s 21 FORWARD (10) data platform. Following RT-qPCR detection, the SARS-CoV-2 RNA in each sample would be sequenced, and labs would submit both their sequencing data and a suite of rich contextual data to NCBI as soon as possible.
Laboratory method development
In 2021, methods for enriching and sequencing SARS-CoV-2 from wastewater samples were in the early stages of being developed, with most laboratories focused on adapting targeted amplification panels used for clinical sequencing (11–13) to wastewater samples and a few also exploring oligo-capture approaches (14). A comprehensive set of standardized procedures for the entire wastewater processing workflow was needed. This workflow included sample collection, detection and quantification of SARS-CoV-2, SARS-CoV-2 sequencing, analysis, data submission, and visualization. Existing methods covering this workflow were tested, optimized, and published within the GenomeTrakr workspace on protocols.io (15). This platform facilitated real-time communication with version control to our laboratories and to the broader community. In total, 16 new protocols were drafted and published including wastewater sample collection (16), concentration and nucleic acid extraction (17–20), SARS-CoV-2 detection by RT-qPCR (21–23), and SARS-CoV-2-targeted amplification and sequencing (24–27). As long as the participating laboratories adhered to the tiled amplicon + short read sequencing approach established for this project, they had the option of adopting our methods and following our protocols or using different methods of their choice.
Quality control
At the start of this project in 2021, quality control (QC) checkpoints in the laboratory workflow as well as final QC thresholds for sequence data of SARS-CoV-2 from a mixed population sample had not yet been defined. One project objective was to identify those crucial QC checkpoints within the laboratory workflow and define thresholds for pass/fail at each of these steps that would yield data of sufficient quality to calculate relative abundance of circulating variants within a given sample.
NCBI data structure
Raw sequence data plus an extensive suite of contextual data describing the wastewater catchment area, site location information, and methods for sampling, nucleic acid extraction, and sequencing the target pathogen all needed to be structured and standardized so that data could be compared within our study and, most importantly, among studies. To ensure our data were findable, accessible, interoperable, and reusable (FAIR) (28), we defined a standard data structure, or “data object model” (DOM), for pathogen-targeted sequence data from environmental sources. To accomplish this, we modified an existing DOM widely used for genomic pathogen surveillance (4, 29). This environmental pathogen DOM is a standard data structure that provides interoperability across public and private data repositories for population-level pathogen sequence data collected from environmental sources (wastewater, water, soil, air, etc.) (Fig. 1). This data structure includes a BioProject describing the scope of study (for our study, one BioProject per lab). Linked to the BioProject is a set of BioSamples defined at the nucleic acid extraction level. These BioSample records include a wide variety of sample attributes, including the geographic location where the water was collected, specific site information, and the sampling, concentration, and nucleic acid extraction methods. Lastly, raw sequence data along with contextual data describing the experimental sequencing methods, filtering, and QC assessment are linked to the BioSample records.
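The three-level structure described above can be sketched in a few lines of Python. This is only an illustration of the linkage (BioProject, then BioSample, then sequence record), not the NCBI schema: the class names, the `run_id` field, and the accession and attribute values in the example are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SraRecord:
    """Raw sequence data plus sequencing-level contextual data."""
    run_id: str  # hypothetical identifier, not a real accession
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class BioSample:
    """One record per nucleic acid extraction."""
    sample_name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    sra_records: List[SraRecord] = field(default_factory=list)

    @property
    def was_sequenced(self) -> bool:
        # A BioSample may exist without any linked sequence data.
        return bool(self.sra_records)


@dataclass
class BioProject:
    """Scope of study (one per laboratory in this project)."""
    accession: str
    biosamples: List[BioSample] = field(default_factory=list)

    def sequenced_fraction(self) -> float:
        if not self.biosamples:
            return 0.0
        sequenced = sum(s.was_sequenced for s in self.biosamples)
        return sequenced / len(self.biosamples)


# Hypothetical example: two samples, only one of which was sequenced.
project = BioProject(accession="PRJNA-EXAMPLE")
project.biosamples.append(
    BioSample(
        sample_name="site1-week01",
        attributes={"collection_site_id": "AL-plant-1"},
        sra_records=[SraRecord("run-0001", {"enrichment_kit": "example kit"})],
    )
)
project.biosamples.append(BioSample(sample_name="site1-week02"))
print(project.sequenced_fraction())
```

Keeping the sequence records as a list on each BioSample lets a sample with no detectable target remain in the data set with an empty list, which mirrors the practice of registering every collected sample whether or not it was sequenced.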
Our data package needed to include several key pieces of contextual data not included in Version 1 of NCBI’s BioSample SARS-CoV-2 wastewater template (30) or in their generic Sequence Read Archive (SRA) metadata template. To fill this gap, we re-used fields from other packages where possible (Table 1), including (i) sample-level pooling and replicate information, (ii) sequence-level methods used for the targeted amplification of SARS-CoV-2 (31), and (iii) known QC information as determined by the submitter (32). New custom attributes were created where needed (Table 1) to capture sample collection information (collection_time, collection_volume, instantaneous_flow, and collection_site_id) and laboratory methods for sequencing (enrichment_kit). After the data structure was defined, we published an NCBI submission protocol adhering to this structure that included the custom BioSample and SRA metadata templates, capturing the full suite of contextual data needed for this project (33).
TABLE 1.
Additional metadata attributes | Definition and guidance for GenomeTrakr laboratories |
---|---|
BioSample | |
collection_time^b | For grab samples: the time of day the sample was collected in your time zone, 1–12 AM to 1–12 PM. |
specimen_processing | Replicate and/or pooling information, critical for interpreting results |
specimen_processing_id | Identifier used to track replicates and/or pooled samples |
specimen_processing_details | Description of the experimental design, describing the technical or biological replicates and/or pooling design. |
collection_site_id^b | ID that uniquely identifies the sample collection site among other sample collection sites in this BioProject. It must be unique at the level of the submitter’s data BioProject. Where possible, and with agreement from the facility, include the full name of the wastewater treatment plant. If anonymity is requested, create a masking ID to use for all samples collected at this site (e.g., AL-plant-1). |
project_name | A concise name that describes the overall project or name of the coordinated sequencing effort from which the sequencing was organized. |
collection_volume^b | The volume of the sample collected, in mL |
concentration_method^b | The method used to concentrate a target organism, nucleic acid, or organelle within a sample. |
extraction_method^b | The protocol used to extract nucleic acids (DNA, RNA, or TNA) from a sample. |
extraction_control^b | Organism (or nucleic acid) used in the extraction protocol to determine successful extraction. |
instantaneous_flow^b | The rate of flow past the meter at a given moment in time, converted into a standard MGD or L/D. For our project, the time of this measurement should correspond to when the grab sample was taken, and should be reported in units of liters per day. |
Sequence Read Archive | |
enrichment_kit^b | Method used to enrich the target pathogen(s). |
amplicon_PCR_primer_scheme | Name and version of the primer scheme used to generate the amplicons for sequencing. |
library_preparation_kit | Library preparation method used to convert a set of amplicons into a library ready for sequencing. |
quality_control_method | Name of the method or pipeline used to evaluate sequence quality, often called "QC pipeline." |
quality_control_method_version | Version number of the quality control pipeline or method used. |
quality_control_determination | Result of the quality control assessment. Leave blank if pass/fail thresholds have not been established or choose to flag an issue if known. |
quality_control_issues | If there’s a known or suspected quality control issue present in the sequence, choose from the available picklist to flag the issue, or create your own. |
quality_control_details | Free text space to include additional description of the flagged quality control issue. |
dehosting_method | The method used to remove host reads from the raw sequencing file. |
sequence_submitter_contact_email | Email contact for the lab that sequenced the isolate. |
raw_sequence_data_processing_method | The method used for raw data processing such as removing barcodes, adapter trimming, filtering, etc. |
Contextual data attributes describing the wastewater site and local conditions, specimen replicate and pooling information, and laboratory methods employed through the nucleotide extraction process were added to NCBI’s BioSample template. Contextual data attributes describing the methods employed for sequencing SARS-CoV-2, sequence quality control assessment, and any automated data processing steps were added to the SRA metadata template. Where possible, we re-used existing NCBI attributes.
^b New custom attributes created specifically for this project.
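As a small worked example of the guidance in Table 1, the sketch below converts a flow reading from million gallons per day (MGD) to liters per day and checks a record for the four custom sample collection attributes. It assumes U.S. gallons (3.785411784 L each); the function names and the example record are hypothetical and are not part of the published submission templates.

```python
LITERS_PER_US_GALLON = 3.785411784  # exact definition of the U.S. liquid gallon

# Custom sample collection attributes named in the text for this project.
CUSTOM_BIOSAMPLE_ATTRIBUTES = (
    "collection_time",
    "collection_volume",
    "instantaneous_flow",
    "collection_site_id",
)


def mgd_to_liters_per_day(mgd: float) -> float:
    """Convert million (U.S.) gallons per day to liters per day."""
    return mgd * 1_000_000 * LITERS_PER_US_GALLON


def missing_attributes(record: dict) -> list:
    """Return the custom attributes that are absent or blank in a record."""
    return [
        name
        for name in CUSTOM_BIOSAMPLE_ATTRIBUTES
        if not str(record.get(name, "")).strip()
    ]


# Hypothetical record: the flow meter reported 2.5 MGD at collection time.
record = {
    "collection_site_id": "AL-plant-1",
    "collection_volume": "800",  # mL
    "instantaneous_flow": f"{mgd_to_liters_per_day(2.5):.0f}",  # L/day
}
print(record["instantaneous_flow"])
print(missing_attributes(record))
```

Here the check reports `collection_time` as missing, which is the kind of gap a submitter would want to catch before upload.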
Data flow and visualization
To effectively communicate and visualize the evolving landscape of SARS-CoV-2 variants detected in wastewater sites throughout the course of our project, we constructed an interactive dashboard in Tableau Desktop (Tableau Software LLC, Seattle, WA). Tableau offered a user-friendly interface, a broad set of dashboard design and development features, and could easily integrate multiple data sources, such as cloud queries of NCBI tables and output files from variant analysis pipelines.
The public dashboard needed to integrate several sources of data to present an informative snapshot of the project’s progress (Fig. 2). NCBI Entrez queries summarized BioSample records without sequence data. Amazon Web Services (AWS) Athena queries of the SRA metadata table summarized metadata attached to raw sequence (SRA) and BioSample records. New sequence submissions under the BioProject PRJNA757291 were downloaded daily and analyzed with CFSAN’s Wastewater Analysis Pipeline (C-WAP) (34), which both computes QC metrics and infers the relative abundances of SARS-CoV-2 lineages in each sample using the Freyja method (8). C-WAP has since been repackaged as Aquascope (https://github.com/CDCgov/aquascope), but the underlying algorithm remains the same (35). A static list of BioProjects, laboratory names, and wastewater sites served to organize the records recovered through NCBI and aid with the final dashboard visualizations.
An important goal of the dashboard was to present the aspects of the project most relevant to public health, such as geographic regions, stakeholders, temporality, sampling and sequencing progress, and variant calling. A map displayed geography, stakeholders, and sampling and sequencing progress, helping to communicate the scale of the project and the number of participating labs. A bar graph showed which SARS-CoV-2 variants were detected week to week, along with their relative abundances. Finally, a Gantt chart displayed the progress of participating labs in sampling and sequencing their samples over time.
Users were encouraged to explore the data visualizations by using filters to select which details they most wanted to see. Users could filter the dashboard by state, laboratory, and wastewater collection site. A quality control filter was added to the dashboard on the public-facing webpage for users to filter data by % genome uncovered.
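The weekly bar graph and the QC filter described above can be illustrated with a short aggregation sketch. It assumes per-sample lineage abundances are already available as fractions; the sample records, the 40% cutoff default, and the use of an unweighted mean per ISO week are illustrative assumptions, not a description of how the Tableau dashboard computes its values.

```python
from collections import defaultdict
from datetime import date

# Hypothetical per-sample results (abundances are fractions of the sample).
samples = [
    {"site": "AL-plant-1", "collected": date(2021, 10, 4),
     "pct_uncovered": 3.0, "abundances": {"Delta": 0.9, "Other": 0.1}},
    {"site": "AL-plant-1", "collected": date(2021, 10, 6),
     "pct_uncovered": 20.0, "abundances": {"Delta": 0.8, "Other": 0.2}},
    {"site": "AL-plant-1", "collected": date(2021, 10, 7),
     "pct_uncovered": 60.0, "abundances": {"Delta": 0.1, "Other": 0.9}},
]


def weekly_mean_abundance(records, max_pct_uncovered=40.0):
    """Average lineage abundances per ISO week, skipping low-coverage samples."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for rec in records:
        if rec["pct_uncovered"] > max_pct_uncovered:
            continue  # user-selected QC filter on % genome uncovered
        iso = rec["collected"].isocalendar()
        week = (iso[0], iso[1])  # (ISO year, ISO week number)
        counts[week] += 1
        for lineage, fraction in rec["abundances"].items():
            sums[week][lineage] += fraction
    # A lineage absent from a sample contributes zero to that week's mean.
    return {
        week: {lin: total / counts[week] for lin, total in lineages.items()}
        for week, lineages in sums.items()
    }


weekly = weekly_mean_abundance(samples)
for week, lineages in sorted(weekly.items()):
    print(week, lineages)
```

In this example the third sample is excluded by the filter, so the week's values reflect only the two samples with adequate genome coverage.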
Protocol pilot exercise
As this project entailed building methods to support expanded wastewater surveillance for state public health laboratories, it was important to establish consistency in analyses performed across participating laboratories. At the start of this project, FDA distributed a set of raw wastewater samples to each funded laboratory. These samples served two purposes: (i) they provided an early, standardized set of samples laboratories could use to test new methods and (ii) sequence data collected from each laboratory helped FDA identify which methods met the quality control requirements for this project. FDA collected four large volumes of wastewater (Table 2), comprising two samples taken about a month apart, each with two pseudo-replicates (grab samples taken back-to-back from the same location at the WWTP). Each large-volume sample was then aliquoted into 800 mL samples. The October 2021 samples (WPP-sample_SA-1.01, WPP-sample_SA-2.01) were then spiked with 10^6 copies of wild-type SARS-CoV-2 reference RNA (ATCC Heat Inactivated 2019 Novel Coronavirus strain nCoV/USA/WA-1/2020 Part #VR-1986HK). All samples were frozen at −80°C and then shipped to the laboratories on dry ice (four 800 mL samples in each shipment). Each laboratory was asked to sequence the population of SARS-CoV-2 from each of these samples using methods of their choice and then to submit their resulting raw sequence and contextual data to NCBI.
TABLE 2.
BioSample “sample_name” | Collection date | WWTP location | Treatment |
---|---|---|---|
WPP-sample_B.01 | 20 September 2021 | Mobile, AL | Raw wastewater |
WPP-sample_C.01 | 20 September 2021 | Mobile, AL | Raw wastewater |
WPP-sample_SA-1.01 | 21 October 2021 | Pascagoula, MS | Raw wastewater spiked with wt SARS-CoV-2, 10^6 copies/800 mL |
WPP-sample_SA-2.01 | 21 October 2021 | Pascagoula, MS | Raw wastewater spiked with wt SARS-CoV-2, 10^6 copies/800 mL |
RESULTS
Participating laboratories
Twenty GenomeTrakr laboratories plus the FDA-CFSAN laboratories received funding for this special project (Table 3). Each laboratory identified at least two wastewater sites (Table S1) for routine sampling (1–2 times a week) for a minimum of 6 months. In total, samples from 81 sites were included in this project. Where feasible, sites were in counties with a higher relative percentage of food and agriculture workers. Sites included both municipal wastewater treatment plants and direct wastewater lines from food processing facilities—spanning both urban and rural populations.
TABLE 3.
Laboratory names |
---|
Arizona State Department of Health Services, T-Gen North |
California Department of Public Health |
Indiana State Department of Health |
Kentucky State Cabinet for Health and Family Services |
Massachusetts State Department of Public Health |
Nevada State Public Health Laboratory, University of Nevada—Reno |
New Jersey State Department of Agriculture |
New Jersey Department of Health |
New Mexico State University—Las Cruces |
North Carolina State University—Raleigh |
Ohio State Department of Agriculture |
Pennsylvania State University—University Park |
Rhode Island Department of Health, State Health Laboratory |
South Carolina Department of Health and Environmental Control |
South Dakota State University |
Texas Department of State Health Services |
Virginia Division of Consolidated Laboratory Services |
Washington State Department of Agriculture |
Washington State Department of Health |
West Virginia Department of Agriculture |
FDA-Center for Food Safety and Applied Nutrition |
Twenty GenomeTrakr laboratories plus FDA-CFSAN were funded for this project.
Wastewater protocol pilot exercise
Ten laboratories participated in a pilot exercise to assess different laboratory methods being utilized: nine labs provided sequence data from all four distributed samples (Table 2). Those nine laboratories successfully amplified and sequenced SARS-CoV-2 from the four WPP samples (Table S2) and submitted their resulting sequences to NCBI (BioProject, PRJNA767800). These submissions were obtained by a diverse array of methods: seven extraction methods, six concentration methods, four enrichment strategies, seven primer schemes, and seven library preparation methods. The remaining lab opted out of the sequencing portion of the exercise, as they had encountered issues with their ddPCR (droplet digital PCR) method.
The cumulative submissions from multiple laboratories totaled 17 data sets for the four samples. Although the limited number of replicates precludes drawing definitive conclusions about individual or combined methods, several overarching trends emerged that informed our subsequent decisions for real-time sampling. In particular, the “Percent reads aligned” metric confirmed the robust specificity of three different enrichment methods for the SARS-CoV-2 virus: QIAseq DIRECT SARS-CoV-2, NEBNext ARTIC SARS-CoV-2, and Illumina COVIDSeq. The “Percent SARS-CoV-2 genome covered” demonstrated the strong performance of most primer schemes assessed in this exercise. Furthermore, across all tested methods, the variant analyses were largely consistent across samples. Specifically, the sequences for “WPP-sample_B.01” and “WPP-sample_C.01” showed mostly Delta variants, while “WPP-sample_SA-1.01” and “WPP-sample_SA-2.01” revealed strong wild-type signal originating from the spiked-in synthetic virus (Fig. S1).
Quality control thresholds
We established initial QC thresholds for our sequence data after a thorough review of data collected from both the protocol pilot exercise and the first couple months of sequencing efforts. Important considerations for setting these thresholds included determining the percentage of the SARS-CoV-2 genome coverage required to confidently identify population-level variants and sub-lineages, the necessary depth of coverage to capture most of the circulating lineages, the percentage of SARS-CoV-2 in the raw sequence data, and the identification and removal of human sequencing reads prior to public release. We described four QC bins that capture major categories of sequence quality (Table 4) and proposed QC thresholds for three metrics we identified as important for determining high-quality data: % SARS-CoV-2 reads, % SARS-CoV-2 genome uncovered, and average genome coverage depth. These thresholds, deemed appropriate based on early data collection, served as a preliminary benchmark. However, we acknowledge the need for a rigorous validation process to fine-tune these thresholds to suit specific applications, recognizing that different use cases—such as general population variant tracking vs the validation of a new diagnostic kit—may require different QC thresholds.
TABLE 4.
QC bin | QC bin description | % genome uncovered (<10×) | Average coverage | Other observations | Submit to NCBI | Tag for SRA attribute: “quality_control_determination” | Included in FDA dashboard |
---|---|---|---|---|---|---|---|
A | No QC issues evident | <5% | >1,000× | >50% reads are SARS-CoV-2 | Yes | No quality control issues identified | Yes |
B | Some QC issues, but variant calling likely OK | 6%–40% | 100×–1,000× | | Yes | Minor quality control issues identified | Yes |
C | Insufficient data for confidence in variant calling | 40%–95% | 10×–100× | Low fraction of lineage-specific mutations (C-WAP reports) | Yes | Sequence flagged for potential quality control issues | No |
F | Significant QC and/or study design issues | >95% | <10× | <5% reads SARS-CoV-2, suspected contamination (SNR low), low sequence quality, etc. | No | Sequence flagged for significant quality control issues | No |
Four QC categories, or bins, were established based on thresholds set for various QC metrics. Sequence tags were developed to communicate these categories directly on the sequence file in an attribute called “quality_control_determination.” Only data with QC tagged in the A or B bin was visualized on the public FDA variant analysis dashboard.
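The decision matrix in Table 4 can be expressed as a small function. The sketch below is one possible reading of the table, not the logic implemented in C-WAP or SSQuAWK4: it assumes each metric is binned independently and the worst bin wins, it closes the gaps between adjacent ranges at the upper bound, and it treats a SARS-CoV-2 read fraction between 5% and 50% as capping a sample at bin B. All function names are hypothetical.

```python
# Illustrative QC bin assignment based on the Table 4 thresholds.
# The rule for combining metrics is an assumption, not the published method.

BIN_ORDER = "ABCF"  # ordered from best to worst

QC_TAGS = {
    "A": "No quality control issues identified",
    "B": "Minor quality control issues identified",
    "C": "Sequence flagged for potential quality control issues",
    "F": "Sequence flagged for significant quality control issues",
}


def bin_from_uncovered(pct_uncovered: float) -> str:
    """Bin by % of the genome with <10x coverage."""
    if pct_uncovered < 5:
        return "A"
    if pct_uncovered <= 40:
        return "B"
    if pct_uncovered <= 95:
        return "C"
    return "F"


def bin_from_depth(avg_depth: float) -> str:
    """Bin by average depth of coverage."""
    if avg_depth > 1000:
        return "A"
    if avg_depth >= 100:
        return "B"
    if avg_depth >= 10:
        return "C"
    return "F"


def bin_from_target_reads(pct_reads: float) -> str:
    """Bin by % of reads that are SARS-CoV-2."""
    if pct_reads < 5:
        return "F"
    if pct_reads > 50:
        return "A"
    return "B"  # assumption: intermediate values cap the sample at bin B


def assign_qc_bin(pct_uncovered: float, avg_depth: float, pct_reads: float) -> str:
    """Return the worst bin indicated by any of the three metrics."""
    bins = (
        bin_from_uncovered(pct_uncovered),
        bin_from_depth(avg_depth),
        bin_from_target_reads(pct_reads),
    )
    return max(bins, key=BIN_ORDER.index)


def on_dashboard(qc_bin: str) -> bool:
    """Only bins A and B were shown on the public dashboard."""
    return qc_bin in ("A", "B")


example_bin = assign_qc_bin(pct_uncovered=12.0, avg_depth=850.0, pct_reads=72.0)
print(example_bin, "|", QC_TAGS[example_bin], "|", on_dashboard(example_bin))
```

Taking the worst bin is the conservative choice: a run with deep average coverage but large uncovered regions is still flagged, which matches the intent of reserving bin A for data with no evident issues.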
Once we had a QC target for sequence data, we identified three critical QC checkpoints in the laboratory workflow (Fig. 3). For QC check #1, samples containing no detectable SARS-CoV-2 RNA were deemed to have failed QC and were not processed further. However, samples that contained any level of target RNA, even at very low levels, were considered “passing” and sent on to the cDNA synthesis step for amplification. For QC check #2, to determine whether investing time in library preparation and sequencing was justifiable in terms of cost and effort, a thorough quality assessment of the PCR product (targeted enrichment) was conducted using the Qubit HS kit and a fragment size analyzer, such as the Agilent TapeStation or Bioanalyzer (24, 26). Samples passing this QC step were selected for sequencing. QC check #3, the final major checkpoint, involved a rigorous evaluation of the raw sequencing data. For this purpose, we developed SSQuAWK4 (36) to automate the QC process for our laboratories; this tool was then made publicly accessible through a custom Galaxy instance, GalaxyTrakr (37). We also used a thorough QC evaluation and variant calling pipeline, CFSAN Wastewater Analysis Pipeline (C-WAP), via the command line interface (34). Reports from both tools included key summary metrics, such as percentage of total reads aligned to the SARS-CoV-2 reference genome, average depth of coverage, and percentage of the SARS-CoV-2 genome uncovered (<10×).
Based on that QC assessment, each sequence was assigned a QC bin (A, B, C, or F). While we performed QC assessments from the very beginning of data collection, there was some uncertainty about what would qualify as a “high” or “low” quality sequencing run. Therefore, we submitted almost all sequence data to SRA, along with metadata and QC evaluations (32). Then, to guide future laboratory practices, we devised a decision matrix (Table 4), which became instrumental in selecting which samples merited being featured on the public-facing dashboards.
Summary of data collected
Routine, systematic wastewater sample collection for this project was initiated in September 2021 and ended by June 2023, with contributing laboratories submitting sequences in staggered 6-month time periods (Fig. S2). When detectable levels of SARS-CoV-2 were present, as determined by quantification of SARS-CoV-2-specific RT-qPCR or ddPCR targets in a first-phase screening, targeted amplicon approaches were used to sequence the SARS-CoV-2 RNA in the sample. In total, 3,389 wastewater samples were collected, of which 2,517 were subjected to sequencing. The resulting raw sequence data and comprehensive set of standard contextual data were submitted to laboratory-specific NCBI BioProjects, nested under the umbrella BioProject PRJNA757291. Every sample collected and tested for this project has a BioSample entry, even the ones that were not sequenced, thereby providing a unique data set within NCBI that includes both positive and negative samples.
Standard terminology describing sample collection and sequencing methods was included as attributes on both the sample records (BioSample) and the experiment records (SRA submission). Sample processing methods varied across the project (Table 5). Both composite and grab samples were well represented, n = 2,023 (59.7%) and n = 1,366 (40.3%), respectively. Most labs collected raw wastewater, n = 3,305 (97.5%), with a few primary effluent and post-grit removal samples also included. Among the concentration methods used, 90% of samples were concentrated using one of five methods: Ceres Nanotrap (n = 1,212), Innovaprep ultrafiltration (n = 604), Promega large volume TNA capture kit (n = 455), PEG (polyethylene glycol) precipitation + ultracentrifugation (n = 420), and Centricon 100 k (n = 357). For nucleic acid extraction, the most frequently employed method was the Qiagen MagMAX Viral Kit (n = 939, 28%), followed by a variety of similar Promega extraction kits (n = 627, 19%).
TABLE 5.
Method | # of biosamples | # of labs | % of total |
---|---|---|---|
Sample type | |||
Composite | 2,023 | 18 | 59.7 |
Grab | 1,366 | 8 | 40.3 |
Sample matrix | |||
Raw wastewater | 3,305 | 19 | 97.5 |
Post-grit removal | 79 | 1 | 2.3 |
Primary effluent | 4 | 1 | 0.1 |
Missing | 1 | 1 | <0.1 |
Viral concentration method | |||
Ceres Nanotrap | 1,212 | 8 | 35.8 |
Innovaprep ultrafiltration | 604 | 5 | 17.8 |
Promega wastewater large volume TNA capture kit | 455 | 5 | 13.4 |
PEG precipitation + ultracentrifugation | 420 | 4 | 12.4 |
Centricon 100k | 357 | 1 | 10.5 |
Zymo water concentration buffer | 155 | 1 | 4.6 |
Skim milk flocculation | 99 | 1 | 2.9 |
Membrane filtration with acidification and MgCl2 | 64 | 2 | 1.9 |
Innovaprep CP select | 19 | 1 | 0.5 |
Backflushed raw ww using rexseed filters; Promega wastewater large column TNA capture kit | 4 | 1 | 0.1 |
Nucleotide extraction method | |||
Qiagen MagMAX Viral Kit | 939 | 2 | 27.7 |
Promega Extraction Kits | 627 | 5 | 18.5 |
Qiagen AllPrep PowerViral DNA/RNA Kit | 424 | 4 | 12.5 |
Qiagen RNeasy PowerWater Kit | 324 | 4 | 9.6 |
Zymo Quick-RNA Viral Kit | 345 | 2 | 10.2 |
Zymo Environ Water RNA Kit (R2042) | 262 | 1 | 7.7 |
QIAamp Viral RNA mini kit | 201 | 5 | 5.9 |
NEB Monarch Total RNA Miniprep Kit + Zymo OneStep PCR Inhibitor Removal Kit | 155 | 1 | 4.6 |
Macherey-Nagel NucleoMag DNA/RNA Water Kit | 108 | 1 | 3.2 |
Ceres Nanotrap | 4 | 2 | 0.1 |
Methods cover type of sample collection, wastewater sample matrix, viral concentration method, and nucleotide extraction method.
Sequencing methods were captured at the experiment level, attached to the raw sequence data (Table 6). Target enrichment methods encompassed broad categories of tiled amplicon approaches, with 87% of submissions using QIAseq DIRECT (n = 923, 37%), NEBNext ARTIC (n = 825, 33%), or the Illumina COVIDSeq Assay (n = 433, 17%). PCR primer schemes for these enrichment approaches evolved alongside the virus; in total, 10 different primer schemes were used across the project. Seven different library preparation kits were used to prepare the SARS-CoV-2 amplicons for sequencing, and eight different sequencing instruments were used to generate sequence data. Illumina instruments accounted for 90% of the sequences submitted (n = 2,257), followed by Oxford Nanopore Technologies (ONT) (n = 256, 10%) and, finally, a small number sequenced on a PacBio instrument (n = 4).
TABLE 6.
Method | # of sequences | # of labs | % of total |
---|---|---|---|
Enrichment kit | |||
QIAseq DIRECT SARS-CoV-2 | 923 | 3 | 36.7 |
NEBNext ARTIC SARS-CoV-2 RT-PCR Module | 825 | 12 | 32.8 |
Illumina COVIDSeq Assay | 433 | 1 | 17.2 |
Swift Normalase Amplicon SARS-CoV-2 Panels | 199 | 1 | 7.9 |
Not applicable | 137 | 4 | 5.4 |
Amplicon PCR primer scheme | |||
ARTIC V3 | 125 | 1 | 5.0 |
ARTIC V4 | 308 | 1 | 12.2 |
ARTIC V4.1 | 131 | 4 | 5.2 |
NEB VarSkip 1a Long | 18 | 1 | 0.7 |
NEB VarSkip 1a Short | 125 | 5 | 5.0 |
NEB VarSkip 2a Short | 307 | 8 | 12.2 |
NEB VarSkip 2b Short | 369 | 7 | 14.7 |
QIAseq DIRECT SARS-CoV-2—Boosted | 97 | 2 | 3.9 |
QIAseq DIRECT SARS-CoV-2 primers | 838 | 3 | 33.3 |
SARS-CoV-2 SNAP primer pool | 199 | 1 | 7.9 |
Library preparation kit | |||
Amplicon sequencing kit (PacBio) | 4 | 1 | 0.2 |
Illumina DNA Prep | 513 | 8 | 20.4 |
Ligation sequencing kit | 286 | 3 | 11.4 |
NEBNext ARTIC SARS-CoV-2 Library Prep Kit (Illumina) | 514 | 5 | 20.4 |
NEBNext Ultra II FS DNA Library Prep for Illumina | 6 | 1 | 0.2 |
QIAseq DIRECT Unique Dual Index Prep | 995 | 4 | 39.5 |
Swift Normalase Amplicon SARS-CoV-2 Panels | 199 | 1 | 7.9 |
Sequencing instrument | |||
Illumina iSeq 100 | 55 | 2 | 2.2 |
Illumina MiniSeq | 440 | 4 | 17.5 |
Illumina MiSeq | 1,622 | 14 | 64.4 |
Illumina NovaSeq 6000 | 83 | 1 | 3.3 |
NextSeq 550 | 57 | 1 | 2.3 |
GridION | 8 | 1 | 0.3 |
MinION | 248 | 1 | 9.9 |
Sequel II | 4 | 1 | 0.2 |
Methods cover enrichment kit (general approach for enriching the target pathogen), amplicon PCR primer scheme, library preparation kit, and sequencing instrument name.
Quality of sequence data
As could be expected from a multi-laboratory project using various field sampling approaches and performing simultaneous method development and data collection, the quality of sequences submitted for this project varied widely, ranging from exceptional to very low quality, based on the predefined thresholds outlined in Table 4. Of the 2,255 Illumina short-read sequences, 1,381 (61%) were categorized as having “no quality control issues” (A bin), 219 (10%) as having “minor quality control issues” (B bin), 633 (28%) as having “potential quality control issues” (C bin), and 22 (<1%) as having “significant quality control issues” (F bin). For the average depth of SARS-CoV-2 genome coverage, we targeted 1,000× as an ideal while considering 100× the minimum. Coverage across our data set ranged from less than 10× to over 135,000×, with a notable concentration of sequences below these thresholds flagged as low quality (F bin) (Fig. 4a). Percent of genome uncovered (i.e., percent of the SARS-CoV-2 genome not sequenced with at least 10× coverage) showed a similar pattern, with 72% of submissions meeting our threshold of less than 40% uncovered (Fig. 4b). Sequences for which more than 40% of the genome had not been sequenced were predominantly tagged with C and F QC bins (Fig. 4b). Conversely, submissions under this threshold, for which more of the genome had been successfully sequenced, were mostly assigned an A or B bin, indicating that coverage of the SARS-CoV-2 genome was suitable for variant analysis. Finally, for each submission, we computed the percentage of raw sequence reads that mapped to the SARS-CoV-2 genome. In general, submissions with a high percentage of SARS-CoV-2 reads were of higher quality (Fig. 4c), and submissions with a lower percentage of SARS-CoV-2 reads received a lower QC assessment, although there was no obvious inflection point at 50%, which was our target. Many sequences were categorized as high quality (A bin) even though only a small fraction of their reads were identified as SARS-CoV-2.
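The three metrics discussed above can be checked per submission with a few comparisons. The following Python sketch is illustrative only: it uses the threshold values quoted in the text (100× and 1,000× average depth, 40% of genome uncovered, 50% of reads mapped), but the class name, function name, and the choice of inclusive versus strict comparisons are assumptions. It does not reproduce the A/B/C/F bin definitions, which are given in Table 4.

```python
from dataclasses import dataclass
from typing import Dict

# Threshold values quoted in the text. How they combine into the
# A/B/C/F bins is defined in Table 4 and is NOT reproduced here.
MIN_MEAN_DEPTH = 100.0      # minimum acceptable average depth (x)
IDEAL_MEAN_DEPTH = 1000.0   # targeted average depth (x)
MAX_PCT_UNCOVERED = 40.0    # percent of genome with <10x coverage
TARGET_PCT_MAPPED = 50.0    # percent of raw reads mapping to SARS-CoV-2


@dataclass
class SequenceMetrics:
    """Hypothetical container for one submission's QC metrics."""
    mean_depth: float
    pct_genome_uncovered: float
    pct_reads_mapped: float


def check_thresholds(m: SequenceMetrics) -> Dict[str, bool]:
    """Report which individual thresholds a submission meets.

    Assumption: depth and mapped-read thresholds are inclusive (>=),
    and the uncovered-genome threshold is strict (<).
    """
    return {
        "depth_meets_minimum": m.mean_depth >= MIN_MEAN_DEPTH,
        "depth_meets_ideal": m.mean_depth >= IDEAL_MEAN_DEPTH,
        "genome_coverage_ok": m.pct_genome_uncovered < MAX_PCT_UNCOVERED,
        "mapped_reads_meet_target": m.pct_reads_mapped >= TARGET_PCT_MAPPED,
    }


if __name__ == "__main__":
    # A deep, well-covered sample in which few reads are SARS-CoV-2.
    example = SequenceMetrics(mean_depth=1500.0,
                              pct_genome_uncovered=5.0,
                              pct_reads_mapped=12.0)
    print(check_thresholds(example))
```

The example submission passes both depth checks and the genome-coverage check while missing the 50% mapped-read target, which mirrors the observation that a low fraction of SARS-CoV-2 reads did not by itself preclude a high-quality assessment.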
Variant analysis
To visualize the variants and sub-lineages in wastewater samples over time, we plotted their relative abundance against week of collection, from 12 September 2021 through 4 June 2023 (Fig. 5) (38). The dashboard was updated whenever new submissions appeared at NCBI, at most once per day, so that it reflected current data. Analysts also continuously monitored public health news for mentions of new and clinically important variants that should be added to the dashboard’s legend. Sub-lineages that were not of public health importance or did not contribute more than 1% relative abundance within each sample were collectively categorized as “Others” for the purposes of the public dashboard.
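The “Others” grouping can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual code: the function name, the `watchlist` parameter (standing in for the lineages analysts judged to be of public health importance), and the example abundances are all hypothetical. It follows a literal reading of the rule above, in which a sub-lineage keeps its own category only if it is on the watchlist and exceeds 1% relative abundance in the sample.

```python
from typing import Dict, Iterable

OTHERS_LABEL = "Others"
MIN_ABUNDANCE = 0.01  # 1% relative abundance within a sample


def collapse_minor_lineages(abundances: Dict[str, float],
                            watchlist: Iterable[str],
                            min_abundance: float = MIN_ABUNDANCE
                            ) -> Dict[str, float]:
    """Group sub-lineages into "Others" for display.

    A lineage is shown on its own only when it is on the watchlist
    AND its relative abundance exceeds the threshold; everything else
    is summed into a single "Others" entry.
    """
    watch = set(watchlist)
    collapsed: Dict[str, float] = {}
    others = 0.0
    for lineage, fraction in abundances.items():
        if lineage in watch and fraction > min_abundance:
            collapsed[lineage] = fraction
        else:
            others += fraction
    if others > 0:
        collapsed[OTHERS_LABEL] = others
    return collapsed


if __name__ == "__main__":
    # Hypothetical relative abundances for one sample.
    sample = {"BA.5": 0.62, "BA.4": 0.30, "BA.2": 0.005, "B.1.1": 0.075}
    print(collapse_minor_lineages(sample, watchlist={"BA.5", "BA.4", "BA.2"}))
```

In this example BA.2 falls below 1% and B.1.1 is not on the watchlist, so both are folded into “Others”, and the displayed fractions still sum to the sample total.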
Across the contributing laboratories, most of the sampling occurred in 2022 (Fig. S2), resulting in a few dashboard gaps in late 2022 and 2023 (Fig. 5). Samples collected from September 2021 through early December 2021 all belonged to Delta sub-lineages. Omicron BA.1 made its initial appearance during the week of 12 December 2021, swiftly replacing nearly all circulating Delta lineages within the subsequent month. Following this, Omicron BA.2 was identified in our samples in mid-March 2022, taking over from BA.1 by the end of April 2022. In early May, Omicron BA.4 emerged and circulated until October 2022 although it never reached dominance. In late April 2022, Omicron BA.5 was detected and became the predominant circulating sub-lineage until October 2022 when Omicron BQ lineages started appearing. The first widely circulating hybrid Omicron lineage, XBB, emerged in November 2022 and maintained dominance through June 2023.
Turnaround time
In line with our project’s primary objective of delivering timely pandemic sequence data for public health purposes, we evaluated the turnaround time (TAT) as the number of days from sample collection to NCBI data release for each participating laboratory. Our analysis revealed two distinct categories of laboratories based on their approach to sample processing (Fig. 6). The first category consisted of five laboratories, including FDA, that processed samples as they were collected, resulting in an average TAT range of approximately 15–30 days. The second category included 12 laboratories that initially collected samples but processed them at a later date due to various factors, including supply-chain delays for reagents and instruments, hesitancy within state public health laboratories to publicly release data that were collected using non-validated methods (e.g., Lab P), and staffing shortages due to pandemic response burden. Within this category, the average TAT varied substantially, ranging from 60 to 410 days between sample collection and data submission.
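The TAT definition above (days from sample collection to NCBI data release, averaged per laboratory) amounts to a date subtraction followed by a group mean. The Python sketch below shows one way to compute it. The function names, the record layout, and the example dates are hypothetical and are not drawn from the project's data.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One record per sample: (laboratory, collection date, NCBI release date).
Record = Tuple[str, date, date]


def turnaround_days(collected: date, released: date) -> int:
    """Days elapsed between sample collection and public data release."""
    return (released - collected).days


def mean_tat_by_lab(records: Iterable[Record]) -> Dict[str, float]:
    """Average turnaround time, in days, for each laboratory."""
    per_lab: Dict[str, List[int]] = defaultdict(list)
    for lab, collected, released in records:
        per_lab[lab].append(turnaround_days(collected, released))
    return {lab: mean(days) for lab, days in per_lab.items()}


if __name__ == "__main__":
    # Hypothetical records: one lab releasing promptly, one batching later.
    records = [
        ("Lab A", date(2022, 3, 1), date(2022, 3, 21)),
        ("Lab A", date(2022, 3, 8), date(2022, 3, 18)),
        ("Lab B", date(2022, 1, 10), date(2022, 7, 9)),
    ]
    print(mean_tat_by_lab(records))
```

A per-laboratory mean is used here because the text reports average TAT by laboratory. A median would be less sensitive to a few samples released long after collection.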
DISCUSSION
This project represents the first nation-wide, culture-free, population-level surveillance of a pathogen, with the intention to make data publicly available as they were collected. Within 6 months of funding acquisition, the GenomeTrakr program successfully implemented surveillance for a novel pathogen, sourced from a new-to-the-program sample origin, despite the need to develop new sample collection and preparation methods, optimize novel sequencing and analysis methods, and define novel contextual data fields. We demonstrated that these methods work, and multiple U.S. local public health, agriculture, and academic laboratories within our network are now equipped and trained to execute these methods when requested at short notice. Data generated through these accomplishments underscore the enduring potential of wastewater sampling as an emerging surveillance tool.
Though largely successful, we encountered several challenges inherent to the targeted amplicon approach chosen for sequencing SARS-CoV-2 in the samples. Our initial primer sets for the targeted amplicons had been designed against previously circulating lineages of SARS-CoV-2; however, the ongoing evolution of the SARS-CoV-2 genome during multiple Omicron waves (BA.2, BA.4, BA.5) (39) resulted in periodic dropouts in coverage, or primer pairs that would suddenly stop working. Minor updates to the primer schemes were released in response (40, 41); however, these needed to be verified internally to ensure they worked before we recommended their adoption across our network of laboratories. This proved demanding to keep pace with, necessitating continuous evolution of protocols and metadata template updates alongside our routine surveillance efforts.
Because the bioinformatic analyses rely intrinsically on external sources that catalog every SARS-CoV-2 variant reported in the literature, similar adaptations were required to keep the analyses robust and the data analysis pipelines current. Each time a new variant or sub-lineage of significance was named, the variant database needed to be updated and the entire data set feeding the dashboard required re-analysis. This dynamic stands in stark contrast to the WGS protocol employed over the past decade (42), where a consistent protocol works reliably for all enteric bacterial pathogens and updates to that protocol are infrequent.
As a direct result of constantly updating laboratory methods, there were periods of time when we were not confident in the variant calling until we were sure participant labs had implemented the primer updates. For example, as the virus mutated further in early 2022, multiple “Omicron” lineages were co-circulating, resulting in some lineages only differing by a few loci, further compounded by multiple other mutations evolving under convergent evolution (43). If the sequencing missed one or more of these diagnostic loci due to a now-suboptimal experimental design, we would expect an over-representation of parent lineages, mirrored by under-representation of the true variant(s). To address this problem, we attempted to use the QC flags to communicate how confident we were with our sequencing data.
Despite those challenges, wastewater is an ideal environmental sample to target for this project because it captures pathogen shedding at the population or subpopulation level within a spatially explicit geographic region (sewershed or subsewershed). Unlike well-established WGS-based surveillance systems (3, 4), the SARS-CoV-2 amplicon-based sequencing approach requires no culturing step, shaving days to weeks off the turnaround time from sample collection to acquiring sequencing results. For this reason, this project met two important goals for FDA’s GenomeTrakr program: (i) to contribute timely genomic data for SARS-CoV-2 pandemic response and (ii) to develop capacity and best practices for culture-independent, population-level, environmental surveillance for other pathogens of interest to the FDA, namely, enteric pathogens central to our food safety mission. Incorporating a signal provided through wastewater sampling to the existing U.S. surveillance strategies for enteric pathogens would provide a more complete picture of where pathogens are and are not circulating across the country, enabling more precise scoping of foodborne outbreaks (44–47). The potential for this expansion is currently being explored within the framework of the US National Wastewater Surveillance System (48, 49).
Drawing from our success in managing a laboratory network funded to sequence pure-culture enteric pathogens isolated from environmental and other non-human sources, with NCBI serving as our primary repository (4), we propose the following best practices for employing a comparable distributed laboratory model and utilizing NCBI as the primary repository for the implementation of culture-independent, population-level sequencing of a pathogen from wastewater. (i) Establish a standard data structure, or data object model (DOM), within NCBI (or other repository within the International Nucleotide Sequence Database Collaboration [INSDC]) to capture the sequence data and large suite of contextual data. (ii) Create a custom FAIR contextual data standard that captures relevant sample and sequence metadata, maps to the DOM, and is interoperable with existing INSDC standards. (iii) Define the critical steps within the methods for assessing QC and set thresholds for determining next steps. (iv) Publish version-controlled protocols that cover delineation of sewersheds/subsewersheds, sample collection, laboratory methods, quality control assessment, analysis, and INSDC data submission. (v) Process, sequence, and upload sample data to support timely public health actions (not entirely met by our project, but recommended for future efforts). Lastly, (vi) develop a public dashboard to visualize current data collection and analysis results to serve the needs of the project.
ACKNOWLEDGMENTS
This project was supported, in part, by funding from the American Rescue Plan Act of 2021 and an appointment to the Research Participation Program at the U.S. Food and Drug Administration administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.
We would like to acknowledge John Callahan and CFSAN senior leadership for support on this project; Lili Velez for scientific editing; Sebastian Cianci and the web team for help getting the dashboards published; Justin Payne for advice on maintenance and scalability of software; Amy Kirby and Rory Welsh at CDC’s National Wastewater Surveillance System for collaboration; Arvind Varsan at Arizona State University for early discussions on sequencing methods; Rose Kantor and Stacia Wyman at UC Berkeley for early discussions on sequencing methods; Jay Garland at EPA and Seth A. Faith at Ohio State University for early discussions on sequencing strategy; Volodymyr Tryndyak and Camila Silva from FDA’s National Center for Toxicological Research for collaborative discussions; FDA’s 21Forward team for help with maps to help choose the wastewater sites; Josh Levy at the Scripps Research Institute: La Jolla, CA, for advice using Freyja; and Rick Lapoint, John Anderson, and the NCBI SRA and BioSample teams for handling all our curation requests.
Contributor Information
Ruth E. Timme, Email: Ruth.Timme@fda.hhs.gov.
Christopher W. Marshall, Marquette University, Milwaukee, Wisconsin, USA
The GenomeTrakr Laboratory consortium:
Ward Jacox, Dave Engelthaler, Michael Valentine, Crystal Hepp, David Kiang, Zhirong Li, Ryan Gentry, Mary Ann Hagerman, Mary Robinson, Jesse Knibbs, Madi Asbell, Beth Johnson, Logan Burns, Ashley Aurand-Cravens, Joshua Stacy, Tracy Stiles, Esther Fortes, Matthew Doucette, Brandon Sabina, Luc Gagne, Kelly Binns, Mark Pandori, Andrew Gorzalski, Lauryn Massic, Sarmila Dasgupta, Amar Patil, Apryle Panyi, Edward Acheampong, Thomas Kirn, Nicholas Palmateer, Willis Fedio, Yatziri Preciado, Srikanth Paladugu, Siddhartha Thakur, Lyndy Harden-Plumley, Luke Raymond, Melanie Prarat, Ashley Sawyer, Jonah Perkins, Edward Dudley, Jasna Kovac, Nkuchia M. M’ikanatha, Erin M. Nawrocki, Yezhi Fu, Nyduta Mbogo, Kristin Carpenter-Azevedo, Richard C. Huard, Sean Sierra-Patev, Megan Davis, Laura M. Lane, Christy A. Jeffcoat, Gregory Goodwin, Gabrielle Godfrey, Andrew Smith, Chukwuemika N. Aroh, Kirsti R. Gilmore, Jessica Freeman, Joy Scaria, Jane Hennings, Eric Nelson, Yan Sun, Bonnie Oh, Michael Jost, Bryan Brooks, Laura Langan, Lauren Turner, Stephanie Dela Cruz, Jessica Maitland, Shelby Bennett, Logan Fink, Mary Toothman, Hyunsook Moon, Yong Liu, Mychal Hendrickson, Darren Lucas, Phillip Dykema, Roxanne Meek, Geoff Melly, Paige Sickles, Breanna McArdle, Anneke Jansen, Megan Young, Josh Arbaugh, Zachary Kuhl, and Ewa King
SUPPLEMENTAL MATERIAL
The following material is available online at https://doi.org/10.1128/msystems.01415-23.
ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.
REFERENCES
- 1. World Health Organization. 2021. Weekly epidemiological update - 2 February 2021. Available from: https://www.who.int/publications/m/item/weekly-epidemiological-update---2-february-2021. Retrieved 24 Nov 2022.
- 2. Allard MW, Strain E, Melka D, Bunning K, Musser SM, Brown EW, Timme RE. 2016. Practical value of food pathogen traceability through building a whole-genome sequencing network and database. J Clin Microbiol 54:1975–1983. doi: 10.1128/JCM.00081-16
- 3. Stevens EL, Carleton HA, Beal J, Tillman GE, Lindsey RL, Lauer AC, Pightling A, Jarvis KG, Ottesen A, Ramachandran P, et al. 2022. The use of whole-genome sequencing by the Federal interagency collaboration for genomics for food and feed safety in the United States. J Food Prot 85:755–772. doi: 10.4315/JFP-21-437
- 4. Timme RE, Wolfgang WJ, Balkey M, Venkata SLG, Randolph R, Allard M, Strain E. 2020. Optimizing open data to support one health: best practices to ensure interoperability of genomic data from bacterial pathogens. One Health Outlook 2:20. doi: 10.1186/s42522-020-00026-3
- 5. Bivins A, North D, Ahmad A, Ahmed W, Alm E, Been F, Bhattacharya P, Bijlsma L, Boehm AB, Brown J, et al. 2020. Wastewater-based epidemiology: global collaborative to maximize contributions in the fight against COVID-19. Environ Sci Technol 54:7754–7757. doi: 10.1021/acs.est.0c02388
- 6. Medema G, Heijnen L, Elsinga G, Italiaander R, Brouwer A. 2020. Presence of SARS-Coronavirus-2 RNA in sewage and correlation with reported COVID-19 prevalence in the early stage of the epidemic in the Netherlands. Environ Sci Technol Lett 7:511–516. doi: 10.1021/acs.estlett.0c00357
- 7. Gerrity D, Papp K, Stoker M, Sims A, Frehner W. 2021. Early-pandemic wastewater surveillance of SARS-CoV-2 in Southern Nevada: methodology, occurrence, and incidence/prevalence considerations. Water Res X 10:100086. doi: 10.1016/j.wroa.2020.100086
- 8. Karthikeyan S, Levy JI, De Hoff P, Humphrey G, Birmingham A, Jepsen K, Farmer S, Tubb HM, Valles T, Tribelhorn CE, et al. 2022. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609:101–108. doi: 10.1038/s41586-022-05049-6
- 9. FDA-ORA. 2019. Laboratory flexible funding model cooperative agreement program. Available from: https://grants.nih.gov/grants/guide/pa-files/PAR-20-105.html.
- 10. FDA. 2021. 21 FORWARD: unleashing the power of FDA data to support COVID-19 vaccine distribution to food and agriculture workers. 21 FORWARD. FDA. Available from: https://www.fda.gov/news-events/fda-voices/unleashing-power-fda-data-support-covid-19-vaccine-distribution-food-and-agriculture-workers
- 11. Fontenele RS, Kraberger S, Hadfield J, Driver EM, Bowes D, Holland LA, Faleye TOC, Adhikari S, Kumar R, Inchausti R, et al. 2021. High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulating variants. MedRxiv Prepr Serv Health Sci. doi: 10.1101/2021.01.22.21250320
- 12. Izquierdo-Lara R, Elsinga G, Heijnen L, Munnink BBO, Schapendonk CME, Nieuwenhuijse D, Kon M, Lu L, Aarestrup FM, Lycett S, Medema G, Koopmans MPG, de Graaf M. 2021. Monitoring SARS-CoV-2 circulation and diversity through community wastewater sequencing, the Netherlands and Belgium. Emerg Infect Dis 27:1405–1415. doi: 10.3201/eid2705.204410
- 13. Spurbeck RR, Minard-Smith AT, Catlin LA. 2021. Applicability of neighborhood and building scale wastewater-based genomic epidemiology to track the SARS-CoV-2 pandemic and other pathogens. medRxiv. doi: 10.1101/2021.02.18.21251939
- 14. Crits-Christoph A, Kantor RS, Olm MR, Whitney ON, Al-Shayeb B, Lou YC, Flamholz A, Kennedy LC, Greenwald H, Hinkle A, Hetzel J, Spitzer S, Koble J, Tan A, Hyde F, Schroth G, Kuersten S, Banfield JF, Nelson KL. 2021. Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants. mBio 12:e02703-20. doi: 10.1128/mBio.02703-20
- 15. Timme R. Wastewater protocols within the protocols.io GenomeTrakr workspace. protocols.io. Available from: https://www.protocols.io/workspaces/genometrakr1. Retrieved 11 December 2023.
- 16. Calci K. 2021. Collection from wastewater treatment plant, transportation, and storage of raw wastewater. protocols.io. Available from: 10.17504/protocols.io.bycepste
- 17. Woods J, Rodrigues R. 2021. Virus concentration from wastewater using PEG precipitation and ultracentrifugation. protocols.io. Available from: 10.17504/protocols.io.bx9ipr4e
- 18. Walsky T, Ramachandran P, Windsor A, Hoffmann M, Grim C. 2022. Extraction of total nucleic acid from wastewater using the promega wizard enviro total nucleic acid kit. protocols.io. Available from: 10.17504/protocols.io.4r3l2oebxv1y/v1
- 19. Walsky T, Ramachandran P, Windsor A, Hoffmann M, Grim C. 2022. RNA extraction and quality assessment targeting SARS-CoV-2 from wastewater concentrates using zymo environ water RNA kit. protocols.io. Available from: 10.17504/protocols.io.b3inqkde
- 20. Woods J, Rodrigues R. 2021. RNA extraction from wastewater concentrates using RNeasy and zymo kits. protocols.io. Available from: 10.17504/protocols.io.bygvptw6
- 21. Windsor A, Walsky T, Ramachandran P, Grim C, Hoffmann M. 2022. RT-qPCR of SARS-CoV-2 N1 target on ABI 7500 Fast using Promega GoTaq Enviro Wastewater SARS-CoV-2 System V1. protocols.io. Available from: 10.17504/protocols.io.rm7vzy52xlx1/v1
- 22. Woods J, Rodrigues R. 2021. RT-qPCR detection of SARS-CoV-2 from wastewater using the AB 7500 V.2. protocols.io. Available from: 10.17504/protocols.io.6qpvrdj4bgmk/v2
- 23. Woods J, Rodrigues R. 2021. RT-qPCR detection of process controls (murine norovirus and crAssphage) from wastewater using AB 7500 V.2. protocols.io. Available from: 10.17504/protocols.io.kqdg36j9pg25/v2
- 24. Ramachandran P, Walsky T, Windsor A, Hoffmann M, Grim C. 2022. Enhanced QIAseq DIRECT SARS-CoV-2 kit for Illumina MiSeq V.4. protocols.io. Available from: 10.17504/protocols.io.rm7vzy39rlx1/v4
- 25. Ramachandran P, Walsky T, Windsor A, Hoffmann M, Grim C. 2021. Modified NEBNext® VarSkip short SARS-CoV-2 library prep kit for Illumina platforms - adapted for wastewater samples V.3. protocols.io. Available from: 10.17504/protocols.io.5jyl89n26v2w/v3
- 26. Ramachandran P, Walsky T, Windsor A, Grim C, Hoffmann M. 2022. Modified NEBNext® VarSkip short SARS-CoV-2 enrichment and library prep for Oxford Nanopore technologies - adapted for wastewater samples V.2. protocols.io. Available from: 10.17504/protocols.io.3byl4bwervo5/v2
- 27. Ramachandran P, Walsky T, Windsor A, Hoffmann M, Grim C. 2022. Modified Illumina DNA prep (M) tagmentation library preparation for cDNA amplicons from wastewater. protocols.io. Available from: 10.17504/protocols.io.b34rqqv6
- 28. Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE, et al. 2016. The FAIR guiding principles for scientific data management and stewardship. Sci Data 3:160018. doi: 10.1038/sdata.2016.18
- 29. Timme RE, Karsch-Mizrachi I, Waheed Z, Arita M, MacCannell D, Maguire F, Petit III R, Page AJ, Mendes CI, Nasar MI, Oluniyi P, Tyler AD, Raphenya AR, Guthrie JL, Olawoye I, Rinck G, O’Cathail C, Lees J, Cochrane G, Cummins C, Brister JR, Klimke W, Feldgarden M, Griffiths E. 2023. Putting everything in its place: using the INSDC compliant pathogen data object model to better structure genomic data submitted for public health applications. Microb Genom 9:001145. doi: 10.1099/mgen.0.001145
- 30. NCBI. 2021. BioSample Package: SARS-CoV-2 wastewater surveillance, version 1.0. Available from: https://submit.ncbi.nlm.nih.gov/biosample/template/?organism-organism_name=&organism-taxonomy_id=&package-0=SARS-CoV-2.wwsurv.1.0&action=definition. Retrieved 22 Aug 2023.
- 31. Griffiths EJ, Timme RE, Mendes CI, Page AJ, Alikhan N-F, Fornika D, Maguire F, Campos J, Park D, Olawoye IB, et al. 2022. Future-proofing and maximizing the utility of metadata: The PHA4GE SARS-CoV-2 contextual data specification package. GigaScience 11:giac003. doi: 10.1093/gigascience/giac003
- 32. Griffiths E, Mendes CI, Maguire F, Guthrie J, Chindelevitch L, Karsch-Mizrachi I, Waheed Z, Cameron R, Holt K, Katz L, Petit III R, MacCannell D, Dave M, Oluniyi P, Nasar MI, Raphenya A, Hsiao W, Timme R. 2023. PHA4GE quality control contextual data tags: standardized annotations for sharing public health sequence datasets with known quality issues to facilitate testing and training. Life sciences. doi: 10.20944/preprints202303.0037.v1
- 33. Timme R, Bias C, Balkey M. 2021. NCBI submission protocol for SARS-CoV-2 wastewater data: SRA, BioSample, and BioProject. V10. protocols.io. Available from: 10.17504/protocols.io.ewov14w27vr2/v10
- 34. Kayikcioglu T, Amirzadegan J, Rand H, Tesfaldet B, Timme RE, Pettengill JB. 2023. Performance of methods for SARS-CoV-2 variant detection and abundance estimation within mixed population samples. PeerJ 11:e14596. doi: 10.7717/peerj.14596
- 35. CDC. 2023. Aquascope (1.0.0). Nextflow. Centers for Disease Control and Prevention.
- 36. Amirzadegan J, Kayikcioglu T, Rand H, Timme R, Balkey M. 2022. Wastewater QC workflow in GalaxyTrakr (Ssquawk4) V.9. protocols.io. Available from: 10.17504/protocols.io.kxygxzk5dv8j/v9
- 37. Gangiredla J, Rand H, Benisatto D, Payne J, Strittmatter C, Sanders J, Wolfgang WJ, Libuit K, Herrick JB, Prarat M, Toro M, Farrell T, Strain E. 2021. GalaxyTrakr: a distributed analysis tool for public health whole genome sequence data accessible to non-bioinformaticians. BMC Genomics 22:114. doi: 10.1186/s12864-021-07405-8
- 38. Kayikcioglu T, Amirzadegan J. 2021. Variant abundance estimations from wastewater surveillance study by FDA/CFSAN. FDA-CFSAN (Center for Food Safety and Applied Nutrition).
- 39. Tegally H, Moir M, Everatt J, Giovanetti M, Scheepers C, Wilkinson E, Subramoney K, Makatini Z, Moyo S, Amoako DG, et al. 2022. Emergence of SARS-CoV-2 Omicron lineages BA.4 and BA.5 in South Africa. Nat Med 28:1785–1790. doi: 10.1038/s41591-022-01911-2
- 40. New England Biolabs Inc. 2023. VarSkip multiplex PCR designs for SARS-CoV-2 sequencing. New England Biolabs Inc.
- 41. ARTIC Network. 2023. ARTIC network - SARS-CoV-2 version 5.3.2 primer scheme release. ARTIC real-time genomic surveill. https://community.artic.network/t/sars-cov-2-version-5-3-2-scheme-release/462.
- 42. Pfeifer T, Haendiges J, Balkey M, Timme R. 2022. GenomeTrakr WGS protocol collection and workflow for MiSeq V.2. protocols.io. Available from: 10.17504/protocols.io.3byl4bwyjvo5/v2
- 43. Focosi D, Quiroga R, McConnell S, Johnson MC, Casadevall A. 2023. Convergent evolution in SARS-CoV-2 spike creates a variant soup from which new COVID-19 waves emerge. Int J Mol Sci 24:2264. doi: 10.3390/ijms24032264
- 44. Harvey RWS, Price TH. 1970. Sewer and drain swabbing as a means of investigating salmonellosis. Epidemiol Infect 68:611–624. doi: 10.1017/S0022172400042546
- 45. Diemert S, Yan T. 2019. Clinically unreported salmonellosis outbreak detected via comparative genomic analysis of municipal wastewater salmonella isolates. Appl Environ Microbiol 85:e00139-19. doi: 10.1128/AEM.00139-19
- 46. Sahlström L, de Jong B, Aspan A. 2006. Salmonella isolated in sewage sludge traced back to human cases of salmonellosis. Lett Appl Microbiol 43:46–52. doi: 10.1111/j.1472-765X.2006.01911.x
- 47. Goldblum ZS, M’ikanatha NM, Nawrocki EM, Cesari N, Kovac J, Dudley EG. 2024. Salmonella Senftenberg isolated from wastewater is linked to a 2022 multistate outbreak. medRxiv. doi: 10.1101/2024.02.20.24302949
- 48. Adams C, Bias M, Welsh RM, Webb J, Reese H, Delgado S, Person J, West R, Shin S, Kirby A. 2024. The national wastewater surveillance system (NWSS): from inception to widespread coverage, 2020–2022, United States. Sci Total Environ 924:171566. doi: 10.1016/j.scitotenv.2024.171566
- 49. CDC. 2022. Wastewater surveillance: a new frontier for public health. Available from: https://www.cdc.gov/amd/whats-new/wastewater-surveillance.html. Retrieved 18 Mar 2024.