Intensity of sample processing methods impacts wastewater SARS-CoV-2 whole genome amplicon sequencing outcomes

Shuchen Feng; Sarah M Owens; Abhilasha Shrestha; Rachel Poretsky; Erica M Hartmann; George Wells

doi:10.1016/j.scitotenv.2023.162572

2023 Mar 4;876:162572. doi: 10.1016/j.scitotenv.2023.162572


PMCID: PMC9984232 PMID: 36871720

Abstract

Wastewater SARS-CoV-2 surveillance has been deployed since the beginning of the COVID-19 pandemic to monitor the dynamics of virus burden in local communities. Genomic surveillance of SARS-CoV-2 in wastewater, particularly whole genome sequencing for variant tracking and identification, remains challenging due to low target concentration, complex microbial and chemical background, and the lack of robust experimental procedures for nucleic acid recovery. The intrinsic sample limitations are inherent to wastewater and are thus unavoidable. Here, we use a statistical approach that couples correlation analyses to a random forest-based machine learning algorithm to evaluate potentially important factors associated with wastewater SARS-CoV-2 whole genome amplicon sequencing outcomes, with a specific focus on the breadth of genome coverage. We collected 182 composite and grab wastewater samples from the Chicago area between November 2020 and October 2021. Samples were processed using a mixture of processing methods reflecting different homogenization intensities (HA + Zymo beads, HA + glass beads, and Nanotrap), and were sequenced using one of two library preparation kits (the Illumina COVIDseq kit and the QIAseq DIRECT kit). Technical factors evaluated using statistical and machine learning approaches include sample types, certain sample intrinsic features, and processing and sequencing methods. The results suggested that the sample processing method could be a predominant factor affecting sequencing outcomes, whereas the library preparation kit was considered a minor factor. A synthetic SARS-CoV-2 RNA spike-in experiment was performed to validate the impact of processing methods and suggested that the intensity of the processing methods could lead to different RNA fragmentation patterns, which could also explain the observed inconsistency between qPCR quantification and sequencing outcomes. Overall, extra attention should be paid to wastewater sample processing (i.e., concentration and homogenization) to obtain sufficient, good-quality SARS-CoV-2 RNA for downstream sequencing.

Keywords: Wastewater SARS-CoV-2, Amplicon sequencing, Sample processing methods, RNA fragmentation, Illumina COVIDseq, QIAseq DIRECT


1. Introduction

Since the beginning of the COVID-19 pandemic, wastewater-based epidemiology (WBE) has been applied for the surveillance for SARS-CoV-2 and its variants community-wide (Medema et al., 2020; Bivins et al., 2020; Larsen and Wigginton, 2020; Ahmed et al., 2020a). In September 2020, the Centers for Disease Control and Prevention (CDC) launched the National Wastewater Surveillance System (NWSS) to support the COVID-19 pandemic response in the U.S. (CDC, 2020a). Earlier wastewater surveillance studies mostly focused on tracking the change of viral concentration in the sewersheds by applying reverse-transcription quantitative polymerase chain reaction (RT-qPCR) (Medema et al., 2020; Ahmed et al., 2020a; Wu et al., 2020) or droplet digital polymerase chain reaction (ddPCR) (Feng et al., 2021; Graham et al., 2021). Soon thereafter, genomic surveillance of SARS-CoV-2 was also applied to screen for the presence of SARS-CoV-2 and its variants (Crits-Christoph et al., 2021; Izquierdo-Lara et al., 2021; Nemudryi et al., 2020; Fontenele et al., 2021).

Successful SARS-CoV-2 whole genome sequencing from wastewater has been reported sporadically (Crits-Christoph et al., 2021; Izquierdo-Lara et al., 2021; Nemudryi et al., 2020; Fontenele et al., 2021; Karthikeyan et al., 2022), where SARS-CoV-2 reads were successfully mapped to the reference genome at near-full genome breadth of coverage (e.g., >90 %) and at considerable read depth. Recently, wastewater SARS-CoV-2 sequencing has also focused on targeted region(s), such as the S gene, which contains key mutations for viral evolution and lineage identification (Smyth et al., 2022). On the other hand, mutations in non-spike regions also provide important information about viral replication and transmission (Syed et al., 2021). This indicates the importance of whole genome sequencing of SARS-CoV-2 in wastewater samples, which could reveal multiple mutations, variants and lineages circulating in the local communities, and could also be viewed as complementary evidence for novel or emerging variants and lineages in addition to clinical sample sequencing results reported to public health departments (Mercer and Salit, 2021; CDC, 2021). Despite the reported successes in whole genome and targeted sequencing in the field, sequencing SARS-CoV-2 from wastewater remains very complicated due to significant challenges specific to the sample type (Mercer and Salit, 2021; O'Reilly et al., 2020). These challenges include low target concentration, degraded and/or fragmented RNA, potential PCR inhibitors, and an extremely complex microbial and chemical background, which make RNA yield and quality less than ideal for downstream sequencing. Sample intrinsic features such as fragmented RNA template in wastewater are unavoidable. In fact, reports of fragmented wastewater SARS-CoV-2 sequencing outcomes from whole genome amplification are not uncommon (Izquierdo-Lara et al., 2021; Amman et al., 2022; Jahn et al., 2022); for example, Amman et al. applied a criterion of only 40 % genome breadth of coverage for a wastewater sample to be “passed” for downstream sequence analysis (Amman et al., 2022).

To date, there is no standard method for wastewater SARS-CoV-2 sequencing approaches. Various methods can and have been applied for each single step from sample concentration to sequencing, such as concentration methods based on the solids and/or the liquid portion of wastewater (Graham et al., 2021; Jahn et al., 2022; Forés et al., 2021), extraction methods using a wide variety of commercially available kits with the applications of silica columns or magnetic beads (Palmer et al., 2021), and various library preparation kits with different sequencing primer schemes (e.g., ARTIC primer schemes (Jahn et al., 2022; Bar-Or et al., 2021; Barbé et al., 2022), QIAseq DIRECT kit (Wurtzer et al., 2022), Swift Normalase Amplicon SARS-CoV-2 Panel/IDT xGen SARS-CoV-2 Panel (Fontenele et al., 2021; Karthikeyan et al., 2022)), as well as variations in sequencing platforms including those based on Illumina (Fontenele et al., 2021; Karthikeyan et al., 2022) and Oxford Nanopore Technologies (Barbé et al., 2022). Whether one or several of these strategies contribute specifically to the reported success for wastewater SARS-CoV-2 genome sequencing is unknown.

In this study, we aimed to explore potentially important factors that could impact the outcome of whole genome amplicon sequencing of wastewater SARS-CoV-2 from a technical perspective using statistical approaches. We report our sequencing outcomes (i.e., genome breadth of coverage) from 180 composite and grab samples in the Chicago area that were collected from January to October 2021, with another two samples from November 2020. In total, three sample processing methods (i.e., concentration and homogenization methods) and two sequencing library preparation kits were tested, yielding mostly incomplete and variable genome breadth of coverage. We applied correlation analysis and a random forest-based machine learning algorithm to identify potential contributors to the incomplete sequencing results from a suite of co-varying factors, including certain sample intrinsic parameters and technical aspects. We report details of sequencing results and the identified potentially important features, including 1) sequencing outcomes of different composite and grab samples using two library preparation kits, 2) potential impacts from wastewater sample intrinsic features, including viral concentration (measured using the CDC N1 assay), flow rate, nutrient concentration (indicated by the concentration of ammonia), biochemical oxygen demand (BOD), and suspended solids content, and 3) the potential contribution of sample processing methods with different homogenization intensity levels. We also report and provide a possible explanation for the inconsistency observed between qPCR N1 concentrations and sequencing outcomes in our data through the results of a synthetic SARS-CoV-2 RNA control experiment, where different processing methods were tested for raw samples and water (as control), with and without spike-in of the Twist synthetic SARS-CoV-2 RNA Control.

2. Material and methods

2.1. Sample collection and processing

Flow-weighted weekly 24-h raw composite wastewater treatment plant (WWTP) influent samples and grab samples from local sewers were sent on ice to the School of Public Health, University of Illinois Chicago within 4 h of collection and kept at 4 °C before processing. Composite samples were collected from six WWTPs that serve metropolitan Chicago and its suburbs: Stickney, Terrence J. O'Brien, Calumet, Hanover Park, Kirie and Egan WWTPs. A total of 182 wastewater samples, including 99 composite and 83 grab samples, were collected from the Chicago area between November 2020 and October 2021 at a sampling frequency of once to twice a week, with most samples collected between January and October 2021. Detailed sample information is provided in Supplemental Dataset 1.

Two concentration methods were used, including the HA filtration method and the Nanotrap method. Samples undergoing HA filtration were processed within 4 h of arrival. Those undergoing Nanotrap were processed within 12 h of arrival, as previous studies have demonstrated that storing wastewater samples at 4 °C does not significantly alter SARS-CoV-2 viral signal for up to 15–19 days (Wu et al., 2020; Ahmed et al., 2020b; Beattie et al., 2022); prior to Nanotrap processing, these samples were stored at 4 °C. All sample processing procedures were performed in a biosafety cabinet in a BSL2 laboratory. HA filtration was performed by filtering 25 mL samples through 0.8 μm cellulose ester HA filters (47 mm diameter; MF-Millipore, Carrigtwohill, Ireland). For Nanotrap, 10 mL of raw sewage was mixed with 150 μL of Nanotrap nanobeads (Ceres Nanosciences, Inc., Manassas, VA) in a 50 mL conical tube and mixed by vortexing briefly. The mixture was then incubated at room temperature for 30 min and transferred to a magnetic rack for another 30 min until all nanoparticles were aggregated and settled to the bottom where the magnet was positioned. The supernatant was decanted, and the Nanotrap pellets were then eluted with 650 μL viral lysis buffer (Solution PM1, QIAGEN, Germantown, MD) for downstream extraction.

For HA filter homogenization prior to extraction, bead beating was employed for viral lysis. We tested two types of beads: the ZR BashingBead Lysis tube (Zymo beads; Zymo, Irvine, CA) and the GeneRite pre-loaded beads tube (glass beads; GeneRite, #S0205-50, North Brunswick, NJ). For both bead types, bead beating was performed using a mini beadbeater (Biospec Products, Bartlesville, OK) in two runs of 2.5 min with a 5 min rest at 4 °C (Feng et al., 2021). Detailed information on samples and processing methods is provided in Supplemental Dataset 1.

2.2. Nucleic acid extraction

Three extraction kits were used in this study: (1) AllPrep PowerViral DNA/RNA Kit (herein “PV”; Qiagen, Hilden, Germany), (2) QIAamp Viral RNA Mini Kit (“QIAamp”; Qiagen, Hilden, Germany), and (3) MagMax Viral/Pathogen Total Nucleic Acid Isolation Kit (“MagMax”; Thermo Fisher Scientific, Inc., Waltham, MA, USA). For HA filtration with Zymo beads (HA + Zymo), the PV kit was used (Feng et al., 2021). For HA filtration with glass beads (HA + glass), the QIAamp Kit was used as previously described (Owen et al., 2022). For the Nanotrap method, the MagMax Kit was used. All extractions followed the manufacturer's standard instructions.

We performed several rounds of experiments to compare the extraction kits QIAamp vs. PV and PV vs. MagMax, respectively. For QIAamp and PV, we measured N1 concentrations in three samples collected in February 2021, including two composite samples (from the Stickney sites) and one grab sample (samples not sequenced) (Table S1, Fig. S1). The HA + glass method was used to process the samples for this extraction kit comparison. Additionally, another group processed with Zymo beads and the PV kit was assessed to compare the performance of Zymo and glass beads using the same samples (Fig. S1). Details of biological samples and extractions are provided in Table S1. Our extraction kit comparison results showed that the N1 concentrations (cp/L) obtained from the two kits were comparable (QIAamp vs. PV, Welch's t-test, p-value = 0.107; Fig. S1), and that the average N1 concentrations from the Zymo beads were significantly higher (5-fold) than those from the glass beads (both with the PV kit, Welch's t-test, p-value = 4.96 × 10⁻⁴; Fig. S1). For the PV and MagMax comparison, we extracted five wastewater samples in singleton side by side using the HA + Zymo method, including four composite samples (Stickney and Calumet WWTPs) and one grab sample from July 2021, and quantified N1 concentrations (see Table S2 for details). The N1 concentrations (cp/L) from the two kits were also comparable (Welch's t-test, p-value = 0.235).

Because N1 concentration was significantly higher in extractions with Zymo beads than in those with glass beads, and because under the same bead-beating setting (same bead beater for the same amount of time) the Zymo beads homogenized the filter completely to a paste-like texture while the glass beads left large pieces intact, we consider HA + Zymo a harsher homogenization method than HA + glass or Nanotrap, in which no bead beating was performed. Also, to interpret impacts from the processing methods' features, we considered both the PV and QIAamp extraction kits as “silica column-based” extraction methods in our downstream variable importance selection analysis, and the MagMax kit as a “magnetic beads-based” extraction method. Herein, we refer to the concentration and homogenization methods as the sample “processing methods”, and to the sample processing groups as “HA + Zymo”, “HA + glass” and “Nanotrap”.

2.3. Synthetic SARS-CoV-2 RNA control testing

In addition to processing wastewater samples, we also employed controlled testing with synthetic SARS-CoV-2 RNA to elucidate the influence of sample processing methods on sequencing outcomes. We hypothesized that more intensive processing methods (i.e., homogenization) could have worse recoveries of the free synthetic SARS-CoV-2 RNA in sequencing, and that the wastewater sample context could also affect genome recoveries under the setting of different processing methods.

To test this hypothesis, we spiked 10 μL of 1:10 v/v diluted Twist Synthetic SARS-CoV-2 RNA Control 2 (Twist Bioscience, South San Francisco, CA) into composite samples, which then underwent each sample processing combination. The Twist RNA's concentration was quantified using the CDC N1 assay by testing 1:10 and 1:100 v/v dilutions in duplicate and was determined to be 1.35 × 10⁷ cp/μL N1 (see Section 2.4 for the N1 assay details). Four composite samples were used, including two from the Stickney Water Reclamation Plant (SW) and two from the Terrence J. O'Brien Water Reclamation Plant (OB), sampled on November 2 and November 9, 2021, respectively. Samples were processed in duplicate using the HA + Zymo, HA + glass, and Nanotrap methods with and without the Twist RNA spiked in. The two November 9 samples processed using the Nanotrap method with Twist spike-in were processed in singleton. For controls, Twist RNA was spiked into the same volume of molecular grade water as the samples in each processing method. Briefly, two types of positive controls were made for the HA methods to evaluate the impact of filtration on Twist RNA recovery: Twist RNA spiked into 25 mL molecular grade water followed by HA filtration before bead beating (referred to as “positive control filtered”), and Twist RNA placed directly in the bead-beating tube with a blank filter followed by bead beating (referred to as “positive control not filtered”). For Nanotrap positive controls, Twist RNA was mixed with 10 mL of molecular grade water and processed in the same way as wastewater samples. All positive controls were processed in duplicate except for the “positive control not filtered”, which was processed in singleton. These method groups resulted in a total of 53 extractions for the experiment (see Table S3 for detailed information on the experiment extracts). For a subset of available sample RNA extracts, the sizes of the total RNA were measured using a 5200 Fragment Analyzer System (Agilent, Santa Clara, CA) to understand the fragmentation patterns of extracted total RNA from the three processing methods.
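The spike-in amount implied by the values above can be checked with a short calculation. This is only an illustrative sketch: the function and variable names are ours, and the three inputs are the stock concentration, dilution, and spike volume stated in this section.

```python
# Values reported in the text above
STOCK_CONC_CP_PER_UL = 1.35e7  # Twist RNA stock concentration, N1 assay (cp/uL)
DILUTION_FACTOR = 10           # 1:10 v/v dilution before spiking
SPIKE_VOLUME_UL = 10           # volume of diluted control spiked per sample (uL)


def spiked_copies(stock_conc_cp_per_ul, dilution_factor, volume_ul):
    """Return the total number of N1 copies added to one sample."""
    return stock_conc_cp_per_ul / dilution_factor * volume_ul


total_copies = spiked_copies(STOCK_CONC_CP_PER_UL, DILUTION_FACTOR, SPIKE_VOLUME_UL)
print(f"{total_copies:.2e} N1 copies per spike")
```

The result, 1.35 × 10⁷ copies per spike, matches the "about 10⁷ copies" quoted in Section 3.3.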

2.4. Quantification of SARS-CoV-2 concentration

SARS-CoV-2 concentration was quantified by quantitative reverse transcription PCR (RT-qPCR) using the CDC N1 assay according to the CDC 2019-nCoV Real-Time RT-PCR Diagnostic Panel (CDC, 2020b). Briefly, a total reaction volume of 20 μL was used, including 5 μL of the TaqPath™ 1-Step RT-qPCR Master Mix, CG (Thermo Fisher Scientific, Inc.), 1 μL of primers with a final concentration of 500 nM each, 1 μL of probe with a final concentration of 125 nM, 8 μL of DNase/RNase-free water and 5 μL of template. The amplification program started at 25 °C for 2 min, followed by 50 °C for 15 min and 95 °C for 2 min, then 45 cycles of 95 °C for 3 s and 55 °C for 30 s. A standard curve was established by running five dilutions of transcribed and purified plasmid DNA targets (Integrated DNA Technologies, Inc., Coralville, IA, USA) in triplicate from 5.0 × 10⁵ copies to 50 copies per reaction. The N1 assay standard curve had a slope of −3.189, a y-intercept of 41.805, an R² of 0.999 and an efficiency of 105.9 %.
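As a sketch of how the reported standard-curve parameters relate to each other, the snippet below applies the conventional qPCR relations: efficiency = 10^(−1/slope) − 1, and Ct = slope × log10(copies) + intercept. The function names are ours. Converting copies per reaction to cp/L additionally requires the template, elution, and sample volumes, which are not modeled here.

```python
SLOPE = -3.189      # reported standard-curve slope
INTERCEPT = 41.805  # reported y-intercept (Ct expected at 1 copy per reaction)


def amplification_efficiency(slope):
    """Amplification efficiency as a fraction (1.0 means 100 %)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_per_reaction(ct, slope=SLOPE, intercept=INTERCEPT):
    """Invert the standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


efficiency = amplification_efficiency(SLOPE)
print(f"efficiency = {efficiency * 100:.1f} %")
```

The slope of −3.189 reproduces the reported efficiency of 105.9 %.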

2.5. Library preparation and sequencing

Two library preparation methods and sequencing platforms were used in this study, including i) the QIAseq DIRECT SARS-CoV-2 Kit (Qiagen, Hilden, Germany) on an Illumina MiSeq or HiSeq platform in the Environmental Sample Preparation & Sequencing Facility at Argonne National Laboratory, and ii) the Illumina COVIDSeq Test (RUO Version; Illumina Inc., USA) on an Illumina NextSeq 550 platform performed in the Illinois Department of Public Health (IDPH).

2.6. Metadata collection

Monthly plant operating data for composite samples metadata analysis were collected by each WWTP and accessed from https://apps.mwrd.org/plant_data/OperatingData.aspx. Metadata collected included air temperature (°F), flow rate (million gallons per day, MGD), suspended solids content (mg/L), biochemical oxygen demand (BOD, mg/L), and ammonia nitrogen (mg N/L) on both the sampling day and the day before sampling. For metadata analysis of all 24-h composite samples, geometric means of two days' values were used.
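The two-day geometric mean described above can be sketched as follows. The function name is ours, and the input values in the usage line are hypothetical, not plant data.

```python
import math


def geometric_mean_two_days(day_before, sampling_day):
    """Geometric mean of a parameter measured on the day before and the day of sampling."""
    if day_before <= 0 or sampling_day <= 0:
        raise ValueError("the geometric mean requires positive values")
    return math.sqrt(day_before * sampling_day)


# Hypothetical flow rates (MGD) for two consecutive days
print(round(geometric_mean_two_days(600.0, 700.0), 1))
```

Unlike the arithmetic mean, the geometric mean is less dominated by a single high value, which may be why it suits parameters such as flow rate that can spike on a given day.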

2.7. Sequencing data analysis

Raw data were assessed for reads quality in FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/), followed by adaptor trimming in cutadapt (Martin, 2011) and quality trimming in BBduk (http://jgi.doe.gov/data-and-tools/bb-tools/). Bwa-mem (Li, 2013) was used for paired-end reads mapping to the reference genome of Wuhan-Hu-1 (NCBI RefSeq accession NC_045512.2). Primer trimming was performed with iVar (Castellano et al., 2021) using bed files specific to the QIAseq DIRECT kit and the Illumina COVIDseq kit, respectively. The trimmed bam file was then realigned using bwa-mem and deduplicated using GATK4 (Van der Auwera and O’Connor, 2020) for statistical evaluation and downstream analysis.
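The main outcome metric, genome breadth of coverage, can be illustrated with a minimal sketch that takes per-base depths (for example, as exported from the deduplicated BAM file) and reports the fraction of reference positions covered. The minimum depth threshold is an assumption, since the text does not state the cutoff used, and the function names are ours. The reference length of 29,903 bases is that of NC_045512.2.

```python
GENOME_LENGTH = 29903  # length of the Wuhan-Hu-1 reference NC_045512.2 (bases)


def breadth_of_coverage(depths, genome_length=GENOME_LENGTH, min_depth=1):
    """Fraction of reference positions with depth >= min_depth.

    `depths` holds one depth value per reported position; positions missing
    from the list are treated as uncovered. `min_depth` is an assumed cutoff.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / genome_length


def mean_depth(depths, genome_length=GENOME_LENGTH):
    """Average depth over the whole reference."""
    return sum(depths) / genome_length


# Toy example on a 5-base "genome"
toy = [0, 0, 5, 10, 20]
print(breadth_of_coverage(toy, genome_length=5), mean_depth(toy, genome_length=5))
```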

2.8. Statistical analysis

Spearman's rank correlation coefficients were used as a proxy to indicate the relationship between the metadata and sequencing outcomes. Selection of important variables (i.e., feature selection) was performed using the R package ‘Boruta’ (Kursa and Rudnicki, 2010), which creates randomly shuffled copies of the original features (called shadow features), trains a random forest on the dataset extended with these shadow features, and evaluates the importance of each original feature by comparing it to that of the shadow features (i.e., lower or higher than the importance of ‘Shadow Max’). All statistical analyses were performed using R version 4.0.5 (R Core Team, 2021).
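For readers unfamiliar with the statistic, the following sketch shows what Spearman's rank correlation computes: the Pearson correlation of the rank-transformed values, with tied values given their average rank. It is an illustration in Python using only the standard library, not the R code used in this study, and the helper names are ours.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the statistic captures monotonic relationships, such as breadth of coverage rising with N1 concentration, without assuming linearity.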

2.9. Sequencing data access

All sequencing data used in this manuscript are available via the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under BioProject PRJNA873764.

3. Results

3.1. Incomplete SARS-CoV-2 genomes were recovered from wastewater samples using three sample processing methods and two sequencing library preparation kits

We sequenced 99 composite samples and 83 grab samples collected from the Chicago area from November 2020 to October 2021, all but two of which were collected from January to October 2021 (Table 1; Supplemental Dataset 1). The composite samples were highly variable across collection dates and processing methods (Figs. 1, S2) and had an average N1 concentration of 2.4 × 10⁵ ± 3.5 × 10⁵ copies per liter (cp/L); the grab samples had an average N1 concentration of 4.4 × 10⁵ ± 2.4 × 10⁶ cp/L. Among these wastewater samples, 155 were sequenced using the Illumina COVIDseq kit, 25 were sequenced using the QIAseq DIRECT kit, and two were sequenced using both kits, resulting in a total of 184 RNA samples subjected to sequencing. Grab samples were only sequenced using the Illumina COVIDseq kit (Table 1).

Table 1.

SARS-CoV-2 sequencing outcomes summarized by sample processing methods and library preparation kits.

Processing method	Composite, QIAseq DIRECT kit: genome breadth of coverage (mean ± SD)	Composite, QIAseq DIRECT kit: number of samples sequenced	Composite, Illumina COVIDseq kit: genome breadth of coverage (mean ± SD)	Composite, Illumina COVIDseq kit: number of samples sequenced	Grab, Illumina COVIDseq kit: genome breadth of coverage (mean ± SD)	Grab, Illumina COVIDseq kit: number of samples sequenced
HA + Zymo	25.2 % ± 18.1 %	13	12.7 % ± 4.2 %	10	/	/
HA + glass	/	/	49.7 % ± 28.3 %	59	43.9 % ± 29.4 %	83
Nanotrap	16.9 % ± 14.3 %	14	28.9 % ± 20.0 %	5	/	/


Fig. 1. Comparison of N1 gene copy numbers via RT-qPCR and SARS-CoV-2 genome breadth of coverage via sequencing in composite and grab sewage samples, using the QIAseq DIRECT and Illumina COVIDseq kits. The x-axis represents the N1 concentration in gene copies per liter of sewage sample (cp/L), and the y-axis represents the breadth of coverage observed. The dashed line represents a genome breadth of coverage of 80 %. The dot colors indicate the sample processing methods, the sizes represent the sequencing depth of each sample, and the shapes indicate the library preparation kit used for sequencing. Grab samples with no detection of SARS-CoV-2 are marked as not detected (ND) on the x-axis.

For all 184 sequenced RNA samples, an average of 2.0 M reads was obtained per sample, but sequencing success was highly variable, with an average of 2.0 M ± 0.7 M reads (mean ± SD) from the Illumina COVIDseq kit samples and 1.6 M ± 4.2 M from the QIAseq DIRECT kit samples. We noticed that the percentage of mapped reads differed between the kits, regardless of sample processing method. For Illumina COVIDseq, an average of 9.1 % ± 14.6 % (mean ± SD) of total reads mapped to the reference genome, compared to an average of 88.3 % ± 9.6 % for QIAseq DIRECT. Despite the much lower mapping percentage, the average number of final mapped reads after primer trimming and deduplication was 58.8-fold higher for the Illumina COVIDseq samples than for the QIAseq DIRECT samples. This suggested a high level of PCR duplication with the QIAseq DIRECT kit, which could be due to the low template concentration in wastewater samples and/or the kit's primer scheme, where multiple primer pairs are designed to target the same regions.

For sequencing outcomes, 23 samples (12.5 % of all sequenced samples) yielded genome breadth of coverage >80 %, of which three were >90 %. One sample reached near-full genome breadth of coverage (99.2 %). However, the majority of samples had incomplete genome recoveries, e.g., 63.6 % (n = 117) were below 50 % breadth of coverage. Regarding the sequencing kits, for all Illumina COVIDseq samples, an average breadth of coverage of 43.6 % ± 29.1 % (mean ± SD) was observed at an average depth of 549×. Among these, the HA + glass composite samples had on average the highest breadth of coverage (49.7 % ± 28.3 %); the HA + Zymo samples had a significantly lower breadth of coverage of 12.7 % ± 4.2 % (all composite, Welch's t-test, p-value < 6.09 × 10⁻¹⁴), and the Nanotrap samples had a lower breadth of coverage of 28.9 % ± 20.0 %, although this difference was not statistically significant (all composite, Welch's t-test, p-value = 0.084) (Table 1). These different sequencing outcomes of the three method groups may indicate potential impacts from the processing methods. For the QIAseq DIRECT samples, a breadth of coverage of 20.9 % ± 16.5 % was observed at a much higher average depth of 9910×. This high depth was likely related to preferential amplification of some regions over others across the genome and was correlated with the high PCR duplication observed in samples run with this kit. It should be noted that the overall performance of these two sequencing kits was not directly comparable, as no samples processed with the HA + glass method were available for the QIAseq DIRECT kit. Overall, incomplete genome breadth of coverage was observed for most of the sequenced samples across the different combinations of sample processing and sequencing methods. Detailed sample information and sequencing outcome parameters are listed in Supplemental Dataset 1.
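To make the group comparisons above more concrete, the sketch below computes Welch's t statistic and the Welch–Satterthwaite degrees of freedom from the rounded summary statistics in Table 1 (composite samples, Illumina COVIDseq kit). This is only an approximation under the assumption that these are the groups compared: the reported p-values were computed from the raw data, and the function name is ours.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Table 1, composite samples, Illumina COVIDseq kit (breadth of coverage, %)
t_zymo, df_zymo = welch_t_from_summary(49.7, 28.3, 59, 12.7, 4.2, 10)  # HA + glass vs. HA + Zymo
t_nano, df_nano = welch_t_from_summary(49.7, 28.3, 59, 28.9, 20.0, 5)  # HA + glass vs. Nanotrap
print(f"HA + Zymo: t = {t_zymo:.2f}, df = {df_zymo:.1f}")
print(f"Nanotrap:  t = {t_nano:.2f}, df = {df_nano:.1f}")
```

The HA + Zymo comparison gives a large t statistic on roughly 67 degrees of freedom, whereas the Nanotrap comparison, with only five samples, gives a much smaller statistic on about 5 degrees of freedom. This is in line with the first difference being highly significant and the second not.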

3.2. SARS-CoV-2 whole genome sequencing outcomes could be associated with sample processing methods

To understand the factors affecting SARS-CoV-2 genome recovery, we first evaluated multiple factors including sample types and WWTP reported wastewater sample parameters.

We observed that, when using the same sample processing and sequencing method (i.e., HA + glass, Illumina COVIDseq), our grab samples had a stronger correlation between N1 concentration and breadth of coverage (n = 83, Spearman's rho = 0.697, p-value = 2.43 × 10⁻¹³) than composite samples (n = 59, Spearman's rho = 0.119, p-value = 0.367). These grab samples reached an average breadth of coverage of 43.9 % ± 29.4 % at an average depth of 727× ± 1656×. Sixteen percent of the grab samples had over 80 % breadth of coverage, among which the three highest-concentration samples were >90 %, and one of these reached near-full-genome (>99 %) breadth of coverage (Fig. 1). Also, 24 % of the grab samples showed non-detection in RT-qPCR. For composite samples processed with the same method, an average breadth of coverage of 49.7 % ± 28.3 % at an average depth of 433× ± 650× was obtained (Table 1). Seventeen percent of them had >80 % breadth of coverage, but none reached >90 %. No composite samples showed non-detection (Fig. 1). Our data indicated that the sequencing outcomes (breadth of coverage) of these HA + glass samples were not significantly different between the two sample types when the SARS-CoV-2 concentration was between 10⁴ and 10⁶ cp/L (N1 assay, Welch's t-test, p-value = 0.07). However, all three grab samples with >90 % breadth of coverage had high concentrations (>10⁶ cp/L, N1 assay; Fig. 1), indicating that higher concentration and/or sample type might be related to the near-full-genome breadth of coverage in these HA + glass samples. We note that this observation may not be transferable to samples processed using the other two methods (HA + Zymo and Nanotrap), and that the higher genome breadth of coverage observed could be the result of a combination of factors, e.g., processing methods, sample types and target concentrations.

We also examined the potential impact on the sequencing outcomes of the wastewater samples' intrinsic features, including the SARS-CoV-2 concentration (cp/L, N1 assay), air temperature (°F), flow rate (MGD), suspended solids content (mg/L), BOD (mg/L), and ammonia nitrogen (mg N/L) (Supplemental Dataset 1). We collected all available metadata for the composite samples processed with the HA + glass method (n = 55) and examined their relationships with the samples' genome breadth of coverage using both Spearman's rank correlation test and random forest-based feature selection analysis (i.e., Boruta). Spearman's correlation analysis showed that among all the WWTP-reported metadata, only the flow rate had a weak positive correlation with the breadth of coverage, and this correlation was not statistically significant (Fig. S3A; Spearman's rho = 0.251, p-value = 0.064).

Random forest-based feature selection analysis (R package ‘Boruta’) was used to further evaluate the potentially important variables among the samples' metadata (e.g., flow rate, temperature, BOD, ammonia nitrogen and suspended solids content) and the sample processing, extraction and sequencing methods (i.e., library preparation kits) for the composite samples' sequencing outcomes. Our results showed that only the processing methods (i.e., concentration and homogenization) and library preparation kits were considered important features, as they exceeded the “Shadow Max” value, which represents the highest importance among the shadow features. The processing methods exceeded the Shadow Max value by a much larger margin than the library preparation kit did, indicating the important roles of concentration and homogenization in contributing to sequencing outcomes (Fig. 2). Interestingly, the extraction method (i.e., silica column- or magnetic beads-based) was not considered important in this experimental setting by the machine learning method used (random forest-based feature selection), suggesting that in our scenario, the input RNA quality was most consequential in impacting sequencing outcomes. This is also consistent with the study by Ahmed et al., in which an HA-filtration-based processing method showed higher SARS-CoV-2 N1 concentrations than a Nanotrap-based method when silica column-based extraction kits were used (Ahmed et al., 2023). In addition, feature selection analysis using only the samples' intrinsic features (i.e., technical factors not included) confirmed that none of these features was considered important (Fig. S3B). Overall, this analysis suggested that for composite samples in this study, the processing methods (concentration and homogenization methods) are likely determining factors for genome breadth of coverage. The library preparation kit could also play a role; the extraction method and sample intrinsic features are likely less influential.

Fig. 2. Statistical assessment of the impact of sampling and processing variables on genome breadth of coverage using all composite samples with available metadata. The “Shadow Min”, “Shadow Mean” and “Shadow Max” values indicate the minimum, average, and maximum Z score of the shadow features determined by Boruta, respectively (blue boxes). Features with an importance metric that exceeds the Shadow Max value are considered important (green boxes), and features with importance below the Shadow Max value are deemed not important (red boxes). In this analysis, only processing method (i.e., concentration and homogenization) and library preparation kit are considered important by Boruta.
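The decision rule described in the caption can be sketched in a few lines. This is a deliberately simplified, single-pass illustration: the actual Boruta procedure repeats the comparison over many random forest runs and applies a statistical test before confirming or rejecting a feature. The function name and all importance values below are hypothetical and are not the values plotted in Fig. 2.

```python
def classify_features(importances, shadow_importances):
    """Label each feature by comparing its importance to the best shadow feature."""
    shadow_max = max(shadow_importances)
    return {
        name: "important" if score > shadow_max else "not important"
        for name, score in importances.items()
    }


# Hypothetical Z scores, for illustration only
feature_scores = {
    "processing method": 12.0,
    "library preparation kit": 4.5,
    "extraction method": 1.2,
    "flow rate": 0.8,
}
shadow_scores = [-1.0, 0.3, 2.5]  # hypothetical shadow-feature importances

labels = classify_features(feature_scores, shadow_scores)
for name, label in labels.items():
    print(f"{name}: {label}")
```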

3.3. Potential mechanisms for processing methods impacting sequencing outcomes, elucidated by a synthetic RNA spike-in experiment

Given that the feature selection analysis highlighted the sample processing methods and that their overall sequencing outcomes differed (Table 1, Fig. 2), we hypothesized that the intensity of processing methods could contribute to the sequencing outcomes, mainly by influencing the quality of the RNA yielded and thus impacting the sequencing results. To understand the potential mechanism, we conducted a synthetic SARS-CoV-2 RNA experiment. We spiked about 10⁷ copies of the Twist SARS-CoV-2 synthetic RNA control into composite sewage samples and molecular biology grade water, respectively, and processed them using the HA + Zymo, HA + glass, and Nanotrap methods. The resulting samples were extracted, quantified, and sequenced to evaluate genome breadth of coverage. In parallel, we examined the size patterns of total RNA in available RNA extracts from the three processing methods on a Fragment Analyzer.

From the RT-qPCR results (N1 assay), the Twist-spiked sample groups had higher N1 concentrations than their no spike-in pairs (Fig. 3). On average, the N1 concentration of the Twist-spiked samples was 1.2-fold that of the no spike-in group for Nanotrap, 5.0-fold for HA + glass, and 13.1-fold for HA + Zymo. This suggested that all three processing methods recovered Twist RNA to some extent; the intensive HA + Zymo method had the highest capacity, and the Nanotrap method the lowest, for recovering the free Twist RNA. Notably, all three methods had on average very low recovery of Twist RNA in samples: when comparing the total N1 concentration of the spiked samples to the amount of Twist RNA originally spiked in, the HA + Zymo group had a total N1 equivalent to 2.21 % of the spiked amount, HA + glass 0.17 %, and Nanotrap only 0.05 %. As in our previous observation of N1 concentrations in composite samples (Fig. S2), the HA + Zymo method had the highest N1 concentration in samples both with and without Twist RNA (Fig. 3), indicating its higher capacity for releasing and/or capturing the N1 target from the filter membranes, likely through complete disintegration of the filters. This also agreed with the results showing that samples processed with Zymo beads had >5-fold higher N1 concentrations than those processed with glass beads (Fig. S1). Interestingly, although the HA + glass group had lower N1 concentrations in samples than the HA + Zymo group, it showed 1.7 orders of magnitude higher N1 concentration in the positive controls (not filtered), corresponding on average to 19.2 % recovery of the original spike-in, the highest recovery of Twist RNA among all positive controls. This indicated that the two bead-beating methods behaved differently in recovering nucleic acids from different sample contexts, i.e., from a mixture of sample-derived and free RNA versus free RNA alone.
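The recovery figures in this paragraph rest on simple ratios, sketched below in Python. The helper names are hypothetical, and the spike-in amount is taken as exactly 10⁷ copies even though the text gives it only approximately, so the implied copy numbers are rough illustrations rather than measured values.

```python
def percent_recovery(measured_copies, spiked_copies):
    """Measured target expressed as a percentage of the spiked-in amount."""
    return 100.0 * measured_copies / spiked_copies


def fold_change(spiked_sample_conc, unspiked_sample_conc):
    """Ratio of the spiked sample concentration to its no spike-in pair."""
    return spiked_sample_conc / unspiked_sample_conc


def orders_of_magnitude_to_fold(orders):
    """Convert a difference in log10 units into a fold difference."""
    return 10.0 ** orders


SPIKED_COPIES = 1e7  # assumption: the "about 10^7 copies" stated in the text

# Copy numbers implied by the percent recoveries reported in the text.
reported_percent = {"HA + Zymo": 2.21, "HA + glass": 0.17, "Nanotrap": 0.05}
implied_copies = {
    method: SPIKED_COPIES * pct / 100.0 for method, pct in reported_percent.items()
}
for method, copies in implied_copies.items():
    print(f"{method}: ~{copies:.0f} N1 copies recovered")

# A difference of 1.7 orders of magnitude corresponds to roughly a 50-fold difference.
print(f"1.7 orders of magnitude = {orders_of_magnitude_to_fold(1.7):.1f}-fold")
```

Because the total N1 signal in a spiked sewage sample also includes endogenous SARS-CoV-2 RNA, these percentages should be read as upper-bound style comparisons to the spiked amount, as the text itself notes.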

We then sequenced all the extracts from the synthetic RNA test (Fig. 4). As expected, the HA + Zymo positive controls had much lower genome breadth of coverage (47.6 % ± 14.6 %) than the HA + glass or Nanotrap positive controls (both >95 %). The near-full-genome breadth of coverage in the latter two sets of positive controls also confirmed that the incomplete genome recovery in the other groups was not due to a deficiency of the spike-in synthetic RNA. In addition, the mean coverage depth of the HA + Zymo positive controls was 549×, whereas the HA + glass positive controls reached >15,300× and the Nanotrap positive controls reached 7800×. It is particularly worth noting that the Nanotrap positive controls had >10-fold higher depth than the HA + Zymo positive controls despite their much lower N1 concentration (i.e., more than one order of magnitude lower, Fig. 3). These observations supported our hypothesis that the HA + Zymo method, which homogenizes the filters more harshly, yields RNA templates less suitable for sequencing than the other two methods despite its much higher N1 concentration. To confirm this, we examined the fragmentation patterns of total RNA in extracts derived from the three processing methods (Fig. S4). Because the total RNA concentrations of the positive controls were too low for the Fragment Analyzer, we used available raw samples extracted using the three methods. The size distributions of total RNA differed among the processing methods. Samples in the HA + Zymo group were highly fragmented despite much higher total RNA concentrations (i.e., no clear bands in the gel images and no sharp peaks in the traces). In contrast, some samples in the HA + glass and Nanotrap groups, particularly the two HA + glass samples from the Terrence J. O'Brien site, had clear bands in the gel image (Fig. S4A; bands shown in dark color) and sharp peaks in the traces at ~2000 bp (Fig. S4B; marked with arrows), although their total RNA concentrations were lower than those of the HA + Zymo group. This confirmed that the three methods yielded RNA with different fragmentation patterns. Taken together, these observations suggested that, among the technical features assessed in our study, the size of the extracted RNA fragments, which differed among the three processing methods, could be a key factor impacting sequencing outcomes.
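The two sequencing metrics compared above, breadth of coverage and mean depth, can diverge, which a short Python sketch makes concrete. The minimum-depth threshold of 10 reads and the toy depth profile are assumptions for illustration; the threshold actually used in this study is defined in the methods and is not restated here.

```python
def breadth_of_coverage(depths, min_depth=10):
    """Percent of genome positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def mean_depth(depths):
    """Average read depth across all genome positions."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


# Toy per-position depth profile for a 10-position "genome" (hypothetical values).
toy_depths = [0, 0, 5, 12, 40, 800, 1500, 30, 9, 10]

print(f"breadth: {breadth_of_coverage(toy_depths):.1f} %")
print(f"mean depth: {mean_depth(toy_depths):.1f}x")
```

In this toy profile a few deeply sequenced positions give a mean depth in the hundreds while only 60 % of positions reach the threshold, which is why the text reports breadth and depth separately.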

Regarding sequencing outcomes of the samples, all three sample groups spiked with Twist had overall 1.4 ± 0.3-fold (mean ± SD) higher genome breadth of coverage and 6.5 ± 6.1-fold higher N1 concentration than their paired groups without Twist (Table S4), again indicating that the spiked-in Twist RNA was recovered to some extent by all methods. The Nanotrap group had the lowest percentage of genome recovered in samples, which was consistent with the qPCR results and could be due to the nucleic acid adsorption capacity of this method. The HA + glass and HA + Zymo groups showed overall similar mean genome breadth of coverage in sewage samples with Twist (65.7 % and 70.8 %, respectively) from the two biological samples. However, the HA + glass group had on average 12-fold lower N1 concentration than the HA + Zymo group (Fig. 3, Fig. 4; Table S4). This means that the less intensive processing method achieved a similar genome recovery at a much lower template concentration (as indicated by RT-qPCR), and again points to an inconsistency between qPCR quantification and genome recovery by sequencing. This method-driven inconsistency between qPCR and sequencing results was also supported by the total RNA size measurements: N1 concentrations correlated with total RNA concentrations regardless of the quality of the extracted RNA (Fig. S4A, Spearman's rho = 0.718, p = 0.017), and the HA + Zymo group had the highest total RNA concentration. In addition, both the HA + glass and Nanotrap samples with Twist spike-in had lower genome breadth of coverage than their positive controls, indicating that with these methods the sample matrix could adversely affect RNA recovery compared to when only free RNA was present. In contrast, in the HA + Zymo group, the Twist spike-in samples had 1.5-fold higher average genome breadth of coverage than the positive control samples, and 1.7-fold higher than the sewage-only samples, indicating that the sample matrix and the processing approach could help preserve the spiked-in free RNA in this more intensive method. More studies are needed to better understand how the wastewater matrix affects the quality of extracted RNA under different processing methods, and thereby sequencing outcomes.
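The rank correlation quoted above (Spearman's rho between N1 and total RNA concentrations) can be sketched in plain Python as the Pearson correlation of average ranks. This is a minimal illustration, not the analysis code used in this study; the function names are hypothetical, the five data pairs are invented, and no p-value is computed.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired measurements (not data from this study).
n1_conc = [1e3, 5e3, 2e4, 8e4, 3e5]      # N1 copies per litre
total_rna = [2.0, 1.5, 6.0, 9.0, 20.0]   # total RNA concentration, arbitrary units

rho = spearman_rho(n1_conc, total_rna)
print(f"Spearman's rho = {rho:.3f}")
```

Because the statistic depends only on ranks, it captures whether N1 and total RNA rise together without assuming a linear relationship, and it says nothing about fragment size.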

4. Discussion

Wastewater-based epidemiology (WBE) has been demonstrated to be a useful and successful tool for community-wide surveillance of SARS-CoV-2 genetic signals (Bivins et al., 2020; Larsen and Wigginton, 2020; CDC, 2020a). Changes in SARS-CoV-2 genetic signals over time can show the overall local trend of COVID-19 infection, providing public health information complementary to clinical diagnostic tests (Bivins et al., 2020; Larsen and Wigginton, 2020; CDC, 2020a). The most widely adopted WBE approach to date is RT-qPCR/ddPCR/dPCR, which reports the level of SARS-CoV-2 detected (CDC, 2020a; CDC, 2020c). Further, with the application of variant-specific assays, quantification of specific variants has also been realized (Yu et al., 2022). While quantification-based methods provide useful information on SARS-CoV-2 variants already known to be circulating in a community, whole genome sequencing of SARS-CoV-2 from wastewater is attractive because it not only provides higher resolution of the known circulating variants, but also enables broad identification of novel mutations within the context of the whole genome, and therefore contributes to understanding viral evolution and/or transmission within sewersheds. However, sequencing SARS-CoV-2 from the wastewater matrix is challenging due to issues such as low RNA concentration, degraded RNA, and complex microbial background (Mercer and Salit, 2021; O'Reilly et al., 2020).

Our study provides evidence that wastewater sample processing methods, including concentration and homogenization procedures, could impact the quality of RNA used for downstream sequencing and lead to an inconsistency between RT-qPCR measurements and sequencing outcomes. We chose HA filtration, which requires bead beating, and Nanotrap, which involves no intensive sample mixing, to explore the impact of processing intensity on sequencing results. The SARS-CoV-2 N1 gene concentrations of the wastewater samples were positively correlated with the extracted total RNA concentration and with the intensity of the processing method (Figs. 3, S2, S4), indicating that the HA + Zymo method released more N1 target than the HA + glass and Nanotrap methods, likely through its complete homogenization. However, this same processing method did not outperform, and in some cases performed worse than, the other two in sequencing outcomes (breadth of coverage) (Fig. 1, Fig. 4). This suggested an inconsistency between the RT-qPCR measurement of a single gene and the sequencing outcomes, which require the amplification of ~400 bp amplicons across the whole genome (e.g., Illumina COVIDseq). This inconsistency was also observed in our Fragment Analyzer results, where the HA + Zymo group had the highest N1 and total RNA concentrations yet showed no clear evidence of detectable large RNA fragments (Fig. S4). Therefore, the fragmented RNA templates released by the intensive processing method (HA + Zymo) very likely caused the incomplete genome breadth of coverage.
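One way to see why a short RT-qPCR target can remain abundant while ~400 bp sequencing amplicons fail is a simple random-breakage model, sketched below in Python. This model is an assumption introduced for illustration and was not part of this study: if breaks occur independently along the RNA with a mean fragment length of λ bases, the probability that a stretch of L bases contains no break is exp(−L/λ). The N1 amplicon length of about 72 bp and the mean fragment lengths are likewise illustrative assumptions; mechanical and heat-induced fragmentation need not be random.

```python
import math


def intact_fraction(amplicon_length_bp, mean_fragment_length_bp):
    """Probability that a target of the given length contains no break,
    assuming breaks follow a Poisson process along the molecule."""
    if amplicon_length_bp < 0 or mean_fragment_length_bp <= 0:
        raise ValueError("lengths must be positive")
    return math.exp(-amplicon_length_bp / mean_fragment_length_bp)


QPCR_AMPLICON_BP = 72   # assumption: approximate length of the N1 qPCR amplicon
SEQ_AMPLICON_BP = 400   # approximate tiled amplicon length quoted in the text

# Hypothetical mean fragment lengths, from mildly to heavily fragmented RNA.
for mean_len in (2000, 500, 150):
    qpcr = intact_fraction(QPCR_AMPLICON_BP, mean_len)
    seq = intact_fraction(SEQ_AMPLICON_BP, mean_len)
    print(
        f"mean fragment {mean_len} bp: "
        f"qPCR target intact {qpcr:.2f}, sequencing amplicon intact {seq:.2f}"
    )
```

Under these assumptions the amplifiable fraction of the longer amplicon falls much faster than that of the short qPCR target as fragmentation increases, which is in line with the inconsistency described above.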

Our results from composite sample sequencing indicated that with the HA + glass method, 17 % of samples reached >80 % breadth of coverage. The less intensive bead beating method could have released longer RNA template fragments, while the bead beating intensity, along with the heat produced, in the more intensive HA + Zymo method could lead to more fragmented RNA templates. We did observe that the three grab samples with near-full breadth of coverage were associated with higher SARS-CoV-2 concentrations (>10⁶ N1 cp/L) (HA + glass, Fig. 1). Here, the intrinsic differences between composite and grab samples (e.g., residence time) cannot be ignored, and the near-full genome recoveries in these samples could be a combined effect of sample type, target concentration, and appropriate processing method. Additionally, the observed inconsistency between RT-qPCR and sequencing outcomes in the intensive method indicates that the selection of wastewater samples for sequencing should not depend solely on the concentration quantified by RT-qPCR, as it does not provide enough information about overall RNA quality.

To maximize the efficacy of wastewater SARS-CoV-2 quantitative and genomic surveillance, it is essential to use a sample processing method that yields sufficient RNA template without negatively impacting RNA quality (i.e., over-fragmentation). For example, automated RNA extraction was reported to give better recovery of RNA from human influenza virus and respiratory syncytial virus than manual extraction (Yang et al., 2011). Such automated extraction usually also incorporates sample processing steps (e.g., homogenization), and has already been adopted by a successful wastewater genomic surveillance study (e.g., the KingFisher Flex system) (Karthikeyan et al., 2022) as well as other quantitative surveillance studies (Palmer et al., 2021; Karthikeyan et al., 2021). These automated systems are also high-throughput, making them well suited for surveillance. Interestingly, in our study, the RNA extraction kits used were not identified as impacting sequencing outcomes (via Boruta feature selection), suggesting the importance of the input RNA templates from upstream sample processing. This agrees with the report of Qiu et al., who compared five commercial RNA isolation kits, including the three used in our study (QIAamp, MagMax, and PowerViral, which shares the same reagents as the PowerMicrobiome kit tested by Qiu et al.), and found comparable recovery rates for their surrogate, human coronavirus 229E (Qiu et al., 2022).

We also explored the impacts of intrinsic wastewater sample features on sequencing outcomes (e.g., flow rate, solids content, chemicals, temperature, nutrients; Fig. S3). Results from both the random forest-based feature selection analysis and the Spearman's rank correlation analysis suggest that the wastewater parameters monitored over the course of this study were mostly not significantly correlated with sequencing outcomes. This result agrees with Kevill et al.'s finding that turbidity and temperature of wastewater samples are not significant factors affecting SARS-CoV-2 genome recovery (Kevill et al., 2022). Flow rate showed a weak positive correlation with sequencing breadth of coverage. Flow rate is generally negatively correlated with residence time (Nauman, 2003); it is therefore possible that a higher flow rate is associated with better SARS-CoV-2 genome recovery in sequencing, provided the viral concentration is not diluted (e.g., by rainfall events) to the point of impacting sequencing outcomes. In addition, other factors in the sewer environment, such as sewer biofilms, which have been reported to contribute to the decay of human coronavirus (Shi et al., 2022), could contribute to more RNA degradation as residence time increases. The shorter residence time of the grab samples (<24 h) compared to the composite samples (1–3 days) could also have contributed to the near-full-genome sequencing results in the high N1 concentration grab samples (n = 3, >90 %; Fig. 1). Overall, these observations support the idea that longer residence time could lead to decayed RNA in wastewater samples, impacting downstream RNA extraction and amplicon sequencing. For future studies of wastewater SARS-CoV-2 whole genome sequencing, reporting sample processing methods together with metadata on sample concentration and WWTP-reported parameters would help to further understand the impacts of both technical and environmental factors. Additional studies from multiple geographical regions are also needed to understand how environmental factors in sewers affect SARS-CoV-2 whole genome sequencing outcomes in wastewater samples.

The approach we used combined a random forest-based machine learning method (Boruta) with correlation analysis, and is well suited to datasets with many co-varying features. Environmental samples usually come with complex intrinsic features that could affect downstream sample processing; for example, characteristics of wastewater influent samples can vary on a day-to-day basis and among different WWTPs. This adds a layer of difficulty in analyzing the factors affecting the final sequencing outcomes. To understand the contributing factors across a range of possibilities, from wastewater intrinsic parameters to technical processing, an assessment of “all relevant” features (Kursa and Rudnicki, 2010), both numeric and categorical, is needed. Non-linear relationships and interactions between variables also need to be considered. Therefore, Boruta is a suitable tool for feature selection. Our Boruta analysis provided strong evidence for the likely impact of the intensity of sample processing methods on sequencing outcomes, and laid the foundation for future controlled optimization work (e.g., side-by-side method comparisons using the same samples) for SARS-CoV-2 wastewater whole genome sequencing.
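The shadow features that Boruta compares against (Kursa and Rudnicki, 2010) are shuffled copies of the real features, which keeps each feature's distribution but breaks any link to the outcome. The Python sketch below illustrates only this construction step for a mix of numeric and categorical columns; it is not the R implementation used in this study, and the function name, column names, and values are hypothetical.

```python
import random


def add_shadow_features(rows, seed=0):
    """Return copies of the sample records with one shuffled 'shadow_' column
    added per original column.

    rows: list of dicts, one per sample, all sharing the same keys.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    augmented = [dict(row) for row in rows]
    for name in list(rows[0].keys()):
        column = [row[name] for row in rows]
        rng.shuffle(column)  # permute values across samples
        for new_row, value in zip(augmented, column):
            new_row["shadow_" + name] = value
    return augmented


# Hypothetical sample metadata (not data from this study).
samples = [
    {"processing_method": "HA + Zymo", "flow_rate": 310.0},
    {"processing_method": "HA + glass", "flow_rate": 295.5},
    {"processing_method": "Nanotrap", "flow_rate": 402.1},
    {"processing_method": "HA + glass", "flow_rate": 350.0},
]

augmented = add_shadow_features(samples, seed=42)
for row in augmented:
    print(row)
```

A random forest would then be trained on the real and shadow columns together, and a real feature is retained only if its importance is consistently higher than that of the best shadow feature.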

The goal of this study was to identify dominant factors affecting SARS-CoV-2 whole genome sequencing outcomes in wastewater samples with complex environmental and technical factors. It is important to acknowledge the limitations of this study and the future work they warrant. First, only a limited number of biological samples were used to validate the RNA fragmentation patterns from the three processing methods. Moreover, the 184 sequenced samples were not analyzed side-by-side using the three processing methods, and the number of samples processed with each method varied. This is mainly due to practical limitations on manpower and facility/equipment availability throughout the project period, as the group was dedicated simultaneously to sequencing method optimization and to weekly genomic and quantitative surveillance in the Chicago area. Second, our results strongly suggest that sample processing methods, rather than the several intrinsic sample features we tracked, have an important impact on sequencing outcomes. This conclusion is based on a statistical analysis (random forest-based feature selection), and is supported by our experimental observation that the sample processing methods could impact the RNA quality (i.e., fragmentation) for sequencing (Figs. 3, S4). Future work is warranted to more firmly establish a causal relationship between sample processing and sequencing outcomes via side-by-side comparison under controlled conditions (e.g., in the absence of variation in wastewater sample intrinsic features and with a large sample size). Careful comparison of extraction kit efficacy should also be performed in future studies. In addition, more replicates of sample extractions (e.g., triplicates) under controlled conditions could strengthen the conclusions obtained from our validation experiment and should be used in future studies.

5. Conclusions

In this study, we explored the impacts of sample types, wastewater sample intrinsic features, sample processing methods, and sequencing library preparation methods on wastewater SARS-CoV-2 amplicon sequencing outcomes. Using a statistical approach, we identified the sample processing method as a factor that likely plays a major role in incomplete genome recovery downstream and could contribute to the inconsistency between RT-qPCR concentrations and sequencing outcomes. Appropriate sample processing methods are important for obtaining RNA of sufficient quantity and quality for downstream sequencing of wastewater SARS-CoV-2. More studies of different techniques under controlled conditions, and of wastewater metadata from other geographical regions, are needed to standardize successful whole genome sequencing in the field.

The following are the supplementary data related to this article.

Supplementary materials

mmc1.pdf (2.4 MB, PDF)

Supplemental Dataset 1

Detailed information of the 184 wastewater RNA samples sequenced in this study.

mmc2.xlsx (33.1 KB, XLSX)

CRediT authorship contribution statement

Shuchen Feng: Conceptualization, Data curation, Investigation, Methodology, Software, Formal analysis, Visualization, Writing – original draft, Writing – review & editing. Sarah M. Owens: Data curation, Investigation, Methodology, Resources. Abhilasha Shrestha: Data curation, Methodology, Resources. Rachel Poretsky: Funding acquisition, Project administration, Resources. Erica M. Hartmann: Conceptualization, Funding acquisition, Project administration, Supervision. George Wells: Conceptualization, Funding acquisition, Project administration, Supervision, Methodology, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This study was funded by the Walder Foundation Chicago Coronavirus Assessment Network (Chicago CAN). We thank Dr. Ira Heimler at the Illinois Department of Public Health (IDPH) for performing timely sequencing work using the Illumina COVIDseq kit for this study. We thank Stephanie Greenwald at the Environmental Sample Preparation and Sequencing Facility, Argonne National Laboratory for assistance with the QIAseq DIRECT SARS-CoV-2 kit sequencing work and Fragment Analyzer analysis for this study.

Editor: Warish Ahmed

Data availability

Data has been uploaded in the "Attach file" step.

References

Ahmed W., Angel N., Edson J., et al. First confirmed detection of SARS-CoV-2 in untreated wastewater in Australia: a proof of concept for the wastewater surveillance of COVID-19 in the community. Sci. Total Environ. 2020;728 doi: 10.1016/j.scitotenv.2020.138764.
Ahmed W., Bertsch P.M., Bibby K., et al. Decay of SARS-CoV-2 and surrogate murine hepatitis virus RNA in untreated wastewater to inform application in wastewater-based epidemiology. Environ. Res. 2020;191 doi: 10.1016/j.envres.2020.110092.
Ahmed W., Bivins A., Korajkic A., Metcalfe S., Smith W.J.M., Simpson S.L. Comparative analysis of Adsorption-Extraction (AE) and Nanotrap® Magnetic Virus Particles (NMVP) workflows for the recovery of endogenous enveloped and non-enveloped viruses in wastewater. Sci. Total Environ. 2023;859 doi: 10.1016/j.scitotenv.2022.160072.
Amman F., Markt R., Endler L., et al. Viral variant-resolved wastewater surveillance of SARS-CoV-2 at national scale. Nat. Biotechnol. 2022 doi: 10.1038/s41587-022-01387-y. Published online.
Barbé L., Schaeffer J., Besnard A., et al. SARS-CoV-2 whole-genome sequencing using Oxford nanopore technology for variant monitoring in wastewaters. Front. Microbiol. 2022;13(June):1–14. doi: 10.2139/ssrn.4028274.
Bar-Or I., Weil M., Indenbaum V., et al. Detection of SARS-CoV-2 variants by genomic analysis of wastewater samples in Israel. Sci. Total Environ. 2021;789 doi: 10.1016/j.scitotenv.2021.148002.
Beattie R.E., Blackwood A.D., Clerkin T., Dinga C., Noble R.T. Evaluating the impact of sample storage, handling, and technical ability on the decay and recovery of SARS-CoV-2 in wastewater. PLoS One. 2022;17(6) doi: 10.1371/journal.pone.0270659.
Bivins A., North D., Ahmad A., et al. Wastewater-based epidemiology: global collaborative to maximize contributions in the fight against COVID-19. Environ. Sci. Technol. 2020;54(13):7754–7757. doi: 10.1021/acs.est.0c02388.
Castellano S., Cestari F., Faglioni G., et al. Ivar, an interpretation-oriented tool to manage the update and revision of variant annotation and classification. Genes (Basel) 2021;12(3) doi: 10.3390/genes12030384.
CDC. National Wastewater Surveillance System: a new public health tool to understand COVID-19’s spread in a community. 2020. https://www.cdc.gov/healthywater/surveillance/wastewater-surveillance/wastewater-surveillance.html
CDC. Real-Time RT-PCR Diagnostic Panel for Emergency Use Only. Vol 3. 2020. https://www.fda.gov/media/134922/download
CDC. Wastewater Surveillance Testing Methods. 2020. https://www.cdc.gov/healthywater/surveillance/wastewater-surveillance/testing-methods.html
CDC. Guidance for Reporting SARS-CoV-2 Sequencing Results. 2021. https://www.cdc.gov/coronavirus/2019-ncov/lab/resources/reporting-sequencing-guidance.html
Crits-Christoph A., Kantor R.S., Olm M.R., et al. Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants. MBio. 2021;12(1):1–9. doi: 10.1128/mBio.02703-20.
Feng S., Roguet A., McClary-Gutierrez J.S., et al. Evaluation of sampling, analysis, and normalization methods for SARS-CoV-2 concentrations in wastewater to assess COVID-19 burdens in Wisconsin communities. ACS ES&T Water. 2021;1(8):1955–1965. doi: 10.1021/acsestwater.1c00160.
Fontenele R.S., Kraberger S., Hadfield J., et al. High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulating variants. Water Res. 2021;205 doi: 10.1016/j.watres.2021.117710.
Forés E., Bofill-Mas S., Itarte M., et al. Evaluation of two rapid ultrafiltration-based methods for SARS-CoV-2 concentration from wastewater. Sci. Total Environ. 2021;768 doi: 10.1016/j.scitotenv.2020.144786.
Graham K.E., Loeb S.K., Wolfe M.K., et al. SARS-CoV-2 RNA in wastewater settled solids is associated with COVID-19 cases in a large urban sewershed. Environ. Sci. Technol. 2021;55(1):488–498. doi: 10.1021/acs.est.0c06191.
Izquierdo-Lara R., Elsinga G., Heijnen L., et al. Monitoring SARS-CoV-2 circulation and diversity through community wastewater sequencing, the Netherlands and Belgium. Emerg. Infect. Dis. 2021;27(5):1405–1415. doi: 10.3201/eid2705.204410.
Jahn K., Dreifuss D., Topolsky I., et al. Early detection and surveillance of SARS-CoV-2 genomic variants in wastewater using COJAC. Nat. Microbiol. 2022;7(8):1151–1160. doi: 10.1038/s41564-022-01185-x.
Karthikeyan S., Nguyen A., McDonald D., et al. Rapid, large-scale wastewater surveillance and automated reporting system enable early detection of nearly 85% of COVID-19 cases on a university campus. mSystems. 2021;6(4) doi: 10.1128/mSystems.00793-21.
Karthikeyan S., Levy J.I., De Hoff P., et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature. 2022 doi: 10.1038/s41586-022-05049-6. Published online.
Kevill J.L., Pellett C., Farkas K., et al. A comparison of precipitation and filtration-based SARS-CoV-2 recovery methods and the influence of temperature, turbidity, and surfactant load in urban wastewater. Sci. Total Environ. 2022;808(January) doi: 10.1016/j.scitotenv.2021.151916.
Kursa M.B., Rudnicki W.R. Feature selection with the boruta package. J. Stat. Softw. 2010;36(11):1–13. doi: 10.18637/jss.v036.i11.
Larsen D.A., Wigginton K.R. Tracking COVID-19 with wastewater. Nat. Biotechnol. 2020;38(10):1151–1153. doi: 10.1038/s41587-020-0690-1.
Li H. Aligning Sequence Reads, Clone Sequences and Assembly Contigs With BWA-MEM. 2013. pp. 1–3. http://arxiv.org/abs/1303.3997
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal. 2011;17(1):10–12. doi: 10.14806/ej.17.1.200.
Medema G., Heijnen L., Elsinga G., Italiaander R., Brouwer A. Presence of SARS-Coronavirus-2 RNA in sewage and correlation with reported COVID-19 prevalence in the early stage of the epidemic in the Netherlands. Environ. Sci. Technol. Lett. 2020 doi: 10.1021/acs.estlett.0c00357. Published online.
Mercer T.R., Salit M. Testing at scale during the COVID-19 pandemic. Nat. Rev. Genet. 2021;22(7):415–426. doi: 10.1038/s41576-021-00360-w.
Nauman E.B. In: Handbook of Industrial Mixing: Science and Practice. Paul Edward L., Atiemo-Obeng SMK Victor A., editors. 2003. pp. 1–17.
Nemudryi A., Nemudraia A., Wiegand T., et al. Temporal detection and phylogenetic assessment of SARS-CoV-2 in municipal wastewater. Cell Rep. Med. 2020;1(6) doi: 10.1016/j.xcrm.2020.100098.
O'Reilly K.M., Allen D.J., Fine P., Asghar H. The challenges of informative wastewater sampling for SARS-CoV-2 must be met: lessons from polio eradication. Lancet Microbe. 2020;1(5):e189–e190. doi: 10.1016/S2666-5247(20)30100-2.
Owen C., Wright-Foulkes D., Alvarez P., et al. Reduction and discharge of SARS-CoV-2 RNA in Chicago-area water reclamation plants. FEMS Microbes. 2022;3 doi: 10.1093/femsmc/xtac015.
Palmer E.J., Maestre J.P., Jarma D., et al. Development of a reproducible method for monitoring SARS-CoV-2 in wastewater. Sci. Total Environ. 2021;799(149405) doi: 10.1016/j.scitotenv.2021.149405.
Qiu Y., Yu J., Pabbaraju K., et al. Validating and optimizing the method for molecular detection and quantification of SARS-CoV-2 in wastewater. Sci. Total Environ. 2022;812 doi: 10.1016/j.scitotenv.2021.151434.
R Core Team. R: A Language and Environment for Statistical Computing. Vol 2. 2021. https://www.r-project.org/
Shi J., Li X., Zhang S., et al. Enhanced decay of coronaviruses in sewers with domestic wastewater. Sci. Total Environ. 2022;813 doi: 10.1016/j.scitotenv.2021.151919.
Smyth D.S., Trujillo M., Gregory D.A., et al. Tracking cryptic SARS-CoV-2 lineages detected in NYC wastewater. Nat. Commun. 2022;13(1):635. doi: 10.1038/s41467-022-28246-3.
Syed A.M., Taha T.Y., Tabata T., et al. Rapid assessment of SARS-CoV-2–evolved variants using virus-like particles. Science. 2021;374(6575):1626–1632. doi: 10.1126/science.abl6184.
Van der Auwera G.A., O’Connor B.D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra. 1st edition. O’Reilly Media; 2020.
Wu F., Zhang J., Xiao A., et al. SARS-CoV-2 titers in wastewater are higher than expected from clinically confirmed cases. mSystems. 2020;5(4) doi: 10.1128/mSystems.00614-20.
Wurtzer S., Levert M., Dhenain E., et al. From alpha to omicron BA.2: new digital RT-PCR approach and challenges for SARS-CoV-2 VOC monitoring and normalization of variant dynamics in wastewater. Sci. Total Environ. 2022;848(157740) doi: 10.1016/j.scitotenv.2022.157740.
Yang G., Erdman D.E., Kodani M., Kools J., Bowen M.D., Fields B.S. Comparison of commercial systems for extraction of nucleic acids from DNA/RNA respiratory pathogens. J. Virol. Methods. 2011;171(1):195–199. doi: 10.1016/j.jviromet.2010.10.024.
Yu A.T., Hughes B., Wolfe M.K., Leon T., Duong D., Rabe A., Kennedy L.C., Ravuri S., White B.J., Wigginton K.R., Boehm A.B., Vugia D.J. Estimating relative abundance of 2 SARS-CoV-2 variants through wastewater surveillance at 2 large metropolitan sites, United States. Emerg. Infect. Dis. 2022;28(5):940–947. doi: 10.3201/eid2805.212488.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary materials

mmc1.pdf^{(2.4MB, pdf)}

Supplemental Dataset 1

Detailed information of the 184 wastewater RNA samples sequenced in this study.

mmc2.xlsx^{(33.1KB, xlsx)}

Data Availability Statement

Data has been uploaded in the "Attach file" step.

