PLoS One. 2022 Apr 14;17(4):e0266407. doi: 10.1371/journal.pone.0266407

Wastewater surveillance of SARS-CoV-2 mutational profiles at a university and its surrounding community reveals a 20G outbreak on campus

Candice L Swift 1, Mirza Isanovic 1, Karlen E Correa Velez 1, Sarah C Sellers 1, R Sean Norman 1,*
Editor: Theodore Raymond Muth 2
PMCID: PMC9009614  PMID: 35421164


Wastewater surveillance of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been leveraged during the Coronavirus Disease 2019 (COVID-19) pandemic as a public health tool at the community and building level. In this study, we compare the sequence diversity of SARS-CoV-2 amplified from wastewater influent to the Columbia, South Carolina, metropolitan wastewater treatment plant (WWTP) and the University of South Carolina campus during September 2020, which represents the peak of COVID-19 cases at the university during 2020. A total of 92 unique mutations were detected across all WWTP influent and campus samples, with the highest frequency mutations corresponding to the SARS-CoV-2 20C and 20G clades. Signature mutations for the 20G clade dominated SARS-CoV-2 sequences amplified from localized wastewater samples collected at the University of South Carolina, suggesting that the peak in COVID-19 cases during early September 2020 was caused by an outbreak of the 20G lineage. Thirteen mutations were shared between the university building-level wastewater samples and the WWTP influent collected in September 2020, 62% of which were nonsynonymous substitutions. Co-occurrence of mutations was used as a similarity metric to compare wastewater samples. Three pairs of mutations co-occurred in university wastewater and WWTP influent during September 2020. Thirty percent of the detected mutations, including 12 pairs of concurrent mutations, were only detected in university samples. This report affirms the close relationship between the prevalent SARS-CoV-2 genotypes of the student population at a university campus and those of the surrounding community. However, this study also suggests that wastewater surveillance at the building-level at a university offers important insight by capturing sequence diversity that was not apparent in the WWTP influent, thus offering a balance between the community-level wastewater and clinical sequencing.


Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causative agent of the Coronavirus Disease 2019 (COVID-19) pandemic. Since SARS-CoV-2 ribonucleic acid (RNA) was first detected in the feces of infected individuals [1], its presence has been confirmed in the wastewater of many countries [2–4]. SARS-CoV-2 wastewater surveillance offers many benefits, including, but not limited to, early detection [5], the ability to monitor infection trends separately from clinical data [6], and data that are independent of healthcare access or choices [6]. Collaborations between wastewater surveillance research teams and policymakers have resulted in effective public health actions [7].

Students at universities are at risk for SARS-CoV-2 infection due to factors such as living in high-density facilities (dormitories) in close contact with others. A study of 16,101 university students from Fall 2020 to Spring 2021 demonstrated that although 84% of students were protected from SARS-CoV-2 infection, 16% of students remained susceptible to infection, and reinfection occurred in 2.2% of the previously infected student population during the period 12 to 30 weeks after initial infection [8], prior to the availability of vaccines and the emergence of the highly transmissible delta variant [9] of SARS-CoV-2. Localized wastewater sampling at universities has been used across the United States as a disease mitigation strategy [10–12]. Since wastewater trends can precede clinical data by as much as a week [7], administrative officials can take action quickly. Disease mitigation strategies can include increased COVID-19 testing at specific buildings, also called surge testing [12]. Monitoring at the building level on a college campus has been reported as a highly sensitive method capable of detecting a single asymptomatic student amidst 150–200 individuals [12].

The University of South Carolina serves 27,502 undergraduate students during the academic year [13], representing 21% of the population of Columbia, South Carolina (131,674 as of July 2021 [14]), or 8% of the greater Columbia metropolitan area served by the Columbia wastewater treatment plant. Although it is anticipated that the influx of students at the start of the fall semester would increase transmission of SARS-CoV-2, both due to the increase in population as well as the input of potentially more infectious viral genotypes from other states and countries, the impact of the student population on the community in terms of the SARS-CoV-2 sequence diversity has not been shown in wastewater data. However, the increase in transmission due to the influx of students has been demonstrated by clinical data [15, 16] and predicted by modelling [17] in other communities. We hypothesized that there would be substantial overlap in the observed mutations between wastewater samples collected from the university and those collected from the neighboring WWTP influent, which serves the greater Columbia metropolitan area and the university.

In this work, we compare the sequence diversity of SARS-CoV-2 in wastewater collected from four sites across the University of South Carolina (UofSC) campus in September 2020 and the influent to the Columbia wastewater treatment plant (WWTP), which serves approximately 363,714 individuals in Columbia, SC (based on the United States 2020 Census tabulated by ZIP code), including the UofSC campus, from July-September 2020. The University of South Carolina partially resumed in-person instruction on August 20, 2020. COVID-19 cases of isolation (individuals who tested positive for COVID-19) or quarantine (individuals with close contact to a confirmed case of COVID-19) for the university peaked at 258 from August 30 to September 1 (Fig 1). The results presented here imply that disease mitigation strategies adopted by a university can impact the community at large.

Fig 1. Histogram of COVID-19 cases of exposure requiring quarantine or isolation at the start of the 2020 academic year.


Each date represents the inclusive end date of a two-day interval.

Methods and materials

Wastewater sampling

The Columbia WWTP is a secondary (activated sludge) WWTP that treats municipal wastewater from a population of 363,714 (based on the United States 2020 Census data tabulated by ZIP code [18]) with 6% of total flow permitted from industry. The monthly average flow of the Columbia WWTP is 45 million gallons per day (MGD). One liter 24-hour composite wastewater samples were collected using an ISCO refrigerated autosampler (Lincoln, NE) twice a week at the influent site of the Columbia WWTP. Samples for the University of South Carolina buildings were 0.3 L grab samples of raw wastewater collected between 8:30 and 10:00 AM from the sites marked on Fig 2.

Fig 2. Locations of wastewater sampling at the University of South Carolina within the greater Columbia metropolitan area.


Figure was rendered using ArcGIS Pro. Green area represents the ZIP codes within the greater Columbia metropolitan area. Black outlined region represents the sanitation sewage management area. Red dots represent the sampling sites within the University of South Carolina campus. Numbered sites not included in this study are not shown.

Columbia WWTP influent samples and university samples were processed separately. One mL of bovine respiratory syncytial virus (BRSV) vaccine (~80 million copies/mL) (INFORCE 3®) was added to one liter of wastewater prior to concentration in order to quantify processing and viral extraction efficiency. The average BRSV viral recovery was 4–5% for WWTP influent samples and 5–8% for university samples. Both influent and university samples were homogenized for 10 min using laboratory blenders. 50 mL (university samples) or 250 mL (WWTP influent samples) of homogenized wastewater was decanted into VWR 50 mL Falcon tubes (university samples) or centrifuge bottles (WWTP influent samples) and centrifuged using an Avanti® J-E Centrifuge (Beckman Coulter Lifesciences, Indianapolis, Indiana) with a JS-5.3 rotor for 20 min (university samples) or 30 min (WWTP influent samples) at 4,577 g without braking. 50 mL of each supernatant was concentrated to 400 μL using Millipore Amicon 30 kDa ultrafilters (Burlington, MA).

RNA extraction, library preparation and sequencing

RNA was extracted from 200 μL of the concentrated supernatant using the Qiagen AllPrep PowerViral extraction kit (Hilden, Germany) per the manufacturer’s instructions, eluted into 51 μL of RNase-free water, and stored at -80 ˚C until library preparation. Sequencing libraries were prepared following the Oxford Nanopore Technologies (ONT) PCR tiling of SARS-CoV-2 with native barcoding protocol, which is based on the protocol developed by the ARTIC network [19]. The ONT Native Barcoding Expansion 96 (EXP-NBD196) was used. Samples were separated into two different library preparations: July/August 2020 samples and September 2020 samples. Total RNA was transcribed into cDNA using the LunaScript® RT SuperMix Kit (New England Biolabs, Ipswich, MA). The resulting products were amplified by 40 cycles of PCR using two different primer pools (V3 design) to create ~400 bp amplicons spanning the entire SARS-CoV-2 genome. The PCR products were pooled and purified using a 1:1 ratio of SPRIselect beads (Beckman Coulter Lifesciences, Indianapolis, IN). The PCR products were then end-prepped using the NEBNext® Ultra™ II End Repair/dA-Tailing Module (New England Biolabs, Ipswich, MA). Sequencing barcodes and adapters (Oxford Nanopore Technologies, Oxford, UK) were sequentially ligated, and all remaining bead cleanups were performed using SPRIselect beads (Beckman Coulter Lifesciences, Indianapolis, IN). The final libraries were loaded onto two separate R9.4.1 flow cells (Oxford Nanopore Technologies, Oxford, UK) and sequenced using a GridION X5. Columbia WWTP influent samples and University of South Carolina campus wastewater samples from September 2020 (S1 and S2 Tables) were sequenced together on an R9.4.1 flow cell. 7.3 M reads (3.9 Gb) were sequenced with a mean read quality score of 11.1 and a mean read length of 531.6 bp. Columbia WWTP influent samples from July and August 2020 (S1 Table) were sequenced on a separate R9.4.1 flow cell. 3.6 M reads (1.7 Gb) were sequenced with a mean read quality score of 11.2 and a mean read length of 477 bp.

Data processing

Sequencing data processing was performed according to the ARTIC network “nCoV-2019 novel coronavirus bioinformatics protocol” [20]. Basecalling and demultiplexing were performed within MinKNOW using the high-accuracy model of Guppy version 4.2.3 developed by Oxford Nanopore Technologies. The minimum barcode score was set to 40 and the dual barcoding option was applied. Reads were filtered using a Qscore threshold of 7, and reads outside of the length range of 400–700 bp were omitted to eliminate chimeric reads. Lastly, filtered reads were mapped to the SARS-CoV-2 genome (accession MN908947.3) using minimap2 [21] within the artic minion command with the V3 primer scheme, filtered aggregate FASTQ file, and FAST5 directory as input and the normalization option enabled (--normalize 200). Variant calls made by nanopolish were also output from the artic minion pipeline for positions with at least 20× sequencing depth. Mutations identified within primer-binding regions were not considered.
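The thresholds described above can be illustrated with a minimal Python sketch. It is not code from Guppy, MinKNOW, or the ARTIC pipeline: the Read record, the function names, and the example values are hypothetical, and treating the length and depth bounds as inclusive is an assumption.

```python
from dataclasses import dataclass

MIN_MEAN_QSCORE = 7   # Q-score threshold stated in the text
MIN_LENGTH_BP = 400   # lower bound of the accepted read length
MAX_LENGTH_BP = 700   # upper bound of the accepted read length
MIN_DEPTH = 20        # minimum depth for a variant call to be reported


@dataclass
class Read:
    """Hypothetical minimal read record (not an ONT or ARTIC data structure)."""
    name: str
    length_bp: int
    mean_qscore: float


def passes_filters(read: Read) -> bool:
    """Keep reads that meet the quality threshold and fall inside the length window."""
    return (read.mean_qscore >= MIN_MEAN_QSCORE
            and MIN_LENGTH_BP <= read.length_bp <= MAX_LENGTH_BP)


def callable_positions(depth_by_position: dict) -> list:
    """Return genome positions whose depth reaches the reporting threshold."""
    return sorted(pos for pos, depth in depth_by_position.items()
                  if depth >= MIN_DEPTH)


# Hypothetical reads: one typical amplicon and three that should be discarded.
reads = [
    Read("amplicon_ok", 520, 11.1),
    Read("too_short", 310, 12.0),
    Read("likely_chimera", 960, 10.5),
    Read("low_quality", 480, 6.2),
]
kept = [r.name for r in reads if passes_filters(r)]
print(kept)
print(callable_positions({241: 150, 1059: 19, 3037: 20}))
```

The upper length bound is what removes likely chimeras: a read formed by two joined ~400 bp amplicons would be far longer than 700 bp.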

Principal component analysis (PCA) and analysis of variance (ANOVA)

Principal component analysis (PCA) was conducted following the precedent established by Fontenele and colleagues [22]. Briefly, a genotype for each sample was established by recording the nucleotide frequency at each position in the SARS-CoV-2 reference genome using the Python utility pysamstats. Similarity indices between all pairwise combinations of samples were calculated per Yue and Clayton [23]. The sum of the indices across all positions for all sample pairs was used to construct a distance matrix. The R [24] function prcomp() was used to construct a PCA object that was subsequently visualized with the package ggbiplot. One-way analysis of variance (ANOVA) and Tukey’s test were conducted in R using the aov() and TukeyHSD() functions, respectively.
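As an illustration of the pairwise comparison described above, the following Python sketch computes the Yue and Clayton similarity index at each position and sums it across positions. It assumes the standard form of the index, θ = Σ pᵢqᵢ / (Σ pᵢ² + Σ qᵢ² − Σ pᵢqᵢ), applied to the four nucleotide frequencies at a position. The function names, the three-position toy genotypes, and the handling of uncovered positions (scored as zero) are assumptions, not the procedure used in this study.

```python
def yue_clayton(p, q):
    """Yue and Clayton similarity between two frequency vectors (A, C, G, T)."""
    shared = sum(a * b for a, b in zip(p, q))
    denom = sum(a * a for a in p) + sum(b * b for b in q) - shared
    # Assumption: a position with no coverage in either sample contributes 0.
    return shared / denom if denom > 0 else 0.0


def summed_similarity(genotype_a, genotype_b):
    """Sum the per-position index over all positions of two composite genotypes."""
    return sum(yue_clayton(p, q) for p, q in zip(genotype_a, genotype_b))


def pairwise_matrix(genotypes):
    """Build a symmetric matrix of summed similarities for every sample pair."""
    names = sorted(genotypes)
    return names, [[summed_similarity(genotypes[a], genotypes[b]) for b in names]
                   for a in names]


# Hypothetical composite genotypes over three positions, frequencies as (A, C, G, T).
sample_1 = [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 0.5, 0.5)]
sample_2 = [(1.0, 0.0, 0.0, 0.0), (0.0, 0.5, 0.0, 0.5), (0.0, 0.0, 1.0, 0.0)]

names, matrix = pairwise_matrix({"sample_1": sample_1, "sample_2": sample_2})
print(names)
print(matrix)
```

Because the per-position values are summed, a sample pair with more covered positions accumulates a larger total, which is consistent with the depth-related limitation discussed in the Results.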

Co-occurrence analysis

Mutational co-occurrence was calculated using the R package cooccur [25], which is a probabilistic model originally developed to analyze species co-occurrence in ecology, but which is broadly applicable to detect statistically significant co-occurrence patterns in other fields [26]. The input data frame to the cooccur function consisted of rows representing the presence or absence (one or zero as values) of all mutations detected by nanopolish at positions with greater than 20× sequencing depth with each wastewater sample represented by a column. Only wastewater samples with at least 50% SARS-CoV-2 genome coverage were included (see S1 and S2 Tables). Concurrent mutations were validated against clinical data using the Global Evaluation of SARS-CoV-2/nCoV-19 Sequences (GESS) database [27] to infer whether the mutations might have co-occurred in the same genome.
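The input layout described above can be sketched in Python as follows. This is only an illustration of a presence/absence matrix with mutations as rows and samples as columns; it is not the R code used with cooccur, and the sample names, coverage values, and helper function are hypothetical.

```python
MIN_GENOME_COVERAGE = 0.50  # samples below 50% genome coverage are excluded


def presence_absence(calls_by_sample, coverage_by_sample):
    """Return (sample columns, {mutation: [0/1 per sample]}) for well-covered samples."""
    kept = sorted(sample for sample, coverage in coverage_by_sample.items()
                  if coverage >= MIN_GENOME_COVERAGE)
    mutations = sorted({m for sample in kept
                        for m in calls_by_sample.get(sample, ())})
    matrix = {
        m: [1 if m in calls_by_sample.get(sample, ()) else 0 for sample in kept]
        for m in mutations
    }
    return kept, matrix


# Hypothetical variant calls that already passed the depth filter.
calls_by_sample = {
    "campus_site_1": {"241c>t", "10319c>t"},
    "wwtp_sep_02": {"241c>t", "14408c>t"},
    "wwtp_low_cov": {"241c>t"},
}
coverage_by_sample = {
    "campus_site_1": 0.93,
    "wwtp_sep_02": 0.71,
    "wwtp_low_cov": 0.35,  # dropped: below the coverage cutoff
}

columns, matrix = presence_absence(calls_by_sample, coverage_by_sample)
print(columns)
for mutation, row in matrix.items():
    print(mutation, row)
```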

Results and discussion

To gain insight into the influence of a college campus on the surrounding community during the COVID-19 pandemic, SARS-CoV-2 was amplified from both university building-level wastewater and the Columbia metropolitan wastewater treatment plant (WWTP) influent. The WWTP influent was sampled from July to September 2020. University wastewater surveillance began on August 14, 2020, during the week preceding the academic term. Mutations from the SARS-CoV-2 reference genome (accession MN908947.3) and their relative positions in the SARS-CoV-2 genome that were detected in both university wastewater and WWTP influent are depicted in Fig 3, with a table of the corresponding amino acid substitutions for the nonsynonymous mutations.

Fig 3. Mutational profiles of SARS-CoV-2 amplified from University of South Carolina (UofSC) building wastewater and Columbia metropolitan wastewater treatment plant (WWTP) influent.


(A) Location of mutations detected in university wastewater samples and Columbia WWTP influent during September 2020. Dot jitter was added to arbitrary y-axis values for clarity. (B) Substitutions identified in both university and Columbia WWTP influent. Only nonsynonymous mutations located in coding regions are shown in the table.

Principal component analysis of SARS-CoV-2 sequence diversity in Columbia wastewater treatment plant influent shows little alteration from July to September 2020

SARS-CoV-2 genomic diversity in wastewater was visualized by principal component analysis (Fig 4) using a method pioneered by Fontenele and colleagues [22], in which the sum of Yue and Clayton similarity indices [23] across the entire SARS-CoV-2 genome for all pairwise combinations of samples is the input matrix for the PCA (see Methods). Notably, data points overlap for the University of South Carolina site 1 (Figs 2 and 4B) on August 28 and September 11. However, these data points are resolved in a three-dimensional PCA (S1 Fig). One of the limitations of the method pioneered by Fontenele and colleagues [22] is that differences in sequencing depth between samples will affect the sum of the similarity indices. Due to the differences in the sequencing depth of Columbia WWTP influent samples compared to the University of South Carolina samples (S1 and S2 Tables), the sum of the Yue and Clayton similarity indices was higher for pairwise combinations with greater sequencing depth (e.g. two University of South Carolina samples). Therefore, PCA was only conducted for samples of similar depth (Fig 4 panels A and B). PCA for WWTP influent samples (Fig 4A) showed a high degree of similarity between samples collected during the summer (July and August) preceding the start of the academic year and those collected in September 2020. Therefore, despite the return of some students to campus, there was not a substantial shift in the SARS-CoV-2 sequence diversity. Some factors that may contribute to this observation include the continued presence of students on campus during the summer as well as the fact that some students continued remote instruction during fall 2020. Also, since the University of South Carolina is a public school, ~60% of its students are from South Carolina [28], which lessens the influx of SARS-CoV-2 genotypes from other states and countries. We anticipate that private colleges and universities in which a greater proportion of the students come from out of state may experience a shift in SARS-CoV-2 sequence diversity at the start of the academic year. However, it must also be considered that contributions to WWTP influent from industry or stormwater may dilute out the anticipated effects.

Fig 4. Principal Component Analysis (PCA) of (A) Columbia WWTP influent composite genotypes (from July-September 2020) and (B) University of South Carolina wastewater composite genotypes.


Composite genotypes for each wastewater sample were established by calculating the nucleotide frequency at each position in the SARS-CoV-2 reference genome. The composite genotypes were then compared pairwise by summing the Yue and Clayton similarity index [23] for each position in the reference genome. Analysis and visualization were performed with the R function prcomp() and the package ggbiplot. The size of the ellipse in Normal probability (ellipse.prob option) was set to 0.95.

Hierarchical clustering of mutation read frequencies reveals a localized outbreak of the 20G clade at the University of South Carolina

A heatmap with hierarchical clustering of the read frequency (the fraction of reads containing the mutated nucleotide at a specific location) of mutations in the SARS-CoV-2 genome from all wastewater samples (UofSC and Columbia WWTP influent) is depicted in Fig 5, with labeled mutations available in S2 Fig. An intrinsic challenge and limitation of this study was the high degree of biological variability within each condition. The data used for this study was part of a statewide and university sampling effort in response to the COVID-19 pandemic. Samples varied both in time and space for the university samples and in time for the WWTP influent samples. Despite these limitations, the university samples from August 28 (sites 1 and 8, Fig 2) and September 4 (site 5, Fig 2) showed a high degree of reproducibility, as did the mutational profiles of WWTP influent samples collected on September 2 and 6 (Fig 5).

Fig 5. Read frequency of SARS-CoV-2 mutations in Columbia WWTP influent and the University of South Carolina.


Each cell represents the read frequency of a mutation from the SARS-CoV-2 reference genome (accession MN908947.3) in wastewater samples from the Columbia WWTP influent (“Columbia”) or the University of South Carolina campus (“UofSC”). Cells in blue indicate the mutation was not observed in the sample. See S2 Fig for complete list of nucleotide mutations corresponding to the heatmap. Only wastewater samples with at least 50% SARS-CoV-2 genome coverage were included in the heatmap.

Hierarchical clustering analysis revealed four clusters of mutations (Fig 5). The largest cluster (Cluster 2) consisted of mutations that were mostly detected in a single sample. In contrast, the majority of the mutations in the top cluster of Fig 5 (Cluster 3) were shared across all groups (WWTP influent collected during summer months and September and UofSC wastewater collected during September). Four of the five signature mutations of the 20C clade in NextStrain [29], also referred to as the GH clade in GISAID [30], were observed in Cluster 3: 241c > t, 1059c > t, 3037c > t, 14,408c > t, and 25,563g > t. The high frequency of these mutations across both university and WWTP influent samples is in agreement with clinical data from July to September 2020, where the 20A, 20C, and 20G NextStrain clades were dominant in the United States [29].

Notably, Cluster 4 of Fig 5 included five of the seven signature mutations of the 20G clade, illustrating that the peak in COVID-19 cases experienced at the beginning of September may have been the result of a 20G outbreak on campus. Cluster 1 indicated traces of the 20G clade in the WWTP influent (10,319c > t). However, the 20G clade was predominantly detected in the localized wastewater sample at the University of South Carolina. The remaining mutations in Cluster 1 demonstrated homogeneity in the WWTP influent collected in early September.

One-way analysis of variance (ANOVA) and subsequent Tukey’s test for each of the observed 20G mutations resulted in statistically significant differences (p-adjusted < 0.1) between the University of South Carolina September wastewater samples and WWTP influent collected in September (S1 Dataset) for mutations 27,964c > t, 28,472c > t, 28,869c > t, and 25,907g > t. Observation of the mutational profiles of each wastewater sample indicated that the 20G outbreak may have been limited to specific buildings on campus, since the mutational profile for wastewater collected from site 11 differed from site 5 on September 4th (Figs 2 and 5, S2 Fig). These results suggest that localized sampling increases the sensitivity of detection of specific mutations, since in all cases the read frequency of the mutation was higher in the university sample set, whereas the mutations were either not detected consistently or were detected at low frequencies in the WWTP influent sample sets. Nevertheless, it cannot be ruled out that differences in sequencing depth between the university samples and WWTP influent samples, likely caused by the different input RNA concentrations (S1 and S2 Tables), contributed to these perceived differences.
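To make the group comparison concrete, the following Python sketch computes the one-way ANOVA F statistic for the read frequency of a single mutation across three sample groups. The analysis in this study was run in R with aov() and TukeyHSD(); this sketch only reproduces the F ratio (between-group mean square over within-group mean square), omits the p-value and the Tukey adjustment, and uses invented read frequencies and group labels.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return the F statistic: between-group mean square / within-group mean square."""
    all_values = [value for group in groups for value in group]
    grand_mean = fmean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - fmean(g)) ** 2 for g in groups for value in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical read frequencies of one 20G signature mutation per sample group.
uofsc_september = [0.8, 0.9, 1.0]
wwtp_fall = [0.1, 0.2, 0.0]
wwtp_summer = [0.0, 0.1, 0.2]

f_statistic = one_way_anova_f([uofsc_september, wwtp_fall, wwtp_summer])
print(round(f_statistic, 3))
```

A large F value indicates that the group means differ by more than the within-group scatter would explain; a post hoc test such as Tukey's is still needed to identify which pair of groups differs.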

Co-occurrence of mutations corroborates a 20G viral outbreak at the University of South Carolina

Mutations that occur in the same SARS-CoV-2 genome can have important phenotypic implications, such as greater infectivity, as demonstrated for the delta SARS-CoV-2 variant [9]. In the context of SARS-CoV-2 amplicons from wastewater, it is difficult to determine which of the mutations originate from the same genome, since the wastewater sample represents a composite from many individuals. However, concurrent mutations can be compared across wastewater samples as a similarity metric and further validated with clinical data to determine whether they are commonly observed in the same viral genome (S3 Table).

Out of 92 total distinct mutations detected in the SARS-CoV-2 genome in the University of South Carolina and Columbia WWTP influent samples, 16 pairs co-occurred more often than expected if the two mutations were distributed independently of each other, as determined using a probabilistic co-occurrence model [26] (Fig 6). In this model, combinatorics is used to compare the observed co-occurrence to the mathematically expected co-occurrence (the product of each mutation’s probability of occurrence multiplied by the number of samples). If co-occurrence is observed significantly more often than expected, then the mutations are considered positively correlated. For the full details of the model, the reader is referred to references [25, 26]. Out of 16 total concurrent pairs of mutations, 12 had at least one signature mutation of the 20G lineage and one had a signature mutation (14,408c > t) of the 20C lineage. Network analysis of concurrent mutations suggests that localized sampling may be more sensitive to detect viral strains, since 12 of the concurrent mutation pairs that were identified in the university samples from September 2020 were not detected in the WWTP influent. However, these results also corroborate a 20G outbreak at the University of South Carolina during September 2020 since 10 of 12 mutation pairs comprised at least one signature mutation of the 20G clade. All concurrent mutations were identified in clinical sequences with a non-zero concurrence ratio (S3 Table). Therefore, it cannot be ruled out that each pair of concurrent mutations may have originated from the same genome.
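The expected co-occurrence and the probability of observing at least a given number of co-occurrences can be sketched in Python as follows. The sketch uses a hypergeometric formulation, which we understand to be equivalent to the combinatorial model cited above; it is not the cooccur implementation, and the function names and the example counts (10 samples, each mutation present in 4) are hypothetical.

```python
from math import comb


def expected_cooccurrence(n_samples, n_a, n_b):
    """Expected shared samples: N * P(A) * P(B), with P estimated as count / N."""
    return n_samples * (n_a / n_samples) * (n_b / n_samples)


def prob_exactly(n_samples, n_a, n_b, j):
    """Probability that mutations A and B co-occur in exactly j samples by chance."""
    return comb(n_a, j) * comb(n_samples - n_a, n_b - j) / comb(n_samples, n_b)


def prob_at_least(n_samples, n_a, n_b, observed):
    """Probability of co-occurring in `observed` or more samples by chance."""
    upper = min(n_a, n_b)
    return sum(prob_exactly(n_samples, n_a, n_b, j)
               for j in range(observed, upper + 1))


# Hypothetical case: 10 samples, each mutation found in 4, always together.
n_samples, n_a, n_b, observed = 10, 4, 4, 4
print(expected_cooccurrence(n_samples, n_a, n_b))
print(prob_at_least(n_samples, n_a, n_b, observed))
```

In this toy case the expected co-occurrence is 1.6 samples, while the two mutations were observed together in all 4, an outcome with chance probability 1/210; a pair like this would be classified as positively associated.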

Fig 6. Co-occurrence and network analysis of mutations detected in WWTP influent and university samples.


(A) Co-occurrence of mutations. No negative co-occurrences (those mutations detected together less often than expected by chance) were identified. (B) Co-occurrence network for mutations from the SARS-CoV-2 reference genome (accession MN908947.3) detected in University of South Carolina wastewater and Columbia WWTP influent. Network was rendered in Cytoscape [31] using the yFiles circular layout algorithm. Edge colors signify the sample types and collection period where the co-occurrences were detected: Green = co-occurrence found in university wastewater samples from September 2020, blue = co-occurrence found in all wastewater sample types (WWTP influent from July/August 2020, and WWTP influent and university samples from September 2020), pink = co-occurrence found in WWTP influent from September 2020 and university samples. Green nodes indicate signature mutations of clade 20G.

Taken together, these results suggest that although localized university wastewater sampling shared concurrent mutations that were also detected in WWTP influent, the greater sensitivity afforded by onsite sample collection nearer the source resulted in the detection of a distinct set of mutations and a strong signal of the 20G clade.


Although many publications have compared clinical sequence data to wastewater data [22, 32–34], this work represents one of the few studies to compare wastewater data collected from localized sampling at a university to WWTP influent from the greater metropolitan area. This work affirms a close relationship between SARS-CoV-2 sequences from the student body of a university and those of the greater surrounding metropolitan area. Thirteen mutations were identified in both university and WWTP influent samples during September 2020. In addition, we found ten concurrent mutations unique to the localized university sampling that were strongly indicative of a 20G outbreak on campus. Therefore, strategic localized sampling at potential hotspots offers distinctive advantages compared to WWTP influent sampling, such as increased sensitivity in detecting SARS-CoV-2 variants. Relative to sequencing clinical samples or WWTP influent, sequencing at the building level affords a balance between sensitivity and cost.

We anticipate that similar results would be obtained for other universities and their surrounding communities, with even more overlap in cases where universities are situated in less populated areas. Given the overlap in viral mutations between the greater Columbia metropolitan wastewater and the localized university wastewater, university policy makers should work together with government officials from the surrounding communities to manage infectious disease spread.

Supporting information

S1 Text. Reverse transcription quantitative PCR (RT-qPCR) methods.


S1 Table. Columbia WWTP influent samples used in this study and sequencing depth and coverage per barcode.

WW = wastewater. Samples with less than 50% coverage that are highlighted in gray were not included in the heatmap or co-occurrence analysis.


S2 Table. University of South Carolina campus samples used in this study and sequencing depth and coverage per barcode.

WW = wastewater; UofSC = University of South Carolina.


S3 Table. Concurrent mutations identified in wastewater samples that were validated with GESS [27] on December 22, 2021.


S1 Fig. Three-dimensional principal component analysis (PCA) visualized using the pca3d() function in R corresponding to Fig 4B in the main text.


S2 Fig. Heatmap corresponding to Fig 5 in the main text in PDF format to enhance visualization of nucleotide mutations.



We are grateful to the South Carolina utilities directors and operators, as well as SCDHEC, for their contributions to wastewater sampling and transportation. We acknowledge the University of South Carolina facilities staff, as well as Emily Gosnell, Dillon Bryant, Stefano Belmonte, and Sejla Isanovic, for their contribution to the wastewater collection at the University of South Carolina. We would like to acknowledge that the Research Computing program under the Division of Information Technology at the University of South Carolina contributed to the results in this research by providing High Performance Computing resources and expertise. We are very grateful to GISAID Initiative and all its data contributors, i.e. the Authors from the Originating laboratories responsible for obtaining the specimens and the Submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this research is based.

Data Availability

S1-S4 Datasets are available through Mendeley Data. S1 Dataset contains the one-way analysis of variance (ANOVA) and subsequent Tukey’s test comparing 20G mutation read frequencies between UofSC, WWTP influent collected in September 2020 (treatment “fall”), and WWTP influent collected in July/August 2020 (treatment “summer”). S2-S4 Datasets contain the ARTIC minion (nanopolish) VCF output for Columbia WWTP influent samples from July and August 2020 (S2 Dataset), Columbia WWTP influent samples from September 2020 (S3 Dataset), or University of South Carolina samples from September 2020 (S4 Dataset). Single nucleotide variant analysis is included in the *.vcf files in the output subfolder. Please refer to the ARTIC pipeline documentation for details of the artic minion output. Sequencing reads aligned to the SARS-CoV-2 genome (accession MN908947.3) in BAM format are available at NCBI BioProject PRJNA763484.

Funding Statement

Funding sources to RSN: Centers for Disease Control and Prevention (#75D-301-18C-02903), South Carolina Department of Health and Environmental Control (#EQ-0-654), and the University of South Carolina. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


  • 1. Zhang W, Du RH, Li B, Zheng XS, Yang XL, Hu B, et al. Molecular and serological investigation of 2019-nCoV infected patients: implication of multiple shedding routes. Emerg Microbes Infect. 2020;9: 386–389. doi: 10.1080/22221751.2020.1729071
  • 2. Ahmed F, Islam MA, Kumar M, Hossain M, Bhattacharya P, Islam MT, et al. First detection of SARS-CoV-2 genetic material in the vicinity of COVID-19 isolation Centre in Bangladesh: Variation along the sewer network. Sci Total Environ. 2021;776: 145724. doi: 10.1016/j.scitotenv.2021.145724
  • 3. Kumar M, Patel AK, Shah AV, Raval J, Rajpara N, Joshi M, et al. First proof of the capability of wastewater surveillance for COVID-19 in India through detection of genetic material of SARS-CoV-2. Sci Total Environ. 2020;746: 141326. doi: 10.1016/j.scitotenv.2020.141326
  • 4. La Rosa G, Iaconelli M, Mancini P, Bonanno Ferraro G, Veneri C, Bonadonna L, et al. First detection of SARS-CoV-2 in untreated wastewaters in Italy. Sci Total Environ. 2020;736: 139652. doi: 10.1016/j.scitotenv.2020.139652
  • 5. Wu F, Xiao A, Zhang J, Moniz K, Endo N, Armas F, et al. SARS-CoV-2 RNA concentrations in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases. Sci Total Environ. 2021; 150121. doi: 10.1016/j.scitotenv.2021.150121
  • 6. National Wastewater Surveillance System (NWSS)–a new public health tool to understand COVID-19 spread in a community | CDC. 2021 [cited 20 Apr 2021].
  • 7. McClary JS, et al. SARS-CoV-2 Wastewater Surveillance for Public Health Action: Connecting Perspectives from Wastewater Researchers and Public Health Officials During a Global Pandemic. Preprint. 2021; 1–21. doi: 10.20944/preprints202104.0167.v1
  • 8. Rennert L, McMahan C. Risk of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Reinfection in a University Student Population. Clin Infect Dis. 2021. [cited 31 Aug 2021]. doi: 10.1093/CID/CIAB454
  • 9. Li B, Deng A, Li K, Hu Y, Li Z, Xiong Q, et al. Viral infection and transmission in a large, well-traced outbreak caused by the SARS-CoV-2 Delta variant. medRxiv. 2021; 2021.07.07.21260122. doi: 10.1101/2021.07.07.21260122
  • 10. Scott LC, Aubee A, Babahaji L, Vigil K, Tims S, Aw TG. Targeted wastewater surveillance of SARS-CoV-2 on a university campus for COVID-19 outbreak detection and mitigation. Environ Res. 2021;200: 111374. doi: 10.1016/j.envres.2021.111374
  • 11. Harris-Lovett S, Nelson KL, Beamer P, Bischel HN, Bivins A, Bruder A, et al. Wastewater Surveillance for SARS-CoV-2 on College Campuses: Initial Efforts, Lessons Learned, and Research Needs. Int J Environ Res Public Health. 2021;18: 4455. doi: 10.3390/IJERPH18094455
  • 12. Gibas C, Lambirth K, Mittal N, Juel MAI, Barua VB, Roppolo Brazell L, et al. Implementing building-level SARS-CoV-2 wastewater surveillance on a university campus. Sci Total Environ. 2021;782: 146749. doi: 10.1016/j.scitotenv.2021.146749
  • 13. UP. University of Pennsylvania—Profile, Rankings and Data | US News Best Colleges. 2020.
  • 14. U.S. Census Bureau QuickFacts: Columbia city, South Carolina. [cited 27 Aug 2021].
  • 15. Richmond CS, Sabin AP, Jobe DA, Lovrich SD, Kenny PA. SARS-CoV-2 sequencing reveals rapid transmission from college student clusters resulting in morbidity and deaths in vulnerable populations. medRxiv. 2020; 2020.10.12.20210294. doi: 10.1101/2020.10.12.20210294
  • 16. Leidner AJ, Barry V, Bowen VB, Silver R, Musial T, Kang GJ, et al. Opening of Large Institutions of Higher Education and County-Level COVID-19 Incidence—United States, July 6–September 17, 2020. MMWR Morb Mortal Wkly Rep. 2021;70: 14–19. doi: 10.15585/mmwr.mm7001a4
  • 17. Andersen MS, Bento AI, Basu A, Marsicano CR, Simon K. College Openings in the United States Increased Mobility and COVID-19 Incidence. medRxiv. 2021; 2020.09.22.20196048. doi: 10.1101/2020.09.22.20196048
  • 18. U.S. Census Bureau. ZIP Code Tabulation Areas (ZCTAs). [cited 5 Oct 2021].
  • 19. Quick J. nCoV-2019 sequencing protocol v3 (LoCost). 2020 [cited 18 May 2021].
  • 20. Loman N, Rowe W, Rambaut A. nCoV-2019 novel coronavirus bioinformatics protocol. In: ARTIC Network [Internet]. 2020 [cited 15 Apr 2021] pp. 1–4.
  • 21. Li H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34: 3094–3100. doi: 10.1093/bioinformatics/bty191
  • 22. Fontenele RS, Kraberger S, Hadfield J, Driver EM, Bowes D, Holland LA, et al. High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulating variants. medRxiv. 2021. doi: 10.1101/2021.01.22.21250320
  • 23. Yue JC, Clayton MK. A similarity measure based on species proportions. Commun Stat—Theory Methods. 2005;34: 2123–2131. doi: 10.1080/STA-200066418
  • 24. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. 2013. [cited 20 Feb 2020].
  • 25. Griffith DM, Veech JA, Marsh CJ. Cooccur: Probabilistic species co-occurrence analysis in R. J Stat Softw. 2016;69: 1–17. doi: 10.18637/jss.v069.c02
  • 26. Veech JA. A probabilistic model for analysing species co-occurrence. Glob Ecol Biogeogr. 2013;22: 252–260. doi: 10.1111/j.1466-8238.2012.00789.x
  • 27. Fang S, Li K, Shen J, Liu S, Liu J, Yang L, et al. GESS: A database of global evaluation of SARS-CoV-2/hCoV-19 sequences. Nucleic Acids Res. 2021;49: D706–D714. doi: 10.1093/nar/gkaa808
  • 28. UofSC enrollment increases—UofSC News & Events | University of South Carolina. [cited 1 Sep 2021].
  • 29. Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, et al. NextStrain: Real-time tracking of pathogen evolution. Bioinformatics. 2018;34: 4121–4123. doi: 10.1093/bioinformatics/bty407
  • 30. Shu Y, McCauley J. GISAID: Global initiative on sharing all influenza data–from vision to reality. Eurosurveillance. 2017;22: 2–4. doi: 10.2807/1560-7917.ES.2017.22.13.30494
  • 31. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13: 2498–504. doi: 10.1101/gr.1239303
  • 32. Swift CL, Isanovic M, Correa Velez KE, Norman RS. Community-level SARS-CoV-2 sequence diversity revealed by wastewater sampling. Sci Total Environ. 2021;801: 149691. doi: 10.1016/j.scitotenv.2021.149691
  • 33. Jahn K, Dreifuss D, Topolsky I, Kull A, Ganesanandamoorthy P, Fernandez-Cassi X, et al. Detection of SARS-CoV-2 variants in Switzerland by genomic analysis of wastewater samples. medRxiv. 2021; 2021.01.08.21249379. doi: 10.1101/2021.01.08.21249379
  • 34. Crits-Christoph A, Kantor RS, Olm MR, Whitney ON, Al-Shayeb B, Lou YC, et al. Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants. medRxiv. 2020;12: 1–9. doi: 10.1101/2020.09.13.20193805

Decision Letter 0

Theodore Raymond Muth

13 Dec 2021

PONE-D-21-32875

Wastewater surveillance illustrates overlapping SARS-CoV-2 mutational profiles between a university campus and its surrounding community

PLOS ONE

Dear Dr. Norman,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The two reviewers of your manuscript provided extensive and detailed comments and requests for improvements or clarifications. Please read through both reviews carefully and address each of the points made by the reviewers as thoroughly as possible. In some cases additional analyses are requested, such as ANOVA. 

Please submit your revised manuscript by Jan 27 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office. When you're ready to submit your revision, log on to the submission site and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io.

We look forward to receiving your revised manuscript.

Kind regards,

Theodore Raymond Muth

Academic Editor


Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates are available from the journal website.

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information.

3. We note that Figure 2 in your submission contains map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

   a. You may seek permission from the original copyright holder of Figure 2 to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0. Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

   b. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain)

The Gateway to Astronaut Photography of Earth (public domain)

Maps at the CIA (public domain)

NASA Earth Observatory (public domain)

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain)

Natural Earth (public domain)

4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly


2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes


3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes


4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes


5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors report and compare the sequence diversity of SARS-CoV-2 from wastewater influent to the Columbia, South Carolina, metropolitan wastewater treatment plant (WWTP) and four sites from the University of South Carolina (UofSC) campus during July-September 2020. In this study, 154 unique mutations were detected across all samples; of those, 26 mutations were shared between the university and the WWTP. Their main conclusions are:

1- there is a close relationship between the prevalent SARS-CoV-2 genotypes from the UofSC campus and the Columbia WWTP.

2- wastewater surveillance at the building level at a university captures sequence diversity not detected in the WWTP, thus offering a balance between WWTPs and clinical sequencing.

It is important to track and characterize mutations at large volume facilities as well as obtain a higher resolution at a local level, which is much closer to the potential source. Such methods when employed systematically can help in deploying mitigating efforts to contain outbreaks. Surveillance, quantification, sequencing, and comparison of results is intricate as wastewater is a complex matrix. Additionally, the impact of temporal and geographical variability needs to be carefully assessed for data from wastewater surveillance to be meaningful. Therefore, this comparison study is valuable and important. However, there are concerns regarding the methodology and the conclusions derived from the results, which need to be addressed before the paper can be published.

Major issues

1- Lines 227-228. To compare the read frequency differences among the WWTP influent collected in July and August, the WWTP influent collected in September, and the university wastewater samples collected in September, the authors use Student’s t-test. The t-test is generally used for pairwise comparisons; for comparing three or more conditions, ANOVA with a multiple-comparison test is more suitable. The authors should use ANOVA and assess the significance of the differences detected.

2- Another limitation that should be addressed is that at low RNA copy numbers, the read frequencies cannot be correlated as absolute frequencies of mutations found in a sample. The aliquots used for cDNA synthesis will show stochastic variations due to the low RNA copy numbers. This results in biased PCR amplifications and the read frequencies might vary among the replicates from the same sample. Therefore, at low copy numbers (as seen in the Columbia WWTP), comparing read frequencies may not be appropriate.

3- These caveats are also problematic as read frequencies were used for the probabilistic models in the co-occurrence and linkage analysis. The methodologies used for the co-occurrence and linkage analysis should be further detailed in the paper.

4- The use of the terms co-occurrent and concurrent should be explained in more detail. For example, in line 293, “here” implies that in other instances co-occurring has another meaning? Additionally, the authors should explain why none of the pairs of mutations identified by linkage overlapped with those identified through the R package cooccur.
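
Major issue 2 can be made concrete with a simple binomial sampling model. The sketch below assumes that each RNA copy drawn into a cDNA aliquot independently carries a given mutation with probability equal to its true frequency in the sample. The function names, the true frequency, and the copy numbers are illustrative assumptions rather than values from the manuscript, and the model ignores any additional bias introduced by PCR amplification.

```python
import math


def freq_standard_error(true_freq, n_copies):
    """Standard deviation of the observed variant fraction when
    n_copies templates are sampled binomially."""
    return math.sqrt(true_freq * (1.0 - true_freq) / n_copies)


def prob_variant_missed(true_freq, n_copies):
    """Probability that none of the sampled templates carries the variant."""
    return (1.0 - true_freq) ** n_copies


true_freq = 0.2  # hypothetical true mutation frequency
for n_copies in (10, 100, 1000):
    se = freq_standard_error(true_freq, n_copies)
    missed = prob_variant_missed(true_freq, n_copies)
    print(f"{n_copies:5d} copies: SE = {se:.3f}, P(missed) = {missed:.3g}")
```

Under this model, dropping from 1000 to 10 template copies widens the sampling error tenfold, and a mutation present at 20% has roughly a one-in-ten chance of being absent from the aliquot altogether, which is why replicate read frequencies from low-copy samples can disagree.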

Minor issues

1- More precise information would be useful for Figure 1. What were the cases of isolation or quarantine based on? Did they test positive for SARS-CoV-2 or as suggested by the legend suspected of exposure?

2- Line 170- Figure 2 is mentioned to visualize site 1 but Figure 4, panel B should be referenced to show the overlapping points. The authors use the PCA analysis showing overlapping points for the UofSC site 1 for August 28 and September 11 to suggest sequence stability. The average shedding time of the SARS-CoV-2 virus is two weeks, the same time lapse between the two sampling events at site 1. Therefore, the authors should explain the sequence stability because of the temporal and spatial distribution of the samples.

3- Figure 4. The color coding of the groups has been switched on panel A. Red should be assigned to Sept and blue to July/Aug.

4- Line 287. Something seems to be missing. “complementary techniques to identify co-occurrence. patterns in sequence space.”
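
The probabilistic co-occurrence analysis raised in major issues 3 and 4 can be sketched as a hypergeometric calculation, in the spirit of the probabilistic model cited for the cooccur package. The code below is a minimal Python illustration on presence/absence counts and is not the cooccur implementation; the function names and the sample counts are hypothetical.

```python
from math import comb


def cooccurrence_pmf(n_samples, n_a, n_b, j):
    """Probability that exactly j samples contain both mutations, if
    mutation A (present in n_a samples) and mutation B (present in n_b
    samples) are distributed independently across n_samples samples."""
    lowest = max(0, n_a + n_b - n_samples)
    highest = min(n_a, n_b)
    if j < lowest or j > highest:
        return 0.0
    return comb(n_a, j) * comb(n_samples - n_a, n_b - j) / comb(n_samples, n_b)


def prob_at_least(n_samples, n_a, n_b, observed):
    """Upper-tail probability of seeing `observed` or more co-occurrences."""
    highest = min(n_a, n_b)
    return sum(
        cooccurrence_pmf(n_samples, n_a, n_b, j)
        for j in range(observed, highest + 1)
    )


# Hypothetical counts: 10 samples, each mutation detected in 4 of them,
# and both detected together in all 4.
p_upper = prob_at_least(n_samples=10, n_a=4, n_b=4, observed=4)
print(f"P(co-occurrences >= 4) = {p_upper:.5f}")
```

A small upper-tail probability indicates that the pair is found together more often than independent placement would predict. Note that this calculation uses only presence or absence per sample, so it does not by itself address the concern about read frequencies at low copy numbers.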

Reviewer #2: In the submitted manuscript “Wastewater surveillance illustrates overlapping SARS-CoV-2 mutational profiles between a university campus and its surrounding community”, the authors present the results of municipal-level wastewater treatment plant (WWTP) surveillance and campus-level surveillance in concert. They identify common trends and also differences between the WWTP and campus samples along with presentation of results of a variety of statistical tools that aim to deconvolute the wastewater mixtures into co-occurring genotypes. I have a few concerns and suggestions listed below:

1) Interpretation of the SNPs over time (Page 10-12): The co-occurrence result is challenging to interpret, especially the linkage results on page 13. I like Figure 6 A and B, and think they should stay in some form (and maybe even largely as-is), but they need more interpretation. What viral context do these networks reveal?

The authors state on page 10 (lines 209-221) that Cluster 4 mutations, including 25563 g>t, are found in all samples at close to 100% frequency, which would suggest that at the time(s) of sampling, nearly all detectable SARS-CoV-2 in Columbia and UofSC are of the 20C clade (or a descendant). However, one of the other major markers for clade 20C, 1059 c>t, is found in sufficiently fewer samples that it is grouped into the “all conditions, albeit less consistently” Cluster 3. To my eye, the differences between 1059 and 25563 are not huge in the campus samples. All but one (site 1, 9/11) have both. From this we might surmise a strong presence of 20C (or descendant) on the campus at the time of sampling. The less consistent signal in the WWTP samples suggests a more complicated mix of genotypes in the broader area.

Alternatively, if we expect to find 1059 in all samples where the 25563 mutant is found, might that imperfect correspondence be the result of wastewater sequencing itself, with this more challenging substrate producing more piecemeal results as one of the sensitivity tradeoffs? Along these same lines of sensitivity, I note that there is no discussion of the spike mutation D614G (23403 a>g). Do the V3 amplicons poorly amplify this region? Can the authors speculate on the reasons?

2) A 20G outbreak on campus: I think at least one of the things the authors have revealed (networks in Fig 6, SNPs in Fig 5/S1, and lists thereof on pages 10-12) is that they appear to have caught a 20G outbreak in progress on campus. Hints of it are seen in the WWTP, but not nearly as consistently as the 20G signal caught in the act on campus. The authors should emphasize this finding/interpretation, even if it requires rewriting the co-occurrence results.

On lines 230-232, the authors describe a significance test whereby 6 mutations were different between September WWTP and University samples: “1358g > a, 3037c > t, 13,201g > t, 18,424a > g, 25,907g > t, and 27,964c > t.” Of these, 3037 c>t is hard to explain (it is found in all 20A and all descendants, along with 241, 14408, and 23403), but 18424, 25907, and 27964 are some of the markers of 20G. If these 3 are all going in the same direction (are they?), they may be evidence that either the University or the City was undergoing a wave of 20G that the other was not.

The tests continue on lines 232-236, with summer City samples compared to University samples. Two more 20G markers make this list: 10319 and 28472. Another universal 20A marker does too (241 c>t). I think the interpretations of each are similar to those on the previous tests.

Together, the above, and knowing the global context of mutations 10319, 18424, 25907, 27964, and 28472 as markers of 20G, the authors may have bona-fide evidence of a 20G outbreak in-progress on this campus. While less consistent hints of 20G are found in the City samples, the 20G signal is quite “bright” in 3 of the campus samples (Fig S1), (Site 1 8/28, Site 8 8/28, Site 5 9/4) and absent in (Site 11 9/4, Site 1 9/11).

3) PCA plots in 4B: The PCA in Figure 4B does not seem to agree with the other findings of sample similarity/dissimilarity. The two samples in the center are unlikely to be that similar.

I find it hard to come up with a scenario in which two samples would completely overlap on a PCA plot unless they were identical (UofSC points 1 8/28/20 and 1 9/11/20). I guess it could be possible if their differences only showed up on an N-th PC and not PC1 or 2? At minimum, the authors should check their input matrix/dataframe for the PCA in 4B, verify whether these two datapoints are identical or not, and add/expand comments on this to help explain. Figure 4’s legend needs to also explain what is being plotted, not just “samples”. Is it a singular inferred genotype per sample or a species proportion?

Moreover, regardless of what interpretation of genotype(s) is being computed for a dissimilarity score, it is difficult to believe that these two samples in particular would be perfectly identical on the first two PCs, given how many SNPs they have not-in-common, per figure 5 Clusters 3,2, and 1.
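
The check requested in point 3 can be sketched as a direct comparison of the two samples' rows in the PCA input matrix. The snippet assumes each sample is stored as a mapping from mutation label to read frequency; the sample names and frequency values below are placeholders, not data from the study, and `compare_profiles` is a hypothetical helper.

```python
def compare_profiles(profile_a, profile_b, tol=1e-9):
    """Summarize how two mutation-frequency profiles differ.

    Each profile maps a mutation label (e.g. "25563 g>t") to a read
    frequency; a missing or zero entry counts as "not detected".
    """
    present_a = {m for m, f in profile_a.items() if f > 0.0}
    present_b = {m for m, f in profile_b.items() if f > 0.0}
    shared = present_a & present_b
    differing = sorted(
        m for m in shared if abs(profile_a[m] - profile_b[m]) > tol
    )
    return {
        "only_a": sorted(present_a - present_b),
        "only_b": sorted(present_b - present_a),
        "shared_but_different": differing,
        "identical": present_a == present_b and not differing,
    }


# Placeholder rows for two samples; NOT study data.
sample_x = {"25563 g>t": 1.0, "1059 c>t": 0.9, "18424 a>g": 0.8}
sample_y = {"25563 g>t": 1.0, "1059 c>t": 0.0, "27964 c>t": 0.3}

report = compare_profiles(sample_x, sample_y)
print(report)
```

If the report shows mutations unique to either sample while the two points still coincide on the plot, the differences are either carried by components beyond PC1 and PC2 or lost when the input matrix is built.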

Other issues


Having both Figure 5 and Figure S1, which differ only in that S1 has text labels on each row, seems odd. Figure S1 seems to be sufficient; one idea would be to merge the two figures back into one in the body of the article.

Line 42-43: The statement that “reinfection with SARS-CoV-2 has been demonstrated to occur on college campuses in at least 16% of students” is interpreted incorrectly from Reference #8. The estimated reinfection rate in Reference 8 was 2.2% with 16% of the population susceptible to reinfection. This needs to be fixed.

Figure 1

Figure 1’s X axis makes sense in set notation, but probably not as what most readers will expect from a scientific figure. Would it be appropriate to replace the current x-axis label with the latter date of each given set’s range?

Figure 3

Fig 3A is challenging to interpret. The gray dots along the axes aren’t easy to mentally place alongside the colorful diagonal dots. I advise the authors to rework this figure to 3 horizontal graphs, such as a top level with the gray UofSC-only sites, a middle with the colorful shared sites, and a bottom level with Columbia WWTP-only sites. Dot jitter could be used for legibility. Also, the Y-axis on Fig 3A uses the abbreviation “USC” not found elsewhere in the manuscript.

Figure 4

Axis labels and legends lack capitalization. Name the 4A legend something like “WWTP groups.”

Linkage results and discussion

The linkage discussion on divergence on p 13 is incomplete and could be made more clear. What is the significance or lack thereof of the disagreements? Also, there are some typos on line 293.

Other minor notes

On line 78, MGD has not been defined.

On and around line 100, which native barcoding kit(s) were used? Expansion 1-12, 13-24, Expansion 96?

On line 104, unless all amplicons are exactly 400 bp, replace with “~400 bp.”

On line 113, was the flow cell R9.4? If possible, more specific is better.

Citation 16 only has author initials.


6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Edwin Oh, Van Vo, and Richard Tillett

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS. Please note that Supporting Information files do not need this step.


Submitted filename: review.docx


Submitted filename: plos1_wwtp_reviewer_notes.docx

Decision Letter 1

Theodore Raymond Muth

21 Mar 2022

Wastewater surveillance of SARS-CoV-2 mutational profiles at a university and its surrounding community reveals a 20G outbreak on campus


Dear Dr. Norman,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact our press team.

Kind regards,

Theodore Raymond Muth

Academic Editor


Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed


2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: (No Response)

Reviewer #2: Yes


3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: Yes


4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: (No Response)

Reviewer #2: Yes


5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: (No Response)

Reviewer #2: Yes


6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: I commend the authors for a job well done. This is an interesting finding and all of our concerns have been addressed.


7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Edwin Oh

Acceptance letter

Theodore Raymond Muth

4 Apr 2022


Wastewater surveillance of SARS-CoV-2 mutational profiles at a university and its surrounding community reveals a 20G outbreak on campus

Dear Dr. Norman:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact

If we can help with anything else, please email us at

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Theodore Raymond Muth

Academic Editor


Associated Data


    Supplementary Materials

    S1 Text. Reverse transcription quantitative PCR (RT-qPCR) methods.


    S1 Table. Columbia WWTP influent samples used in this study and sequencing depth and coverage per barcode.

    WW = wastewater. Samples with less than 50% coverage, which are highlighted in gray, were not included in the heatmap or co-occurrence analysis.


    S2 Table. University of South Carolina campus samples used in this study and sequencing depth and coverage per barcode.

    WW = wastewater; UofSC = University of South Carolina.


    S3 Table. Concurrent mutations identified in wastewater samples that were validated with GESS [27] on December 22, 2021.


    S1 Fig. Three-dimensional principal component analysis (PCA) plot corresponding to Fig 4B in the main text, visualized using the pca3d() function in R.


    S2 Fig. Heatmap corresponding to Fig 1 in the main text, provided in PDF format to enhance visualization of nucleotide mutations.



    Data Availability Statement

    S1-S4 Datasets are available through Mendeley Data at: S1 Dataset contains the one-way analysis of variance (ANOVA) and subsequent Tukey’s test comparing 20G mutation read frequencies between UofSC, WWTP influent collected in September 2020 (treatment “fall”), and WWTP influent collected in July/August 2020 (treatment “summer”). S2-S4 Datasets contain the ARTIC minion (nanopolish) VCF output for Columbia WWTP influent samples from July and August 2020 (S2 Dataset), Columbia WWTP influent samples from September 2020 (S3 Dataset), or University of South Carolina samples from September 2020 (S4 Dataset). Single nucleotide variant analysis is included in the *.vcf files in the output subfolder. Please refer to for details of the artic minion output. Sequencing reads aligned to the SARS-CoV-2 genome (accession MN908947.3) in BAM format are available at NCBI BioProject PRJNA763484.
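    For readers who want to inspect the single nucleotide variants in the *.vcf files, the following is a minimal Python sketch of how the standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT) could be read. It is not the authors' analysis code. The function name `parse_vcf_lines`, the `Variant` type, and the example records are hypothetical and are not taken from the S2-S4 Datasets.

```python
from typing import Iterable, List, NamedTuple

BASES = {"A", "C", "G", "T"}


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> List[Variant]:
    """Return single nucleotide variants found in the text lines of a VCF file."""
    variants = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta-information (##...), and the header (#CHROM...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # ALT may list several comma-separated alleles; keep only
        # single-base substitutions and ignore indels.
        for allele in alt.split(","):
            if ref in BASES and allele in BASES:
                variants.append(Variant(chrom, int(pos), ref, allele))
    return variants


# Made-up records for illustration only (assumption, not real study data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "MN908947.3\t100\t.\tC\tT\t50\tPASS\t.",
    "MN908947.3\t200\t.\tAT\tA\t40\tPASS\t.",
]
snvs = parse_vcf_lines(example)
```

    In this sketch the second record is a deletion, so only the first record is returned as a single nucleotide variant. To read a real file, the open file handle can be passed in place of the `example` list.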

    Articles from PLoS ONE are provided here courtesy of PLOS