Combining Short- and Long-Read Sequencing Technologies to Identify SARS-CoV-2 Variants in Wastewater

Gabrielle Jayme; Ju-Ling Liu; Jose Hector Galvez; Sarah Julia Reiling; Sukriye Celikkol; Arnaud N’Guessan; Sally Lee; Shu-Huang Chen; Alexandra Tsitouras; Fernando Sanchez-Quete; Thomas Maere; Eyerusalem Goitom; Mounia Hachad; Elisabeth Mercier; Stephanie Katharine Loeb; Peter A Vanrolleghem; Sarah Dorner; Robert Delatolla; B Jesse Shapiro; Dominic Frigon; Jiannis Ragoussis; Terrance P Snutch

2024 Sep 21;16(9):1495. doi: 10.3390/v16091495

Editor: Norbert Nowotny

PMCID: PMC11437403 PMID: 39339971

Abstract

During the COVID-19 pandemic, the monitoring of SARS-CoV-2 RNA in wastewater was used to track the evolution and emergence of variant lineages and gauge infection levels in the community, informing appropriate public health responses without relying solely on clinical testing. As more sublineages were discovered, it became increasingly difficult to identify distinct variants in a mixed population sample, particularly those without a known lineage. Here, we compare sequencing technologies from Illumina and Oxford Nanopore Technologies to determine their efficacy at detecting variants of differing abundance, using 248 wastewater samples from various Quebec and Ontario cities. Our study used two analytical approaches to identify the main variants in the samples: the presence of signature and marker mutations and the co-occurrence of signature mutations within the same amplicon. We observed that the two sequencing methods detected certain variants at different frequencies, as each method preferentially detects mutations of distinct variants. Illumina sequencing detected more mutations whose predominant lineage was in low abundance across the population or unknown for that time period, while Nanopore sequencing had a higher detection rate of mutations that are predominantly found in the high-abundance B.1.1.7 (Alpha) lineage, as well as a higher sequencing rate of co-occurring mutations in the same amplicon. We present a workflow that integrates short-read and long-read sequencing to improve the detection of SARS-CoV-2 variant lineages in mixed population samples, such as wastewater.

Keywords: SARS-CoV-2, coronaviruses, variants, wastewater surveillance, Illumina sequencing, Nanopore sequencing

1. Introduction

The SARS-CoV-2 genome is constantly evolving, with mutations happening at a rate of about once every 2 weeks [1]. While not all mutations change the characteristics of the virus, some mutations have proven to be of greater concern. Variants of interest (VOI) are labelled as such when an observed lineage is shown to have mutations potentially causing increased transmissibility or virulence, among other attributes [2]. Health organisations may reclassify these variants as variants of concern (VOC) if there is a demonstrable impact on epidemiological data. These viruses are labelled by the WHO and assigned a lineage based on PANGO nomenclature [3].

Wastewater surveillance has emerged as a crucial tool in tracking mutations in the SARS-CoV-2 genome. Samples of untreated wastewater can be collected to provide useful information about the spread of COVID-19 in the community, without relying on clinical testing [4,5]. As clinical sampling mainly relies on symptomatic testing, wastewater sampling can provide unbiased and consistent data that can be used to inform appropriate public health responses. It is used to detect variants earlier and provide more context on the transmissibility and COVID-19 levels in communities, particularly where access to clinical testing is not readily available. As wastewater samples consist of a mixture of fragmented RNA from many sources, it can be difficult to accurately identify mutations and variants, particularly those without a known lineage [6].

Next-generation sequencing has proven to be an important tool in pandemic surveillance, particularly in the early detection and spread of variants [7,8,9]. With a high rate of occurrence of mutations and increased transmissibility, the need to provide high-throughput data generation in a relatively short time frame has led to the development of a number of tools and protocols using next-generation sequencing, such as SARS-CoV-2 specific primers and tools to determine lineage in samples. These sequencing methods have been useful in analysing clinical and environmental samples and assisting in tracking viral load, transmission, contact tracing, and virus evolution [8]. Illumina and Nanopore sequencing are two next-generation sequencing technologies that have become major tools in genomic research. Illumina sequencing is a second-generation sequencing technology that uses sequencing by synthesis (SBS), where a reversible fluorescent terminator is used to detect the nucleotide sequence [10,11]. Nanopore sequencing is a third-generation sequencing technology that uses the current changes in a charged protein nanopore from the molecule passing through to determine the specific sequence [10,12]. Multiple studies have been performed on the comparison of Nanopore and Illumina sequencing, highlighting their various advantages in different applications [13,14,15]. Illumina sequencing is widely regarded as being highly accurate, consistently sequencing around 99.5–99.9% accuracy, and the higher depth of reads enables it to be a useful tool in circumstances with poor sequencing coverage, such as wastewater surveillance [16]. Nanopore sequencing has the ability to produce ultra-long reads, only limited by the sample preparation and quality, and is useful in genomic assembly and spanning entire regions of repetitive bases and structural variation [17]. Furthermore, real-time analysis of sequences and portability of sequencing devices has benefits in the field. 
Studies comparing Illumina and Nanopore sequencing on clinical and wastewater SARS-CoV-2 samples have focused on benchmarking parameters such as genome coverage, depth, and variant calling. However, they did not explore the combination of the two sequencing technologies as a method to improve the detection of variants [18,19,20].

In this work, we looked to highlight the advantages of both Illumina and Nanopore sequencing in tracking SARS-CoV-2 variants from wastewater samples. Mutational analysis on samples sequenced with both methods allowed for a comparison of the major variants identified among each dataset.

2. Materials and Methods

2.1. Sample Collection and Sequencing

Wastewater samples were collected from Ottawa (Ontario, Canada) and Montreal and Quebec City (Quebec, Canada) between March 2020 and March 2021 (Table S1). All 248 samples were sequenced with both Illumina and Nanopore sequencing. For the 48 Ontario wastewater samples, 24 h, 500 mL composite primary clarified sludge (PCS) samples were harvested from the City of Ottawa’s Robert O. Pickard Environmental Centre (Ontario, Canada). Samples were transported to the laboratory on ice and immediately processed as described by D’Aoust et al. [21]. PCS samples were concentrated by reacting 32 mL of homogenised PCS with a NaCl/polyethylene glycol (PEG) solution at a working concentration of 0.3 M NaCl and 80 mg/L PEG, while being agitated for a period of 12–17 h at 4 °C [22,23]. Afterwards, samples were centrifuged at 10,000× g for 45 min at 4 °C, then at 10,000× g for 10 min at 4 °C, with the supernatant being discarded after each run. Viral RNA was extracted from the resulting pellet using the RNeasy PowerMicrobiome Kit (Qiagen, Germantown, MD, USA) as per the manufacturer’s instructions with the following exceptions: exactly 250 mg of the resulting pellet was inputted into the extraction kit and the chloroform-phenol solution was substituted with Trizol LS reagent (ThermoFisher, Ottawa, ON, Canada). Extracted nucleic acids were eluted in 100 µL RNAse and DNAse free water and frozen at −80 °C until further processing.

The 108 Montreal and 92 Quebec City wastewater samples were collected by grab sampling, composite sampling, and passive sampling. The composite samples were collected with autosamplers, which collected wastewater every 10 min, over a 24, 48, or 72 h time period (depending on the week, day, and sampling sites). The passive samples were collected through 2 absorbent materials, gauze and negatively charged membranes, Mixed Cellulose Ester (MCE) filters, which were housed in torpedoes, also deployed over a 24, 48, or 72 h time period [24]. Grab and composite wastewater samples were additionally concentrated by filtration. Using 50 mL of wastewater, the pH was adjusted to between 3.5 and 4.5, and MgCl₂ was added to a final concentration of 25 mM. The samples were then filtered through a 0.45 μm MCE filter. The MCE filters and gauze were stored at −80 °C and all samples were processed within 72 h. RNA was extracted using the Qiagen Allprep Powerviral DNA/RNA kit. The protocol was followed according to the manufacturer, with the exception of the lysis step, where a final concentration of 10% beta-mercaptoethanol was used in the lysis buffer and incubation time was raised to 30 min at 55 °C.

RNA extracts of each sample were reverse transcribed with the NEB LunaScript® RT SuperMix Kit, followed by targeted SARS-CoV-2 amplification using the ARTIC V3 primer scheme with NEB Q5 Hot Start High-Fidelity 2X Master Mix, or amplicons were prepared with the Qiagen QIAseq SARS-CoV-2 Primer Panel [25]. Then, amplicon PCR products were purified and normalised based on their concentrations. Native barcoded library preparation was performed for Nanopore amplicon sequencing on a PromethION instrument, or a Nextera DNA Flex library preparation was performed for Illumina paired-end amplicon sequencing (PE150) on a NovaSeq 6000 instrument at the McGill Genome Centre. The detailed protocols can be accessed at: https://www.protocols.io/view/sars-cov-2-mcgill-nextera-flex-sequencing-protocol-14egnz586g5d (accessed on 4 December 2023) and https://www.protocols.io/view/mcgill-nanopore-native-barcoding-libprep-protocol-n92ld935xg5b (accessed on 4 December 2023).

2.2. Mutation Calling and Filtering

For Illumina sequencing data, raw reads were first trimmed using cutadapt (v2.10) and then aligned to the reference using bwa-mem (v0.7.17) [26,27]. Aligned reads were filtered using sambamba (v0.7.0) to remove paired reads with an insert size outside the 60–300 bp range, unmapped reads, and all secondary alignments [28]. Then, any remaining primer sequences were trimmed with iVar (v.1.3.4) [29]. Afterwards, a pileup was produced using Samtools (v1.9); it was then used as input for FreeBayes (v1.2.2) to create a consensus sequence and perform variant calling [30,31]. Variants were called in regions with a minimum of 10x depth, using only bases with a Q score > 20 and a minimum allele frequency of 0.2.
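The calling thresholds stated above can be pictured as a simple per-site filter. The following is a minimal sketch only: the record fields and the function name `passes_illumina_thresholds` are assumptions made for illustration, base quality is simplified to a single summary value per site, and nothing here mirrors FreeBayes options or output.

```python
from dataclasses import dataclass

# Thresholds as stated in the text; the constant names are hypothetical.
MIN_DEPTH = 10               # minimum read depth at the site
MIN_BASE_QUALITY = 20        # exclusive bound: Q score must be > 20
MIN_ALLELE_FREQUENCY = 0.2   # minimum fraction of reads supporting the alternate allele


@dataclass(frozen=True)
class CandidateVariant:
    position: int            # 1-based position on the reference genome
    ref: str
    alt: str
    depth: int
    base_quality: float      # simplified: one summary Q score for the supporting bases
    allele_frequency: float


def passes_illumina_thresholds(variant: CandidateVariant) -> bool:
    """Return True when a candidate variant meets all three thresholds."""
    return (
        variant.depth >= MIN_DEPTH
        and variant.base_quality > MIN_BASE_QUALITY
        and variant.allele_frequency >= MIN_ALLELE_FREQUENCY
    )
```

The values passed in would come from whatever pileup or VCF parsing step precedes the filter; that parsing is deliberately left out of the sketch.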

For Oxford Nanopore Technologies (hereafter Nanopore) sequencing data, raw signals were basecalled using guppy (v3.4.4) with the High-Accuracy Model (dna_r9.4.1_450bps_hac). Mutation calling for basecalled Nanopore samples was performed through the ARTIC nCoV-2019 pipeline, which includes filtering, primer trimming, mapping, polishing, and consensus generation, and the medaka_haploid_variant workflow (v1.6.0), aligning to the reference genome MN908947.3 [32]. Initial comparison of mutations between the two datasets revealed that 84.1% of mutations across all samples occurred only once in the Nanopore dataset and were not present in the Illumina dataset. To reduce the background error in the Nanopore sample set, mutations were filtered by the number of samples in which they occurred: mutations that were not present in the Illumina dataset and occurred in fewer than 10 Nanopore samples were removed. A less stringent threshold left a high number of low-occurrence mutations relative to the Illumina samples, while a more stringent threshold removed key mutations. Three control samples (1 negative control and 2 positive controls from AccuGenomics) were also sequenced with each method to account for background noise in all samples arising from errors between sample preparation and sequencing [33]. These filters allowed for a more accurate comparison of variants downstream (Supplementary Material: SARS_WW_Mutations.py). Coverage analysis to compare read depth and quality was performed with minimap2 and samtools (v1.10) coverage [30,34].
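The cross-dataset filtering rule for Nanopore mutations can be sketched as follows. This is not the supplementary script SARS_WW_Mutations.py; the function name, the input layout (a dictionary from sample identifier to a set of mutation strings), and the constant name are assumptions made for illustration.

```python
from collections import Counter

# Threshold from the text: a Nanopore-only mutation must occur in at least
# this many Nanopore samples to be kept. The constant name is hypothetical.
MIN_NANOPORE_SAMPLES = 10


def filter_nanopore_mutations(nanopore_calls, illumina_mutations,
                              min_samples=MIN_NANOPORE_SAMPLES):
    """Drop Nanopore mutations that are absent from the Illumina dataset and
    occur in fewer than `min_samples` Nanopore samples.

    nanopore_calls: dict mapping sample id -> set of mutation strings
    illumina_mutations: set of all mutations called in the Illumina dataset
    """
    # Count in how many Nanopore samples each mutation was called.
    sample_counts = Counter(
        mutation for mutations in nanopore_calls.values() for mutation in mutations
    )
    kept = {
        mutation
        for mutation, count in sample_counts.items()
        if mutation in illumina_mutations or count >= min_samples
    }
    return {sample: mutations & kept for sample, mutations in nanopore_calls.items()}
```

Filtering on the pooled sample set rather than sample by sample matches the stated rationale that per-sample filtering removed signature mutations when coverage was low.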

2.3. Detection of Variant Lineages in Wastewater Samples

The presence of signature and marker mutations was used to determine the presence of SARS-CoV-2 lineages in the wastewater samples (Supplementary Material: SARS_WW_Lineage.py). Signature mutations are those mutations used to define a lineage, taken from PANGO constellations, while marker mutations denote a signature mutation that is only highly prevalent in a certain lineage when compared to other variant lineages [35,36]. To confirm the presence of lineages in the wastewater samples, Freyja was used to determine relative lineage abundances in the dataset [37]. The frequency of variants between the Illumina and Nanopore datasets was compared using Barnard’s exact test due to the smaller sample sizes.

To determine the predominant lineage of single mutations, each mutation was searched for using CoV-Spectrum [38]. Using this database, the overall proportion of sequenced samples containing a mutation and the variant lineages of these samples can be established over a set time period. This method was used on mutations that were preferentially detected by a single sequencing method, either Illumina or Nanopore.

Similar to the analysis employed in COJAC, the detection of the co-occurrence of signature mutations within the same amplicon was used to compare the detection of the B.1.1.7 lineage in both datasets (Supplementary Material: SARS_WW_CoocAmp.py) [39]. Five amplicons were found to have 2–4 Alpha signature mutations co-located in a number of samples and were compared to the overall frequency of these mutations. The chi-squared test was used to compare the frequency of co-occurrences between the Illumina and Nanopore datasets.
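For readers who want to see how a comparison of co-occurrence frequencies between two datasets might be computed, the sketch below implements a Pearson chi-squared test on a 2 × 2 table using only the Python standard library. It is an illustration under assumptions: the source does not state whether a continuity correction was applied (none is used here), and the function name and table layout are hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) on the table

        [[a, b],
         [c, d]]

    for example rows = sequencing method, columns = co-occurrence detected / not
    detected. Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a non-zero total")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

With small expected cell counts an exact test is usually preferred, which is consistent with the use of Barnard's exact test for the variant frequency comparison.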

Python (v3.9.12) was used for all statistical analysis and data visualisation [40]. Scripts relating to mutation and variant analyses can be found in Supplementary Materials.

3. Results

3.1. Wastewater Sampling and Sequencing with Short- and Long-Read Methods

We collected 248 wastewater samples from various cities in the provinces of Quebec and Ontario (Canada) and sequenced all samples with both Illumina and Nanopore sequencing. To better understand the differences in variant analyses by both sequencing technologies, we compared read statistics between the two technology-derived datasets (Table 1). As expected, samples from the Illumina dataset were sequenced at a higher read depth than those from the Nanopore dataset, and, overall, the depth of coverage was 562% higher (Figure 1, Figure S1). Average base quality in the Illumina dataset was also higher, with a mean Q score of 30, corresponding to 99.9% accuracy, compared with a mean Q score of 20 for the Nanopore dataset, equating to 99% accuracy [41]. However, Nanopore sequencing recovered longer reads, allowing them to span the entire amplicon length.
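The correspondence between Q scores and accuracy quoted above follows from the standard Phred definition, in which the error probability is 10^(−Q/10). A minimal sketch (the function name is chosen for illustration):

```python
def phred_to_accuracy(q_score: float) -> float:
    """Convert a Phred quality score to per-base accuracy.

    The Phred scale defines the error probability as 10 ** (-Q / 10),
    so accuracy is one minus that value.
    """
    return 1.0 - 10.0 ** (-q_score / 10.0)
```

Under this definition, Q30 corresponds to an accuracy of 0.999 and Q20 to 0.99, matching the percentages given in the text.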

Table 1.

Comparison of read statistics between Illumina and Nanopore datasets. Statistics are presented with standard deviation and include the mean number of reads sequenced per sample, the mean read length across all samples, the mean depth of coverage across the SARS-CoV-2 genome for a single sample, the mean breadth of coverage across the SARS-CoV-2 genome for all samples, and the mean base quality of reads across all samples.

Statistic	Illumina	Nanopore
Mean Number of Reads/Sample	581,699 ± 66,117	21,103 ± 3653
Mean Length (bp)	102.1 ± 2.9	491.4 ± 14.6
Mean Depth (reads)	2345.9 ± 269.7	240.9 ± 42.4
Mean Breadth of Coverage (%)	32.1 ± 2.0	32.9 ± 2.0
Base Quality (Q score)	30.0 ± 0.8	19.9 ± 0.4

**Comparison of depth of coverage across the SARS-CoV-2 genome between Illumina (red) and Nanopore (blue) datasets.** Read depths of all wastewater samples within a dataset were combined to show overall coverage on a log10 scale. Horizontal lines indicate average depth across all positions.

3.2. Comparison of Mutation Frequency across All Samples

To study the SARS-CoV-2 variants present in the wastewater samples, we performed mutation calling on both datasets (Table S2). Initial analysis yielded 23,688 mutations across all samples, with 94.7% of these mutations exclusive to samples sequenced by Nanopore sequencing (Table 2). Further analysis showed that of the 23,105 total mutations in the Nanopore sample set, 86.3% of them appeared in only one sample, with many more appearing in a low number of samples as well. Given the greater base quality and higher depth of reads in the Illumina dataset, we attributed the high number of Nanopore mutations to background noise and looked to filter these mutations to provide a more accurate comparison of variants in downstream analysis. Due to the lower coverage in individual samples from the Nanopore dataset, filtering mutations on a sample-by-sample basis led to an overcorrection and signature mutations of VOCs were filtered out. Using the procedures described in the Materials and Methods section, filtering was completed on the overall Nanopore sample set to provide a more comprehensive set of mutations to perform variant analysis (Table S3). This gave us a more definitive comparison of the frequency of mutations between the Illumina and Nanopore datasets (Figure 2).

Table 2.

Number of mutations across all SARS-CoV-2 wastewater samples. Counts were recorded after variant calling all samples as well as after quality filtering the datasets. The number of mutations that are exclusively found within a dataset (Illumina or Nanopore) are also noted, with a significant reduction in exclusively Nanopore mutations that can be attributed to background noise.

Stage	Dataset	Number of Mutations	Exclusive to Dataset
Initial analysis	Total	23,688	N/A
Initial analysis	Illumina	1259	583
Initial analysis	Nanopore	23,105	22,429
After filtering	Total	1259	N/A
After filtering	Illumina	1249	534
After filtering	Nanopore	725	10
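The counts in Table 2 are internally consistent, which can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name and returned keys are hypothetical, and the only inputs are the totals and exclusive counts reported in the table.

```python
def summarise_mutation_counts(illumina_total, illumina_exclusive,
                              nanopore_total, nanopore_exclusive):
    """Derive shared and overall counts from per-dataset totals and exclusives."""
    shared = illumina_total - illumina_exclusive
    if shared != nanopore_total - nanopore_exclusive:
        raise ValueError("shared count differs between the two datasets")
    total = shared + illumina_exclusive + nanopore_exclusive
    return {
        "shared": shared,
        "total": total,
        "nanopore_exclusive_pct": round(100.0 * nanopore_exclusive / total, 1),
    }


# Counts taken from Table 2.
initial = summarise_mutation_counts(1259, 583, 23105, 22429)
filtered = summarise_mutation_counts(1249, 534, 725, 10)
```

For the initial analysis this gives 676 shared mutations, 23,688 in total, and 94.7% exclusive to Nanopore; after filtering it gives 715 shared mutations and 1259 in total.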

**Frequency of mutations across all SARS-CoV-2 wastewater samples.** Each bar represents a mutation along the SARS-CoV-2 genome, with red and blue bars indicating the number of samples that exclusively contain that mutation within the Illumina and Nanopore datasets, respectively, and green bars indicating the number of samples in which the mutations occur in both datasets.

To better visualise and compare the frequency of mutations between both methods, we binned the mutations by the number of samples in which they were detected (Figure 3). For each bin, we report the percentage of common and unique sites, where the number of sites is the sum, over all mutations in the bin, of the number of samples detecting each mutation. All mutations detected in over 60 samples were signature mutations of VOCs. Most mutations were detected in under 10 samples and, after filtering, at a similar frequency by the two sequencing methods. No mutation was detected in between 34 and 59 samples. The most frequent mutation, A23403G, was detected in 105 samples. As low-occurrence Nanopore-specific mutations were filtered out, Illumina detected a higher proportion of the mutations occurring in under 10 samples, which make up the majority of mutations, than of mutations in the other frequency bins (Figure S1).

**Comparison of mutation detection between sequencing tools based on frequency.** Mutations were binned based on the total number of samples each was detected in, with the number of mutations in each group reported in the attached table. Percentage of sites (y-axis) refers to the fraction of the sum of the number of samples detecting each mutation in that particular bin. Red and blue bars represent sites exclusively detected by Illumina and Nanopore, respectively, while green bars represent common sites.

3.3. Use of Signature and Marker Mutations to Detect Lineages

Using the mutations obtained above, we were able to determine the presence of various lineages in the wastewater samples. First, we sought to identify the variants of concern (VOCs) and variants of interest (VOIs) that were present at the time of sampling. These variants were assigned a lineage using PANGO nomenclature [3]. Using the approach presented by N’Guessan et al., we determined the presence of a major variant in a sample by the occurrence of at least three signature mutations and one marker mutation [36].
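The decision rule described above (at least three signature mutations and at least one marker mutation) can be sketched as a set comparison. This is not the supplementary script SARS_WW_Lineage.py. The function name and data layout are assumptions, and because a marker mutation is itself a signature mutation by the definition in the Methods, the sketch assumes that markers count towards the signature total; the source does not state this explicitly.

```python
def detect_lineages(sample_mutations, lineage_definitions,
                    min_signature=3, min_marker=1):
    """Return the lineages whose mutation thresholds are met in one sample.

    sample_mutations: set of mutation strings called in the sample
    lineage_definitions: dict mapping lineage name -> {"signature": set, "marker": set},
        where each marker set is assumed to be a subset of the signature set
    """
    detected = []
    for lineage, definition in lineage_definitions.items():
        n_signature = len(sample_mutations & definition["signature"])
        n_marker = len(sample_mutations & definition["marker"])
        if n_signature >= min_signature and n_marker >= min_marker:
            detected.append(lineage)
    return sorted(detected)


# Placeholder definitions for illustration only; these are not real constellations.
toy_definitions = {
    "lineage_X": {"signature": {"m1", "m2", "m3", "m4"}, "marker": {"m1"}},
    "lineage_Y": {"signature": {"m5", "m6", "m7"}, "marker": {"m7"}},
}
```

In practice the definitions would be built from the PANGO constellations and marker lists cited in the Methods rather than from placeholder labels.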

As expected for the time period, the Alpha variant (B.1.1.7) was the most abundant among both the Illumina and Nanopore datasets, with an increased identification of variants by combining both datasets together (Figure 4). Of the nine VOCs/VOIs detected in the overall population, Nanopore identified five of them (Alpha, Gamma, Delta, Zeta, Eta) and all in equal or higher frequency compared to the Illumina dataset. Illumina detected all nine variants across the population, including the four (Beta, Theta, Lambda, Mu) not identified by Nanopore. Two variants had a significant difference in frequency between the Illumina and Nanopore datasets: Nanopore sequencing detected the Alpha variant in significantly more samples (p = 0.0027), whilst Illumina detected the Beta variant in significantly more samples (p = 0.022).

**Relative abundance of VOCs/VOIs in SARS-CoV-2 wastewater samples.** Using a combination of Freyja and the detection of signature and marker mutations [8], the variants found within each sample were identified. Green bars indicate the number of samples containing each variant, as detected with either sequencing method, compared to variants detected only in the Illumina (red) or Nanopore (blue) dataset. Barnard’s exact test: ##: p < 0.01, #: p < 0.05, ns: p > 0.05.

3.4. Predominant Lineages of Key Mutations from Each Dataset

We compared the lineages that were preferentially detected by each sequencing method by looking at the five most frequent mutations occurring within a dataset that were found in low occurrence with the other sequencing method (Table 3). Using CoV-Spectrum, we searched for all lineages in which these mutations are present and identified the most predominant lineage among all consensus sequences at the time of sampling and over time [38].

Table 3.

Predominant lineages of mutations preferentially detected by a sequencing method. Predominant lineage indicates that, of all samples containing the mutation, the indicated lineage is present in the highest percentage of samples.

Mutation	Number of Samples Illumina	Number of Samples Nanopore	Predominant Lineage
T12058C	20	7	B.1.177
A28254C	16	0	B.1.351 (Beta)
C10789T	14	2	B.1.375
G29692T	10	2	B.1.1.284
C25904T	10	1	B.1.351 (Beta)
A24389C	0	32	B.1.1.7 (Alpha)
C23271A	4	23	B.1.1.7 (Alpha)
G24914C	4	22	B.1.1.7 (Alpha)
T8022G	0	18	B.1.617.2 (Delta)
C23709T	3	16	B.1.1.7 (Alpha)

Two mutations preferentially detected by Illumina sequencing, A28254C and C25904T, were predominantly found in samples containing B.1.351 (Beta) lineage. The Beta variant was present in 50% of samples containing mutation A28254C at the time of sampling but then decreased to 20% of samples as the mutation is present in currently circulating variants. Mutation C25904T is considered a signature mutation of the Beta variant, though it was present in a number of other variants circulating at the time of sampling. The Beta variant was present in 64% of samples containing this mutation at the time of sampling and then decreased in presence to 41%, while the mutation itself is still present in variants today, though largely decreased. Of the samples containing mutation T12058C, 70% contained the B.1.177 lineage at the time of sequencing, a variant that peaked in transmission in late 2020. Currently, multiple lineages contain this mutation, including some circulating today, with no significantly predominant lineage. The B.1.375 lineage was most prevalent in samples containing mutation C10789T, with a presence in 33% of samples at the time of sampling. Though this variant reached peak transmissibility in late 2020, the mutation is still present in a number of variants today. Samples containing mutation G29692T had a predominant lineage of B.1.1.284 at 38%, though the lineage B.1.1.176 was also highly present at 37%, at the time of sampling. Overall, however, the B.1.1.284 lineage increased its predominance with a presence in 44% of all samples, while B.1.1.176 is only present in 10% of samples. This mutation is still present in circulating variants but at a decreased presence compared to the time of sampling.

Of the five mutations analysed that were preferentially detected by Nanopore sequencing, three are considered signature mutations of the B.1.1.7 (Alpha) variant: C23271A, G24914C, and C23709T. For all samples containing any of these mutations, the Alpha variant was the predominant lineage, both at time of sampling and overall, with a 97–98% presence. Following this, the presence of these mutations is similar to the presence of the Alpha variant in the population and thus is non-existent today. The Alpha variant was also predominant in samples containing mutation A24389C, with a 50% presence at the time of sampling, which then decreased slightly to 39% to date. Like the three signature mutations, the presence of this mutation in samples decreased proportionately with the presence of the Alpha variant. The mutation T8022G was sequenced in a low number of samples overall, with the majority of them containing the B.1.617.2 (Delta) lineage. At the time of sampling, the Delta lineage was present in 22% of samples containing the mutation and then rose to 34% presence.

3.5. Identifying Co-Occurrence of Mutations within the Same Amplicon

As demonstrated by Jahn et al., the analysis of the co-occurrence of mutations within the same amplicon is useful for the early detection of SARS-CoV-2 variants in wastewater, compared to clinical testing [39]. Overall, Nanopore sequencing detected more mutations co-occurring in the same amplicon, or in adjacent amplicons, compared to Illumina (Figure S2). To compare the sequencing rate of the co-occurrence of mutations between both sequencing methods, we focused on the signature mutations of the Alpha variant, as this variant contains a number of co-occurrences in multiple amplicons (Table S4). We presented the number of co-occurrences compared to the overall frequency of the mutations within each dataset to determine the occurrence rate of these mutations (Figure 5). There were five amplicons in which Alpha signature mutations co-occurred: amplicons 77, 78, 92, 93, and 95. In all amplicons, Nanopore detected these co-occurrences at an equal or higher frequency than in the equivalent samples sequenced by Illumina. This difference was significant in three amplicons: amplicon 77 (p = 0.0018), amplicon 78 (p = 0.0076), and amplicon 93 (p = 0.049). However, both Illumina and Nanopore detected the co-occurrence of mutations in almost all samples in which the less frequent mutation in the amplicon was present.

**Co-occurrence of B.1.1.7 (Alpha) signature mutations within the same amplicon.** Dark red and blue bars indicate the number of samples containing the co-occurrence in the Illumina and Nanopore datasets, respectively, while light bars show the overall frequency of the mutation. * Mutation A28111G is located in both Amplicon 92 and 93. Chi-squared test: **: p < 0.01, *: p < 0.05, ns: p > 0.05.

For the mutations occurring sequentially, GAT28280CTA and GGG28881AAC, all three single nucleotide changes occurred at an equal or similar frequency compared to the non-sequential signature mutation in the amplicon. Mutations at positions 28,280–28,282 occurred in eight samples sequenced by Illumina, and a co-occurrence without mutation A28111G was detected in all eight. Of the seven samples including the non-sequential mutation, the detection rate remained at 100%. Conversely, mutation A28111G was detected in a higher number of Nanopore samples than mutations in positions 28,280–28,282 (23 samples vs. 16 samples (28,281 and 28,282) and 18 samples (28,280)), of which only 15 samples were detected with the co-occurrence both with and without A28111G. In amplicon 95, mutation C28977T was sequenced at a much lower frequency than mutations at positions 28,881–28,883 in both datasets. Mutations at positions 28,881–28,883 were the only Alpha signature mutations sequenced at a higher frequency in the Illumina sample set than in the Nanopore sample set. Both sequencing methods detected the co-occurrence of the three mutations in all samples in which the least frequent mutation was present, though all three mutations occurred at a very similar frequency.
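The comparison of co-occurrence counts against per-mutation frequencies can be sketched at the sample level as follows. This is not the supplementary script SARS_WW_CoocAmp.py; the function name and input layout are assumptions, and the example sample data are invented purely to show the output shape.

```python
from collections import Counter


def cooccurrence_summary(sample_mutations, amplicon_mutations):
    """Count samples containing all signature mutations of one amplicon.

    sample_mutations: dict mapping sample id -> set of mutation strings
    amplicon_mutations: iterable of signature mutations located in the amplicon
    Returns (n_cooccurring, per_mutation_counts).
    """
    required = frozenset(amplicon_mutations)
    per_mutation = Counter()
    n_cooccurring = 0
    for mutations in sample_mutations.values():
        present = required & mutations
        per_mutation.update(present)
        if present == required:
            n_cooccurring += 1
    return n_cooccurring, {m: per_mutation[m] for m in sorted(required)}


# Invented toy samples; the mutation names are taken from the text.
toy_samples = {
    "s1": {"C23271A", "C23709T"},
    "s2": {"C23271A"},
    "s3": {"C23271A", "C23709T", "A23403G"},
}
toy_result = cooccurrence_summary(toy_samples, ["C23271A", "C23709T"])
```

Comparing the co-occurrence count with the count of the least frequent mutation gives the detection rate discussed above. A read-level analysis, as in COJAC, would additionally require that the mutations appear on the same read or read pair.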

4. Discussion

Advances in next-generation sequencing have provided a breakthrough in detecting SARS-CoV-2 and tracking evolving variant lineages during the COVID-19 pandemic [8]. Using various methods, we compared the efficacy of Illumina and Nanopore sequencing at detecting variants and the advantages of using one sequencing method over the other. Major differences in the reads between the Nanopore and Illumina datasets included a higher depth and accuracy using Illumina sequencing, which is to be expected for typical RNA sequencing [19]. This meant a greater amount of filtering was needed on samples sequenced by Nanopore in order to make a more meaningful comparison. However, Illumina’s reads did not span the length of the amplicons used for sequencing, which means that inference was needed to link single mutations in order to detect variants. Nanopore sequencing provides more direct evidence of variants in which their mutations occur along the same amplicon, as is the case for B.1.1.7. Looking at variants present in the wastewater samples, as well as key mutations differing between the two sequencing methods, we were able to determine a pattern in the differences between the Illumina and Nanopore sequencing. Using this, we present a workflow that integrates data from the two sequencing technologies to obtain more comprehensive detection of SARS-CoV-2 variants in wastewater.

Using signature and marker mutations, we were able to identify VOCs and VOIs present within the Illumina and Nanopore datasets. A robust tool used during the pandemic is Freyja, which recovers relative lineage abundances from mixed SARS-CoV-2 samples such as wastewater [37]. For example, Freyja was used to perform per-sample VOC analysis on wastewater samples as part of an environmental surveillance study in Malawi, which has limited COVID-19 testing capacity and no formal sewage systems [42]. Owing to the lower coverage spread in our wastewater samples, Freyja identified a high number of “other” variants, which made it difficult to clearly visualise variant abundances. However, we were able to use the tool to confirm the presence of VOCs that we had identified using signature mutations. Our method also allowed us to focus on the co-occurrence of mutations within amplicons, as well as on mutations with a high discordance in frequency between datasets. Nanopore sequencing detected five VOCs/VOIs at an equal or higher frequency than in the Illumina samples: Alpha, Gamma, Zeta, Eta, and Delta. Illumina also detected these variants, with some overlap in samples, so the frequency of samples containing a variant increased when either sequencing method was taken into account. Conversely, Illumina detected four VOCs/VOIs (Beta, Theta, Lambda, and Mu) at a higher frequency than in the Nanopore-sequenced samples, as Nanopore sequencing did not detect these variants in any sample. The Alpha, Gamma, and Delta variants were the most prevalent in the area, while the Beta variant detected by Illumina had a low presence compared to the variants detected by Nanopore sequencing [43]. Aside from the Alpha and Beta variants, the low frequency of samples containing variants meant that the differences between datasets were not significant. The detection of low-frequency variants may be attributable to the higher accuracy and coverage of Illumina sequencing. We showed that Illumina detected individual mutations of less frequent and relatively unknown variants at a higher frequency than Nanopore sequencing, owing to its higher accuracy and sequencing depth, while Nanopore preferentially detected mutations that were predominant in highly frequent variants such as Alpha.
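The signature-mutation approach described above can be illustrated with a minimal Python sketch. It is not the authors' supplementary script: the abbreviated signature sets, the `min_fraction` threshold, and the function names are assumptions chosen for illustration, and a real analysis would load complete lineage definitions (for example from the constellations repository [35]).

```python
from typing import Dict, Set

# Hypothetical, abbreviated signature sets for illustration only.
SIGNATURES: Dict[str, Set[str]] = {
    "Alpha": {"S:N501Y", "S:A570D", "S:P681H"},
    "Beta": {"S:N501Y", "S:K417N", "S:E484K"},
}


def variants_in_sample(mutations: Set[str], min_fraction: float = 0.6) -> Set[str]:
    """Return variants whose signature set is covered at or above min_fraction.

    min_fraction is an assumed threshold, not a value taken from the study.
    """
    found = set()
    for name, signature in SIGNATURES.items():
        covered = len(signature & mutations) / len(signature)
        if covered >= min_fraction:
            found.add(name)
    return found


def detection_frequency(samples: Dict[str, Set[str]], variant: str) -> float:
    """Fraction of samples in which the given variant is called."""
    if not samples:
        return 0.0
    hits = sum(variant in variants_in_sample(muts) for muts in samples.values())
    return hits / len(samples)


# Toy per-sample mutation calls from one sequencing platform.
illumina_samples = {
    "s1": {"S:N501Y", "S:K417N", "S:E484K"},
    "s2": {"S:N501Y"},
}
print(detection_frequency(illumina_samples, "Beta"))
```

Running the same function on the calls from each platform gives per-variant detection frequencies that can be compared between datasets. A mutation shared by several lineages, such as S:N501Y here, does not trigger a call on its own under this threshold.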

The co-occurrence of mutations in the same amplicon is a more robust indicator of the presence of a variant in wastewater samples than the detection of individual mutations [39]. As Nanopore was able to sequence reads spanning the entire amplicon, the frequency of mutations can be used to provide more direct evidence of variants. The longer reads ensured that there were no coverage dips within amplicons, whereas gaps in coverage may occur among Illumina sequencing reads. Therefore, Nanopore sequencing has a greater chance than Illumina of detecting all co-occurring mutations within the same read, increasing the detection of variants such as Alpha. In many instances, our data showed similar frequencies for mutations within the same amplicon, and co-occurrence with other signature mutations was observed in a high number of the samples in which a mutation was present. Although co-occurrence analysis is not useful for all lineages, particularly those whose mutations are not close enough to be sequenced in the same amplicon, it can provide earlier detection of variants where it applies. Similarly, a high frequency of co-occurrences within a sample can provide a benchmark for new lineages. The increased frequency of co-occurrences in Nanopore compared to Illumina highlights its advantages and can explain why it detects Alpha better. Although the presence of a variant may be inferred when its mutations are detected in the same amplicon across the population, it is only through long-read sequencing that we can resolve multiple mutations in the same read.
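As a hedged illustration of read-level co-occurrence, the sketch below counts, among reads that cover every mutation position of interest, those that carry all alternate bases. The read representation (a mapping from genome position to observed base), the positions, and the function name are assumptions made for this example; it does not reproduce COJAC [39] or the supplementary SARS_WW_CoocAmp.py script.

```python
from typing import Dict, Iterable, Tuple

# One aligned read, simplified to: genome position -> observed base.
Read = Dict[int, str]


def cooccurrence_fraction(
    reads: Iterable[Read], mutations: Dict[int, str]
) -> Tuple[int, int, float]:
    """Count reads spanning all positions and those carrying every alternate base.

    Returns (spanning_reads, reads_with_all_mutations, fraction).
    """
    spanning = 0
    carrying_all = 0
    for read in reads:
        # Only reads covering every position can show co-occurrence.
        if all(pos in read for pos in mutations):
            spanning += 1
            if all(read[pos] == alt for pos, alt in mutations.items()):
                carrying_all += 1
    fraction = carrying_all / spanning if spanning else 0.0
    return spanning, carrying_all, fraction


# Illustrative positions and alternate bases within one hypothetical amplicon.
target = {100: "A", 250: "G"}
toy_reads = [
    {100: "A", 250: "G"},  # carries both mutations
    {100: "A", 250: "T"},  # carries only the first
    {100: "A"},            # does not span position 250
    {100: "C", 250: "G"},  # carries only the second
]
print(cooccurrence_fraction(toy_reads, target))
```

Reads that do not span every position are excluded from the denominator. With short reads, fewer reads span all positions of an amplicon, so fewer contribute to the co-occurrence count.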

We present a workflow that combines Illumina and Nanopore wastewater sequencing data, improving the detection of SARS-CoV-2 variants (Figure 6). As shown above, the two sequencing technologies detected variants at different frequencies depending on the abundance of each variant in the wastewater as well as the specific mutations it carries. By integrating both datasets, we can use the strengths of each sequencing method to increase the number of variants found within a sample. Most of the workflow is the same as workflows for SARS-CoV-2 variant analysis that use a single sequencing technology. The same wastewater collection protocol can be used, as well as the same RNA extraction and SARS-CoV-2 amplification steps. Separate amplicon PCR products from the same wastewater sample go through library preparation with either the Illumina or the Nanopore protocol and are then sequenced on the respective instrument. Downstream processing of raw sequencing data, such as trimming and alignment, and mutation calling are performed separately for the Illumina and Nanopore datasets, using readily available tools. Mutations from each dataset can then be integrated into one dataset and may be filtered as a group depending on quality control samples and background noise, which was particularly present in the Nanopore samples. Finally, the integrated dataset can be used for SARS-CoV-2 variant analysis using the usual tools developed during the pandemic.

**Figure 6. General workflow to integrate Illumina and Nanopore wastewater sequencing data.** Steps include: (A) collection of wastewater from target location; (B) RNA extraction from wastewater and amplification of the SARS-CoV-2 genome; (C) sequencing of the prepared sample on both Illumina and Nanopore instruments, followed by downstream processing and mutation calling using their respective tools; (D) combining detected mutations from both datasets, with background noise removal; (E) SARS-CoV-2 variant analysis using combined dataset.
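Step (D) of the workflow, combining mutation calls and removing background noise, might be sketched as follows. This is a minimal illustration under stated assumptions: the data layout, the function name, and the per-platform noise floors are hypothetical, and in practice thresholds would be derived from quality control samples rather than fixed values.

```python
from typing import Dict, Tuple

# A call is identified by (sample_id, mutation); the value is its allele frequency.
Call = Tuple[str, str]


def integrate_calls(
    illumina: Dict[Call, float],
    nanopore: Dict[Call, float],
    illumina_noise: float = 0.01,  # assumed noise floor, not from the study
    nanopore_noise: float = 0.05,  # assumed noise floor, not from the study
) -> Dict[Call, Dict[str, float]]:
    """Merge per-platform calls, keeping those at or above each platform's floor."""
    merged: Dict[Call, Dict[str, float]] = {}
    platforms = (
        ("illumina", illumina, illumina_noise),
        ("nanopore", nanopore, nanopore_noise),
    )
    for platform, calls, floor in platforms:
        for key, freq in calls.items():
            if freq >= floor:
                merged.setdefault(key, {})[platform] = freq
    return merged


illumina_calls = {("s1", "S:N501Y"): 0.30, ("s1", "S:K417N"): 0.02}
nanopore_calls = {
    ("s1", "S:N501Y"): 0.25,
    ("s1", "S:K417N"): 0.03,  # below the assumed Nanopore floor
    ("s1", "S:P681H"): 0.10,
}
for call, support in integrate_calls(illumina_calls, nanopore_calls).items():
    print(call, support)
```

Keeping the supporting platform alongside each merged call preserves the information needed downstream, for example to flag mutations seen by only one technology.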

When tracking the spread and evolution of SARS-CoV-2 variants using wastewater surveillance, Nanopore lacked the accuracy and depth to provide a full picture when compared to Illumina sequencing. While the sequencing turnaround is quicker, the failure to detect certain variants can lead to a lag in public health response. Improved single-read accuracy in Nanopore sequencing, as reported for the recently released R10.4 chemistry, may improve the detection of less frequent and unknown variants. The current Nanopore chemistry allows for over 99% modal read accuracy, corresponding to a Phred score of Q21, while the chemistry used in this study had a lower Phred score of Q17 [44]. Although Nanopore sequencing is still limited by its lower throughput compared to Illumina, Oxford Nanopore is continuously improving its high-throughput devices such as the PromethION. This study was also limited by the availability of data. Analysis was completed on basecalled Nanopore data, as the raw signal data were unavailable. As there have been vast improvements in Nanopore basecalling technology, applying newer basecallers to the raw data may also lead to improved single-read accuracy with increased performance [45,46]. Furthermore, working from the raw signal would allow samples to be analysed without the bias of a basecaller, using tools such as nanopolish [47,48].
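The relationship between the Phred scores quoted above and read accuracy follows from the standard definition Q = -10 log10(p), where p is the per-base error probability. The short sketch below applies this definition; the function names are illustrative.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Per-base accuracy implied by a Phred score, using Q = -10 * log10(p_error)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Phred score implied by a per-base accuracy strictly between 0 and 1."""
    return -10.0 * math.log10(1.0 - accuracy)


for q in (17, 21):
    print(f"Q{q}: {phred_to_accuracy(q):.4f}")
```

Under this definition, Q17 corresponds to an accuracy of about 98.0% and Q21 to about 99.2%, consistent with the statement that the newer chemistry exceeds 99% accuracy.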

While analysis was completed on samples taken at the height of the pandemic, during a peak in a major VOC, future studies may be performed during periods of low virulence. Such studies may further differentiate the sequencing methods in their ability to detect new mutations and variants without a known lineage. Our results show that combining short- and long-read sequencing improves the detection of variant lineages in mixed population samples by providing earlier evidence of variants and increased detection of unknown lineages.

Acknowledgments

We acknowledge the bioinformatics support from the Canadian Centre for Computational Genomics (C3G), a platform at the McGill Genome Centre and Victor Phillip Dahdaleh Institute of Genomic Medicine. This research was supported in part through the computational resources and services provided by Advanced Research Computing at the University of British Columbia.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/v16091495/s1: Figure S1: Sequencing depth of mutations along SARS-CoV-2 genome; Figure S2: Frequency of mutations co-occurring in different amplicons; Table S1: SARS-CoV-2 wastewater sample information; Table S2: Raw mutation data from SARS-CoV-2 wastewater samples; Table S3: Filtered mutation data from SARS-CoV-2 wastewater samples; Table S4: Co-occurrence of mutations within the same amplicon; SARS_WW_Mutations.py; SARS_WW_Lineage.py; SARS_WW_CoocAmp.py.

viruses-16-01495-s001.zip (1.8 MB, zip)

Author Contributions

Conceptualization, S.K.L., P.A.V., S.D., R.D., B.J.S., D.F., J.R. and T.P.S.; methodology, G.J. and T.P.S.; software, G.J. and A.N.; validation, G.J., J.-L.L. and J.H.G.; formal analysis, G.J.; investigation, G.J., J.-L.L., S.J.R., S.C., S.L., S.-H.C., A.T., F.S.-Q., E.G. and M.H.; resources, G.J., J.-L.L., J.H.G., S.J.R., S.C., A.N., A.T., F.S.-Q., T.M., E.G., M.H., S.K.L., P.A.V., S.D., R.D., B.J.S., D.F., J.R. and T.P.S.; data curation, G.J., J.H.G., S.C., A.T., F.S.-Q., T.M., E.G., M.H. and E.M.; writing—original draft preparation, G.J.; writing—review and editing, G.J., J.-L.L., S.C., P.A.V., S.D., R.D., B.J.S., D.F., J.R. and T.P.S.; visualization, G.J.; supervision, J.R. and T.P.S.; project administration, J.R. and T.P.S.; funding acquisition, S.K.L., P.A.V., S.D., R.D., B.J.S., D.F., J.R. and T.P.S. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Raw sequencing data presented in the study are openly available in the NCBI database under BioProject ID PRJNA1127571 (https://www.ncbi.nlm.nih.gov/bioproject/1127571) (accessed on 4 December 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

Funding Statement

This study was supported by the Coronavirus Variants Rapid Response Network (CoVaRR-Net). CoVaRR-Net is funded by an operating grant from the Canadian Institutes of Health Research (CIHR)—Instituts de recherche en santé du Canada (FRN# 175622). Wastewater sampling in Québec was supported by the Fond de la Recherche du Québec—Nature et Technologie (COVID-19 Projets spéciaux—Eaux usées), the Trottier Family Foundation, and the Molson Foundation. Further support of the work was provided by the Canada Foundation for Innovation special opportunities grant #41012 to J.R. and by the Canada Foundation for Innovation grants CFI 33406 and CFI-MSI 35444 to J.R.

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. Amicone M., Borges Alves M.J., Isidro J., Ze-Ze L., Duarte S., Vieira L., Guiomar R., Gomes J.P., Gordo I. Mutation rate of SARS-CoV-2 and emergence of mutators during experimental evolution. Evol. Med. Public Health. 2022;10:142–155. doi: 10.1093/emph/eoac010.
2. Updated Working Definitions and Primary Actions for SARS-CoV-2 Variants. [(accessed on 30 January 2024)]. Available online: https://www.who.int/publications/m/item/updated-working-definitions-and-primary-actions-for--sars-cov-2-variants.
3. Rambaut A., Holmes E.C., O’Toole Á., Hill V., McCrone J.T., Ruis C., du Plessis L., Pybus O.G. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 2020;5:1403–1407. doi: 10.1038/s41564-020-0770-5.
4. Polo D., Baluja-Quintela M., Corbishley A., Jones D.L., Singer A.C., Graham D.W., Romalde J.L. Making waves: Wastewater-based epidemiology for COVID-19—Approaches and challenges for surveillance and prediction. Water Res. 2020;186:116404. doi: 10.1016/j.watres.2020.116404.
5. Xiao A., Wu F., Bushman M., Zhang J., Imakaev M., Chai P.R., Duvallet C., Endo N., Erickson T.B., Armas F., et al. Metrics to relate COVID-19 wastewater data to clinical testing dynamics. medRxiv. 2021 doi: 10.1016/j.watres.2022.118070.
6. Kallem P., Hegab H.M., Alsafar H., Hasan S.W., Banat F. SARS-CoV-2 detection and inactivation in water and wastewater: Review on analytical methods, limitations and future research recommendations. Emerg. Microbes Infect. 2023;12:2222850. doi: 10.1080/22221751.2023.2222850.
7. Chen X., Kang Y., Luo J., Pang K., Xu X., Wu J., Li X., Jin S. Next-Generation Sequencing Reveals the Progression of COVID-19. Front. Cell. Infect. Microbiol. 2021;11:632490. doi: 10.3389/fcimb.2021.632490.
8. John G., Sahajpal N.S., Mondal A.K., Ananth S., Williams C., Chaubey A., Rojiani A.M., Kolhe R. Next-Generation Sequencing (NGS) in COVID-19: A Tool for SARS-CoV-2 Diagnosis, Monitoring New Strains and Phylodynamic Modeling in Molecular Epidemiology. Curr. Issues Mol. Biol. 2021;43:845–867. doi: 10.3390/cimb43020061.
9. Chiara M., D’Erchia A.M., Gissi C., Manzari C., Parisi A., Resta N., Zambelli F., Picardi E., Pavesi G., Horner D.S., et al. Next generation sequencing of SARS-CoV-2 genomes: Challenges, applications and opportunities. Brief. Bioinform. 2020:bbaa297. doi: 10.1093/bib/bbaa297.
10. Slatko B.E., Gardner A.F., Ausubel F.M. Overview of Next Generation Sequencing Technologies. Curr. Protoc. Mol. Biol. 2018;122:e59. doi: 10.1002/cpmb.59.
11. Explore Illumina Sequencing Technology. [(accessed on 30 January 2024)]. Available online: https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology.html.
12. Nanopore DNA Sequencing. [(accessed on 30 January 2024)]. Available online: https://nanoporetech.com/platform/technology.
13. Szoboszlay M., Schramm L., Pinzauti D., Scerri J., Sandionigi A., Biazzo M. Nanopore is Preferable over Illumina for 16S Amplicon Sequencing of the Gut Microbiota When Species-Level Taxonomic Classification, Accurate Estimation of Richness or Focus on Rare Taxa Is Required. Microorganisms. 2023;11:804. doi: 10.3390/microorganisms11030804.
14. Linde J., Brangsch H., Holzer M., Thomas C., Elschner M.C., Melzer F., Tomasco H. Comparison of Illumina and Oxford Nanopore Technology for genome analysis of Francisella tularensis, Bacillus anthracis, and Brucella suis. BMC Genom. 2023;24:258. doi: 10.1186/s12864-023-09343-z.
15. Pecman A., Adams I., Gutierrez-Aguirre I., Fox A., Boonham N., Ravnikar M., Kutnjak D. Systematic Comparison of Nanopore and Illumina Sequencing for the Detection of Plant Viruses and Viroids Using Total RNA Sequencing Approach. Front. Microbiol. 2022;13:883921. doi: 10.3389/fmicb.2022.883921.
16. Stoler N., Nekrutenko A. Sequencing error profiles of Illumina sequencing instruments. NAR Genom. Bioinform. 2021;3:lqab019. doi: 10.1093/nargab/lqab019.
17. Wang Y., Zhao Y., Bollas A., Wang Y., Au K.F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 2021;39:1348–1365. doi: 10.1038/s41587-021-01108-x.
18. Tshiabuila D., Giandhari J., Pillay S., Ramphal U., Ramphal Y., Maharaj A., Anyaneji U.J., Naidoo Y., Tegally H., San E.J., et al. Comparison of SARS-CoV-2 sequencing using the ONT GridION and the Illumina MiSeq. BMC Genom. 2022;23:319. doi: 10.1186/s12864-022-08541-5.
19. Carbo E.C., Mourik K., Boers S.A., Munnink B.O., Nieuwenhuijse D., Jonges M., Welkers M.R.A., Matamoros S., Slooten J.v.H.T., Kraakman M.E.M., et al. A comparison of five Illumina, Ion Torrent and nanopore sequencing technology-based approaches for whole genome sequencing of SARS-CoV-2. Eur. J. Clin. Microbiol. Infect. Dis. 2023;42:701–713. doi: 10.1007/s10096-023-04590-0.
20. Garcia-Pedemonte D., Carcereny A., Gregori J., Quer J., Garcia-Cehic D., Guerrero L., Cereto-Massague A., Abid I., Bosch A., Costafreda M.I., et al. Comparison of Nanopore and Synthesis-Based Next-Generation Sequencing Platforms for SARS-CoV-2 Variant Monitoring in Wastewater. Int. J. Mol. Sci. 2023;24:17184. doi: 10.3390/ijms242417184.
21. D’Aoust P.M., Graber T.E., Mercier E., Montpetit D., Alexandrov I., Neault N., Baig A.T., Mayne J., Zhang X., Alain T., et al. Catching a resurgence: Increase in SARS-CoV-2 viral RNA identified in wastewater 48h before COVID-19 clinical tests and 96h before hospitalizations. Sci. Total Environ. 2021;770:145319. doi: 10.1016/j.scitotenv.2021.145319.
22. Hovi T., Stenvik M., Partanen H., Kangas A. Poliovirus surveillance by examining sewage specimens. Quantitative recovery of virus after introduction into sewerage at remote upstream location. Epidemiol. Infect. 2001;127:101–106. doi: 10.1017/S0950268801005787.
23. Petterson S., Grondahl-Rosado R., Nilsen V., Myrmel M., Robertson L.J. Variability in the recovery of a virus concentration procedure in water: Implications for QMRA. Water Res. 2015;87:79–86. doi: 10.1016/j.watres.2015.09.006.
24. Schang C., Crosbie N.D., Nolan M., Poon R., Wang M., Jex A., John N., Baker L., Scales P., Schmidt J., et al. Passive Sampling of SARS-CoV-2 for Wastewater Surveillance. Environ. Sci. Technol. 2021;55:10432–10441. doi: 10.1021/acs.est.1c01530.
25. ARTIC Network—SARS-CoV-2. [(accessed on 20 November 2023)]. Available online: https://artic.network/ncov-2019.
26. Martin M. Cutadapt Removes Adapter Sequences from High-Throughput Sequencing Reads. EMBnet J. 2011;17:10–12. doi: 10.14806/ej.17.1.200.
27. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv. 2013 doi: 10.48550/arXiv.1303.3997.
28. Tarasov A., Vilella A.J., Cuppen E., Nijman I.J., Prins P. Sambamba: Fast processing of NGS alignment formats. Bioinformatics. 2015;31:2032–2034. doi: 10.1093/bioinformatics/btv098.
29. Grubaugh N.D., Gangavarapu K., Quick J., Matteson N.L., Goes De Jesus J., Main B.J., Tan A.L., Paul L.M., Brackney D.E., Grewal S., et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019;20:8. doi: 10.1186/s13059-018-1618-7.
30. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509.
31. Garrison E., Marth G. Haplotype-based variant detection from short-read sequencing. arXiv. 2012 doi: 10.48550/arXiv.1207.3907.
32. nanoporetech/medaka. [(accessed on 24 November 2023)]. Available online: https://github.com/nanoporetech/medaka.
33. Accukit SARS-CoV-2. [(accessed on 24 November 2023)]. Available online: https://accugenomics.com/accukit-sars-cov-2/
34. Li H. Minimap2: Pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
35. CoV-lineages/constellations. [(accessed on 24 November 2023)]. Available online: https://github.com/cov-lineages/constellations.
36. N’Guessan A., Tsitouras A., Sanchez-Quete F., Goitom E., Reiling S.J., Galvez J.H., Nguyen T.L., Nguyen H.T.L., Visentin F., Hachad M., et al. Detection of prevalent SARS-CoV-2 variant lineages in wastewater and clinical sequences from cities in Quebec, Canada. medRxiv. 2022 doi: 10.1101/2022.02.01.22270170.
37. Karthikeyan S., Levy J.I., De Hoff P., Humphry G., Birmingham A., Jepsen K., Farmer S., Tubb H.M., Valles T., Tribelhorn C.E., et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature. 2022;609:101–108. doi: 10.1038/s41586-022-05049-6.
38. Chen C., Nadeau S., Yared M., Voinov P., Ning X., Roemer C., Stadler T. CoV-Spectrum: Analysis of globally shared SARS-CoV-2 data to Identify and Characterize New Variants. Bioinformatics. 2021;38:1735–1737. doi: 10.1093/bioinformatics/btab856.
39. Jahn K., Dreifuss D., Topolsky I., Kull A., Ganesanandamoorthy P., Fernandez-Cassi X., Banziger C., Devaux A.J., Stachler E., Caduff L., et al. Early detection and surveillance of SARS-CoV-2 genomic variants in wastewater using COJAC. Nat. Microbiol. 2022;7:1151–1160. doi: 10.1038/s41564-022-01185-x.
40. Van Rossum G., Drake F.L. Python 3 Reference Manual. CreateSpace; Scotts Valley, CA, USA: 2009.
41. Quality Scores for Next-Generation Sequencing. [(accessed on 4 December 2023)]. Available online: https://www.illumina.com/documents/products/technotes/technote_Q-Scores.pdf.
42. Barnes K.G., Levy J.I., Gauld J., Rigby J., Kanjerwa O., Uzzell C.B., Chilupsya C., Anscombe C., Tomkins-Tinch C., Mbeti O., et al. Utilizing river and wastewater as a SARS-CoV-2 surveillance tool in settings with limited formal sewage systems. Nat. Commun. 2023;14:7883. doi: 10.1038/s41467-023-43047-y.
43. Tracking Variants of the Novel Coronavirus in Canada. [(accessed on 23 January 2024)]. Available online: https://www.ctvnews.ca/health/coronavirus/tracking-variants-of-the-novel-coronavirus-in-canada-1.5296141.
44. Ni Y., Liu X., Simeneh Z.M., Yang M., Li R. Benchmarking of Nanopore R10.4 and R9.4.1 flow cells in single-cell whole-genome amplification and whole-genome shotgun sequencing. Comput. Struct. Biotechnol. J. 2023;21:2352–2364. doi: 10.1016/j.csbj.2023.03.038.
45. nanoporetech/dorado. [(accessed on 23 January 2024)]. Available online: https://github.com/nanoporetech/dorado.
46. Benchmarking the Oxford Nanopore Technologies basecallers on AWS. [(accessed on 23 January 2024)]. Available online: https://aws.amazon.com/blogs/hpc/benchmarking-the-oxford-nanopore-technologies-basecallers-on-aws/
47. Ferguson S., McLay T., Andrew R.L., Bruhl J.J., Schwessinger B., Borevitz J., Jones A. Species-specific basecallers improve actual accuracy of nanopore sequencing plants. Plant Methods. 2022;18:137. doi: 10.1186/s13007-022-00971-2.
48. jts/nanopolish. [(accessed on 23 January 2024)]. Available online: https://github.com/jts/nanopolish.


[B43-viruses-16-01495] 43.Tracking Variants of the Novel Coronavirus in Canada. [(accessed on 23 January 2024)]. Available online: https://www.ctvnews.ca/health/coronavirus/tracking-variants-of-the-novel-coronavirus-in-canada-1.5296141.

[B44-viruses-16-01495] 44.Ni Y., Liu X., Simeneh Z.M., Yang M., Li R. Benchmarking of Nanopore R10.4 and R9.4.1 flow cells in single-cell whole-genome amplification and whole-genome shotgun sequencing. Comput. Struct. Biotechnol. J. 2023;21:2352–2364. doi: 10.1016/j.csbj.2023.03.038. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B45-viruses-16-01495] 45.nanoporetech/dorado. [(accessed on 23 January 2024)]. Available online: https://github.com/nanoporetech/dorado.

[B46-viruses-16-01495] 46.Benchmarking the Oxford Nanopore Technologies basecallers on AWS. [(accessed on 23 January 2024)]. Available online: https://aws.amazon.com/blogs/hpc/benchmarking-the-oxford-nanopore-technologies-basecallers-on-aws/

[B47-viruses-16-01495] 47.Ferguson S., McLay T., Andrew R.L., Bruhl J.J., Schwessinger B., Borevitz J., Jones A. Species-specific basecallers improve actual accuracy of nanopore sequencing plants. Plant Methods. 2022;18:137. doi: 10.1186/s13007-022-00971-2. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B48-viruses-16-01495] 48.jts/nanopolish. [(accessed on 23 January 2024)]. Available online: https://github.com/jts/nanopolish.
