Performance of amplicon and capture based next-generation sequencing approaches for the epidemiological surveillance of Omicron SARS-CoV-2 and other variants of concern

Carlos Daviña-Núñez; Sonia Pérez; Jorge Julio Cabrera-Alvargonzález; Anniris Rincón-Quintero; Ana Treinta-Álvarez; Montse Godoy-Diz; Silvia Suárez-Luque; Benito Regueiro-García

doi:10.1371/journal.pone.0289188

2024 Apr 29;19(4):e0289188. doi: 10.1371/journal.pone.0289188

Editor: Hin Fung Tsang⁵

PMCID: PMC11057745 PMID: 38683803

Abstract

To control the SARS-CoV-2 pandemic, healthcare systems have focused on ramping up their capacity for epidemiological surveillance through viral whole genome sequencing. In this paper, we tested the performance of two protocols of SARS-CoV-2 nucleic acid enrichment, an amplicon enrichment using different versions of the ARTIC primer panel and a hybrid-capture method using KAPA RNA Hypercap. We focused on the challenge of the Omicron variant sequencing, the advantages of automated library preparation and the influence of the bioinformatic analysis in the final consensus sequence. All 94 samples were sequenced using Illumina iSeq 100 and analysed with two bioinformatic pipelines: a custom-made pipeline and an Illumina-owned pipeline. We were unsuccessful in sequencing six samples using the capture enrichment due to low reads. On the other hand, amplicon dropout and mispriming caused the loss of mutation G21987A and the erroneous addition of mutation T15521A respectively using amplicon enrichment. Overall, we found high sequence agreement regardless of method of enrichment, bioinformatic pipeline or the use of automation for library preparation in eight different SARS-CoV-2 variants. Automation and the use of a simple app for bioinformatic analysis can simplify the genotyping process, making it available for more diagnostic facilities and increasing global vigilance.

Introduction

SARS-CoV-2, a novel betacoronavirus, was first identified in Wuhan, China, in December 2019. The virus was associated with an increase in cases of a novel pneumonia later defined as coronavirus disease 2019 (COVID-19). Detection and characterization of cases have been amongst the governmental efforts around the world to control the spread of the SARS-CoV-2 pandemic. In order to do so, healthcare systems have focused on ramping up their capacity for diagnosis through RT-PCR testing as well as for epidemiological surveillance through viral whole genome sequencing (WGS). For example, SARS-CoV-2 sequences uploaded to the GISAID database went from 313k in 2020 to 6.36M in 2021 and 7.75M in 2022 [1]. Over this time, several variants of concern (VOC) and variants of interest have become predominant throughout the world given their ability to spread faster or to avoid the immune system [2]. WGS data provided relevant information on SARS-CoV-2 circulating clusters, vaccine development or even insights on the intermediate zoonotic hosts for SARS-CoV-2 [3].

High-throughput next-generation sequencing (NGS) allows for massive parallel sequencing of DNA fragments. Viral WGS from clinical samples through NGS usually requires a step of viral nucleic acid enrichment in order to increase the sequencing yield. While unbiased metagenomic NGS without enrichment is possible as a mechanism for viral sequencing, it requires a very high number of reads per sample to obtain sufficient viral reads. It is a suboptimal approach, especially considering the NGS reagents cost [4]. A previous study found metagenomic approaches to map below 6% of the total reads to SARS-CoV-2, with a majority of reads being host DNA [5]. The most common methods of viral enrichment are the amplicon-based and the capture-based approaches [6, 7]. In a nutshell, the amplicon-based methods rely on PCR to amplify viral genomic material using specific primers to cover the target region, while the capture-based methods use specific oligos that hybridise to the target regions, followed by a purification of the oligo-bound target DNA.

Despite the potential benefits of viral sequencing, there are challenges to the NGS implementation in diagnostic facilities. Firstly, viral enrichment and sequencing-ready library synthesis require a high degree of expertise. Secondly, although sequencing has become more affordable with the new NGS technologies, the overall cost is high, and the spending must be justified in order to implement automated high-throughput sequencing and departments that regularly track viral variants. Obtaining all the necessary laboratory equipment can be challenging as well, as there is a need for space and resources for flow chambers, freezers, sequencers, etc. Finally, bioinformatic analysis is another barrier to overcome, as it requires a skilled analyst or a user-friendly pipeline, neither of which is always available.

A way to reduce complexity and chances of contamination is the automation of the library preparation steps, especially when working with a high volume of samples. Commercially available pipetting platforms can integrate both enrichment and library preparation in the same workflow, reducing hands-on time [8, 9].

In summary, simple, cost-effective, high-throughput protocols of viral enrichment and library preparation, together with user-friendly online bioinformatic analysis tools, would make sequencing of SARS-CoV-2 more accessible to sequencing facilities, even in locations with more moderate resources.

In this paper, we tested the performance of two protocols of viral nucleic acid enrichment available for SARS-CoV-2. We selected the Illumina iSeq 100 platform, the smallest and most affordable Illumina sequencer, because of its ability to yield the fastest results. We also focused on the challenge of the Omicron variant for sequencing, the advantages of automated library preparation and the influence of the bioinformatic analysis in the final sequence generation.

Materials and methods

Sample selection

Nasopharyngeal swab samples from patients were selected from RT-PCR confirmed SARS-CoV-2 cases in the area of Vigo, a city in Northwest Spain. Swabs were transported in Vircell transport medium (Vircell, Granada, Spain) and frozen until viral RNA extraction. Sample cycle threshold (CT) ranged from 8 to 26. A first cohort was composed of fifty-four samples with different SARS-CoV-2 variants collected from March to June, 2021. A second cohort was added with forty Omicron samples collected from May to July, 2022. No positive or negative controls were added to the NGS runs.

Nucleic acid extraction

In the first cohort, for the amplicon-based enrichment (ABE) approach, MagNAPure 24 Total NA isolation kit (Roche Diagnostics, Mannheim, Germany) was used for RNA extraction. For the KAPA capture-based enrichment (CBE) approach, RNA extraction was performed from the same nasopharyngeal samples using the QIAGEN QiaAmp DNA Mini kit in a QiaCube extractor (Qiagen, Hilden, Germany).

In the second cohort, RNA was extracted using QIASymphony DSP Virus Pathogen Midi kit (Qiagen) according to manufacturer’s instructions.

Sequencing approaches

Amplicon-based enrichment—manual library preparation

For the first cohort, the reverse transcription was performed with random hexamers (Invitrogen, California, USA) and SuperScript™ IV kit (SSIV) (Invitrogen, California, USA). The amplification was performed with the ARTIC v3 primer panel (Integrated DNA Technologies, California, USA), a set of 98 primer pairs divided into two pools, enough to cover the whole genome (S1 Table) and the Q5 TaqPolymerase kit (New England Biolabs, Massachusetts, USA), as previously described [10]. The detailed manual RT-PCR protocol is found in S1 File. Enriched samples were then normalised and libraries were prepared using the Illumina DNA prep kit (Illumina Inc., California, USA) according to the manufacturer’s instructions. Clean-up of libraries was performed using Ampure XP beads (Beckman Coulter, California, USA) in a 1.8:1 beads-to-sample ratio.

Amplicon-based enrichment—automatic library preparation

For the second cohort, samples were enriched using the ARTIC v4.1 primer panel, an updated version for optimal amplification of the Omicron variant (S1 Table) [11]. Retrotranscription, enrichment and library preparation were performed using the Illumina CovidSeq test (Illumina, Protocol in Illumina Document #1000000126053 v04). All steps of the Illumina CovidSeq protocol were performed according to the manufacturer’s instructions by the HAMILTON Microlab STAR pipetting platform (Hamilton Iberia, Barcelona, Spain).

Capture-based enrichment

Capture-based libraries were prepared following the KAPA RNA HyperCap workflow with specific enrichment probes for SARS-CoV-2 (Roche Diagnostics, Mannheim, Germany). Each individual library was created using 10 µL of extracted RNA input and following the protocol established by the kit’s manufacturer. RNA was fragmented at 96 °C for 6 minutes. Eighteen PCR cycles were used to enrich each library prior to capture. Libraries were quantified and then multiplexed in sets of 6 libraries. For a total of 1500 ng of DNA per capture, 250 ng per library were pooled. Captured pools were amplified using 17 PCR cycles and then quantified for sequencing.

Genomic library analysis

Genomic libraries were sequenced using an Illumina iSeq 100 with 2x151 paired-end cycles. A total of 18 or 20 samples per run were sequenced in the first and second cohort, respectively. Sample pools were diluted to 75 pM and added into an iSeq cartridge v2 with Illumina PhiX at 5% concentration.

All libraries enriched with an ABE approach were quantified using Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Massachusetts, USA). The libraries obtained from the CBE approach were quantified using the qPCR KAPA library quantification kit (Roche Diagnostics, Cape Town, South Africa). Sample size of all libraries was checked using an Agilent 2100 Bioanalyzer (Agilent technologies, California, USA) prior to sequencing.

Bioinformatic analysis

The quality of the fastq files was checked with FastQC 0.11.9 (Andrews 2010) and QualiMap 2.2.1 [12]. The reads were aligned to the Wuhan reference genome (NCBI accession NC_045512.2) with BWA-mem2 [13]. We used iVar 1.3 [14] to trim primer sequences, to trim the reads based on a quality threshold (default: 20) and to remove reads shorter than 32 bp. We used SAMtools v1.10 coverage (using htslib 1.10.2) to calculate the genome coverage [15].

To build a consensus sequence for each sample, we merged the reads with SAMtools mpileup and used iVar 1.3 [14] consensus with a minimum quality score threshold to count base of 20, a minimum read depth of 10 to call consensus and a minimum VAF threshold of 0.01. We assigned the consensus sequences to a SARS-CoV-2 clade with Nextclade [16] and to a SARS-CoV-2 PANGO lineage [17] with Pangolin [18].
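As a rough illustration of how a read-depth cut-off shapes a consensus sequence, the sketch below calls one base per position from per-base read counts. It is a simplification under stated assumptions and not iVar's actual algorithm: the function name `call_consensus` and the input layout are hypothetical, the frequency threshold is not modelled, and only the minimum depth of 10 is taken from the text.

```python
def call_consensus(pileup, min_depth=10):
    """Return a consensus string from per-position base counts.

    pileup: list of dicts mapping base -> number of reads that passed
    the quality filter (hypothetical input layout, one dict per position).
    Positions with fewer than min_depth reads are masked with 'N'.
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too few reads to call a base
            continue
        # Pick the base supported by the largest number of reads
        consensus.append(max(counts, key=counts.get))
    return "".join(consensus)


# Toy data: position 2 has only 5 reads and is therefore left unread
example = [{"A": 40, "G": 2}, {"T": 5}, {"C": 12, "T": 11}]
print(call_consensus(example))
```

In this toy input the third position is called from a near-even split of reads, which is the kind of site where two pipelines with different filtering rules could plausibly disagree.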

We considered the ECDC recommendations to establish a quality control (QC) threshold for each SARS-CoV-2 consensus sequence: a minimum read depth of 10 over at least 95% of the genome [19]. As an additional quality threshold, samples with a median base depth below 50 were considered to have low sequencing quality and were discarded from further analysis.
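The two QC criteria above can be expressed as a short check on the per-base depth of a sample. This is a minimal sketch, assuming the depths are already available as a list of integers; the function name `passes_qc` and its parameter names are hypothetical, while the thresholds (depth of 10, 95% of the genome, median of 50) come from the text.

```python
from statistics import median


def passes_qc(depths, min_depth=10, min_fraction=0.95, min_median=50):
    """Apply the two QC criteria to a list of per-base read depths.

    A sample passes when at least min_fraction of positions have
    min_depth or more reads AND the median depth is at least min_median.
    """
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = covered / len(depths)
    return breadth >= min_fraction and median(depths) >= min_median


# Toy genomes of 100 positions each
print(passes_qc([100] * 96 + [0] * 4))   # good breadth, good median
print(passes_qc([100] * 90 + [0] * 10))  # breadth below 95%
print(passes_qc([20] * 100))             # full breadth, median below 50
```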

The Illumina-owned DRAGEN™ Covid Lineage 3.5.6, using 10X as coverage threshold for base-calling, was also used. DRAGEN™ (Dynamic Read Analysis for GENomics) is a Bio-IT Platform in BaseSpace™ Sequence Hub. SARS-CoV-2 variant calling was performed using Nextclade [16], based on the consensus sequences generated.

All statistical data analysis was done using R (version 4.1.1, https://cran.r-project.org/). Shapiro-Wilk normality test was performed to check for normality. Wilcoxon-sign rank sum test and Fisher’s F-test were used. A p-value below 0.05 was considered significant.

Multiple sequence alignment was performed using Multiple Alignment using Fast Fourier Transform (MAFFT) [20] and the aligned sequences were used to generate Neighbour-Joining phylogenetic trees in MEGA11 [21] with the Maximum Composite Likelihood model. For the consensus sequence comparison, unread bases and terminal bases were excluded from the mismatch count. Data visualisation was performed with the R program ggplot2 [22].
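The mismatch count described above, with unread and terminal bases excluded, could be sketched as follows. This is an illustration only, assuming two consensus sequences already aligned to the same length and with unread bases written as `N`; the function name `count_mismatches` and the `trim` parameter (number of terminal bases ignored at each end) are hypothetical and not taken from the authors' pipeline.

```python
def count_mismatches(seq_a, seq_b, trim=0):
    """List positions where two aligned consensus sequences differ.

    Positions where either sequence is unread ('N') are skipped, and
    `trim` bases at each end of the alignment are ignored.
    Returns (1-based position, base in seq_a, base in seq_b) tuples.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = []
    for pos in range(trim, len(seq_a) - trim):
        a, b = seq_a[pos].upper(), seq_b[pos].upper()
        if a == "N" or b == "N":
            continue  # unread base in one method: not a discrepancy
        if a != b:
            mismatches.append((pos + 1, a, b))
    return mismatches


# Toy alignment: the N at position 5 and the last base are not counted
print(count_mismatches("ACGTNACGT", "ACCTAACGA", trim=1))
```

Excluding unread bases keeps low coverage from being counted as disagreement, so the remaining mismatches reflect positions where both methods made a call and the calls differ.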

Results

Samples from nasopharyngeal swabs were tested from two cohorts, a pre-Omicron cohort (n = 54) and an Omicron cohort (n = 40). All samples were enriched using both methods, an amplicon-based method (ARTIC v3 and Illumina DNA prep for cohort 1, ARTIC v4.1 and Illumina CovidSeq for cohort 2) and a capture-based method (KAPA RNA Hypercap). Low to mid CT value samples were chosen (8–26). Read depth, genome coverage, allele frequency and consensus sequences were analysed in order to evaluate the yield of both methods. Of the 94 samples, 6 did not pass QC for sequencing using the capture-based method, leaving 88 successfully sequenced samples.

SARS-CoV-2 base coverage

Median base depth over 50 was obtained for 100% (94/94) of the samples with the ABE and 94% (88/94) using the CBE (S2 Table). Among these samples, ABE showed a higher median base depth than CBE (median ± standard deviation (SD): 1444.5 ± 581 vs. 776.5 ± 1426; Wilcoxon test, p = 0.0057) [IQR: 933–1894 vs. 322–1628]. CBE showed a more heterogeneous depth per sample (Fisher’s F-test, p < 0.0001). This heterogeneity caused CBE to present the samples with the highest (>5000 reads/base) and the lowest values (<50 reads/base). By Cohort, pre-Omicron samples had a higher read depth than Omicron samples only in the ABE approach (median ± SD: 1841 ± 440 vs. 913 ± 378; Wilcoxon test, p < 0.001). For the CBE approach, no significant differences were found between both cohorts (median ± SD: 723 ± 1611 vs. 909 ± 1141; Wilcoxon test, p = 0.92) (Fig 1A).

Fig 1 — a) Violin plot and boxplot of the median read depth divided by cohorts. (For Amplicon enrichment, Wilcoxon test; p < 0.0001; For capture enrichment, p = 0.92). b,c) Genome coverage, percentage of the genome with a read depth over 10. Each data point corresponds to one sample. Amplicon: ARTIC panel, amplicon-based enrichment. Capture: KAPA RNA HyperCap, capture-based enrichment.

All samples passing QC had over 98% of genome coverage. The percentage of called bases was higher in the CBE than in the ABE approach (median ± SD: 99.95 ± 0.05 vs. 99.56 ± 0.40; Wilcoxon test, p < 0.0001) (Fig 1B and 1C).

Region specific genome coverage

Areas of the genome without coverage were compared across methods. For the ABE, the number of unread bases was variant-specific, with Delta samples having more unread bases than non-Delta samples (mean ± SD: 204 ± 139 vs. 107 ± 102, Wilcoxon test, p = 0.00088). This pattern was not observed with the CBE, where non-Delta samples actually had less coverage, although with smaller differences (mean ± SD: 12 ± 15 vs. 15 ± 16, Wilcoxon test, p = 0.0026) (Fig 2A and 2B).

Fig 2 — a-b) Genome coverage per variant. Delta samples in the amplicon enrichment showed the lowest genome coverage. c-d) Percentage of samples with less than 10 reads on each base. Areas with dips in coverage were identified, with a notable peak consisting of Delta samples at the beginning of the *spike* gene. Amplicon: ARTIC panel, amplicon-based enrichment. Capture: KAPA RNA HyperCap, capture-based enrichment.

The location of unread bases across the genome was not randomly distributed, but rather concentrated in certain areas of the genome, regardless of the method of enrichment. The ABE showed a maximum of unread bases around base 21850. This peak was Delta-specific (Fig 2C and 2D).

Variant allele frequency

We analysed the allele frequency for each SNP detected. In our samples, ABE showed little variance in allele frequency, with most mutations detected with over 90% read agreement (Fig 3A). Using CBE, while most samples showed high allele frequency, a subset of samples showed high variability (Fig 3B). Specifically, 6 samples processed with CBE had more than 25 SNPs detected with 20–90% read agreement (low-agreement). For ABE, samples had between 0 and 8 SNPs detected with low agreement (Fig 3C).
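The split between low-agreement and high-agreement SNPs can be sketched as a simple tally per sample. This is a minimal example under stated assumptions: the function names and the input layout (a dictionary of allele frequencies per sample) are hypothetical, and the sample data are invented for illustration. The 0.2–0.9 and >0.9 bands and the cut-off of 25 low-agreement SNPs follow the text.

```python
def count_snps_by_agreement(freqs, low=0.2, high=0.9):
    """Tally SNPs with low (low..high) and high (> high) read agreement."""
    low_n = sum(1 for f in freqs if low <= f <= high)
    high_n = sum(1 for f in freqs if f > high)
    return {"low": low_n, "high": high_n}


def flag_heterogeneous(samples, max_low=25):
    """Return names of samples with more than max_low low-agreement SNPs."""
    return [
        name
        for name, freqs in samples.items()
        if count_snps_by_agreement(freqs)["low"] > max_low
    ]


# Invented allele frequencies for two hypothetical samples
samples = {
    "s1": [0.95] * 40 + [0.5] * 3,
    "s2": [0.5] * 30 + [0.99] * 20,
}
print(count_snps_by_agreement(samples["s1"]))
print(flag_heterogeneous(samples))
```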

Fig 3 — a) ARTIC panel, amplicon-based enrichment. b) KAPA RNA HyperCap, capture-based enrichment. Each point represents an SNP, while each line represents a LOESS local regression for each sample. One colour per sample. c) Violin plot and boxplot of the number of SNPs per sample at low frequency (0.2–0.9) or high frequency (>0.9).

Agreement of consensus sequence across methods

For all samples but one, the same Pango lineage was determined using both methods. The exception was a sample declared as B.1.1.529 using ABE and BA.2 using CBE (S2 Table). The sample turned out to be a possible recombinant between BA.1 and BA.2. Amplicon sequences and capture sequences showed highly similar locations in the phylogenetic tree (Fig 4), although with small branch differences within clusters caused by mismatches detected or by areas left unread due to low coverage.

Fig 4 — Samples are coloured by clade, with undesignated recombinant samples shown in yellow. The tip label indicates the Pango lineage. Tree generated by Neighbour-Joining method; Maximum Composite Likelihood. Amplicon: ARTIC panel, amplicon-based enrichment. Capture: KAPA RNA HyperCap, capture-based enrichment.

A total of 84 base changes were observed depending on the method of library enrichment (Fig 5A). Of the 88 samples, 56 (64%) showed no base mismatch between both methods. A total of 24 (27%), 3 (3%) and 5 (6%) samples showed 1, 2 and more than 2 discrepancies, respectively. The two most common discrepant SNPs were found as errors with the ABE method: the lack of detection of G21987A in the Delta samples was the most common mismatch (n = 16). The second most common mismatch was the addition of the mutation T15521A in the Omicron samples (n = 12) analysed by the ABE method. Four samples, all analysed by the CBE method, showed a high number of discrepancies (6–24 mismatches), consisting of missing characteristic SNPs.

Discrepancies associated with variations in filtering parameters and the alignment algorithm

All consensus sequences were obtained with a customised pipeline, as described in the materials and methods section. A second bioinformatic analysis was performed for all samples using the Illumina® DRAGEN™ COVID lineage app. We compared the consensus sequences generated to evaluate the concordance between a more user-friendly method and our in-house pipeline. In the case of ABE, 17 discrepancies were found, evenly distributed across samples (one per sample) (Fig 5B). The most common of these variations was T15521A (n = 5), which had an allele frequency between 0.22 and 0.93 in ABE (Fig 3A). For CBE, 39 mismatches were found concentrated in nine samples, with 79 out of 88 samples having no discrepancies (Fig 5C). These 9 samples showed a low quality of sequencing with high variations in allele frequency.

Analysis of undesignated possible recombinant samples

Three of the samples sequenced from the Omicron cohort showed a wide range of mutations from both BA.1 and BA.2 variants. These samples appeared in the phylogenetic tree outside of the BA.1 or BA.2 monophyletic clusters (Fig 4). In order to check for recombination, the allelic frequency of the defining BA.1 and BA.2 mutations was plotted for these three samples (Fig 6). Allelic frequency showed distinct regional areas of the genome that are highly BA.1 or highly BA.2, suggesting a recombinant sample. Specifically, sample 55 showed two breakpoints: one likely between bases 15240–15714 and another between bases 26060–26530 (Fig 6A and 6B). Samples 57 and 80 shared a single breakpoint likely between bases 26060 and 26530 (Fig 6C–6F).
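The reasoning behind these breakpoint intervals can be sketched in a few lines: if lineage-defining mutations are ordered along the genome, a breakpoint must lie between the last marker of one lineage and the first marker of the next. The example below is an illustration only; the function name `breakpoint_intervals`, the input layout and the toy coordinates are hypothetical and do not reproduce the markers of samples 55, 57 or 80.

```python
def breakpoint_intervals(markers):
    """Locate candidate recombination breakpoints.

    markers: iterable of (position, lineage) tuples for lineage-defining
    mutations detected at high allele frequency.
    Returns (left_pos, right_pos, left_lineage, right_lineage) tuples;
    each breakpoint lies somewhere between left_pos and right_pos.
    """
    ordered = sorted(markers)
    intervals = []
    for (pos_a, lin_a), (pos_b, lin_b) in zip(ordered, ordered[1:]):
        if lin_a != lin_b:  # lineage switches between adjacent markers
            intervals.append((pos_a, pos_b, lin_a, lin_b))
    return intervals


# Toy coordinates, not real SARS-CoV-2 marker positions
toy_markers = [
    (100, "BA.1"), (250, "BA.1"),
    (400, "BA.2"), (600, "BA.2"),
    (900, "BA.1"),
]
print(breakpoint_intervals(toy_markers))
```

The width of each interval depends on how densely the lineage-defining mutations are spaced, which is why breakpoints are reported as ranges rather than single positions.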

Sample 55 showed in the CBE a discrepant pattern compared with the amplicon enrichment in the second half of the genome (Fig 6B). This sample showed 9 discrepancies between both methods of enrichment (Fig 5A), and heterogeneous allele frequency. Sample 55 was the only case of Pango designation discrepancy from the analysed samples, with the ABE-generated sequence being declared as B.1.1.529 and the CBE-generated sequence being declared as BA.2 (S2 Table).

Discussion

We have compared two different methods for viral RNA enrichment in SARS-CoV-2 sequencing. Both methods of enrichment showed differences in base depth, double for the amplicon method, although only in the pre-Omicron samples, suggesting that the changes introduced in the amplicon enrichment method in the Omicron samples could have negatively affected the output. Mainly, these changes were the extraction, the automation and the Illumina CovidSeq reagent.

There are currently a few published studies on viral enrichment of SARS-CoV-2 [4, 5, 23–29]. Comparison between studies is complex due to differences in the methods, the analysed variants of SARS-CoV-2, the viral load ranges or the sequencing platforms. Samples per run (expected reads per sample) could be one of the main factors of variability of results across studies. For example, the SNP mismatch frequency detected between enrichment methods in the literature has been reported to range from almost 0 to around 5% [4, 29]. This difference is likely due to the various factors mentioned above. Notably, as the virus keeps accumulating mutations, genetic diversity increases, and accurate detection of all mismatches becomes more challenging.

ABE provides higher homogeneity of coverage between samples

The variance across samples was noteworthy, double for the capture method. The amplicon-based method using Illumina reagents relies on normalisation of libraries by tagmentation, assuming equal tagmentation of libraries using the same amount of tagmentation reagent per sample. This proved to be enough to obtain a similar amount of read depth per sample across runs (Fig 1A). In the capture enrichment, libraries are generated first and then non-target DNA is removed, as opposed to amplicon enrichment. This enables the possibility of multiplexing enrichment, increasing the cost-effectiveness of the reagents. In this study we performed a 6-plex library enrichment. This makes individual normalisation of enriched libraries prior to sequencing impossible, causing the observed heterogeneity in read depth (Fig 1A). Singleplex capture is always a possibility and allows for improved normalisation and reads-per-sample homogeneity. However, this also highly increases the cost and hands-on time of the enrichment procedure.

The capture method performed worse in allele frequency. It was found to have more sequences with low allele frequency across the genome. While most samples provided a highly homogeneous allele frequency in SNP detection for both methods of enrichment, a subset of samples processed by the capture method showed a high number of SNPs with low read frequency. The same pattern did not appear in the amplicon method nor in other related samples processed by the capture method, suggesting errors in capture or base calling of these particular samples.

ABE is sensitive to amplicon loss due to mutations in the primer binding area

In the case of the amplicon enrichment method, we found errors due to the impact of SNPs in primer binding. This enrichment method relies on the proper binding of the primers to specific locations in the genome, so a specific mutation or indel in the primer-binding area can cause a whole amplicon to be missed in the PCR (amplicon dropout) [30, 31]. For this reason, the main disadvantage of the amplicon method in viral sequencing is the need for constant primer update in order to get a panel that is effective for all variants.

Our results showed two cases of primer binding errors causing artefacts in sequencing. In the Delta variant there was a significant drop in reads for amplicon 72 using the ARTIC v3 panel (Fig 2C and 2D). This was not found with our hybrid-capture method, nor for other variants sequenced with the ARTIC panel. Amplicon 72 of the ARTIC panel v3 could be lost in Delta virus sequences due to a Delta-specific deletion 22029–22034 [30, 32]. This deletion overlaps with the binding area of primer 72_RIGHT of the ARTIC panel v3, causing a failure in PCR. This is an example of how an amplicon dropout could cause a loss of NGS reads in specific variants. This phenomenon has already been reported in other publications, including by the ARTIC consortium. This dropout could also cause the loss in detection of mutation G21987A (S:G142D) in Delta, covered by the same amplicon. In this study, 9 samples lacked the mutation, 10 samples had the base unread (below 10 reads), and only 2 samples included the G21987A SNP. Using capture enrichment, the mutation was detected in all samples, which could be close to the real prevalence [30, 32]. Regarding global data, in March of 2023 (time of writing), 33% of the Delta samples in the GISAID database lacked the G21987A mutation [33]. As a consequence, the ARTIC consortium published in June 2021 a v4 primer panel in order to correct for this error. Additionally, we found the mutation T15521A in 8 of our 40 Omicron samples, which seems to be another artefact caused by the mispriming of one of the primers included in the ARTIC v4.1 panel. The primer 93_LEFT could hybridise on the amplicon 51 area, causing a secondary amplicon. Due to the effect of a mismatch between the primer and its binding region, mutation T15521A seems to have been inserted [34]. This SNP was present at a low allelic frequency for most samples in the amplicon-based enrichment, reinforcing the argument of an artificial insertion (Fig 3A).
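Checking whether a known deletion falls inside a primer binding site is a simple interval-overlap test, sketched below. The deletion coordinates 22029–22034 come from the text; the primer coordinates are placeholders chosen for illustration and are not taken from the ARTIC scheme files, and the function names are hypothetical. A real check would load the primer positions from the panel's BED file.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True when two closed intervals share at least one position."""
    return start_a <= end_b and start_b <= end_a


def affected_primers(primers, deletion):
    """Return names of primers whose binding site overlaps the deletion.

    primers: dict mapping primer name -> (start, end), 1-based inclusive.
    deletion: (start, end) of the deleted region, 1-based inclusive.
    """
    del_start, del_end = deletion
    return [
        name
        for name, (start, end) in primers.items()
        if overlaps(start, end, del_start, del_end)
    ]


# Placeholder primer coordinates (assumptions, not the real ARTIC v3 values)
placeholder_primers = {
    "72_RIGHT": (22013, 22038),
    "other_primer": (21000, 21025),
}
delta_deletion = (22029, 22034)  # deletion reported in the text
print(affected_primers(placeholder_primers, delta_deletion))
```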

The amplicon dropout detected using the ARTIC v3 panel corresponded to the spike gene sequence. The spike protein is responsible for host cell invasion [35] and is the target of most vaccines designed so far. As a vaccine target and a highly immunogenic protein, spike is subject to stronger selective pressure, which drives the evolution of new variants with greater immune escape. Therefore, the spike gene has a higher mutation rate than the rest of the genome [36]. It is expected that highly mutated areas of the genome such as this one will be more prone to failure in amplification with a primer panel. Given the clinical and epidemiological importance of accurate sequencing of the spike gene specifically, frequent update of primer panels is key for an optimal sequencing of new variants. Capture panel probes are unlikely to be affected by SNPs or indels because most commercial panels consist of tiled probes of at least 80 bp (120 bp in the case of the KAPA enrichment probes), as was also found in a recent publication [23]. However, capture probes have been shown to have a reduced yield compared to amplicon enrichment due to the capture of off-target DNA fragments [24].

Illumina iSeq 100 is sufficient for SARS-CoV-2 sequencing with enrichment

The sequencing system tested in this paper was the Illumina iSeq 100 system, which is the smallest of the Illumina sequencing platforms. While the price per sample of Illumina iSeq is higher compared to the high-capacity sequencing platforms, Illumina iSeq is simpler, easier to install and virtually maintenance-free. In addition, it can provide faster results than other platforms, with a run completed in around 18 hours. We showed that NGS with Illumina iSeq 100 provided over 98% coverage of the SARS-CoV-2 genome in all samples above 50 median reads per base. It is important to note that the number of reads per base can be optimised by changing the number of samples per sequencer run. The optimal number of samples per run must therefore be determined by the user depending on the coverage desired and the resources available. A high number of reads per base is ideal as it also allows for detection of small subpopulations within one sample. In the context of the COVID-19 pandemic, this could be useful to detect co-infections with different variants, especially in immunosuppressed patients [37]. Nowadays the number of reads per base could also be relevant in the case of panels of capture enrichment that allow the detection and genotyping of several respiratory viruses in one sample at the same time. The current co-circulation of SARS-CoV-2, influenza and respiratory syncytial virus (RSV) is increasing the demand for detection of co-infection between several respiratory viruses. Hybrid-capture enrichment could be more suited for detecting co-infections between different organisms as it allows for more targets per panel [9].

Automation of ABE enrichment and library generation provided high-coverage SARS-CoV-2 sequencing

For our amplicon approach, the first cohort was processed manually while for the second one the Hamilton Microlab STAR pipetting platform was used. As the amplicon-based enrichment protocol is generally a simpler protocol than the capture-based one, automation is also easier to program and implement. Manual libraries yielded a higher coverage than automatic ones (Fig 1A, S2 Table), although this could be due to other factors, such as different variants sequenced, different versions of the ARTIC panel used, different NA extraction and amplification reagents and different number of runs per sample. In any case, both systems were successful at genotyping SARS-CoV-2 with a high coverage (Fig 1B). Future users should take into account that, as automation requires higher reagent volumes, the cost per sample increases using an automatic library preparation pipeline.

Sample viral load is a determining factor in sequencing success

In order to obtain the most representative data about circulating variants, we usually select samples with low-to-mid CT for genotyping. For this reason, our study focuses on a CT range of 8–26. Samples with low concentration (high CT value) are expected to have a much lower sequencing yield, especially for capture methods [4, 38]. The CT value can be a determining factor for the enrichment method selection, with a previous publication showing amplicon enrichment to be the best-performing enrichment system in low viral load samples [27]. Amplicon enrichment has been previously found to be successful for high genome coverage even at CT values of around 38 [27]. Nevertheless, in our experience, detection of the most challenging mutations with enough quality is increasingly difficult considering the current complexity of the genome of the virus. Every epidemiological service must decide whether to opt for a more unbiased approach of sequencing all received samples regardless of CT, despite the risk of lower sequencing quality, or to discard low viral load samples prior to sequencing.

Different bioinformatic pipelines can cause small differences in the final sequence obtained

This study also compared two different bioinformatic pipelines, an in-house system and the Illumina® DRAGEN™ COVID lineage app. Differences in the consensus sequences generated by both methods were detected, such as T15521A with the ARTIC v4.1 panel, or SNPs with low allele frequency with the capture-based enrichment approach. A low allele frequency could cause differences in base calling due to differences in mapping and filtering. A previous publication similarly found differences in base calling when different algorithms were applied [39]. In our cohorts, 80% of the samples processed with the amplicon-based enrichment showed no discrepancies in the sequences generated with different bioinformatic pipelines, and only one discrepancy was found for each of the other samples. For the capture-based enrichment, 90% of the sequences were identical. Both systems proved to be sufficient for analysis and consensus sequence determination, and Illumina DRAGEN is a user-friendly system that could be implemented in any sequencing facility.

Limitations and potential improvements

There were limitations to our research. As we focused on previously confirmed positive samples with low-to-mid CT value, samples were only sequenced once per method with no replicates, and no negative or positive controls were added in the runs. We encourage epidemiological surveillance services to add a negative and a positive control in routine diagnostics in order to rule out false positives due to contamination as well as potential sequencing errors.

Despite this, the level of agreement across samples was high using different enrichment methods and bioinformatic analyses. Both methods were also able to detect likely recombinant samples, although it must be noted that the ECDC recommends a sequencing read length of 1000 bp in order to detect recombinant samples [19], more than the 150 bp allowed by the Illumina iSeq 100.

A future, more detailed analysis of SARS-CoV-2 sequencing could include new circulating variants, coinfections, other input samples such as saliva, different sample transport media, or different RNA extraction procedures, library preparation kits and sequencers. Nonetheless, our study provides insight into the quality of the Illumina iSeq 100 as a sequencing instrument for SARS-CoV-2, as well as the pros and cons of the two main mechanisms for viral RNA enrichment, using two of the most common commercially available library platforms. The Illumina iSeq is economical and virtually maintenance-free, and therefore its implementation should be easier than that of other possible options.

Final remarks

In summary, both enrichment methods showed high sequencing quality in the samples studied. Nevertheless, capture enrichment with the KAPA RNA Hypercap reagent made optimal library normalisation more difficult and therefore provided an uneven number of reads when multiplexing samples in the same sequencing run. On the other hand, the ARTIC amplicon-based enrichment was sensitive to SNPs as well as to deletions. These mutations could cause a decrease in primer binding efficiency and a sequencing bias in which samples from certain variants were sequenced more deeply than others. Constant updates of the primer panels are therefore required to avoid amplicon dropouts.
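A simple way to flag mutations that may compromise primer binding is to check whether they overlap a primer binding site. The following Python sketch is only an illustration: the primer names, coordinates and mutations are hypothetical placeholders and do not correspond to the ARTIC panels or to the mutations reported in this study. In practice, the coordinates would be taken from the primer scheme in use.

```python
def overlapping_primers(primers, mut_start, mut_end):
    """Return the names of primers whose binding site overlaps a mutation.

    primers            : list of (name, start, end), 1-based inclusive coordinates
    mut_start, mut_end : 1-based inclusive span of the mutation
                         (for a SNP, mut_start == mut_end)
    """
    return [
        name
        for name, start, end in primers
        if start <= mut_end and mut_start <= end
    ]


# Hypothetical primer binding sites (not real ARTIC coordinates).
primers = [
    ("primer_A_LEFT", 100, 124),
    ("primer_A_RIGHT", 480, 505),
    ("primer_B_LEFT", 420, 444),
]

# Hypothetical mutations: two SNPs and one deletion, as (start, end) spans.
mutations = {
    "SNP_110": (110, 110),
    "SNP_300": (300, 300),
    "del_440_450": (440, 450),
}

affected = {
    label: overlapping_primers(primers, start, end)
    for label, (start, end) in mutations.items()
}

for label, hits in affected.items():
    print(label, "->", hits if hits else "no primer affected")
```

Treating SNPs and deletions as spans keeps a single overlap test for both mutation types; a mutation flagged in this way is only a candidate for reduced primer binding, and the actual effect on amplicon depth would still need to be confirmed in the coverage data.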

In this study we showed the behaviour of two SARS-CoV-2 genotyping methods with a wide variety of variants considered as VOC throughout the pandemic, including the Omicron variant. Concordant results were obtained for variants Alpha, Beta, Gamma, Delta, Zeta, Iota, Mu and Omicron, and even for possible recombinant genomes. We also showed the performance of automated library preparation and of an app for bioinformatic analysis of NGS data, which together could simplify the overall genotyping process.

Supporting information

S1 File. Custom protocol for RT-PCR amplification of SARS-CoV-2 RNA.

(DOCX)

pone.0289188.s001.docx (8.4 KB, docx)

S1 Table. ARTIC primers (v3 & v4.1) used for amplicon-based enrichment.

(XLSX)

pone.0289188.s002.xlsx (37.3 KB, xlsx)

S2 Table. Coverage and depth per sample using all enrichment methods and bioinformatic analysis.

(XLSX)

pone.0289188.s003.xlsx (78.7 KB, xlsx)

S1 Dataset

(DOCX)

pone.0289188.s004.docx (9.1 KB, docx)

Acknowledgments

We would like to acknowledge the staff from the Microbiology service of the Complexo Hospitalario Universitario de Vigo (CHUVI), for their contribution to epidemiological surveillance, their work and dedication.

Data Availability

The genetic sequences analysed in the publication are published in the database epiCoV from GISAID. All GISAID IDs for each sequence can be found in the Supporting Information files.

Funding Statement

This work has been funded by: the European Centre for Disease Prevention and Control under the GA ECDC/HERA/2021/024 ECD.12241, the Aid for the consolidation and structuring of competitive research units and other promotion actions of the Galician Innovation Agency, code IN607B-2022/19, and the Consellería de Sanidade, Galicia, Spain. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. GISAID Initiative n.d. https://www.epicov.org/epi3/frontend#6307ef (accessed January 24, 2023).
2. Tracking SARS-CoV-2 variants n.d. https://www.who.int/activities/tracking-SARS-CoV-2-variants (accessed February 23, 2024).
3. John G, Sahajpal NS, Mondal AK, Ananth S, Williams C, Chaubey A, et al. Next-Generation Sequencing (NGS) in COVID-19: A Tool for SARS-CoV-2 Diagnosis, Monitoring New Strains and Phylodynamic Modeling in Molecular Epidemiology. Curr Issues Mol Biol 2021;43:845–67. doi: 10.3390/cimb43020061
4. Charre C, Ginevra C, Sabatier M, Regue H, Destras G, Brun S, et al. Evaluation of NGS-based approaches for SARS-CoV-2 whole genome characterisation. Virus Evol 2020;6:veaa075. doi: 10.1093/ve/veaa075
5. Liu T, Chen Z, Chen W, Chen X, Hosseini M, Yang Z, et al. A benchmarking study of SARS-CoV-2 whole-genome sequencing protocols using COVID-19 patient samples. iScience 2021;24:102892. doi: 10.1016/j.isci.2021.102892
6. Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly from clinical samples | Genome Medicine | Full Text n.d. https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-020-00751-4 (accessed January 31, 2023).
7. Chiara M, D’Erchia AM, Gissi C, Manzari C, Parisi A, Resta N, et al. Next generation sequencing of SARS-CoV-2 genomes: challenges, applications and opportunities. Brief Bioinform 2021;22:616–30. doi: 10.1093/bib/bbaa297
8. Hess JF, Kohl TA, Kotrová M, Rönsch K, Paprotka T, Mohr V, et al. Library preparation for next generation sequencing: A review of automation strategies. Biotechnol Adv 2020;41:107537. doi: 10.1016/j.biotechadv.2020.107537
9. Singh RR. Target Enrichment Approaches for Next-Generation Sequencing Applications in Oncology. Diagnostics 2022;12:1539. doi: 10.3390/diagnostics12071539
10. Gallego-García P, Varela N, Estévez-Gómez N, De Chiara L, Fernández-Silva I, Valverde D, et al. Limited genomic reconstruction of SARS-CoV-2 transmission history within local epidemiological clusters. Virus Evol 2022;8:veac008. doi: 10.1093/ve/veac008
11. SARS-CoV-2 V4.1 update for Omicron variant - Laboratory. ARTIC Real-Time Genomic Surveill 2021. https://community.artic.network/t/sars-cov-2-v4-1-update-for-omicron-variant/342 (accessed February 23, 2024).
12. Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 2016;32:292–4. doi: 10.1093/bioinformatics/btv566
13. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv; 2013. doi: 10.48550/arXiv.1303.3997
14. Grubaugh ND, Gangavarapu K, Quick J, Matteson NL, De Jesus JG, Main BJ, et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol 2019;20:8. doi: 10.1186/s13059-018-1618-7
15. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078–9. doi: 10.1093/bioinformatics/btp352
16. Aksamentov I, Roemer C, Hodcroft EB, Neher RA. Nextclade: clade assignment, mutation calling and quality control for viral genomes. J Open Source Softw 2021;6:3773. doi: 10.21105/joss.03773
17. Rambaut A, Holmes EC, O’Toole Á, Hill V, McCrone JT, Ruis C, et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol 2020;5:1403–7. doi: 10.1038/s41564-020-0770-5
18. O’Toole Á, Pybus OG, Abram ME, Kelly EJ, Rambaut A. Pango lineage designation and assignment using SARS-CoV-2 spike gene nucleotide sequences. BMC Genomics 2022;23:121. doi: 10.1186/s12864-022-08358-2
19. Sequencing of SARS-CoV-2 - first update 2021. https://www.ecdc.europa.eu/en/publications-data/sequencing-sars-cov-2 (accessed February 23, 2024).
20. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002;30:3059–66. doi: 10.1093/nar/gkf436
21. Tamura K, Stecher G, Kumar S. MEGA11: Molecular Evolutionary Genetics Analysis Version 11. Mol Biol Evol 2021;38:3022–7. doi: 10.1093/molbev/msab120
22. Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016.
23. Nicot F, Trémeaux P, Latour J, Carcenac R, Demmou S, Jeanne N, et al. Whole-genome single molecule real-time sequencing of SARS-CoV-2 Omicron. J Med Virol 2023;95:e28564. doi: 10.1002/jmv.28564
24. Rehn A, Braun P, Knüpfer M, Wölfel R, Antwerpen MH, Walter MC. Catching SARS-CoV-2 by Sequence Hybridization: a Comparative Analysis. mSystems 2021;6:e0039221. doi: 10.1128/mSystems.00392-21
25. Nasir JA, Kozak RA, Aftanas P, Raphenya AR, Smith KM, Maguire F, et al. A Comparison of Whole Genome Sequencing of SARS-CoV-2 Using Amplicon-Based Sequencing, Random Hexamers, and Bait Capture. Viruses 2020;12:895. doi: 10.3390/v12080895
26. Xiao M, Liu X, Ji J, Li M, Li J, Yang L, et al. Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly from clinical samples. Genome Med 2020;12:57. doi: 10.1186/s13073-020-00751-4
27. Lam C, Gray K, Gall M, Sadsad R, Arnott A, Johnson-Mackinnon J, et al. SARS-CoV-2 Genome Sequencing Methods Differ in Their Abilities To Detect Variants from Low-Viral-Load Samples. J Clin Microbiol 2021;59:e01046–21. doi: 10.1128/JCM.01046-21
28. Nicot F, Trémeaux P, Latour J, Jeanne N, Ranger N, Raymond S, et al. Whole‐genome sequencing of SARS‐CoV‐2: Comparison of target capture and amplicon single molecule real‐time sequencing protocols. J Med Virol 2022. doi: 10.1002/jmv.28123
29. Gerber Z, Daviaud C, Delafoy D, Sandron F, Alidjinou EK, Mercier J, et al. A comparison of high-throughput SARS-CoV-2 sequencing methods from nasopharyngeal samples. Sci Rep 2022;12:12561. doi: 10.1038/s41598-022-16549-w
30. Davis JJ, Long SW, Christensen PA, Olsen RJ, Olson R, Shukla M, et al. Analysis of the ARTIC Version 3 and Version 4 SARS-CoV-2 Primers and Their Impact on the Detection of the G142D Amino Acid Substitution in the Spike Protein. Microbiol Spectr 2021;9:e01803–21. doi: 10.1128/Spectrum.01803-21
31. Bei Y, Pinet K, Vrtis KB, Borgaro JG, Sun L, Campbell M, et al. Overcoming variant mutation-related impacts on viral sequencing and detection methodologies. Front Med 2022;9:989913. doi: 10.3389/fmed.2022.989913
32. Sanderson T, Barrett JC. Variation at Spike position 142 in SARS-CoV-2 Delta genomes is a technical artifact caused by dropout of a sequencing amplicon. Wellcome Open Res 2021;6:305. doi: 10.12688/wellcomeopenres.17295.1
33. Chen C, Nadeau S, Yared M, Voinov P, Xie N, Roemer C, et al. CoV-Spectrum: analysis of globally shared SARS-CoV-2 data to identify and characterize new variants. Bioinformatics 2022;38:1735–7. doi: 10.1093/bioinformatics/btab856
34. Issues with SARS-CoV-2 sequencing data. Virological 2020. https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473/12 (accessed January 31, 2023).
35. Wang Q, Zhang Y, Wu L, Niu S, Song C, Zhang Z, et al. Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2. Cell 2020;181:894–904.e9. doi: 10.1016/j.cell.2020.03.045
36. Amicone M, Borges V, Alves MJ, Isidro J, Zé-Zé L, Duarte S, et al. Mutation rate of SARS-CoV-2 and emergence of mutators during experimental evolution. Evol Med Public Health 2022;10:142–55. doi: 10.1093/emph/eoac010
37. Zannoli S, Brandolini M, Marino MM, Denicolò A, Mancini A, Taddei F, et al. SARS-CoV-2 Co-Infection in Immunocompromised Host Leads to Generation of Recombinant Strain. Int J Infect Dis 2023. doi: 10.1016/j.ijid.2023.03.014
38. Klempt P, Brož P, Kašný M, Novotný A, Kvapilová K, Kvapil P. Performance of Targeted Library Preparation Solutions for SARS-CoV-2 Whole Genome Analysis. Diagn Basel Switz 2020;10:769. doi: 10.3390/diagnostics10100769
39. Afiahayati, Bernard S, Gunadi, Wibawa H, Hakim MS, Marcellus, et al. A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains. Genes 2022;13:1330. doi: 10.3390/genes13081330

PLoS One. doi: 10.1371/journal.pone.0289188.r001

Decision Letter 0

Hin Fung Tsang

22 Jan 2024

PONE-D-23-21634

Performance of amplicon and capture based next-generation sequencing approaches for the epidemiological surveillance of Omicron SARS-CoV-2 and other variants of concern.

PLOS ONE

Dear Dr. Perez,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 07 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

We look forward to receiving your revised manuscript.

Kind regards,

Hin Fung Tsang

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.

3. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 1. Experimental Setup:

Two bioinformatic pipelines were used to examine the samples in the study, and Illumina iSeq 100 was used for sequencing. The results are more reliable when several methods and pipelines are used.

2. Challenges with Amplicon Method:

The paper points out drawbacks of the amplicon enrichment technique, such as amplicon dropout and mispriming, particularly in the context of particular mutations like G21987A and T15521A. This emphasizes the necessity of ongoing primer modifications for efficient sequencing.

Reviewer #2: I would recommend the following suggestions

1) Kindly arrange the manuscript sections as per journal format. Read the journal guidelines for manuscript.

2) I would suggest to include thermal profile for RT-PCR in tabular form.

3) The primer sequences should be made available as supplementary data.

4) What you meant by "selective pressure", line 340.

5) I would like the authors to highlight why they have not included positive or negative controls in their investigation.

6) I would suggest to add a separate heading for challenges at each step, faced to sequence SARS-CoV-2: that they mentions throughout this manuscript.

Reviewer #3: This scientific article is suitable for publication because it contains an ethical license to conduct research, and it also clarifies and details precise details, while proving them with diagrams and curves.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Elhadi Abdalla Ahmed

Reviewer #2: No

Reviewer #3: Yes: Murtadha Abbas

**********

PLoS One. 2024 Apr 29;19(4):e0289188. doi: 10.1371/journal.pone.0289188.r002

Author response to Decision Letter 0

11 Mar 2024

We want to thank all reviewers for their time, for their comments and for their reviews. We appreciate the feedback. Please find our response to your comments below.

Reviewer 1

1. Experimental Setup:

Two bioinformatic pipelines were used to examine the samples in the study, and Illumina iSeq 100 was used for sequencing. The results are more reliable when several methods and pipelines are used.

Our aim for this paper was to show the feasibility of sequencing SARS-CoV-2. By showing the effectiveness of a user-friendly pipeline (Illumina DRAGEN) and an economical sequencing platform (Illumina iSeq 100), we try to encourage researchers to join the global epidemiological surveillance of SARS-CoV-2 and other respiratory viruses, even when resources or bioinformatic capacity are limited in the facility.

We agree that several pipelines and sequencing platforms can ensure solid results. In order to have a more robust analysis, we compared two enrichment methods and bioinformatic pipelines.

2. Challenges with Amplicon Method:

We agree that amplicon panels need constant updates, especially in the case of RNA viruses with a high mutation rate, as mentioned in the discussion section “ABE is sensitive to amplicon loss due to mutations in the primer binding area” (lines 324-371). Thank you for your comment.

Reviewer 2

Reviewer #2: I would recommend the following suggestions

1) Kindly arrange the manuscript sections as per journal format. Read the journal guidelines for manuscript.

Thank you for the comment. We tried to arrange the sections according to journal format (3 subsections in fonts 18, 16 and 14 respectively). Main headings were for the main sections (e.g. Introduction, Methods, Results and Discussion). Subsections were used across the text, and another sublevel was used only in the Methods section, subsection “Sequencing approaches” (lines 89, 101 and 109).

In order to make the manuscript easier to follow, and according to your suggestion 6), we have also added subsections in the discussion section of the manuscript. Additionally, one paragraph has been moved up (lines 292-301) to improve the order of the arguments presented.

If there is something else we have missed, please let us know and we will correct it accordingly.

2) I would suggest to include thermal profile for RT-PCR in tabular form.

Thank you for the suggestion. Following your comment, we considered that the RT-PCR protocol was perhaps lacking in detail, so we have added a supplementary file (S1 File) with the full protocol for the manual amplification. As the automatic amplification was performed with the Illumina CovidSeq protocol without changes, we have added a reference to the Illumina Reference Guide that was followed (line 105), as we believe it provides enough detail for another user to complete the protocol as intended.

3) The primer sequences should be made available as supplementary data.

Thank you for pointing this out. We have added another supplementary table (S1 Table) that includes all primers from both versions of ARTIC used.

4) What you meant by "selective pressure", line 340.

We mean selective pressure as the cause for a genotype to change in order to adapt. The spike protein is highly immunogenic and the target of vaccines. For this reason, evolution forces new variants of this gene in the population, in order to keep escaping the host’s immune system. This has been shown in publications such as this one:

Amicone, M., Borges, V., Alves, M.J., Isidro, J., Zé-Zé, L., Duarte, S., Vieira, L., Guiomar, R., Gomes, J.P. and Gordo, I., 2022. Mutation rate of SARS-CoV-2 and emergence of mutators during experimental evolution. Evolution, medicine, and public health, 10(1), pp.142-155.

In the context of our publication, additional mutations mean more likelihood of a mutation appearing in the primer-binding area, affecting amplicon homogeneity by drop in primer binding.

We have added an additional explanation in the manuscript (lines 359-364). The publication above has also been added to the bibliography as citation number 36.

5) I would like the authors to highlight why they have not included positive or negative controls in their investigation.

Our study focuses on comparing two kits commonly used in routine diagnostics in our facility, and we believed that an efficient comparison in methodology in performance could be achieved by just focusing on the analysis of clinical samples. This also allowed us to focus on as many SARS-CoV-2 variants as possible in order to show potential defects in the enrichment across all the variability circulating. More controls are definitely an improvement in the methodology and we encourage their use in common diagnostic practice. In order to clarify this for the scientific community, we have added a paragraph detailing this (lines 450-455).

6) I would suggest to add a separate heading for challenges at each step, faced to sequence SARS-CoV-2: that they mentions throughout this manuscript.

The discussion section has been structured according to subsections in order to discuss each challenge from the different approaches (viral load, enrichment method, sequencing platform or bioinformatic analysis amongst others). Thank you for the recommendation.

Reviewer 3

Thank you very much for your time, for your comments and for your review. We appreciate the feedback and your kind words.

Attachment

Submitted filename: Response to reviewers.docx

pone.0289188.s005.docx (9.1 KB, docx)

PLoS One. doi: 10.1371/journal.pone.0289188.r003

Decision Letter 1

Hin Fung Tsang

15 Mar 2024

Performance of amplicon and capture based next-generation sequencing approaches for the epidemiological surveillance of Omicron SARS-CoV-2 and other variants of concern.

PONE-D-23-21634R1

Dear Dr. Sonia Perez,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

Kind regards,

Hin Fung Tsang

Academic Editor

PLOS ONE

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

S1 File. Custom protocol for RT-PCR amplification of SARS-CoV-2 RNA.

(DOCX)

pone.0289188.s001.docx^{(8.4KB, docx)}

S1 Table. ARTIC primers (v3 & v4.1) used for amplicon-based enrichment.

(XLSX)

pone.0289188.s002.xlsx^{(37.3KB, xlsx)}

S2 Table. Coverage and depth per sample using all enrichment methods and bioinformatic analysis.

(XLSX)

pone.0289188.s003.xlsx^{(78.7KB, xlsx)}

S1 Dataset

(DOCX)

pone.0289188.s004.docx^{(9.1KB, docx)}

Attachment

Submitted filename: Response to reviewers.docx

pone.0289188.s005.docx^{(9.1KB, docx)}

Data Availability Statement

The genetic sequences analysed in the publication are published in the database epiCoV from GISAID. All GISAID IDs for each sequence can be found in the Supporting Information files.
