Abstract
Objective
Multiple-Locus Variable Number of Tandem Repeats (VNTR) Analysis (MLVA) is widely used to subtype pathogens causing foodborne and waterborne disease outbreaks. The MLVAtype shiny application was previously designed to extract MLVA profiles of Vibrio cholerae isolates from whole-genome sequencing (WGS) data and to provide backward compatibility with traditional MLVA typing methods. The previous development and validation work was conducted using short (paired-end, 300 and 150 nt) reads from Illumina MiSeq and HiSeq sequencing. In this study, the MLVAtype application was validated using long reads generated by Oxford Nanopore Technologies (ONT) sequencing platforms. In silico MLVA profiles of V. cholerae isolates (n = 9) from the Democratic Republic of the Congo were generated using the MLVAtype application on Nanopore WGS data. The WGS-derived in silico MLVA profiles were extracted from Canu (v.2.2) assemblies of reads obtained through ONT MinION and GridION sequencing. The results were compared to those obtained from SPAdes assemblies (v.3.13.0; k-mer 175) generated from short-read (paired-end 300 nt) reference data obtained by Illumina MiSeq sequencing.
Results
For each isolate, the in silico MLVA profiles were concordant across all three sequencing methods, demonstrating that the MLVAtype application can accurately predict MLVA profiles from assembled genomes generated by long-read ONT sequencers.
Keywords: In silico MLVA profiles, Sequencing, Nanopore, Long reads, WGS
Background
Multiple-Locus Variable Number of Tandem Repeats (VNTR) Analysis (MLVA) is widely used by laboratory-based surveillance networks for subtyping pathogens causing foodborne and waterborne disease outbreaks. We recently demonstrated that WGS data generated with short-read Illumina sequencing technology can be used to extract in silico MLVA profiles of V. cholerae isolates while maintaining backward compatibility with traditional MLVA typing methods [1]. The percentage of censored estimations in MLVA profiles generated from WGS data was inversely proportional to the k-mer parameter used during genome assembly. Censored estimations could be prevented by using a longer k-mer size (e.g. 175), even though the original SPAdes v.3.13.0 [2] software did not offer this k-mer size.
Both MinION and GridION ONT sequencers are quickly gaining popularity because their long reads enable the assembly of contiguous microbial genomes. However, their base-calling accuracy is significantly lower than that obtained with Illumina short reads, although this gap is steadily narrowing. In particular, ONT sequencers are known to have difficulty accurately sequencing low-complexity regions such as homopolymers [3].
Recent studies have examined methods for deriving in silico MLVA profiles from long-read sequencing data for several bacterial species. For multidrug-resistant organisms such as Klebsiella pneumoniae, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, Pseudomonas aeruginosa, Acinetobacter baumannii, and methicillin-resistant Staphylococcus aureus, perfect concordance was achieved between in silico MLVA profiles derived from long- and short-read data, as well as with conventional MLVA typing [4]. Lower concordance rates were observed for Bacillus anthracis [5], where Nanopore and Illumina sequencing yielded 88% and 83% concordance, respectively.
To the best of our knowledge, the accuracy of in silico MLVA typing using Nanopore data has not yet been assessed on V. cholerae. This species seems to be undergoing unprecedented genetic changes, with climate change possibly acting as a trigger factor [6, 7]. These changes pose an increasing threat to public health in cholera-affected regions. Therefore, this study aimed to compare and validate MLVA results obtained with WGS data from V. cholerae using MinION, GridION and Illumina sequencing, in order to expand the scope of application of our previous MLVAtype shiny application. We analysed the in silico MLVA profiles derived from the three methods on a series of V. cholerae strains. Given that we had previously demonstrated the accuracy of MLVA profiles derived from Illumina MiSeq on V. cholerae isolates [1], we compared the ONT results to the Illumina results, which served as the benchmark.
Method
Sample collection and sequencing technology
Nine V. cholerae isolates were selected from a collection of isolates characterised in a recent study conducted in the DRC between 2014 and 2017 [8].
Two technologies were used to sequence the whole genomes of the selected V. cholerae isolates: Illumina (MiSeq) and ONT (MinION and GridION). For the Illumina technology, whole-genome assemblies were generated from paired-end 300 nt reads, as previously detailed [1]. In brief, sequencing libraries were prepared using 70 ng of V. cholerae genomic DNA following the Illumina DNA Prep protocol (Illumina, San Diego, CA, USA). Genomic DNA was simultaneously fragmented and tagged with sequencing adapters in a single step using the Nextera transposome (Nextera XT DNA Library Preparation Kit, Illumina, San Diego, CA, USA). Tagged DNA was then amplified with a 12-cycle polymerase chain reaction (PCR), cleaned up with AMPure beads, and loaded on a MiSeq for a paired-end 2 × 300 nt sequencing run using the MiSeq Reagent Kit v3 (600 cycles) (Illumina, San Diego, CA, USA).
ONT long-read libraries were generated using 400 ng of high molecular weight genomic DNA (GQN > 8). The DNA was initially fragmented to an average fragment length of 11.6 kb using Covaris g-TUBES (Covaris, Woburn, MA, USA). Libraries were then prepared and barcoded according to ONT’s Ligation Sequencing genomic DNA – DNA Barcoding kit SQK-NBD112.24 protocol. The nine libraries were multiplexed and loaded into two FLO-MIN112 (R10 version) flow cells. Sequencing took 72 h on a MinION Mk1C and a GridION.
Sanger-derived MLVA typing was used as a reference method for resolving MLVA discrepancies. It was performed by sequencing amplicons on both strands on the ABI 3130 GA, using the BigDye Terminator v1.1 cycle sequencing kit (Applied Biosystems, USA). Motif repeats were counted manually and translated into MLVA profiles.
WGS assembly and MLVA profiling
WGS data from Illumina MiSeq were assembled into contigs using SPAdes v.3.13.0 [2] with a k-mer value of 175 and otherwise default settings. WGS data from ONT MinION and GridION were assembled into contigs using Canu v.2.2 [9] with genomeSize=4m and otherwise default settings.
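The two assembly steps can be sketched as command lines built in Python. This is a minimal illustration, not the authors' pipeline: the file names, output directories and helper function names are hypothetical, and only the k-mer value (175) and the Canu genome size (4m) come from the text. As noted in the Background, the original SPAdes v.3.13.0 software does not offer k = 175, so a build that accepts this value is assumed.

```python
def spades_command(forward_reads, reverse_reads, out_dir, k=175):
    """Argument list for a paired-end SPAdes assembly with a single k-mer size."""
    return [
        "spades.py",
        "-1", str(forward_reads),
        "-2", str(reverse_reads),
        "-k", str(k),          # assumes a SPAdes build that accepts k = 175
        "-o", str(out_dir),
    ]


def canu_command(nanopore_reads, out_dir, prefix, genome_size="4m"):
    """Argument list for a Canu assembly of Nanopore reads."""
    return [
        "canu",
        "-p", prefix,
        "-d", str(out_dir),
        f"genomeSize={genome_size}",
        "-nanopore", str(nanopore_reads),
    ]


# Hypothetical file names, for illustration only.
print(" ".join(spades_command("CTMA-1426_R1.fastq.gz", "CTMA-1426_R2.fastq.gz", "spades_out")))
print(" ".join(canu_command("CTMA-1426_minion.fastq.gz", "canu_out", "CTMA-1426")))
```

The lists can be passed to `subprocess.run(..., check=True)` once the tools are installed; building them separately keeps the parameters easy to inspect and log.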
For each isolate and each sequencing platform, the in silico MLVA profiles were extracted from the assembled contigs using the MLVAtype algorithm, which has been implemented in an R shiny application. This application is freely available at https://ucl-irec-ctma.shinyapps.io/NGS-MLVA-TYPING/. It enables users to upload a list of draft genomes and the nucleotide sequences of the motifs. The application was used to predict MLVA profiles for V. cholerae loci listed in Table 1, as demonstrated in our previous study.
Table 1.
Loci and motifs characterising the MLVA profiles of V. cholerae
Locus | Motif |
---|---|
VC0147 | AACAGA |
VC0437 | GACCCTA |
VC1650 | ATAATCCAG |
VCA0171 | GCTGTT |
VCA0283 | CCAGAA |
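The idea of reading a repeat count per locus directly from assembled contigs can be illustrated with a short Python sketch. It is not the published MLVAtype implementation, whose handling of flanking regions, imperfect repeats and censored values may differ. It simply reports, for each motif of Table 1, the longest run of exact back-to-back copies found on either strand. The function names are hypothetical, and contigs are assumed to be plain DNA strings already read from a FASTA file.

```python
import re

# Motifs from Table 1, written in upper case.
MOTIFS = {
    "VC0147": "AACAGA",
    "VC0437": "GACCCTA",
    "VC1650": "ATAATCCAG",
    "VCA0171": "GCTGTT",
    "VCA0283": "CCAGAA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_tandem_run(contig, motif):
    """Largest number of back-to-back exact motif copies on either strand."""
    contig = contig.upper()
    motif = motif.upper()
    best = 0
    for m in {motif, reverse_complement(motif)}:
        # The lookahead lets a run start at any position, including inside
        # an earlier, shorter match of a self-overlapping motif.
        for hit in re.finditer(f"(?=((?:{m})+))", contig):
            best = max(best, len(hit.group(1)) // len(m))
    return best


def mlva_profile(contigs, motifs=MOTIFS):
    """Tuple of per-locus repeat counts, taking the best run over all contigs."""
    return tuple(
        max((longest_tandem_run(c, m) for c in contigs), default=0)
        for m in motifs.values()
    )
```

The scan visits every position of every contig and is not optimised; it is meant to make the counting rule explicit rather than to process full genomes quickly.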
Results
Table 2 summarises the sequence quality reported for Illumina MiSeq, and Table 3 that reported for ONT MinION and GridION. As expected, forward reads from Illumina MiSeq exhibited higher quality than reverse reads.
Table 2.
Quality control metrics of Illumina MiSeq reads
Isolate | Number of reads | Length of reads (nt) | Proportion of positions in forward reads with median Phred score > 30 | Proportion of positions in reverse reads with median Phred score > 30 |
---|---|---|---|---|
CTMA-1426 | 2 × 2 228 605 | 2 × 300 | 0.983 | 0.751 |
CTMA-1427 | 2 × 1 521 677 | 2 × 300 | 0.944 | 0.761 |
CTMA-1432 | 2 × 2 195 887 | 2 × 300 | 0.983 | 0.761 |
CTMA-1435 | 2 × 1 613 586 | 2 × 300 | 0.9 | 0.754 |
CTMA-1461 | 2 × 2 289 895 | 2 × 300 | 0.987 | 0.761 |
CTMA-1473 | 2 × 2 525 965 | 2 × 300 | 0.987 | 0.761 |
CTMA-1402 | 2 × 1 990 392 | 2 × 300 | 0.934 | 0.744 |
CTMA-1421 | 2 × 1 949 341 | 2 × 300 | 0.987 | 0.754 |
CTMA-1424 | 2 × 1 955 144 | 2 × 300 | 0.924 | 0.764 |
Table 3.
Quality control metrics of ONT MinION and GridION long reads
ONT sequencing platform | Isolate | Number of reads | Read length, min–max (nt) | Median read length (nt) | Proportion of positions with median Phred score > 10 |
---|---|---|---|---|---|
MinION | CTMA-1426 | 190 186 | 69–63 195 | 1599 | 0.996 |
MinION | CTMA-1427 | 87 929 | 70–43 611 | 1492 | 0.942 |
MinION | CTMA-1432 | 112 216 | 68–61 372 | 759 | 0.985 |
MinION | CTMA-1435 | 143 620 | 70–53 139 | 1444 | 0.990 |
MinION | CTMA-1461 | 163 440 | 68–109 071 | 1612 | 0.978 |
MinION | CTMA-1473 | 98 596 | 72–104 391 | 4162 | 0.989 |
MinION | CTMA-1402 | 113 252 | 75–44 694 | 2098 | 0.989 |
MinION | CTMA-1421 | 95 725 | 74–62 565 | 2604 | 0.991 |
MinION | CTMA-1424 | 1 372 705 | 65–127 859 | 624 | 0.999 |
GridION | CTMA-1426 | 122 597 | 67–43 560 | 1840 | 0.993 |
GridION | CTMA-1427 | 57 920 | 70–54 700 | 1764 | 0.982 |
GridION | CTMA-1432 | 71 818 | 73–43 924 | 857 | 0.995 |
GridION | CTMA-1435 | 93 995 | 69–63 410 | 1683 | 0.991 |
GridION | CTMA-1461 | 106 175 | 69–53 447 | 1869 | 0.995 |
GridION | CTMA-1473 | 68 365 | 71–47 336 | 4593 | 0.996 |
GridION | CTMA-1402 | 74 836 | 80–54 138 | 2450 | 0.995 |
GridION | CTMA-1421 | 63 781 | 70–78 397 | 3095 | 0.999 |
GridION | CTMA-1424 | 829 644 | 70–138 439 | 639 | 0.999 |
Despite their lower base quality compared with the Illumina reads, both ONT platforms produced substantially longer reads (Table 3).
As expected, the genomes assembled by SPAdes from Illumina MiSeq reads were more fragmented (74 to 89 contigs) than those produced by Canu from MinION and GridION reads (2 to 5 contigs). As shown in Table 4, MLVA profiles were generated using the MLVAtype algorithm on WGS data from nine previously reported isolates [1, 8]. The results were perfectly concordant across all sequencing platforms.
Table 4.
MLVA profiles of V. cholerae isolates obtained from WGS data using the MLVAtype shiny application
Isolate | MiSeq-derived MLVA profile | MinION-derived MLVA profile | GridION-derived MLVA profile |
---|---|---|---|
CTMA-1402 | (9,7,7,10,16) | Idem | Idem |
CTMA-1421 | (9,7,7,11,17) | Idem | Idem |
CTMA-1424 | (10,7,7,11,15) | Idem | Idem |
CTMA-1426 | (10,7,6,26,20) | Idem | Idem |
CTMA-1427 | (10,7,6,24,21) | Idem | Idem |
CTMA-1432 | (10,7,6,16,20) | Idem | Idem |
CTMA-1435 | (10,7,6,23,18) | Idem | Idem |
CTMA-1461 | (11,7,7,13,16) | Idem | Idem |
CTMA-1473 | (10,7,7,12,16) | Idem | Idem |
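The concordance reported in Table 4 can be expressed as a simple comparison of profile tuples. The sketch below is illustrative only: the MiSeq profiles are copied from Table 4, the MinION and GridION dictionaries are filled with the same values because the table reports "Idem" for every isolate, and the function name is hypothetical.

```python
# MiSeq-derived profiles from Table 4.
MISEQ = {
    "CTMA-1402": (9, 7, 7, 10, 16),
    "CTMA-1421": (9, 7, 7, 11, 17),
    "CTMA-1424": (10, 7, 7, 11, 15),
    "CTMA-1426": (10, 7, 6, 26, 20),
    "CTMA-1427": (10, 7, 6, 24, 21),
    "CTMA-1432": (10, 7, 6, 16, 20),
    "CTMA-1435": (10, 7, 6, 23, 18),
    "CTMA-1461": (11, 7, 7, 13, 16),
    "CTMA-1473": (10, 7, 7, 12, 16),
}
# Table 4 reports "Idem" for both ONT platforms, so the profiles are copied.
MINION = dict(MISEQ)
GRIDION = dict(MISEQ)


def concordance(reference, other):
    """Fraction of shared isolates whose full MLVA profile matches the reference."""
    shared = reference.keys() & other.keys()
    if not shared:
        raise ValueError("no isolates in common")
    return sum(reference[i] == other[i] for i in shared) / len(shared)


print(concordance(MISEQ, MINION), concordance(MISEQ, GRIDION))  # 1.0 1.0
```

Comparing whole tuples treats a difference at any single locus as a discordant isolate, which is the strictest reading of "concordant profile".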
Discussion
Due to their low cost and rapid turnaround time, ONT sequencing platforms such as MinION and GridION are appealing to clinical laboratories, with the clear potential to replace traditional typing methods. However, this type of analysis is not yet affordable for all institutions because of several new challenges, including data storage, computing power, and bioinformatics expertise. Moreover, ONT platforms still lag behind other sequencing platforms, such as Illumina short-read sequencers, in base-calling accuracy [10]. Accordingly, the current study was designed to assess, using both ONT platforms, the impact of the lower sequencing accuracy of ONT technology on assembled genomic regions of V. cholerae that are characterised by a variable number of tandem repeats.
Given that we had previously demonstrated the accuracy of Illumina MiSeq-derived MLVA profiles on V. cholerae isolates, we compared the ONT results to the Illumina MiSeq results, which served as a reference. Notably, in this study, MiSeq-derived MLVA profiles were free of censored values due to two factors: (i) the limited number of repetitions per motif, with a maximum of 26 for the 4th motif of CTMA-1426, and (ii) the use of a long k-mer size (175) during genome assembly with SPAdes v.3.13.0.
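The interplay between repeat number and k-mer size can be made concrete with a little arithmetic. The sketch below rests on two assumptions that are not stated in the source: that a repeat array is free of censoring when a single k-mer is longer than the whole array (a rough rule of thumb), and that the fourth position of the profile corresponds to the fourth locus of Table 1 (VCA0171, a 6-nt motif). The value 127 is used as the largest k-mer size accepted by a standard SPAdes build, which is also an assumption here. The function names are hypothetical.

```python
def repeat_array_length(n_repeats, motif_length):
    """Length in nucleotides of a perfect tandem repeat array."""
    return n_repeats * motif_length


def spanned_by_kmer(n_repeats, motif_length, k):
    """True when a single k-mer is longer than the whole repeat array."""
    return repeat_array_length(n_repeats, motif_length) < k


# Longest array in this study: 26 copies of a 6-nt motif (CTMA-1426).
print(repeat_array_length(26, 6))      # 156
print(spanned_by_kmer(26, 6, k=175))   # True
print(spanned_by_kmer(26, 6, k=127))   # False
```

Under these assumptions, the longest array (156 nt) fits within a 175-mer but not within a 127-mer, which is consistent with the choice of the longer k-mer size.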
We used the same nine DRC V. cholerae isolates as in our previous study [1] and assembled the Illumina reads into contigs with the same assembler (SPAdes). Our MLVA studies were conducted in several phases, during which the V. cholerae strains were recultured between 2019 and 2023. During this process, minor variations in MLVA profiles were observed across passages in two strains (CTMA-1424 and CTMA-1426) and were confirmed by Sanger sequencing. The long-term stability of MLVA profiles across passages has been explored by Kendall et al. [11] and Garcia et al. [12] in large-scale studies, where microevolution was observed in V. parahaemolyticus after multiple passages. In contrast, the limited number of passages (≤ 3) in our study and the low mutation rate (on the order of 10⁻⁴ mutants per generation) observed during culture by Garcia et al. [12] make it unlikely that significant MLVA changes occurred. We therefore believe that experimental conditions are more likely responsible for the observed MLVA variations. While the same initial colonies were used in our 2019 and 2023 studies, different glycerol stocks were employed. Although we cannot conclusively demonstrate this, the few differences in MLVA profiles are more plausibly attributable to these technical factors than to microevolutionary processes. A large-scale study on V. cholerae, involving many strains and numerous passages (e.g., n > 30), falls beyond the scope of this study but is planned for the near future.
In conclusion, the perfect concordance of results across short- and long-read sequencing platforms in this study demonstrates the reliability and accuracy of in silico MLVA typing using Nanopore WGS data. While ongoing technological progress is expected to improve ONT base-calling accuracy in the near future, this study confirms that the currently reported lower accuracy of ONT long-read sequencing, compared to short-read Illumina sequencing, does not affect MLVA typing results.
Limitations
As demonstrated in this study, using short- and long-read sequencing for backward comparison with historical MLVA profiles obtained through traditional methods can introduce bias due to unpredictable MLVA profile variations across passages of the same strain. These variations may result from technical factors (e.g., reculturing strains from different aliquots or randomly analysing different colonies from the same culture plate), genetic microevolution, or a combination of both. Consequently, further investigation is needed to assess the variability among multiple colonies from the same culture plate and the long-term stability of MLVA profiles across numerous passages in a larger-scale study.
Although the limited number of V. cholerae isolates in this study could be considered a genuine limitation, this is counterbalanced by the broad diversity of MLVA profiles included in the analysis and the perfect concordance observed across traditional, short-read, and long-read sequencing methods for MLVA profiling.
Acknowledgements
Not applicable.
Abbreviations
- ONT: Oxford Nanopore Technologies
- WGS: Whole-Genome Sequencing
- MLVA: Multiple-Locus Variable Number of Tandem Repeats (VNTR) Analysis
Author contributions
J.A. and J.L.G. wrote the main manuscript text. B.B., J.F.D. and L.I. collected the isolates and performed short- and long-read sequencing. All authors reviewed the manuscript.
Funding
This study was funded by the Belgian Cooperation Agency of the ARES (Académie de Recherche et d’Enseignement Supérieur) [grant COOP-CONV-20-022]. The funder did not play any role in the study design, collection, analysis, and interpretation of data, manuscript writing, or the decision to submit the paper for publication.
Data availability
All NGS data are available from the European Nucleotide Archive (ENA, http://www.ebi.ac.uk/ena) under study accession number PRJEB55717.
Declarations
Ethics approval and consent to participate
The study protocols, including the method for collecting rectal swab samples and conducting genomic analysis, were approved by the Ethical Review Board (ERB) of the Institut Supérieur des Techniques Médicales de Bukavu, DRC (ISTM-BUKAVU/CRPS/CIES/ML/0016/2023). The ERB explicitly granted a waiver of written informed consent, citing compliance with international ethical guidelines for research conducted during severe outbreaks (e.g., CIOMS 2016). These guidelines recommend the use of oral informed consent for participants with low literacy, particularly in public health emergencies where written consent may be impractical. In accordance with this waiver, all participants provided oral informed consent prior to sample collection. The study also adhered to national regulations of the Democratic Republic of the Congo governing research ethics in public health emergencies. To ensure participant confidentiality, all collected samples were fully anonymised during the genomic analysis process.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Ambroise J, Irenge LM, Durant JF, Bearzatto B, Bwire G, Stine OC, et al. Backward compatibility of whole genome sequencing data with MLVA typing using a new MLVAtype shiny application for Vibrio cholerae. PLoS ONE. 2019;14(12):e0225848.
- 2. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19(5):455–77.
- 3. Delahaye C, Nicolas J. Sequencing DNA with nanopores: troubles and biases. PLoS ONE. 2021;16(10):e0257521.
- 4. Landman F, Jamin C, de Haan A, Witteveen S, Bos J, van der Heide HG, et al. Genomic surveillance of multidrug-resistant organisms based on long-read sequencing. medRxiv. 2024:2024.02.18.24301916.
- 5. Linde J, Brangsch H, Hölzer M, Thomas C, Elschner MC, Melzer F, et al. Comparison of Illumina and Oxford Nanopore Technology for genome analysis of Francisella tularensis, Bacillus anthracis, and Brucella suis. BMC Genomics. 2023;24(1):258.
- 6. Irenge LM, Ambroise J, Bearzatto B, Durant J-F, Bonjean M, Wimba LK, et al. Genomic evolution and rearrangement of CTX-Φ prophage elements in Vibrio cholerae during the 2018–2024 cholera outbreaks in eastern Democratic Republic of the Congo. Emerging Microbes & Infections. 2024(just-accepted):2399950.
- 7. Chowdhury FR, Nur Z, Hassan N, von Seidlein L, Dunachie S. Pandemics, pathogenicity and changing molecular epidemiology of cholera in the era of global warming. Ann Clin Microbiol Antimicrob. 2017;16(1):10.
- 8. Irenge LM, Ambroise J, Mitangala PN, Bearzatto B, Kabangwa RKS, Durant JF, et al. Genomic analysis of pathogenic isolates of Vibrio cholerae from eastern Democratic Republic of the Congo (2014–2017). PLoS Negl Trop Dis. 2020;14(4):e0007642.
- 9. Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27(5):722–36.
- 10. Petersen LM, Martin IW, Moschetti WE, Kershaw CM, Tsongalis GJ. Third-generation sequencing in the Clinical Laboratory: exploring the advantages and challenges of Nanopore sequencing. J Clin Microbiol. 2019;58(1).
- 11. Kendall EA, Chowdhury F, Begum Y, Khan AI, Li S, Thierer JH, et al. Relatedness of Vibrio cholerae O1/O139 isolates from patients and their household contacts, determined by multilocus variable-number tandem-repeat analysis. J Bacteriol. 2010;192(17):4367–76.
- 12. Garcia K, Gavilan RG, Hofle MG, Martinez-Urtaza J, Espejo RT. Microevolution of pandemic Vibrio parahaemolyticus assessed by the number of repeat units in short sequence tandem repeat regions. PLoS ONE. 2012;7(1):e30823.