Abstract
Objective
Treponema pallidum subsp. pallidum (TPA) is the causative agent of syphilis. Genetic analyses of TPA reference strains and human clinical isolates have revealed two genetically distinct groups of syphilis-causing treponemes, called Nichols-like and SS14-like groups. So far, no genetic intermediates, i.e. strains containing a mixed pattern of Nichols-like and SS14-like genomic sequences, have been identified. Recently, Sun et al. (Oncotarget 2016. 10.18632/oncotarget.10154) described a new “phylogenetic group” (called Lineage 2) among Chinese TPA strains. This lineage exhibited a “mosaic genomic structure” of Nichols-like and SS14-like lineages.
Results
We reanalyzed the primary sequencing data (Project Number PRJNA305961) from the Sun et al. publication with respect to the molecular basis of Lineage 2. While Sun et al. based the analysis on several selected genomic single nucleotide variants (SNVs) and a subset of highly variable but phylogenetically poorly informative genes, which may confound the phylogenetic analysis, our reanalysis primarily focused on a complete set of whole genomic SNVs. Based on our reanalysis, only two separate TPA clusters were identified: one consisted of Nichols-like TPA strains, the other was formed by the SS14-like TPA strains, including all Chinese strains.
Electronic supplementary material
The online version of this article (10.1186/s13104-017-3106-7) contains supplementary material, which is available to authorized users.
Keywords: Treponema pallidum, Syphilis, Genome sequencing, Phylogenetic analysis, Single nucleotide variant
Introduction
The bacterium Treponema pallidum subsp. pallidum (TPA) is the causative agent of syphilis. Other subspecies comprise Treponema pallidum subsp. pertenue (TPE) and Treponema pallidum subsp. endemicum (TEN), the causative agents of yaws and bejel, respectively. Since the pathogenic treponemes cannot be continuously cultivated under in vitro conditions, much of our understanding of these pathogens comes from the accumulation of genetic and genomic data [1]. As previously shown by whole genome fingerprinting [2], analysis of several treponemal specific loci [3–5], whole genome sequence alignments of TPA reference strains [1, 6], and targeted whole genome sequencing of clinical isolates [7, 8], there is evidence for two separate genetic subclusters within TPA treponemes, called Nichols-like and SS14-like, which differ considerably at the DNA level (~ 400 nt differences).
Recently, Sun et al. [9] sequenced and analyzed eight TPA samples (propagated in rabbit testes) from syphilis-positive Chinese patients and compared them to the available treponemal genomes. Based on their results, a new “phylogenetic group” of TPA strains was identified and named Lineage 2 (Lineage 1 = Nichols-like, Lineage 3 = SS14-like). This Lineage 2 exhibited a “mosaic genomic structure” characterized by the insertion of Lineage 1-derived genes into the Lineage 3-derived genomic backbone. The authors also indicated that the analyzed Chinese TPA strains (Lineage 2) might be derived from recombination or lateral gene transfer events between Lineage 1 and Lineage 3. To date, no genetic intermediates, i.e. strains containing a mixed pattern of Nichols-like and SS14-like genomic sequences, have been identified. Therefore, evidence for a new phylogenetic lineage would provide new insight into the diversity and the phylogenetic relationships of TPA strains.
Main text
Methods
We reanalyzed the primary sequencing data from Sun et al. [9] with respect to the molecular basis of Lineage 2 using available SRA data (Illumina HiSeq 2500, 151 bp paired-end; Project Number PRJNA305961) and available TPA reference genomes. In contrast to Sun et al. [9], we used BWA MEM (instead of Bowtie2) and both the Nichols and SS14 reference genome sequences for the genomic alignments and SNV analysis, supplemented with a de novo genome assembly analysis. Briefly, sequencing data were pre-processed with Trimmomatic. Sequencing reads were mapped to the reference genomes using BWA MEM. The resulting mappings were post-processed with Samtools to exclude low-quality (MAPQ < 10), secondary, and not properly paired mappings. SNVs for individual sequenced samples were called using FreeBayes. Hard filters were applied to keep only high-quality variants, as recommended by the FreeBayes authors, with a minimal depth of at least 5 (DP > 5) and a variant call quality of at least 50 (QUAL > 50). Multiple whole genome alignment SNVs were called using NUCmer. The results were used in the phylogenetic analysis and processed with a custom R script. Only SNVs detected in all analyzed samples were used in the analysis. For more details on the data collection and analysis, see Additional file 1.
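The post-processing and hard-filtering criteria described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which relied on Samtools and FreeBayes; see Additional file 1). The function names `keep_alignment` and `keep_variant` are hypothetical, the strict inequalities follow the parenthetical thresholds (DP > 5, QUAL > 50), and reading DP from the VCF INFO column is an assumption.

```python
# Illustrative sketch only; names and defaults are assumptions, not the study's code.

# SAM flag bits as defined in the SAM specification.
FLAG_PROPER_PAIR = 0x2
FLAG_SECONDARY = 0x100


def keep_alignment(flag: int, mapq: int, min_mapq: int = 10) -> bool:
    """Return True if a mapping passes the three post-processing criteria."""
    if mapq < min_mapq:              # low-quality mapping (MAPQ < 10)
        return False
    if flag & FLAG_SECONDARY:        # secondary mapping
        return False
    if not flag & FLAG_PROPER_PAIR:  # not properly paired
        return False
    return True


def keep_variant(vcf_line: str, min_depth: int = 5, min_qual: float = 50.0) -> bool:
    """Return True if a VCF data line satisfies DP > min_depth and QUAL > min_qual.

    Assumes DP is reported in the INFO column (column 8) of the VCF record.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]
    if qual == ".":                  # missing quality value
        return False
    info = dict(item.split("=", 1) for item in fields[7].split(";") if "=" in item)
    if "DP" not in info:
        return False
    return int(info["DP"]) > min_depth and float(qual) > min_qual
```

For example, a properly paired primary mapping with MAPQ 60 is kept, whereas a variant record with DP=5 is rejected under the strict reading of the threshold.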
Results
In our analysis, we mapped 1.65–35.61% of all input read pairs to either the SS14 or the Nichols reference genome (see Additional file 2), with average coverage depth ranging between 58 and 1184× for both reference genomes (see Additional file 3). The remaining read pairs mapped primarily to the rabbit reference genome (57.01–89.80%; see Additional file 2). We achieved 99.19–99.35% and 98.87–99.08% genome coverage in the SS14 and Nichols genome alignments, respectively (see Additional file 3). The genomic regions which cannot be covered by uniquely mapped reads were located mainly in paralogous tpr genes (C, D, E, F, G, I, J, and K), RNA operons, and genes containing repetitive sequences, i.e. tp0433 (arp) and tp0470 (see Additional file 4). Overall mapping and genome coverage statistics calculated for the individual Chinese strains are summarized in additional files (see Additional files 2, 3 and 4).
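As a hedged illustration of how the two coverage figures reported above relate to per-base read depth, the sketch below computes the genome coverage (percentage of reference positions covered by at least one read) and the average coverage depth from a list of per-position depths. The function name `coverage_summary` and the input format are assumptions; the exact procedure used in this study is described in the Additional files.

```python
from typing import Dict, Sequence


def coverage_summary(depths: Sequence[int]) -> Dict[str, float]:
    """Summarize per-position read depths for one reference genome.

    depths: read depth at every reference position (hypothetical input,
    e.g. parsed from a per-base depth table).
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d > 0)
    return {
        # percentage of reference positions covered by at least one read
        "genome_coverage_pct": 100.0 * covered / len(depths),
        # mean depth over all reference positions, including uncovered ones
        "mean_depth": sum(depths) / len(depths),
    }
```

Positions in paralogous or repetitive regions, where no uniquely mapped reads remain after filtering, contribute a depth of zero and therefore lower the genome coverage percentage.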
Based on the multiple whole genome alignment of 14 TPA genome sequences, the number of identified SNVs obtained from the sequencing data for the individual Chinese samples ranged between 14 and 19 when compared to the SS14 reference genome and between 282 and 327 when compared to the Nichols reference genome (see Additional file 5). Moreover, an additional detailed variant calling analysis of the sequencing data of the individual Chinese samples (using FreeBayes) was performed and showed similar results (data not shown). Two separate branches, each with bootstrap support greater than 95%, were identified: one consisted of the Nichols, DAL-1, Chicago, and Sea 81-4 strains sharing the same phylogenetic cluster (Lineage 1); the other was formed by the SS14-like TPA strains, including all the Chinese strains (Lineage 3). The clustering of the Chinese samples within Lineage 3 is shown in Fig. 1. A clustering of the Chinese strains with genome sequences of clinical TPA isolates from different countries is shown in the supplementary material of Arora et al. (original Supplementary Figure 6) [7].
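The comparison of SNV counts against the two reference genomes can be sketched as follows. This is a toy illustration only, not the bootstrap-supported phylogenetic analysis performed in this study: it assumes pre-aligned sequences of equal length, skips any position where either sequence has a gap or an ambiguous base, and the function names `snv_count` and `closest_reference` are hypothetical.

```python
from typing import Dict, Tuple

VALID_BASES = set("ACGT")


def snv_count(seq_a: str, seq_b: str) -> int:
    """Count mismatching positions between two aligned sequences.

    Positions where either sequence carries a gap or an ambiguous base
    are ignored, so only positions called in both sequences are compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def closest_reference(sample: str, references: Dict[str, str]) -> Tuple[str, Dict[str, int]]:
    """Return the reference with the fewest SNVs, plus all per-reference counts."""
    counts = {name: snv_count(sample, seq) for name, seq in references.items()}
    return min(counts, key=counts.get), counts
```

Under this simple criterion, a sample with 14 to 19 SNVs relative to SS14 and 282 to 327 SNVs relative to Nichols, as reported for the Chinese samples, would be assigned to the SS14 reference.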
In addition, we identified the tprD2 allele-specific sequence [10] among the sequencing reads from the Chinese SRA data in all samples (see Additional file 6). While the Nichols reference genome harbors identical copies of tprC and tprD genes, the SS14 reference genome carries the tprD2 allele which is not identical to the tprC gene and differs from the tprD allele by more than 300 nucleotides. The sequence alignment of these alleles comprising the most variable region is shown in Additional file 7.
Moreover, all Chinese samples were found to contain indels identical to those identified in the SS14 genome (see Additional file 6) when compared to the Nichols-like TPA strains.
The analysis of two loci, tp0136 and tp0548, potentially differentiating Nichols-like and SS14-like groups of TPA strains and isolates [11, 12], revealed that these genes were identical to the corresponding SS14 orthologous genes (see Additional file 8). In addition, the analysis of selected genes differentiating Nichols and SS14 reference strains at three or more nucleotide positions (tp0131, tp0136, tp0179, tp0304, tp0346, tp0462, tp0488, tp0515, tp0548, tp0558, and tp1031), showed them to be identical or nearly identical to the SS14 strain (data not shown).
Discussion
The identification of two genetically distinct TPA lineages has been described by earlier genetic studies [5, 6], and these findings have been supported by recent whole genome sequencing studies [7, 8]. In Arora et al. [7], the phylogenetic analysis of 28 clinical isolates from different countries showed a clear separation of TPA isolates into two clusters, SS14-like and Nichols-like, although the Nichols-like cluster revealed greater variability. Moreover, Pinto et al. [8] described three clades: SS14-like (clade I), Nichols-like (clade II), and clade III (represented by only a single genome of the TPA Sea 81-4 strain [13]). It remains to be clarified whether this putative clade III will be supported by additional strains in the future. However, the Sea 81-4 strain shares the same phylogenetic cluster as the Nichols-like TPA strains (Fig. 1). Until now, no genetic intermediates having a mosaic structure of Nichols-like and SS14-like nucleotide sequences within TPA strains have been identified.
Sun et al. [9] described a new Lineage 2 of TPA based on a phylogenetic analysis of several genomic SNVs and sequences of tpr genes, presenting the Chinese strains as SS14-like strains containing recombined sequences originating from Nichols-like strains. This mosaic structure of Lineage 2 was characterized by the insertion of Lineage 1-derived genes (in particular tprC, D, G, and J genes) into the Lineage 3-derived genomic backbone.
There are, however, several issues with the analyses presented by Sun et al. [9]. Sun et al. reported 99.99% genome coverage for all samples using the TPA Nichols reference genome (original Table 1 in Sun et al. [9]). However, TPA genomes contain several repetitive regions (representing ~ 1% of the genome length) which cannot be covered by uniquely mapped reads. These regions comprise mainly tpr genes and RNA operons. Long-read sequencing, such as Pacific Biosciences, Oxford Nanopore or even Roche/454, could help to sequence repeat-containing and paralogous regions. However, this sequencing was not performed by Sun et al. [9]. The Bowtie2 settings used by the authors, without proper post-processing, caused treponemal reads to be mapped to wrong genomic locations and host genome (rabbit) reads to be mapped to the reference genome. The read mapping stringency used, together with the use of an inappropriate reference sequence (TPA Nichols), resulted in false chimeric sequences, designated as “Lineage 2”. Inappropriate consensus assembly conditions, combined with the absence of de novo assembly, also led to the tprD2 allele-specific sequences present in the raw data being overlooked and filtered out during assembly. Unlike the SS14 genome, which contains the tprC and tprD2 alleles, the Nichols reference genome harbors identical alleles in the tprC and tprD genes.
Sun et al. ([9], original Figure 4A) used alignment settings and phylogenetic trees of tprC/D loci to draw evolutionary inferences. The use of tpr genes alone to disentangle evolutionary relationships is problematic since these genes are likely subject to intra-strain genomic recombination events [3, 14–16] as well as selection pressures, which may confound phylogenetic analyses.
To date, all clinical isolates typed using the tp0136 and tp0548 genes, routinely used in the sequencing-based molecular typing scheme of syphilis-causing strains, grouped with either the Nichols-like or the SS14-like TPA group [11, 12, 17, 18]. Moreover, the more widely used enhanced CDC typing scheme, which sequences an 83 nt-long fragment of the tp0548 gene [19], showed that 94.5% of 1974 characterized clinical isolates from different countries belong to the SS14-like group [17], which is consistent with the findings related to a recent spread of an epidemic cluster [7]. For all the Chinese strains, the tp0136 and tp0548 genes together with 11 other loci (differentiating Nichols and SS14 reference strains at three or more nucleotide positions) were shown to be identical or nearly identical to the SS14-like TPA strains.
Our reanalysis was based on all whole genomic SNVs rather than a subset of several genomic SNVs and highly variable but phylogenetically poorly informative genes (i.e. tpr genes) that confounded the Sun et al. [9] analysis and resulted in misleading conclusions. Based on the whole genome SNV reanalysis, only two separate clusters were identified: one consisted of Nichols-like TPA strains, the other was formed by the SS14-like TPA strains, including all the Chinese strains. Our data clearly showed that all Chinese strains clustered within SS14-like TPA strains.
Limitations
Only available SRA data deposited in the NCBI SRA database (Project Number PRJNA305961) were reanalysed.
Only ~ 99% genome coverage can be achieved for all Chinese TPA strains because several repetitive regions cannot be covered by uniquely mapped reads.
Paralogous and repetitive genomic regions comprising tpr genes were excluded during the processing of the sequencing data; therefore, the mosaic structure identified in tpr genes by Sun et al. [9] cannot be proved.
Additional files
Authors’ contributions
Conceptualization: MS, JO, NA, KN, FGC, DS. Formal analysis: MS, JO, NA, KN, FGC. Funding acquisition: MS, LM, DS. Investigation: MS, JO, LM, NA, KN, FGC, DS. Writing—original draft: MS, JO, DS. Writing—review and editing: MS, JO, LM, NA, KN, FGC, DS. All authors read and approved the final manuscript.
Acknowledgements
Access to computing and storage facilities, owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the program “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated. We acknowledge the CF New Generation Sequencing Bioinformatics supported by the CIISB research infrastructure (LM2015043 funded by MEYS CR) for their support with obtaining scientific data presented in this paper. We also thank Thomas Secrest (Secrest Editing, Ltd.) for his assistance with the English revision of the manuscript.
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
Sequencing data for eight Chinese TPA strains (SHC-0, SHD-R, SHE-V, SHG-I2, B3, C3, K3, and Q3) [9] analyzed during the current study are available in the NCBI SRA database, Project Number PRJNA305961 (https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA305961). All data generated during this study are included in this article and its Additional information files. The datasets designated as “not shown” are available from the corresponding author on request.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
This work was supported by grants from the Grant Agency of the Czech Republic to DS and MS (GA17-25455S, GJ17-25589Y), by a grant from the Czech Health Research Council to DS (17-31333A), and by funds from the Faculty of Medicine of the Masaryk University to junior researchers LM and MS.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Abbreviations
- TPA
Treponema pallidum subsp. pallidum
- TPE
Treponema pallidum subsp. pertenue
- TEN
Treponema pallidum subsp. endemicum
- SRA
sequence read archive
- SNV
single nucleotide variant
- IGR
intergenic region
Footnotes
Michal Strouhal and Jan Oppelt contributed equally to this work
Contributor Information
Michal Strouhal, Email: mstrouhal@med.muni.cz.
Jan Oppelt, Email: jan.oppelt@mail.muni.cz.
Lenka Mikalová, Email: lmikal@med.muni.cz.
Natasha Arora, Email: natasha.arora@uzh.ch.
Kay Nieselt, Email: kay.nieselt@uni-tuebingen.de.
Fernando González-Candelas, Email: fernando.gonzalez@uv.es.
David Šmajs, Phone: +420 549 497 496, Email: dsmajs@med.muni.cz.
References
- 1. Šmajs D, Norris SJ, Weinstock GM. Genetic diversity in Treponema pallidum: implications for pathogenesis, evolution and molecular diagnostics of syphilis and yaws. Infect Genet Evol. 2012;12:191–202. doi: 10.1016/j.meegid.2011.12.001.
- 2. Mikalová L, Strouhal M, Čejková D, Zobaníková M, Pospíšilová P, Norris SJ, et al. Genome analysis of Treponema pallidum subsp. pallidum and subsp. pertenue strains: most of the genetic differences are localized in six regions. PLoS ONE. 2010;5:e15713. doi: 10.1371/journal.pone.0015713.
- 3. Gray RR, Mulligan CJ, Molini BJ, Sun ES, Giacani L, Godornes C, et al. Molecular evolution of the tprC, D, I, K, G and J genes in the pathogenic genus Treponema. Mol Biol Evol. 2006;23:2220–2233. doi: 10.1093/molbev/msl092.
- 4. Harper KN, Ocampo PS, Steiner BM, George RW, Silverman MS, Bolotin S, et al. On the origin of treponematoses: a phylogenetic approach. PLoS Negl Trop Dis. 2008;2:e148. doi: 10.1371/journal.pntd.0000148.
- 5. Nechvátal L, Pětrošová H, Grillová L, Pospíšilová P, Mikalová L, Strnadel R, et al. Syphilis-causing strains belong to separate SS14-like or Nichols-like groups as defined by multilocus analysis of 19 Treponema pallidum strains. Int J Med Microbiol. 2014;304:645–653. doi: 10.1016/j.ijmm.2014.04.007.
- 6. Pětrošová H, Pospíšilová P, Strouhal M, Čejková D, Zobaníková M, Mikalová L, et al. Resequencing of Treponema pallidum ssp. pallidum strains Nichols and SS14: correction of sequencing errors resulted in increased separation of syphilis treponeme subclusters. PLoS ONE. 2013;8:e74319. doi: 10.1371/journal.pone.0074319.
- 7. Arora N, Schuenemann VJ, Jäger G, Peltzer A, Seitz A, Herbig A, et al. Origin of modern syphilis and emergence of a contemporary pandemic cluster. Nat Microbiol. 2016;2:16245. doi: 10.1038/nmicrobiol.2016.245.
- 8. Pinto M, Borges V, Antelo M, Pinheiro M, Nunes A, Azevedo J, et al. Genome-scale analysis of the non-cultivable Treponema pallidum reveals extensive within-patient genetic variation. Nat Microbiol. 2016;2:16190. doi: 10.1038/nmicrobiol.2016.190.
- 9. Sun J, Meng Z, Wu K, Liu B, Zhang S, Liu Y, et al. Tracing the origin of Treponema pallidum in China using next-generation sequencing. Oncotarget. 2016. doi: 10.18632/oncotarget.10154.
- 10. Centurion-Lara A, Giacani L, Godornes C, Molini BJ, Brinck Reid T, Lukehart SA. Fine analysis of genetic diversity of the tpr gene family among treponemal species, subspecies and strains. PLoS Negl Trop Dis. 2013;7:e2222. doi: 10.1371/journal.pntd.0002222.
- 11. Flasarová M, Pospíšilová P, Mikalová L, Vališová Z, Dastychová E, Strnadel R, et al. Sequencing-based molecular typing of Treponema pallidum strains in the Czech Republic: all identified genotypes are related to the sequence of the SS14 strain. Acta Derm Venereol. 2012;92:669–674. doi: 10.2340/00015555-1335.
- 12. Grillová L, Pětrošová H, Mikalová L, Strnadel R, Dastychová E, Kuklová I, et al. Molecular typing of Treponema pallidum in the Czech Republic during 2011 to 2013: increased prevalence of identified genotypes and of isolates with macrolide resistance. J Clin Microbiol. 2014;52:3693–3700. doi: 10.1128/JCM.01292-14.
- 13. Giacani L, Iverson-Cabral SL, King JC, Molini BJ, Lukehart SA, Centurion-Lara A. Complete genome sequence of the Treponema pallidum subsp. pallidum Sea81-4 strain. Genome Announc. 2014;2:e00333-14. doi: 10.1128/genomeA.00333-14.
- 14. Centurion-Lara A, LaFond RE, Hevner K, Godornes C, Molini BJ, Van Voorhis WC, et al. Gene conversion: a mechanism for generation of heterogeneity in the tprK gene of Treponema pallidum during infection. Mol Microbiol. 2004;52:1579–1596. doi: 10.1111/j.1365-2958.2004.04086.x.
- 15. Giacani L, Molini BJ, Kim EY, Godornes BC, Leader BT, Tantalo LC, et al. Antigenic variation in Treponema pallidum: TprK sequence diversity accumulates in response to immune pressure during experimental syphilis. J Immunol. 2010;184:3822–3829. doi: 10.4049/jimmunol.0902788.
- 16. Giacani L, Brandt SL, Puray-Chavez M, Reid TB, Godornes C, Molini BJ, et al. Comparative investigation of the genomic regions involved in antigenic variation of the TprK antigen among treponemal species, subspecies, and strains. J Bacteriol. 2012;194:4208–4225. doi: 10.1128/JB.00863-12.
- 17. Gallo Vaulet L, Grillová L, Mikalová L, Casco R, Rodríguez Fermepin M, Pando MA, et al. Molecular typing of Treponema pallidum isolates from Buenos Aires, Argentina: frequent Nichols-like isolates and low levels of macrolide resistance. PLoS ONE. 2017;12:e0172905. doi: 10.1371/journal.pone.0172905.
- 18. Mikalová L, Grillová L, Osbak K, Strouhal M, Kenyon C, Crucitti T, et al. Molecular typing of syphilis-causing strains among human immunodeficiency virus-positive patients in Antwerp, Belgium. Sex Transm Dis. 2017;44(6):376–379. doi: 10.1097/OLQ.0000000000000600.
- 19. Marra C, Sahi S, Tantalo L, Godornes C, Reid T, Behets F, et al. Enhanced molecular typing of Treponema pallidum: geographical distribution of strain types and association with neurosyphilis. J Infect Dis. 2010;202:1380–1388. doi: 10.1086/656533.