Abstract
We sequenced the genome of Mycobacterium tuberculosis strain MT11, which carries a 16S rRNA gene mutation found in 6% of French Polynesian M. tuberculosis isolates. The genome comprises a 4,110,293-bp chromosome with a G+C content of 65.15%, encodes 3,949 predicted proteins, and contains 85 predicted RNA genes. As in modern M. tuberculosis strains, the TbD1 region is absent in strain MT11.
GENOME ANNOUNCEMENT
Pulmonary tuberculosis remains one of the deadliest infectious diseases (1). Its causative agent, Mycobacterium tuberculosis, is a genetically monomorphic species, owing to its low DNA diversity (2) and clonal evolution (3), yet it shows host adaptation, with specific clones characteristic of some geographic areas (4–6). Among 34 M. tuberculosis isolates obtained from patients in French Polynesia (7), we recently observed two isolates, MT11 and MT14, whose peptide spectra clustered together by matrix-assisted laser desorption ionization–time of flight (MALDI-TOF) mass spectrometry. Notably, the 16S rRNA gene sequences of both isolates shared 99% sequence similarity with that of M. tuberculosis H37Rv (GenBank accession no. AL123456) and carried a unique mutation at position 1247, which differed from all 16S rRNA gene sequences previously reported for other members of the M. tuberculosis complex. The partial rpoB gene sequences of MT11 and MT14 showed 100% sequence similarity with that of M. tuberculosis H37Rv (GenBank accession no. AL123456) (8). Genotyping at 24 mycobacterial interspersed repetitive-unit–variable-number tandem-repeat (MIRU-VNTR) loci (9) and spoligotyping (10) assigned both isolates to the Haarlem lineage (O. D. Aboubaker, M. Phelippeau, M. Drancourt, and D. Musso, unpublished data). We reasoned that analyzing the whole-genome sequence of M. tuberculosis MT11 would help define phylogenetic relationships within the M. tuberculosis complex and support the design of tools for its improved detection and identification.
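The 16S rRNA comparison above is summarized as a percent similarity plus a single variable position. As a minimal sketch of how such figures can be derived, assuming the two sequences have already been aligned to equal length (the alignment tool and position-numbering scheme are not stated in the text), the hypothetical helper `compare_aligned` computes percent identity and lists the mismatched columns. Its positions are 1-based alignment columns, which need not coincide with the reference numbering behind "position 1247".

```python
from typing import List, Tuple


def compare_aligned(ref: str, query: str) -> Tuple[float, List[Tuple[int, str, str]]]:
    """Compare two pre-aligned sequences of equal length.

    Returns the percent identity and a list of (position, ref_base, query_base)
    for every mismatched column. Positions are 1-based alignment columns;
    gap characters ('-') are compared like any other symbol.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    if not ref:
        raise ValueError("sequences must not be empty")
    # Case-insensitive, column-by-column comparison.
    mismatches = [
        (i + 1, r, q)
        for i, (r, q) in enumerate(zip(ref.upper(), query.upper()))
        if r != q
    ]
    identity = 100.0 * (len(ref) - len(mismatches)) / len(ref)
    return identity, mismatches


# Toy example with invented sequences (not MT11 data).
identity, mismatches = compare_aligned("ACGTACGTAC", "ACGTTCGTAC")
print(f"{identity:.1f}% identity, mismatches: {mismatches}")
# prints: 90.0% identity, mismatches: [(5, 'A', 'T')]
```

For a full-length 16S rRNA gene of roughly 1,500 bp, a single substitution already yields identity just under 100%, which is why such comparisons are usually reported together with the specific variable positions.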
Chromosomal DNA was isolated as previously described (11) and sequenced with MiSeq technology (Illumina, Inc., San Diego, CA, USA) over four runs, using two barcoded mate-pair libraries with insert sizes of 10 kb and 3.3 kb and 2 × 250-bp reads for each library. All reads were trimmed using Trimmomatic (12) and assembled with SPAdes (13, 14). Contigs were scaffolded using SSPACE (15) and Opera (16), with GapFiller (17) used to close gaps, and in-house Python tools were used to refine the assembly.
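The text does not report the trimming parameters. To illustrate what quality trimming does, the sketch below implements a simplified sliding-window trim in the spirit of Trimmomatic's SLIDINGWINDOW step; it is not Trimmomatic's exact algorithm, and the window of 4 bases and mean threshold of Q15 are illustrative assumptions. The function names `phred_scores` and `sliding_window_trim` are hypothetical.

```python
from typing import List, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 encoding by default)."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(
    seq: str, qual: str, window: int = 4, min_mean: float = 15.0
) -> Tuple[str, str]:
    """Simplified sliding-window 3' trimming.

    Scans windows of `window` bases from the 5' end and cuts the read at the
    start of the first window whose mean Phred score is below `min_mean`.
    Reads shorter than one window are returned unchanged.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < min_mean:
            return seq[:start], qual[:start]
    return seq, qual


# Toy read: seven Q40 bases ('I') followed by three Q2 bases ('#').
trimmed_seq, trimmed_qual = sliding_window_trim("ACGTACGTAC", "IIIIIII###")
print(trimmed_seq, trimmed_qual)  # prints: ACGTAC IIIIII
```

Cutting at the start of the failing window is deliberately conservative: it can discard a few good bases just before the quality drop, which real trimmers refine further.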
The draft genome of M. tuberculosis MT11 consists of 16 gap-free contigs totaling 4,110,293 bp, making it the second smallest genome among M. tuberculosis strains, with a G+C content of 65.15%. Noncoding genes and miscellaneous features were predicted using RNAmmer (18), Aragorn (19), Rfam (20), Pfam (21), and Infernal (22). Coding DNA sequences (CDSs) were predicted using Prodigal (23), and functional annotation was performed using BLAST+ (24) and HMMER3 (25) against the UniProtKB database (26). The genome contains at least 85 predicted RNA genes: 3 rRNAs, 51 tRNAs, 1 transfer-messenger RNA (tmRNA), and 30 miscellaneous RNAs. A total of 3,949 protein-coding genes were identified, representing a coding capacity of 3,682,833 bp (89.6% coding density). Of these, 281 (7.12%) encode putative proteins and 542 (13.72%) encode hypothetical proteins. In addition, 2,776 genes matched at least one sequence in the Clusters of Orthologous Groups (COG) database (27, 28) using BLASTp with default parameters.
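The summary statistics in this paragraph are simple ratios. The sketch below recomputes them from the counts given in the text, assuming that the gene percentages use the 3,949 predicted genes as the denominator and that coding density is coding bases divided by total assembly length; under those assumptions it reproduces the reported values. The `gc_content` helper is a generic, hypothetical illustration of how a G+C percentage is obtained, not the authors' tool; reproducing the 65.15% figure would require the deposited contigs.

```python
def gc_content(seq: str) -> float:
    """Percent G+C over unambiguous bases (A/C/G/T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


# Counts taken from the text.
GENOME_BP = 4_110_293
CODING_BP = 3_682_833
TOTAL_GENES = 3_949
PUTATIVE = 281
HYPOTHETICAL = 542
RNA_COUNTS = {"rRNA": 3, "tRNA": 51, "tmRNA": 1, "misc_RNA": 30}

coding_density = 100.0 * CODING_BP / GENOME_BP
putative_pct = 100.0 * PUTATIVE / TOTAL_GENES
hypothetical_pct = 100.0 * HYPOTHETICAL / TOTAL_GENES
total_rna = sum(RNA_COUNTS.values())

print(f"coding density: {coding_density:.1f}%")  # coding density: 89.6%
print(f"putative: {putative_pct:.2f}%")  # putative: 7.12%
print(f"hypothetical: {hypothetical_pct:.2f}%")  # hypothetical: 13.72%
print(f"predicted RNAs: {total_rna}")  # predicted RNAs: 85
```

The RNA tally confirms that the four listed categories account for all 85 predicted RNA genes.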
Nucleotide sequence accession numbers.
The annotated genome sequence of M. tuberculosis strain MT11 has been deposited at EMBL under accession numbers CVMX01000001 to CVMX01000016.
ACKNOWLEDGMENT
This study was financially supported by URMITE, IHU Méditerranée Infection, Marseille, France.
Footnotes
Citation Aboubaker Osman D, Phelippeau M, Musso D, Robert C, Michelle C, Croce O, Drancourt M. 2015. Draft genome sequence of Mycobacterium tuberculosis strain MT11, which represents a new lineage. Genome Announc 3(3):e00573-15. doi:10.1128/genomeA.00573-15.
REFERENCES
- 1. World Health Organization (WHO). 2014. Global tuberculosis report 2014. World Health Organization, Geneva, Switzerland. http://www.who.int/tb/publications/global_report/en/.
- 2. Achtman M. 2008. Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens. Annu Rev Microbiol 62:53–70. doi:10.1146/annurev.micro.62.081307.162832.
- 3. Warren RM, Richardson M, Sampson SL, van der Spuy GD, Bourn W, Hauman JH, Heersma H, Hide W, Beyers N, van Helden PD. 2001. Molecular evolution of Mycobacterium tuberculosis: phylogenetic reconstruction of clonal expansion. Tuberculosis (Edinb) 81:291–302. doi:10.1054/tube.2001.0300.
- 4. Gagneux S, DeRiemer K, Van T, Kato-Maeda M, de Jong BC, Narayanan S, Nicol M, Niemann S, Kremer K, Gutierrez MC, Hilty M, Hopewell PC, Small PM. 2006. Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc Natl Acad Sci U S A 103:2869–2873. doi:10.1073/pnas.0511240103.
- 5. Hirsh AE, Tsolaki AG, DeRiemer K, Feldman MW, Small PM. 2004. Stable association between strains of Mycobacterium tuberculosis and their human host populations. Proc Natl Acad Sci U S A 101:4871–4876. doi:10.1073/pnas.0305627101.
- 6. Firdessa R, Berg S, Hailu E, Schelling E, Gumi B, Erenso G, Gadisa E, Kiros T, Habtamu M, Hussein J, Zinsstag J, Robertson BD, Ameni G, Lohan AJ, Loftus B, Comas I, Gagneux S, Tschopp R, Yamuah L, Hewinson G, Gordon SV, Young DB, Aseffa A. 2013. Mycobacterial lineages causing pulmonary and extrapulmonary tuberculosis, Ethiopia. Emerg Infect Dis 19:460–463. doi:10.3201/eid1903.120256.
- 7. El Khéchine A, Couderc C, Flaudrops C, Raoult D, Drancourt M. 2011. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry identification of mycobacteria in routine clinical practice. PLoS One 6:e24720. doi:10.1371/journal.pone.0024720.
- 8. Adékambi T, Colson P, Drancourt M. 2003. rpoB-based identification of nonpigmented and late-pigmenting rapidly growing mycobacteria. J Clin Microbiol 41:5699–5708. doi:10.1128/JCM.41.12.5699-5708.2003.
- 9. Supply P, Allix C, Lesjean S, Cardoso-Oelemann M, Rüsch-Gerdes S, Willery E, Savine E, de Haas P, van Deutekom H, Roring S, Bifani P, Kurepina N, Kreiswirth B, Sola C, Rastogi N, Vatin V, Gutierrez MC, Fauville M, Niemann S, Skuce R. 2006. Proposal for standardization of optimized mycobacterial interspersed repetitive unit-variable-number tandem repeat typing of Mycobacterium tuberculosis. J Clin Microbiol 44:4498–4510. doi:10.1128/JCM.01392-06.
- 10. Kamerbeek J, Schouls L, Kolk A, van Agterveld M, van Soolingen D, Kuijper S, Bunschoten A, Molhuizen H, Shaw R, Goyal M, van Embden J. 1997. Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol 35:907–914.
- 11. Van Soolingen D, Hermans PW, de Haas PE, Soll DR, van Embden JD. 1991. Occurrence and stability of insertion sequences in Mycobacterium tuberculosis complex strains: evaluation of an insertion sequence-dependent DNA polymorphism as a tool in the epidemiology of tuberculosis. J Clin Microbiol 29:2578–2586.
- 12. Lohse M, Bolger AM, Nagel A, Fernie AR, Lunn JE, Stitt M, Usadel B. 2012. RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res 40:W622–W627. doi:10.1093/nar/gks540.
- 13. Nurk S, Bankevich A, Antipov D, Gurevich AA, Korobeynikov A, Lapidus A, Prjibelski AD, Pyshkin A, Sirotkin A, Sirotkin Y, Stepanauskas R, Clingenpeel SR, Woyke T, McLean JS, Lasken R, Tesler G, Alekseyev MA, Pevzner PA. 2013. Assembling single-cell genomes and mini-metagenomes from chimeric MDA products. J Comput Biol 20:714–737. doi:10.1089/cmb.2013.0084.
- 14. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19:455–477. doi:10.1089/cmb.2012.0021.
- 15. Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W. 2011. Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578–579. doi:10.1093/bioinformatics/btq683.
- 16. Gao S, Sung WK, Nagarajan N. 2011. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. J Comput Biol 18:1681–1691. doi:10.1089/cmb.2011.0170.
- 17. Boetzer M, Pirovano W. 2012. Toward almost closed genomes with GapFiller. Genome Biol 13:R56. doi:10.1186/gb-2012-13-6-r56.
- 18. Lagesen K, Hallin P, Rødland EA, Staerfeldt HH, Rognes T, Ussery DW. 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 35:3100–3108. doi:10.1093/nar/gkm160.
- 19. Laslett D, Canback B. 2004. Aragorn, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res 32:11–16. doi:10.1093/nar/gkh152.
- 20. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. 2003. Rfam: an RNA family database. Nucleic Acids Res 31:439–441. doi:10.1093/nar/gkg006.
- 21. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD. 2012. The Pfam protein families database. Nucleic Acids Res 40:D290–D301. doi:10.1093/nar/gkr1065.
- 22. Nawrocki EP, Kolbe DL, Eddy SR. 2009. Infernal 1.0: inference of RNA alignments. Bioinformatics 25:1335–1337. doi:10.1093/bioinformatics/btp157.
- 23. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119. doi:10.1186/1471-2105-11-119.
- 24. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: architecture and applications. BMC Bioinformatics 10:421. doi:10.1186/1471-2105-10-421.
- 25. Eddy SR. 2011. Accelerated profile HMM searches. PLoS Comput Biol 7:e1002195. doi:10.1371/journal.pcbi.1002195.
- 26. UniProt Consortium. 2011. Ongoing and future developments at the universal protein resource. Nucleic Acids Res 39:D214–D219. doi:10.1093/nar/gkq1020.
- 27. Tatusov RL, Galperin MY, Natale DA, Koonin EV. 2000. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36. doi:10.1093/nar/28.1.33.
- 28. Tatusov RL, Koonin EV, Lipman DJ. 1997. A genomic perspective on protein families. Science 278:631–637. doi:10.1126/science.278.5338.631.
