Abstract
We present a short summary of recent observations on the global distribution of the major clades of the Mycobacterium tuberculosis complex, the causative agent of tuberculosis. This global distribution was defined by data-mining of an international spoligotyping database, SpolDB3. This database contains 11,708 patterns from as many clinical isolates originating from more than 90 countries. The 11,708 spoligotypes were clustered into 813 shared types. A total of 1,300 orphan patterns (clinical isolates showing a unique spoligotype) were also detected.
Keywords: Mycobacterium tuberculosis, spoligotyping
Since the publication of the second version of our spoligotype database on Mycobacterium tuberculosis (1), the causative agent of tuberculosis (TB), the proportion of clustered isolates (shared types [STs]) has increased from 84% (2,779/3,319) to 90% (11,708/13,008). Fifty percent of the clustered isolates were found in only 20 STs. Three of these STs are M. bovis, including M. bovis BCG (ST 481, 482, and 683). Adding the next 30 most frequent STs raised the proportion of clustered isolates covered from 50% to 65%.
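The clustering arithmetic in this paragraph can be sketched in a few lines of Python. The sketch assumes, as the text implies, that a shared type is any spoligotype found in two or more isolates and that an orphan is a pattern seen exactly once. The function name `cluster_spoligotypes` and the toy isolate list are illustrative only and are not part of SpolDB3.

```python
from collections import Counter


def cluster_spoligotypes(patterns):
    """Split patterns into shared types (seen >= 2 times) and orphans (seen once).

    Returns (shared, orphans, proportion), where proportion is the fraction
    of isolates that fall into a shared type.
    """
    counts = Counter(patterns)
    shared = {p: n for p, n in counts.items() if n >= 2}
    orphans = [p for p, n in counts.items() if n == 1]
    clustered_isolates = sum(shared.values())
    proportion = clustered_isolates / len(patterns) if patterns else 0.0
    return shared, orphans, proportion


# Hypothetical toy sample: six isolates described by octal codes from the Table.
isolates = (
    ["000000000003771"] * 3    # Beijing prototype, three isolates
    + ["777777777760771"] * 2  # ST 53, two isolates
    + ["777777777720771"]      # ST 50, a single isolate (orphan in this toy set)
)
shared, orphans, proportion = cluster_spoligotypes(isolates)
print(len(shared), len(orphans), f"{proportion:.0%}")  # 2 1 83%

# The proportion reported in the text: 11,708 clustered isolates plus 1,300 orphans.
reported = 11_708 / (11_708 + 1_300)
print(f"{reported:.1%}")  # 90.0%
```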
A total of 36 potential subfamilies or subclades of M. tuberculosis complex have been tentatively identified, leading to the definition of major and minor visual recognition rules (Table). The ancestral East-African Indian family (EAI) is made up of at least five main subclades, whereas at least three major spoligotyping patterns are found within the Haarlem family (2). Two families found in central and Middle Eastern Asia (CAS1 and CAS2) are newly defined. The X family (3) is also currently split into at least three well-defined subclades. However, the subdivision of family T (T1–T4, likely to represent relatively old genotypes), which differs from the classic ST 53 (all spacers present except 33–36), remains poorly defined. Similarly, the Latino-American and Mediterranean family (LAM) is tentatively split into subclades LAM1–LAM10 (4). Spoligotyping used alone is not well suited for studying the phylogeny of these two clades (T and LAM). Such study will require results from other genotyping methods such as IS6110-restriction fragment length polymorphism (5) or mycobacterial interspersed repetitive units–variable number of DNA tandem repeats (6). Among well-characterized major clades of tubercle bacilli, four families represent 35% of 11,708 clustered isolates (Beijing 11%, LAM 9.3%, Haarlem 7.5%, and the X clade 7%).
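As an illustration of how visual recognition rules of this kind could be applied programmatically, the following Python sketch checks a 43-spacer pattern against rules A–F as listed in the Table footnote. It is a minimal sketch under stated assumptions, not the procedure used to build SpolDB3: the '1'/'0' string encoding, the `RULES` dictionary, and the helper names are hypothetical, and rule E is read as absence of spacer 31 and spacers 33–36, consistent with the Haarlem3 prototype.

```python
N_SPACERS = 43

# Rules A-F from the Table footnote; spacer numbers are 1-based.
RULES = {
    "A": {"absent": [29, 30, 31, 32, 34], "present": [33]},
    "B": {"absent": [21, 22, 23, 24, 33, 34, 35, 36], "present": []},
    "C": {"absent": [18, 33, 34, 35, 36], "present": []},
    "D": {"absent": [39, 40, 41, 42, 43], "present": []},
    "E": {"absent": [31, 33, 34, 35, 36], "present": []},
    "F": {"absent": [33, 34, 35, 36], "present": []},
}


def pattern_with_deletions(*ranges):
    """Build a '1'/'0' string with the given inclusive spacer ranges deleted."""
    bits = ["1"] * N_SPACERS
    for first, last in ranges:
        for spacer in range(first, last + 1):
            bits[spacer - 1] = "0"
    return "".join(bits)


def matching_rules(pattern):
    """Return the names of every rule whose spacer conditions the pattern meets."""
    if len(pattern) != N_SPACERS or set(pattern) - {"0", "1"}:
        raise ValueError("expected a 43-character string of '0' and '1'")
    hits = []
    for name, rule in RULES.items():
        absent_ok = all(pattern[s - 1] == "0" for s in rule["absent"])
        present_ok = all(pattern[s - 1] == "1" for s in rule["present"])
        if absent_ok and present_ok:
            hits.append(name)
    return hits


st53 = pattern_with_deletions((33, 36))              # T1 prototype
st50 = pattern_with_deletions((31, 31), (33, 36))    # Haarlem3 prototype
st236 = pattern_with_deletions((29, 32), (34, 34))   # EAI5 prototype
print(matching_rules(st53))   # ['F']
print(matching_rules(st50))   # ['E', 'F']
print(matching_rules(st236))  # ['A']
```

Because rules B, C, and E each include the deletion of spacers 33–36, any pattern that satisfies one of them also satisfies rule F, so the most specific matching rule is the informative one.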
Table. Excerpt from SpolDB3 database showing prototype spoligotypes, visual recognition rules, and binary and octal description^a
Rk | ST | Class^b | Total (n)^c | Rules^d | Binary description | Octal |
---|---|---|---|---|---|---|
1 | 1 | Beijing | 1282 | ∆1–34 | οοοοοοοοοοοοοοοοοοοοοοοοοοοοοοοοοοννννννννν | 000000000003771 |
2 | 53 | T1 | 864 | F | ννννννννννννννννννννννννννννννννοοοοννννννν | 777777777760771 |
11 | 52 | T2 | 163 | ∆40 and F | ννννννννννννννννννννννννννννννννοοοονννοννν | 777777777760731 |
30 | 37 | T3 | 71 | ∆13 and F | ννννννννννννονννννννννννννννννννοοοοννννννν | 777737777760771 |
64 | 40 | T4 | 26 | ∆19 and F | ννννννννννννννννννονννννννννννννοοοοννννννν | 777777377760771 |
7 | 47 | Haarlem1 | 246 | ∆26–30 and E | νννννννννννννννννννννννννοοοοοονοοοοννννννν | 777777774020771 |
20 | 2 | Haarlem2 | 104 | ∆1–24, ∆26–30 and E | οοοοοοοοοοοοοοοοοοοοοοοονοοοοοονοοοοννννννν | 000000004020771 |
3 | 50 | Haarlem3 | 519 | E | ννννννννννννννννννννννννννννννονοοοοννννννν | 777777777720771 |
6 | 119 | X1 | 310 | C | νννννννννννννννννοννννννννννννννοοοοννννννν | 777776777760771 |
4 | 137 | X2 | 427 | C and ∆39–42 | νννννννννννννννννοννννννννννννννοοοοννοοοον | 777776777760601 |
31 | 92 | X3 | 70 | ∆4–12 and C | νννοοοοοοοοονννννοννννννννννννννοοοοννννννν | 700076777760771 |
15 | 48 | EAI1 | 118 | A and ∆40 | ννννννννννννννννννννννννννννοοοονονννννοννν | 777777777413731 |
13 | 19 | EAI2 | 130 | ∆3, ∆20–21 and A | ννοννννννννννννννννοονννννννοοοονοννννννννν | 677777477413771 |
16 | 11 | EAI3 | 121 | ∆2-3, A and ∆37–39 | νοονννννννννννννννννννννννννοοοονοννοοονννν | 477777777413071 |
8 | 139 | EAI4 | 234 | ∆26–27 and A | νννννννννννννννννννννννννοονοοοονοννννννννν | 777777774413771 |
46 | 236 | EAI5 | 41 | A | ννννννννννννννννννννννννννννοοοονοννννννννν | 777777777413771 |
24 | 181 | Afri1 | 91 | ∆7–9 and ∆39 | ννννννοοονννννννννννννννννννννννννννννονννν | 770777777777671 |
ND | 331 | Afri2 | 9 | ∆8–12, ∆21–24 and ∆37–39 | νννννννοοοοοννννννννοοοοννννννννννννοοονννν | 774077607777071 |
ND | 438 | Afri3 | 3 | ∆8–12 and ∆37–39 | νννννννοοοοοννννννννννννννννννννννννοοονννν | 774077777777071 |
17 | 482 | M. bovis-BCG | 26 | ∆3, ∆9, ∆16 and D | ννονννννοννννννοννννννννννννννννννννννοοοοο | 676773777777600 |
ND | 641 | M. microti | 8 | 4-7, 23–24, 37–38 | οοοννννοοοοοοοοοοοοοοοννοοοοοοοοοοοοννοοοοο | 074000030000600 |
ND | 592 | M. canetti | 6 | 30 and 36 | οοοοοοοοοοοοοοοοοοοοοοοοοοοοονοοοοονοοοοοοο | 000000000101000 |
21 | 26 | CAS1 | 102 | ∆4–7, ∆23–34 | νννοοοονννννννννννννννοοοοοοοοοοοοννννννννν | 703777740003771 |
ND | 288 | CAS2 | 6 | ∆4–10, ∆23–34 | νννοοοοοοοννννννννννννοοοοοοοοοοοοννννννννν | 700377740003771 |
12 | 20 | LAM1 | 152 | ∆3 and B | ννονννννννννννννννννοοοοννννννννοοοοννννννν | 677777607760771 |
22 | 17 | LAM2 | 92 | ∆3, ∆13 and B | ννονννννννννονννννννοοοοννννννννοοοοννννννν | 677737607760771 |
19 | 33 | LAM3 | 108 | ∆9–11 and B | ννννννννοοονννννννννοοοοννννννννοοοοννννννν | 776177607760771 |
49 | 60 | LAM4 | 37 | ∆40 and B | ννννννννννννννννννννοοοοννννννννοοοονννοννν | 777777607760731 |
42 | 93 | LAM5 | 44 | ∆13 and B | ννννννννννννονννννννοοοοννννννννοοοοννννννν | 777737607760771 |
37 | 64 | LAM6 | 47 | ∆29 and B | ννννννννννννννννννννοοοοννννονννοοοοννννννν | 777777607560771 |
36 | 41 | LAM7 | 48 | ∆20, ∆26-27 and B | νννννννννννννννννννοοοοονοονννννοοοοννννννν | 777777404760771 |
NA | 290 | LAM8 | 9 | ∆27 and B | ννννννννννννννννννννοοοοννονννννοοοοννννννν | 777777606760771 |
5 | 42 | LAM9^e | 344 | B | ννννννννννννννννννννοοοοννννννννοοοοννννννν | 777777607760771 |
9 | 61 | LAM10 | 202 | ∆23–25 and F | ννννννννννννννννννννννοοονννννννοοοοννννννν | 777777743760771 |
26 | 34 | S^f | 82 | ∆9–10 and F | ννννννννοοννννννννννννννννννννννοοοοννννννν | 776377777760771 |
28 | 451 | H37Rv | 78 | ∆20–21 and F | νννννννννννννννννννοονννννννννννοοοοννννννν | 777777477760771 |
^a Rk, ranking no.; ND, not done; ST, arbitrary designation; M., Mycobacterium; ∆, deleted spacers. In the binary description, ν indicates a spacer that is present and ο a spacer that is absent. ^b Class, family definition; see text for the definition of the family acronyms. ^c Total (n), size of the class; binary and octal, description. ^d Rule A, absence of spacers 29–32, presence of spacer 33, and absence of spacer 34; rule B, absence of spacers 21–24 and spacers 33–36; rule C, absence of spacer 18 and spacers 33–36; rule D, absence of spacers 39–43; rule E, absence of spacer 31 and spacers 33–36; rule F, absence of spacers 33–36. Clades defined with low sample size, such as Afri2, Afri3, CAS2, and LAM8, are subject to change. ^e Formerly LAM1. ^f Formerly LAM2.
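The binary and octal columns of the Table are related by a simple grouping: the octal codes are consistent with reading spacers 1–42 as 14 triplets, each written as one octal digit, followed by a single digit (1 or 0) for spacer 43. The Python sketch below illustrates the conversion under that assumption. The function names are hypothetical, and ν and ο are taken to mean spacer present and spacer absent, respectively.

```python
PRESENT, ABSENT = "ν", "ο"  # glyphs used in the Table's binary column


def binary_to_octal(pattern):
    """Convert a 43-character ν/ο pattern to its 15-digit octal code."""
    if len(pattern) != 43 or set(pattern) - {PRESENT, ABSENT}:
        raise ValueError("expected 43 characters, each ν or ο")
    bits = "".join("1" if c == PRESENT else "0" for c in pattern)
    # Spacers 1-42: one octal digit per triplet; spacer 43: a single 0/1 digit.
    digits = [str(int(bits[i:i + 3], 2)) for i in range(0, 42, 3)]
    return "".join(digits) + bits[42]


def octal_to_binary(code):
    """Convert a 15-digit octal code back to a 43-character ν/ο pattern."""
    if len(code) != 15 or code[14] not in "01":
        raise ValueError("expected 14 octal digits followed by 0 or 1")
    # int(d, 8) raises ValueError for any character that is not an octal digit.
    bits = "".join(format(int(d, 8), "03b") for d in code[:14]) + code[14]
    return "".join(PRESENT if b == "1" else ABSENT for b in bits)


beijing = ABSENT * 34 + PRESENT * 9            # ∆1–34
t1 = PRESENT * 32 + ABSENT * 4 + PRESENT * 7   # rule F: spacers 33–36 absent
print(binary_to_octal(beijing))  # 000000000003771
print(binary_to_octal(t1))       # 777777777760771
print(octal_to_binary("777777777760771") == t1)  # True
```

Running every row of the Table through such a round trip is a quick way to catch transcription errors in either column.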
The global distribution of the most frequently observed spoligotypes by continent in SpolDB3 is as follows. Among the patterns originating in North America (n=4,276, 33% of the total number of isolates in the database), 16% of the strains are of the Beijing type, 14% belong to ST 137 or ST 119 (X family), and 8% are unique (results not shown). In Central America (n=587, 4.5%), 8% of the strains belong to the ubiquitous ST 53, 7% are ST 50, and 6% are ST 2; the last two STs are part of the Haarlem family. In South America (n=861, 6.6%), ST 53 and ST 50 account for 10% and 9%, respectively, of the spoligotypes, whereas ST 42 accounts for as much as 9% of the total isolates. The origin of ST 42 remains to be established. In Africa (n=1,432, 11%), ST 59 and ST 53 account for 9% of all isolates studied thus far; however, the values obtained for ST 59 are biased because strains from Zimbabwe are overrepresented. We also observed that M. africanum ST 181 accounts for as much as 6% of all spoligotypes from Africa in our sample.
In Europe (n=4,360, 33.5%), ST 53 represents as much as 9% of the spoligotypes, ST 50 and ST 47 (Haarlem family) represent 8% of the cases, and the Beijing family accounts for 4% of the spoligotypes. In the Middle Eastern and Central Asian region, where the number of samples obtained is still very low (n=351, 2.7%), a high diversity of strains within the EAI and CAS families has been observed, and no single pattern currently exceeds 5%. Further studies of isolates from these regions are needed, e.g., in India, where our sampling is still anecdotal (n=44 isolates). Notwithstanding the scarcity of available data, the observed diversity suggests that this region might be of great interest for further study of the genetic variation of tubercle bacilli. In contrast to the Middle East and Central Asia, the Far East Asian region (n=801, 6.1%) is characterized by the prevalence of a single genotype, the Beijing type family, a family linked to emerging multiresistance (7). One of every two strains in the Far East is a Beijing type. In Oceania (n=340, 2.6%), ST 19 and Beijing account for 15% and 13%, respectively, of clustered isolates. Thus, this preliminary analysis of the spoligotype distribution of SpolDB3 clearly shows major differences in the population structure of tubercle bacilli among the eight subcontinents studied (Africa; Europe; North America; Central America; South America; Middle East and Central Asia; Far East Asia; and Oceania).
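The regional shares quoted above follow directly from the isolate counts. As a minimal sketch, the Python snippet below recomputes each region's percentage of the database from the n values given in the text; the dictionary and function names are illustrative, and the only assumption is that each percentage is taken relative to the sum of the eight regional counts.

```python
# Isolate counts per region, as given in the text.
REGION_COUNTS = {
    "North America": 4276,
    "Central America": 587,
    "South America": 861,
    "Africa": 1432,
    "Europe": 4360,
    "Middle East and Central Asia": 351,
    "Far East Asia": 801,
    "Oceania": 340,
}


def regional_shares(counts):
    """Return each region's share of all isolates, in percent."""
    total = sum(counts.values())
    return {region: 100 * n / total for region, n in counts.items()}


total = sum(REGION_COUNTS.values())
shares = regional_shares(REGION_COUNTS)
print(f"total isolates: {total}")  # total isolates: 13008
# List regions from the largest share to the smallest.
for region, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{region:30s} {pct:5.1f}%")
```

The total of 13,008 matches the denominator used for the proportion of clustered isolates, which gives a quick consistency check on the regional counts.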
At present, SpolDB3 is an experimental tool that has yet to prove its usefulness in tracking epidemics. Nevertheless, the ease with which matches between spoligotypes can be detected suggests that this tool may be a good screening mechanism for population-based studies on recent TB transmission. Indeed, the detection of a rarely found ST in SpolDB3 may prompt researchers to look for the clonality of the isolates and to study their epidemiologic relatedness.
Data-exchange protocols through inter-networking will also be implemented in the near future. Working groups such as the European Network for Exchange of Molecular Typing Information (available from: URL: www.rivm.nl/enemti) are coordinating such initiatives. The expanded use of the Bionumerics software (third upgrade; Applied Maths, St. Martens-Latem, Belgium) may also foster this research field. SpolDB3 will also be instrumental in facilitating better understanding of the driving forces that shape tubercle bacilli evolution. Further research should now emphasize the use of data-mining methods, in combination with expert knowledge, to tackle the complex dynamics of the population genetics of tubercle bacilli and TB transmission (3). Our sample represents the compilation of many national studies and, as such, should be considered an ongoing population-based project aimed at studying global TB genetic diversity. Nevertheless, obtaining a more precise and representative snapshot of the genetic variability of M. tuberculosis complex will require a larger sampling. Although only partially representative of worldwide spoligotypes of M. tuberculosis complex, SpolDB3 contains a reservoir of genetic information that has already proved useful for defining the phylogenetic links that exist within the TB genomes and for constructing theoretical models of genome evolution. Much remains to be done to evaluate the potential of global genetic databases to better characterize casual contacts (that could lead to identification of sporadic cases) in TB epidemiology. An improved version of our database, which will focus on areas with a high prevalence of TB, is currently in development; as of August 26, 2002, it had 20,000 isolates and 3,000 alleles. Ongoing population-based genotyping projects will likely help shed light on the evolutionary history of contemporary and ancient tubercle bacilli.
Acknowledgments
This paper was written as part of the EU Concerted Action project QLK2-CT-2000-00630 and partly supported by the Réseau International des Instituts Pasteur et Instituts Associés, Institut Pasteur and Fondation Française Raoul Follereau, France. An electronic, simplified, version of SpolDB3 is available from the corresponding authors upon request.
Biography
Dr. Filliol performed this work as part of her doctoral thesis. She has been working at the Institut Pasteur de Guadeloupe for the last 4 years. Her research focuses on molecular epidemiology and phylogeny of tubercle bacilli.
Footnotes
Suggested citation for this article: Filliol I, Driscoll JR, van Soolingen D, Kreiswirth BN, Kremer K, Valétudie G, et al. Global distribution of Mycobacterium tuberculosis spoligotypes. Emerg Infect Dis [serial online] 2002 Nov [date cited]. Available from http://www.cdc.gov/ncidod/EID/vol8no11/02-0125.htm
References
1. Sola C, Filliol I, Gutierrez MC, Mokrousov I, Vincent V, Rastogi N. Spoligotype database of Mycobacterium tuberculosis: biogeographical distribution of shared types and epidemiologic and phylogenetic perspectives. Emerg Infect Dis. 2001;7:390–6.
2. Kremer K, van Soolingen D, Frothingham R, Haas WH, Hermans PWM, Martin C, et al. Comparison of methods based on different molecular epidemiological markers for typing of Mycobacterium tuberculosis strains: interlaboratory study of discriminatory power and reproducibility. J Clin Microbiol. 1999;37:2607–18.
3. Sebban M, Mokrousov I, Rastogi N, Sola C. A data-mining approach to spacer oligonucleotide typing of Mycobacterium tuberculosis. Bioinformatics. 2002;18:235–43. doi:10.1093/bioinformatics/18.2.235
4. Sola C, Filliol I, Legrand E, Mokrousov I, Rastogi N. Mycobacterium tuberculosis phylogeny reconstruction based on combined numerical analysis with IS1081, IS6110, VNTR and DR-based spoligotyping suggests the existence of two new phylogeographical clades. J Mol Evol. 2001;53:680–9. doi:10.1007/s002390010255
5. van Embden JDA, Cave MD, Crawford JT, Dale JW, Eisenach KD, Gicquel B, et al. Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. J Clin Microbiol. 1993;31:406–9.
6. Supply P, Lesjean S, Savine E, Kremer K, van Soolingen D, Locht C. Automated high-throughput genotyping for the study of global epidemiology of Mycobacterium tuberculosis based on mycobacterial interspersed repetitive units. J Clin Microbiol. 2001;39:3563–71. doi:10.1128/JCM.39.10.3563-3571.2001
7. Glynn JR, Whiteley J, Bifani PJ, Kremer K, van Soolingen D. Worldwide occurrence of Beijing/W strains of Mycobacterium tuberculosis: a systematic review. Emerg Infect Dis. 2002;8:843–9.