letter
. 2019 Nov 28;54(5):1901154. doi: 10.1183/13993003.01154-2019

Accuracy of whole-genome sequencing to determine recent tuberculosis transmission: an 11-year population-based study in Hamburg, Germany

Roland Diel 1,2,11, Thomas A Kohl 3,4,11, Florian P Maurer 4, Matthias Merker 3,4, Karen Meywald Walter 5, Jörg Hannemann 5, Albert Nienhaus 2,6, Philip Supply 7,8,9,10, Stefan Niemann 3,4
PMCID: PMC6881715  PMID: 31467121

Short abstract

WGS typing with a five-SNP cut-off delineates recent transmission chains with highest accuracy and also provides high-resolution resistance patterns, thus enabling direct clinical benefits http://bit.ly/2Pk37Wo


To the Editor:

Controlling human-to-human tuberculosis (TB) transmission is key for achieving the targets of the End TB Strategy set by the World Health Organization (WHO) [1, 2]. Stopping TB transmission, in large cities especially, is a challenging top priority worldwide [3]. Metropolitan areas have higher TB case notification rates than the rest of a country, as they concentrate high-risk groups, such as homeless people, drug users and migrants often from (other) high TB incidence settings. Opportunities for transmission are amplified by population density and complex social interactions, regularly leading to large, temporally extended transmission networks [3]. Targeted interventions to interrupt transmission require the combination of effective genotyping of TB strains with enhanced epidemiological investigation. While classic IS6110 DNA fingerprinting and 24-locus MIRU–VNTR (mycobacterial interspersed repetitive units–variable number of tandem repeats) typing provide standardised and easily computable typing results with an online nomenclature system, several studies have now demonstrated that whole-genome sequencing (WGS) has a superior discriminatory power, allowing for an unparalleled resolution of outbreak strains [4–10]. However, the predictive value of WGS for detecting transmission in metropolitan areas has not yet been quantified against the most deterministic reference, i.e. tangible epidemiological links identified by ad hoc investigation, at extended time and population scales.

To address this gap, we performed classical genotyping (IS6110 DNA fingerprinting and 24-locus MIRU–VNTR typing) and WGS based on Illumina technology, using established procedures [11–14], of Mycobacterium tuberculosis complex strains obtained over more than a decade (from 2005 to 2015) from 1171 patients living in Hamburg, Germany, representing 92.3% of all culture-positive cases over the period. Patient strain clusters were defined by identical patterns for the classical genotyping methods and, for WGS, by a genetic distance of five single nucleotide polymorphisms (SNPs) between any two strains as a cut-off (d5WGS) [15].
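The d5WGS cluster definition can be illustrated with a minimal sketch. It assumes a single-linkage reading of the cut-off (a strain joins a cluster if it lies within five SNPs of at least one member), which the letter does not spell out, and it uses made-up allele strings rather than real genomes. The function names and toy data are hypothetical and this is not the authors' pipeline.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two equal-length allele strings differ."""
    if len(a) != len(b):
        raise ValueError("allele strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_snp_cutoff(profiles: dict, cutoff: int = 5) -> list:
    """Single-linkage clustering (assumption): link any pair within `cutoff` SNPs."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent pointers to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if snp_distance(profiles[a], profiles[b]) <= cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    # Only groups of two or more patients count as clusters.
    return [sorted(members) for members in groups.values() if len(members) >= 2]


# Hypothetical toy data: 12 variable positions per strain.
toy_profiles = {
    "A": "AAAAAAAAAAAA",
    "B": "AAAAAAAAACCC",  # 3 SNPs from A
    "C": "AAAAAACCCCCC",  # 6 SNPs from A, 3 SNPs from B
    "D": "GGGGGGGGGGGG",  # 12 SNPs from every other strain
}

if __name__ == "__main__":
    print(cluster_by_snp_cutoff(toy_profiles))
```

Under this reading, A and C fall into the same cluster through B even though they differ by six SNPs, while D stays unclustered. A stricter reading that requires every pair in a cluster to be within five SNPs would split them.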

All 1171 patients were classified into “clustered” or “not clustered” groups according to the respective results of IS6110 DNA fingerprinting, MIRU–VNTR typing and d5WGS. Clustering results were matched blindly against definite epidemiological links between patients, identified continuously over the study period by systematic contact tracing, including geographical mapping, by the public health department. In Hamburg, the data of all patients are collected prospectively by trained public health staff using a standardised questionnaire. Beyond capturing the identity of the patient's household, occupational and social contacts, information is obtained on nationality, date and country of birth, immigration status (if necessary), date of entry to Germany, number of years of residency in Hamburg or elsewhere in Germany, present and prior address(es) in Hamburg or elsewhere for the past 10 years (including homeless shelters, if necessary), the nature of the patient's current and prior employment(s), clinical diseases and any previous known exposure to other persons with TB (especially within the 6 months before the development of any symptoms). These data improve the identification of transmissions the patient was previously unaware of, which may be clarified in additional interviews. In addition, they help to confirm the absence of epidemiological links, e.g. if two cluster members had been infected by a highly prevalent regional strain in their respective home country before “importing” it to Hamburg, but had never met each other.

We defined the sensitivity of a method as the fraction of TB patients clustered by this method that had epidemiologically confirmed transmission links among all epidemiologically linked patients. Specificity was defined as the fraction of patients not clustered, without identified epidemiological link, among all (clustered and not clustered) patients without transmission. The positive predictive value (PPV) was the percentage of clustered patients who actually had an epidemiologically confirmed transmission among all clustered patients, whereas the negative predictive value (NPV) was the percentage of patients without an epidemiologically confirmed transmission among all patients not clustered.
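In standard confusion-matrix notation, these four measures can be written as follows, taking the confirmed epidemiological link as the reference and cluster membership as the test result. The symbols are introduced here for illustration: TP denotes clustered patients with a confirmed link, FP clustered patients without one, FN patients with a confirmed link who were not clustered, and TN patients who were neither clustered nor linked. The NPV is taken over all patients not clustered.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}.
\end{align*}
```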

From January 1, 2005 to December 31, 2015, isolates from 1171 patients, including 904 (77.2%) with pulmonary and 267 (22.8%) with extrapulmonary TB, were available for investigation by the three genotyping methods. The mean±sd patient age was 46.0±19.4 years, and 60.3% of patients (706 out of 1171) were male. 744 (63.5%) patients were foreign-born, coming from 96 different countries. For 135 (11.5%) of the 1171 patients, epidemiologically confirmed links could be identified (table 1).

TABLE 1.

Comparison of the three genotyping methods with respect to predicting Mycobacterium tuberculosis transmission by clustering

| Method | Total generated clusters | Total patients in clusters | Cluster patients with epidemiological link | Sensitivity# | Specificity¶ | PPV+ | NPV§ | Accuracyƒ |
| d5WGS | 87 | 351 | 134 | 99.3 (95.9–100) | 79.1 (76.4–81.5) | 38.2 (33.1–43.5) | 99.9 (99.2–100) | 81.2 (78.9–83.4) |
| MIRU–VNTR | 131 | 471 | 131 | 97.0 (94.2–99.9) | 67.2 (64.3–70.0) | 27.8 (23.8–31.9) | 99.4 (98.9–100) | 70.6 (68.0–73.2) |
| IS6110 DNA fingerprint | 110 | 417 | 131 | 97.0 (94.2–99.9) | 72.4 (69.7–75.1) | 31.4 (27.0–35.9) | 99.5 (99.0–100) | 75.2 (72.8–77.7) |

Data are presented as n or % (95% CI). Total patients n=1171; of those, 135 patients had a confirmed epidemiological link. PPV: positive predictive value; NPV: negative predictive value; d5WGS: five-single nucleotide polymorphism genetic distance between any two strains as a cut-off for whole-genome sequencing; MIRU–VNTR: mycobacterial interspersed repetitive units–variable number of tandem repeats. #: percentage of cluster patients with a confirmed link among all patients with a confirmed transmission link; ¶: percentage of unclustered patients among all patients without confirmed transmission; +: percentage of clustered patients with a confirmed transmission among all clustered patients; §: percentage of patients without a confirmed transmission among all patients assigned as unclustered; ƒ: (true positives + true negatives)/(true positives + false positives + true negatives + false negatives).

WGS analysis grouped 351 (30.0%) of the 1171 patients into 87 WGS clusters, each comprising between two and 25 subjects (table 1). In 35 (40.2%) WGS clusters, no detectable transmission links were found between any cluster members, while in 52 (59.8%) clusters there was at least one detected transmission link between two cluster members (data not shown). These 52 clusters included 134 out of the 135 individuals for whom a definite epidemiological link had been found. Only one patient involved in a recent transmission chain was thus not captured and falsely classified as not clustered by d5WGS, resulting in a sensitivity of 99.3% (134/135) (95% CI 95.9–100%) (table 1). As no transmission event could be detected in 217 (351 − 134) of the WGS-clustered patients, and in 819 of the nonclustered patients, specificity was 79.1% (819/(819 + 217)) (95% CI 76.4–81.5%). The PPV of d5WGS for detecting tangible epidemiological links between patients was 134/351, or 38.2% (95% CI 33.1–43.5%). Conversely, the NPV was 99.9% (95% CI 99.2–100%), with 819 of the 820 patients not clustered showing no evidence of any transmission link.

IS6110 DNA fingerprinting and MIRU–VNTR typing both assigned 131 of the 135 epidemiologically linked patients in our total set of 1171 patients as cluster patients, thus each achieving a sensitivity of 97.0% (95% CI 94.2–99.9%) (table 1).

However, MIRU–VNTR typing assigned 471 patients to clusters, including 131 of the 135 patients with a definite epidemiological link (table 1). Conversely, 696 of the remaining 700 patients, who were not assigned to a cluster, had no epidemiological link. Thus, the specificity for detection of recent transmission chains was 67.2% (696/(340 + 696)) (95% CI 64.3–70.0%), i.e. 11.9 percentage points lower than with WGS. The PPV of MIRU–VNTR typing was also substantially lower than that of WGS, at 131/471, or 27.8% (95% CI 23.8–31.9%), while the NPV was nearly identical, at 99.4% (696/700) (95% CI 98.9–100%).

IS6110 genotyping performed slightly better, with clusters comprising 417 patients in total, i.e. 54 fewer than with MIRU–VNTR typing (table 1). Consequently, specificity was 750/(286 + 750), or 72.4% (95% CI 69.7–75.1%). The PPV of the IS6110 genotyping method was 131/417, or 31.4% (95% CI 27.0–35.9%), while the NPV was 99.5% (750/754) (95% CI 99.0–100%).
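The arithmetic behind these point estimates can be retraced from only four figures per method: the 1171 patients in total, the 135 patients with a confirmed link, the number of patients in clusters and the number of clustered patients with a link. The following is a minimal sketch; the function and variable names are hypothetical, and confidence intervals are left out because the letter does not state how they were computed.

```python
TOTAL_PATIENTS = 1171   # all genotyped patients
LINKED_PATIENTS = 135   # patients with a confirmed epidemiological link

# (patients in clusters, clustered patients with a confirmed link), as in table 1
METHODS = {
    "d5WGS": (351, 134),
    "MIRU-VNTR": (471, 131),
    "IS6110": (417, 131),
}


def confusion_counts(clustered: int, clustered_linked: int) -> tuple:
    """Derive (TP, FP, FN, TN), with the epidemiological link as the reference."""
    tp = clustered_linked                    # clustered and linked
    fp = clustered - clustered_linked        # clustered, no link found
    fn = LINKED_PATIENTS - clustered_linked  # linked but not clustered
    tn = TOTAL_PATIENTS - clustered - fn     # neither clustered nor linked
    return tp, fp, fn, tn


def performance(clustered: int, clustered_linked: int) -> dict:
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    tp, fp, fn, tn = confusion_counts(clustered, clustered_linked)
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
    }


if __name__ == "__main__":
    for method, counts in METHODS.items():
        rounded = {k: round(v, 1) for k, v in performance(*counts).items()}
        print(method, rounded)
```

For d5WGS this yields TP=134, FP=217, FN=1 and TN=819, the same counts that appear in the fractions quoted in the text.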

Our results are consistent with the recent review by Nikolayevskyy et al. [16], which demonstrated that a cut-off of fewer than six SNPs is key to discriminating between TB patients who have suspected transmission and those who have not. However, with respect to the lower specificity of 79% of the d5WGS method for detecting cluster patients with confirmed epidemiological links (compared to its excellent sensitivity of nearly 100%), the results of our study suggest that even a cut-off of five SNPs alone, i.e. without additional epidemiological information, is not able to fully discriminate between those cluster members with verified transmission and those without (e.g. appearing in a cluster by coincidence, but without person-to-person transmission).

Our results are also consistent with findings from a 4-year study of a patient population of the English Midlands, which showed that the expected positive association between a risk factor such as geographic proximity to another TB case and WGS-based clustering vanished with cut-offs exceeding five SNPs [8]. In addition, the latter study found that MIRU–VNTR clusters, combined with shared risk factors between patients, positively predicted only half of five-SNP-based WGS clusters, taken as a “self-defining” reference of recent transmission. Here, we used deterministic links established by intensive epidemiological investigation as an external reference, to determine PPVs and NPVs of both WGS and classical methods independently to detect transmission.

One limitation is that our epidemiological investigation probably missed a number of less tractable epidemiological links (e.g. casual contacts). However, the detailed, prospectively performed sociodemographic analysis of all TB patients minimised such misclassifications by increasing both sensitivity and specificity. Furthermore, while missed links would cause an underestimation of the PPVs of the respective genotyping methods, this should affect all methods equally.

Compared to classical fingerprinting, genotyping with d5WGS is generally able to define target groups more precisely for health examinations and to show how contact tracing strategies could be adapted to population subgroups with a high potential of exposure to M. tuberculosis transmission. For example, this may be of help in larger ongoing TB outbreaks in which long-term spreads of a single M. tuberculosis strain may occur over decades in the same milieu. Here, typing with WGS has been demonstrated to better assign index cases to recently infected, “true” secondary cases than genotyping with IS6110 restriction fragment length polymorphism, and thus may speed up the start of more focused contact investigations [6].

In addition to more precise transmission analysis, prospective use of rapid WGS also results in clinical benefits for the patients, as resistances can be determined rapidly from WGS data and used for guiding individualised treatment [1, 9].

For instance, only one initial patient in each of three two-person clusters based on IS6110 fingerprinting and MIRU–VNTR typing had isoniazid resistance. Each of these pairs was ungrouped by WGS (data not shown), which avoided an unnecessary modification of TB therapy for each second cluster patient, who in fact turned out to have fully susceptible TB once phenotypic drug susceptibility testing (DST) results were obtained. Likewise, only three out of six multidrug-resistant (MDR)-TB patients in one single Beijing-type MIRU–VNTR cluster had resistance to all oral second-line drugs, while the other three showed susceptibility to prothionamide and moxifloxacin, raising uncertainty about which resistance patterns were correctly identified. By d5WGS, those patients were split up into two different MDR-TB clusters and the contact persons of the index cases in the second cluster could receive a personalised prophylactic combination of prothionamide and moxifloxacin (data not shown).

Especially in MDR- or extensively drug-resistant TB patients, the rapid initiation of effective treatment regimens in the weeks prior to receiving the lengthy phenotypic DST results will avoid ongoing transmission. Indeed, the scientific evidence suggests that WGS resistance predictions have reached a level of precision that allows their clinical use and can potentially replace phenotypic DST for first-line drugs [17]. This is likely to be the case for second-line and new drugs in the future.

In conclusion, WGS typing with a five-SNP cut-off delineates recent transmission chains with high accuracy and also provides high-resolution resistance patterns, thus enabling direct clinical benefits. WGS typing, especially when conducted in metropolitan areas, may therefore effectively strengthen national contact investigation policies. As the WHO End TB Strategy promotes the early diagnosis of all cases and active case finding, WGS fingerprinting should routinely be incorporated into national TB programmes, at least in those of high-income European countries.

Shareable PDF

This one-page PDF can be shared freely online.

Shareable PDF ERJ-01154-2019.Shareable (152KB, pdf)

Acknowledgements

We would like to thank Ilse Radizio, Vanessa Mohr, Tanja Niemann and Julia Zallet (all from Research Center Borstel, Germany), for technical assistance, the staff of the Public Health Department Hamburg-Central, Germany and Caroline Allix-Béguec, Cyril Gaudin, Mathilde Mairey and Stéphanie Duthoy (all Genoscreen, Lille, France) for their help in genome sequencing.

Footnotes

Conflict of interest: R. Diel has nothing to disclose.

Conflict of interest: T.A. Kohl has nothing to disclose.

Conflict of interest: F. Maurer has nothing to disclose.

Conflict of interest: M. Merker has nothing to disclose.

Conflict of interest: K. Meywald Walter has nothing to disclose.

Conflict of interest: J. Hannemann has nothing to disclose.

Conflict of interest: A. Nienhaus has nothing to disclose.

Conflict of interest: P. Supply reports consultancy for Genoscreen, during the conduct of the study.

Conflict of interest: S. Niemann reports grants from German Center for Infection Research, Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 22167-390884018, and Leibniz Science Campus EvoLUNG, during the conduct of the study.

Support statement: Parts of this work have been supported by the European Union PathoNgenTrace (FP7-278864) project, Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 22167-390884018, Leibniz Science Campus EvoLUNG, and the German Center for Infection Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Raw sequence data (fastq files) have been used from published datasets deposited at the European Nucleotide Archive (ENA) [7].

References

  • 1. Dheda K, Gumbo T, Maartens G, et al. The epidemiology, pathogenesis, transmission, diagnosis, and management of multidrug-resistant, extensively drug-resistant, and incurable tuberculosis. Lancet Respir Med 2017; 5: 291–360.
  • 2. World Health Organization (WHO). End TB Strategy. www.who.int/tb/post2015_strategy/en Date last accessed: August 5, 2019.
  • 3. van Hest NA, Aldridge RW, de Vries G, et al. Tuberculosis control in big cities and urban risk groups in the European Union: a consensus statement. Euro Surveill 2014; 19: 20728.
  • 4. Bjorn-Mortensen K, Soborg B, Koch A, et al. Tracing Mycobacterium tuberculosis transmission by whole genome sequencing in a high incidence setting: a retrospective population-based study in East Greenland. Sci Rep 2016; 6: 33180.
  • 5. Lee RS, Radomski N, Proulx J-F, et al. Reemergence and amplification of tuberculosis in the Canadian Arctic. J Infect Dis 2015; 211: 1905–1914.
  • 6. Roetzer A, Diel R, Kohl TA, et al. Whole genome sequencing versus traditional genotyping for investigation of a Mycobacterium tuberculosis outbreak: a longitudinal molecular epidemiological study. PLoS Med 2013; 10: e1001387.
  • 7. Walker TM, Merker M, Knoblauch AM, et al. A cluster of multidrug-resistant Mycobacterium tuberculosis among patients arriving in Europe from the Horn of Africa: a molecular epidemiological study. Lancet Infect Dis 2018; 18: 431–440.
  • 8. Wyllie DH, Davidson JA, Grace Smith E, et al. A quantitative evaluation of MIRU-VNTR typing against whole-genome sequencing for identifying Mycobacterium tuberculosis transmission: a prospective observational cohort study. EBioMedicine 2018; 34: 122–130.
  • 9. Walker TM, Kohl TA, Omar SV, et al. Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect Dis 2015; 15: 1193–1202.
  • 10. Merker M, Kohl TA, Niemann S, et al. The evolution of strain typing in the Mycobacterium tuberculosis complex. Adv Exp Med Biol 2017; 1019: 43–78.
  • 11. van Embden JD, Cave MD, Crawford JT, et al. Strain identification of Mycobacterium tuberculosis by DNA fingerprinting: recommendations for a standardized methodology. J Clin Microbiol 1993; 31: 406–409.
  • 12. Supply P, Lesjean S, Savine E, et al. Automated high-throughput genotyping for study of global epidemiology of Mycobacterium tuberculosis based on mycobacterial interspersed repetitive units. J Clin Microbiol 2001; 39: 3563–3571.
  • 13. Merker M, Barbier M, Cox H, et al. Compensatory evolution drives multidrug-resistant tuberculosis in Central Asia. eLife 2018; 7: e38200.
  • 14. Kohl TA, Utpatel C, Schleusener V, et al. MTBseq: a comprehensive pipeline for whole genome sequence analysis of Mycobacterium tuberculosis complex isolates. PeerJ 2018; 6: e5895.
  • 15. Walker TM, Merker M, Kohl TA, et al. Whole genome sequencing for M/XDR tuberculosis surveillance and for resistance testing. Clin Microbiol Infect 2017; 23: 161–166.
  • 16. Nikolayevskyy V, Niemann S, Anthony R, et al. Role and value of whole genome sequencing in studying tuberculosis transmission. Clin Microbiol Infect 2019; 25: 1377–1382.
  • 17. Meehan CJ, Goig GA, Kohl TA, et al. Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues. Nat Rev Microbiol 2019; 17: 533–545.

Articles from The European Respiratory Journal are provided here courtesy of European Respiratory Society
