PLOS Computational Biology. 2020 Mar 27;16(3):e1007687. doi: 10.1371/journal.pcbi.1007687

A model of tuberculosis clustering in low incidence countries reveals more transmission in the United Kingdom than the Netherlands between 2010 and 2015

Ellen Brooks-Pollock 1,2,*, Leon Danon 3,4, Hester Korthals Altes 5, Jennifer A Davidson 6, Andrew M T Pollock 7, Dick van Soolingen 5,8, Colin Campbell 6, Maeve K Lalor 6,9
Editor: Roger Dimitri Kouyos 10
PMCID: PMC7141699  PMID: 32218567

Abstract

Tuberculosis (TB) remains a public health threat in low TB incidence countries, through a combination of reactivated disease and onward transmission. Using surveillance data from the United Kingdom (UK) and the Netherlands (NL), we demonstrate a simple and predictable relationship between the probability of observing a cluster and its size (the number of cases with a single genotype). We demonstrate that the full range of observed cluster sizes can be described using a modified branching process model with the individual reproduction number following a Poisson lognormal distribution. We estimate that, on average, between 2010 and 2015, a TB case generated 0.41 (95% CrI 0.30, 0.60) secondary cases in the UK, and 0.24 (0.14, 0.48) secondary cases in the NL. A majority of cases did not generate any secondary cases. Recent transmission accounted for 39% (26%, 60%) of UK cases and 23% (13%, 37%) of NL cases. We predict that reducing UK transmission rates to those observed in the NL would result in 538 (266, 818) fewer cases annually in the UK. In conclusion, while TB in low incidence countries is strongly associated with reactivated infections, we demonstrate that recent transmission remains sufficient to warrant policies aimed at limiting local TB spread.

Author summary

Multiple tuberculosis (TB) cases infected with a single strain are known as a TB cluster. In the United Kingdom (UK) for example, TB clusters vary in size from two cases up to over 200 cases. Previous work on cluster sizes demonstrated that highly infectious individuals influence cluster size, but the analysis did not include the largest clusters. Here, we show that the chance of observing a cluster of a given size follows the same pattern in the UK and the NL. Using a new mathematical description of how clusters are formed, we are able to predict the chance of observing the full range of cluster sizes. Using the model, we estimate how many cases are due to recent transmission and how many other cases each case generates. Although we estimate that a minority of cases (39% (26%,60%) in the UK) are due to recent in-country transmission, we find that reducing the onward transmission in the UK to levels in the NL would result in 538 (266,818) fewer cases annually in the UK.

Introduction

Tuberculosis (TB) is a chronic infectious disease and a major global public health threat. In 2017, 10.0 million people developed TB and 1.6 million people died from TB worldwide[1]. In many low TB incidence countries, a high proportion of cases occur in persons born abroad, and control measures such as migrant screening have been introduced to limit imported infection and reduce treatment costs[2,3]. However, it is often not known whether foreign-born individuals were exposed to TB before or after immigrating [4,5], which affects the impact of such interventions.

Over the past 20 years, genotyping has informed our knowledge of how TB evolved, spread around the world, and survives within hosts[6–8]. Unlike genotyping for other pathogens, TB genotyping cannot always definitively identify who-infected-whom, as epidemiologically-linked cases are often infected with genetically indistinguishable strains[9,10]. Furthermore, cases infected by indistinguishable strains may be epidemiologically unrelated, due to infection with a common strain[11]. Instead, genotyping is often used to rule out transmission, for instance between household members infected with different strains[12,13]. In low incidence countries, distinguishable strains are used to estimate the fraction of cases that are not due to recent transmission, but due instead to the reactivation of existing infections or cases infected elsewhere[14].

TB clusters are defined as multiple cases infected with a single genotype. Clusters are often assumed to signify sustained recent transmission, and factors such as pulmonary disease and country-of-origin increase an individual’s risk of being part of a cluster[15]. For acute infections such as measles, the observation that cluster size distributions follow a power law has been used to indicate that the epidemiological process is at a critical point[16].

For TB, analysis of the distribution of cluster sizes has been used to estimate the genetic mutation rate in a population[17] and infer the role of super-spreading individuals[18]. The latter method was contingent on identifying transmission clusters (defined as clustered cases occurring less than two years apart): alternative methods are required to apply this method without a priori epidemiological knowledge of the likely index case. Furthermore, it is not known how cluster generation differs between settings with potentially differing types of migration and social contact patterns. Here, we propose and develop a method to estimate the distribution in the number of secondary cases (the reproduction number) and the percentage of cases that are due to recent (since 2010), within-country transmission from the information in cluster size distributions for TB in the UK and the NL.

Results

In the UK, data were available for the period between 2010 and 2015, and contained 23,646 genotyped cases and 12,503 unique genotypes. 9,802 (41.5%) cases were unique genotypes and 13,844 (58.5%) cases were in clusters containing two or more individuals.

On average, 70% (42%, 98%) of clustered cases involved pulmonary disease, compared to 53% of all cases (Fig 1A). Between 2010 and 2015, 73% of cases in the UK were non-UK born. A lower proportion of those cases are diagnosed with pulmonary TB[19] (47%) compared to UK-born cases (69%). Within a cluster, 64% (10%,95%) of cases are UK-born.

Fig 1.

The percentage of pulmonary cases in a cluster against cluster size for the UK (a) and the Netherlands (b). The percentage of foreign-born cases in a cluster against cluster size for the UK (c) and the Netherlands (d). Dotted lines indicate the mean value for a cluster.

In the Netherlands (NL), data were available between 2004 and 2015 and contained 8,449 genotyped cases. 3,923 (46.4%) cases were unique genotypes and 4,526 (53.6%) cases were in clusters of two or more. Limiting the analysis to cases diagnosed between 2010 and 2015, there were 3,841 genotyped cases: 2,026 (52.7%) cases had a unique genotype and 1,815 (47.3%) were in clusters of two or more. 67% of all cases involved pulmonary disease, whereas within clusters, on average 70% (26%, 100%) of cases were pulmonary (Fig 1B). As in the UK, non-NL born cases make up the majority (69%). A lower proportion of non-NL born cases were diagnosed with pulmonary TB (63%) compared to NL-born cases (76%). Within a cluster, 66% (33%, 100%) of cases are non-NL born.

Modelling the distribution of cluster sizes

The distributions of cluster sizes in the UK and the NL were each fitted to a power law (KS statistics 0.013 and 0.007 respectively; p-values 0.63 and 0.07 respectively). The estimated exponent for the UK was 2.4 (2.3, 2.6) with a minimum cluster size consistent with the power law, $x_{\min}$, of 3 (1, 5). The estimated exponent for the NL was higher than that for the UK, 2.8 (2.7, 2.9), with a minimum cluster size consistent with the power law, $x_{\min}$, of 1 (1, 1).

From the power-law model, we can predict that in the UK, 1 in 5 genotypes will occur twice or more; 1 in 160 genotypes will occur twenty times or more; and 1 in 6,000 genotypes will occur 200 times or more. In the NL, 1 in 6 genotypes will occur twice or more; 1 in 250 genotypes will occur twenty times or more; and 1 in 8,500 genotypes will occur 200 times or more.

The cluster size distribution in the NL was captured by both branching models, where the distribution of secondary cases follows either a negative binomial or a Poisson lognormal distribution (See Figs A-D in S1 Text for posterior distributions). Both models captured the number of unique genotypes and cluster sizes that occur only once (Fig 2, left, also section S1.2 in S1 Text). Although a branching process with a negative binomial distribution of secondary cases was able to capture the number of unique genotypes in the UK data, it systematically underestimated the frequency of large clusters (Fig 2, right).

Fig 2. The number of clusters of size 1 (i.e. unmatched cases) against the number of cluster sizes that appear exactly once.

The coloured points are 1,000 model replicates selected from the posterior distributions for the branching process model with the distribution of secondary infections following either a Poisson lognormal distribution (yellow triangles) or a negative binomial distribution (blue diamonds). The black points indicate the data values for the NL (left) and the UK (right).

A Poisson lognormal model resulted in increased model uncertainty, but provided an improved fit, and captured the entire distribution in the UK (Figs 2 and 3A) as well as still capturing the NL data (Figs 2 and 3B).

Fig 3.

The distribution of cluster sizes for the UK (a) and the NL (b) with the distribution of cluster sizes produced by 1,000 iterations of the Poisson-lognormal model with parameters drawn from the posterior distributions.

The Poisson lognormal distribution for the number of secondary cases in the UK had log-mean of -2.9 (-4.7, -1.5) and log-variance 2.0 (1.2, 2.8). In NL, between 2004 and 2015 the log-mean was -2.9 (-5.0, -1.6) and the log-variance 1.9 (1.0, 2.8). Restricting the analysis to cases reported in NL between 2010 and 2015 decreased the log-mean to -3.4 (-6.7, -1.7) and slightly increased the log-variance to 1.9 (1.0, 3.4).

Cases due to recent transmission

From our models, we estimate that between 2010 and 2015, the percentage of cases due to recent transmission in the UK was 39% (26%, 60%), with 61% (40%, 74%) of cases due to importation or reactivation. In the NL, we estimate that 23% (13%, 37%) of cases are attributable to recent transmission, the remainder being due to importation or reactivation (see section S1.3 and Figure G in S1 Text).

The effective reproduction number

The mean effective reproduction number is calculated from the Poisson lognormal distribution (Eq 1, Methods). For the UK, it was 0.41 (0.30, 0.60), suggesting that, on average, transmission is not sustained (see section S1.4 and Figure H in S1 Text). Using Eq 2 (Methods), this means an average index case will generate 0.7 (0.4, 1.5) further cases. Furthermore, using the model, we find that clusters with more than 10 cases have an average reproduction number greater than 0.9.

In NL, the reproduction number using all data from 2004 to 2015 was 0.33 (0.22, 0.50) and since 2010 this reduced to 0.25 (0.14, 0.48). Even considering onward transmission chains, an average index case in the Netherlands generates 0.3 (0.16, 0.92) further cases.

The role of superspreaders and onward transmission

Fig 4 illustrates the distribution of secondary cases by infector reproduction number. From the UK data, we estimate that 84% (71%, 90%) of cases did not generate any secondary cases; current control measures are therefore adequately preventing onward transmission in the majority of cases. A further 10% (5%, 17%) of cases generated one secondary case only; they generated 25% (11%, 42%) of cases infected in the UK (Fig 4). 0.47% (0.1%, 0.9%) of cases generated more than 10 secondary cases, and could be considered “superspreaders”. These superspreaders were responsible for 30% (4%, 62%) of secondary recently transmitted cases.

Fig 4. The percentage of cases due to recent transmission against the reproduction number of the person who infected them.

The point estimates are the mean and the error bars are 95% credible intervals calculated using 10,000 parameter sets drawn from the posterior distribution of the model fit to the UK and NL data between 2010 and 2015.

In NL between 2010 and 2015, 88% (77%, 96%) of cases did not generate any secondary cases and 8% (3%, 17%) of cases generated one secondary case, resulting in 34% (9%, 61%) of cases infected in NL. Superspreaders comprised 0.3% (0.02%, 0.8%) of all cases, and were responsible for 19% (1%, 52%) of recently transmitted cases.

By scaling down the transmission parameters in the UK, we estimate that if the UK were able to bring local transmission in line with the NL, they would be able to achieve a 17% (13%, 23%) reduction in incidence, equivalent to preventing 538 (266, 818) cases per year.

Discussion

Tuberculosis (TB) remains a public health concern in low-incidence countries. As the majority of cases in low-incidence countries are foreign-born, the impact of controlling recent transmission on overall TB burden is not clear.

Here, we presented methods for exploring and interpreting the full distribution of TB cluster sizes within a country in terms of recent transmission. Using data from the UK and NL, we find that the vast majority of cases did not transmit the infection. Less than 1% of cases caused more than 10 secondary cases and might be defined as “superspreaders”. Overall, the average reproduction number is less than a half in both countries.

Superspreading, where a small proportion of cases generate a disproportionate number of secondary cases, is a common feature of many infectious disease epidemics[20]. Where superspreading dominates dynamics, targeted interventions perform better than population-wide measures; however, identifying superspreaders can be challenging.

We estimate that onward transmission is substantially lower in the Netherlands than in the UK. Our estimate of the average reproduction number is consistent with previous estimates in low-incidence settings[21]. In particular, our estimate is in line with Borgdorff et al.’s estimates that also allow multiple introductions per cluster [22–24], suggesting that this is an important feature. Our estimate is lower than Ypma et al.’s estimate, which might be explained by the fact that they allow for the possibility of mutations within a cluster. In the UK, Vynnycky and Fine estimated that the effective reproduction number fell to well below one by 1990 [25]. We estimate that the reproduction number is now around 0.4 in the UK, and that reducing this to 0.25, in line with the Netherlands, could prevent one in six UK cases. A comparison of transmission, control measures and outcomes could elucidate the difference we observed between the UK and the NL. These would include the efficiency of contact tracing in the UK[26] and the NL[27], household transmission[13], and different transmission rates between migrant groups[22].

Whole Genome Sequencing (WGS) is increasingly being used for genotyping in high-resource settings[28], having been introduced in 2018 in the NL and 2017 in the UK. In general, WGS analysis (using a 12 nucleotide difference threshold [29]) results in smaller clusters relative to MIRU-VNTR. However, because our method includes multiple independent importations per cluster, the overall results are likely to be consistent between typing methods. Applied to WGS data, our methods will provide an independent estimate of transmission, once the pipeline for TB DNA sequencing has been standardised across countries, consensus has been reached regarding the cut-off number of SNPs to be used for cluster definition[29] and multiple years of data have accumulated. With sufficient data, WGS can be used to re-construct transmission trees and directly estimate reproduction numbers[30]; however, in many outbreaks WGS alone is not sufficient, and needs to be combined with epidemiological data and statistical inference[10].

The power of our method lies in the unification of clusters across multiple scales, and is therefore robust to missing data. However, the approach we used does have limitations. Firstly, the model did not include temporal or regional differences in transmission: these will be areas for future development. Further, we assumed that clusters were fully observed. In reality, only culture-confirmed cases can be genotyped and transmission within a cluster may be on-going. In 2015, genotyped cases represented 60.1% of all cases in the UK and 67% of all cases in the NL. The data are right-censored because clusters may not have run their full course; this will apply particularly to strains that appeared for the first time towards the end of the datasets. An inherent limitation of using a terminal branching process model is that we assume that transmission is not sustained without importation from outside the UK/NL or reactivation of old infections. The steady decline in incidence over the study periods suggests that this assumption is reasonable on average, although transmission is most likely sustained in the largest clusters. We did not capture the role of genetic mutation in generating new clusters, thereby potentially underestimating the contribution of recent transmission.

In summary, we observed consistent properties between TB clusters, irrespective of size, origin or country. We find that TB cluster sizes in low incidence countries can be captured by a simple model of importation and transmission. This work will contribute to a more well-developed understanding of TB transmission patterns in low incidence countries and how genotyping can be used for epidemiological inference. Control policies, such as contact tracing, aimed at limiting spread still have a role to play in eliminating TB in low-incidence countries.

Methods

Data sources

UK data

The analysis was conducted using TB notifications collected through the Enhanced Tuberculosis Surveillance (ETS) system in England and Wales and the Enhanced Surveillance of Mycobacterial Infections (ESMI) system in Scotland. The following data for TB notifications were used: year of notification (2010 to 2015 inclusive), country of birth, disease type (pulmonary with or without extra-pulmonary or extra-pulmonary only), strain type (at least 23 out of 24 loci mycobacterial interspersed repetitive unit-variable-number tandem repeat (MIRU-VNTR) type), cluster name (assigned by a PHE naming tool based on strain type) and whether a case was categorised as clustered (yes/no).

NL data

Data from the NL were extracted from the Netherlands Tuberculosis Register. MIRU-VNTR typing has been systematically conducted in the NL since 2004. As for the UK data, we extracted year of notification (2004–2015), country of birth, disease type (pulmonary or extra-pulmonary) and strain type (24 loci MIRU-VNTR type).

Defining clusters

Cluster size was defined as the number of cases with an indistinguishable MIRU-VNTR profile, where clusters of size 1 are cases with a unique 24 loci VNTR profile. Cases with a single missing locus that matched 23 loci of another cluster were considered part of that cluster[31]. Cluster sizes were binned logarithmically to retain the distribution shape while minimising noise due to low numbers of large clusters[32].

Statistical model

A feature of cluster size distributions in the UK and the NL is that the proportion of clusters greater than a given size declines linearly with cluster size in log-log space. In order to characterise the distribution of cluster sizes across multiple scales, we fit a power law function of the form $P(x) \sim x^{-\alpha}$ to the cluster size distributions and assess the fit by calculating an associated p-value[33]. Two parameters are estimated: $x_{\min}$, the minimum cluster size that is consistent with a power law, and $\alpha$, the exponent of the power law. The two parameters are estimated by minimising the Kolmogorov–Smirnov (KS) statistic, implemented in the poweRlaw R package[34]. 95% confidence intervals and a p-value are calculated. Within this framework, larger p-values indicate a better fit to the power law model than smaller values (see [33]).
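As a rough illustration of what the exponent means, the sketch below estimates it for a fixed minimum cluster size and converts it into a tail fraction. It is not the poweRlaw procedure used here: the estimator is the common continuous approximation to the discrete maximum-likelihood estimate, the minimum size is supplied rather than chosen by minimising the KS statistic, and the function names are hypothetical.

```python
import math

def fit_alpha(sizes, x_min):
    """Approximate maximum-likelihood exponent for integer sizes >= x_min.

    Uses the continuous approximation alpha ~ 1 + n / sum(ln(x / (x_min - 0.5))).
    """
    tail = [x for x in sizes if x >= x_min]
    if not tail:
        raise ValueError("no cluster sizes at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum

def tail_fraction(x, x_min, alpha):
    """Continuous power-law estimate of the fraction of clusters of size >= x."""
    return (x / x_min) ** (1.0 - alpha)
```

Because the continuous form only approximates a discrete distribution, tail fractions computed this way should not be expected to reproduce the reported predictions exactly.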

Developing a mechanistic mathematical model

Although the power law function estimated above provides a statistical description of the data, we were interested in finding a mechanistic explanation for the distribution of cluster sizes.

In order to do this, we used a mortal branching process model[18,35] with importation of infection to describe the process by which TB clusters are generated and evolve in low incidence settings. The central premise behind the model is that every diagnosed case must have been generated by one of two mechanisms, in a similar structure to household transmission models[36]: A) infection was acquired abroad or before the observation period (referred to as an imported or non-recent infection/reactivation) or B) the case was infected in the country during the observation period (interpreted as recently transmitted infection). Assuming these two mechanisms is broadly consistent with the data: in the UK, 81% of cases infected with a unique genotype were born outside the UK, compared to 70% of clustered cases.

For each unique genotype X, we assume that the first case cannot be due to a recent transmission event, i.e. it was either infected abroad or before routine genotyping. Each case $i$ generates $r_i$ secondary cases infected with genotype X, where $r_i$ is drawn from a probability distribution. We did not differentiate pulmonary cases from extra-pulmonary cases, as there is no evidence of a correlation in pulmonary status between infector and infectee and a previous study of NL cluster sizes[18] found that including extra-pulmonary cases did not affect estimates of the reproduction number.

In addition to recently transmitted cases, we assume that for each case i infected with genotype X, an additional, independent case also infected with genotype X is diagnosed with probability p. This process is repeated for every case in the cluster, i.e. C(X) times for a cluster with C(X) cases. This results in a binomial distribution Bin(C(X),p).

Each of the recent and non-recent cases have the opportunity to generate further secondary cases; this process is repeated until no new cases are generated. The branching process steps are as follows:

  1. Start with the index case of a new cluster. Create a list of cases, L, containing a single case, L = {1}, such that the number of cases, n = 1;

  2. For each case $i \in L$, draw the number of secondary cases produced by $i$, $r_i$, from the relevant distribution (Poisson lognormal or negative binomial) and add $r_i$ cases to the end of the case list, such that $n = n + r_i$;

  3. Draw a random number between 0 and 1; if this is less than probability p, generate an imported case and add it to the end of the case list, such that n = n+1.
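The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the fitted model: it uses only the Poisson lognormal case, the importation probability of 0.1 is an arbitrary placeholder (no value is given here), `max_cases` is an assumed safety cap, and the function names are hypothetical. Counting transmitted cases separately also gives the simulated share of cases due to recent transmission.

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson sample via Knuth's method, applied in chunks to avoid underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        threshold = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
        lam -= chunk
    return total

def simulate_cluster(mu, sigma, p, rng, max_cases=300):
    """Return (cluster size, number of recently transmitted cases) for one genotype.

    sigma is passed to lognormvariate as the standard deviation of log(lambda).
    The loop stops once max_cases is reached, so the size can overshoot slightly.
    """
    n = 1            # step 1: the index case
    transmitted = 0
    i = 0
    while i < n and n < max_cases:
        # step 2: secondary cases generated by case i
        r_i = poisson_draw(rng.lognormvariate(mu, sigma), rng)
        transmitted += r_i
        n += r_i
        # step 3: an independent imported case with the same genotype
        if rng.random() < p:
            n += 1
        i += 1
    return n, transmitted

# Illustrative run: mu and sigma echo the reported UK values, p is a placeholder.
rng = random.Random(2020)
clusters = [simulate_cluster(-2.9, 2.0, 0.1, rng) for _ in range(1000)]
recent_fraction = sum(t for _, t in clusters) / sum(n for n, _ in clusters)
```

The list of cases is represented only by its length, since the steps never need the identity of individual cases, just how many remain to be processed.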

In order to fit this model to cluster size data without further complexity, we impose the assumption that the average number of secondary cases per case must be greater than or equal to zero and less than one, $0 \le E(r_i) < 1$, justified by the low and declining incidence in the two countries.

Distribution of secondary cases per individual

Previous analyses have modelled the number of secondary cases per TB case using a negative binomial distribution [18,20], which arises when the expected number of secondary cases per individual, λ, follows a Gamma distribution, λ~Γ(k,θ), with dispersion parameter k and scaling parameter θ. The average number of secondary cases per individual is given by $R = k\theta$.
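The mixture construction can be illustrated with a short sampler: draw λ from a Gamma distribution with shape k and scale θ, then draw the number of secondary cases from a Poisson distribution with mean λ. This is a sketch using only the Python standard library; the function names and the parameter values in the usage lines are hypothetical and are not estimates from this study.

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson sample via Knuth's method, applied in chunks to avoid underflow."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        threshold = math.exp(-chunk)
        k, prod = 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        total += k
        lam -= chunk
    return total

def negative_binomial_draw(k, theta, rng):
    """Secondary cases for one individual: lambda ~ Gamma(shape k, scale theta)."""
    lam = rng.gammavariate(k, theta)
    return poisson_draw(lam, rng)

# Illustrative values only: the sample mean should be close to k * theta = 0.4.
rng = random.Random(0)
draws = [negative_binomial_draw(0.5, 0.8, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
```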

We compare the negative binomial model for the distribution of within-country secondary cases with a Poisson-lognormal model. A Poisson-lognormal distribution is frequently used in ecological literature as an alternative to a negative binomial to describe species abundance for communities with many rare species[37]. It arises when the logarithm of the expected number of secondary cases per individual, log(λ), follows a normal distribution with mean μ and variance σ, λ~logN(μ,σ). In a lognormal distribution, the average number of secondary cases per individual is given by

$R = \exp(\mu + \sigma^2/2)$. (Eq 1)

As R<1, the total number of additional cases due to an average imported case is calculated as the sum of a geometric series:

$R/(1-R)$. (Eq 2)
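Eqs 1 and 2 are simple enough to check by hand, and the sketch below does the arithmetic. The function names are hypothetical. With R = 0.41 the geometric sum is about 0.69, in line with the 0.7 further cases reported for the UK, and with R = 0.25 it is about 0.33. Plugging the reported UK posterior means (-2.9 and 2.0) into Eq 1, under the assumption that the second value plays the role of σ, gives roughly 0.41. This is only indicative, because the mean of a posterior for R need not equal Eq 1 evaluated at the posterior means of the parameters.

```python
import math

def mean_reproduction_number(mu, sigma):
    """Eq 1: mean of a lognormal whose log has mean mu and standard deviation sigma."""
    return math.exp(mu + sigma ** 2 / 2)

def total_onward_cases(R):
    """Eq 2: R + R^2 + R^3 + ... = R / (1 - R), valid only for 0 <= R < 1."""
    if not 0 <= R < 1:
        raise ValueError("the geometric series only converges for 0 <= R < 1")
    return R / (1 - R)

uk_onward = total_onward_cases(0.41)
nl_onward = total_onward_cases(0.25)
uk_R_from_parameters = mean_reproduction_number(-2.9, 2.0)
```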

We define a “superspreader” as a TB case in the above model that generates more than ten secondary cases. Using the model, we explore the impact of superspreaders by considering the proportion of secondary cases generated by persons with different reproduction numbers.

We use the model to estimate the impact of reducing transmission within the UK to match transmission within the Netherlands. We re-run the model with the estimated UK importation rate but scale the log mean of the Poisson lognormal distribution by a factor $\bar{\mu}_{NL}/\bar{\mu}_{UK}$, where $\bar{\mu}_{NL}$ is the average mean of the log normal distribution estimated for the NL and $\bar{\mu}_{UK}$ is the average mean of the log normal distribution estimated for the UK. The number of cases is totalled for the alternative scenario with lower transmission and compared to the total number of cases under the UK fitted model.

Fraction of imported cases

We estimate the proportion of cases due to recent transmission by recording the number of cases infected via direct transmission and the number of cases generated by importation during each simulation.

Model fit

In contrast to previous approaches that have used exact likelihood methods for fitting cluster size models to data[18,35], we use Approximate Bayesian Computation (ABC)[38,39]. In ABC, the likelihood is approximated by distance metrics based on summary statistics derived from the data and a realisation of the model; it can therefore naturally incorporate the impact of sampling and importation. We use the Marjoram MCMC search algorithm implemented in the R package EasyABC[40].

We estimated three model parameters: two for the distribution of secondary cases (either negative binomial or Poisson lognormal) and one for the importation rate. We assumed uniform prior distributions and imposed prior constraints that all parameters are greater than zero and that the reproduction number is greater than or equal to 0 and less than one. The target summary statistics were the number of observed clusters of a given size, logarithmically binned for a fixed number of bins. Using logarithmic binning attempts to compensate for the larger number of data points for lower cluster sizes. For $N$ bins, covering a range of cluster sizes from 1 to $C_{\max}$, each bin is of length $\max(1, \exp(n \log C_{\max}/N))$ for $n = 1, \ldots, N$. We chose 50 bins, and the maximum cluster size was 300. For each set of proposal parameters, we simulated the model and binned the resulting cluster sizes in the same way as the data. The distance between the model and the data was calculated using the Euclidean distance:

$\sqrt{\sum_{i=1}^{n_{bins}} (D_i - M_i)^2}$,

where $D_i$ is the number of observed clusters in the $i$th bin and $M_i$ is the number of clusters in the $i$th bin as predicted by the model.
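The binning and distance calculation can be sketched as follows. This is one possible reading of the description, not the code used for the fits: the bin expression is treated as a set of upper bin edges, a cluster is assigned to the first bin whose edge is at least its size, and all function names are hypothetical.

```python
import bisect
import math

def log_bin_edges(c_max=300, n_bins=50):
    """Upper bin edges max(1, exp(n * log(c_max) / n_bins)) for n = 1..n_bins."""
    return [max(1.0, math.exp(n * math.log(c_max) / n_bins))
            for n in range(1, n_bins + 1)]

def bin_counts(cluster_sizes, edges):
    """Count clusters per bin; sizes beyond the last edge go in the last bin."""
    counts = [0] * len(edges)
    for size in cluster_sizes:
        idx = min(bisect.bisect_left(edges, size), len(edges) - 1)
        counts[idx] += 1
    return counts

def euclidean_distance(observed, simulated):
    """Square root of the summed squared differences between binned counts."""
    return math.sqrt(sum((d - m) ** 2 for d, m in zip(observed, simulated)))
```

Binning the observed and simulated cluster sizes with the same edges before calling `euclidean_distance` mirrors the requirement that model output is binned in the same way as the data.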

We assessed model fit via two statistics: the proportion of clusters that are of size 1 (i.e. unmatched cases) and the number of cluster sizes that appear exactly once–see reference [17] for a further discussion of these quantities. Together, these two values capture characteristics of TB cluster size distributions across multiple settings with a high proportion of unmatched cases and larger clusters.

From the posterior distributions, we extracted the average number of secondary cases per individual (R), the degree of dispersion and the proportion of cases that are due to recent, within-country transmission. Unless otherwise stated, we report the mean from the posterior distribution and 95% credible intervals in brackets, calculated as the 2.5th and 97.5th quantiles of the posterior distributions.

Supporting information

S1 Text

Details of the models: S1.1) Posterior distributions for the model parameters; S1.2) Comparison between the Poisson lognormal model and the negative binomial distribution model fits for the UK and the NL; S1.3) Posterior distribution for the proportion of cases not due to recent transmission; S1.4) Posterior distribution for the reproduction number in the UK and the NL.

(PDF)

S1 Data. Number of clusters by size for the UK and the NL.

(CSV)

Acknowledgments

Thanks to Rolf Ypma for discussing his paper and early comparisons.

Data Availability

All relevant data are within the manuscript and its Supporting Information files.

Funding Statement

EBP was supported by the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Evaluation of Interventions. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. LD gratefully acknowledges the financial support of The Alan Turing Institute under the EPSRC grant EP/N510129/1. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  1. WHO | Global tuberculosis report 2018. WHO. 2019. Available: https://www.who.int/tb/publications/global_report/en/
  2. Aldridge RW, Zenner D, White PJ, Williamson EJ, Muzyamba MC, Dhavan P, et al. Tuberculosis in migrants moving from high-incidence to low-incidence countries: a population-based cohort study of 519 955 migrants screened before entry to England, Wales, and Northern Ireland. Lancet (London, England). 2016;388: 2510–2518. doi:10.1016/S0140-6736(16)31008-X
  3. Kamper-Jørgensen Z, Andersen AB, Kok-Jensen A, Kamper-Jørgensen M, Bygbjerg IC, Andersen PH, et al. Migrant tuberculosis: the extent of transmission in a low burden country. BMC Infect Dis. 2012;12: 60. doi:10.1186/1471-2334-12-60
  4. Lillebaek T, Andersen AB, Bauer J, Dirksen A, Glismann S, de Haas P, et al. Risk of Mycobacterium tuberculosis transmission in a low-incidence country due to immigration from high-incidence areas. J Clin Microbiol. 2001;39: 855–61. doi:10.1128/JCM.39.3.855-861.2001
  5. Stucki D, Ballif M, Egger M, Furrer H, Altpeter E, Battegay M, et al. Standard Genotyping Overestimates Transmission of Mycobacterium tuberculosis among Immigrants in a Low-Incidence Country. J Clin Microbiol. 2016;54: 1862–70. doi:10.1128/JCM.00126-16
  6. Gagneux S. Host-pathogen coevolution in human tuberculosis. Philos Trans R Soc Lond B Biol Sci. 2012;367: 850–9. doi:10.1098/rstb.2011.0316
  7. van Soolingen D, Borgdorff MW, de Haas PEW, Sebek MMGG, Veen J, Dessens M, et al. Molecular Epidemiology of Tuberculosis in the Netherlands: A Nationwide Study from 1993 through 1997. J Infect Dis. 1999;180: 726–736. doi:10.1086/314930
  8. Nebenzahl-Guimaraes H, Verhagen LM, Borgdorff MW, van Soolingen D. Transmission and progression to disease of mycobacterium tuberculosis phylogenetic lineages in the Netherlands. J Clin Microbiol. 2015;53: 3264–3271. doi:10.1128/JCM.01370-15
  9. Packer S, Green C, Brooks-Pollock E, Chaintarli K, Harrison S, Beck CR. Social network analysis and whole genome sequencing in a cohort study to investigate TB transmission in an educational setting. BMC Infect Dis. 2019;19: 154. doi:10.1186/s12879-019-3734-8
  10. Didelot X, Gardy J, Colijn C. Bayesian Inference of Infectious Disease Transmission from Whole-Genome Sequence Data. Mol Biol Evol. 2014;31: 1869–1879. doi:10.1093/molbev/msu121
  11. Davidson JA, Thomas HL, Maguire H, Brown T, Burkitt A, MacDonald N, et al. Understanding Tuberculosis Transmission in the United Kingdom: Findings from 6 Years of Mycobacterial Interspersed Repetitive Unit-Variable Number Tandem Repeats Strain Typing, 2010–2015. Am J Epidemiol. 2018;187: 2233–2242. doi:10.1093/aje/kwy119
  12. Verver S, Warren RM, Munch Z, Richardson M, van der Spuy GD, Borgdorff MW, et al. Proportion of tuberculosis transmission that takes place in households in a high-incidence area. Lancet. 2004;363: 212–214. doi:10.1016/S0140-6736(03)15332-9
  13. Lalor MK, Anderson LF, Hamblion EL, Burkitt A, Davidson JA, Maguire H, et al. Recent household transmission of tuberculosis in England, 2010–2012: retrospective national cohort study combining epidemiological and molecular strain typing data. BMC Med. 2017;15: 105. doi:10.1186/s12916-017-0864-y
  14. Hamblion EL, Le Menach A, Anderson LF, Lalor MK, Brown T, Abubakar I, et al. Recent TB transmission, clustering and predictors of large clusters in London, 2010–2012: results from first 3 years of universal MIRU-VNTR strain typing. Thorax. 2016;71: 749–56. doi:10.1136/thoraxjnl-2014-206608
  15. Fok A, Numata Y, Schulzer M, FitzGerald MJ. Risk factors for clustering of tuberculosis cases: A systematic review of population-based molecular epidemiology studies. International Journal of Tuberculosis and Lung Disease. 2008. pp. 480–492.
  16. Rhodes CJ, Anderson RM. Power laws governing epidemics in isolated populations. Nature. 1996. pp. 600–602. doi:10.1038/381600a0
  17. Luciani F, Francis AR, Tanaka MM. Interpreting genotype cluster sizes of Mycobacterium tuberculosis isolates typed with IS6110 and spoligotyping. Infect Genet Evol. 2008;8: 182–190. doi:10.1016/j.meegid.2007.12.004
  18. Ypma RJF, Altes HK, van Soolingen D, Wallinga J, van Ballegooijen WM. A Sign of Superspreading in Tuberculosis. Epidemiology. 2013;24: 395–400. doi:10.1097/EDE.0b013e3182878e19
  19. Public Health England. Tuberculosis in England: 2019 report. 2019.
  20. Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the effect of individual variation on disease emergence. Nature. 2005;438: 355–9. doi:10.1038/nature04153
  • 21.Ma Y, Horsburgh CR, White LF, Jenkins HE. Quantifying TB transmission: a systematic review of reproduction number and serial interval estimates for tuberculosis. Epidemiol Infect. 2018;146: 1478–1494. 10.1017/S0950268818001760 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Borgdorff MW, Nagelkerke N, van Soolingen D, de Haas PEW, Veen J, van Embden JDA. Analysis of Tuberculosis Transmission between Nationalities in the Netherlands in the Period 1993–1995 Using DNA Fingerprinting. Am J Epidemiol. 1998;147: 187–195. 10.1093/oxfordjournals.aje.a009433 [DOI] [PubMed] [Google Scholar]
  • 23.Borgdorff MW, van der Werf MJ, de Haas PEW, Kremer K, van Soolingen D. Tuberculosis Elimination in the Netherlands. Emerg Infect Dis. 2005;11: 597–602. 10.3201/eid1104.041103 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Borgdorff MW, Van Den Hof S, Kremer K, Verhagen L, Kalisvaart N, Erkens C, et al. Progress towards tuberculosis elimination: Secular trend, immigration and transmission. Eur Respir J. 2010;36: 339–347. 10.1183/09031936.00155409 [DOI] [PubMed] [Google Scholar]
  • 25.Vynnycky E, Fine PE. The long-term dynamics of tuberculosis and other diseases with long serial intervals: implications of and for changing reproduction numbers. Epidemiol Infect. 1998;121: 309–24. Available: http://www.ncbi.nlm.nih.gov/pubmed/9825782 10.1017/s0950268898001113 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Cavany SM, Sumner T, Vynnycky E, Flach C, White RG, Thomas HL, et al. An evaluation of tuberculosis contact investigations against national standards. Thorax. 2017;72: 736–745. 10.1136/thoraxjnl-2016-209677 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Mulder C, van Deutekom H, Huisman EM, Meijer-Veldman W, Erkens CGM, van Rest J, et al. Coverage and yield of tuberculosis contact investigations in the Netherlands. Int J Tuberc Lung Dis. 2011;15: 1630–1637. 10.5588/ijtld.11.0027 [DOI] [PubMed] [Google Scholar]
  • 28.Cabibbe AM, Walker TM, Niemann S, Cirillo DM. Whole Genome Sequencing of Mycobacterium tuberculosis. Eur Respir J. 2018; 1801163 10.1183/13993003.01163-2018 [DOI] [PubMed] [Google Scholar]
  • 29.Walker TM, Ip CLC, Harrell RH, Evans JT, Kapatai G, Dedicoat MJ, et al. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study. Lancet Infect Dis. 2013;13: 137–46. 10.1016/S1473-3099(12)70277-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Kühnert D, Coscolla M, Brites D, Stucki D, Metcalfe J, Fenner L, et al. Tuberculosis outbreak investigation using phylodynamic analysis. Epidemics. 2018;25: 47–53. 10.1016/j.epidem.2018.05.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Davidson JA, Lalor MK, Mohiyuddin T, Loutet M, Uddin T, Venugopalan S, et al. Tuberculosis in England 2016 Report (presenting data to end of 2015). Public Heal Engl Rep. 2016. [Google Scholar]
  • 32.Danon L, House TA, Read JM, Keeling MJ. Social encounter networks: collective properties and disease transmission. J R Soc Interface. 2012;9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Clauset A, Shalizi CR, Newman MEJ. Power-law distributions in empirical data. SIAM Review. 2009. pp. 661–703. 10.1137/070710111 [DOI] [Google Scholar]
  • 34.Gillespie CS. Fitting heavy tailed distributions: The powerlaw package. J Stat Softw. 2015;64: 1–16. 10.18637/jss.v064.i02 [DOI] [Google Scholar]
  • 35.Becker N. On Parametric Estimation for Mortal Branching Processes. Biometrika. 1974;61: 393–399. 10.2307/2334720 [DOI] [Google Scholar]
  • 36.Brooks-Pollock E, Becerra MC, Goldstein E, Cohen T, Murray MB. Epidemiologic inference from the distribution of tuberculosis cases in households in Lima, Peru. J Infect Dis. 2011;203: 1582–9. 10.1093/infdis/jir162 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Bulmer MG. On Fitting the Poisson Lognormal Distribution to Species-Abundance Data. Biometrics. 1974;30: 101 10.2307/2529621 [DOI] [Google Scholar]
  • 38.Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MP. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface. 2009;6: 187–202. 10.1098/rsif.2008.0172 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Brooks-Pollock E, Roberts GO, Keeling MJ. A dynamic model of bovine tuberculosis spread and control in Great Britain. Nature. 2014;511: 228–231. 10.1038/nature13529 [DOI] [PubMed] [Google Scholar]
  • 40.Jabot F, Faure T, Domoulin N, Albert C. EasyABC: Efficient Approximate Bayesian Computation Sampling Schemes. 2015. [Google Scholar]
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007687.r001

Decision Letter 0

Jason A Papin, Roger Dimitri Kouyos

4 Sep 2019

Dear Dr Brooks-Pollock,

Thank you very much for submitting your manuscript 'A universal model of tuberculosis clustering in low incidence countries reveals more transmission in the United Kingdom than the Netherlands between 2010 and 2015' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Roger Dimitri Kouyos

Associate Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Dr Brooks-Pollock and colleagues present an analysis of TB genotyping data in the United Kingdom and the Netherlands using branching processes. The methods are interesting and well suited to the objectives. The result that improvements in TB prevention in the UK similar to what has been done in the Netherlands could lead to an important decrease of TB incidence is convincing and important. However, several points of the paper need to be improved before publication, in particular the lack of rigor regarding the use and reporting of statistical methods.

Major points.

I have a problem with the way statistical inference and causality are handled in the results. First, the formulation suggests causality when only association is inferred (e.g. line 125: “having pulmonary disease increased the likelihood of belonging to a cluster”). In this example, the association could be explained by confounding factors, and even if causality exists it could go in the other direction, from belonging to a cluster to having pulmonary disease. The authors need to be a lot more careful with this very basic concept.

Second, the authors need to choose a framework for statistical inference and stick to it. If a null hypothesis significance testing framework is chosen (as the use of p values suggests), then the authors should systematically report effect sizes, confidence intervals and exact p-values for each inferred relationship, and not for instance R-squared values (line 127), nothing at all (line 135) or only “p value >0.10” (line 139).

Third, I’m very surprised to see the authors interpret the absence of statistical significance as an absence of effect, especially since this common mistake got a lot of publicity a few months ago with the Nature paper by Amrhein, Greenland and McShane. The authors should reformulate their conclusions on lines 126 (“had no effect”), 138 (“no consistent relationship”) and 345 (“there was no association”).

Fourth, a visual inspection of the fit (which is not even shown) is not sufficient evidence to infer that a power law “describes well” the data (line 143) or that a Poisson-lognormal model “captures the entire distribution” better than a negative binomial model. The authors should use model selection tools (AIC, DIC, cross-validation…) to be able to support such claims. A plot comparing the fits of different models might also help convince the readers.
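As an illustration of the kind of model comparison requested in the fourth point, the following minimal Python sketch compares two count models by AIC. The counts are invented for illustration, and the Poisson and geometric models are stand-ins chosen only because their maximum-likelihood estimates have closed forms; they are not the Poisson lognormal and negative binomial models discussed in the manuscript.

```python
import math

def poisson_loglik(counts):
    """Maximised Poisson log-likelihood; the MLE of the rate is the sample mean."""
    lam = sum(counts) / len(counts)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def geometric_loglik(counts):
    """Maximised geometric log-likelihood on {0, 1, 2, ...}; MLE q = 1 / (1 + mean)."""
    mean = sum(counts) / len(counts)
    q = 1.0 / (1.0 + mean)
    return sum(math.log(q) + k * math.log(1.0 - q) for k in counts)

def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * loglik

# Hypothetical secondary-case counts: mostly zeros with a few large values.
counts = [0] * 80 + [1] * 10 + [2] * 5 + [3] * 3 + [8, 12]

aic_poisson = aic(poisson_loglik(counts), n_params=1)
aic_geometric = aic(geometric_loglik(counts), n_params=1)
print(f"AIC Poisson:   {aic_poisson:.1f}")
print(f"AIC geometric: {aic_geometric:.1f}")
```

With overdispersed counts such as these, the heavier-tailed model is expected to obtain the lower AIC, which is the type of quantitative evidence a visual inspection alone cannot provide.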

Minor points

- Abstract/intro: The given intervals should be systematically defined.

- Line 87: I think the authors mean exactly the opposite.

- Line 103: The authors need to define what recent transmission means earlier.

- Lines 103-204: The authors should make it clear that these results are predictions conditional on the chosen model.

- Line 107: The title of the paragraph is “cluster size distribution” but no distribution is shown beyond the proportion of clusters of size 1. I would suggest showing the histograms in Fig. 1, together with the fits as suggested before.

- Line 115: No explanation is given regarding the time limits applied here.

- Line 184: Figure 3 is absolutely terrible and impossible to understand. The authors should definitely find a better way than pie charts to visualize the distribution of secondary cases.

- Line 262: Including cases that were not genotyped wouldn’t alter the estimates of transmission only if these cases are missing at random. This should be discussed in the context of the UK and the NL.

- Line 396: Where does this definition of superspreading come from? If it was used in previous works a reference should be added.

- Line 416: It is a bit clumsy to qualify approximate Bayesian computation as an exact likelihood method.

Reviewer #2: This study analyses large data sets of clusters of tuberculosis based on MIRU-VNTR typing from the UK and from the Netherlands. In both places the incidence of TB is dropping so the effective reproduction number is below one. By fitting a model that includes the importation of new cases and a subcritical branching process to the data the authors are able to estimate the effective reproduction number and assess the predicted distribution of cluster sizes under the model. The focus is on pulmonary disease (the proportion of cases that are pulmonary is similar across countries and across cluster sizes). Fitting is performed with approximate Bayesian computation. The authors highlight the importance of superspreaders who transmit >10 cases each and discuss the epidemiological implications of reducing disease transmission.

This study makes excellent use of the large genetic data set; the model is simple yet able to reproduce features of the data successfully. A major issue, however, is that the manuscript feels incomplete at times - it's as if details are assumed to be obvious or already known or perhaps unimportant. Crucial details are missing particularly in the description of the simulation model and the ABC inference. I think it would be better to supply details especially since the current manuscript is not terribly long. Some examples are given below among specific comments and suggestions.

- Title: What is meant by "universal" in the title?

- Luciani et al 2008, Infection, Genetics and Evolution, considered the distribution of TB cluster sizes using population models. It could be worth looking at as part of the background information.

- L 251. I appreciate the argument that WGS alone may not provide enough information to estimate reproduction numbers, but this sentence is a bit unclear. I would explain why in terms of genetic markers and mutation rates.

- L 268 "However previous analysis found that right censoring..." This sentence is vague and cryptic -- spell out what the problem is. Right censoring of what? Why did it not affect the overall results? What is the meaning of "overall" results?

- L 290 This sentence mentions "at least 23 loci" before defining the 24-locus MIRU-VNTR typing. I would define VNTR first for clarity.

- L 354 m is the result of a binomial distribution; p is given but what is the "n" (number of trials) parameter? At first I thought it might be prevalence but I saw that later it is the quantity C(X), though I found that confusing as I'll explain below.

- L 359 "(see results)" Be more specific; say "see Figure 1" if that's what is meant.

- L 367 I find this equation unintuitive in the way it defines C(x) recursively. I can see that in a subcritical branching process the sum of r_i can go up to the total number of cases, but I don't understand the binomial term. How is the expression used computationally when the binomial term requires C(x) which it also contributes to? Could you please clarify and/or give details of how the expression is actually used in the simulation?

- L384 Use of the Poisson-lognormal. This is fair enough but provide a reference from the ecological literature, e.g. Bulmer 1974, Biometrics.

- L395 Eqn 2. Where does this come from? Explain or give a reference. I would also add nearby the condition that R<1.

- L 396 Is a superspreader one who generates more than 10 cases under the same model? Or has a different underlying rate? I believe it's the former but please clarify.

- L 415 The paragraph about the application of ABC is too brief in my opinion. I don't even know what parameters are being estimated. E.g. is p an unknown parameter to be estimated or set to some value? Give details of the prior distributions for them and the rationale for the choices. What's being estimated and what is fixed and assumed to be known?

What distance metrics are used? What do the posterior distributions of the parameters look like? Would be good to see them along with credible intervals, as they are the values that generate the model predictions in Figure 2.

I assume the "95% CIs" given in L170-204 are the credible intervals from the posteriors. If so, make that clearer. Since there are multiple ways to compute credible intervals, how are they actually computed here? "CI" is often read as "Confidence interval" which is a frequentist concept and presumably not what is meant here; clarify what you mean.

- Fig 1 The panel labels seem to be missing. On the pulmonary axis, perhaps add "UK" and "NL" somewhere.

- Fig 2. Missing panel labels again.

- Fig 3. The pie chart is not visually helpful. There is a large sector and many small sectors. The scrunched up sectors have to be expanded to show another unhelpful pie. Could you find another way to show the data? A histogram or a table perhaps?
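
The comment on L170-204 notes that credible intervals can be computed in more than one way. A minimal Python sketch of two common choices, the equal-tailed interval and the shortest (highest-density) interval, is given here for illustration. The posterior draws are simulated from a skewed lognormal distribution purely as a stand-in, and the function names are hypothetical; this does not reproduce how the intervals in the manuscript were obtained.

```python
import random

def central_count(n, level):
    """Number of posterior draws an interval at this level should contain."""
    return max(1, round(level * n))

def equal_tailed_interval(samples, level=0.95):
    """Drop the same number of draws from each tail of the sorted sample."""
    s = sorted(samples)
    k = central_count(len(s), level)
    tail = (len(s) - k) // 2
    return s[tail], s[tail + k - 1]

def hpd_interval(samples, level=0.95):
    """Shortest window of sorted draws containing the required count.

    This approximates the highest-density interval for a unimodal posterior.
    """
    s = sorted(samples)
    k = central_count(len(s), level)
    start = min(range(len(s) - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]

# Stand-in posterior draws (assumption: skewed, positive-valued parameter).
rng = random.Random(1)
draws = [rng.lognormvariate(-1.0, 0.4) for _ in range(10_000)]
print("equal-tailed:   ", equal_tailed_interval(draws))
print("highest density:", hpd_interval(draws))
```

For a skewed posterior the two intervals differ, which is why stating the method alongside the "95% CI" label removes ambiguity.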

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: None

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007687.r003

Decision Letter 1

Jason A Papin, Roger Dimitri Kouyos

16 Dec 2019

Dear Dr Brooks-Pollock,

Thank you very much for submitting your manuscript, 'A model of tuberculosis clustering in low incidence countries reveals more transmission in the United Kingdom than the Netherlands between 2010 and 2015', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We would therefore like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues raised.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled 'Dataset', 'Figure', 'Table', 'Text', 'Protocol', 'Audio', or 'Video'.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org.

If you have any questions or concerns while you make these revisions, please let us know.

Sincerely,

Roger Dimitri Kouyos

Associate Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I am satisfied with the revision.

Reviewer #2: The manuscript is improved and most of my suggestions/queries have been incorporated/addressed.

However, I would like to see the description of the model further developed near lines 354-365. The description of the branching process model is better but needs more work. The dummy variable i is used in a confusing way. First, it should be incremented somewhere - otherwise it stays at i=1 permanently and the loop can't be exited. Second, all cases become "relabelled" with respect to i after each "round" but this is not made explicit. Third, the value i=1 starts as the index case but after the first round i=1 is no longer the index case.

The equation on L364 still seems odd and misleading to me. As written, C(x) depends on itself but actually, this is a recursive function which is evaluated iteratively as the algorithm shows. This means that the C(x) on the left hand side is not actually the same quantity as the C(x) on the right hand side. After each round the C(x) must be updated.

I think the iterative structure of the calculations should be made more explicit. My understanding of the algorithm (and the accompanying equation) is as follows.

As described in the manuscript r_i is the number of secondary cases per case, distributed as specified by the model. Define C(X,j) to be the total number of cases of genotype X after j iterations and M(X,j) to be the number of imported cases (of genotype X) after j iterations.

1. Set initial conditions:

j = 0,

C(X,0) = 1 and r_i = 1 when j = 0 (the index case),

M(X,0) = 0 (no imports until the outbreak has begun)

2. Compute

M(X,j) ~ Binomial(C(X,j), p)

C(X,j+1) = \sum_{i=1}^{C(X,j)} r_i + M(X,j)

3. Assign C(X,j) := C(X,j+1) and increment j to j+1

4. Repeat the recursion (steps 2 and 3) until C(X,j+1) = 0.
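
The iteration outlined in steps 1 to 4 can be sketched in Python as follows. This is one reading of the steps, not the authors' implementation: it assumes that r_i is Poisson distributed with a lognormally distributed rate (the Poisson lognormal form named in the abstract), that the cluster size is the running total of C(X,j) over all rounds, and that a cap on the number of rounds is acceptable as a safeguard. The parameter names `mu`, `sigma` and `p`, their values, and the helper `draw_poisson` are hypothetical.

```python
import random

def draw_poisson(lam, rng):
    """Poisson(lam) draw: count unit-rate exponential arrivals up to time lam."""
    t, k = rng.expovariate(1.0), 0
    while t <= lam:
        k += 1
        t += rng.expovariate(1.0)
    return k

def simulate_cluster(mu, sigma, p, rng, max_generations=1000):
    """Return the total number of cases in one simulated cluster.

    `current` plays the role of C(X, j), the cases active in round j.
    """
    current = 1  # step 1: C(X, 0) = 1, the index case
    total = 1
    for _ in range(max_generations):  # assumed safeguard, not in the steps above
        if current == 0:  # step 4: stop once a round produces no cases
            break
        # r_i ~ Poisson(lambda_i) with lambda_i ~ Lognormal(mu, sigma)
        secondary = sum(
            draw_poisson(rng.lognormvariate(mu, sigma), rng) for _ in range(current)
        )
        # step 2: M(X, j) ~ Binomial(C(X, j), p), drawn as a sum of Bernoulli trials
        imported = sum(rng.random() < p for _ in range(current))
        # steps 2 and 3: C(X, j+1) = sum_i r_i + M(X, j), then advance j
        current = secondary + imported
        total += current
    return total

rng = random.Random(0)
sizes = [simulate_cluster(mu=-1.5, sigma=1.0, p=0.05, rng=rng) for _ in range(1000)]
print("proportion of clusters of size 1:", sizes.count(1) / len(sizes))
print("largest simulated cluster:", max(sizes))
```

The illustrative parameters give a mean of exp(mu + sigma^2/2) + p, roughly 0.42 new cases per case per round, so the process is subcritical and each simulated cluster terminates with probability one.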

Minor comments.

L267 comma instead of full-stop/period in front of "thereby"

L276 reword by moving clause "such as contact tracing" next to "Control policies"

L420 Are all the priors uniform (or "flat")? Whatever they are, supply the information.

L421 I'm assuming that R>0 is also a condition so that 0<R<1.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: None

Reviewer #2: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007687.r005

Decision Letter 2

Jason A Papin, Roger Dimitri Kouyos

16 Jan 2020

Dear Dr Brooks-Pollock,

We are pleased to inform you that your manuscript 'A model of tuberculosis clustering in low incidence countries reveals more transmission in the United Kingdom than the Netherlands between 2010 and 2015' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process.

One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org).

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.

Sincerely,

Roger Dimitri Kouyos

Associate Editor

PLOS Computational Biology

Jason Papin

Editor-in-Chief

PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: The description of the model now makes a lot more sense.

This paper is a fine contribution which I look forward to seeing published.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007687.r006

Acceptance letter

Jason A Papin, Roger Dimitri Kouyos

17 Mar 2020

PCOMPBIOL-D-19-01185R2

A model of tuberculosis clustering in low incidence countries reveals more transmission in the United Kingdom than the Netherlands between 2010 and 2015

Dear Dr Brooks-Pollock,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text

    Details of the models: S1.1) Posterior distributions for the model parameters; S1.2) Comparison between the Poisson lognormal model and the negative binomial distribution model fits for the UK and the NL; S1.3) Posterior distribution for the proportion of cases not due to recent transmission; S1.4) Posterior distribution for the reproduction number in the UK and the NL.

    (PDF)

    S1 Data. Number of clusters by size for the UK and the NL.

    (CSV)

    Attachment

    Submitted filename: ReviewerComments.docx

    Attachment

    Submitted filename: ReviewerComments.docx

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
