2021 Sep 20;6(10):1279–1288. doi: 10.1038/s41564-021-00963-3

Dissemination of Mycobacterium abscessus via global transmission networks

Christopher Ruis 1,2, Josephine M Bryant 1, Scott C Bell 3,4, Rachel Thomson 5, Rebecca M Davidson 6, Nabeeh A Hasan 6, Jakko van Ingen 7, Michael Strong 6, R Andres Floto 1,8,9, Julian Parkhill 2
PMCID: PMC8478660  PMID: 34545208

Abstract

Mycobacterium abscessus, a multidrug-resistant nontuberculous mycobacterium, has emerged as a major pathogen affecting people with cystic fibrosis (CF). Although originally thought to be acquired independently from the environment, most individuals are infected with one of several dominant circulating clones (DCCs), indicating the presence of global transmission networks of M. abscessus. How and when these clones emerged and spread globally is unclear. Here, we use evolutionary analyses of isolates from individuals both with and without CF to reconstruct the population history, spatiotemporal spread and recent transmission networks of the DCCs. We demonstrate synchronous expansion of six unrelated DCCs in the 1960s, a period associated with major changes in CF care and survival. Each of these clones has spread globally as a result of rare intercontinental transmission events. We show that the DCCs, but not environmentally acquired isolates, exhibit a specific smoking-associated mutational signature and that current transmission networks include individuals both with and without CF. We therefore propose that the DCCs initially emerged in non-CF populations but were then amplified and spread through the CF community. While individuals with CF are probably the most permissive host, non-CF individuals continue to play a key role in transmission networks and may facilitate long-distance transmission.

Subject terms: Bacterial genetics, Pathogens, Genetic variation


In this Article, the authors perform evolutionary analyses of M. abscessus clinical isolates and report the emergence of dominant circulating clones (DCCs) in non-cystic fibrosis (CF) individuals followed by amplification in the CF community.

Main

Mycobacterium abscessus is a multidrug-resistant species of nontuberculous mycobacteria (NTM) that has recently emerged as a major threat to individuals with CF, with increasing rates of infection seen in CF cohorts around the world1. This rapidly growing NTM is divided into three subspecies: M. abscessus subspecies abscessus (M. a. abscessus), M. a. massiliense and M. a. bolletii2. Infections with M. abscessus lead to accelerated inflammatory lung damage3, are often difficult or impossible to treat despite prolonged courses of combination antibiotics4–6 and may prevent safe lung transplantation1,7.

Although M. abscessus was originally thought to only be independently acquired from the environment, we8 and others9 have shown that individuals with CF can become infected through hospital-based person-to-person transmission (probably through the generation of long-lived infectious aerosols or via fomite spread2). Indeed, large scale whole genome sequencing of M. abscessus isolates from CF centres around the world has revealed that most individuals with CF are infected with one of three highly prevalent globally dispersed clones, referred to as DCCs 1–3 (ref. 2). Assigning the DCC classification onto whole genome sequences from other studies indicates that the DCCs can also infect non-CF individuals10,11. DCC isolates are associated with worse clinical outcomes, have greater antibiotic resistance and are more virulent in in vitro and in vivo infection models2, suggesting that multiple rounds of within-host evolution have promoted increased pathogenic potential.

The high levels of genetic relatedness within the DCCs suggest that they have emerged recently and have rapidly spread within and between countries, as well as across continents2,10,12,13. There are many examples of isolates from individuals in different CF centres or in different countries whose sequences differ by fewer mutations than have been seen in a single individual during chronic infection2,14, suggesting individuals are linked by recent and widespread transmission networks. However, despite multiple studies combining whole genome sequencing with epidemiological data2,8,9,13–18, the relative importance of different routes of acquisition remains unclear. In addition, the mechanism of transcontinental spread of DCCs6 (given the lack of movement of CF individuals) is unknown. Several studies have found near-identical isolates in CF individuals with no obvious epidemiological links15–18, suggesting that transmission chains include additional unknown links, potentially implicating environmental or human intermediates.

To understand how the M. abscessus DCCs emerged and the routes through which they have spread globally, we have applied evolutionary phylogenetic analyses to whole genome sequences of clinical isolates from 1,178 individuals on five continents. We show that the DCCs expanded synchronously around the time of increases in CF life expectancy, spread globally (particularly from Europe to North America) and contain a mutagenic signature of exposure to smoking-related mutagens during their expansion, implicating smokers and more generally non-CF individuals, as an important second human niche for M. abscessus.

Results

Synchronous clonal expansions coincident with the emergence of the CF lung niche

We first reconstructed the phylogenetic relationships between 2,045 M. abscessus whole genome sequences from 1,178 individuals to identify recently emerged clones whose ancestry could be dated. We identified seven clonal clusters, including the three previously identified DCCs2, that contained highly related isolates from at least 20 individuals (Fig. 1a and Extended Data Fig. 1) and were found on multiple continents (Fig. 1b and Supplementary Table 1). We classified these clones as DCC1–7.

Fig. 1. Emergence of M. abscessus DCCs.

a, The phylogenetic placement of the DCCs in the M. abscessus species phylogenetic tree. Several non-DCC clades have been collapsed for clarity; the full tree is presented in Extended Data Fig. 1. b, The proportion of individuals from each location infected with each DCC. All samples from Brazil are from a surgical outbreak caused by a single lineage30. c, Temporal maximum clade credibility phylogenetic tree of each DCC. DCC most recent common ancestor dates are shown as median with 95% HPD in parentheses. Shaded areas show the first period of rapid divergence in each DCC. d, Expansion date distributions for each DCC as calculated from the complete posterior distribution of trees (number of samples from the posterior distribution: DCC1 = 2,255, DCC2 = 2,255, DCC3 = 2,255, DCC4 = 2,699, DCC5 = 2,702, DCC6 = 2,613, DCC7 = 2,703). Boxplot centre lines show median value; upper and lower bounds show the 25th and 75th quantile, respectively; upper and lower whiskers show the largest and smallest values within 1.5 times the interquartile range above the 75th percentile and below the 25th percentile, respectively; and points show samples outside the whisker range.

Extended Data Fig. 1. Population structure of M. abscessus.

A phylogenetic tree was reconstructed containing all isolates from all individuals (n = 1335 for M. abscessus subsp abscessus (M. a. abscessus), n = 710 for M. a. massiliense). As Fig. 1a but no clades were collapsed in this tree. The sequences that cluster within each DCC are labelled. FastBAPS clusters are plotted next to the tree for M. a. abscessus (green) which contains 19 FastBAPS clusters and M. a. massiliense (black) which contains 17 FastBAPS clusters.

Bayesian temporal reconstruction19 indicated that each of the DCCs emerged relatively recently, with their most recent common ancestors dating between 1902 and 1999 (Fig. 1c). The dated phylogenetic tree of each DCC is characterized by a section with multiple concurrent short branches, indicating a historical period of rapid expansion from a small number of lineages into multiple independent transmission chains that have persisted to the present day (Fig. 1c). In support of this conclusion, we found strong statistical evidence (posterior probability of one for each DCC, Methods) that each DCC had undergone at least one historical population expansion, the first of which occurred for six of the seven DCCs in the 1960s and for DCC4 much later (Fig. 1d).

We proposed that the expansion in DCCs could have been driven by changes within the CF population since M. abscessus has become a main cause of lung infection in individuals with CF1, can transmit onwards from CF individuals2,8 and is more prevalent in this cohort than in other patient groups20. We found that the DCC expansion occurred shortly after life expectancy of CF individuals began to increase from infancy to over 10 years of age in the 1950s and 1960s (Extended Data Fig. 2)21,22. This period also saw the establishment of CF treatment centres and the widespread use of antipseudomonal and antistaphylococcal antibiotics21 (to which M. abscessus is intrinsically resistant) (Extended Data Fig. 2), suggesting that the synchronous expansion of DCCs in the 1960s may have been driven both by increases in the number and density of susceptible individuals (promoting person-to-person transmission2,8) and also the creation of a more permissive lung niche for M. abscessus, through suppression of other lung pathogens1,23. The more recent expansion of DCC4 in the mid-2000s (Fig. 1d) suggests that suitable conditions for the emergence of new clones have persisted within present day patient cohorts.

Extended Data Fig. 2. Drivers of the concurrent DCC expansion.

The bottom panel shows the expansion date distribution for each of the DCCs, as in Fig. 1d (number of samples from the posterior distribution: DCC1 = 2255, DCC2 = 2255, DCC3 = 2255, DCC4 = 2699, DCC5 = 2702, DCC6 = 2613, DCC7 = 2703). Boxplot centre lines show median value, upper and lower bounds show the 25th and 75th quantile, respectively, upper and lower whiskers show the largest and smallest values within 1.5 times the interquartile range above the 75th percentile and below the 25th percentile, respectively, and points show samples outside of the whisker range. The top panel shows factors within the CF population that likely drove DCC expansion. The middle panel shows the trend in cigarette sales in developed countries; each blue line is the trend in a single country and the black line is the average trend across countries combined.

Rare intercontinental transmission events drive global transmission networks

We next investigated the global transmission network of M. abscessus by reconstructing the broad-scale geographical distribution of the three most prevalent clones, DCCs 1–3. In each case, isolates from different continents are interspersed within the phylogenetic tree (Fig. 2a), demonstrating historical intercontinental transmission events that are relatively uncommon, with approximately one out of every 50 lineages moving between continents each year (Fig. 2b). Furthermore, we found a significant correlation between the age of a clade and the number of continents it has been detected in, with most newly emerged clades only present in one continent (Extended Data Fig. 3). However, despite these low overall rates of intercontinental transmission, we found strong evidence for migration of M. abscessus between specific pairs of continents and, in particular, of migration of all three main DCCs from Europe to North America (Fig. 2c), potentially driven by the high density of CF individuals in Europe24,25.

Fig. 2. Global migration network of the DCCs.

a, Phylogeographic trees for DCCs 1–3. Each branch is coloured by continent. b, Intercontinental migration rates for each DCC for the complete dataset and five random subsamples (Methods, number of samples from the posterior distribution: DCC1 complete dataset of 3,157, all DCC1 subsampled datasets, DCC2 subsampled datasets 1, 3 and 5, all DCC3 subsampled datasets of 2,703 each; DCC2 complete dataset, DCC2 subsampled datasets 2 and 4, DCC3 complete dataset of 2,256 each). Boxplot centre lines show median value; upper and lower bounds show the 25th and 75th quantile, respectively; upper and lower whiskers show the largest and smallest values within 1.5 times the interquartile range above the 75th percentile and below the 25th percentile, respectively; and points show samples outside the whisker range. The median and 95% HPD of the complete DCC datasets are shown above the bars of the corresponding DCC. An intercontinental migration rate of 0.02 migrations per lineage per year corresponds to one in 50 lineages moving continent in a year. c, Intercontinental migration pathways for each DCC. Arrows show the direction of supported migration from source continent to recipient continent. Arrows are coloured by originating continent.

Extended Data Fig. 3. Older phylogenetic clades contain sequences from more continents than more recent phylogenetic clades.

The age of each clade and the number of continents from which its descendent sequences were sampled were calculated for all clades across the posterior distribution of phylogenetic trees (number of clades: DCC1 = 539847, DCC2 = 173635, DCC3 = 182655). Boxplots depict all clades from all trees. The distribution of clade ages is significantly different for all pairs of continent numbers for all DCCs (Bonferroni corrected two sample two-sided Kolmogorov-Smirnov test, p < 0.001).

Mutational spectrum analysis implicates smokers in DCC transmission networks

While our results indicate a major role for individuals with CF in the expansion of DCCs, it is clear from our temporal reconstruction that several of the DCCs emerged within the human population before CF survival increased beyond early childhood (Fig. 1c), suggesting an alternative pre-existing niche. Similarly, the transcontinental spread of DCCs is difficult to explain without an alternative human vector for transmission, given the historically limited travel of CF individuals.

We therefore proposed that individuals who smoke, given their recognized predisposition to mycobacterial infection26–29, might have provided the early niche for DCC emergence and facilitated their global dissemination. To test this hypothesis, we performed a mutational spectrum analysis of the M. abscessus genomes. The mutational spectrum is the pattern of different types of DNA mutation arising as a consequence of the action of various mutagenic processes and DNA repair mechanisms over time30,31. Since bacteria are likely to be exposed to different mutagens during pulmonary infection than when in the environment, we reasoned that we could use mutational spectrum analysis to define the historical exposure of DCCs to these different niches.

We first calculated the spectrum of mutations in M. abscessus attributable to the environment, by analysing mutations occurring along the internal phylogenetic branches of the non-DCC clades, which constitute long periods of predominantly or exclusively environmental replication (Fig. 3a,b). We were then able to define the specific mutational profile arising only during chronic infection of CF individuals (by examining within-patient longitudinal samples), and the profile generated during the expansion and dissemination of DCCs (by looking at the internal branches of the DCC phylogenies), which were both notably different from the spectrum attributable to the environment and from each other (Fig. 3c).

Fig. 3. DCCs exhibit a smoking-related mutational spectrum.

a, Branch classification for each branch type. The left-hand tree illustrates the categorization of non-DCC branches into: within patient, which occur downstream of a node in the tree where all of the descendent sequences are from the same patient; tip branches and branches leading to within-patient clade ancestors where some of the evolution may have occurred within a patient; and internal branches where all the change is likely to have occurred in the environment. The right-hand tree illustrates the additional categorization of DCC branches into deep branches that precede the first population expansion. Note that in DCCs, all branch categories are likely to be within patients, not the environment. b, Inferred environmental mutational spectrum of M. abscessus as calculated from the internal branches of the non-DCCs. The mutational spectrum consists of six different mutations, shown in different colours, each in 16 different nucleotide contexts25. The bars show the proportion of the total mutations of that type in that context along the non-DCC internal branches. c, Mutational spectra attributable to mutations occurring (top) within patients with CF (on the basis of longitudinal isolates collected from infected individuals), and (bottom) during expansion and dissemination of the DCCs (on the basis of analysis of the internal phylogenetic branches of the DCCs excluding the deep branches, that is, after the population expansion). Results are shown as the difference from the environmental spectrum. Asterisks indicate significantly different proportions relative to the environmental mutational spectrum, calculated through permutation analysis. d, Deconvolution analysis reveals relative contributions of specific mutational processes to each spectrum: unassigned (grey), spontaneous (yellow), alkylating agent–associated (purple), ROS-associated (red) and smoking-related (blue).
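
The six mutation types in 16 nucleotide contexts described in panel b give 96 channels in total. As a minimal sketch of how a single SNP could be assigned to one of these channels, the Python below assumes the common convention of expressing the mutated base as a pyrimidine (C or T) and flipping purine-reference mutations to the opposite strand. The text does not state which convention the authors used, and the function names `classify_snp` and `spectrum` are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snp(upstream, ref, alt, downstream):
    """Return (mutation_type, channel) for one SNP and its flanking bases.

    Assumption: the reference base is reported as a pyrimidine, so a
    mutation at A or G is converted to its reverse-complement equivalent.
    """
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if ref in "AG":
        # Reverse complement: the flanking bases swap sides and are complemented.
        upstream, downstream = COMPLEMENT[downstream], COMPLEMENT[upstream]
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    mutation_type = f"{ref}>{alt}"
    return mutation_type, f"{upstream}[{mutation_type}]{downstream}"


def spectrum(snps):
    """Proportion of mutations falling in each observed channel.

    `snps` is an iterable of (upstream, ref, alt, downstream) tuples.
    """
    counts = Counter(classify_snp(*snp)[1] for snp in snps)
    total = sum(counts.values())
    return {channel: n / total for channel, n in counts.items()}
```

Under this convention a G-to-T change and a C-to-A change on the opposite strand land in the same channel, which is why the text can refer to "C-to-A/G-to-T mutations" as a single class.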

Decomposition analysis32 (Fig. 3d) suggests that the environmental mutational spectrum consists of contributions from spontaneous mutation events (arising from cytosine deamination and inefficient homologous recombination repair) and mutations caused by alkylating agents, some of which have previously been reported to be present in soil33,34. In contrast, the within-CF patient spectrum has contributions from mutations associated with exposure to reactive oxygen species (ROS)30, thought to be generated at high levels within the CF lung35.

We found that the DCC-specific spectrum, in addition to having some contribution from ROS-related mutagenesis, was dominated by a general (context-independent) elevation in C-to-A/G-to-T mutations, a pattern seen in lung cancers from smokers36 and in human cells exposed to tobacco smoke in vitro30 (Fig. 3d). We can therefore infer that the DCC expansion and transmission networks involve smokers as well as individuals with CF.

Individuals with CF and without CF (non-CF) are involved in DCC transmission networks

Given the mutational evidence of involvement of smokers in the initial emergence and continued transmission of the DCCs, we sought to quantify whether current transmission networks included both those with and without CF, or whether they form discrete transmission chains. We found that isolates from the lungs of individuals with and without CF could both be found in all seven DCCs (Fig. 4a, Extended Data Fig. 4a and Supplementary Table 2) and in non-DCCs (Extended Data Fig. 4b), showing that all of the emergent clones are capable of infecting members of either group. We then linked individuals on the basis of the genetic distance of their isolates in terms of single nucleotide polymorphisms (SNPs), using a range of linkage cut-offs that capture possible person-to-person transmission2. This demonstrated many close acquisition linkages between individuals within both DCC and non-DCC clades (Fig. 4b and Extended Data Fig. 5), showing that M. abscessus transmission is not limited to just the DCCs and raising the possibility of the emergence or discovery of more circulating clones in the future. We showed that local linkages dominate at smaller genetic distances, with national and then international linkages becoming more frequent at higher SNP distances (Fig. 4c and Extended Data Fig. 5); the expected signature of geographical spread of transmissible clones. We were also able to show that, at all genetic distances, linkages could be identified between individuals with CF, between individuals with and without CF and directly between individuals without CF (Fig. 4c and Extended Data Fig. 5) (although the relative proportions of these will be confounded by the different levels of sampling in the two groups). To confirm this apparent linkage between individuals with and without CF, we reconstructed transmission networks on the basis of SNP cut-offs of 10 and 38 SNPs (representing likely and possible transmission based on within-host diversity2, Fig. 4d). 
At both cut-off levels, transmission networks exist that include both individuals with and without CF, with each group exhibiting strong connectivity within the network (Extended Data Fig. 6). We further investigated the largest network (which consists of isolates from DCC1 from both individuals with and without CF, Fig. 4e) and found that this network had a global reach, with multiple international and national connections evident.
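
A transmission network of this kind can be viewed as a graph in which individuals are nodes and an edge joins any pair whose isolates fall within the SNP cut-off; clusters are then the connected components. The sketch below illustrates that idea with a small union-find structure. It is not the authors' implementation: the function name, the input format and the choice to treat the cut-off as inclusive are all assumptions.

```python
def transmission_clusters(distances, cutoff):
    """Group individuals into clusters of linked cases.

    `distances` maps (individual_a, individual_b) to the minimum SNP
    distance between their isolates. Pairs at or below `cutoff` are
    linked (inclusive boundary is an assumption). Returns the clusters
    that contain at least two individuals.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        root_a, root_b = find(a), find(b)
        if d <= cutoff and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for individual in list(parent):
        clusters.setdefault(find(individual), set()).add(individual)
    return [members for members in clusters.values() if len(members) > 1]


# Hypothetical distances between four individuals.
example = {("p1", "p2"): 4, ("p2", "p3"): 30, ("p3", "p4"): 120}
likely = transmission_clusters(example, cutoff=10)    # p1 and p2 only
possible = transmission_clusters(example, cutoff=38)  # p1, p2 and p3
```

Because components are transitive, two individuals can share a cluster without being directly linked, as with p1 and p3 at the 38-SNP cut-off in the example.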

Fig. 4. Transmission network analyses.

a, Number of isolates in each DCC from patients with CF and without (non-CF). b, Number and proportion of patients infected with DCCs and non-DCC strains showing genetic links at various SNP cut-offs (y axis). c, Number and proportion of linkages of different geographical distances at various SNP cut-offs, and those that are between and within patients with and without CF. d, Network diagrams at 10 and 38 SNP cut-offs showing transmission clusters coloured by patients with and without CF. e, Detail of the largest cluster in the 38-SNP network showing the geographical location of patients and the network diagram coloured by geographical location. Nodes with black and grey outlines indicate city- and country-level information about isolate location, respectively.

Extended Data Fig. 4. CF and non-CF patient isolates are intermingled in DCC phylogenetic trees and infected with non-DCCs.

(a) Maximum likelihood phylogenetic trees are shown for each DCC with tips coloured by CF status. The scale bars show expected nucleotide substitutions per variable genome site. (b) All non-DCC isolates from each subspecies are combined.

Extended Data Fig. 5. Transmission summaries divided into M. a. abscessus and M. a. massiliense.

Pairs of individuals are classified as linked at a given SNP cut-off if their closest related isolates differ by fewer than that number of SNPs. (a) Linkages at each SNP cut-off split by subspecies. (b) Linkages were classified as local if the individuals are from the same hospital, CF center, city or state. Within country linkages are pairs of individuals within the same country that do not meet the criteria for local. (c) Linkages were classified based on the CF status of the two individuals.
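
The linkage rule stated in this legend can be sketched as follows: for each pair of individuals, take the smallest SNP distance over all pairs of their isolates and call the pair linked if that distance is below the cut-off. The code is illustrative only. It assumes equal-length aligned sequences as input and ignores positions with ambiguous bases; both are assumptions, as are the function names.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a non-ACGT character (for
    example N or a gap) are skipped; this handling is an assumption.
    """
    return sum(
        1
        for x, y in zip(seq_a, seq_b)
        if x != y and x in "ACGT" and y in "ACGT"
    )


def linked_pairs(isolates, cutoff):
    """Return {(individual_a, individual_b): closest SNP distance}.

    `isolates` maps each individual to a list of aligned sequences.
    A pair is linked when its closest isolates differ by fewer than
    `cutoff` SNPs, following the wording of the legend.
    """
    links = {}
    for a, b in combinations(sorted(isolates), 2):
        closest = min(
            snp_distance(s, t) for s in isolates[a] for t in isolates[b]
        )
        if closest < cutoff:
            links[(a, b)] = closest
    return links
```

Using the closest pair of isolates, rather than an average, means a single recent shared lineage is enough to link two individuals even when each also carries more divergent isolates.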

Extended Data Fig. 6. CF and non-CF individuals are well connected within the transmission network.

Network connectivity scores were calculated from the transmission network calculated at a cut-off of 38 SNPs presented in Fig. 4d. (a) The number of connections exhibited by each individual in the network was calculated and split into CF and non-CF individuals. The mean number of connections exhibited by an individual with the corresponding CF status is plotted. (b) Each connection in the transmission network was given a weighting based on the minimum SNP distance between isolates from the corresponding individuals, calculated as ‘39 – minimum SNP distance’. Linkages representing fewer SNPs are therefore given a higher weighting: the maximum weighting is 38, if isolates from two individuals differ by 0 SNPs, and linkages at 38 SNPs have a weighting of 1. The weighted linkages of all individuals with the corresponding CF status were summed and divided by the number of individuals with that status. (c) Connections in the transmission network were divided into three categories based on the CF status of the two linked individuals: CF↔CF, CF↔non-CF and non-CF↔non-CF. Linkages were weighted as in (b). All weighted linkages within each type of linkage were summed and divided by the total number of potential linkages between individuals with the corresponding CF status.

Discussion

Halting the growing rate of infections with M. abscessus will require a comprehensive understanding of how individuals acquire these infections and how the bacteria spread at local and global scales. We have applied evolutionary analyses of whole genome sequences to reconstruct the population history, spatiotemporal spread and mutational history of M. abscessus. Our analysis demonstrates a key role for person-to-person transmission in M. abscessus epidemiology. The synchronized expansion of six of the seven DCCs (apparent despite their different sample numbers, sample density and tree depths) was tightly associated with a period of multiple changes in the CF population (Fig. 1d and Extended Data Fig. 2), indicating that the expansion was probably driven by the establishment of person-to-person transmission chains on the emergence of the CF lung niche. It is possible that these expansions may have additionally been promoted by increasing tobacco smoking rates in developed countries seen from the early 1900s to a peak around 1980 (Extended Data Fig. 2).

The rapid diversification in the DCC trees in the 1960s (Fig. 1c) can only be explained by multiple independent transmission chains that were established at this time, have persisted to the present day, and which involve both CF and non-CF individuals (Fig. 4). The age of these clones, the large number of individuals linked at very small SNP distances (Fig. 4), the high frequency of long range national and international linkages, and the correlation between genetic and geographical distance (Fig. 4c) are all patterns expected of a human lung pathogen and effectively rule out the hypothesis that a broadly spread genetically homogeneous environmental clone can explain the infection of CF individuals with related isolates18. While the main niche for the DCCs appears to be within human lungs, transmission is likely to be indirect and involve environmental and fomite intermediates2,8, which may lead to extrapulmonary DCC infections.

The smoking-related mutational signature that we identified within the DCCs (Fig. 3c,d) occurred along the internal phylogenetic branches within these clones, indicating that smokers have contributed to the expansion and continued transmission chains of M. abscessus DCCs. Smokers exhibit increased susceptibility to many infections, including NTMs26,27,29,37,38 and Mycobacterium tuberculosis28, due to structural lung changes and impaired immune responses39–41. Correspondingly, previous studies have frequently identified M. abscessus in smokers6,29,38. It therefore seems likely that tobacco smoke contributes to M. abscessus mutational burden, although overall substitution rates may be similar across different patient groups due to the action of other mutagens, such as high ROS levels in CF individuals35.

Our data therefore support a model where individuals with CF are the most permissive host for M. abscessus infection (as rapid expansion only occurred when this niche became available) but where smokers and other individuals without CF can act as an inefficient reservoir for infection and long-distance transmission. This supports recent evidence that emerging M. abscessus lineages are evolving to become human adapted42. Strong evidence of M. abscessus transmission from individuals with CF2,8 and higher rates of infection in this group compared with other susceptible groups20 support the importance of individuals with CF as key hosts for M. abscessus. Involvement of individuals without CF in transmission chains (Fig. 4) efficiently explains two previously puzzling observations: the missing epidemiological links between individuals with CF infected with highly genetically similar isolates identified in previous studies15–17, and the evidence for international transmission when individuals with CF have historically had limited intercontinental travel. Furthermore, while infections do occur within hospitals8,9, our data indicate that many transmission events occur in other settings. Future studies incorporating a greater number of isolates from individuals without CF will be needed to determine the relative transmission rates between the different groups.

The population expansion that we describe in the DCCs only represents the initial expansion of these clones and epidemiological data indicate that the DCCs have continued to increase in prevalence to the present day1. While additional mechanisms have probably contributed, our data indicate that increasing person-to-person transmission is the key driver of the increased prevalence of M. abscessus. Our previous work2 demonstrated phenotypic differences between clustered and unclustered M. abscessus isolates and comparison with the analysis here showed that that work included isolates from DCCs 1, 2, 3 and also from the newly identified 4 and 6 (but not DCCs 5 or 7). This shows that phenotypes associated with the clustered isolates are not limited to DCCs 1–3. However, further phenotyping comparing DCCs will be useful in future work. The seven DCCs are currently the only clones within the species whose expansion has been detected in our dataset. The more recent expansion of DCC4 (Fig. 1d) potentially occurred when this clone entered the population or had the opportunity to establish onward transmission. This coupled with the evidence for linkage between individuals carrying non-DCC strains (Fig. 4b), suggests that, as a species, M. abscessus is capable of human infection and transmission and that additional clones will continue to expand in the future when they have the opportunity to do so. Phylogeographic reconstruction (Fig. 2a) did not identify the regions within which early circulation occurred and it is therefore difficult to forecast where additional clones may expand in the future. However, our data indicate that it is reasonable to expect the early stages of expansion to occur within CF populations. Continued sequencing of clinical isolates and comparison with global sequences will be important to detect any additional clones that emerge in the future.

Further studies to understand how individuals without CF might transmit M. abscessus (both to individuals with and without CF) are vital and may provide information on how to reduce transmission risk. However, from previous studies2,8,9 it is reasonable to expect transmission to be indirect through the environment, and the role of individuals without CF in contaminating such environments requires particular attention. Comprehensive surveillance of potential environmental intermediates and isolates from individuals with and without CF may better define these transmission pathways.

In conclusion, reconstruction of the evolutionary history of the M. abscessus DCCs supports early emergence into non-CF populations, including smokers, followed by expansion due to person-to-person transmission that began with the emergence of the CF lung niche and has continued with contributions from individuals with and without CF. It also suggests that long-distance transmission may be due to low-level infection of individuals without CF. Our study demonstrates the use of combining whole genome sequencing with evolutionary analyses to disentangle mechanisms of pathogen transmission.

Methods

Dataset construction, cluster identification and definition of DCCs

Whole genome sequencing of two collections of isolates from Manchester, UK, and the Netherlands was carried out as previously described2. Briefly, DNA was extracted from colony sweeps of subcultured samples before paired-end sequencing using the Illumina HiSeq platform. These samples were combined with whole genome sequencing samples from previous studies2,10,12–16,18,43–46 to give a final dataset containing 2,045 samples from 1,178 patients. Samples were genotyped to the subspecies level using Mash47, comparing against one reference genome from each subspecies. Samples were assigned to the subspecies with the smallest genetic distance; all samples exhibited a clear subspecies assignment. A summary of all samples used, including sample accession numbers, is provided in Supplementary Table 3.

Sequencing reads from each sample were mapped against the corresponding subspecies reference sequence using the multiple_mappings_to_bam pipeline (https://github.com/sanger-pathogens/bact-gen-scripts) with BWA-MEM as the aligner. ATCC19977 (accession no. CU458896.1) was used as the reference for M. a. abscessus and CIP_108297 (accession no. GCF_001792625.1) for M. a. massiliense. SNPs were called from this alignment using the multiple_mappings_to_bam pipeline. Subspecies phylogenetic trees were reconstructed using RAxML v.8.2.12 (ref. 48) with the general time reversible (GTR) model of nucleotide substitution and gamma rate heterogeneity with four gamma classes. To enable extraction of maximal genetic variation, clusters of samples were identified in the subspecies trees using FastBAPS49. This clustering identified 19 clusters in M. a. abscessus and 17 clusters in M. a. massiliense (Extended Data Fig. 1). A summary of each FastBAPS cluster is provided in Supplementary Table 1.

All subsequent analyses were carried out on each FastBAPS cluster independently. De novo assembly was carried out on each sample as previously described50 and the best assembly was identified for each FastBAPS cluster on the basis of number of contigs and N50-N90 values (Supplementary Table 1). Samples from each FastBAPS cluster were mapped against their respective best assembly as above to maximize the captured SNP diversity. Recombination was removed from FastBAPS cluster alignments using Gubbins v.2.4.1 (ref. 51) and phylogenetic trees were reconstructed for each FastBAPS cluster as above. Phylogenetic trees were viewed and figures constructed using FigTree52 and GGTREE53.

DCCs were identified as clusters of highly related sequences collected from at least 20 patients on multiple continents. DCCs 1, 2, 3, 5 and 7 are FastBAPS clusters while DCCs 4 and 6 are subclusters within a FastBAPS cluster (Supplementary Table 1). No DCCs were identified in M. a. bolletii.
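As a rough illustration of the patient and continent criteria, the sketch below checks whether a cluster meets them. The function name and input layout are assumptions made for this example, and the requirement that sequences be highly related is not tested here.

```python
def is_dcc(cluster_samples, min_patients=20):
    """Check the patient-count and multi-continent criteria for a cluster.

    cluster_samples: iterable of (patient_id, continent) tuples, one per
    sample (hypothetical layout). Genetic relatedness is assumed to have
    been established already by the clustering step.
    """
    samples = list(cluster_samples)
    patients = {patient for patient, _ in samples}
    continents = {continent for _, continent in samples}
    # At least 20 distinct patients, collected on more than one continent.
    return len(patients) >= min_patients and len(continents) > 1


# Hypothetical cluster: 20 patients split across two continents.
example_cluster = [
    (f"patient{i}", "Europe" if i % 2 else "North America") for i in range(20)
]
```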

Phylogenetic analyses, temporal, spatiotemporal and population reconstruction

Temporal phylogenetic reconstruction was carried out on DCCs 1–7. DCC-specific datasets were constructed containing the earliest sequenced sample from each patient that clusters within the DCC. These samples were mapped against the respective DCC reference sequence (Supplementary Table 1) as above and a maximum likelihood phylogenetic tree reconstructed with RAxML48 as above. Methods to infer substitution rates and ancestral dates are only valid if there is a temporal signal within the dataset54. We initially assessed temporal signal within each dataset using root-to-tip randomization. In each case, the maximum likelihood tree was rooted to minimize the heuristic residual mean square score using TempEst54. We examined the root-to-tip correlation visually (Extended Data Fig. 7a) and through comparison of the R² correlation between sample collection date and root-to-tip distance with 1,000 randomizations of the tip dates to identify significance of the correlation. A significant positive correlation was observed for DCCs 1–4 (P < 0.001). We therefore initially reconstructed the temporal history of these DCCs using BEAST v.2.4.2 (ref. 19). We used the Hasegawa–Kishino–Yano (HKY) model of nucleotide substitution. We used the relaxed log-normal clock model with a log-normal prior on the substitution rate with mean set to the estimated slope in TempEst and standard deviation 0.5. We modelled the population history using the coalescent Bayesian skyline population prior. At least three independent runs were carried out for 100,000,000 steps for each dataset. Convergence was assessed using Tracer v.1.7 (ref. 55).
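The root-to-tip randomization can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the tree is taken as already rooted, root-to-tip distances are supplied directly, the P value convention (count of randomizations at least as extreme, plus one, over replicates plus one) is our choice, and the example data are invented.

```python
import random


def fit_r2_and_slope(dates, distances):
    """Least-squares regression of root-to-tip distance on collection date."""
    n = len(dates)
    mean_x, mean_y = sum(dates) / n, sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return (sxy * sxy) / (sxx * syy), sxy / sxx


def root_to_tip_test(dates, distances, n_random=1000, seed=0):
    """Compare the observed R^2 with R^2 values from shuffled tip dates."""
    rng = random.Random(seed)
    observed_r2, slope = fit_r2_and_slope(dates, distances)
    shuffled = list(dates)
    at_least_as_extreme = 0
    for _ in range(n_random):
        rng.shuffle(shuffled)  # randomly reassign collection dates to tips
        r2, _ = fit_r2_and_slope(shuffled, distances)
        if r2 >= observed_r2:
            at_least_as_extreme += 1
    p_value = (at_least_as_extreme + 1) / (n_random + 1)
    return observed_r2, slope, p_value


# Invented data: 20 tips sampled yearly with a clock-like signal plus noise.
dates = [2000 + i for i in range(20)]
distances = [1e-6 * (i + 0.5 * (i % 3)) for i in range(20)]
r2, slope, p_value = root_to_tip_test(dates, distances)
```

A positive slope together with a small P value would correspond to the significant positive correlation described above; the slope is also the quantity used as the mean of the informed substitution rate prior.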

Extended Data Fig. 7. Assessment of DCC root-to-tip correlation and substitution rate comparison.

(a) DCC root-to-tip distance vs collection date. Dashed lines represent the regression line between root-to-tip distance and collection date. Correlations were calculated from maximum likelihood phylogenetic trees. DCCs 1-4 exhibited a significant correlation based on bootstrap analyses while DCCs 5-7 did not. (b) Comparison of DCC 1-4 substitution rates. Substitution rate estimates are plotted for each DCC based on the posterior distribution (number of samples from the posterior distribution: DCC1 = 2255, DCC2 = 2255, DCC3 = 2255, DCC4 = 2703). Points represent median estimates and bars the 95% HPD. The substitution rate is highly similar for DCCs 1-3. (c) Comparison of divergence levels in each DCC. Phylogenetic trees were reconstructed for each DCC and midpoint rooted. Plotted is the root-to-tip distance (substitutions/genome site from the root of the tree to each tip) for each sequence in each DCC (number of sequences: DCC1 = 172, DCC2 = 78, DCC3 = 82, DCC4 = 16, DCC5 = 15, DCC6 = 19, DCC7 = 22). DCC4 has accumulated substantially less divergence than the other DCCs.

As a more thorough test of temporal signal within each DCC, we carried out the date randomization test56. BEAST v.2.4.2 was run on each DCC dataset using a uniform substitution rate prior between 1 × 10⁻⁹ and 1 × 10⁻⁵, with these bounds chosen to encompass the likely substitution rates for Mycobacteria based on previous work2,56. Other priors were as described above. The results from these uniform prior runs were highly similar to those with the informed substitution rate prior in each case (Extended Data Fig. 8). Ten date randomizations were performed where the sequence collection dates were randomly assigned to tips. BEAST was run on each of these randomized datasets independently using the same uniform substitution rate prior (1 × 10⁻⁹ – 1 × 10⁻⁵). All four DCCs passed the date randomization test, defined here as the median posterior substitution rate and most recent common ancestor dates with the real sample collection dates not overlapping with that of any of the ten date randomizations (Extended Data Fig. 8). We did not attempt these analyses with DCCs 5–7 as they did not pass the correlation test.

Extended Data Fig. 8. DCCs 1-4 exhibit a significant temporal signal.

Date randomization test for DCCs 1-4. Points show median values and error bars the 95% HPD (number of samples from the posterior distribution: DCC1 informed, DCC1 uniform, DCC2 informed, DCC3 informed, DCC3 uniform = 2255 each, DCC4 informed, DCC4 uniform, all date randomisations = 2703 each, DCC2 uniform = 4506). The temporal evolutionary history of each DCC was reconstructed with an informed prior on the substitution rate based on root-to-tip analysis and a uniform prior (see methods). The results from these priors matched closely in all cases. We compared the tree root date and substitution rate inferences from these priors with ten date randomization runs where the sample collection dates were randomly assigned to sequences and the temporal history reconstructed. In all cases, the posterior distributions inferred with the real dates were highly different from those inferred with the date randomizations, suggesting the existence of a significant temporal signal within each DCC.

The inferred substitution rates of DCCs 1–3 were highly similar (Extended Data Fig. 7b). The substitution rate of DCC4 is higher (Extended Data Fig. 7b), probably due to this clade having a far more recent common ancestor date than DCCs 1–3 (Fig. 1c) and therefore less opportunity to remove deleterious substitutions. As DCCs 5–7 contain similar levels of diversity to DCCs 1–3 (Extended Data Fig. 7c), we reconstructed their temporal history as above but using a uniform substitution rate prior with boundaries of 8.76 × 10⁻⁸ – 2.41 × 10⁻⁷, chosen as the upper and lower 95% highest probability density (HPD) substitution rate estimates for DCCs 1–3 (Extended Data Fig. 7b).

We determined whether each DCC has undergone a historical population expansion by using the Bayesian skyline plot estimates (Extended Data Fig. 9) of relative genetic diversity in the posterior distribution. We used all samples in the posterior distribution and found that all samples in all DCCs exhibited an increase in relative genetic diversity of more than tenfold relative to the value at the root of the tree, thereby strongly supporting a historical population expansion in each case. We identified the date of the expansion in each DCC by calculating the earliest date at which the relative genetic diversity increased by more than tenfold relative to the root of the tree and combined these values into a single distribution, from which the median and 95% HPD was calculated in each case.
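The expansion-date calculation can be sketched as below, assuming each posterior sample has been summarized as a time-ordered list of (date, relative genetic diversity) points whose first entry is the root of the tree. That input layout, the function names and the example values are all hypothetical, and the interval shown is a simple shortest-interval estimate of the HPD.

```python
import math
from statistics import median


def expansion_date(trajectory, fold=10.0):
    """Earliest date at which diversity exceeds `fold` times the root value.

    trajectory: list of (date, relative_diversity), oldest first; the first
    entry is taken to be the value at the root of the tree.
    """
    root_value = trajectory[0][1]
    for date, value in trajectory[1:]:
        if value > fold * root_value:
            return date
    return None  # no expansion of this size in this posterior sample


def hpd_interval(values, mass=0.95):
    """Shortest interval containing at least `mass` of the sampled values."""
    ordered = sorted(values)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


# Three invented posterior samples.
posterior = [
    [(1900, 1.0), (1950, 5.0), (1960, 12.0), (2000, 50.0)],
    [(1880, 2.0), (1955, 25.0), (2000, 90.0)],
    [(1910, 1.0), (1965, 10.0), (1970, 11.0)],
]
expansion_dates = [expansion_date(sample) for sample in posterior]
median_date = median(expansion_dates)
interval = hpd_interval(expansion_dates)
```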

Extended Data Fig. 9. DCC Bayesian skyline plots.

The relative genetic diversity (Neτ) is shown through time for each DCC as estimated from BEAST analyses. The black line represents the median estimate and the shaded area the 95% HPD.

Before carrying out spatiotemporal reconstruction, we calculated the association index57 of the distribution of collection continents across the maximum likelihood tree of each DCC. This was significant in each case on the basis of 1,000 location randomizations (P < 0.001 in each DCC), indicating a correlation between phylogeny and continent of collection. We carried out asymmetric phylogeographic reconstructions of DCCs 1–3 using the BEAST_CLASSIC package v.1.3.2 in BEAST v.2.4.2 (ref. 19). Each sequence was labelled with the continent of collection. We used an informed log-normal substitution rate prior and Bayesian skyline population prior as above. We used an exponential prior on the overall rate of lineage movements with mean 1.0. The relative rates of migration between different continent pairs were modelled using a gamma distribution with alpha and beta both set to 1.0. As the number of sequences collected from each continent is unequal for each DCC, we assessed the robustness of our inferences by randomly subsampling the sequences from overrepresented continents and rerunning the spatiotemporal reconstruction. We carried out the subsampling five times and found that the results were highly similar in all subsamples (for example, Fig. 2b). Supported migration routes were identified using SPREAD v.0.9.6 (ref. 58) as directed continent pairs that had Bayes factor support greater than three in the dataset without subsampling and in at least four of the five subsampled datasets.
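The rule for calling a migration route supported can be written compactly. The sketch below assumes the Bayes factors have already been computed and exported as dictionaries keyed by directed (source, destination) continent pairs; that layout, the function name and the values are hypothetical.

```python
def supported_routes(full_bf, subsample_bfs, threshold=3.0, min_subsamples=4):
    """Return directed routes supported in the full and subsampled datasets.

    full_bf: dict mapping (source, destination) -> Bayes factor in the
        dataset without subsampling.
    subsample_bfs: list of dicts of the same form, one per subsample.
    """
    supported = []
    for route, bayes_factor in full_bf.items():
        if bayes_factor <= threshold:
            continue
        # Count subsampled datasets in which the route is also supported.
        hits = sum(1 for sub in subsample_bfs if sub.get(route, 0.0) > threshold)
        if hits >= min_subsamples:
            supported.append(route)
    return sorted(supported)


# Invented Bayes factors for three directed routes and five subsamples.
full = {("Europe", "Australia"): 12.0, ("Europe", "Asia"): 5.0, ("Asia", "Europe"): 2.0}
subsamples = [
    {("Europe", "Australia"): 8.0, ("Europe", "Asia"): 4.0},
    {("Europe", "Australia"): 9.5, ("Europe", "Asia"): 1.0},
    {("Europe", "Australia"): 3.5, ("Europe", "Asia"): 2.5},
    {("Europe", "Australia"): 2.0, ("Europe", "Asia"): 6.0},
    {("Europe", "Australia"): 7.0, ("Europe", "Asia"): 3.2},
]
routes = supported_routes(full, subsamples)
```

Here the Europe to Australia route is supported in the full dataset and in four of five subsamples, whereas Europe to Asia is supported in only three subsamples and is therefore not reported.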

Mutational spectrum analysis

The mutational spectrum consists of all of the mutations that have occurred within the history of a sample set in their surrounding nucleotide context31. It is necessary to identify the direction of each mutation, that is, the parental nucleotide and the descendent nucleotide. To identify the mutations that have occurred and their direction, we carried out ancestral reconstruction on each FastBAPS cluster phylogenetic tree. Recombination was removed and phylogenetic trees reconstructed as above. Ancestral reconstruction was carried out on all variable alignment positions using the phylogenetic analysis by maximum likelihood (PAML) package v.4.9 (ref. 59). We compared the fit of HKY, HKY + GAMMA, GTR and GTR + GAMMA models of nucleotide substitution. Results were highly similar with all models and in all cases either the GTR or GTR + GAMMA model was supported. The mutations that occurred along each branch in the phylogeny were extracted from the PAML output. The surrounding nucleotide context of each mutation was identified from the reference sequence that was mapped against. The number of polymorphic sites contributing to each mutational spectrum is shown in Extended Data Fig. 10.
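A contextual mutation can be derived from a reconstructed substitution and the reference sequence as sketched below. The function name and example sequence are hypothetical, and folding purine mutations onto the pyrimidine strand is an assumption borrowed from the usual convention for mutational signatures; it is not stated in the text above.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def contextual_mutation(reference, position, parent, child):
    """Label a substitution with its flanking bases, e.g. 'A[C>T]G'.

    reference: reference sequence that samples were mapped against.
    position: 0-based position of the mutated site.
    parent, child: ancestral and descendant nucleotides on the branch.
    """
    if position <= 0 or position >= len(reference) - 1:
        return None  # no complete context at the sequence ends
    upstream, downstream = reference[position - 1], reference[position + 1]
    bases = (upstream, downstream, parent, child)
    if parent == child or any(base not in COMPLEMENT for base in bases):
        return None  # skip non-mutations and ambiguous bases
    if parent in "AG":
        # Assumed convention: report on the strand where the parent is C or T.
        upstream, downstream = COMPLEMENT[downstream], COMPLEMENT[upstream]
        parent, child = COMPLEMENT[parent], COMPLEMENT[child]
    return f"{upstream}[{parent}>{child}]{downstream}"


# Invented reference and branch mutations: (position, parent, child).
reference = "ACGTACGT"
branch_mutations = [(1, "C", "T"), (2, "G", "A"), (0, "A", "G")]
labels = [contextual_mutation(reference, *m) for m in branch_mutations]
spectrum = Counter(label for label in labels if label is not None)
```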

Extended Data Fig. 10. Polymorphic sites used in the mutational spectra analyses.

The number of alignment sites within each FastBAPS cluster used for mutational spectrum analysis of the environmental, within-CF patient and DCC internal branches is shown. Some of these sites will have mutated on multiple occasions within a FastBAPS cluster so the number of polymorphic sites is not the same as the number of mutations. M.a: M. abscessus subsp abscessus. M.m: M. abscessus subsp massiliense.

To compare the mutational spectrum in different niches, the branches in the phylogenetic tree were divided into categories (Fig. 3a). We reasoned that the internal branches within the non-DCC clusters will have been environmental as these branches probably often span hundreds to thousands of years during which time prolonged human infection will have been unlikely. We therefore calculated the environmental mutational spectrum by combining contextual mutations inferred along all internal branches of non-DCC clusters.

The phylogenetic branches within clades containing sequences from a single patient represent within-patient evolution. We therefore calculated the within-patient mutational spectrum by combining contextual mutations inferred along branches within monophyletic patient clades.

The relative contributions of environmental and within-patient evolution along tip branches and branches leading to patient ancestors is unclear as the patient may have acquired the infection at any point along this branch. We therefore did not include these branches in the environmental mutational spectrum or the within-patient spectrum.

To examine the mutations acquired during DCC transmission chains, we combined the contextual mutations that occurred along the internal branches within the seven DCC trees (Fig. 3a). We excluded the deep branches in these clades that precede population expansion (Fig. 3a) to only examine mutations that have occurred since emergence and therefore in more recent transmission chains.

We compared mutational spectra between niches by subtracting the inferred environmental spectrum from the DCC internal branch and within CF patient spectra. Significance of observed differences was assessed through 1,000 independent down-samplings of the inferred environmental mutations to the number identified along DCC internal branches or within CF patients. Contextual mutations were identified as significant if their calculated proportion fell outside two standard deviations of the mean proportion in the 1,000 replicates. This process was repeated ten times and all reported significant mutations were significant in all ten runs.
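The down-sampling test can be sketched as follows. This is an illustration with invented counts; sampling without replacement and the use of the population standard deviation are our assumptions, and the function name is hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def significant_contexts(env_counts, target_counts, n_replicates=1000, seed=0):
    """Flag contexts whose proportion in the target spectrum lies more than
    two standard deviations from the mean of down-sampled environmental
    spectra. Returns {context: observed proportion minus replicate mean}.
    """
    rng = random.Random(seed)
    env_pool = [c for c, k in sorted(env_counts.items()) for _ in range(k)]
    n_target = sum(target_counts.values())
    if n_target > len(env_pool):
        raise ValueError("cannot down-sample to more mutations than available")
    contexts = sorted(set(env_counts) | set(target_counts))
    proportions = {c: [] for c in contexts}
    for _ in range(n_replicates):
        # Down-sample environmental mutations to the target spectrum size.
        draw = Counter(rng.sample(env_pool, n_target))
        for c in contexts:
            proportions[c].append(draw[c] / n_target)
    flagged = {}
    for c in contexts:
        mu, sd = mean(proportions[c]), pstdev(proportions[c])
        observed = target_counts.get(c, 0) / n_target
        if abs(observed - mu) > 2 * sd:
            flagged[c] = observed - mu
    return flagged


# Invented spectra with two contexts only.
environmental = {"A[C>T]G": 500, "T[C>A]A": 500}
dcc_internal = {"A[C>T]G": 90, "T[C>A]A": 10}
flagged = significant_contexts(environmental, dcc_internal)
```

In the procedure described above, this whole test was repeated ten times and only mutations significant in all ten runs were reported; rerunning the function with ten different seeds and intersecting the results would mirror that step.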

Decomposition of the mutational spectrum into input signatures was carried out using Signal32 (https://signal.mutationalsignatures.com/, date last accessed 15 November 2020). The contextual mutations that were elevated in the DCCs relative to the environment were combined into a 10,000 mutation catalogue with their relative frequencies representing their relative enrichment above the environment. This catalogue was used as input for Signal specifying lung as the originating organ. Mutational drivers were assigned from the respective COSMIC mutational signature (https://cancer.sanger.ac.uk/cosmic/signatures/SBS/index.tt, date last accessed 15 November 2020).

Transmission network reconstruction

SNP distances were calculated between all pairs of samples within each FastBAPS cluster using PairSNP60 and the minimum SNP distance between each pair of patients extracted. Patients were classified as being linked at a given SNP cut-off if their closest pair of samples differed by that number of SNPs or fewer. Localized linkages were classified on the basis of available metadata if the patients were in the same hospital, CF Trust, city or state.
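The reduction from sample-level to patient-level distances, and the linkage rule, can be sketched as below. The input layout (a dictionary of pairwise SNP distances keyed by sample pairs), the function names and the values are hypothetical.

```python
def min_patient_distances(sample_patient, sample_distances):
    """Minimum SNP distance between each pair of patients.

    sample_patient: dict mapping sample id -> patient id.
    sample_distances: dict mapping (sample_a, sample_b) -> SNP distance.
    """
    best = {}
    for (sample_a, sample_b), distance in sample_distances.items():
        patient_a, patient_b = sample_patient[sample_a], sample_patient[sample_b]
        if patient_a == patient_b:
            continue  # within-patient pairs do not define linkages
        pair = tuple(sorted((patient_a, patient_b)))
        if pair not in best or distance < best[pair]:
            best[pair] = distance
    return best


def linked_pairs(patient_distances, cutoff):
    """Patient pairs whose closest samples differ by `cutoff` SNPs or fewer."""
    return sorted(pair for pair, d in patient_distances.items() if d <= cutoff)


# Invented example: patient P1 has two samples, P2 and P3 one each.
sample_patient = {"s1": "P1", "s2": "P1", "s3": "P2", "s4": "P3"}
sample_distances = {
    ("s1", "s2"): 3, ("s1", "s3"): 25, ("s2", "s3"): 12,
    ("s1", "s4"): 60, ("s2", "s4"): 45, ("s3", "s4"): 38,
}
patient_distances = min_patient_distances(sample_patient, sample_distances)
links_20 = linked_pairs(patient_distances, 20)
links_38 = linked_pairs(patient_distances, 38)
```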

Transmission networks were reconstructed on the basis of minimum SNP distance between patient isolates. SNP distances of 20 and 38 SNPs were previously shown to represent ‘probable’ and ‘possible’ transmission, respectively, on the basis of the number of SNPs observed in within-patient infections2. We therefore plotted the transmission network at 20 SNPs to represent very likely transmission and 38 SNPs to represent possible transmission.

Transmission network connectivity measures (Extended Data Fig. 6) were calculated using 38 SNPs as a cut-off to include linkages, with 38 chosen to include possible person-to-person transmission events2. Therefore, any patient linkages of 39 SNPs or more were not included. The total number of linkages involving patients with or without CF was identified and divided by the total number of patients within the respective group to calculate the average number of transmission linkages per patient with and without CF. To calculate the weighted connectivity measures, each edge in the transmission network was given a weighting of 39 minus the SNP distance. Therefore, edges linking patient pairs whose isolates differ by zero SNPs were given a weighting of 39 and edges had zero weighting if they connect patient pairs whose isolates differ by 39 or more SNPs. The total weighting of all edges involving patients with or without CF was identified and divided by the total number of patients within the respective group to calculate the average weighted connectivity. To identify the average weighting of CF–CF, CF–non-CF and non-CF–non-CF linkages, the total weighted connectivity of each edge type was calculated by summing the weights of all respective edges and this was divided by the total number of potential linkages of that type.
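A minimal sketch of the weighted connectivity calculation follows. The function names and example values are hypothetical, and one point is an assumption on our part: an edge between two patients of the same group is counted once for each endpoint, so that the result is an average per patient.

```python
def edge_weight(snp_distance, cutoff=38):
    """Weight of 39 minus the SNP distance, floored at zero."""
    return max(0, cutoff + 1 - snp_distance)


def average_weighted_connectivity(patient_distances, patient_group):
    """Average edge weight per patient within each group.

    patient_distances: dict mapping (patient_a, patient_b) -> minimum SNP
        distance between their isolates.
    patient_group: dict mapping patient id -> group label, e.g. "CF".
    """
    totals = {group: 0 for group in patient_group.values()}
    sizes = {group: 0 for group in patient_group.values()}
    for group in patient_group.values():
        sizes[group] += 1
    for (patient_a, patient_b), distance in patient_distances.items():
        weight = edge_weight(distance)
        # Assumed convention: credit the edge to the group of each endpoint.
        totals[patient_group[patient_a]] += weight
        totals[patient_group[patient_b]] += weight
    return {group: totals[group] / sizes[group] for group in sizes}


# Invented network of three patients.
groups = {"P1": "CF", "P2": "CF", "P3": "non-CF"}
distances = {("P1", "P2"): 0, ("P1", "P3"): 38, ("P2", "P3"): 50}
connectivity = average_weighted_connectivity(distances, groups)
```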

Trends in tobacco smoking

Annual estimates of the number of cigarette sales per adult per day were obtained from https://ourworldindata.org/smoking (last accessed 18 March 2021). Thirty countries with data available from before 1960 were included.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Supplementary information

Reporting Summary (69.6KB, pdf)
Supplementary Tables (124.7KB, xlsx)

Supplementary Tables 1–3.

Acknowledgements

Funding for this work was provided by The Wellcome Trust (investigator award no. 107032/Z/15/Z to R.A.F.), Fondation Botnar (Programme grant no. 6063) and the UK CF Trust (Innovation Hub award no. 001; Strategic Research Centre award no. 010). M.S., N.A.H. and R.M.D. acknowledge the Cystic Fibrosis Foundation for funding.

Extended data

Source data

Source Data Fig. 1 (314.8KB, zip)

Phylogenetic trees and population expansion distributions for Fig. 1.

Source Data Fig. 2 (3.7MB, zip)

Phylogeographic trees and migration distributions for Fig. 2.

Source Data Fig. 3 (9.5KB, zip)

Mutational spectra and signature proportions for Fig. 3.

Source Data Fig. 4 (575.9KB, zip)

SNP distances and associated metadata for Fig. 4.

Source Data Extended Data Fig. 1 (37.5KB, zip)

Phylogenetic trees for Extended Data Fig. 1.

Source Data Extended Data Fig. 2 (119.4KB, zip)

Population expansion distributions, CF life expectancy and smoking data for Extended Data Fig. 2.

Source Data Extended Data Fig. 3 (6.1MB, zip)

Clade ages and descendent continents for Extended Data Fig. 3.

Source Data Extended Data Fig. 4 (140.7KB, zip)

Phylogenetic trees and associated metadata for Extended Data Fig. 4.

Source Data Extended Data Fig. 5 (459.7KB, zip)

SNP distances for Extended Data Fig. 5.

Source Data Extended Data Fig. 6 (459.7KB, zip)

SNP distances for Extended Data Fig. 6.

Source Data Extended Data Fig. 7 (30.2KB, zip)

Root-to-tip distances and substitution rates for Extended Data Fig. 7.

Source Data Extended Data Fig. 8 (183.7MB, zip)

Ancestor data and substitution rate distributions for Extended Data Fig. 8.

Source Data Extended Data Fig. 9 (1.1MB, zip)

DCC Bayesian skyline plots for Extended Data Fig. 9.

Source Data Extended Data Fig. 10 (1.9KB, zip)

Number of polymorphic sites in each dataset for Extended Data Fig. 10.

Author contributions

C.R. worked on study conception and design, dataset construction and analyses, manuscript writing and manuscript editing. J.M.B. worked on study design, dataset analyses and manuscript editing. S.B. worked on dataset construction and manuscript editing. R.T. worked on dataset construction and manuscript editing. R.M.D. worked on dataset construction and manuscript editing. N.A.H. worked on dataset construction and manuscript editing. J.v.I. worked on dataset construction and manuscript editing. R.A.F. worked on study conception and design, funding acquisition and supervision, manuscript writing and manuscript editing. J.P. worked on study conception and design, funding acquisition, supervision, manuscript writing and manuscript editing. All authors have read and approved the final paper.

Data availability

All source data files are available at 10.5281/zenodo.5116229. Accession numbers of all samples used in analyses are included in this source data and in Supplementary Table 3. Source data are provided with this paper.

Code availability

Code used for data analysis is available at 10.5281/zenodo.5116229.

Competing interests

The authors declare no competing interests.

Footnotes

Peer review information Nature Microbiology thanks Daniel Falush, Sebastien Gagneux and the other, anonymous, reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

R. Andres Floto, Email: arf27@cam.ac.uk.

Julian Parkhill, Email: jp369@cam.ac.uk.

Extended data is available for this paper at 10.1038/s41564-021-00963-3.

Supplementary information

The online version contains supplementary material available at 10.1038/s41564-021-00963-3.

References

  • 1. Floto RA, et al. US Cystic Fibrosis Foundation and European Cystic Fibrosis Society consensus recommendations for the management of non-tuberculous mycobacteria in individuals with cystic fibrosis. Thorax. 2016;71:i1–i22. doi: 10.1136/thoraxjnl-2015-207360.
  • 2. Bryant JM, et al. Emergence and spread of a human-transmissible multidrug-resistant nontuberculous mycobacterium. Science. 2016;354:751–757. doi: 10.1126/science.aaf8156.
  • 3. Esther CR, Esserman DA, Gilligan P, Kerr A, Noone PG. Chronic Mycobacterium abscessus infection and lung function decline in cystic fibrosis. J. Cyst. Fibros. 2010;9:117–123. doi: 10.1016/j.jcf.2009.12.001.
  • 4. Jarand J, et al. Clinical and microbiologic outcomes in patients receiving treatment for Mycobacterium abscessus pulmonary disease. Clin. Infect. Dis. 2011;52:565–571. doi: 10.1093/cid/ciq237.
  • 5. Nessar R, Cambau E, Reyrat JM, Murray A, Gicquel B. Mycobacterium abscessus: a new antibiotic nightmare. J. Antimicrob. Chemother. 2012;67:810–818. doi: 10.1093/jac/dkr578.
  • 6. Kwak, N. et al. Mycobacterium abscessus pulmonary disease: individual patient data meta-analysis. Eur. Resp. J. 10.1183/13993003.01991-2018 (2019).
  • 7. Taylor JL, Palmer SM. Mycobacterium abscessus chest wall and pulmonary infection in a cystic fibrosis lung transplant recipient. J. Heart Lung Transplant. 2006;25:985–988. doi: 10.1016/j.healun.2006.04.003.
  • 8. Bryant JM, et al. Whole-genome sequencing to identify transmission of Mycobacterium abscessus between patients with cystic fibrosis: a retrospective cohort study. Lancet. 2013;381:1551–1560. doi: 10.1016/S0140-6736(13)60632-7.
  • 9. Aitken ML, et al. Respiratory outbreak of Mycobacterium abscessus subspecies massiliense in a lung transplant and cystic fibrosis center. Am. J. Respir. Crit. Care Med. 2012;185:231–232. doi: 10.1164/ajrccm.185.2.231.
  • 10. Davidson RM, et al. Genome sequencing of Mycobacterium abscessus isolates from patients in the United States and comparisons to globally diverse clinical strains. J. Clin. Microbiol. 2014;52:3573–3582. doi: 10.1128/JCM.01144-14.
  • 11. Davidson, R. M. A closer look at the genomic variation of geographically diverse Mycobacterium abscessus clones that cause human infection and disease. Front. Microbiol. 10.3389/fmicb.2018.02988 (2018).
  • 12. Tettelin, H. et al. High-level relatedness among Mycobacterium abscessus subsp. massiliense strains from widely separated outbreaks. Emerg. Infect. Dis. 10.3201/eid2003.131106 (2014).
  • 13. Yan, J. et al. Investigating transmission of Mycobacterium abscessus amongst children in an Australian cystic fibrosis centre. J. Cyst. Fibros. 10.1016/j.jcf.2019.02.011 (2019).
  • 14. Redondo, N. et al. Genomic analysis of an Irish population of Mycobacterium abscessus complex collected between 2006 and 2017. J. Clin. Microbiol. 10.1128/JCM.00295-20 (2020).
  • 15. Doyle, R. M. et al. Cross-transmission is not the source of new Mycobacterium abscessus infections in a multi-centre cohort of cystic fibrosis patients. Clin. Infect. Dis. 10.1093/cid/ciz526 (2019).
  • 16. Tortoli E, et al. Mycobacterium abscessus in patients with cystic fibrosis: low impact of inter-human transmission in Italy. Eur. Resp. J. 2017;50:1602525. doi: 10.1183/13993003.02525-2016.
  • 17. Harris KA, et al. Whole-genome sequencing and epidemiological analysis do not provide evidence for cross-transmission of Mycobacterium abscessus in a cohort of pediatric cystic fibrosis patients. Clin. Infect. Dis. 2015;60:1007–1016. doi: 10.1093/cid/ciu967.
  • 18. Hasan, N. A. et al. Population genomics of nontuberculous mycobacteria recovered from United States cystic fibrosis patients. Preprint at bioRxiv 10.1101/663559 (2019).
  • 19. Bouckaert R, et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 2014;10:e1003537. doi: 10.1371/journal.pcbi.1003537.
  • 20. Lee, M.-R. et al. Mycobacterium abscessus complex infections in humans. Emerg. Infect. Dis. 10.3201/eid2109.141634 (2015).
  • 21. Davis PB. Cystic fibrosis since 1938. Am. J. Respir. Crit. Care Med. 2006;173:475–482. doi: 10.1164/rccm.200505-840OE.
  • 22. Dodge JA, Lewis PA, Stanton M, Wilsher J. Cystic fibrosis mortality and survival in the UK: 1947–2003. Eur. Resp. J. 2007;29:522–526. doi: 10.1183/09031936.00099506.
  • 23. Saiman L, et al. Infection prevention and control guideline for cystic fibrosis: 2013 update. Infect. Control Hospital Epidemiol. 2014;35:s1–s67. doi: 10.1086/676882.
  • 24. Scotet V, L’Hostis C, Férec C. The changing epidemiology of cystic fibrosis: incidence, survival and impact of the CFTR gene discovery. Genes. 2020;11:589. doi: 10.3390/genes11060589.
  • 25. Burgel P-R, et al. Future trends in cystic fibrosis demography in 34 European countries. Eur. Resp. J. 2015;46:133–141. doi: 10.1183/09031936.00196314.
  • 26. Lake MA, Ambrose LR, Lipman MCI, Lowe DM. ‘Why me, why now?’ Using clinical immunology and epidemiology to explain who gets nontuberculous mycobacterial infection. BMC Med. 2016;14:54. doi: 10.1186/s12916-016-0606-6.
  • 27. Adjemian J, Olivier KN, Seitz AE, Holland SM, Prevots DR. Prevalence of nontuberculous mycobacterial lung disease in U.S. medicare beneficiaries. Am. J. Respir. Crit. Care Med. 2012;185:881–886. doi: 10.1164/rccm.201111-2016OC.
  • 28. Bagaitkar, J., Demuth, D. R. & Scott, D. A. Tobacco use increases susceptibility to bacterial infection. Tob. Induc. Dis. 10.1186/1617-9625-4-12 (2008).
  • 29. Lee M-R, et al. Risk factors for Mycobacterium chelonae-abscessus pulmonary disease persistence and deterioration. J. Infect. 2012;64:228–230. doi: 10.1016/j.jinf.2011.11.006.
  • 30. Kucab JE, et al. A compendium of mutational signatures of environmental agents. Cell. 2019;177:821–836.e16. doi: 10.1016/j.cell.2019.03.001.
  • 31. Nik-Zainal S, et al. The genome as a record of environmental exposure. Mutagenesis. 2015;30:763–770. doi: 10.1093/mutage/gev073.
  • 32. Degasperi A, et al. A practical framework and online tool for mutational signature analyses show intertissue variation and driver dependencies. Nat. Cancer. 2020;1:249–263. doi: 10.1038/s43018-020-0027-5.
  • 33. Sedgwick B, Vaughan P. Widespread adaptive response against environmental methylating agents in microorganisms. Mutat. Res. 1991;250:211–221. doi: 10.1016/0027-5107(91)90178-Q.
  • 34. Agarwal MK. Streptozotocin: mechanisms of action. FEBS Lett. 1980;120:1–3. doi: 10.1016/0014-5793(80)81031-3.
  • 35. Brunel, S. F., Willment, J. A., Brown, G. D., Devereux, G. & Warris, A. Aspergillus-induced superoxide production by cystic fibrosis phagocytes is associated with disease severity. ERJ Open Res. 4 (2018).
  • 36. Alexandrov LB, et al. Mutational signatures associated with tobacco smoking in human cancer. Science. 2016;354:618–622. doi: 10.1126/science.aag0299.
  • 37. Davies BS, Roberts CH, Kaul S, Klein JL, Milburn HJ. Non-tuberculous slow-growing mycobacterial pulmonary infections in non-HIV-infected patients in south London. Scand. J. Infect. Dis. 2012;44:815–819. doi: 10.3109/00365548.2012.694469.
  • 38. Koh W-J, et al. Clinical significance of differentiation of Mycobacterium massiliense from Mycobacterium abscessus. Am. J. Respir. Crit. Care Med. 2011;183:405–410. doi: 10.1164/rccm.201003-0395OC.
  • 39. O’Leary SM, et al. Cigarette smoking impairs human pulmonary immunity to Mycobacterium tuberculosis. Am. J. Respir. Crit. Care Med. 2014;190:1430–1436. doi: 10.1164/rccm.201407-1385OC.
  • 40. Feng Y, et al. Exposure to cigarette smoke inhibits the pulmonary T-cell response to influenza virus and Mycobacterium tuberculosis. Infect. Immun. 2011;79:229–237. doi: 10.1128/IAI.00709-10.
  • 41. Arcavi L, Benowitz NL. Cigarette smoking and infection. Arch. Intern. Med. 2004;164:2206–2216. doi: 10.1001/archinte.164.20.2206.
  • 42. Bryant, J. M. Stepwise pathogenic evolution of Mycobacterium abscessus. Science 372, eabb8699 (2021).
  • 43.Li, B. et al. Relationship between antibiotic susceptibility and genotype in Mycobacterium abscessus clinical isolates. Front. Microbiol. 10.3389/fmicb.2017.01739 (2017). [DOI] [PMC free article] [PubMed]
  • 44.Davidson RM, et al. Phylogenomics of Brazilian epidemic isolates of Mycobacterium abscessus subsp. bolletii reveals relationships of global outbreak strains. Infect. Genet. Evol. 2013;20:292–297. doi: 10.1016/j.meegid.2013.09.012.
  • 45.Sapriel G, et al. Genome-wide mosaicism within Mycobacterium abscessus: evolutionary and epidemiological implications. BMC Genomics. 2016;17:118. doi: 10.1186/s12864-016-2448-1.
  • 46.Ripoll F, et al. Non mycobacterial virulence genes in the genome of the emerging pathogen Mycobacterium abscessus. PLoS ONE. 2009;4:e5660. doi: 10.1371/journal.pone.0005660.
  • 47.Ondov BD, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17:132. doi: 10.1186/s13059-016-0997-x.
  • 48.Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–1313. doi: 10.1093/bioinformatics/btu033.
  • 49.Tonkin-Hill G, Lees JA, Bentley SD, Frost SDW, Corander J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Res. 2019;47:5539–5549. doi: 10.1093/nar/gkz361.
  • 50.Page AJ, et al. Robust high-throughput prokaryote de novo assembly and improvement pipeline for Illumina data. Microb. Genomics. 2016;2:e000083. doi: 10.1099/mgen.0.000083.
  • 51.Croucher NJ, et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 2015;43:e15. doi: 10.1093/nar/gku1196.
  • 52.Rambaut A. FigTree. http://tree.bio.ed.ac.uk/software/figtree/ (GitHub, 2007).
  • 53.Yu G, Smith DK, Zhu H, Guan Y, Lam TT-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Meth. Ecol. Evol. 2017;8:28–36. doi: 10.1111/2041-210X.12628.
  • 54.Rambaut A, Lam TT, Max Carvalho L, Pybus OG. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2016;2:vew007. doi: 10.1093/ve/vew007.
  • 55.Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 2018;67:901–904. doi: 10.1093/sysbio/syy032.
  • 56.Menardo F, Duchêne S, Brites D, Gagneux S. The molecular clock of Mycobacterium tuberculosis. PLoS Pathog. 2019;15:e1008067. doi: 10.1371/journal.ppat.1008067.
  • 57.Parker J, Rambaut A, Pybus OG. Correlating viral phenotypes with phylogeny: accounting for phylogenetic uncertainty. Infect. Genet. Evol. 2007;8:239–246. doi: 10.1016/j.meegid.2007.08.001.
  • 58.Bielejec F, Rambaut A, Suchard MA, Lemey P. SPREAD: spatial phylogenetic reconstruction of evolutionary dynamics. Bioinformatics. 2011;27:2910–2912. doi: 10.1093/bioinformatics/btr481.
  • 59.Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591. doi: 10.1093/molbev/msm088.
  • 60.Tonkin-Hill G. pairsnp. https://github.com/gtonkinhill/pairsnp (GitHub, 2020).

Associated Data


Supplementary Materials

Reporting Summary (69.6KB, pdf)
Supplementary Tables (124.7KB, xlsx)

Supplementary Tables 1–3.

Data Availability Statement

All source data files are available on Zenodo (doi: 10.5281/zenodo.5116229). Accession numbers of all samples used in the analyses are included in these source data and in Supplementary Table 3. Source data are provided with this paper.

Code used for data analysis is available in the same Zenodo record (doi: 10.5281/zenodo.5116229).


Articles from Nature Microbiology are provided here courtesy of Nature Publishing Group
