Abstract
Mycobacterium abscessus is an emerging multidrug-resistant non-tuberculous mycobacterium that causes a wide spectrum of infections and has caused several local outbreaks worldwide. To facilitate standardized prospective molecular surveillance, we established a novel core genome multilocus sequence typing (cgMLST) scheme. Whole genome sequencing data of 1991 isolates were employed to validate the scheme, re-analyze global population structure and set genetic distance thresholds for cluster detection and taxonomic identification. We confirmed and amended the nomenclature of the main dominant circulating clones and found that these also correlate well with traditional 7-loci MLST. Dominant circulating clones could be linked to a corresponding reference genome with less than 250 alleles while 99% of pairwise comparisons between epidemiologically linked isolates were below 25 alleles and 90% below 10 alleles. These thresholds can be used to guide further epidemiological investigations. Overall, the scheme will help to unravel the apparent global spread of certain clonal complexes and as yet undiscovered transmission routes.
Subject terms: Comparative genomics, Phylogenomics, Bacterial genomics
Mycobacterium abscessus is an emerging infection of increasing public health concern due to outbreaks and intrinsic multidrug-resistance. Here, the authors develop and evaluate a core-genome multilocus sequence typing scheme for this pathogen to facilitate standardised molecular surveillance.
Introduction
Non-tuberculous mycobacteria (NTM) comprise all Mycobacterium species that do not cause tuberculosis or leprosy1. NTM bacteria are ubiquitous in natural aquatic and soil environments, but have also been frequently isolated from tap water, swimming pools and showers2–5. Although bacteria of most NTM species are harmless for humans, some are (opportunistic) pathogens that can cause severe disease, especially in immunocompromised patients6.
Of particular clinical relevance is the rapidly growing NTM M. abscessus (Mab), which is increasingly being reported as the etiological agent of severe respiratory, skin and mucosal infections and is extremely difficult to treat due to intrinsic resistance against several antibiotics6–9.
Pulmonary infections with Mab are especially prevalent among individuals with pre-existing structural lung diseases such as cystic fibrosis (CF) and bronchiectasis. In these patients, Mab infection leads to an accelerated decline in lung function and is associated with increased mortality10,11. It is thought that CF patients predominately acquire NTM infections from the environment, although there have been reports of possible indirect person-to-person transmission within healthcare facilities through fomites or long-living infectious aerosols2,12–15.
Mab can also cause extra-pulmonary disease such as skin and soft tissue infection (SSTI) in healthy persons and sporadically otitis media (ear infections) in children16. Many SSTI outbreaks affecting multiple individuals have been related to surgical or cosmetic procedures likely due to contaminated water sources or medical devices, e.g. hospital water supply, wading pool, tattoo ink, ultrasonography gel, and bronchoscopes17–23.
Based on multiple genomic comparison studies, Mab strains have been divided into three subspecies: M. abscessus subsp. abscessus (MabA), M. abscessus subsp. bolletii (MabB), and M. abscessus subsp. massiliense (MabM)24. Recently, several clusters of closely related isolates within both MabA and MabM have been identified. These so-called dominant circulating clones (DCC) have been isolated from both CF and non-CF patients across the globe, are thought to have emerged around 1960 (refs. 12,25,26), and have been associated with increased virulence, higher rates of resistance, and worse clinical outcomes compared to unclustered isolates12.
Previous molecular outbreak investigations, source tracking, and population structure analyses of Mab have mainly been performed by repetitive sequence-PCR (rep-PCR), random amplified polymorphic DNA typing (RAPD PCR), multilocus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE) and core genome single nucleotide polymorphism (cgSNP) analysis4,27–29. While cgSNP clearly offers the highest resolution for outbreak investigations, the lack of standardized bioinformatic pipelines and nomenclature scheme limits the application to individual retrospective investigations30–32.
For prospective molecular surveillance, core genome multilocus sequence typing (cgMLST) has been shown to provide easy inter-laboratory comparability and a continuous comparative analysis, which facilitates real-time multicenter outbreak investigations for different pathogens such as Pseudomonas aeruginosa33, Mycobacterium tuberculosis34, Listeria monocytogenes and Paenibacillus larvae30,33–35. The gene-by-gene comparison approach of cgMLST relies on a fixed set of conserved genes distributed across the entire genome that are present in the majority of strains within a species. An inherently standardized and expandable nomenclature implementation translates the DNA sequences of the respective genes into integer allele numbers, which can then easily be compared between strains36.
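The translation of gene sequences into integer allele numbers can be illustrated with a minimal sketch. It is not the nomenclature server used for the scheme; the class name `AlleleRegistry`, the locus name `locus_A` and the toy sequences are hypothetical, and the only assumption is that every distinct sequence at a locus receives the next free integer.

```python
from typing import Dict, Tuple


class AlleleRegistry:
    """Toy nomenclature: one integer per distinct sequence, per locus (illustrative only)."""

    def __init__(self) -> None:
        self._alleles: Dict[Tuple[str, str], int] = {}
        self._next_number: Dict[str, int] = {}

    def allele_number(self, locus: str, sequence: str) -> int:
        """Return the existing allele number, or assign the next free one."""
        key = (locus, sequence.upper())
        if key not in self._alleles:
            number = self._next_number.get(locus, 1)
            self._alleles[key] = number
            self._next_number[locus] = number + 1
        return self._alleles[key]


registry = AlleleRegistry()
first = registry.allele_number("locus_A", "ATGAAATAA")   # new sequence
second = registry.allele_number("locus_A", "ATGAAGTAA")  # one substitution, new allele
repeat = registry.allele_number("locus_A", "atgaaataa")  # same sequence as the first
```

Because identical sequences always map to the same integer, two laboratories that share the same registry can compare profiles by comparing integers only.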
In this study, we developed a novel cgMLST scheme using 97 diverse genomes, to allow harmonized whole genome sequencing based typing of all three Mab subspecies. Then, we evaluated its potential for population structure analysis, outbreak investigations and transmission analysis as well as compatibility with traditional 7-loci MLST and the cgSNP-based approach by utilizing a large set of 1991 isolates including, among others, dominant circulating clones, longitudinal pulmonary isolates from chronically infected cystic fibrosis (CF) patients and isolates from three extra-pulmonary Mab outbreaks.
Results
Design and technical validation of a stable Mab cgMLST scheme
We used MabA type strain ATCC19977 and 96 additional publicly available assemblies (scheme creation set) from a genetically diverse and global set of Mab isolates to define a hard core genome with SeqSphere+ software (Supplementary Data 1, Supplementary Figs. 1-3 and Supplementary Methods 1). The scheme creation set included representatives for all subspecies, for seven previously defined DCCs26 as well as non-DCC strains, which were collected in at least 11 different countries. The resulting cgMLST scheme consists of 2904 loci (Supplementary Data 2), representing 59% of the gene set from MabA type strain ATCC19977. To assess the robustness of the scheme, we compared cgMLST profiles obtained for draft genomes generated from the same sequencing read set with different assembly approaches for 30 diverse isolates (Technical validation set; Supplementary Data 3 and 4). Details are available in Supplementary Methods 2. In summary, cgMLST analysis was very fast and cgMLST profiles (i.e. allele numbers) were identical for draft genomes generated with different assembly pipelines (i.e. SeqSphere+37 and shovill38) using different assemblers (i.e. skesa39 and SPAdes40) and different read pre-processing steps (i.e. performing default trimming and/or read error correction or not). On the other hand, larger differences up to 19 distinct alleles between cgMLST profiles were observed when de novo assembly approaches were compared with a mapping approach (Supplementary Fig. 4).
Analysis of the global population structure
To further validate whether the scheme works for Mab strains from different phylogenetic groups and classifies Mab strains according to the known global population structure, we performed cgMLST analysis on 1797 isolates, including 1110 strains belonging to MabA, 563 to MabM and 124 to MabB (Supplementary Data 1). For 1786 out of 1797 (99.4%) datasets, more than 95.0% good cgMLST targets were found and for 1796 (99.9%) more than 90% of the cgMLST genes were present, indicating a stable core genome applicable for all Mab strains (Supplementary Fig. 1). The strains with less than 95% good cgMLST targets (8 MabM, 2 MabA and 1 MabB) were removed from further analyses. The neighbor-joining (NJ) tree calculated from pairwise allelic distances of the remaining 1786 isolates revealed that the subspecies classification derived from phylogenetic position in the initial mash distance-based tree (Supplementary Fig. 1) correlated well with cgMLST-based phylogeny (Fig. 1). Consistent with this finding, all isolates belonging to MabA, MabM and MabB were most closely related (i.e. had the lowest number of allele differences) to the type strains for subsp. abscessus (strain ATCC19977; accession NC_010397.1), massiliense (JCM 15300; NZ_AP014547.1) and bolletii (BD; NZ_AP018436.1), respectively. These three type strains (Supplementary Data 5) can thus be used for distance-based classification of new isolates at subspecies level, without the need for phylogenetic tree building.
Strains that were previously classified within a DCC complex based on cgSNP/Fastbaps analysis by Ruis and coworkers26 also clustered together in the cgMLST-based phylogeny (Fig. 1). The mean intra DCC pairwise genetic distances ranged between 26 (DCC6) and 127 (DCC3) alleles (Supplementary Data 6). The majority of strains within a DCC belonged to one single MLST sequence type (ST). For DCC3, however, two different subclades were identified in the phylogenetic tree that were represented also by two distinct ST types (Fig. 1 and Supplementary Data 6). The differentiation into two groups, i.e. DCC3a (ST33) and DCC3b (ST37), was further supported by the bimodal distribution of the pairwise distances among DCC3 strains (Supplementary Fig. 5).
Next, isolates with unknown DCC status that were positioned within a clade in the NJ tree containing isolates with known DCC status were classified into the corresponding DCC (Fig. 1 and Supplementary Data 1). Using a set of eight representatives, one for each DCC (including 3a, b; Supplementary Data 5), we found a clear separation between intra-DCC pairwise distances (i.e. distances between the representative of the DCC and isolates belonging to the same DCC) and pairwise distances between the representative and isolates not belonging to the corresponding DCC (Supplementary Fig. 6). More concretely, the majority of DCC strains had less than 250 allele differences compared to the corresponding reference genome (Supplementary Fig. 6).
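The distance-based DCC assignment described above can be sketched as follows. This is an illustration under assumptions, not the procedure implemented in SeqSphere+: the function name `assign_dcc`, the strict `<` comparison at the 250-allele boundary and the example distances are all hypothetical, and the allele distances to the eight representatives are assumed to have been computed beforehand.

```python
from typing import Dict, Optional


def assign_dcc(distances_to_representatives: Dict[str, int],
               threshold: int = 250) -> Optional[str]:
    """Return the closest DCC if its representative is within the threshold, else None.

    The 250-allele threshold is taken from the text; treating it as a strict
    upper bound is an assumption of this sketch.
    """
    if not distances_to_representatives:
        return None
    closest = min(distances_to_representatives,
                  key=lambda name: distances_to_representatives[name])
    if distances_to_representatives[closest] < threshold:
        return closest
    return None


# Hypothetical allele distances of two query isolates to DCC representatives.
query_1 = {"DCC1": 40, "DCC2": 900, "DCC3a": 1100}
query_2 = {"DCC1": 600, "DCC2": 950, "DCC3a": 300}

label_1 = assign_dcc(query_1)  # within 250 alleles of the DCC1 representative
label_2 = assign_dcc(query_2)  # no representative within 250 alleles
```

An isolate that returns `None` would be reported as unclustered rather than forced into the nearest complex.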
Transmission and outbreak analysis
To determine an allele threshold that can be used to classify strains in genomic clusters as indicators for possible epidemiologically linked cases, we investigated the genetic diversity within outbreaks and putative transmission clusters, which were previously defined based on cgSNP analysis. In particular, we calculated pairwise allele distances and re-calculated SNP distances between 76 isolates (Supplementary Data 3 and 7) from three extra-pulmonary outbreaks and 12 putative intra-hospital transmission clusters: (i) an outbreak of post-surgical wound infections in Brazil41, (ii) an outbreak of post-tattooing skin infections in Quebec21, (iii) an outbreak of otitis media in paediatric patients in Quebec21, and (iv) pulmonary isolates from predominantly CF patients attending clinics in Papworth, Seattle, Milan, Florence or Frankfurt12,15,42,43. For some of these CF patients, indirect cross-infection within the hospital was proposed, while for the others, epidemiological investigations did not support this transmission mode (Supplementary Data 7).
Except for two clusters (Frankfurt_CF_C2 [DCC2] and Italy_CF_A1 [DCC1]), all isolates within previously defined clusters were more closely related to each other than to isolates from another cluster (Fig. 2). Interestingly, isolates from an otitis media (OM) outbreak in Quebec were closely related (16 allelic/SNP mismatches) with isolates from a pulmonary Mab outbreak among CF patients in Papworth (Fig. 2).
For most closely related isolates, pairwise SNP and allele distances were in the same range (Fig. 2). However, for two MabA isolates from cluster Italy_CF_A4 (GI2 and IR1), the pairwise SNP distance compared to the other cluster members MC1 and TE1 was much higher (>100 SNPs) compared to the number of allelic differences (<10 alleles). More detailed analysis revealed that the majority of these SNPs in the GI2 and IR1 genome were concentrated in 20 consecutive genes (MAB_1023c-1042c), eight of which were not included in the cgMLST scheme (Supplementary Data 8). BLASTN search with this high-density SNP region revealed a higher total blast score for Mycobacterium immunogenum (25,261 with 95% coverage and 87.21% identity) compared to the top Mab hit (24,515 with 89% coverage and 87.37% identity), further pointing towards a putative recombination event.
The pairwise genetic distance between any two isolates belonging to the same extra-pulmonary outbreak was less than 15 alleles with a median of 4 alleles (Fig. 3, group A). Suspected indirect nosocomial transmission within CF centers, on the other hand, was characterized by less than 25 alleles with a median of 2 alleles (Fig. 3, group B). Overall, 95% of pairwise distances between isolates with epi links (i.e. group A and B) were less than 10 alleles. The median pairwise allele distance between clustered isolates with no epidemiological links was significantly higher compared to isolates with supporting evidence for intra-hospital transmission or isolates from well-defined extra-pulmonary outbreaks, but still below 25 alleles (Fig. 3, group C).
Within-patient diversity and evolution
To determine the genetic diversity of isolates from the same patient sampled at different time points (within-patient diversity), we analyzed the cgMLST profiles of 291 sequential isolates from 69 chronically infected CF patients included in three previous studies12,15,42. For each patient, there were between two and thirty isolates, sampled over a period of maximum 12 years (Supplementary Data 3 and 9). Except for one patient (patient 512), the distance between any two isolates from the same patient did not exceed 16 alleles and 89% of same-patient isolates differed by less than 10 alleles (Fig. 3). The median number of allele differences between same-patient isolates (i.e. 3 alleles) was similar to the median pairwise distances observed among epidemiologically related isolates (four and two for extra-pulmonary and pulmonary isolates, respectively) (Fig. 3).
Minimum spanning trees from sequential isolates of CF patients often showed a star-like structure consistent with the clonal evolution of a monomorphic pathogen (Fig. 4). Interestingly, some strains were genetically highly stable. This was especially notable for two patients, i.e. IF and RM15, where the same cgMLST type was sampled over 12 years (Fig. 4). On the other hand, larger genetic distances of more than 20 alleles between same-patient isolates sampled within the same year or even same month (e.g. patient 3 and 5, Figs. 3 and 4) were also observed.
If only pairwise distances to the first available isolate were considered, the within-patient divergence was lower than 10 alleles within 12 years for 64 out of 69 (93%) of the patients (Supplementary Fig. 7). If the five outliers were removed, we observed a weak positive linear correlation of allelic distance and time with an evolutionary rate of 0.45 alleles/genome/year (95% CI 0.13–0.76 and R2 = 0.105) for MabA, 0.28 alleles/genome/year (95% CI 0.09–0.46 and R2 = 0.079) for MabM, and 0.44 alleles/genome/year (95% CI 0.08–0.81 and R2 = 0.294) for MabB (Supplementary Fig. 7). The initial genetic diversity (within the first year) was 1.9 alleles [1.0–2.7] for MabA, 1.6 alleles [1.2–2.0] for MabM, and 1.3 alleles [0–4.0] for MabB.
Discussion
In this study, we developed a robust cgMLST scheme for the emerging pathogen Mycobacterium abscessus, which can delineate Mab population structure, outbreaks and within-patient diversity. Its high discriminatory power enables early detection of infection sources, transmission hotspots, and as yet undetected contact cases. The stable scheme (i.e. fixed loci) with a harmonized expandable nomenclature (i.e. allele numbers) allows direct comparability of results by different laboratories and facilitates prospective global Mab surveillance. Importantly, we also defined thresholds for cgMLST-based cluster analysis, e.g. to determine Mab outbreaks in the hospital setting, and to classify new isolates at subspecies level and within known global complexes.
Several researchers have considered a cgMLST scheme stable if at least 95% of the cgMLST genes are present in all or most strains44–46. Our newly developed cgMLST Mab scheme consists of 2904 core loci of which at least 95% were found in 99.4% of a diverse set of 1797 Mab isolates. All seven dominant circulating clones (DCC) previously defined by Ruis and coworkers26 were confirmed by cgMLST analysis of the population structure of the large global dataset investigated in our study. We also showed that isolates with unknown taxonomy could be classified as DCC using a maximum pairwise distance of 250 alleles compared to a DCC reference genome, thereby offering a convenient alternative to classification based on positioning within a large reference tree. In addition, we illustrate that the DCC nomenclature is largely congruent with distinct STs of the traditional 7-loci MLST scheme47. However, we propose a differentiation of DCC3 into two clades which are clearly distinguished by (i) two distinct STs, i.e. DCC3a (associated with ST33) and DCC3b (associated with ST37) and (ii) inter-DCC pairwise distances of >250 alleles. These results underline that traditional MLST analysis is compatible with cgSNP and cgMLST analysis and can still distinguish between the global DCCs; however, it is not suited for the detection of recent transmission events or outbreaks.
Analysis of the within-patient diversity revealed that some patients were infected with a highly persistent clone that was sampled for over a decade without any allelic differences. On the other hand, we also observed considerable genetic diversity (>20 alleles) between isolates sampled within the same year for some patients. This might be the result of a mixed infection with a phylogenetically related clone or an undetected long-term chronic Mab infection with subsequent diversification of the infecting clone into different co-existing subpopulations. Also the presence of a putative “hypermutator” strain has been previously discussed48.
Overall, 99% of pairwise distances (1620 out of 1637 comparisons) between epidemiologically linked isolates, i.e. isolates from the same patient, isolates belonging to the same extra-pulmonary outbreak or isolates from CF patients with suggested nosocomial transmission, were below 25 alleles and 90% (1484 out of 1637 comparisons) below 10 alleles. On the other hand, several isolates from patients for which there was no obvious epidemiological link could also be linked with less than 25 alleles. Therefore, we propose a threshold of 25 for initial clustering and indicating “possible” transmission and a threshold of 10 for “probable” recent transmission (whether direct or indirect). This is similar to previously defined SNP thresholds for Mab12,15,25,49, and indicates the high discriminatory power of the developed cgMLST scheme. Although these thresholds still need to be validated in further studies, they provide a practical guideline to initiate further epidemiological studies and to potentially unravel new transmission routes.
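The proposed thresholds can be applied programmatically, as in the following minimal sketch. The category labels follow the text (below 10 alleles “probable”, below 25 alleles “possible”); the use of single-linkage clustering for the initial grouping, the function names and the toy distances are assumptions of this illustration and are not taken from the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def link_category(distance: int) -> str:
    """Label a pairwise allele distance using the proposed thresholds."""
    if distance < 10:
        return "probable"
    if distance < 25:
        return "possible"
    return "none"


def single_linkage_clusters(isolates: List[str],
                            dist: Dict[FrozenSet[str], int],
                            threshold: int = 25) -> List[List[str]]:
    """Group isolates connected by any chain of pairwise distances below the threshold."""
    parent = {iso: iso for iso in isolates}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if dist[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for iso in isolates:
        groups.setdefault(find(iso), []).append(iso)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical pairwise allele distances between four isolates.
isolates = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 3, frozenset(("B", "C")): 20, frozenset(("A", "C")): 30,
    frozenset(("A", "D")): 100, frozenset(("B", "D")): 100, frozenset(("C", "D")): 100,
}
clusters = single_linkage_clusters(isolates, dist)
```

With single linkage, isolates A and C end up in the same cluster through B although their direct distance exceeds 25 alleles, which is why the per-pair label from `link_category` should still be inspected before inferring transmission.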
Although SNP and allele distances were typically in the same range for closely related isolates, there were also two MabA (DCC2) isolates that could be connected with less than 10 alleles, while differing by more than 100 SNPs compared to two other cluster members. This might be the result of a horizontal gene transfer (HGT)/homologous recombination event with DNA from another Mab strain or even another mycobacterial species48,50–52. As recombination has shaped the Mab genome more than mutations53,54, cgSNP is likely more prone to overestimating evolutionary distances and thereby missing epidemiologically linked patients. On the other hand, genetic diversity might also be underestimated by cgMLST, as it does not include mutations in intergenic regions, accessory genes or plasmids and because multiple mutations within one gene (not caused by HGT) are translated into only one allele difference55–57. Therefore, estimates of evolutionary rates based on alleles should also be interpreted with caution.
Compared to cgSNP, cgMLST analysis is faster, more easily expandable with new isolates and more standardized55,58,59. The genes comprised in the cgMLST scheme can be identified in new samples using different algorithms (i.e. from assemblies using BLAST or from reads using kmer mapping), depending on the cgMLST software used60–63. Although our scheme was developed, applied and validated within a commercial software suite (SeqSphere+), loci definitions and allele nomenclature are public (https://www.cgmlst.org) and available for implementation in other, open-source software or web-based applications (e.g. BIGSdb60). This means that cgMLST results can be compared directly between laboratories and public health institutions worldwide, further facilitating standardization and surveillance on a global level.
In conclusion, the whole genome sequencing-based cgMLST approach is a powerful tool for high-resolution molecular epidemiological investigations of Mab strains. It facilitates standardized prospective transmission analysis, e.g. for early outbreak detection and identification of potential transmission routes in a hospital setting. The cgMLST scheme that was developed in this study is publicly accessible (https://www.cgmlst.org), allowing for efficient surveillance on a global level. Therefore, we believe that it should be part of a strategy to tackle the growing public health threat of this emerging pathogen.
Methods
Data collection
For this study, we downloaded assemblies (fastA files) from the NCBI/RefSeq Assembly database64,65 for 1797 isolates (Supplementary Data 1) and Illumina paired-end read datasets (fastQ files) from the sequence read archive (SRA)66 for 372 isolates (Supplementary Data 3). The total dataset (n = 2169) comprised 1991 unique biosamples. An overview of all datasets used in this study is provided in Supplementary Fig. 2.
Public assembly set (n = 1797)
The genome of the MabA ATCC 19977 type strain (accession number NC_010397.1), as well as all nonanomalous and non-suppressed genome assemblies (FastA files) annotated as “Mycobacterium abscessus” that were available on May 26th, 2021 (n = 1,810) were downloaded from NCBI/RefSeq64,65. All assemblies with more than 300 contigs (n = 13) were removed from further analysis, resulting in a final set of 1797 genome sequences (Supplementary Data 1). We used Mashtree v.1.2.067 to determine pairwise mash distances68 and to place the 1797 assembled genomes in a neighbor-joining (NJ) tree (Supplementary Fig. 1). Isolates were taxonomically classified into one of the three subspecies based on their position in the mash-based NJ tree using known taxonomic classification of their neighbors as reference (Supplementary Data 1).
Public read set (n = 372)
WGS data (FastQ files) from 30 isolates included in the technical validation set (Supplementary Methods 2) and 342 isolates from the calibration set were downloaded from SRA. The calibration set includes isolates from six previously published studies concerning Mab transmission or outbreaks12,15,21,41–43. Only patients who were involved in outbreak or putative transmission clusters, or for whom multiple longitudinal samples with the same Mab subspecies were available, were considered. For the Brazilian outbreak41, only 2 isolates per region were retained. In addition, only isolates with more than 95% good cgMLST targets (see section “Design and application of cgMLST scheme”) and for which the length of the respective assembly did not deviate by more than 25% compared to the Mab type strain (NC_010397.1) were included. Using these criteria, the final calibration set comprised 342 isolates from 119 patients (Supplementary Data 3).
Genome assemblies
In addition to the 30 assemblies available from RefSeq, we also calculated 180 new assemblies for the isolates from the technical validation set. These assemblies were made starting from the 30 downloaded read sets (fastQ files) with different tools, preprocessing steps and assemblers (Supplementary Data 4). All datasets were downsampled to a coverage of 100x if applicable. The assembly tool shovill v1.1.038 was run on a Linux 5.4.0 (Ubuntu 20.04 LTS) server with an Intel® Xeon® E5-2650 v4 processor @ 2.2 GHz and 48 GB RAM using 8 threads. Default trimming in the shovill pipeline was performed using trimmomatic v0.3969 with the following parameters “leading:3 trailing:3 minlen:30 tophred33”. Prior read error correction in the shovill pipeline was performed using lighter v.1.1.270. Two assemblers available in shovill were evaluated: skesa v2.4.039 as well as SPAdes v3.15.040. In addition, assemblies were made using skesa v2.3.0 in SeqSphere+ (v7.7.5) on a Windows 10 laptop with an Intel® Core® i7-10510U processor @ 1.8 GHz, 16 GB RAM and 8 threads. Default trimming in SeqSphere+ includes trimming reads at the 5′ and 3′ ends until average quality is ≥30 in a window of 20 bases. Mapping was performed in SeqSphere+ against the seed genome (NC_010397.1) using BWA-MEM v0.7.15.
For isolates included in the calibration set (Supplementary Data 3), assemblies were made with skesa v2.4.0 without read error correction using shovill v1.1.0.
Design and application of cgMLST scheme
Detailed information regarding scheme creation can be found in Supplementary Methods 1 and Supplementary Fig. 2. In summary, core loci were defined using the cgMLST target definer v1.5 implemented in the SeqSphere+ software (client v7.7.5). The finished genome of the MabA ATCC19977 type strain was used as seed genome. For the penetration set, all genome assemblies up to chromosome or complete genome level were included as well as 31 draft genomes to better represent Mab diversity (Supplementary Data 1 and Supplementary Fig. 1). All publicly available plasmids (n = 17 on July 2nd, 2021) were used to exclude plasmid-borne sequences from the scheme.
Extraction of cgMLST loci from genome assemblies and assignment of allele numbers was also performed in SeqSphere+. According to the default requirements of the target quality control of SeqSphere+, “good quality” cgMLST targets were defined as loci with (i) the same length as reference genes ± 3 triplets, (ii) no ambiguities (e.g. N), (iii) no frameshifts compared to reference genes, (iv) at least 90% identity to reference sequence and (v) valid start and stop codons and no internal stop codons.
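The five quality criteria can be expressed as a simple filter. The sketch below is not the SeqSphere+ implementation: the function name `is_good_target` is hypothetical, the identity value is assumed to come from a prior alignment, the frameshift check is reduced to a length-divisible-by-three proxy, and the start codon set (ATG, GTG, TTG) is an assumption based on the standard bacterial genetic code.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_good_target(query: str, reference: str, identity: float) -> bool:
    """Apply criteria (i)-(v) to one extracted locus sequence (simplified sketch)."""
    seq = query.upper()
    # (i) same length as the reference gene, plus or minus 3 triplets (9 nt)
    if abs(len(seq) - len(reference)) > 9:
        return False
    # (ii) no ambiguous bases such as N
    if set(seq) - set("ACGT"):
        return False
    # (iii) crude frameshift proxy: length must be a multiple of three
    if len(seq) % 3 != 0:
        return False
    # (iv) at least 90% identity to the reference (computed elsewhere)
    if identity < 0.90:
        return False
    # (v) valid start and stop codons, no internal stop codon
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if len(codons) < 2:
        return False
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


reference_gene = "ATGAAACCCGGGTAA"  # toy reference sequence
ok = is_good_target("ATGAAACCCGGGTAA", reference_gene, identity=1.0)
ambiguous = is_good_target("ATGAAANCCGGGTAA", reference_gene, identity=0.93)
internal_stop = is_good_target("ATGTAACCCGGGTAA", reference_gene, identity=0.93)
```

Loci that fail the filter would be treated as missing in downstream distance calculations rather than as a distinct allele.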
Pairwise distances between two isolates were calculated as the number of cgMLST loci with a different allele number, ignoring missing (bad quality or absent) cgMLST loci.
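This distance definition can be written down in a few lines. The following is a minimal sketch with hypothetical isolate and locus names; missing loci are represented either by `None` or by the absence of the key, which is an encoding chosen for this example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional

Profile = Dict[str, Optional[int]]  # locus name -> allele number, None if missing


def allele_distance(p: Profile, q: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ."""
    return sum(
        1
        for locus, allele in p.items()
        if allele is not None and q.get(locus) is not None and allele != q[locus]
    )


def distance_matrix(profiles: Dict[str, Profile]) -> Dict[FrozenSet[str], int]:
    """Pairwise allele distances for all isolate pairs."""
    return {
        frozenset((a, b)): allele_distance(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


# Toy profiles over four hypothetical loci.
profiles = {
    "iso1": {"l1": 1, "l2": 1, "l3": 1, "l4": 1},
    "iso2": {"l1": 1, "l2": 2, "l3": None, "l4": 1},  # l3 failed quality control
    "iso3": {"l1": 3, "l2": 2, "l3": 4},               # l4 absent
}
matrix = distance_matrix(profiles)
```

Here `iso1` and `iso2` differ at one locus only, because the missing call at `l3` is ignored rather than counted as a difference.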
Multilocus sequence typing
The seven loci (argH, cya, gnd, murC, pta, purH and rpoB) included in the recently updated Mab scheme from pubMLST47 were extracted from whole genome assemblies using BioNumerics v7.6. Unknown sequences for which no allele number was retrieved and unknown STs were submitted to the pubMLST database and subsequently were assigned new numbers.
Core genome single nucleotide polymorphism (cgSNP) analysis
Whole genome sequence reads (FastQ files) of isolates from the outbreak/transmission set (Supplementary Data 3) were processed by the MTBseq pipeline (v.1.0.4)31,71 with default settings using the MabA ATCC19977 type strain genome (NC_010397.1) as a reference. The resulting core SNP alignment (80,651 variant positions) was used to calculate a pairwise distance matrix.
Statistics
Pairwise allele distances between groups were compared with a Wilcoxon rank sum test (with Bonferroni correction) in R v.4.0.2, as we did not assume a normal distribution (P < 0.05, Shapiro-Wilk normality test). To explore a possible temporal signal (i.e., the rate of allele changes per genome per year) in sequential patient isolates we employed a linear regression (least squares approach) of the allelic distance between the first available and a subsequent isolate against time (years) and reported R2 values. The rate of change was inferred from the slope of the linear regression equation, and the initial genetic diversity (i.e., distance to the first isolate within the first year) was inferred from the intercept.
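The regression step can be illustrated with an ordinary least squares fit written out explicitly. The study used R; this Python sketch only mirrors the calculation, and the function name and the five data points are hypothetical rather than taken from the dataset.

```python
from typing import List, Tuple


def least_squares_fit(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Return slope, intercept and R^2 of an ordinary least squares line y = a*x + b."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("all x values are identical")
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_squared = (s_xy ** 2) / (s_xx * s_yy) if s_yy > 0 else 1.0
    return slope, intercept, r_squared


# Hypothetical data: years since the first isolate and allele distance to it.
years = [0.0, 1.0, 2.0, 3.0, 4.0]
alleles = [1.0, 2.0, 2.0, 3.0, 4.0]
rate, initial_diversity, r2 = least_squares_fit(years, alleles)
```

In this reading, `rate` corresponds to alleles per genome per year and `initial_diversity` to the intercept, i.e. the expected distance to the first isolate at time zero.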
Reporting summary
Further information on research design is available in the Nature Research Reporting summary linked to this article.
Acknowledgements
We thank Dag Harmsen and Jörg Rothgänger for their technical help with regard to the construction of the cgMLST scheme. This work was supported by a financial grant from Mukoviszidose Institut gGmbH, Bonn, the research and development arm of the German Cystic Fibrosis Association Mukoviszidose e.V (project number 2004 – FM). Parts of this work have been supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC 2167 Precision Medicine in Inflammation, the German Ministry of Education and Research (BMBF) for the German Center of Infection Research (DZIF), and the Leibniz Science Campus Evolutionary Medicine of the LUNG (EvoLUNG).
Author contributions
F.M., S.N., and M.D. conceived and directed the project. M.D. and M.M. wrote the first draft of the manuscript. N.W., T.K. and all other authors commented and edited various versions of the draft manuscript. All authors read and approved the final manuscript.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Data availability
Accession numbers of all whole genome sequencing datasets analyzed in this study are listed in Supplementary Data 1 and 3. The cgMLST scheme is publicly available at cgmlst.org (https://www.cgmlst.org/ncs/schema/22602285/).
Code availability
Custom scripts used for this study and a detailed manual to perform cgMLST analysis for M. abscessus isolates within the commercial software SeqSphere+37 can be found on GitHub (ngs-fzb/NTMtools)72.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Stefan Niemann, Florian P. Maurer.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-022-32122-5.
References
- 1. Fedrizzi T, et al. Genomic characterization of Nontuberculous Mycobacteria. Sci. Rep. 2017;7:1–14. doi: 10.1038/srep45258.
- 2. Cook JL. Nontuberculous mycobacteria: opportunistic environmental pathogens for predisposed hosts. Br. Med. Bull. 2010;96:45–59. doi: 10.1093/bmb/ldq035.
- 3. Thomson R, et al. Isolation of Nontuberculous Mycobacteria (NTM) from Household Water and Shower Aerosols in Patients with Pulmonary Disease Caused by NTM. J. Clin. Microbiol. 2013;51:3006–3011. doi: 10.1128/JCM.00899-13.
- 4. van Ingen J, Boeree MJ, Dekhuijzen PNR, van Soolingen D. Environmental sources of rapid growing nontuberculous mycobacteria causing disease in humans. Clin. Microbiol. Infect. 2009;15:888–893. doi: 10.1111/j.1469-0691.2009.03013.x.
- 5. Honda JR, Virdi R, Chan ED. Global environmental nontuberculous mycobacteria and their contemporaneous man-made and natural niches. Front. Microbiol. 2018;9:2029. doi: 10.3389/fmicb.2018.02029.
- 6.Johansen MD, Herrmann JL, Kremer L. Non-tuberculous mycobacteria and the rise of Mycobacterium abscessus. Nat. Rev. Microbiol. 2020 187. 2020;18:392–407. doi: 10.1038/s41579-020-0331-1. [DOI] [PubMed] [Google Scholar]
- 7. Adjemian J, Olivier KN, Seitz AE, Holland SM, Prevots DR. Prevalence of Nontuberculous Mycobacterial Lung Disease in U.S. Medicare Beneficiaries. Am. J. Respir. Crit. Care Med. 2012;185:881–886. doi: 10.1164/rccm.201111-2016OC.
- 8. Luthra S, Rominski A, Sander P. The role of antibiotic-target-modifying and antibiotic-modifying enzymes in Mycobacterium abscessus drug resistance. Front. Microbiol. 2018;9:2179. doi: 10.3389/fmicb.2018.02179.
- 9. Degiacomi G, et al. Mycobacterium abscessus, an Emerging and Worrisome Pathogen among Cystic Fibrosis Patients. Int. J. Mol. Sci. 2019;20:5868. doi: 10.3390/ijms20235868.
- 10. Park IK, Olivier KN. Nontuberculous Mycobacteria in Cystic Fibrosis and Non–Cystic Fibrosis Bronchiectasis. Semin. Respir. Crit. Care Med. 2015;36:217. doi: 10.1055/s-0035-1546751.
- 11. Esther CR, Esserman DA, Gilligan P, Kerr A, Noone PG. Chronic Mycobacterium abscessus infection and lung function decline in cystic fibrosis. J. Cyst. Fibros. 2010;9:117–123. doi: 10.1016/j.jcf.2009.12.001.
- 12. Bryant JM, et al. Whole-genome sequencing to identify transmission of Mycobacterium abscessus between patients with cystic fibrosis: A retrospective cohort study. Lancet. 2013;381:1551–1560. doi: 10.1016/S0140-6736(13)60632-7.
- 13. Aitken ML, et al. Respiratory outbreak of Mycobacterium abscessus subspecies massiliense in a lung transplant and cystic fibrosis center. Am. J. Respir. Crit. Care Med. 2012;185:231–232. doi: 10.1164/ajrccm.185.2.231.
- 14. Harris KA, et al. Whole-genome sequencing and epidemiological analysis do not provide evidence for cross-transmission of Mycobacterium abscessus in a cohort of pediatric cystic fibrosis patients. Clin. Infect. Dis. 2015;60:1007–1016. doi: 10.1093/cid/ciu967.
- 15. Tortoli E, et al. Mycobacterium abscessus in patients with cystic fibrosis: low impact of inter-human transmission in Italy. Eur. Respir. J. 2017;50:1602525. doi: 10.1183/13993003.02525-2016.
- 16. Myojin S, et al. Chronic otitis media caused by Mycobacterium abscessus spp. massiliense treated with tigecycline in a 10-year-old child. Int. J. Infect. Dis. 2018;74:10–12. doi: 10.1016/j.ijid.2018.06.019.
- 17. Carter KK, et al. First United States Outbreak of Mycobacterium abscessus Hand and Foot Disease among Children Associated with a Wading Pool. J. Pediatr. Infect. Dis. Soc. 2019;8:291–296. doi: 10.1093/jpids/piy036.
- 18. Griffin I, et al. Outbreak of Tattoo-associated Nontuberculous Mycobacterial Skin Infections. Clin. Infect. Dis. 2019;69:949–955. doi: 10.1093/cid/ciy979.
- 19. Guimarães T, et al. Pseudooutbreak of rapidly growing mycobacteria due to Mycobacterium abscessus subsp bolletii in a digestive and respiratory endoscopy unit caused by the same clone as that of a countrywide outbreak. Am. J. Infect. Control. 2016;44:e221–e226. doi: 10.1016/j.ajic.2016.06.019.
- 20. Davidson RM, et al. Genomic characterization of sporadic isolates of the dominant clone of Mycobacterium abscessus subspecies massiliense. Sci. Rep. 2021;11:1–10. doi: 10.1038/s41598-021-94789-y.
- 21. Wuzinski, M. et al. Investigation of Two Mycobacterium abscessus Outbreaks in Quebec Using Whole Genome Sequencing. Biomed Res. Int. 2020, (2020).
- 22. Cheng A, et al. Prolonged postprocedural outbreak of Mycobacterium massiliense infections associated with ultrasound transmission gel. Clin. Microbiol. Infect. 2016;22:382.e1–382.e11. doi: 10.1016/j.cmi.2015.11.021.
- 23. Maurer FP, et al. Postsurgical wound infections due to rapidly growing mycobacteria in Swiss medical tourists following cosmetic surgery in Latin America between 2012 and 2014. Eurosurveillance. 2014;19:20905. doi: 10.2807/1560-7917.ES2014.19.37.20905.
- 24. Tortoli E, et al. Emended description of Mycobacterium abscessus, Mycobacterium abscessus subsp. abscessus and Mycobacterium abscessus subsp. bolletii and designation of Mycobacterium abscessus subsp. massiliense comb. nov. Int. J. Syst. Evol. Microbiol. 2016;66:4471–4479. doi: 10.1099/ijsem.0.001376.
- 25. Bryant JM, et al. Emergence and spread of a human-transmissible multidrug-resistant nontuberculous mycobacterium. Science. 2016;354:751–757. doi: 10.1126/science.aaf8156.
- 26. Ruis C, et al. Dissemination of Mycobacterium abscessus via global transmission networks. Nat. Microbiol. 2021;6:1279–1288. doi: 10.1038/s41564-021-00963-3.
- 27. Davidson RM. A closer look at the genomic variation of geographically diverse Mycobacterium abscessus clones that cause human infection and disease. Front. Microbiol. 2018;9:2988. doi: 10.3389/fmicb.2018.02988.
- 28. Cheng A, et al. Comparing the Utilities of Different Multilocus Sequence Typing Schemes for Identifying Outbreak Strains of Mycobacterium abscessus subsp. massiliense. J. Clin. Microbiol. 2019;58:e01304–e01319. doi: 10.1128/JCM.01304-19.
- 29. Kim SY, et al. Standardization of multilocus sequence typing scheme for Mycobacterium abscessus and Mycobacterium massiliense. Diagn. Microbiol. Infect. Dis. 2013;77:143–149. doi: 10.1016/j.diagmicrobio.2013.06.023.
- 30. Papić B, Diricks M, Kušar D. Analysis of the Global Population Structure of Paenibacillus larvae and Outbreak Investigation of American Foulbrood Using a Stable wgMLST Scheme. Front. Vet. Sci. 2021;8:161. doi: 10.3389/fvets.2021.582677.
- 31. Jajou R, et al. Towards standardisation: comparison of five whole genome sequencing (WGS) analysis pipelines for detection of epidemiologically linked tuberculosis cases. Eurosurveillance. 2019;24:1900130. doi: 10.2807/1560-7917.ES.2019.24.50.1900130.
- 32. Meehan CJ, et al. Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues. Nat. Rev. Microbiol. 2019;17:533–545. doi: 10.1038/s41579-019-0214-5.
- 33. Tönnies H, Prior K, Harmsen D, Mellmann A. Establishment and evaluation of a core genome multilocus sequence typing scheme for whole-genome sequence-based typing of Pseudomonas aeruginosa. J. Clin. Microbiol. 2021;59:e01987–20. doi: 10.1128/JCM.01987-20.
- 34. Kohl TA, et al. Harmonized Genome Wide Typing of Tubercle Bacilli Using a Web-Based Gene-By-Gene Nomenclature System. EBioMedicine. 2018;34:131. doi: 10.1016/j.ebiom.2018.07.030.
- 35. Chen Y, et al. Core genome multilocus sequence typing for identification of globally distributed clonal groups and differentiation of outbreak strains of Listeria monocytogenes. Appl. Environ. Microbiol. 2016;82:6258–6272. doi: 10.1128/AEM.01532-16.
- 36. Sheppard SK, Jolley KA, Maiden MCJ. A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter. Genes (Basel) 2012;3:261. doi: 10.3390/genes3020261.
- 37. Harmsen, D. & Rothgänger, J. Ridom Seqsphere. https://www.ridom.de/seqsphere/.
- 38. Seemann, T. Shovill. https://github.com/tseemann/shovill.
- 39. Souvorov A, Agarwala R, Lipman DJ. SKESA: strategic k-mer extension for scrupulous assemblies. Genome Biol. 2018;19:1–13. doi: 10.1186/s13059-018-1540-z.
- 40. Bankevich A, et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. J. Comput. Biol. 2012;19:455. doi: 10.1089/cmb.2012.0021.
- 41. Everall I, et al. Genomic epidemiology of a national outbreak of post-surgical Mycobacterium abscessus wound infections in Brazil. Microb. Genomics. 2017;3:e000111. doi: 10.1099/mgen.0.000111.
- 42. Wetzstein N, et al. Antimicrobial Susceptibility and Phylogenetic Relations in a German Cohort Infected with Mycobacterium abscessus. J. Clin. Microbiol. 2020;58:e01813–e01820. doi: 10.1128/JCM.01813-20.
- 43. Tettelin H, et al. High-level Relatedness among Mycobacterium abscessus subsp. massiliense Strains from Widely Separated Outbreaks. Emerg. Infect. Dis. 2014;20:364. doi: 10.3201/eid2003.131106.
- 44. Neumann B, et al. A Core Genome Multilocus Sequence Typing Scheme for Enterococcus faecalis. J. Clin. Microbiol. 2019;57:e01686–18. doi: 10.1128/JCM.01686-18.
- 45. Ruppitsch W, et al. Defining and evaluating a core genome multilocus sequence typing scheme for whole-genome sequence-based typing of Listeria monocytogenes. J. Clin. Microbiol. 2015;53:2869–2876. doi: 10.1128/JCM.01193-15.
- 46. Ghanem M, El-Gazzar M. Development of Mycoplasma synoviae (MS) core genome multilocus sequence typing (cgMLST) scheme. Vet. Microbiol. 2018;218:84–89. doi: 10.1016/j.vetmic.2018.03.021.
- 47. Jolley, K. & Diricks, M. pubMLST - Mab. https://pubmlst.org/bigsdb?db=pubmlst_mabscessus_seqdef&page=schemeInfo&scheme_id=1 (2021).
- 48. Bryant JM, et al. Stepwise pathogenic evolution of Mycobacterium abscessus. Science. 2021;372:eabb8699. doi: 10.1126/science.abb8699.
- 49. Lipworth S, et al. Epidemiology of Mycobacterium abscessus in England: an observational study. Lancet Microbe. 2021;2:e498–e507. doi: 10.1016/S2666-5247(21)00128-2.
- 50. Beres SB, et al. Transcriptome remodeling contributes to epidemic disease caused by the human pathogen Streptococcus pyogenes. MBio. 2016;7:e00403–e00416. doi: 10.1128/mBio.00403-16.
- 51. Estivill X, et al. Chromosomal regions containing high-density and ambiguously mapped putative single nucleotide polymorphisms (SNPs) correlate with segmental duplications in the human genome. Hum. Mol. Genet. 2002;11:1987–1995. doi: 10.1093/hmg/11.17.1987.
- 52. Croucher NJ, et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 2015;43:e15–e15. doi: 10.1093/nar/gku1196.
- 53. Tan JL, Ng KP, Ong CS, Ngeow YF. Genomic comparisons reveal microevolutionary differences in Mycobacterium abscessus subspecies. Front. Microbiol. 2017;8:2042. doi: 10.3389/fmicb.2017.02042.
- 54. Ripoll F, et al. Non Mycobacterial Virulence Genes in the Genome of the Emerging Pathogen Mycobacterium abscessus. PLoS One. 2009;4:e5660. doi: 10.1371/journal.pone.0005660.
- 55. Uelze L, et al. Typing methods based on whole genome sequencing data. One Heal. Outlook. 2020;2:1–19. doi: 10.1186/s42522-019-0007-9.
- 56. Jagadeesan B, Baert L, Wiedmann M, Orsi RH. Comparative Analysis of Tools and Approaches for Source Tracking Listeria monocytogenes in a Food Facility Using Whole-Genome Sequence Data. Front. Microbiol. 2019;10:947. doi: 10.3389/fmicb.2019.00947.
- 57. Miro E, et al. Core/Whole Genome Multilocus Sequence Typing and Core Genome SNP-Based Typing of OXA-48-Producing Klebsiella pneumoniae Clinical Isolates From Spain. Front. Microbiol. 2020;10:2961. doi: 10.3389/fmicb.2019.02961.
- 58. Eyre DW, Peto TEA, Crook DW, Sarah Walker A, Wilcox MH. Hash-Based Core Genome Multilocus Sequence Typing for Clostridium difficile. J. Clin. Microbiol. 2019;58:e01037–19. doi: 10.1128/JCM.01037-19.
- 59. Janezic S, Rupnik M. Development and Implementation of Whole Genome Sequencing-Based Typing Schemes for Clostridioides difficile. Front. Public Heal. 2019;7:309. doi: 10.3389/fpubh.2019.00309.
- 60. Jolley KA, Maiden MCJ. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinforma. 2010;11:595. doi: 10.1186/1471-2105-11-595.
- 61. Palma F, et al. In vitro and in silico parameters for precise cgMLST typing of Listeria monocytogenes. BMC Genomics. 2022;23:1–24. doi: 10.1186/s12864-022-08437-4.
- 62. Silva M, et al. chewBBACA: A complete suite for gene-by-gene schema creation and strain identification. Microb. Genomics. 2018;4:e000166. doi: 10.1099/mgen.0.000166.
- 63. Feijao P, et al. MentaLiST - A fast MLST caller for large MLST schemes. Microb. Genomics. 2018;4:e000146. doi: 10.1099/mgen.0.000146.
- 64. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61. doi: 10.1093/nar/gkl842.
- 65. Kitts PA, et al. Assembly: A resource for assembled genomes at NCBI. Nucleic Acids Res. 2016;44:D73–D80. doi: 10.1093/nar/gkv1226.
- 66. Leinonen R, Sugawara H, Shumway M. The sequence read archive. Nucleic Acids Res. 2011;39:D19. doi: 10.1093/nar/gkq1019.
- 67. Katz L, et al. Mashtree: a rapid comparison of whole genome sequence files. J. Open Source Softw. 2019;4:1762. doi: 10.21105/joss.01762.
- 68. Ondov BD, et al. Mash: Fast genome and metagenome distance estimation using MinHash. Genome Biol. 2016;17:1–14. doi: 10.1186/s13059-016-0997-x.
- 69. Usadel, B. Trimmomatic. https://github.com/usadellab/Trimmomatic.
- 70. Song L, Florea L, Langmead B. Lighter: fast and memory-efficient sequencing error correction without counting. Genome Biol. 2014;15:1–13. doi: 10.1186/s13059-014-0509-9.
- 71. Kohl TA, et al. MTBseq: A comprehensive pipeline for whole genome sequence analysis of Mycobacterium tuberculosis complex isolates. PeerJ. 2018;6:e5895. doi: 10.7717/peerj.5895.
- 72. Diricks, M. https://github.com/ngs-fzb/NTMtools (2022). doi: 10.5281/zenodo.6811028.