Abstract
This study leverages the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) to analyze over 27,000 Mycobacterium tuberculosis (MTB) genomic strains, providing a comprehensive and large-scale overview of antibiotic resistance (AMR) prevalence and resistance patterns. We used MTB++, which is the newest and most comprehensive AI-based MTB drug resistance profiler tool, to predict the resistance profile of each of the 27,000 MTB isolates and then used feature analysis to identify key genes that were associated with the resistance. There are three main contributions to this study. Firstly, it provides a detailed picture of the prevalence of specific AMR genes in the BV-BRC dataset as well as their biological implications, providing critical insight into MTB’s resistance mechanisms that can help identify genes of high priority for further investigation. The second aspect of this study is to compare the prevalence of antibiotic resistance across previous studies that have addressed both the temporal and geographical evolution of MTB drug resistance. Lastly, this study emphasizes the need for targeted diagnostics and personalized treatment plans. In addition to these contributions, the study acknowledges the limitations of computational prediction and recommends future experimental validation.
Subject terms: Computational biology and bioinformatics, Microbiology, Diseases
Introduction
The bacterium Mycobacterium tuberculosis (MTB), the primary cause of tuberculosis, spreads through the air, making it a major public health concern. According to the World Health Organization (WHO) 2024 report, the mortality rate for MTB disease is alarmingly high, approaching 50% without treatment. While the treatment success rate for drug-susceptible MTB remains remarkable at 88%, the rate for multi-drug resistant TB and rifampicin-resistant TB (MDR/RR-TB) has risen to 68% in comparison to 2022 WHO report 1. Hence, accurate identification of antimicrobial resistance (AMR) is crucial for managing patient care, guiding antibiotic stewardship, and mitigating the selection of AMR strains2. DNA sequencing can complement the traditional methods of determining AMR by identifying genetic differences between antibiotic-resistant and susceptible MTB strains. Various methods3–6 have been developed to classify resistance in MTB by cataloging genetic variants known to cause resistance to specific antibiotic drugs. Despite these advancements, critical gaps still exist in the knowledge of MTB resistance. First, there is a scarcity of comprehensive antibiotic susceptibility testing (AST) data for new and repurposed drugs (NRDs)7,8.
Second, the mechanisms of cross-resistance, wherein resistance to one antibiotic confers resistance to another, and compensatory evolution, where bacteria acquire mutations that mitigate the fitness costs of resistance, are not fully understood9–12. Third, bridging the gap between genomic data and clinical outcomes is essential for translating resistance predictions into effective treatment strategies13. This requires not only the integration of diverse data types but also a deep understanding of how genetic mutations influence drug efficacy and patient outcomes. Lastly, as TB treatment evolves, so too do the mechanisms of resistance2. Novel mutations and resistance mechanisms can emerge rapidly, outpacing current detection and prediction methods. Continuous surveillance and research are necessary to identify and understand these emerging threats.
Fortunately, there are resources available that allow for some of the aforementioned critical gaps to be addressed; most notable of these resources are large datasets that have been curated or created to facilitate the acquisition of a wide diversity of MTB resistance data. For example, consider the Comprehensive Resistance Prediction for Tuberculosis: an International Consortium (CRyPTIC)14, which encompasses whole genome sequencing data and phenotypic data for tens of thousands of MTB isolates collected worldwide. CRyPTIC has previously been utilized in a genome-wide study to identify specific genetic elements associated with resistance to single antibiotics, expanding on previous research that focused on single nucleotide polymorphisms (SNPs) pertaining to drug resistance in MTB. In addition to CRyPTIC, there is the Bacterial and Viral Bioinformatics Resource Center (BV-BRC)15 dataset, which hosts hundreds of thousands of bacterial genome sequencing and phenotypic data samples. Of this data, BV-BRC provides access to phenotypic and sequence data for 27,268 MTB isolates collected and uploaded by research teams worldwide. While providing larger amounts of data, BV-BRC can be considered more of a public collaborative data-sharing platform when compared to CRyPTIC. The highly collaborative nature of BV-BRC can allow for the collection of large swaths of data but can also result in less consistency of the data collected. This limitation poses a significant challenge, as inconsistent or unavailable phenotypic resistance data can often hinder accurate analysis and interpretation for multiple isolates.
Recognizing this gap, our study provides the first comprehensive analysis of both the BV-BRC and CRyPTIC datasets. Addressing this gap through imputation, we use MTB++16, a trained classifier designed to predict antibiotic drug resistance of MTB isolates. Leveraging both Logistic Regression (LR) and Random Forest (RF) machine learning models, MTB++ is capable of predicting the resistance of unannotated MTB isolates to 13 different antibiotic drugs and 3 antibiotic families. As demonstrated in our previous study16, MTB++ achieved state-of-the-art F1-scores when compared to ResFinder17, TBProfiler18, Mykrobe4,19, and KvarQ5. Given the performance of MTB++ and its broader coverage of antibiotic drugs compared to other methods, this study aims to broaden the use of MTB++ and predict the antibiotic drug resistance of over 27,000 MTB strains in the BV-BRC, irrespective of the presence of phenotypic information. Predicting AMR across the spectrum of antibiotics facilitates a multifaceted analysis that draws significant insights into the genomic landscape of AMR and its management on a global scale.
In summary, this analysis aims to provide (1) a comprehensive examination of AMR prevalence on an international scale, (2) the identification of resistant strains in relation to established AMR mechanisms, and the potential discovery of novel ones, and (3) the generation of hypotheses regarding clinical implications and AMR distribution patterns, particularly when contrasting first-line drugs with NRDs. We hypothesize that many resistance associations may be shared between CRyPTIC and BV-BRC; however, the examination of both may lead to the determination of novel associations with potential clinical implications. Additionally, we anticipate that the analysis of the features used by the MTB++ models when predicting on the BV-BRC dataset will allow us to draw conclusions on the current landscape of global MTB datasets.
Methods
Data retrieval
As of this study, the BV-BRC20 database encompassed 27,268 MTB isolates that were used in 303,519 MTB AST experiments. We note that only 27,239 MTB isolates could be retrieved from the BV-BRC database. Of these, 469 isolates were used in the development of MTB++. These isolates were therefore excluded to maintain the integrity of the evaluation of MTB++, ensuring that training data was not included. Additionally, 61 isolates in the dataset were obtained from non-human mammal hosts and were also excluded. Ultimately, 26,709 MTB isolates were included in this study.
Development of the MTB++ classifier
MTB++21 is a robust machine learning classifier developed using 6,224 MTB isolates from the CRyPTIC14 database, which contains whole genome sequencing data of human host MTB isolates paired with AST for 13 antibiotics (amikacin, bedaquiline, clofazimine, delamanid, ethionamide, ethambutol, isoniazid, kanamycin, levofloxacin, linezolid, moxifloxacin, rifampicin, and rifabutin). Oligonucleotide sequences, k-mers of size 31, were extracted from the CRyPTIC MTB isolates, ranked, and used to develop both LASSO LR and RF machine learning models. It should be noted that our feature matrix was over 95% sparse and high-dimensional (over 17 million features). We selected both LASSO LR and RF for their complementary strengths: LASSO LR excels with sparse datasets by performing automatic feature selection, reducing overfitting, and improving model interpretability22,23, while RF enhances robustness by reducing overfitting through random feature selection and capturing non-linear interactions, even in noisy data24,25.
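The k-mer extraction step can be illustrated with a minimal sketch. The function names, the decision to skip windows containing non-ACGT characters, and the binary presence encoding are assumptions made for illustration; the text does not specify how MTB++ counts, canonicalizes, or ranks k-mers.

```python
from collections import Counter

K = 31  # k-mer size stated in the text


def count_kmers(sequence: str, k: int = K) -> Counter:
    """Count every length-k window that contains only A, C, G, or T."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows with ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def presence_vector(sequence: str, feature_kmers: list, k: int = K) -> list:
    """Encode an isolate as 0/1 flags, one per (hypothetical) feature k-mer."""
    counts = count_kmers(sequence, k)
    return [1 if kmer in counts else 0 for kmer in feature_kmers]
```

With millions of candidate k-mers and most of them absent from any given isolate, a matrix built this way is naturally sparse, which is the property that motivates the LASSO penalty described above.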
The selection of k being 31 was informed by prior research on prokaryotic genome assembly26–28. Choosing k-mer sizes that are too small often results in non-informative outcomes, as such sizes are common across all isolates, while excessively large k-mer sizes can also be non-informative. It is important to note that the number of isolates in the CRyPTIC dataset that were resistant to the NRDs antibiotic drugs was very low in comparison to the first two lines of antibiotic drugs. In order to reduce this imbalance, some of these drugs were combined into classes based on the classification hierarchy of Doster et al.29: (1) kanamycin and amikacin were combined into the aminoglycosides group, (2) rifabutin and rifampicin were combined into the rifampin group, and (3) moxifloxacin, levofloxacin, and clofazimine were combined into the fluoroquinolones group. The combination process is as follows: if an isolate is phenotypically resistant to at least one of the antibiotic drugs in a group, then that isolate will be considered resistant to that group. For some of the isolates, the phenotypic resistance for two antibiotic drugs (ethambutol and ethionamide) was “intermediate”; in those cases, we labeled the isolates as phenotypically resistant since they still exhibited resistance to the aforementioned antibiotic drugs.
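The labeling rules described above can be sketched as follows. The phenotype codes "R", "S", and "I", the function names, and the handling of missing calls are assumptions for illustration only; the grouping itself follows the three classes listed in the text.

```python
# Drug groups as listed in the text.
GROUPS = {
    "aminoglycosides": ["kanamycin", "amikacin"],
    "rifampin": ["rifabutin", "rifampicin"],
    "fluoroquinolones": ["moxifloxacin", "levofloxacin", "clofazimine"],
}
# Drugs for which an "intermediate" phenotype was labeled resistant.
INTERMEDIATE_AS_RESISTANT = {"ethambutol", "ethionamide"}


def binarize(drug: str, phenotype: str):
    """Return True (resistant), False (susceptible), or None (no usable call)."""
    if phenotype == "R":
        return True
    if phenotype == "S":
        return False
    if phenotype == "I" and drug in INTERMEDIATE_AS_RESISTANT:
        return True
    return None  # assumption: anything else is treated as ambiguous


def group_label(phenotypes: dict, group: str):
    """Resistant if the isolate is resistant to at least one drug in the group."""
    calls = [binarize(d, phenotypes[d]) for d in GROUPS[group] if d in phenotypes]
    if any(c is True for c in calls):
        return True
    if any(c is False for c in calls):
        return False  # assumption: susceptible if no resistant call was seen
    return None
```

For example, an isolate that is susceptible to kanamycin but resistant to amikacin would be labeled resistant to the aminoglycosides group.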
Prediction of resistance using MTB++
Using pre-trained MTB++ models (available at: https://github.com/M-Serajian/MTB-Pipeline), we predicted antibiotic resistance for the 26,709 MTB isolates obtained from the BV-BRC database. In addition to reporting the results of LR and RF on the BV-BRC dataset, the number of isolates deemed resistant according to one or both models was also considered for each antibiotic and superfamily. In particular, the agreement between the LR and RF MTB++ models on predicting the resistance of BV-BRC isolates was examined using Cohen’s Kappa Coefficient ($\kappa$), a statistical measure useful in quantifying model agreement. Cohen’s Kappa is defined as:
$$\kappa = \frac{P(A) - P(E)}{1 - P(E)}$$
where P(A) is defined as the observed agreement between MTB++ LR and MTB++ RF, and P(E) is defined as the expected or chance agreement between models. Cohen’s Kappa places model agreement on a scale of -1 to 1, with a score of 1 indicating complete agreement, 0 indicating agreement no better than chance, and -1 indicating complete disagreement between the two models. We provide a more complete analysis of these kappa scores by analyzing the features used by the MTB++ models along with feature occurrence, the mean magnitude of coefficient (MMC), and the mean feature significance (MFS) of the top five shared features between CRyPTIC and BV-BRC.
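As an illustration of the agreement measure, the following sketch computes Cohen's Kappa from two lists of per-isolate predictions. The function name and the list-based input are assumptions; it is not taken from the MTB++ code base.

```python
def cohens_kappa(pred_lr: list, pred_rf: list) -> float:
    """Cohen's Kappa between two equally long lists of predicted labels."""
    if len(pred_lr) != len(pred_rf) or not pred_lr:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(pred_lr)
    # P(A): fraction of isolates on which both models give the same label.
    p_a = sum(a == b for a, b in zip(pred_lr, pred_rf)) / n
    # P(E): chance agreement from each model's marginal label frequencies.
    labels = set(pred_lr) | set(pred_rf)
    p_e = sum((pred_lr.count(l) / n) * (pred_rf.count(l) / n) for l in labels)
    if p_e == 1.0:
        return float("nan")  # both models emit one identical label; undefined
    return (p_a - p_e) / (1 - p_e)
```

For instance, with LR predictions `[1, 1, 0, 0]` and RF predictions `[1, 0, 0, 0]`, the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5.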
Identification of multi-drug resistance
Using the BV-BRC MTB dataset, we analyzed cross-resistance patterns by computing the Jaccard index of single antibiotic resistance classifications between each pair of antibiotic drugs. In addition, we examined the occurrence of commonly observed patterns of MTB cross-resistance using the most recent definitions from the WHO. In detail, we considered MDR-TB, which is resistant to both isoniazid and rifampicin; ‘highly-resistant tuberculosis’ (HR-TB), which resists isoniazid but not rifampicin; and ‘extensively drug-resistant tuberculosis’ (XDR-TB). XDR-TB cases are a subset of MDR-TB but with additional resistance to any fluoroquinolone and at least one of the following antibiotics: levofloxacin, moxifloxacin, bedaquiline, and linezolid.
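The pairwise Jaccard computation can be sketched as below, assuming each drug is represented by the set of isolate identifiers predicted resistant to it. The function names and the convention of returning 0 when both sets are empty are illustrative assumptions.

```python
from itertools import combinations


def jaccard_index(resistant_a: set, resistant_b: set) -> float:
    """|A and B| / |A or B| for two sets of resistant isolate IDs."""
    union = resistant_a | resistant_b
    if not union:
        return 0.0  # assumption: no resistant isolates for either drug
    return len(resistant_a & resistant_b) / len(union)


def pairwise_jaccard(resistant_by_drug: dict) -> dict:
    """Jaccard index for every unordered pair of drugs."""
    return {
        (d1, d2): jaccard_index(resistant_by_drug[d1], resistant_by_drug[d2])
        for d1, d2 in combinations(sorted(resistant_by_drug), 2)
    }
```

A value close to 1 means that nearly every isolate resistant to one drug of the pair is also resistant to the other.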
Alignment of features to reference genome
We used MTB++ to predict the resistance (to each antibiotic drug) of the 26,709 viable BV-BRC isolates. After identifying the BV-BRC isolates predicted to be resistant by MTB++, for each antibiotic drug, we counted the number of times each feature appeared in the isolates predicted to be resistant to that specific antibiotic drug. To further understand the mechanisms of resistance involved and their prevalence for different antibiotic drugs, we aligned all features present in the dataset to the MTB reference genome (H37Rv, NCBI Reference Sequence: NC_000962.3) for each antibiotic drug separately. This alignment was performed using BWA30 with a configuration that only permits exact alignments. BEDTools (version 2.30.0)31 was then used to intersect BAM files with the MTB GFF file (NCBI RefSeq assembly: GCF_000195955.2) in order to identify the genes in the MTB genome where the aforementioned alignments occurred. We document any gene to which more than 200 31-mer sequences are aligned or for which over 25% of the gene’s length is covered by 31-mers. This approach helps eliminate spurious findings from our analysis.
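The gene-level reporting rule (more than 200 aligned 31-mers, or more than 25% of the gene covered) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes the alignments for one gene are already available as half-open (start, end) intervals clipped to the gene boundaries, and the function names are hypothetical.

```python
MIN_KMERS = 200      # report a gene if more than this many 31-mers align to it
MIN_COVERAGE = 0.25  # or if more than this fraction of its length is covered


def covered_length(intervals: list) -> int:
    """Total number of bases covered by a list of half-open (start, end) pairs."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping k-mers extend the block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def keep_gene(intervals: list, gene_length: int) -> bool:
    """Apply the two reporting criteria described in the text."""
    coverage = covered_length(intervals) / gene_length
    return len(intervals) > MIN_KMERS or coverage > MIN_COVERAGE
```

Merging overlapping intervals before measuring coverage matters because adjacent 31-mers typically overlap by up to 30 bases, so summing their lengths directly would overstate the covered fraction.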
Prevalence comparison across various databases and studies
When comparing the prevalence of antibiotic resistance between two studies or databases using the 2-proportion Z-test, the goal is to determine whether the difference in proportions of a specific outcome (e.g., resistance or susceptibility) between the two groups is statistically significant. We proceed as follows. First, we define the following terms: (1) $p_1$ is the proportion of tests showing resistance in the first study, calculated as $p_1 = x_1 / n_1$, where $x_1$ is the number of resistant tests in the first study and $n_1$ is the total number of tests in the first study. (2) Similarly, $p_2 = x_2 / n_2$ is the proportion of resistant tests in the second study, where $x_2$ is the number of resistant tests in the second study and $n_2$ is the total number of tests in the second study. Next, the Z-score is calculated as:
$$Z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
where $p$ is the pooled proportion, which provides an overall estimate by combining data from both studies:
$$p = \frac{x_1 + x_2}{n_1 + n_2}$$
The Z-score measures how many standard errors the observed difference ($p_1 - p_2$) deviates from the hypothesized difference, which is zero under the null hypothesis of no difference between the studies. Specifically, the two-sided p-value is calculated as:
$$p\text{-value} = 2\left(1 - \Phi(|Z|)\right)$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution.
If the absolute value of the Z-score exceeds the critical value (e.g., 1.96 for a 95% confidence level), the difference in proportions is considered statistically significant. Otherwise, the difference is deemed not significant, suggesting that any observed variation may be due to random chance. The p-value represents the probability, under the standard normal distribution, of observing a Z-score at least as extreme as the one computed.
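The test described above can be sketched with the standard library alone. The function name is an assumption, and the p-value is computed as a two-sided value, which is consistent with comparing the absolute Z-score against 1.96.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Return (Z, two-sided p-value) for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p1 - p2) / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A call such as `two_proportion_z_test(50, 100, 30, 100)` compares resistance proportions of 0.50 and 0.30; in the setting of this study, `x1, n1` and `x2, n2` would be the resistant and total counts from the two datasets being compared.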
Results
BV-BRC had a lower prevalence of resistance than CRyPTIC for all the treatment lines
Our analysis of the BV-BRC dataset included 27,268 unique MTB isolates, out of which 27,239 were successfully retrieved. Of these, 469 isolates were also included in the CRyPTIC dataset and had therefore previously been used to train MTB++ in our earlier study16; these were excluded. A total of 61 isolates derived from non-human hosts were also excluded. Thus, this study examined 26,709 MTB isolates and predicted AMR using MTB++. Among these isolates, 9161 lacked specific geographic information, while 622 had more than one national territory associated with them. A breakdown by continent showed 6,488 isolates from Europe, 3771 from Asia, 2548 from North America, 2431 from Africa, 1636 from South America, and 51 from Australia and Oceania. Isolates from the United Kingdom and Canada contributed the most, with 3956 and 2444 isolates, respectively. Of the 26,709 isolates in this study, 20,845 had at least one laboratory AST for a specific antibiotic, while the remaining 5864 isolates only had the computational method of AMR prediction. The majority of these isolates lacked laboratory AST phenotypes for the 13 antibiotics and three antibiotic families, so we did not have a basis for evaluating the performance of MTB++. Despite this, the superior performance of MTB++ demonstrated in our recent study16 supports its use in analyzing the prevalence of antibiotic resistance in MTB isolates. Consequently, a significant contribution of this study is the prediction of antibiotic resistance for a range of less commonly prescribed antibiotics in cases where laboratory AST data are not available.
Using MTB++ on the BV-BRC dataset, we observed the highest resistance prevalence in first-line antibiotic drugs: isoniazid (33.90%), rifampicin (28.00%), and ethambutol (25.26%). These figures exceed the resistance rates of other antibiotic classes by more than 10 percentage points, with the exception of rifabutin, as shown in Tables 2 and 3. In order to contextualize these results, we compared them to isolates collected from CRyPTIC (6224 isolates). The CRyPTIC resistance prevalence profiles among treatment lines align with the BV-BRC resistance prevalence, although the rates are higher; in detail, a resistance prevalence of 49.42% for isoniazid, 41.96% for rifampicin, and 34.72% for ethambutol was observed. A z-test for proportions rejected the null hypothesis of no difference between CRyPTIC and BV-BRC rates (p-values for isoniazid, rifampicin, and ethambutol were 9.8e–115, 3.7e–102, and 1.6e–51, respectively).
Table 2.
Comparative analysis of drug resistance in MTB isolates using MTB++, contrasting the number of isolates deemed resistant and susceptible for the first-line, second-line, and NRDs MTB drugs among all the BV-BRC20 isolates in our analysis (26,709).
Resistance class | Antibiotic | Resistant | Susceptible |
---|---|---|---|
First line | Isoniazid | 9055 (33.90%) | 17,654 |
First line | Rifampicin | 7476 (28.00%) | 19,233 |
First line | Ethambutol | 6746 (25.26%) | 19,963 |
Second line | Rifabutin | 7229 (27.07%) | 19,480 |
Second line | Ethionamide | 2749 (10.29%) | 23,960 |
Second line | Levofloxacin | 2273 (8.51%) | 24,436 |
Second line | Moxifloxacin | 2173 (8.14%) | 24,536 |
Second line | Kanamycin | 1968 (7.37%) | 24,741 |
Second line | Amikacin | 1573 (5.89%) | 25,136 |
NRDs | Clofazimine | 5 (0.02%) | 26,704 |
NRDs | Delamanid | 0 (0.00%) | 26,709 |
NRDs | Linezolid | 3 (0.01%) | 26,706 |
NRDs | Bedaquiline | 9 (0.03%) | 26,700 |
Table 3.
Comparative analysis of drug resistance in MTB isolates using MTB++, contrasting the number of isolates deemed resistant and susceptible to the superfamilies of MTB antibiotic drugs among all the BV-BRC20 isolates in our analysis (26,709).
Resistance class | Superfamily | Resistant | Susceptible |
---|---|---|---|
Super families | Rifampin | 8129 (30.44%) | 18,580 |
Super families | Fluoroquinolones | 3047 (11.41%) | 23,662 |
Super families | Aminoglycosides | 2056 (7.71%) | 24,653 |
With the exception of rifabutin, second-line antibiotic drugs (i.e. ethionamide, levofloxacin, moxifloxacin, kanamycin, and amikacin) exhibited a resistance prevalence roughly 15 percentage points or more below that of the first-line antibiotic drugs. As indicated in Table 2, MTB++ found a resistance prevalence of less than 11% for all second-line antibiotics, excluding rifabutin (ethionamide 10.29%, levofloxacin 8.51%, moxifloxacin 8.14%, kanamycin 7.37%, and amikacin 5.89%). Rifabutin, with a resistance prevalence of 27.07%, was the one exception to the trend exhibited by second-line antibiotics. The resistance prevalence of rifabutin is comparable to that of first-line antibiotics and is most likely a result of frequent use in antibiotic treatment of MTB20. As shown in Table 1, CRyPTIC reports significantly higher resistance prevalence across all second-line antibiotics, including rifabutin. A 2-proportion Z-test was used to validate the higher prevalence of resistance to all second-line antibiotics in CRyPTIC compared with BV-BRC, yielding statistically significant results across all antibiotics.
Table 1.
The number of isolates in the CRyPTIC dataset that are resistant, susceptible, or ambiguous with respect to each of the 13 antibiotic drugs.
Resistance class | Antibiotic | Resistant | Susceptible | Ambiguous |
---|---|---|---|---|
First line | Isoniazid | 3047 (49.00%) | 3119 | 58 |
First line | Rifampicin | 2589 (41.96%) | 3581 | 54 |
First line | Ethambutol | 2147 (34.50%) | 4036 | 41 |
Second line | Rifabutin | 2417 (39.15%) | 3757 | 50 |
Second line | Ethionamide | 1320 (21.20%) | 4816 | 88 |
Second line | Levofloxacin | 1220 (19.60%) | 4950 | 54 |
Second line | Moxifloxacin | 984 (15.81%) | 5196 | 44 |
Second line | Kanamycin | 573 (9.21%) | 5565 | 86 |
Second line | Amikacin | 487 (7.82%) | 5682 | 55 |
NRDs | Clofazimine | 248 (3.98%) | 5882 | 94 |
NRDs | Delamanid | 111 (1.78%) | 5989 | 124 |
NRDs | Linezolid | 79 (1.27%) | 6116 | 29 |
NRDs | Bedaquiline | 45 (0.72%) | 6059 | 120 |
The results are generated through AST. There are a total of 6224 MTB isolates.
Antibiotics classified as NRDs, specifically clofazimine, delamanid, bedaquiline, and linezolid, showed the lowest rates of resistance. Table 2 shows that fewer than 1% of the isolates were predicted to be resistant to each of these antibiotics, which translates to fewer than 10 resistant isolates per drug among the 26,709 BV-BRC samples. Again, when comparing these findings to the CRyPTIC dataset, we observed a notably higher resistance rate in CRyPTIC than in BV-BRC, as shown in Tables 1 and 2. Finally, in the BV-BRC isolates, antibiotic superfamilies were associated with a higher prevalence of resistance than their individual components. As shown in Table 3, the rifampin superfamily had a resistance prevalence of 30.44%, which is higher than that of both rifampicin and rifabutin. Similarly, the aminoglycosides superfamily showed a resistance rate of 7.71%, which was higher than that of both kanamycin and amikacin. Lastly, the fluoroquinolones family showed a resistance prevalence of 11.41%. This underlines the importance of considering both broad-spectrum and specific drug resistances when developing therapeutic strategies for MTB infections.
Cross-resistance patterns varied across the first-line, second-line, and NRDs antibiotic drugs
Patterns of high cross-resistance were notably observed in first-line antibiotic drugs, with the combination of ethambutol and rifampicin showing a cross-resistance prevalence of 84.85%, closely followed by isoniazid and rifampicin at 79.75%, and isoniazid paired with ethambutol at 74.31%. As opposed to first-line antibiotics, second-line antibiotics had significantly lower cross-resistance rates, ranging from 13.3% between amikacin and ethionamide to 29.8% between kanamycin and levofloxacin. However, antibiotics within the same superfamily, such as rifabutin and rifampicin, amikacin and kanamycin, and levofloxacin and moxifloxacin, showed cross-resistance rates above 75%, with specific rates of 89.23%, 76.7%, and 79.93% respectively. All results for the cross-resistance analysis are shown in Fig. 1.
Fig. 1.
A heatmap illustrating the multi-drug-resistant MTB isolates for pairs of antibiotic drugs, quantified using the Jaccard index. Each cell in the heatmap represents the total number of isolates identified as resistant to both drugs within a pair, with the color intensity of the heatmap corresponding to the Jaccard index values. This intensity visually indicates the degree of co-resistance among the isolates, providing insights into the patterns of cross-resistance between different antibiotic combinations.
Furthermore, antibiotics considered as NRDs displayed minimal cross-resistance, both among themselves and with other antibiotics, as illustrated in Fig. 1. Specifically, among the BV-BRC isolates, fewer than five isolates showed resistance to other antibiotic drugs for each of the NRDs.
According to the World Health Organization’s definitions of cross-resistance, the percentages of highly resistant tuberculosis (HR-TB), multidrug-resistant tuberculosis (MDR-TB), and extensively drug-resistant tuberculosis (XDR-TB) across all BV-BRC isolates were 6.46%, 27.44%, and 8.39%, respectively. Further, Table 4 presents the prevalence rates of MDR-TB, HR-TB, and XDR-TB across different continents. The findings indicate that North America and Oceania exhibited the lowest and highest prevalence of MDR-TB at 2.75% and 65.38%, respectively. For XDR-TB, North America showed the lowest prevalence at 0.12%, while Africa had the highest at 24.6%. Regarding HR-TB, the lowest and highest prevalence rates were observed in Africa and Oceania, at 4.07% and 11.54%, respectively. We note that the BV-BRC MTB isolates are not based on regular surveillance; each set of isolates was uploaded in association with specific studies and goals. Therefore, the distribution of the data and AMR prevalence among different countries varies significantly.
Table 4.
Number of isolates and the prevalence (%) of MDR-TB, HR-TB, and XDR-TB for each continent across the BV-BRC dataset.
Continent | Isolates | MDR-TB (%) | HR-TB (%) | XDR-TB (%) |
---|---|---|---|---|
Europe | 6488 | 24.6 | 4.36 | 6.89 |
Asia | 3771 | 50.01 | 8.64 | 13.66 |
North America | 2548 | 2.75 | 6.75 | 0.12 |
Africa | 2431 | 53.19 | 4.07 | 24.6 |
South America | 1636 | 56.36 | 8.99 | 8.86 |
Oceania | 52 | 65.38 | 11.54 | 11.54 |
Feature analysis supported numerous resistance mechanisms previously identified by CRyPTIC
Alignment of k-mer signature features to the MTB reference genome (see Section “Alignment of features to reference genome”) revealed many drug resistance genes in the BV-BRC isolates that were also found previously in the CRyPTIC dataset14. Across all 13 antibiotics, 101 genes out of 195 reported by the CRyPTIC consortium were also found in BV-BRC. The rpoB gene was found to be associated with resistance to all the first- and second-line antibiotic drugs. We note that the rpoB gene encodes the β subunit of RNA polymerase, an enzyme complex responsible for transcribing DNA into RNA. RNA polymerase plays a central role in the process of gene expression, enabling the synthesis of messenger RNA (mRNA), which is subsequently translated into proteins. The rpo genes are essential for the viability of the bacterium and its pathogenicity32–34. As noted by Farhat et al.35, there is a growing body of evidence indicating an association between the RNA polymerase β subunit gene (rpoB) and resistance to rifabutin and rifampicin. This correlation is increasingly recognized due to the expanded utilization of molecular diagnostic assays that rapidly identify mutations in the rpoB gene. These mutations serve as indicators for rifampicin resistance and multidrug-resistant tuberculosis (MDR-TB).
Resistance to first-line antibiotic drugs was associated with the following genes: embB, katG, and rpsL, the last of which was associated exclusively with first-line drugs, as detailed in Table 5. The CRyPTIC consortium also documented these resistance associations. It should be noted that resistance to ethambutol is primarily conferred through mutations in the embB gene, which encodes the arabinosyltransferase involved in the synthesis of the cell wall arabinogalactan; these mutations reduce the drug’s ability to inhibit this vital process36. The novel associations of embB with amikacin and kanamycin, identified in this study, extend the significance of this gene beyond its established role in ethambutol resistance. These novel associations suggest a broader functional role for embB in resistance mechanisms. Clinically, this could lead to reconsideration of treatment strategies, particularly in multidrug-resistant strains where the use of aminoglycosides like amikacin and kanamycin is prevalent. The katG gene encodes a catalase-peroxidase enzyme necessary for the activation of isoniazid into its toxic form within the bacterial cell, and mutations in katG can confer isoniazid resistance37. Moreover, the rpsL gene encodes the ribosomal protein S12; rpsL mutations can alter the target site of certain antibiotics, notably streptomycin, one of the first antibiotics used in MTB treatment. Streptomycin binds to the ribosome and disrupts protein synthesis by causing misreading of mRNA. However, mutations in the rpsL gene can alter the structure of the ribosomal protein S12, reducing or eliminating the binding affinity of streptomycin to the ribosome. As a result, the antibiotic cannot exert its bactericidal effect; this resistance reflects the capacity of MTB to adapt under the selective pressure of antibiotic treatment38,39.
Table 5.
Genes were identified by aligning the statistically significant features (Identified by the feature analysis detailed in Section “Alignment of features to reference genome”) associated with first-line, second-line, and NRDs antibiotic drug resistance against the MTB reference genome (H37Rv). The genes corroborating with findings from the CRyPTIC study14 are shown in this table.
Antibiotic drug | MTB gene |
---|---|
Isoniazid | rpoB, rpsL, fabG1, katG, embB, gid |
Rifampicin | rpoB, katG, rpoC, rpsL, fabG1, relA, Rv3183, Rv3327, dxs2, guaA, embB |
Ethambutol | embB, embA, gyrA, rpoB, rpsL, Rv1371, rpsA, Rv1752, katG, pncA, Rv3183, spoU, guaA |
Rifabutin | rpoB, rpoC, cysA2, mprA, katG, Rv2277c, Rv2650c, cysA3, embB |
Ethionamide | gyrA, PPE3, Rv0565c, rpoB, rrs, Rv1371, fabG1, inhA, Rv2019, pncA, plsC, mpt53, PPE56, Rv3698, embA, embB, ethA |
Levofloxacin | gyrA, mce2F, rpoB, rrs, katG, embB |
Moxifloxacin | gyrA, rpoB, rrs, katG, embB |
Kanamycin | rrs, eis, gyrA, narU, mmaA4, rpoB, pgi, lprCC, murA, Rv1362c, Rv1371, fabG1, pncA, glnE, PPE42, viuB, ethA |
Amikacin | rrs, gyrA, Rv0078A, narU, rpoB, Rv0792c, Rv1371, glnE, PPE42, cyp141, PPE54, espA, ethA |
Linezolid | rplC, PE_PGRS4 |
Bedaquiline | rrs, PPE54 |
The genes gyrA and rrs were found to be significantly associated with resistance to all the second-line antibiotic drugs with the exception of rifabutin. Similar findings were reported by the CRyPTIC consortium14. Mutations in the gyrA gene have been reported in many studies as resistance-conferring for many first- and second-line antibiotic drugs, such as the fluoroquinolones family of antibiotics14,40. Fluoroquinolones are a class of antibiotic drugs that work by targeting an enzyme called DNA gyrase, which helps an MTB bacterium replicate and transcribe its DNA. With this enzyme blocked, the MTB bacterium cannot transcribe DNA and replicate. Nonetheless, changes in the gyrA gene can modify the enzyme’s 3-D structure, reducing the binding affinity of the antibiotic drugs. Fluoroquinolones frequently become less effective against MTB and other bacteria40–43 due to this process. Moreover, the rrs gene encodes the 16S rRNA of the 30S ribosomal subunit, which is critical for maintaining the structure and function of the MTB ribosome. This includes the binding sites for antibiotics like amikacin and kanamycin. Resistance typically arises through mutations leading to alterations in the ribosomal binding sites of these antibiotics, thus preventing their effective binding and action. The rrs gene has been shown to be significantly connected with resistance to the aminoglycosides class of antibiotics14,44–47.
Interestingly, we found an exclusive association of guaA and Rv3183 with rifampicin and ethambutol, which was substantiated by the previous findings of CRyPTIC. The guaA gene encodes the enzyme guanosine monophosphate synthetase (GMPS), which plays a crucial role in the biosynthesis of guanine nucleotides in MTB. Specifically, GMPS catalyzes the transformation of xanthosine 5’-monophosphate into guanosine 5’-monophosphate, an essential step within the purine biosynthesis pathways critical for cellular nucleotide production. Additionally, investigations into the guaA gene across various bacterial pathogens have revealed its association with virulence traits. Yet, the direct physiological consequences of guaA deletion in MTB are not fully understood. Additionally, Rv3183 is predicted to encode a transcriptional regulator in the MTB bacterium48; however, its precise function has yet to be investigated. It is crucial to emphasize that further exploration is needed to understand the effects of guaA gene variants and Rv3183 on MTB drug resistance. Apart from this study, the only other research associating these genes with MTB drug resistance is that of the CRyPTIC consortium14.
Regarding NRDs, linezolid resistance is conferred through mutations in the rplC gene, as also reported by the CRyPTIC consortium. Detected exclusively for linezolid, the rplC gene encodes the ribosomal protein L3. Resistance is conferred through specific mutations within rplC that disrupt the binding site of linezolid. Prior studies have confirmed that mutations in rplC are the primary genetic mutations associated with linezolid resistance in MTB49. Additionally, resistance to bedaquiline was found to be associated with the genes rrs and PPE54. PPE genes are a family of genes that have been associated with resistance to second-line antibiotics and NRDs in the CRyPTIC consortium. In addition, our findings associated some of the PPE genes with resistance to various antibiotic drugs as potentially novel mechanisms.
In summary, the alignment of k-mer signature features to the MTB reference genome revealed several significant gene associations with antibiotic resistance across BV-BRC isolates, corroborating findings from the CRyPTIC datasets. Notably, the identification of novel associations, such as embB with amikacin and kanamycin, extends our understanding of the gene’s role beyond ethambutol resistance, potentially influencing clinical treatment strategies, particularly in the management of multidrug-resistant strains. Additionally, the discovery of exclusive associations between guaA, Rv3183, and resistance to rifampicin and ethambutol provides new avenues for research into the mechanistic roles of these genes in resistance. These findings highlight the need to further explore the functional impact of these genes and their potential to improve diagnostic tools and targeted therapies in the future. Through the integration of these novel associations, molecular diagnostic techniques may be able to provide more precise and personalized treatment approaches for MTB, which may improve clinical outcomes in drug-resistant cases.
Feature analysis yielded potentially new mechanisms of resistance for each antibiotic drug
Many of the genes that were found to be associated with resistance to an antibiotic drug were not reported in CRyPTIC14. Table 6 outlines these associations. These genes can help uncover novel mechanisms of resistance and therefore require further investigation. The proline-proline-glutamic acid (PPE) family genes were found to be associated with resistance to six antibiotics: isoniazid, rifampicin, ethionamide, kanamycin, moxifloxacin, and bedaquiline. Some other genes of the PPE family have also been previously associated with amikacin, ethionamide, kanamycin, and bedaquiline in CRyPTIC14. These genes encode a vast array of proteins characterized by a Pro-Pro-Glu motif, which are implicated in modulating the host immune response, thereby facilitating MTB survival within the hostile intracellular environment of host macrophages. Emerging evidence suggests that PPE proteins may also contribute to drug resistance by altering drug targets, affecting cell wall integrity, and regulating the expression of genes associated with drug efflux systems50. For instance, PPE42 was found to be associated with resistance to both amikacin and kanamycin, both of which are aminoglycosides.
Table 6.
Genes from the MTB reference genome (H37Rv) that were deemed to be associated with resistance to each antibiotic drug through the feature analysis described in Section “Alignment of features to reference genome”, and were not reported by CRyPTIC14.
Antibiotic drug | MTB gene |
---|---|
Isoniazid | Rv0095c, Rv0298, PE_PGRS6, B55, rpsN1, PE_PGRS13, pks12, rpmG1, Rv2128, Rv2237A, mcr16, lrpA, PPE59 |
Rifampicin | Rv0298, PE_PGRS6, B55, rplN, PE_PGRS13, esxI, PE_PGRS20, cut1, blaI, pks12, rpmG1, Rv2128, mcr16, esxO, lrpA, PPE59, esxV |
Ethambutol | Rv0095c, Rv0298, Rv0699, mcr11, rpmG1, Rv2237A, esxO, lppA, lppB, vapB18 |
Rifabutin | Rv0071, Rv0095c, Rv0298, B55, Rv0699, PE_PGRS13, esxI, PE_PGRS28, cut1, wag22, Rv2021c, rpmG1, Rv2237A, mcr16, esxO, vapB17, lppA, lppB, vapB18, esxV |
Ethionamide | Rv0061c, Rv0095c, Rv0459, B55, Rv0810c, Rv0979c, mcr11, lppC, rpmG1, Rv2237A, mcr16, mazF8, esxO, Rv2468A, lppA, lppB, vapB18, PPE54, PPE57, lsr2, Rv3642c, Rv3656c |
Levofloxacin | Rv0071, Rv0072, Rv0073, Rv0074, Rv0095c, Rv0298, Rv0325, Rv0699, rplX, mcr11, Rv1532c, Rv1904, rpmG1, Rv2237A, esxO, fas, vapB17, lppA, lppB, vapB18, Rv3785, Rv3898c |
Moxifloxacin | Rv0071, Rv0072, Rv0073, Rv0074, Rv0095c, Rv0298, PPE8, Rv0699, PE_PGRS22, mcr11, Rv1532c, Rv1904, rpmG1, Rv2237A, uspC, Rv2319c, esxO, vapB17, lppA, lppB, vapB18, Rv2816c, Rv2817c, Rv2818c, Rv2819c, Rv2820c |
Kanamycin | Rv0061c, Rv0095c, PPE5, Rv0378, B55, rpmD, Rv2237A, mcr16, esxO, PPE54, Rv3656c, vapB48, PPE67, embB |
Amikacin | Rv0061c, Rv0072, Rv0073, Rv0095c, Rv0298, B55, Rv0699, esxJ, esxL, mcr11, esxN, Rv2021c, Rv2237A, mcr16, esxO, vapB17, lppA, lppB, vapB18, Rv2923c, esxS, Rv3656c, embA, embB, gid |
Linezolid | Rv1088a, lipJ, esxS, Rv3654c, serV |
Clofazimine | argV |
Bedaquiline | PE_PGRS5, PPE8, PE_PGRS7, Rv0750, esxI, Rv1047, argV, garA, lipJ, PE_PGRS43, Rv2512c, lysU, esxR, esxS, Rv3023c, esxV, Rv3716c, serV, esxC, PE36 |
The associations are grouped into first-line antibiotics, second-line antibiotics, and NRDs.
The PE_PGRS family of genes were found as novel resistance mechanisms associated with five antibiotic drugs: isoniazid, rifampicin, rifabutin, moxifloxacin, and bedaquiline. It is noteworthy that PE_PGRS4 was reported to be associated with linezolid resistance by CRyPTIC. This family of genes is characterized by the presence of a Pro-Glu (PE) motif at their N-terminus and a polymorphic GC-rich repetitive sequence (PGRS) domain. This family is highly represented within the MTB genome and is thought to play significant roles in the bacterium’s pathogenesis, immune evasion, and interaction with the host immune system. The precise roles of many PE_PGRS proteins are yet to be fully determined. The diversity of these proteins presents challenges for developing vaccines, necessitating immune responses that can effectively target a wide array of variants51. Notably, clofazimine is linked to four of these genes, while linezolid is linked to only one (PE_PGRS4), also reported in CRyPTIC14. Both these drugs are mainly utilized in treating MDR-TB and XDR-TB.
A newly identified potential mechanism of resistance is esxO (Rv2346c). In particular, this genomic region was linked to resistance against both first- and second-line antibiotic drugs, with the exception of isoniazid. EsxO is encoded by the Rv2346c gene, belongs to the ESAT6-subfamily 1 within MTB, and is located within the RD7 region of the MTB genome52. The function of esxO is to facilitate the invasion of host cells and sustain intracellular survival by influencing the host’s immune responses. Specifically, the esxO protein plays a significant role in mycobacterial pathogenicity by inhibiting the antibiotic-effector functions of macrophages. This inhibition is largely accomplished through the induction of oxidative stress, which causes genomic instability in host cells. The oxidative stress response is characterized by a heightened production of reactive oxygen species (ROS) and changes in the concentrations of nitric oxide (NO) and superoxide dismutase (SOD) activities in the infected macrophages. Such oxidative stress inflicts damage on cellular components, including DNA, proteins, and lipids, potentially leading to cellular dysfunction and death, or a compromised immune response53,54. The association between this protein and MTB resistance to various antibiotics remains a subject for in-depth exploration in future research.
Moreover, Rv0298 and rpmG1 were found to be novel resistance-associated genes for all the first-line antibiotic drugs. Likewise, Rv2237A and Rv0095c were found to be significant resistance-associated genes for all second-line antibiotic drugs. Further, the gene argV (encoding ) was uniquely found to be associated with bedaquiline and clofazimine; however, the effect of this gene on MTB antibiotic drug resistance is not clear. The gene esxS was found to be associated with bedaquiline and linezolid, and serV and lipJ were associated exclusively with these two drugs. Moreover, linezolid was uniquely associated with Rv3654c and Rv1088a. The fluoroquinolone antibiotic drugs (levofloxacin and moxifloxacin) were exclusively associated with Rv1532c, Rv1904, and Rv0074, which can be candidates for novel mechanisms of resistance. We emphasize again that none of the aforementioned genes have yet been fully investigated; rather, they potentially represent novel mechanisms of resistance and should be subjected to further scrutiny.
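The shared first-line associations can be checked directly against Table 6 with set operations. The following Python sketch is illustrative only: the gene lists are copied from the isoniazid, rifampicin, and ethambutol rows of Table 6, while the names `TABLE6_FIRST_LINE` and `shared_genes` are hypothetical helpers introduced here and are not part of MTB++.

```python
# Novel gene associations for the first-line drugs, copied from Table 6.
TABLE6_FIRST_LINE = {
    "isoniazid": {
        "Rv0095c", "Rv0298", "PE_PGRS6", "B55", "rpsN1", "PE_PGRS13", "pks12",
        "rpmG1", "Rv2128", "Rv2237A", "mcr16", "lrpA", "PPE59",
    },
    "rifampicin": {
        "Rv0298", "PE_PGRS6", "B55", "rplN", "PE_PGRS13", "esxI", "PE_PGRS20",
        "cut1", "blaI", "pks12", "rpmG1", "Rv2128", "mcr16", "esxO", "lrpA",
        "PPE59", "esxV",
    },
    "ethambutol": {
        "Rv0095c", "Rv0298", "Rv0699", "mcr11", "rpmG1", "Rv2237A", "esxO",
        "lppA", "lppB", "vapB18",
    },
}


def shared_genes(associations):
    """Return the genes that appear in the gene set of every drug."""
    gene_sets = list(associations.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Hypothetical helper call: genes common to all three first-line drugs.
first_line_shared = shared_genes(TABLE6_FIRST_LINE)
print(sorted(first_line_shared))  # ['Rv0298', 'rpmG1']
```

The same intersection logic could be applied to the second-line rows or, combined with a set difference against the remaining rows, to test the exclusive associations described above.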
Model prediction disagreements highlight the need for additional data for NRDs
To more thoroughly identify any limitations in the data used to develop the MTB++ models, we applied Cohen’s kappa coefficient to quantify the level of agreement between the MTB++ models in their predictions of drug resistance for isolates in the BV-BRC dataset. Figure 2 outlines the observed kappa scores between the LR and RF MTB++ models when predicting isolate resistance. Significant levels of agreement are observed for all first- and second-line antibiotic drugs. The two models displayed near-perfect kappa scores (0.81-0.99) on amikacin (0.97), isoniazid (0.95), levofloxacin (0.95), the rifampin superfamily (0.89), rifabutin (0.89), rifampicin (0.88), kanamycin (0.83), ethionamide (0.81), and the aminoglycosides superfamily (0.81). The models also displayed substantial agreement (0.61-0.80) on moxifloxacin (0.78), ethambutol (0.78), and the fluoroquinolones superfamily (0.68). Little to no agreement was observed on the NRDs: bedaquiline (0.00), clofazimine (0.04), delamanid (0.00), and linezolid (0.04).
Fig. 2.
The Cohen’s kappa coefficients between both MTB++ models on antibiotic drugs and superfamilies. High levels of agreement between both the LR and RF MTB++ models are observed on the BV-BRC isolates for both first- and second-line drugs. NRDs show chance agreement between the models on predicting the resistance of BV-BRC isolates.
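As a minimal illustration of the agreement statistic reported in Fig. 2, Cohen’s kappa can be computed as (p_o − p_e) / (1 − p_e), where p_o is the observed fraction of isolates on which the two models agree and p_e is the agreement expected by chance from each model’s label frequencies. The Python sketch below is not the study’s code: the function names and the toy label vectors are hypothetical, and returning 0.0 when both models emit one identical constant label (where kappa is undefined) is a convention assumed here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two equally long sequences of labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of isolates with identical predictions.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies of each model.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if expected == 1.0:
        # Both models predict the same single label; kappa is undefined.
        # Assumed convention for this sketch: report 0.0.
        return 0.0
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa):
    """Map kappa to the two bands quoted in the text."""
    if kappa >= 0.81:
        return "near-perfect"
    if kappa >= 0.61:
        return "substantial"
    return "below substantial"


# Hypothetical predictions (R = resistant, S = susceptible) for eight isolates.
lr_pred = ["R", "R", "S", "S", "S", "R", "S", "S"]
rf_pred = ["R", "R", "S", "S", "S", "S", "S", "S"]
kappa = cohens_kappa(lr_pred, rf_pred)
print(round(kappa, 2), agreement_band(kappa))  # 0.71 substantial
```

When almost every isolate is predicted susceptible by both models, p_e approaches 1, so even a handful of disagreements can pull kappa toward zero despite high raw agreement.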
By more closely examining the features used by each of the MTB++ models, we can discern more about the nature of their predictions while also taking the resulting kappa statistics into consideration. Table 7 outlines the total number of features used by each model and the number of these features that possess a coefficient or feature significance greater than zero for predicting resistance to each respective antibiotic. On average, both the LR and RF models used to predict resistance to first-line drugs utilized more features than the models used for second-line drugs and NRDs. Models applied to second-line drugs, while using fewer features than models applied to first-line drugs, still used more features than those applied to NRDs. This result is observed for both the total features used and the non-zero features used for both LR and RF MTB++ models. Coinciding with the lowest Cohen’s kappa on the BV-BRC isolates, the LR model used to predict resistance to bedaquiline used the smallest number of non-zero coefficients (8), which account for less than 1% of the total features used by this model. A strikingly low number of features greater than zero was also present in the RF delamanid model, which used only 8 non-zero features. Interestingly, the amikacin LR and RF models used only 32 and 20 non-zero features, respectively, while also exhibiting the highest Cohen’s kappa of 0.97 on the BV-BRC isolates. This result points to the importance of the small subset of features used by both the LR and RF models when predicting resistance to amikacin.
Table 7.
The number of features alongside the number of features with a magnitude of the coefficient (LR) or the significance (RF) greater than zero used by the MTB++ models when making predictions.
Antibiotic drug | LR feat. | LR feat. > 0 | RF feat. | RF feat. > 0 | LR mean | LR mean (abs) | RF mean |
---|---|---|---|---|---|---|---|
Isoniazid | 16,384 | 419 | 8,192 | 322 | 0.00 ± 0.31 | 0.11 ± 0.29 | 0.00 ± 0.00 |
Rifampicin | 4,096 | 888 | 4,096 | 764 | 0.01 ± 0.33 | 0.11 ± 0.31 | 0.00 ± 0.00 |
Ethambutol | 32,768 | 2,157 | 65,536 | 781 | 0.01 ± 0.20 | 0.09 ± 0.17 | 0.00 ± 0.00 |
Mean (first-line) | 17,749.33 | 1,154.67 | 25,941.33 | 622.33 | 0.01 | 0.10 | 0.00 |
Std. dev (first-line) | 11,745.04 | 734.16 | 28,047.55 | 212.48 | 0.00 | 0.01 | 0.00 |
Rifabutin | 32,768 | 615 | 1,024 | 594 | −0.01 ± 0.33 | 0.11 ± 0.31 | 0.00 ± 0.00 |
Ethionamide | 16,384 | 1,774 | 16,384 | 1,257 | 0.01 ± 0.18 | 0.08 ± 0.17 | 0.00 ± 0.00 |
Levofloxacin | 256 | 128 | 256 | 82 | −0.08 ± 0.42 | 0.21 ± 0.37 | 0.01 ± 0.01 |
Moxifloxacin | 4,096 | 128 | 1,024 | 83 | −0.06 ± 0.33 | 0.14 ± 0.31 | 0.01 ± 0.01 |
Kanamycin | 32,768 | 505 | 32,768 | 722 | 0.06 ± 0.24 | 0.09 ± 0.23 | 0.00 ± 0.00 |
Amikacin | 8,192 | 32 | 65,536 | 20 | 0.37 ± 0.95 | 0.37 ± 0.95 | 0.05 ± 0.10 |
Mean (second-line) | 15,744.00 | 530.33 | 19,498.67 | 459.67 | 0.05 | 0.17 | 0.01 |
Std. dev (second-line) | 12,991.21 | 595.34 | 23,657.70 | 447.29 | 0.15 | 0.10 | 0.02 |
Clofazimine | 32,768 | 372 | 2,048 | 472 | 0.04 ± 0.12 | 0.13 ± 0.15 | 0.00 ± 0.00 |
Delamanid | 8,192 | 647 | 128 | 8 | 0.03 ± 0.18 | 0.10 ± 0.15 | 0.00 ± 0.00 |
Linezolid | 512 | 256 | 512 | 114 | 0.03 ± 0.23 | 0.17 ± 0.16 | 0.00 ± 0.00 |
Bedaquiline | 4,096 | 8 | 1,024 | 175 | −0.00 ± 0.33 | 0.24 ± 0.22 | 0.01 ± 0.01 |
Mean (NRDs) | 11,392.00 | 320.75 | 928.00 | 192.25 | 0.02 | 0.16 | 0.00 |
Std. dev (NRDs) | 12,637.04 | 229.71 | 720.53 | 172.21 | 0.015 | 0.053 | 0.002 |
The mean and standard deviation of feature coefficients (LR), magnitude of feature coefficient (LR), and mean feature significance (RF) for the MTB++ models are also displayed. On average, more features are used by the models applied to the first- and second-line antibiotics when compared to the models applied to the NRDs.
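The per-model quantities in Table 7 can be sketched in a few lines of Python. This is a hedged illustration rather than the MTB++ implementation: `summarize_weights` is a hypothetical helper, the toy weight vector is invented, and only the three first-line counts and the bedaquiline figures are taken from Table 7. Recomputing the first-line group row suggests that the tabulated standard deviations are population (not sample) standard deviations, which is what `statistics.pstdev` returns.

```python
from statistics import mean, pstdev


def summarize_weights(weights):
    """Summarize one model's feature weights (LR coefficients or RF significances)."""
    nonzero = [w for w in weights if w != 0]
    summary = {
        "n_features": len(weights),
        "n_nonzero": len(nonzero),
        "pct_nonzero": 100.0 * len(nonzero) / len(weights) if weights else 0.0,
    }
    if nonzero:
        # Mean signed weight and mean magnitude over the non-zero features only.
        summary["mean"] = mean(nonzero)
        summary["mean_abs"] = mean(abs(w) for w in nonzero)
    return summary


# Hypothetical coefficient vector, not taken from any MTB++ model.
toy = summarize_weights([0.0, 0.5, -0.25, 0.0, 0.0, 1.0, 0.0, -0.25])

# Non-zero LR feature counts for isoniazid, rifampicin, ethambutol (Table 7).
first_line_lr_nonzero = [419, 888, 2157]
group_mean = round(mean(first_line_lr_nonzero), 2)   # 1154.67
group_sd = round(pstdev(first_line_lr_nonzero), 2)   # 734.16

# Share of non-zero coefficients in the bedaquiline LR model (8 of 4,096).
bedaquiline_pct = 100.0 * 8 / 4096
print(toy, group_mean, group_sd, round(bedaquiline_pct, 2))
```

The last value, roughly 0.2%, corresponds to the statement that the non-zero coefficients of the bedaquiline LR model account for less than 1% of its features.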
Table 7 also reports the mean and standard deviation of non-zero coefficients (LR), the magnitude of these coefficients (LR), and the feature significance (RF), categorized by antibiotic and drug class. When compared to the first-line drugs, the models predicting resistance to NRDs exhibit, on average, higher feature coefficients, greater magnitude of feature coefficients, and increased feature significance. However, alongside these higher averages, larger standard deviations are observed, suggesting a higher degree of variance in feature importance for models applied to NRDs as opposed to those applied to first-line antibiotics. Comparing the MTB++ models applied to second-line antibiotics with those of first-line antibiotics and NRDs, we observe higher mean and standard deviation values for feature coefficients, coefficient magnitudes, and feature significance. This finding indicates that models for second-line drugs display larger values for important features but also demonstrate greater variance in these values on average. The scarcity of data alongside the insufficient identification of important features, lower standard deviation, and low kappa scores associated with models applied to NRDs may account for heightened feature significance or magnitude of coefficients. In contrast, the results for second-line drugs hint at more successful model convergence and the acquisition of more precise, highly important features by the models due to a higher availability of data.
To better decipher the connection between the features used by the models and the BV-BRC isolates predicted to be resistant to an antibiotic drug, we conducted a closer analysis of the features used by the models when making predictions. Figure 3 illustrates the count of each feature used by the models for individual antibiotics across all isolates identified as resistant by MTB++. The isolates identified as resistant to first- and second-line antibiotics display a larger distribution and count of the features used by the MTB++ models. In contrast, the isolates predicted as resistant to NRDs show lower counts and sparser distributions of features. This sparse distribution underlines that, while many of the features from the CRyPTIC dataset used to develop the MTB++ models overlap with the features present in BV-BRC isolates for first- and second-line drugs, the overlap of features for NRDs appears to be scarce. To determine the occurrence of the important features used to predict isolates as resistant, we isolated the top five important features used by each MTB++ model with the highest frequency of occurrence in the BV-BRC isolates predicted to be resistant. Table 8 illustrates the mean percent occurrence of the top five features with the highest mean magnitude of coefficient (MMC) or mean feature significance (MFS) for both MTB++ models applied to the BV-BRC isolates. Comparing second-line drugs to first-line drugs, we see a larger percent occurrence of the top five features used by the LR MTB++ model and a slightly lower percent occurrence of the top five features for the RF MTB++ model. The MMC of the top five occurring features for second-line drugs (1.86) is slightly lower than that of the first-line drugs (2.32). However, the MFS of the top five occurring features for second-line drugs (0.04) is larger than that of the first-line drugs (0.01).
When comparing both the MMC and the MFS of the top five features for the NRDs to those of the first- and second-line antibiotics, we see substantially lower values for both MMC (0.16) and MFS (0.01). In contrast to the lower MMC and MFS, a much larger percent occurrence of the top five features is observed for both the LR and RF models. The larger percent occurrence of the top five features for NRDs is a direct result of the smaller number of isolates predicted to be resistant. Clofazimine, delamanid, linezolid, and bedaquiline all saw fewer than 10 predicted resistant isolates, with delamanid being the only antibiotic with no resistant isolates predicted. In contrast, the first- and second-line drugs all saw over 6000 and 1500 isolates predicted resistant, respectively. The low MMC and MFS of the top five features, alongside low kappa scores and the lower average number of features greater than zero, point to the need for more informative features for NRDs. The acquisition of additional AST data for NRDs may allow for the identification of more features and aid in the prediction of isolate resistance.
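The percent-occurrence calculation summarized in Table 8 can be sketched as follows. This is one possible reading of the procedure, not the authors’ code: it assumes the top five features are chosen by weight magnitude and that each predicted-resistant isolate is represented as a set of k-mers. The function name, the placeholder k-mer labels (`k1` to `k6`), the toy weights, and the choice to return zeros when no isolate is predicted resistant (mirroring the delamanid row) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def top_feature_occurrence(weights, resistant_isolates, top_n=5):
    """Mean and population std. dev. of percent occurrence of the top features.

    weights: dict mapping a k-mer to its LR coefficient or RF significance.
    resistant_isolates: list of sets, each holding the k-mers in one isolate.
    """
    # Rank features by weight magnitude and keep the top_n of them.
    top = sorted(weights, key=lambda k: abs(weights[k]), reverse=True)[:top_n]
    if not top or not resistant_isolates:
        # Assumed convention: no resistant isolates gives zero occurrence.
        return 0.0, 0.0
    n = len(resistant_isolates)
    pct = [100.0 * sum(k in iso for iso in resistant_isolates) / n for k in top]
    return mean(pct), pstdev(pct)


# Hypothetical weights and isolates; real features would be 31-mers.
toy_weights = {"k1": 2.0, "k2": -1.5, "k3": 1.0, "k4": 0.8, "k5": 0.5, "k6": 0.1}
toy_isolates = [{"k1", "k2", "k3"}, {"k1", "k4"}]
occ_mean, occ_sd = top_feature_occurrence(toy_weights, toy_isolates)
print(occ_mean, round(occ_sd, 2))  # 50.0 31.62
```

With n predicted-resistant isolates, each feature’s percent occurrence can only be a multiple of 100/n, which is why very small resistant sets, as seen for the NRDs, readily produce values such as 100%.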
Fig. 3.
Illustration of the total number of occurrences of each feature (31-mer) in the BV-BRC isolates that were predicted to be resistant to each antibiotic. The features in this graph are ranked by p-value. See Section “Alignment of features to reference genome” for the calculation of the p-value. We note that there is no graph for delamanid because no isolate was predicted to be resistant to delamanid.
Table 8.
The mean percent occurrence of the top 5 features, based on the magnitude of coefficient (LR) and feature significance (RF), in BV-BRC isolates predicted to be resistant to an antibiotic drug by MTB++.
Antibiotic drug | % Occurrence (LR) | Mean coef. | % Occurrence (RF) | Mean significance |
---|---|---|---|---|
Isoniazid | 23.66 ± 13.85 | 2.12 ± 1.03 | 24.61 ± 15.37 | 0.02 ± 0.00 |
Rifampicin | 8.35 ± 4.04 | 3.10 ± 1.48 | 24.61 ± 15.37 | 0.02 ± 0.00 |
Ethambutol | 8.53 ± 13.36 | 1.74 ± 0.24 | 33.22 ± 14.64 | 0.01 ± 0.00 |
Mean (first-line) | 13.51 ± 7.17 | 2.32 ± 0.57 | 27.35 ± 4.15 | 0.01 ± 0.00 |
Rifabutin | 8.71 ± 2.78 | 2.78 ± 0.85 | 31.42 ± 14.96 | 0.01 ± 0.00 |
Ethionamide | 18.21 ± 17.12 | 1.99 ± 0.46 | 39.32 ± 9.87 | 0.01 ± 0.00 |
Levofloxacin | 0.85 ± 0.59 | 1.73 ± 0.11 | 0.85 ± 0.59 | 0.04 ± 0.01 |
Moxifloxacin | 2.95 ± 0.14 | 1.49 ± 0.07 | 2.95 ± 0.24 | 0.04 ± 0.01 |
Kanamycin | 26.82 ± 21.08 | 1.89 ± 0.67 | 45.62 ± 8.58 | 0.02 ± 0.00 |
Amikacin | 40.98 ± 10.76 | 1.28 ± 2.15 | 40.98 ± 10.76 | 0.12 ± 0.18 |
Mean (second-line) | 16.42 ± 14.13 | 1.86 ± 0.48 | 26.86 ± 18.14 | 0.04 ± 0.04 |
Clofazimine | 100.00 ± 0.00 | 0.44 ± 0.21 | 84.00 ± 32.00 | 0.00 ± 0.00 |
Delamanid | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Linezolid | 100.00 ± 0.00 | 0.20 ± 0.00 | 100.00 ± 0.00 | 0.01 ± 0.00 |
Bedaquiline | 37.78 ± 32.66 | 0.00 ± 0.00 | 64.44 ± 10.89 | 0.03 ± 0.01 |
Mean (NRDs) | 59.44 ± 42.70 | 0.16 ± 0.18 | 62.11 ± 38.01 | 0.01 ± 0.01 |
Models applied to NRDs displayed a significant difference in agreement when compared to models applied to first- and second-line drugs, a lower amount of overlapping features with BV-BRC isolates, and an insufficient identification of important features. This finding highlights the imbalances present in the CRyPTIC data used to develop the MTB++ models, emphasizing the need for more high-quality MTB AST data using NRDs. Additional AST data for NRDs can aid in developing MTB++ models that can provide more consistent results on these NRDs.
Discussion
Insights from BV-BRC data and comparison with global trends
Our examination of MTB isolates in BV-BRC15 unveiled a notable trend: the prevalence of resistance to second-line antibiotic drugs is markedly lower than that of first-line antibiotics, except for rifabutin. This observation echoes findings from earlier studies14,55–57. Additionally, we observed that NRDs exhibited the lowest prevalence of resistance. While it is anticipated that the reduced prescription of these antibiotic drugs lowers the likelihood of MTB bacteria developing mutations that confer resistance, it is important to acknowledge the presence of bias within the dataset. BV-BRC is not a randomized surveillance trial, and consequently, there exists a collection bias that impacts the estimates of resistance prevalence. This bias within BV-BRC has been previously highlighted by Abbasian and Smith55. Nonetheless, our results reflect those of prior studies14,57,58. For instance, the study by Yuen et al.58 highlighted a high prevalence of isoniazid-resistant MTB, indicating a rate of 26.1% among pediatric patients in Europe. Additionally, the study by Kundu et al.57 on the prevalence of antibiotic drug resistance in Bangladesh, covering 13,336 human patients with MTB, reported the highest prevalence among the first-line antibiotics for isoniazid resistance at 35%, followed by ethambutol resistance at 15% (rifampicin was not in the study). Moreover, Dean et al.59 studied the prevalence and genetic profiles of isoniazid resistance in MTB patients using data reported to WHO for the period 2003-2017 from 156 countries for 211,753 patients, and found that the global prevalence of Hr-TB was 7.4% among new MTB patients and 11.4% among previously treated MTB patients. Similarly, the 2020 study of Hr-TB prevalence in Australia revealed a prevalence of 7.2%60. Lastly, the study by Yao et al.61 of a total of 425 isolates from 13 pilots in China revealed a prevalence of 26.8% and 6.8% for MDR-TB and XDR-TB, respectively.
Despite the varying rates and patterns of resistance observed across different settings and populations, these figures underscore the persistent challenge posed by both MDR-TB and XDR-TB. Such data are vital for tailoring public health responses and guiding the development of targeted therapies and diagnostic approaches.
Identification of key genes and associations between resistance classes
While the observed prevalence of resistance to various antibiotics provides valuable insight into the distribution of resistance across different drug classes, understanding the genetic underpinnings of this resistance is crucial for developing more effective diagnostic and therapeutic strategies. In light of this, we extended our analysis to investigate the genomic features associated with resistance. Our contextualization of the genomic features of the model revealed 195 MTB genes associated with resistance to 12 of the antibiotic drugs; delamanid was excluded due to the lack of predicted resistant isolates. Among these genes, rpoB and embB were the most widely associated, being linked to all tested antibiotics, with novel findings for embB with amikacin and kanamycin. Notably, 101 genes were also identified by the CRyPTIC consortium, confirming their broad resistance implications across multiple antibiotic classes. This strengthens prior research that hypothesized associations of rrs with resistance to the aminoglycoside family of antibiotics (i.e. amikacin and kanamycin) and of gyrA with resistance to the fluoroquinolone family of antibiotics (i.e. moxifloxacin and levofloxacin)40,45,47,62. These associations have been shown to exist in other species of bacteria43,63.
Moreover, our findings on cross-resistance patterns have significant implications for treatment strategies. The identification of shared resistance mechanisms across different antibiotic classes, such as those involving rpoB and embB, highlights the potential risk of cross-resistance influencing treatment efficacy. Understanding these genetic links enables clinicians to make more informed decisions when prescribing antibiotics, avoiding combinations of drugs that are likely to be compromised by cross-resistance. This can lead to more tailored treatment protocols that reduce the use of ineffective therapies and prevent the promotion of further resistance. Additionally, recognizing these cross-resistance patterns could guide the development of combination therapies that are less vulnerable to resistance, improving treatment outcomes for patients with multidrug-resistant MTB strains.
Biological relevance of the novel genomic features
While many studies14,64,65 have reported genes and genomic features that are associated with, or even causal factors for, antibiotic resistance, further studies are needed to discover newer mechanisms of resistance in MTB bacteria. In this study, we have reported novel genomic features associated with resistance to each antibiotic drug in Table 6. These novel features require further study in order to understand their underlying biological functions with regard to drug resistance. While identifying novel genomic features linked to drug resistance is essential, understanding specific regulatory mechanisms offers deeper insights into how MTB adapts and survives under therapeutic pressure. One such regulatory factor is LrpA, associated with resistance to the two primary anti-MTB drugs, isoniazid and rifampicin. LrpA is a global transcriptional regulator, pivotal in bacterial adaptation to stress and nutrient limitation66,67. It regulates genes involved in amino acid metabolism, ensuring balanced synthesis and breakdown. During nutrient starvation, lrpA expression is upregulated, enabling bacteria to adjust their metabolism68. Additionally, lrpA interacts with ppGpp, a key molecule in the stringent response, to influence persister formation, a state of bacterial dormancy that confers antibiotic resistance68. By coordinating these processes, lrpA empowers various species of bacteria to survive harsh conditions and persist in challenging environments. However, further study is needed to understand the role of lrpA in MTB resistance.
Further, B55 (MTS0479) was found to be associated with resistance to six of the first- and second-line antibiotic drugs. B55 is a small RNA (sRNA) in the MTB genome located between the genes Rv0609A and Rv0610c that is upregulated in response to oxidative stress, specifically that induced by hydrogen peroxide (H2O2). B55 is also expressed under slow growth rate conditions before the addition of antibiotic drugs69,70. A further study by Arnvig et al.71 suggests that B55 may be involved in intracellular survival mechanisms during the early stages of infection, aiding in the bacterium’s adaptation to hostile conditions within host cells. Nonetheless, the exact biological mechanism and the direct effects of B55 on the development of MTB drug resistance need further study.
Model agreement highlights the need for additional AST data
Lastly, analysis of the agreement between the MTB++ models on BV-BRC isolates, alongside a more in-depth analysis of the features used by these models, has highlighted the need for additional MTB AST data. The MTB++ models showed little to no agreement when predicting resistance to NRDs. This is contrasted by the high levels of agreement between models when predicting resistance to first- and second-line antibiotics. Coinciding with the results using Cohen’s kappa coefficient to quantify model agreement, the features used by the models when making predictions on NRDs showed sparse count distributions in BV-BRC isolates. The lack of agreement between the MTB++ models when predicting resistance to NRDs can be explained by several machine-learning factors. Feature distribution discrepancies arise from the sparse or skewed feature distributions of NRDs, which may lead the models to rely on different subsets of features or even prioritize noise over signal, resulting in divergent predictions. Model sensitivity to outliers also contributes to the disagreement, as the rarity of resistance events may cause models to be disproportionately influenced by outliers or rare events, and each model may handle these differently. Additionally, algorithmic differences between the models, such as varying regularization methods or objective functions, can affect how each model processes the same data, particularly when the data is limited. Lastly, inadequate training data representation for NRDs may fail to capture the full complexity of resistance mechanisms, further hindering the models’ ability to generalize and agree. Addressing these issues requires more comprehensive MTB AST data to ensure better feature representation and model performance.
Public health implications and strategies for combating MTB resistance
The implications of these findings extend beyond understanding resistance patterns and have significant public health relevance. Enhanced surveillance and more comprehensive data collection are critical, particularly for NRDs where data is sparse. Addressing this collection bias can improve the accuracy of resistance estimates and inform more effective health strategies. Additionally, the observed lower prevalence of resistance to second-line and NRDs suggests an opportunity to refine treatment protocols, encouraging clinicians to reserve these drugs for targeted use, thus delaying the emergence of resistance. Public health efforts should also focus on preventing first-line antibiotic resistance, as highlighted by the high prevalence of isoniazid-resistant MTB. Strengthening diagnostic tools and treatment adherence monitoring could minimize the spread of resistant strains. Furthermore, the identification of 195 resistance-associated MTB genes offers an avenue for developing personalized therapy, allowing treatments to be tailored to the specific resistance profiles of the infecting strain. Lastly, the comparison of global trends underscores the need for international collaboration and data sharing, which is crucial for addressing rare but highly concerning cases of drug resistance and ensuring globally relevant treatment strategies.
Further, the development and validation of AST protocols for novel MTB drugs, including bedaquiline, delamanid, linezolid, and clofazimine, are critical components of global MTB infection control. The precise identification of resistance and the implementation of effective treatment plans are made possible by these protocols. Given the emergence of MDR/XDR-TB, this ongoing effort is critical as newer agents such as bedaquiline and delamanid are being used in novel combination therapies. AST protocols are being iteratively refined in conjunction with the development of testing strategies for well-established MTB drugs, while treatment remains informed by robust evidence-based approaches. Research has indicated that to maximize drug efficacy while minimizing resistance development, it is necessary to use standardized methodologies and share global data with the public. In this evolving landscape, it is crucial to work together globally and update existing protocols to address the complex issues of drug-resistant MTB72,73.
Conclusion
Using the extensive genomic data from the BV-BRC, our analysis reveals a landscape of antibiotic resistance in MTB that underscores both the urgency and the complexity of addressing it. In particular, our comprehensive examination of the BV-BRC dataset has provided a granular view of resistance patterns across first-line drugs, second-line drugs, and NRDs. Notably, the prevalence of resistance within the BV-BRC dataset was consistently lower than that reported in the CRyPTIC dataset for all treatment lines. This discrepancy highlights the variability in resistance across datasets and the importance of leveraging diverse data sources for a more holistic understanding of antibiotic resistance patterns.
In addition, our analysis of cross-resistance patterns revealed nuanced interactions between different antibiotics, with significant implications for the treatment of tuberculosis. This finding suggests that resistance mechanisms in MTB are complex and often involve multiple genetic factors, as evidenced by the identification of both corroborated and potentially novel resistance mechanisms. Furthermore, our study emphasizes the critical need for comprehensive antibiotic susceptibility testing (AST) data, especially for NRDs. Lastly, our analysis of feature importance and model disagreement, particularly for NRDs, highlights the gaps in current datasets and underscores the importance of generating high-quality AST data to improve resistance prediction models.
In summary, this large-scale study provides insights into resistance prevalence, cross-resistance patterns, and the underlying genetic mechanisms of resistance, with implications for both clinical practice and future research. Our findings have the potential to enhance clinical decision-making in TB treatment: by accurately predicting AMR from genomic data, clinicians can tailor treatment regimens more effectively, thereby reducing the likelihood of treatment failure and improving patient outcomes. Moreover, the early and precise identification of drug-resistant strains can help prevent their spread, supporting more targeted and effective public health interventions.
Limitations
The first limitation of this study relates to the analysis of resistance patterns within bacterial lineages. Due to the lack of standardized protocols for data submission in the BV-BRC database, inconsistencies arise in key details such as bacterial lineages, country of origin, and resistance levels. For instance, the minimum inhibitory concentrations (MICs) are often unclear, with no indication of whether they were obtained through broth microdilution or agar dilution methods, and the exact inhibition zones for disk diffusion are not consistently reported. These inconsistencies hinder a more detailed investigation into the prevalence of resistance across different lineages. Future studies could address this issue if the BV-BRC adopts a more organized and standardized data submission process.
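As an illustration of what a more standardized submission process could check, the following is a minimal sketch of a completeness check for a single AST record. The record layout, field names, and accepted method labels are hypothetical and are not part of the BV-BRC schema; they only mirror the details named above (lineage, country of origin, testing method, MIC, and inhibition zone).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled vocabulary for the testing method (an assumption).
ALLOWED_METHODS = {"broth microdilution", "agar dilution", "disk diffusion"}


@dataclass
class ASTRecord:
    """Hypothetical AST submission record; not the BV-BRC data model."""
    genome_id: str
    antibiotic: str
    lineage: Optional[str] = None
    country: Optional[str] = None
    method: Optional[str] = None
    mic_mg_per_l: Optional[float] = None  # expected for dilution methods
    zone_mm: Optional[float] = None       # expected for disk diffusion


def validation_problems(record: ASTRecord) -> list[str]:
    """Return human-readable problems; an empty list means the record is complete."""
    problems = []
    if not record.lineage:
        problems.append("missing lineage")
    if not record.country:
        problems.append("missing country of origin")
    if record.method not in ALLOWED_METHODS:
        problems.append("missing or unrecognized testing method")
    elif record.method == "disk diffusion":
        if record.zone_mm is None:
            problems.append("disk diffusion without inhibition zone")
    elif record.mic_mg_per_l is None:
        problems.append("dilution method without MIC")
    return problems


# Example with made-up values: a record that names the method but omits the rest.
incomplete = ASTRecord("genome-0001", "isoniazid", method="disk diffusion")
print(validation_problems(incomplete))
```

Rejecting or flagging records at submission time in this way would make lineage-level analyses possible without manual curation, although the exact fields a repository should require remain a design decision for its maintainers.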
The second limitation concerns the performance of the MTB++ models for predicting resistance to NRDs. As a machine learning-based classifier, MTB++ requires large datasets to achieve optimal performance. For antibiotics with few resistant isolates, particularly NRDs such as delamanid, the limited training data reduced the models’ ability to predict resistance accurately and, in turn, their sensitivity when estimating resistance prevalence for these drugs. Over time, MTB may develop resistance through processes such as spontaneous mutations during DNA replication or selective pressure from antibiotic exposure. As more resistant isolates become available, more effective models can be developed.
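The effect of a small number of resistant isolates on how well sensitivity can be estimated can be made concrete with a short calculation. The sketch below uses the Wilson score interval for a binomial proportion, a standard method that was not necessarily used in this study, and the isolate counts are hypothetical: the observed sensitivity is fixed at 80% while the number of resistant isolates varies.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical counts: the same observed sensitivity (80%) measured on
# 10, 100, and 1000 phenotypically resistant isolates.
for n_resistant in (10, 100, 1000):
    detected = n_resistant * 4 // 5
    low, high = wilson_interval(detected, n_resistant)
    print(f"n={n_resistant:5d}  sensitivity=0.80  95% CI=({low:.3f}, {high:.3f})")
```

With only ten resistant isolates the interval spans roughly 0.49 to 0.94, so an apparently good sensitivity is compatible with a model that misses about half of the resistant strains; the interval narrows substantially as more resistant isolates are added.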
Acknowledgements
The authors would like to thank the NVIDIA AI Technology Center at the University of Florida for their valuable feedback, technical support, and computational resources that contributed to the success of this project. Their insights and expertise were instrumental in refining our methods and optimizing the implementation of our models. The authors would also like to thank Zamin Iqbal and Jeff Knaggs. This work was supported by NSF SCH (Award No. 2013998) and NIH R01 (R01AI141810).
Author contributions
Conceptualization: MS, CB. Methodology: MS, CB, MP, CT. Investigation: MS, CT. Formal Analysis: MS. Visualization: MS, CT. Supervision: CB, MP, MS. Writing—original draft: MS, CB, CT, MP. Writing—review & editing: MS, CT, CB, MP.
Data availability
All the data used in this study are accessible at https://www.bv-brc.org/view/Taxonomy/1763#view_tab=amr. Further, the software can be accessed at https://github.com/M-Serajian/MTB-plus-plus. The CRyPTIC14 data, which were used to train the models in our previous study21, can be found at https://ftp.ebi.ac.uk/pub/databases/cryptic/release_june2022/.
All source code for this study is publicly available at https://github.com/M-Serajian/BV-BRC-Large-Scale-AMR-Prevalence-Analysis.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. World Health Organization. Global Tuberculosis Report 2024. (World Health Organization, 2024).
- 2. Ferri, M., Ranucci, E., Romagnoli, P. & Giaccone, V. Antimicrobial resistance: A global emerging threat to public health systems. Crit. Rev. Food Sci. Nutr. 57(13), 2857–2876 (2017).
- 3. Florensa, A. F., Kaas, R. S., Clausen, P. T. L. C., Aytan-Aktug, D. & Aarestrup, F. M. ResFinder – an open online resource for identification of antimicrobial resistance genes in next-generation sequencing data and prediction of phenotypes from genotypes. Microb. Genom. 8(1), 1–10 (2022).
- 4. Hunt, M. et al. Antibiotic resistance prediction for Mycobacterium tuberculosis from genome sequence data with Mykrobe. Wellcome Open Res. 4, 191 (2019).
- 5. Steiner, A., Stucki, D., Coscolla, M., Borrell, S. & Gagneux, S. KvarQ: Targeted and direct variant calling from fastq reads of bacterial genomes. BMC Genomics 15, 1–12 (2014).
- 6. Verboven, L., Phelan, J., Heupink, T. H. & Van Rie, A. TBProfiler for automated calling of the association with drug resistance of variants in Mycobacterium tuberculosis. PLoS ONE 17(12), e0279644 (2022).
- 7. McNeil, M. B., Dennison, D. D., Shelton, C. D. & Parish, T. In vitro isolation and characterization of oxazolidinone-resistant Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 61(10), 10–1128 (2017).
- 8. Timm, J. et al. Baseline and acquired resistance to bedaquiline, linezolid and pretomanid, and impact on treatment outcomes in four tuberculosis clinical trials containing pretomanid. PLOS Glob. Public Health 3(10), e0002283 (2023).
- 9. Imperiale, B. R., Di Giulio, Á. B., Adrián Cataldi, Á. & Morcillo, N. S. Evaluation of Mycobacterium tuberculosis cross-resistance to isoniazid, rifampicin and levofloxacin with their respective structural analogs. J. Antibiot. 67(11), 749–754 (2014).
- 10. Ismail, N. A. et al. Defining bedaquiline susceptibility, resistance, cross-resistance and associated genetic determinants: A retrospective cohort study. EBioMedicine 28, 136–142 (2018).
- 11. Maus, C. E., Plikaytis, B. B. & Shinnick, T. M. Molecular analysis of cross-resistance to capreomycin, kanamycin, amikacin, and viomycin in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 49(8), 3192–3197 (2005).
- 12. Willby, M., Sikes, R. D., Malik, S., Metchock, B. & Posey, J. E. Correlation between gyrA substitutions and ofloxacin, levofloxacin, and moxifloxacin cross-resistance in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 59(9), 5427–5434 (2015).
- 13. Katale, B. Z. et al. Whole genome sequencing of Mycobacterium tuberculosis isolates and clinical outcomes of patients treated for multidrug-resistant tuberculosis in Tanzania. BMC Genomics 21, 1–15 (2020).
- 14. The CRyPTIC Consortium. Genome-wide association studies of global Mycobacterium tuberculosis resistance to 13 antimicrobials in 10,228 genomes identify new resistance mechanisms. PLoS Biology 20(8), e3001755 (2022).
- 15. Bacterial and Viral Bioinformatics Resource Center (BV-BRC). BV-BRC - Bacterial and Viral Bioinformatics Resource Center (2023).
- 16. Serajian, M. et al. Scalable de novo classification of antibiotic resistance of Mycobacterium tuberculosis. Bioinformatics 40, 39–47 (2024).
- 17. Bortolaia, V. et al. ResFinder 4.0 for predictions of phenotypes from genotypes. J. Antimicrob. Chemother. 75(12), 3491–3500 (2020).
- 18. Coll, F. et al. Rapid determination of anti-tuberculosis drug resistance from whole-genome sequences. Genome Med. 7(1), 1–10 (2015).
- 19. Phelan, J. E. et al. Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs. Genome Med. 11, 1–7 (2019).
- 20. Olson, R. D. et al. Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): A resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 51(D1), D678–D689 (2023).
- 21. Serajian, M. MTB-plus-plus. https://github.com/M-Serajian/MTB-plus-plus. Accessed 18 Dec 2023 (2023).
- 22. Tekin, A. et al. Development and validation of a preliminary multivariable diagnostic model for identifying unusual infections in hospitalized patients. Biomol. Biomed. 24(5), 1387 (2024).
- 23. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58(1), 267–288 (1996).
- 24. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
- 25. Hong, S. & Lynn, H. S. Accuracy of random-forest-based imputation of missing data in the presence of non-normality, non-linearity, and interaction. BMC Med. Res. Methodol. 20, 1–12 (2020).
- 26. Chikhi, R. & Medvedev, P. Informed and automated k-mer size selection for genome assembly. Bioinformatics 30(1), 31–37 (2014).
- 27. Chor, B., Horn, D., Goldman, N., Levy, Y. & Massingham, T. Genomic DNA k-mer spectra: Models and modalities. Genome Biol. 10, 1–10 (2009).
- 28. Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A. & Korobeynikov, A. Using SPAdes de novo assembler. Curr. Protoc. Bioinform. 70(1), e102 (2020).
- 29. Doster, E. et al. MEGARes 2.0: A database for classification of antimicrobial drug, biocide and metal resistance determinants in metagenomic sequence data. Nucleic Acids Res. 48(D1), D561–D569 (2020).
- 30. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler Transform. Bioinformatics 25(14), 1754–1760 (2009).
- 31. Quinlan, A. R. & Hall, I. M. Bedtools: A flexible suite of utilities for comparing genomic features. Bioinformatics 26(6), 841–842 (2010).
- 32. Mani, C., Selvakumar, N., Narayanan, S. & Narayanan, P. Mutations in the rpoB gene of multidrug-resistant Mycobacterium tuberculosis clinical isolates from India. J. Clin. Microbiol. 39(8), 2987–2990 (2001).
- 33. Miller, L. P., Crawford, J. T. & Shinnick, T. M. The rpoB gene of Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 38(4), 805–811 (1994).
- 34. Tracevska, T., Jansone, I., Broka, L., Marga, O. & Baumanis, V. Mutations in the rpoB and katG genes leading to drug resistance in Mycobacterium tuberculosis in Latvia. J. Clin. Microbiol. 40(10), 3789–3792 (2002).
- 35. Farhat, M. R. et al. Rifampicin and rifabutin resistance in 1003 Mycobacterium tuberculosis clinical isolates. J. Antimicrob. Chemother. 74(6), 1477–1483 (2019).
- 36. Sreevatsan, S. et al. Ethambutol resistance in Mycobacterium tuberculosis: Critical role of embB mutations. Antimicrob. Agents Chemother. 41(8), 1677–1681 (1997).
- 37. Unissa, A. N., Subbian, S., Hanna, L. E. & Selvakumar, N. Overview on mechanisms of isoniazid action and resistance in Mycobacterium tuberculosis. Infect. Genet. Evol. 45, 474–492 (2016).
- 38. Cuevas-Córdoba, B. et al. rrs and rpsL mutations in streptomycin-resistant isolates of Mycobacterium tuberculosis from Mexico. J. Microbiol. Immunol. Infect. 46(1), 30–34 (2013).
- 39. Nair, J., Rouse, D. A., Bai, G.-H. & Morris, S. L. The rpsL gene and streptomycin resistance in single and multiple drug-resistant strains of Mycobacterium tuberculosis. Mol. Microbiol. 10(3), 521–527 (1993).
- 40. Malik, S., Willby, M., Sikes, D., Tsodikov, O. V. & Posey, J. E. New insights into fluoroquinolone resistance in Mycobacterium tuberculosis: Functional genetic analysis of gyrA and gyrB mutations. PLoS ONE 7(6), e39754 (2012).
- 41. Lau, R. W. et al. Molecular characterization of fluoroquinolone resistance in Mycobacterium tuberculosis: Functional analysis of gyrA mutation at position 74. Antimicrob. Agents Chemother. 55(2), 608–614 (2011).
- 42. Von Groll, A. et al. Fluoroquinolone resistance in Mycobacterium tuberculosis and mutations in gyrA and gyrB. Antimicrob. Agents Chemother. 53(10), 4498–4500 (2009).
- 43. Weigel, L. M., Steward, C. D. & Tenover, F. C. gyrA mutations associated with fluoroquinolone resistance in eight species of Enterobacteriaceae. Antimicrob. Agents Chemother. 42(10), 2661–2667 (1998).
- 44. Alangaden, G. J. et al. Mechanism of resistance to amikacin and kanamycin in Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 42(5), 1295–1297 (1998).
- 45. Garneau-Tsodikova, S. & Labby, K. J. Mechanisms of resistance to aminoglycoside antibiotics: Overview and perspectives. Medchemcomm 7(1), 11–27 (2016).
- 46. Jana, S. & Deb, J. Molecular understanding of aminoglycoside action and resistance. Appl. Microbiol. Biotechnol. 70(2), 140–150 (2006).
- 47. Kotra, L. P., Haddad, J. & Mobashery, S. Aminoglycosides: Perspectives on mechanisms of action and resistance and strategies to counter resistance. Antimicrob. Agents Chemother. 44(12), 3249–3256 (2000).
- 48. Bacon, J. et al. The influence of reduced oxygen availability on pathogenicity and gene expression in Mycobacterium tuberculosis. Tuberculosis 84(3–4), 205–217 (2004).
- 49. Beckert, P. et al. rplC T460C identified as a dominant mutation in linezolid-resistant Mycobacterium tuberculosis strains. Antimicrob. Agents Chemother. 56(5), 2743–2745 (2012).
- 50. Qian, J., Chen, R., Wang, H. & Zhang, X. Role of the PE/PPE family in host-pathogen interactions and prospects for anti-tuberculosis vaccine and diagnostic tool design. Front. Cell. Infect. Microbiol. 10, 594288 (2020).
- 51. Sharma, T. et al. The Mycobacterium tuberculosis PE_PGRS protein family acts as an immunological decoy to subvert host immune response. Int. J. Mol. Sci. 23(1), 525 (2022).
- 52. Mustafa, A. S. Chemical and biological characterization of Mycobacterium tuberculosis-specific ESAT6-like proteins and their potentials in the prevention of tuberculosis and asthma. Med. Princ. Pract. 32(4–5), 217–224 (2023).
- 53. Mohanty, S. et al. Mycobacterium tuberculosis EsxO (Rv2346c) promotes bacillary survival by inducing oxidative stress mediated genomic instability in macrophages. Tuberculosis 96, 44–57 (2016).
- 54. Pinto, S. M. et al. Integrated multi-omic analysis of Mycobacterium tuberculosis H37Ra redefines virulence attributes. Front. Microbiol. 9, 1314 (2018).
- 55. Abbasian, F. & Smith, J. Epidemiology of drug-resistant tuberculosis: A comprehensive review. J. Epidemiol. Glob. Health 14(3), 215–226 (2024).
- 56. Becerril-Montes, P. et al. A population-based study of first and second-line drug-resistant tuberculosis in a high-burden area of the Mexico/United States border. Mem. Inst. Oswaldo Cruz 108, 160–166 (2013).
- 57. Kundu, S., Marzan, M., Gan, S. H. & Islam, M. A. Prevalence of antibiotic-resistant pulmonary tuberculosis in Bangladesh: A systematic review and meta-analysis. Antibiotics 9(10), 710 (2020).
- 58. Yuen, C. M., Jenkins, H. E., Rodriguez, C. A., Keshavjee, S. & Becerra, M. C. Global and regional burden of isoniazid-resistant tuberculosis. Pediatrics 136(1), e50–e59 (2015).
- 59. Dean, A. S. et al. Prevalence and genetic profiles of isoniazid resistance in tuberculosis patients: A multicountry analysis of cross-sectional data. PLoS Med. 17(1), e1003008 (2020).
- 60. Wilson, M., O’Connor, B., Matigian, N. & Eather, G. Management of isoniazid-monoresistant tuberculosis (Hr-TB) in Queensland, Australia: A retrospective case series. Respir. Med. 173, 106163 (2020).
- 61. Yao, C. et al. Prevalence of extensively drug-resistant tuberculosis in a Chinese multidrug-resistant TB cohort after redefinition. Antimicrob. Resist. Infect. Control 10, 1–8 (2021).
- 62. Avalos, E. et al. Frequency and geographic distribution of gyrA and gyrB mutations associated with fluoroquinolone resistance in clinical Mycobacterium tuberculosis isolates: A systematic review. PLoS ONE 10(3), e0120470 (2015).
- 63. Heisig, P., Schedletzky, H. & Falkenstein-Paul, H. Mutations in the gyrA gene of a highly fluoroquinolone-resistant clinical isolate of Escherichia coli. Antimicrob. Agents Chemother. 37(4), 696–701 (1993).
- 64. The CRyPTIC Consortium. Epidemiological cut-off values for a 96-well broth microdilution plate for high-throughput research antibiotic susceptibility testing of M. tuberculosis. Eur. Respir. J. 60(4) (2022).
- 65. The CRyPTIC Consortium and the 100,000 Genomes Project. Prediction of susceptibility to first-line tuberculosis drugs by DNA sequencing. New England Journal of Medicine 379(15), 1403–1415 (2018).
- 66. Brinkman, A. B., Ettema, T. J., De Vos, W. M. & Van Der Oost, J. The Lrp family of transcriptional regulators. Mol. Microbiol. 48(2), 287–294 (2003).
- 67. Lewis, K. Persister cells. Annu. Rev. Microbiol. 64(1), 357–372 (2010).
- 68. Duan, X. et al. Mycobacterium lysine ε-aminotransferase is a novel alarmone metabolism related persister gene via dysregulating the intracellular amino acid level. Sci. Rep. 6(1), 19695 (2016).
- 69. Arnvig, K. B. & Young, D. B. Identification of small RNAs in Mycobacterium tuberculosis. Mol. Microbiol. 73(3), 397–408 (2009).
- 70. Jeeves, R. E. et al. Mycobacterium tuberculosis is resistant to isoniazid at a slow growth rate by single nucleotide polymorphisms in katG codon Ser315. PLoS ONE 10(9), e0138253 (2015).
- 71. Arnvig, K. & Young, D. Non-coding RNA and its potential role in Mycobacterium tuberculosis pathogenesis. RNA Biol. 9(4), 427–436 (2012).
- 72. Lange, C. et al. Management of drug-resistant tuberculosis. The Lancet 394(10202), 953–966 (2019).
- 73. World Health Organization. Global Antimicrobial Resistance and Use Surveillance System (GLASS) Report 2022. (World Health Organization, 2022).