Stat Methods Med Res. 2023 Nov 3;32(12):2423–2439. doi: 10.1177/09622802231211010

Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations

Clara Grazian 1,2,
PMCID: PMC10710010  PMID: 37920984

Abstract

Antimicrobial resistance is becoming a major threat to public health throughout the world. Researchers are attempting to counter it by developing both new antibiotics and patient-specific treatments. In the second case, whole-genome sequencing has had a huge impact in two ways: first, it is becoming cheaper and faster to perform whole-genome sequencing, which makes it competitive with standard phenotypic tests; second, it is possible to statistically associate the phenotypic patterns of resistance with specific mutations in the genome. Therefore, it is now possible to develop catalogues of genomic variants associated with resistance to specific antibiotics, in order to improve prediction of resistance and suggest treatments. It is essential to have robust methods for identifying mutations associated with resistance and for continuously updating the available catalogues. This work proposes a general method to study minimal inhibitory concentration distributions and to identify clusters of strains showing different levels of resistance to antimicrobials. Once the clusters are identified and strains allocated to each of them, regression methods can be applied to identify, with high statistical power, the mutations associated with resistance. The method is applied to a new 96-well microtiter plate used for testing Mycobacterium tuberculosis.

Keywords: Antimicrobial resistance, censored data, minimal inhibitory concentration distributions, mixture models, genome-wide association study

1. Introduction

Public health authorities throughout the world are becoming more and more concerned about antimicrobial resistance, due to the reduced ability of standard compounds to treat infectious diseases.1–3 Antimicrobial resistance mechanisms have been observed in bacteria,4–6 in fungi,7,8 and in viruses.9,10

There are two main causes of the development of drug resistance: the prescription of suboptimal treatments, which encourages the development of resistance, and the direct transmission of resistant strains. Methods used to tackle the rise of antimicrobial resistance include a more careful prescription of antimicrobials that takes into account known resistance patterns. Such patterns are studied through antimicrobial susceptibility testing to identify at which concentration of a particular drug the growth of the pathogen is inhibited. In this respect, microtiter plates allow the effectiveness of several drugs to be tested at the same time on a single clinical isolate.

Antimicrobial data, obtained through dilution methods, 11 are registered as minimum inhibitory concentration (MIC) values, expressed in milligrams per litre (mg/L). The MIC is defined as the minimal concentration of an antimicrobial substance that inhibits the visible growth of a pathogen after incubation. Since this type of test is more accurate than diffusion tests, MICs are considered the gold standard of susceptibility tests. 12 According to the experiment design adopted to obtain MIC values, data for a specific drug follow a distribution as shown in Figure 1. The shape of this distribution may vary considerably from drug to drug, according to the specific resistance patterns.

Figure 1.

Barplots of the minimal inhibitory concentration (MIC) distributions of each drug under plate design UKMYC5. The y-axis represents the density of each class/dilution and the x-axis represents the log2(MIC) values.

The aim of this work is to propose a general method, the censored Gaussian mixture approach, to define clusters of strains showing different levels of resistance in order to associate them with specific mutations in genome-wide association studies (GWASs). 13 The method relies on a latent representation where a continuous variable, following a Gaussian mixture model with a prior on the number of components, is only partially observed. This representation makes it possible to derive a posterior distribution on the number of clusters, that is, levels of resistance; the resulting allocations of strains to clusters will be shown to perform better in association studies, in particular for rare or low-frequency mutations.

Although the methods presented in this work may be applied to any pathogen and any dilution method, the attention is focused on Mycobacterium tuberculosis, given the importance of the resistance mechanisms developed by this pathogen. While the trend of new cases of tuberculosis is decreasing, 14 the number of cases resistant to one or more drugs, in particular to first-line drugs (rifampicin, ethambutol, isoniazid, and pyrazinamide) is increasing.15,16

In this work, we propose an alternative approach to the standard definition of critical concentrations to define resistance. A critical concentration is the concentration used to classify isolates in the susceptible group or the resistant group. However, the critical concentrations of most of the anti-TB drugs have been recently revised and updated by the World Health Organization,17,18 through an extensive study of the literature, and it has emerged that the identification of critical concentrations is not a simple task, as is usually assumed. We propose, instead, to use a classification approach, where isolates are allocated to clusters of resistance, in order to identify potential intermediate levels to define phenotypic subgroups (and not only two main groups – susceptible and resistant); this multi-label classification will be shown to be essential in order to identify the mutations associated with specific levels of resistance more clearly, in particular for those antimicrobials for which only a few resistant cases are observed (e.g. for bedaquiline which is a new treatment).

When defining critical concentrations, it is often assumed that the wild-type group of isolates (defined as the group of isolates with no acquired resistance to antimicrobials) follows a log-normal distribution, as given by Turnidge et al., 19 where the cutoffs are identified by fitting a log-normal cumulative distribution through non-linear least squares regression. This method is implemented in the ECOFFinder software available on the website of the European Society of Clinical Microbiology and Infectious Diseases (EUCAST). This method strongly relies on the assumption that a log-normal is a suitable model for the binary logarithm of MIC values and it does not take into account the region of the distribution where wild-type and non-wild-type strains overlap. Jaspers et al. 20 relaxed the assumption that the wild-type distribution is log-normal: the MIC values for the wild-type distribution are still considered realizations of continuous random variables; however, the authors model the MIC groupings with a multinomial distribution, with parameters corresponding to the probabilities of belonging to each of the concentrations (or dilutions) analysed on the testing plate.

Such methods, also called ‘local’, rely on being able to clearly identify the wild-type group of isolates; however, in many cases, such as M. tuberculosis, the wild-type group itself may be heterogeneous. In contrast, the approach proposed in this work is ‘global’, that is, it aims at modelling the whole mixing distribution. Jaspers et al. 21–23 considered a mixture-type model

$$g(y)=\pi f_1(y;\theta_1)+(1-\pi)f_2(y;\theta_2)$$

where $f_1$ and $f_2$ represent the wild-type and the non-wild-type component, respectively; $f_1$ has a parametric form (log-normal or gamma), while $f_2$ is fitted by following a nonparametric approach. Isolates are classified as wild-type when

$$\frac{\pi f_1(y_i;\theta_1)}{\pi f_1(y_i;\theta_1)+(1-\pi)f_2(y_i;\theta_2)}\geq 0.5$$

However, this classification is still binary and may not realistically represent the available groups, in particular in the presence of intermediate levels of resistance.

Gaussian mixture models have already been suggested in the study of MIC distributions, for example, by Craig 24 and Annis and Craig. 25 However, although MIC values may ideally be considered continuous, they are registered as discrete values or, more specifically, as counts of the number of isolates associated with each dilution. Moreover, considering a fixed and known number of components is a strong implicit assumption when the resistance mechanism is not yet fully understood.

Mixture models for ordinal data, including regression on covariates, have been proposed by McLachlan and Jones, 26 Cadez et al., 27 and Hamdan and Wu 28 with estimation via the expectation–maximization (EM) algorithm, and by Kottas et al. 29 and DeYoreo and Kottas 30 in a Bayesian setting, among others. As in the approach proposed in this work, a latent Gaussian variable following a mixture model is introduced to describe the behaviour of the implicit continuous variable, which is observed on a discrete scale. The main difference from the censored Gaussian mixture model (censored GM) proposed here is that in previous works the latent continuous variable is modelled according to an infinite mixture of Gaussian distributions, using a Dirichlet process prior. However, Miller and Harrison 31 showed inconsistency of this model in estimating the number of clusters. The results on the available dataset (Section 4) will show such inconsistency empirically.

The use of a prior distribution on the number of components for a finite mixture model has been shown to allow for consistency in estimating the number of clusters, unlike the use of Dirichlet process priors. The reason for this is that, in finite mixture models, most of the prior mass is associated with clusters of similar size, while in Dirichlet processes the prior mass is associated with clusters of highly variable size, favouring an increasing number of small clusters. 32 See also Frühwirth-Schnatter et al. 33 for a recent characterization of the prior distribution on the number of clusters induced by the prior distribution on the number of components.

The remainder of the article is organized as follows. Section 2 describes the dataset which motivates the study. The censored Gaussian mixture approach proposed in this work is formally presented in Section 3. Several approaches are applied and compared on the motivating dataset in Section 4. The labelling provided by the proposed method is then used to perform a GWAS in Section 5, in order to identify mutations associated with resistance to each of the antimicrobials under consideration: several previously unreported variants, or variants identified in smaller studies, will be associated with several levels of resistance to specific drugs. Section 6 concludes the article. Supporting Information includes a simulation study to test the approach.

2. The dataset: Resistance prediction by means of CRyPTIC

The CRyPTIC Consortium (Comprehensive Resistance Prediction for Tuberculosis: an International Consortium) was created in order to collect and study about 20,000 isolates of M. tuberculosis and to define a catalogue of mutations associated with resistance to 14 antituberculosis compounds: three first-line drugs (isoniazid INH, rifampicin RIF, and ethambutol EMB), other drugs already used in practice as antituberculosis compounds (rifabutin RFB, amikacin AMI, kanamycin KAN, ethionamide ETH, para-aminosalicylic acid PAS, levofloxacin LEV, and moxifloxacin MXF), two new compounds (delamanid DLM and bedaquiline BDQ), and two repurposed compounds (clofazimine CFZ and linezolid LZD).

As part of the project, the CRyPTIC Consortium designed a UKMYC 96-well microtiter plate. The plate design has been validated by seven laboratories in Asia, Europe, South America, and Africa, by using 19 external quality assessment (EQA) strains, including the most frequently studied tuberculosis strain, H37Rv. A full description of the experiment and of the results, in terms of reproducibility of the plate, is given by Rancoita et al. 34 Since the highest level of reproducibility was identified for readings at day 14 after inoculation with the Vizion imaging system, attention is restricted here to the data from this subset. Moreover, the PAS compound was shown not to perform well on the plate, and has therefore been discarded in the following part of the CRyPTIC study. For this reason, the outcomes relative to PAS, although still presented in this work, should be considered more uncertain. The validation experiment also showed that there is biological variability depending on both the plate and the culture preparation, so that by repeating the culture of the same strain several times, a full distribution of possible values is obtained, and this distribution is concentrated within three dilutions 95% of the time.

In this article, MIC values obtained from dilution experiments on a 96-well microtiter plate containing a liquid growth medium (broth) are analysed, where the same dose of pathogen is cultured in each well, but in the presence of successively increasing antimicrobial concentrations (double dilutions). The MIC value is identified as the concentration of the first well which does not allow the pathogens to grow. By convention, if growth is inhibited in all wells, the MIC is set to the lowest concentration available and, if growth is observed at each concentration level, the MIC is set to an agreed higher level of antimicrobial concentration that has not been studied on the plate.

While the results from the validation experiment were being studied, the laboratories involved in the CRyPTIC consortium analysed the first set of strains with the initial plate design (Plate Design ‘UKMYC5’); in May 2018 a new plate design (Plate Design ‘UKMYC6’) was agreed and the laboratories started to use it in July 2018. The dataset used for the current work includes only strains analysed with the UKMYC5 plate design (approximately 7500 isolates). The absolute frequencies of isolates for each compound are shown in Table 1, while the empirical distributions of the log2(MIC) for each compound are shown in Figure 1.

Table 1.

Number of isolates analysed for each compound.

Compound n Compound n Compound n
AMI 7312 ETH 7310 MXF 6385
BDQ 7054 INH 7097 PAS 6319
CFZ 6793 KAN 7207 RFB 7331
DLM 7016 LEV 6607 RIF 7145
EMB 6584 LZD 6420

RFB: rifabutin; AMI: amikacin; KAN: kanamycin; ETH: ethionamide; PAS: para-aminosalicylic acid; LEV: levofloxacin; MXF: moxifloxacin; INH: isoniazid; RIF: rifampicin; EMB: ethambutol; DLM: delamanid; BDQ: bedaquiline; CFZ: clofazimine; LZD: linezolid.

Figure 1 also shows an important feature of the dataset and, in general, of the problem of studying MIC distributions: the data are censored. First, the MIC value is only partially known at the boundary of the analysed concentration range. Moreover, the MIC values are not continuous variables as they are observed at fixed levels of concentrations (interval-censoring). Approaches usually applied to the estimation of MIC distributions often do not take into account these two sources of censoring, while one of the advantages of our proposed approach is being able to control for both of them.

3. The proposed models

Consider a set of random variables $Y_1,\ldots,Y_n$ of size $n$. In the case of the application of Section 2, $Y_i$ represents the log2(MIC) for the drug under analysis. In this work, each antimicrobial is considered as independent from the others, so $Y_i$ is a univariate random variable. A mixture model assumes that the distribution of $Y_i$ can be written as a composition of distributions known in closed form

$$g(y_i;\pi,\theta)=\sum_{k=1}^{K}\pi_k f_k(y_i;\theta_k), \qquad i=1,\ldots,n \qquad (1)$$

where $f_k(\cdot)$ is the $k$-th component density of the mixture depending on parameter $\theta_k$ and $\pi_k$ is known as a mixture weight such that $0\le\pi_k\le 1$ for $k=1,\ldots,K$, and $\sum_{k=1}^{K}\pi_k=1$. Even though the probability distributions $f_k(\cdot)$ may be from any family (and can model either discrete or continuous random variables), it is usually assumed, in many applications, that all the distributions in the mixture come from the same family, albeit with different parameters. The number of components $K$ is in general unknown and may be considered finite (finite mixture models) 35 or infinite (nonparametric mixtures). 36 When $f_k(\cdot;\theta_k)=N(\mu_k,\sigma_k^2)$, where $\mu_k$ and $\sigma_k^2$ are the mean and the variance of the $k$-th component, respectively, the model is a Gaussian mixture model.
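As a minimal illustration of equation (1) with Gaussian components, the density can be evaluated as a weighted sum of normal densities. The sketch below uses only the Python standard library; the function name and all parameter values are hypothetical and are not estimates from this study.

```python
from statistics import NormalDist


def mixture_density(y, weights, means, sds):
    """Evaluate g(y) = sum_k pi_k * N(y; mu_k, sigma_k^2) at a single point."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to one")
    return sum(
        w * NormalDist(mu=m, sigma=s).pdf(y)
        for w, m, s in zip(weights, means, sds)
    )


# Hypothetical values on the log2(MIC) scale (not estimates from the paper):
# a large low-MIC component and a small high-MIC component.
example_weights = [0.9, 0.1]
example_means = [-3.0, 2.0]
example_sds = [1.0, 1.0]

density_low = mixture_density(-3.0, example_weights, example_means, example_sds)
density_high = mixture_density(2.0, example_weights, example_means, example_sds)
```

With these illustrative values the density at the low-MIC mode is larger than at the high-MIC mode, reflecting the larger weight of the first component.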

The model can be rewritten as

$$Y_i\mid k=\mu_k+\varepsilon_{i,k}, \qquad \varepsilon_{i,k}\sim N(0,\sigma_k^2) \qquad (2)$$

where $\mu_k$ is an intercept specific to component $k$; the intercept can be modelled such that $E[Y_{s,d}\mid k]=\mu_{s,d,k}=a_{d,k}+b_{s,d,k}$, where $a_{d,k}$ is an intercept specific to the compound $d$ and $b_{s,d,k}$ is an intercept specific to the strain $s$, tested with compound $d$. In this work, antimicrobials are considered separately. A model that allows for interactions among antimicrobials may better identify cross-associations. However, in order to flexibly represent this interaction, complex multivariate models are needed. An initial analysis of the association among log2(MIC) values recorded for pairs of antimicrobials showed that the dependence is highly non-linear, suggesting that non-normal models should be preferred. Therefore, such extensions are left for further research.

Equation (1) can be augmented by including a latent variable for the allocation of an observation to a particular component: it is possible to hypothesize the existence of a latent variable $Z_i$ that takes values in $\{1,\ldots,K\}$ with probabilities $\{\pi_1,\ldots,\pi_K\}$ and labels the component to which the observation belongs; in other words, the conditional density of $Y_i$ given $Z_i=k$ corresponds to the Gaussian distribution $N(\mu_k,\sigma_k^2)$. It follows that $Z=(Z_1,\ldots,Z_n)$ is distributed according to a multinomial distribution. Diebolt and Robert 37 showed that this latent variable representation produces a Gibbs sampler that is ergodic and characterized by geometric convergence.
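To make the latent allocation concrete, the sketch below computes the conditional allocation probabilities, which by Bayes' rule are proportional to $\pi_k N(y_i;\mu_k,\sigma_k^2)$, and draws one label from them. This is the standard allocation step of a Gibbs sampler for an uncensored Gaussian mixture, shown for illustration only; it is not the scheme of Supplemental Appendix A, and the function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def allocation_probabilities(y, weights, means, sds):
    """Return P(Z = k | y) for k = 1..K, proportional to pi_k * N(y; mu_k, sigma_k^2)."""
    unnormalised = [
        w * NormalDist(mu=m, sigma=s).pdf(y)
        for w, m, s in zip(weights, means, sds)
    ]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("all component densities underflowed at this point")
    return [u / total for u in unnormalised]


def sample_allocation(y, weights, means, sds, rng):
    """Draw one label in {1, ..., K} from the conditional allocation probabilities."""
    probs = allocation_probabilities(y, weights, means, sds)
    return rng.choices(range(1, len(probs) + 1), weights=probs, k=1)[0]


# Hypothetical two-component example on the log2(MIC) scale.
example_probs = allocation_probabilities(-3.0, [0.9, 0.1], [-3.0, 2.0], [1.0, 1.0])
example_label = sample_allocation(
    -3.0, [0.9, 0.1], [-3.0, 2.0], [1.0, 1.0], random.Random(0)
)
```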

The decision to fit a mixture model is motivated by three reasons. First, it seems more appropriate to model the whole mixing structure rather than only the wild-type group of isolates (preferring a global method to a local method), since the classification is unsupervised and the microtiter plates under study are characterized by biological noise. Second, the mechanisms of resistance are heterogeneous; allowing the resistant group to be described by more than one component may make it possible to identify intermediate levels of resistance. Moreover, the complete patterns of resistance are not known for most of the drugs under analysis and are almost completely unknown for new drugs. This represents an important first step for subsequent analyses, such as GWAS. Third, the standard way of defining a wild-type group is by looking at those isolates that have no known resistance-conferring mutations. However, strains of M. tuberculosis have been exposed to antimicrobials for decades and the so-called ‘wild-type’ group is itself heterogeneous. Using an unsupervised method, such as a mixture model, makes it possible to cluster strains into several groups, in order to separately investigate their genomic patterns and link the specific mechanisms of resistance to particular genomic variants.

Although the use of Gaussian components is already very flexible, it does not take into account the discrete and censored nature of the data: MIC values are not actually continuous, and are in fact rounded to the next two-fold dilution. Moreover, the data are censored at the minimum and maximum dilution chosen for the plate. Therefore, it is possible to consider a mixture of distributions, where the discrete nature of the data is taken into account by rounding continuous (for instance, Gaussian) variables. A latent variable, $Y^*\in\mathbb{R}$, is introduced, which is related to the observed variable $Y$ that represents the registered MIC value, so that:

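$$y_i=\begin{cases}\text{dilution}_{1,d} & \text{if } y_i^*<\text{dilution}_{1,d}\\ \text{dilution}_{j,d} & \text{if } \text{dilution}_{(j-1),d}\le y_i^*<\text{dilution}_{j,d}\\ \text{dilution}_{\max,d} & \text{if } y_i^*\ge\text{dilution}_{\max,d}\end{cases}$$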

that is, the observed $y_i$ takes values in the dilution set, for drug $d$, on the basis of a Gaussian latent variable $Y_i^*$, which has the distribution described in equation (1). Here, $\text{dilution}_{\max,d}$ is a value not actually tested on the plate, but at which observations are registered when growth of the pathogen is observed in every well: each value of $Y_i^*$ larger than this maximum dilution corresponds to a value $Y_i$ equal to $\text{dilution}_{\max,d}$; similarly, each value of $Y_i^*$ smaller than the minimum dilution (i.e. no growth is observed in any well) corresponds to a value $Y_i$ equal to $\text{dilution}_{1,d}$; this is the way left and right censoring are dealt with in the proposed approach.
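The mapping from the latent value to the registered dilution can be sketched as a lookup in the sorted dilution set. The function name and the example dilution grid below are hypothetical; the last element of the grid plays the role of the conventional maximum level that is not tested on the plate.

```python
from bisect import bisect_right


def observe_dilution(y_star, dilutions):
    """Map a latent value y* to the registered dilution.

    `dilutions` must be sorted in increasing order. A value below the first
    dilution is registered at the first dilution, a value in
    [dilution_(j-1), dilution_j) is registered at dilution_j, and a value at
    or above the last dilution is registered at the last dilution.
    """
    # Index of the first dilution strictly greater than y_star.
    idx = bisect_right(dilutions, y_star)
    if idx >= len(dilutions):
        return dilutions[-1]
    return dilutions[idx]


# Hypothetical grid on the log2(MIC) scale (not a real plate layout).
example_dilutions = [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0]
registered = [observe_dilution(v, example_dilutions) for v in (-5.0, -3.5, -3.0, 0.5, 7.0)]
```

Here a latent value exactly equal to a dilution is registered at the next dilution up, matching the half-open intervals in the definition above.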

The probability mass function $p(\cdot)$ of $Y=(Y_1,\ldots,Y_n)$ is defined as

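$$p(y_i=\text{dilution}_{j,d})=\int_{\text{dilution}_{j-1,d}}^{\text{dilution}_{j,d}}g(y_i^*)\,dy_i^*=\int_{\text{dilution}_{j-1,d}}^{\text{dilution}_{j,d}}\sum_{k=1}^{K}\pi_k f_k(y_i^*;\theta_k)\,dy_i^*$$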

This approach may be considered as a generalization to the case of mixture models of the latent Gaussian representation of Albert and Chib 38 defined for discrete variables. The mixed nature of the data is transferred to an implicit and richer variable, which, when observed, is censored and then only registered at a discrete scale.
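Because the components are Gaussian, each interval probability is a difference of mixture cumulative distribution functions. The sketch below computes the full probability mass function over a dilution grid. It assumes, following the mapping of the latent variable given earlier, that the first class collects all mass below the first dilution and the last class collects all mass at or above the second-to-last dilution; function names and numerical values are hypothetical.

```python
from statistics import NormalDist


def mixture_cdf(y, weights, means, sds):
    """Cumulative distribution function of the Gaussian mixture at y."""
    return sum(
        w * NormalDist(mu=m, sigma=s).cdf(y)
        for w, m, s in zip(weights, means, sds)
    )


def censored_pmf(dilutions, weights, means, sds):
    """Probability of registering each dilution (grid sorted, length >= 2)."""
    if len(dilutions) < 2:
        raise ValueError("at least two dilution levels are required")
    cdf = [mixture_cdf(d, weights, means, sds) for d in dilutions]
    pmf = [cdf[0]]  # left-censored class: y* < dilution_1
    for j in range(1, len(dilutions) - 1):
        pmf.append(cdf[j] - cdf[j - 1])  # dilution_(j-1) <= y* < dilution_j
    pmf.append(1.0 - cdf[-2])  # last class, including the right-censored mass
    return pmf


# Hypothetical grid and parameters on the log2(MIC) scale.
example_pmf = censored_pmf(
    [-4.0, -3.0, -2.0, -1.0, 0.0, 1.0], [0.9, 0.1], [-3.0, 2.0], [1.0, 1.0]
)
```

Working with differences of cumulative distribution functions avoids numerical integration, and the class probabilities sum to one by construction.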

Several approaches are available to generalize latent variable algorithms 38 to mixture models, in particular in a nonparametric setting; for example, a nonparametric estimation for mixed count data based on the infinite mixture models is proposed by Kottas et al. 29 While these methods are similar to the one proposed here with respect to the latent representation, there are some important differences: the goal of these approaches is often density estimation and not clustering. In this work, the use of a finite mixture model with an unknown number of components is preferred in order to introduce the information that a small number of components is expected; this is particularly important in this setting, where the clusters are defined with a biological interpretation. Moreover, it avoids the inconsistency of the Dirichlet process in estimating the correct number of components discussed by Miller and Harrison. 31

The estimation of the proposed model is made within a Bayesian framework 39 to obtain posterior distributions (and the corresponding credible intervals) of all the parameters involved. In this analysis, it is necessary to define prior distributions for all parameters, such that they describe the prior knowledge the experimenter has about them. For the location and scale parameters, it is common to use weakly informative priors, for instance a $N(\mu_0,\tau^2)$ for each location parameter $\mu_k$, where $\tau^2$ is a variance that can be fixed with respect to the range of the observations; for the precisions $\sigma_k^{-2}$, a gamma prior distribution $\Gamma(c,d)$, with shape parameter $c$ and rate parameter $d$, is often used; see Richardson and Green 40 for a full description of these prior distributions. A Dirichlet prior distribution for the mixture weights is often considered, $\pi\sim\mathrm{Dir}(\delta,\ldots,\delta)$ for some choice of $\delta$: Rousseau and Mengersen 41 and Grazian and Robert 42 suggested $\delta<1$ for a finite mixture model with a known number of components, which is set to be large in order to have a posterior distribution concentrated on a lower number of meaningful components; in contrast to their approach, here we set $\delta=1$ and fix a prior distribution on the number of components $K$, to better investigate the ability of such a prior distribution to encourage consistency of the posterior distribution towards the correct number of clusters.

The prior distribution for the number of components is known to be delicate. Here, the default prior distribution proposed by Grazian et al. 43 based on a loss-information definition is used, since it has shown a good balance between conservativeness and accuracy: it is important that the number of components is well estimated and that, at the same time, lower values of $K$ are preferred to larger values, unless there is enough support for larger values. This assumption follows a parsimony principle which helps both the interpretation and the estimation procedure. This prior distribution is defined for $K\in\mathbb{N}$ and is obtained by considering a loss function $\mathrm{Loss}_C(K)$, representing a complexity loss which increases as the number of parameters increases, so that simpler models are preferred unless there is enough evidence to prefer more complex models. This loss function is associated with the prior distribution such that

$$p(K)\propto\exp\{-\mathrm{Loss}_C(K)\}$$

From this definition, Grazian et al. 43 derived a beta-negative-binomial distribution as prior distribution $p(K)$, where the number of successes before stopping the experiment is equal to one and with shape parameters $\alpha,\beta>0$.

The parameters α and β can be used to describe available prior information about the true number of components because

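$$E(K)=\frac{\alpha+\beta-1}{\alpha-1},\ \text{for }\alpha>1, \qquad \mathrm{Var}(K)=\frac{\alpha\beta(\alpha+\beta-1)}{(\alpha-2)(\alpha-1)^2},\ \text{for }\alpha>2$$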

In this work, α and β are both taken to be equal to one as a default choice in the presence of weak prior information. This choice implies that the probability of success in the latent representation of the beta-negative-binomial is given a uniform prior distribution, 43 resulting in the expression:

$$p(K)=\frac{1}{K(K+1)}$$

This prior distribution assigns higher probability mass to small values of K , with the probability mass rapidly approaching zero as K increases.
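A quick numerical check of this default prior can be made with exact fractions. Since $1/(K(K+1))=1/K-1/(K+1)$, the partial sums telescope to $1-1/(N+1)$, so the prior is proper and half of its mass sits on $K=1$. The helper name below is hypothetical.

```python
from fractions import Fraction


def prior_num_components(K):
    """Default prior p(K) = 1 / (K (K + 1)) for K = 1, 2, ..."""
    if K < 1:
        raise ValueError("K must be a positive integer")
    return Fraction(1, K * (K + 1))


# Prior mass on the first few values of K and cumulative mass up to K = 10.
first_values = [prior_num_components(K) for K in range(1, 5)]
mass_up_to_ten = sum(prior_num_components(K) for K in range(1, 11))
```

Under this prior, ten or fewer components receive a total mass of 10/11. Note that the mean formula above requires α > 1, so with α = β = 1 the prior mean of K is not finite, even though small values of K are strongly favoured.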

It is worth recalling that a major issue when estimating the parameters of mixture models is the label-switching phenomenon, due to the symmetry in the likelihood of the model parameters. The method used to tackle this problem in this article is post-processing the output of the Bayesian algorithms to re-label the components and keep the labels consistent. In more detail, once the MCMC samples are obtained, each sample is permuted to induce an identifiability constraint such that $\mu_1<\mu_2<\cdots<\mu_K$. This can be shown to minimize the Kullback–Leibler divergence between the estimated matrix of classification probabilities and the corresponding true matrix.44,45 Other methodologies can also be applied; see, for example, Celeux 46 and Sperrin et al. 47
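The ordering constraint on the means can be imposed draw by draw, as in the following minimal sketch. It permutes the parameters of a single MCMC draw so that the means are increasing and remaps the allocation labels accordingly; the function name and the example draw are hypothetical.

```python
def relabel_by_means(weights, means, sds, allocations):
    """Permute one MCMC draw so that mu_1 < mu_2 < ... < mu_K.

    `allocations` holds labels in {1, ..., K}; they are remapped so that each
    observation keeps pointing at the same (now reordered) component.
    """
    order = sorted(range(len(means)), key=lambda k: means[k])
    new_label = {old + 1: new + 1 for new, old in enumerate(order)}
    return (
        [weights[k] for k in order],
        [means[k] for k in order],
        [sds[k] for k in order],
        [new_label[z] for z in allocations],
    )


# Hypothetical draw in which the labels of the two components are swapped.
relabelled = relabel_by_means([0.1, 0.9], [2.0, -3.0], [1.5, 1.0], [1, 2, 2])
```

Applying the same permutation to weights, scales, and allocations keeps every draw internally consistent, so that posterior summaries per component are comparable across iterations.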

4. Results

The methodology described in Section 3 is now applied to the dataset presented in Section 2. The goal of the analysis is to characterize the clusters representing different levels of resistance. Antimicrobials are analysed independently here.

For the parameters of the mixture, the following prior distributions are used: for each $k=1,\ldots,K$, $\mu_k\sim N(0,100)$ and $1/\sigma_k\sim\Gamma(1.5,0.5)$, so that very concentrated components are considered unlikely a priori. Finally, $\pi$ is given a Dirichlet prior with all the parameters equal to $\delta=1$.

The censored Gaussian mixture model with a conservative prior on the number of components proposed in this work has been compared with three other classification methods:

  • ECOFFinder, 19 as implemented in the R package antibioticR; 48 three choices of the quantile of interest are selected and compared: 0.95, 0.99, and 0.999;

  • a Gaussian mixture (GM) model, as given by Annis and Craig 25 ;

  • a Dirichlet process (DP) mixture for discrete observations. 29

Supplemental Appendix A provides information about the MCMC scheme that was implemented for the censored Gaussian mixture model. For all methods, the MCMC algorithm has been implemented with $10^6$ iterations, with a burn-in of $10^5$ iterations and a thinning factor of 10. A convergence study is available in Supplemental Appendix B.
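For orientation, the number of posterior draws retained under these settings can be worked out directly. The sketch assumes that thinning means keeping every tenth iteration after the burn-in; the variable names are hypothetical.

```python
# MCMC settings reported in the text.
n_iterations = 10**6
burn_in = 10**5
thinning = 10

# Indices of the iterations kept after discarding the burn-in and thinning.
retained_indices = range(burn_in, n_iterations, thinning)
n_retained = len(retained_indices)
```

Under this assumption, 90,000 draws per method are available for posterior summaries.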

Table 2 displays the estimated number of clusters obtained via the censored GM, along with the posterior means of the mixture weights associated with these clusters for each antimicrobial. Certain antimicrobials exhibit associations with two clusters; for instance, the majority of the new or repurposed drugs, such as BDQ, CFZ, and LZD. Conversely, DLM appears to possess a heavy tail, encompassing highly resistant strains as well as intermediate cases, as depicted in Figure 1. Interestingly, RIF, RFB, and INH each display three or four clusters, reinforcing the notion of multiple intermediate levels of resistance. In contrast, EMB presents a single cluster; however, it is worth noting that the range of MIC values might not have been precisely defined in the study.

Table 2.

Number of identified clusters, and posterior means of the relative mixture weights for the censored GM.

Compound K (MAP) π1 π2 π3 π4
AMI 2 0.9728 0.0272
BDQ 2 0.9982 0.0018
CFZ 2 0.9228 0.0772
DLM 3 0.9184 0.0678 0.0137
EMB 1 1.0000
ETH 2 0.8850 0.1150
INH 4 0.4949 0.0445 0.0647 0.3959
KAN 3 0.8166 0.0709 0.1125
LEV 2 0.8638 0.1362
LZD 2 0.8557 0.1443
MXF 3 0.7909 0.0951 0.1141
PAS 3 0.8374 0.0388 0.1238
RFB 3 0.6276 0.0371 0.3354
RIF 4 0.5310 0.0397 0.0753 0.3541

RFB: rifabutin; AMI: amikacin; KAN: kanamycin; ETH: ethionamide; PAS: para-aminosalicylic acid; LEV: levofloxacin; MXF: moxifloxacin; INH: isoniazid; RIF: rifampicin; EMB: ethambutol; DLM: delamanid; BDQ: bedaquiline; CFZ: clofazimine; LZD: linezolid; GM: Gaussian mixture.

The ground truth for the dataset described in Section 2 is not known: we have no information about whether a strain belongs to the wild-type group or to one of the resistant groups. However, some strains can be predicted to be resistant with high confidence because they are characterized by genomic variants well known to be associated with resistance to specific antimicrobials. For example, Walker et al. 49 reported candidate genomic variants from the literature and classified them as not conferring resistance, resistance determinants, or uncharacterized.

To test the ability of the compared methods to identify resistant cases, we selected strains predicted as resistant to first-line drugs (EMB, INH, and RIF) with high predictive ability ( >95% ) according to Walker et al. 49 Specifically, we used 14 variants in genes embA and embB for resistance to EMB, 42 variants in genes ahpC, fabG1, inhA, katG, and ndh for resistance to INH, and 30 variants in gene rpoB for resistance to RIF.

Table 3 shows the percentage of strains in the dataset correctly identified as resistant, meaning strains that were classified by the method as resistant and were characterized by one or more of the genomic variants selected from Walker et al. 49 as conferring resistance with high probability ( >95% ).

Table 3.

Percentages of strains characterized by known resistance mutations for the first-line drugs and correctly classified as resistant.

Drug  ECOFFinder 0.95  ECOFFinder 0.99  ECOFFinder 0.999  GM  DP  Censored GM
EMB 21.062 0.000 0.000 99.159 91.150 91.062
INH 94.131 92.054 92.054 97.813 97.813 92.054
RIF 91.904 91.904 91.904 97.885 97.885 93.508

INH: isoniazid; RIF: rifampicin; EMB: ethambutol; GM: Gaussian mixture; DP: Dirichlet process.

ECOFFinder directly produces cutoffs that classify isolates into a susceptible and a resistant group. For the other methods, it is assumed that the first component represents the susceptible isolates, while the others represent some level of resistance. Once the classification is done, the strains are checked for the presence of genomic variants identified by Walker et al. 49 to predict resistance. All the methods identify the resistant strains with an accuracy above 90%, except for ECOFFinder for EMB. GM and DP show very high levels of accuracy in identifying true positives, particularly for INH and RIF. The censored GM is characterized by a slightly lower level of accuracy, which is nevertheless above 90% for all the first-line drugs.
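The check described above can be sketched as follows, assuming that the denominator is the set of strains carrying at least one of the selected high-confidence variants and that a strain counts as classified resistant when it is allocated to any component other than the first. The function name and the toy data are hypothetical and are not taken from the CRyPTIC dataset.

```python
def percent_detected_resistant(allocations, has_known_variant):
    """Percentage of strains with a known resistance variant that are
    allocated to a component other than the first (assumed susceptible)."""
    flagged = [z for z, v in zip(allocations, has_known_variant) if v]
    if not flagged:
        raise ValueError("no strain carries a known resistance variant")
    detected = sum(1 for z in flagged if z != 1)
    return 100.0 * detected / len(flagged)


# Toy example: five strains, three of which carry a known variant.
toy_allocations = [1, 2, 3, 1, 2]
toy_variants = [False, True, True, True, False]
toy_percentage = percent_detected_resistant(toy_allocations, toy_variants)
```

In the toy example two of the three strains carrying a known variant fall outside the first component, giving roughly 66.7%.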

The dataset described in Section 2 does not include information about susceptible strains. However, the experiment carried out by CRyPTIC was first validated by a pilot experiment, as described by Rancoita et al. 34 In this validation experiment, the fully susceptible strain H37Rv (reference strain) was subcultured and tested 10 times, and an additional 4 times as a blind strain in each of the laboratories participating in the experiment.

To study the percentage of strains correctly identified as susceptible (true negatives), the methods under comparison were then applied to the validation dataset, and the percentage of duplicates of strain H37Rv correctly identified as susceptible was recorded. For this analysis, we assumed the model with drug and strain intercepts (see Section 3). Percentages of true negative cases are shown in Table 4.

Table 4.

Percentages of strains H37Rv tested during the validation experiment and correctly classified as susceptible.

| Drug | ECOFFinder (0.95) | ECOFFinder (0.99) | ECOFFinder (0.999) | GM | DP | Censored GM |
|---|---|---|---|---|---|---|
| AMI | 91.500 | 91.500 | 91.500 | 35.000 | 35.000 | 95.333 |
| BDQ | 92.358 | 92.358 | 95.772 | 13.171 | 42.764 | 96.585 |
| CFZ | 95.772 | 95.772 | 95.772 | 56.944 | 56.944 | 96.528 |
| DLM | 94.316 | 96.448 | 97.869 | 98.579 | 77.798 | 94.316 |
| EMB | 98.152 | 100.000 | 100.000 | 0.185 | 63.586 | 98.152 |
| ETH | 97.976 | 97.976 | 97.976 | 23.609 | 82.799 | 99.325 |
| INH | 80.993 | 91.952 | 91.952 | 5.137 | 5.137 | 91.952 |
| KAN | 98.042 | 98.042 | 98.042 | 8.320 | 8.320 | 83.850 |
| LEV | 97.414 | 97.414 | 97.414 | 97.414 | 32.069 | 98.621 |
| LZD | 94.188 | 94.188 | 94.188 | 8.034 | 8.034 | 97.265 |
| MXF | 97.028 | 97.028 | 97.727 | 2.972 | 78.846 | 97.028 |
| PAS | 91.107 | 95.134 | 100.000 | 91.107 | 56.544 | 91.107 |
| RFB | 97.162 | 98.330 | 100.000 | 96.494 | 93.823 | 96.494 |
| RIF | 96.329 | 96.329 | 96.329 | 76.049 | 76.049 | 93.007 |

RFB: rifabutin; AMI: amikacin; KAN: kanamycin; ETH: ethionamide; PAS: para-aminosalicylic acid; LEV: levofloxacin; MXF: moxifloxacin; INH: isoniazid; RIF: rifampicin; EMB: ethambutol; DLM: delamanid; BDQ: bedaquiline; CFZ: clofazimine; LZD: linezolid; GM: Gaussian mixture; DP: Dirichlet process.

For the censored GM, duplicates of H37Rv are correctly identified as susceptible more than 90% of the time for all drugs except KAN. On the other hand, the percentages of correct classification for GM and DP are low for many antimicrobials: correct classification falls below 90% for 10 of the 14 drugs with GM and for 13 with DP. ECOFFinder performs well; however, the choice of the reference quantile has a strong impact on its performance.
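The counts quoted above can be checked directly against the values in Table 4. The short sketch below does so for the three mixture-based columns; the variable and function names are illustrative, and the 90% threshold is the one used in the text.

```python
# Percentages of H37Rv duplicates classified as susceptible (Table 4),
# stored as (GM, DP, censored GM) for each drug.
table4 = {
    "AMI": (35.000, 35.000, 95.333),
    "BDQ": (13.171, 42.764, 96.585),
    "CFZ": (56.944, 56.944, 96.528),
    "DLM": (98.579, 77.798, 94.316),
    "EMB": (0.185, 63.586, 98.152),
    "ETH": (23.609, 82.799, 99.325),
    "INH": (5.137, 5.137, 91.952),
    "KAN": (8.320, 8.320, 83.850),
    "LEV": (97.414, 32.069, 98.621),
    "LZD": (8.034, 8.034, 97.265),
    "MXF": (2.972, 78.846, 97.028),
    "PAS": (91.107, 56.544, 91.107),
    "RFB": (96.494, 93.823, 96.494),
    "RIF": (76.049, 76.049, 93.007),
}

THRESHOLD = 90.0


def drugs_below(column_index, threshold=THRESHOLD):
    """Drugs whose true-negative percentage is below the threshold."""
    return sorted(d for d, row in table4.items() if row[column_index] < threshold)


gm_below = drugs_below(0)            # plain Gaussian mixture
dp_below = drugs_below(1)            # Dirichlet process mixture
censored_gm_below = drugs_below(2)   # censored Gaussian mixture
```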

Comparing Tables 3 and 4 shows that the censored GM performs well (correct classification >90%) in most cases. ECOFFinder classifies the susceptible cases well but can perform poorly in identifying the resistant cases. GM and DP show high levels of correct classification for the resistant cases but low levels for the susceptible ones, and are therefore not conservative enough.

In general, the censored GM reaches good levels of correct classification without additional information or experimental choices (such as the choice of the reference quantile). It can be seen as an automatic method for defining resistance levels, which is particularly important for the less investigated antimicrobials but can also highlight unknown mechanisms of resistance for first-line drugs.

5. Application to GWASs

GWASs are a class of methods that involve a model of association of a particular phenotype (e.g. resistance to a specific antimicrobial or the MIC value with respect to that antimicrobial) to a set of genomic variants. Once a genetic association is identified, researchers can further study the biological mechanisms and develop better strategies to detect or treat the disease.

The methods can be classified depending on the type of covariates (e.g. single nucleotide polymorphisms (SNPs) or substrings of some length of the genome, k-mers) or the type of the response variable (e.g. a binary variable of classification for resistance, a continuous variable representing the MIC, or a discrete variable representing the level of resistance). See Marees et al. 50 and Uffelmann et al. 51 for recent reviews.

GWASs have been run for each of the antimicrobials under study, including SNPs of the whole genome as predictors. The involved model is

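$$
\begin{aligned}
c_{Y_i} &\sim \mathcal{MN}(n_i, p_{i,1}, \ldots, p_{i,K}) \\
p_{i,k} &= \frac{\exp(\eta_{ik})}{\sum_{k=1}^{K} \exp(\eta_{ik})} \\
\eta_{ik} &= x_i^{T} \beta_k + u \\
u &\sim \mathcal{N}(0, \sigma_u^2 \Sigma)
\end{aligned}
$$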

where $c_{Y_i}$ is a multinomial random variable representing the dilution into which the phenotype of observation $Y_i$ is classified (here the $\log_2(\mathrm{MIC})$ associated with each drug), $n_i$ is usually equal to one, $x_i$ is a $p \times 1$ vector of $p$ SNPs, $\beta_k$ is a $p \times 1$ vector of fixed effect sizes of the genetic variants (the SNP effect sizes), which may or may not include an intercept, $u$ is a random effect that captures the polygenic effect of other SNPs, $\sigma_u^2$ measures the genetic variation of the phenotype, and $\Sigma$ is the genetic relationship matrix. A Bayesian categorical regression has been performed, assuming an inverse gamma prior distribution for $\sigma_u^2$ and spike-and-slab priors for the coefficients $\beta$. A coefficient $\beta_j$ is considered not significantly different from zero if its posterior distribution is concentrated around zero (the spike) or if the corresponding high-posterior-probability (>95%) credible interval includes zero.
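To make the link between the linear predictors and the category probabilities concrete, the following minimal sketch evaluates the softmax step of the model for one isolate. It is not the estimation procedure used in the paper (no priors and no sampling are involved); the function name and the toy coefficients are hypothetical.

```python
import math


def category_probabilities(x, betas, u=0.0):
    """Category probabilities p_{i,1..K} for one isolate.

    x: list of p SNP indicators for the isolate.
    betas: list of K coefficient vectors, one per resistance category.
    u: value of the random effect entering every linear predictor.
    """
    etas = [sum(xj * bj for xj, bj in zip(x, beta_k)) + u for beta_k in betas]
    # Subtract the maximum before exponentiating for numerical stability;
    # this leaves the ratios unchanged.
    m = max(etas)
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example (hypothetical values): p = 3 SNPs, K = 3 categories.
toy_x = [1, 0, 1]
toy_betas = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
toy_probs = category_probabilities(toy_x, toy_betas)
```

In this sketch a value of u that is common to all K categories does not change the resulting ratios; it is kept as an argument only to mirror the formula.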

Table 5 includes all the variants that have been identified as positively associated with some level of resistance in the isolates, for each compound. For each compound, the proposed approach suggests variants that are not included in the recent WHO Catalogue, 18 whose results are based on the same dataset analysed in our work. Some of these variants have already been proposed in the literature, but usually in experiments involving a small number of isolates or only virulent versions of H37Rv, or as more generically associated with resistance rather than with resistance to a specific compound.

Table 5.

Genomic variants which have been identified by the GWAS for each compound.

| Drug | Catalogue | Not in the catalogue | Reference |
|---|---|---|---|
| AMI | rrs_G1484T | Rv3639c_A132E | Unreported |
| | rrs_C1402 | Rv3897c_G74V | Unreported |
| | rrs_A1401G | Rv0823c_D156N | Unreported |
| | | Rv2242_S43L | Unreported |
| | | Rv2348c_I101M | Unreported |
| | | mmpL10_K384T | Unreported |
| | | lipL_S41G | Unreported |
| BDQ | Rv1979c_A-129G | Rv0678_CG286-287 | Guo et al. 52 (on H37Rv) |
| | | Rv0678_T179C | Guo et al. 52 (on H37Rv) |
| | | Rv0678_G198* | Guo et al. 52 (on H37Rv) |
| | | atpE_G61A | Andres et al. 53 (124 patients, in vivo) |
| CFZ | Rv1979c_D286G | Rv1979c_T1052C | Zhang et al. 61 (96 isolates) |
| | | Rv0678_S68G | Zhang et al. 61 (96 isolates) |
| | | Rv0678_S53L | Zhang et al. 61 (96 isolates) |
| | | Rv0678_S2I | Xu et al. 60 (90 isolates) |
| | | Rv0678_M146T | Xu et al. 60 (90 isolates) |
| | | Rv0678_L117R | Xu et al. 60 (90 isolates) |
| | | Rv1979c_V52G | Xu et al. 60 (90 isolates) |
| | | Rv0678_V52G | Xu et al. 60 (90 isolates) |
| | | pepQ_L44P | Almeida et al. 56 (on H37Rv) |
| DLM | | ddn_W20* | Gómez-González et al. 54 (>33,000 isolates) |
| | | ddn_A76E | Unreported |
| | | ddn_Y89A | Unreported |
| | | ddn_L37G | Unreported |
| | | Rv1676_E34T | Unreported |
| EMB | embA_c-12t | embA_c-16g | Perdigão et al. 83 (17 isolates) |
| | embB_Q497R | embA_c-16t | Jouet et al. 84 (429 isolates) |
| | embB_Q497K | embA_c-11t | Phelan 85 (518 isolates) |
| | embB_G406A | embB_N1033K | Chen et al. 66 (110 isolates) |
| | embB_G406D | embB_Q1002R | Earle et al. 74 (3144 isolates) |
| | (embB_G406S) | embB_E405D | Napier et al. 86 (535 isolates) |
| | embB_D328Y | Rv1565_V48G | Unreported |
| | (embB_D354A) | pknJ_V447A | Unreported |
| | embB_M306I | Rv2000_Y305C | Unreported |
| | embB_M306V | | |
| | embB_Y319C | | |
| | embB_Y319S | | |
| ETH | fabG1_c-15t | ethA_T186K | DeBarber et al. 87 (11 isolates) |
| | inhA_S94A | ethA_Y84D | DeBarber et al. 87 (11 isolates) |
| | (ethA_M1R) | ethA_P51L | DeBarber et al. 87 (11 isolates) |
| | | ethA_A381P | DeBarber et al. 87 (11 isolates) |
| | | ethA_D55A | Morlock et al. 88 (41 isolates) |
| | | ethA_G385D | Morlock et al. 88 (41 isolates) |
| | | ethA_G413D | Morlock et al. 88 (41 isolates) |
| | | ethA_G124D | Brossier et al. 89 (87 isolates) |
| | | ethA_S266R | Brossier et al. 89 (87 isolates) |
| | | ethA_I194T | Machado et al. 90 (17 isolates) |
| | | ethA_T321P | Unreported |
| | | ethA_Q246* | Unreported |
| INH | (ndh_g-70t) | katG_A109V | Cardoso et al. 91 (97 isolates) |
| | katG_S315N | inhA_I21T | Hazbón et al. 92 (1011 isolates) |
| | katG_S315T | inhA_I194T | Hazbón et al. 92 (1011 isolates) |
| | | katG_G125D | Chen et al. 66 (110 isolates) |
| | | katG_S315I | Jeeves et al. 93 (on strain H37Rv) |
| | | katG_S315R | Jeeves et al. 93 (on strain H37Rv) |
| | | Rv3403c_S23R | Unreported |
| | | Rv2896_S153A | Unreported |
| | | Rv1922_D282Y | Unreported |
| | | Rv0163_T45A | Unreported |
| LEV | gyrA_D94H | gyrA_S91P | Hameed et al. 94 (400 isolates) |
| | gyrA_A90V | ruvA_R39W | Unreported |
| | gyrA_D94G | | |
| | (gyrB_E501D) | | |
| | (gyrB_E501V) | | |
| | gyrA_D94A | | |
| | gyrA_D94N | | |
| | gyrA_D94Y | | |
| | gyrB_N499T | | |
| | gyrB_D461N | | |
| LZD | rplC_C154R | rrs_G2447T | Lee et al. 57 (41 isolates) |
| | | rrl_G2061T | Hillemann et al. 58 (six isolates) |
| | | rplC_H155D | Unreported |
| | | rrs_V403I | Unreported |
| | | pks4_E537* | Unreported |
| MXF | gyrA_D94H | secD_Y171D | Unreported |
| | gyrA_D94G | Rv2923c_A46V | Unreported |
| | gyrA_D94N | metS_A440V | Unreported |
| | gyrA_D94Y | desA3_T236P | Unreported |
| | (gyrB_N499D) | ruvA_R39W | Unreported |
| RFB | | rpoB_H445D | Li et al. 95 (154 isolates) |
| | | rpoB_H445Y | Farhat et al. 96 (1003 isolates) |
| | | rpoB_S450L | Farhat et al. 96 (1003 isolates) |
| RIF | rpoB_D435F | Rv1565c_V48G | CRyPTIC Consortium 73 |
| | (rpoB_L452P) | Rv2011c_D129 | Cui et al. 72 (on H37Rv) |
| | rpoB_S450Y | Rv2011c_R128 | Cui et al. 72 (on H37Rv) |
| | rpoB_S450W | | |
| | rpoB_S450Q | | |
| | rpoB_S450L | | |
| | rpoB_S431T | | |
| | rpoB_Q432P | | |
| | rpoB_I491F | | |
| | rpoB_H445Y | | |
| | rpoB_H445R | | |
| | (rpoB_H445G) | | |
| | rpoB_H445D | | |
| | (rpoB_H445C) | | |
| | rpoB_M434I | | |
| | rpoB_V170F | | |

The variants are grouped into (a) those already present in the 2021 WHO Catalogue 18 and (b) those not present in it; if a variant is not included in the 2021 WHO Catalogue, it is indicated whether it has already been suggested as associated with resistance to the particular compound or is unreported. Among the variants already present in the catalogue, those identified here but with credible intervals including zero are shown in brackets and in italic.

With respect to the new drugs, it is interesting to notice that the method has identified mutations on Rv0678 as involved in resistance mechanisms for BDQ, as suggested by Guo et al. 52 on mutant H37Rv (a concentration of 0.5 mg/L was used for mutant selection). Unlike Guo et al., 52 the study run here is able to suggest specific variants. Moreover, the method identifies one mutation on atpE as associated with resistance; the gene was previously found by Andres et al. 53 to be mutated in seven out of 124 patients, within 9 months after the addition of BDQ and CFZ to the routine treatment. Our approach also identified several mutations on ddn as associated with resistance to DLM: of these, just one was previously identified in a large (>33,000 isolates) study, 54 while four were previously unreported and two were found to be generically associated with resistance; in particular, Antonova et al. 55 found Rv1676 to be generically associated with resistance.

With regard to the repurposed drugs, in the 2021 WHO Catalogue no mutation meets the criteria for association with CFZ resistance. In this study, on the other hand, we were able to associate three mutations in Rv1979c, six mutations in Rv0678, and one mutation in pepQ with resistance to CFZ. Among these, eight variants were already identified in smaller studies (96 isolates and 90 isolates, respectively), while the one on pepQ was suggested on a mutant variant of H37Rv used in vitro and in mice. 56 Rv1979c and Rv0678 are genes that have recently been suggested as possibly associated with resistance to CFZ in cohort studies 59–61 or in vitro. 62 As a note, Hartkoorn et al. 63 speculated that Rv0678 can represent a confounder when analysing resistance to BDQ and CFZ. With respect to LZD, the 2021 WHO Catalogue only reports one mutation on rplC as associated with resistance, while mutations on rrs and rrl do not meet its criteria. Our approach, on the other hand, finds one additional mutation on rplC, which was previously unreported, two mutations on rrs, and one on rrl. In particular, rrs_G2447T was reported only on one patient, 57 and rrl_G2061T was associated with resistance to LZD in a study with only six isolates. 58

For these new or repurposed drugs, the proposed approach presents one strong advantage. From Figure 1, it is evident that the distributions of some drugs (BDQ, CFZ, and DLM) present long tails, but with a small number of cases with high MIC values: since these drugs have been introduced more recently for the treatment of tuberculosis, the bacteria have not yet developed widespread mechanisms of resistance. The ability to identify clusters that separate susceptible from resistant cases is important because, in a GWAS for such drugs, the signals coming from the susceptible cases are stronger than those coming from the resistant cases, given the disparity in the number of strains in each group. As an example, Figure 2 shows the Manhattan plots for CFZ when using the clusters identified by GM (results are similar for DP) and when using the clusters identified by the censored GM. A Manhattan plot is a scatter plot displaying, on a logarithmic scale, the p-value associated with each genomic variant, with variants ordered on the x-axis by their position on the genome. The red line in the figures represents the significance threshold, computed here through the Bonferroni correction. With GM, thousands of variants appear to be positively associated with the phenotype, while only four significant variants remain when using the censored GM. GM (and DP) identifies a larger number of clusters, including more than one cluster for the susceptible isolates; therefore, the GWAS tends to explain the heterogeneity of the susceptible group. On the other hand, when using the clusters identified by the censored GM, it is possible to select a few candidates that can be associated with resistance to CFZ.

Figure 2.

Manhattan plots resulting from a genome-wide regression with outcomes given by level of resistance identified by a Gaussian mixture model (a) and the censored Gaussian mixture model proposed in this work (b).
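The quantities displayed in a Manhattan plot, together with a Bonferroni-corrected threshold, can be computed as in the sketch below. The family-wise level alpha = 0.05, the function names, and the toy p-values are assumptions made for illustration; the text does not state the level used for the red line.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-variant significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def manhattan_points(p_values, alpha=0.05):
    """Return the height of the threshold line and one point per variant.

    p_values: dict mapping genome position -> p-value.
    Each point is (position, -log10(p), is_significant), ordered by position.
    """
    threshold = bonferroni_threshold(len(p_values), alpha)
    line_height = -math.log10(threshold)
    points = [
        (pos, -math.log10(p), p < threshold)
        for pos, p in sorted(p_values.items())
    ]
    return line_height, points


# Toy example (hypothetical positions and p-values).
toy_p = {100: 0.2, 2500: 1e-6, 4000: 0.01, 7200: 0.04}
toy_line, toy_points = manhattan_points(toy_p)
toy_hits = [pos for pos, _, significant in toy_points if significant]
```

Plotting the second element of each point against the first, with a horizontal line at `toy_line`, gives the usual display.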

The first-line drugs have been extensively studied in the literature. The proposed approach identifies most of the positive associations for resistance to INH in genes katG (high levels of resistance) and inhA (low levels of resistance). Several mutations have already been identified either in the 2021 WHO Catalogue or in other studies. As in the 2021 WHO Catalogue, several mutations already identified in the literature are found here to be not significantly different from zero: katG_W191G, 64 katG_L141F, 65 katG_L159P and katG_L704S, 66 and katG_A614E. 67 In addition to mutations in katG and inhA, the proposed approach identifies a few other genes that can be of interest for further investigation: Rv3403c, which was previously associated with resistance to INH by Kruh et al. 68 through an experiment on a guinea pig; Rv2896, which has been generically associated with resistance in two isolates treated with INH by Niemann et al.; 69 Rv1922, which was generically associated with resistance by Mortimer et al.; 70 and Rv0163, whose mutations were found by Payros et al. 71 to be needed for M. tuberculosis to seed in the lungs of mice.

Most mutations present in the 2021 WHO Catalogue as associated with resistance to RIF have been identified in this study as well. However, it is interesting to notice that our approach identifies two variants of gene Rv2011c as having a significant impact on the evolution of low resistance levels; while this gene was previously proposed as possibly involved in mechanisms of resistance to RIF, 72 there is not yet agreement on its role. The role of mutations on rpoB is so strong that standard methods have difficulty identifying associations with intermediate levels of resistance; our approach, however, identifies three groups (susceptible isolates, intermediate-resistance isolates, and high-resistance isolates), which makes it easier to associate the second group with the important variants. Moreover, the CRyPTIC Consortium 73 also identified Rv1565c as having a role in mechanisms of resistance to RIF through a standard GWAS, and here we are able to identify one specific variant.

Several variants already present in the 2021 WHO Catalogue as associated with resistance to EMB have been found with our approach (including a few variants that were not found to be significant). Six more variants on genes embA and embB were identified: five of them were already reported in smaller studies, while embB_Q1002K was already found by Earle et al. 74 in a large study. Three previously unreported mutations, on Rv1565, pknJ, and Rv2000, were also found to be significantly associated with low levels of resistance. In particular, this last gene was already found to be generically associated with resistance in the Tugela Ferry isolate. 75

Among the other second-line drugs, the proposed approach identifies additional genes involved in the development of resistance to AMI, beyond rrs. In particular, Jain et al. 76 identified Rv3639c as highly up-regulated during the early stages of invasion for bacteria treated with AMI; Li 77 observed Rv3897c to be down-regulated in virulent H37Rv treated with AMI; Muzondiwa 78 identified Rv0823c_D156N as a compensatory mutation; Domenech et al. 79 suggested that Rv2242 might be a gene relevant to the host–pathogen dialogue; Bhargavi et al. 80 identified Rv2348c as involved in the interactome network; and mmpL10_K384T was found to be involved in resistance to KAN by the CRyPTIC Consortium. 73 KAN is no longer endorsed for TB treatment and does not appear in Table 5. Unlike the WHO Catalogue, our approach identifies not only mutations on fabG1 and inhA as associated with resistance to ETH, but also several variants on gene ethA, some previously unreported and some already suggested by smaller studies (on <100 isolates).

Most of the variants associated with resistance to LEV in the WHO Catalogue, on genes gyrA and gyrB, are also significant in the current study. Moreover, one variant previously reported in a smaller study was also identified, as well as one variant on ruvA, which Klopper et al. 81 reported as generically associated with resistance in a study with 211 isolates. Similarly, most of the variants associated with resistance to MXF are also found here, together with several previously unreported variants; in particular, Sharma et al. 82 reported secD as generically associated with resistance on a mutated strain of H37Rv through a proteomic approach, while Klopper et al. 81 reported Rv2923c as associated with resistance, even if its function was unclear.

Finally, three variants in rpoB were identified as associated with resistance to RFB; all of them were already reported in previous smaller studies.

6. Conclusion

This work has proposed a method to analyse distributions of MICs through mixture models and allocate strains to groups representing different levels of resistance to the antimicrobials, instead of using a binary classification defined via critical concentrations. The method presents several advantages.

First, the use of mixture models makes it possible to identify several levels of resistance and possibly associate each of them with different genomic variants: some variants can be associated with high levels of resistance, while others can be associated with intermediate levels. The ability to separate levels of resistance is important for identifying rare genomic variants.

Second, the method is defined in a Bayesian framework, which makes it possible to introduce assumptions on the phenomenon of resistance. In particular, Section 4 shows that an assumption of conservativeness in the number of groups in the mixture model increases the accuracy of the classification of susceptible and resistant strains. On the contrary, using a uniform prior distribution on the number of components or a nonparametric approach based on Dirichlet process priors leads to a large number of components and is more likely to split the susceptible group into subgroups, which may hide resistance mechanisms, in particular those associated with rare variants.

Finally, the method deals with the discrete nature of the recorded data, which are characterized by double censoring: both interval censoring (data are recorded at fixed levels of concentration) and boundary censoring (there are a maximum and a minimum concentration tested in the plate). Dealing with this double censoring reduces the bias in the estimation process noticed by Annis and Craig. 25
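As an illustration of how double censoring can enter a likelihood, the sketch below computes the probability that a Gaussian mixture on the log2(MIC) scale assigns to each recordable outcome of a plate: the interior dilution intervals and the two open-ended boundary intervals. This is a generic construction under stated assumptions, not the exact model of Section 3; the function names, the dilution edges, and the mixture parameters are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def censored_mixture_prob(lower, upper, weights, means, sds):
    """P(lower < Y <= upper) under a Gaussian mixture for Y = log2(MIC).

    Use lower = -math.inf for values at or below the lowest tested
    concentration and upper = math.inf for growth above the highest one.
    """
    total = 0.0
    for w, m, s in zip(weights, means, sds):
        hi = 1.0 if upper == math.inf else normal_cdf(upper, m, s)
        lo = 0.0 if lower == -math.inf else normal_cdf(lower, m, s)
        total += w * (hi - lo)
    return total


# Hypothetical plate: tested concentrations on the log2 scale.
edges = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0]
bounds = [-math.inf] + edges + [math.inf]
intervals = list(zip(bounds[:-1], bounds[1:]))

# Hypothetical two-component mixture (susceptible and resistant groups).
weights, means, sds = [0.8, 0.2], [-2.0, 1.5], [0.6, 0.8]
interval_probs = [
    censored_mixture_prob(a, b, weights, means, sds) for a, b in intervals
]
```

The likelihood contribution of an isolate is then the probability of the interval in which its MIC was recorded, rather than a density evaluated at the recorded value.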

The proposed approach is flexible and general, and can be applied automatically, reducing the need for experimental inputs. At this stage, antimicrobials are treated independently. However, since treatments for tuberculosis are usually defined as combinations of drugs given to the patient at the same time, tuberculosis is known to have developed high levels of multi-drug resistance. Generalizations to a multivariate version of the approach are the subject of current research; such a modification needs to take into account the complex structure of dependence among drugs: while some drugs are dependent because they have a similar chemical structure, other groups of drugs are dependent because they are often prescribed together and strains develop associated mechanisms of resistance. Therefore, it is reasonable to expect a non-linear structure of dependence.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802231211010 - Supplemental material for Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations

Supplemental material, sj-pdf-1-smm-10.1177_09622802231211010 for Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations by Clara Grazian in Statistical Methods in Medical Research

Footnotes

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship and/or publication of this article.

Supplemental material: Supplemental material for this article is available in another file.

References

  • 1. European Commission. A European one health action plan against antimicrobial resistance (AMR). Brussels, Belgium: European Commission, 2017.
  • 2. Gelband H, Molly Miller P, Pant S, et al. The state of the world’s antibiotics 2015. Wound Healing Southern Africa 2015; 8: 30–34.
  • 3. World Health Organization. Global framework for Development and stewardship to combat antimicrobial resistance? World Health Organization, Geneva, Switzerland, 2017.
  • 4. Kohanski MA, DePristo MA, Collins JJ. Sublethal antibiotic treatment leads to multidrug resistance via radical-induced mutagenesis. Mol Cell 2010; 37: 311–320.
  • 5. Tenover FC. Mechanisms of antimicrobial resistance in bacteria. Am J Infect Control 2006; 34: S3–S10.
  • 6. Zignol M, Hosseini MS, Wright A, et al. Global incidence of multidrug-resistant tuberculosis. J Infect Dis 2006; 194: 479–485.
  • 7. Gulshan K, Moye-Rowley WS. Multidrug resistance in fungi. Eukaryotic Cell 2007; 6: 1933–1942.
  • 8. Vandeputte P, Ferrari S, Coste AT. Antifungal resistance and new strategies to control fungal infections. Int J Microbiol 2012; 2012: 713687.
  • 9. Unemo M, Nicholas RA. Emergence of multidrug-resistant, extensively drug-resistant and untreatable gonorrhea. Future Microbiol 2012; 7: 1401–1422.
  • 10. Yim HJ, Hussain M, Liu Y, et al. Evolution of multi-drug resistant hepatitis B virus during sequential therapy. Hepatology 2006; 44: 703–712.
  • 11. Wiegand I, Hilpert K, Hancock RE. Agar and broth dilution methods to determine the minimal inhibitory concentration (MIC) of antimicrobial substances. Nat Protoc 2008; 3: 163.
  • 12. Turnidge J, Paterson DL. Setting and revising antibacterial susceptibility breakpoints. Clin Microbiol Rev 2007; 20: 391–408.
  • 13. Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 2005; 6: 95.
  • 14. Dheda K, Gumbo T, Maartens G, et al. The epidemiology, pathogenesis, transmission, diagnosis, and management of multidrug-resistant, extensively drug-resistant, and incurable tuberculosis. Lancet Resp Med 2017; 5: 291–360.
  • 15. Falzon D, Mirzayev F, Wares F, et al. Multidrug-resistant tuberculosis around the world: what progress has been made? Eur Respir J 2015; 45: 150–160.
  • 16. World Health Organization. Global tuberculosis report 2015. Geneva, Switzerland: World Health Organization, 2015.
  • 17. World Health Organization. Technical report on critical concentrations for drug susceptibility testing of medicines used in the treatment of drug-resistant tuberculosis. Technical report, World Health Organization, 2018.
  • 18. World Health Organization. Catalogue of mutations in Mycobacterium tuberculosis complex and their association with drug resistance. Technical report, World Health Organization, 2021.
  • 19. Turnidge J, Kahlmeter G, Kronvall G. Statistical characterisation of bacterial wild-type MIC value distributions and the determination of epidemiological cut-off values. Clin Microbiol Infect 2006; 12: 418–425.
  • 20. Jaspers S, Aerts M, Verbeke G, et al. Estimation of the wild-type minimum inhibitory concentration value distribution. Stat Med 2014; 33: 289–303.
  • 21. Jaspers S, Aerts M, Verbeke G, et al. A new semi-parametric mixture model for interval censored data, with applications in the field of antimicrobial resistance. Comput Stat Data Anal 2014; 71: 30–42.
  • 22. Jaspers S, Verbeke G, Böhning D, et al. Application of the vertex exchange method to estimate a semi-parametric mixture model for the MIC density of Escherichia coli isolates tested for susceptibility against ampicillin. Biostatistics 2015; 17: 94–107.
  • 23. Jaspers S, Lambert P, Aerts M. A Bayesian approach to the semiparametric estimation of a minimum inhibitory concentration distribution. Ann Appl Stat 2016; 10: 906–924.
  • 24. Craig BA. Modeling approach to diameter breakpoint determination. Diagn Microbiol Infect Dis 2000; 36: 193–202.
  • 25. Annis DH, Craig BA. Statistical properties and inference of the antimicrobial MIC test. Stat Med 2005; 24: 3631–3644.
  • 26. McLachlan GJ, Jones PN. Fitting mixture models to grouped and truncated data via the EM algorithm. Biometrics 1988; 44: 571–578.
  • 27. Cadez IV, Smyth P, McLachlan GJ, et al. Maximum likelihood estimation of mixture densities for binned and truncated multivariate data. Mach Learn 2002; 47: 7–34.
  • 28. Hamdan H, Wu J. EM algorithm of spherical models for binned data. In: 2011 IEEE international symposium on signal processing and information technology (ISSPIT), Bilbao, Spain, 2011, pp. 99–105.
  • 29. Kottas A, Müller P, Quintana F. Nonparametric Bayesian modeling for multivariate ordinal data. J Comput Graph Stat 2005; 14: 610–625.
  • 30. DeYoreo M, Kottas A. Bayesian nonparametric modeling for multivariate ordinal regression. J Comput Graph Stat 2018; 27: 71–84.
  • 31. Miller JW, Harrison MT. Inconsistency of Pitman-Yor process mixtures for the number of components. J Mach Learn Res 2014; 15: 3333–3370.
  • 32. Miller JW, Harrison MT. Mixture models with a prior on the number of components. J Am Stat Assoc 2018; 113: 340–356.
  • 33. Frühwirth-Schnatter S, Malsiner-Walli G, Grün B. Generalized mixtures of finite mixtures and telescoping sampling. arXiv preprint arXiv:2005.09918, 2021.
  • 34. Rancoita PMV, Cugnata F, Gibertoni Cruz AL, et al. Validating a 14-drug microtitre plate containing bedaquiline and delamanid for large-scale research susceptibility testing of Mycobacterium tuberculosis. Antimicrob Agents Chemother 2018; 62: e00344-18.
  • 35. Frühwirth-Schnatter S. Finite mixture and Markov switching models. Berlin, Germany: Springer Science & Business Media, 2006.
  • 36. Hjort NL, Holmes C, Müller P, et al. Bayesian Nonparametrics. 28. Cambridge, UK: Cambridge University Press, 2010.
  • 37. Diebolt J, Robert CP. Estimation of finite mixture distributions through Bayesian sampling. J R Stat Soc Ser B (Methodological) 1994; 56: 363–375.
  • 38. Albert JH, Chib S. Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 1993; 88: 669–679.
  • 39. Robert C. The Bayesian choice: from decision-theoretic foundations to computational implementation. Berlin, Germany: Springer Science & Business Media, 2007.
  • 40. Richardson S, Green PJ. On Bayesian analysis of mixtures with an unknown number of components (with discussion). J R Stat Soc: Ser B (Statistical Methodology) 1997; 59: 731–792.
  • 41. Rousseau J, Mengersen K. Asymptotic behaviour of the posterior distribution in overfitted mixture models. J R Stat Soc: Ser B (Statistical Methodology) 2011; 73: 689–710.
  • 42. Grazian C, Robert CP. Jeffreys priors for mixture estimation: properties and alternatives. Comput Stat Data Anal 2018; 121: 149–163.
  • 43. Grazian C, Villa C, Liseo B. On a loss-based prior for the number of components in mixture models. Stat Probab Lett 2020; 158: 108656.
  • 44. Jasra A, Holmes CC, Stephens DA. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Stat Sci 2005; 20: 50–67.
  • 45. Stephens M. Dealing with label switching in mixture models. J R Stat Soc: Ser B (Statistical Methodology) 2000; 62: 795–809.
  • 46. Celeux G. Bayesian inference for mixture: the label switching problem. In: Payne R and Green P (eds) Compstat. Heidelberg: Physica, 1998; pp. 227–232.
  • 47. Sperrin M, Jaki T, Wit E. Probabilistic relabelling strategies for the label switching problem in Bayesian mixture models. Stat Comput 2010; 20: 357–366.
  • 48. Petzoldt T. antibioticR-package: Analysis of Antibiotic Resistance Data. https://rdrr.io/github/tpetzoldt/antibioticR/man/antibioticR-package.html, 2019.
  • 49. Walker TM, Kohl TA, Omar SV, et al. Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect Dis 2015; 15: 1193–1202.
  • 50. Marees AT, de Kluiver H, Stringer S, et al. A tutorial on conducting genome-wide association studies: quality control and statistical analysis. Int J Methods Psychiatr Res 2018; 27: e1608.
  • 51. Uffelmann E, Huang QQ, Munung NS, et al. Genome-wide association studies. Nat Rev Method Primers 2021; 1: 59–00.
  • 52. Guo Q, Bi J, Lin Q, et al. Whole genome sequencing identifies novel mutations associated with bedaquiline resistance in Mycobacterium tuberculosis. Front Cell Infect Microbiol 2022; 12: 807095.
  • 53. Andres S, Merker M, Heyckendorf J, et al. Bedaquiline-resistant tuberculosis: dark clouds on the horizon. Am J Respir Crit Care Med 2020; 24(23), 201: 1564–1568.
  • 54. Gómez-González PJ, Perdigao J, Gomes P, et al. Genetic diversity of candidate loci linked to Mycobacterium tuberculosis resistance to bedaquiline, delamanid and pretomanid. Nat Sci Rep 2021; 11: 19431.
  • 55. Antonova AV, Gryadunov DA, Zimenkov DV. Molecular mechanisms of drug tolerance in Mycobacterium tuberculosis. Mol Biol (N.Y.) 2018; 52: 372–384.
  • 56. Almeida D, Ioerger T, Tyagi S, et al. Mutations in pepQ confer low-level resistance to bedaquiline and clofazimine in Mycobacterium tuberculosis. Antimicrob Agents Chemother 2016; 60: 4590–4599.
  • 57. Lee M, Lee J, Carroll MW, et al. Linezolid for treatment of chronic extensively drug-resistant tuberculosis. N Engl J Med 2012; 367: 1508–1518.
  • 58. Hillemann D, Rüsch-Gerdes S, Richter E. In vitro-selected linezolid-resistant Mycobacterium tuberculosis mutants. Antimicrob Agents Chemother 2008; 52: 800.
  • 59. Ismail NA, Omar SV, Joseph L, et al. Defining bedaquiline susceptibility, resistance, cross-resistance and associated genetic determinants: a retrospective cohort study. EBioMedicine 2018; 28: 136–142.
  • 60. Xu J, Wang B, Hu M, et al. Primary clofazimine and bedaquiline resistance among isolates from patients with multidrug-resistant tuberculosis. Antimicrob Agents Chemother 2017; 61: e00239.
  • 61. Zhang S, Chen J, Cui P, et al. Identification of novel mutations associated with clofazimine resistance in Mycobacterium tuberculosis. J Antimicrob Chemother 2015; 70: 2507–2510.
  • 62. Ismail N, Peters RP, Ismail NA, et al. Clofazimine exposure in vitro selects efflux pump mutants and bedaquiline resistance. Antimicrob Agents Chemother 2019; 63: e02141.
  • 63. Hartkoorn RC, Uplekar S, Cole ST. Cross-resistance between clofazimine and bedaquiline through upregulation of MmpL5 in Mycobacterium tuberculosis. Antimicrob Agents Chemother 2014; 58: 2979–2981.
  • 64.Mitarai S, Kato S, Ogata H, et al. Comprehensive multicenter evaluation of a new line probe assay kit for identification of Mycobacterium species and detection of drug-resistant Mycobacterium tuberculosis. J Clin Microbiol 2012; 50: 884–890. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Brossier F, Veziris N, Truffot-Pernot C, et al. Performance of the genotype MTBDR line probe assay for detection of resistance to rifampin and isoniazid in strains of Mycobacterium tuberculosis with low-and high-level resistance. J Clin Microbiol 2006; 44: 3659–3664. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Chen X, He G, Wang S, et al. Evaluation of whole-genome sequence method to diagnose resistance of 13 anti-tuberculosis drugs and characterize resistance genes in clinical multi-drug resistance Mycobacterium tuberculosis isolates from China. Front Microbiol 2019; 10: 1741. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Singh P, Jamal S, Ahmed F, et al. Computational modeling and bioinformatic analyses of functional mutations in drug target genes in Mycobacterium tuberculosis. Comput Struct Biotechnol J 2021; 19: 2423–2446. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Kruh NA, Troudt J, Izzo A, et al. Portrait of a pathogen: the Mycobacterium tuberculosis proteome in vivo. PLoS ONE 2010; 5: e13938. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Niemann S, Köser CU, Gagneux S, et al. Genomic diversity among drug sensitive and multidrug resistant isolates of Mycobacterium tuberculosis with identical DNA fingerprints. PLoS ONE 2009; 4: e7407. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Mortimer TD, Weber AM, Pepperell CS. Signatures of selection at drug resistance loci in Mycobacterium tuberculosis. mSystems 2018; 3: e00108. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Payros D, Alonso H, Malaga W, et al. Rv0180c contributes to Mycobacterium tuberculosis cell shape and to infectivity in mice and macrophages. PLoS Pathog 2021; 17: e1010020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Cui T, Zeng J, He ZG. Anti-tuberculosis drug target discovery by targeting the higher in-degree proteins (HidPs) of the pathogen’s transcriptional network. J Tuberc 2018; 1: 1001. [Google Scholar]
  • 73.The CRyPTIC Consortium. Genome-wide association studies of global Mycobacterium tuberculosis resistance to 13 antimicrobials in 10,228 genomes identify new resistance mechanisms. PLoS Biol 2022; 20: e3001755. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Earle SG, Wu CH, Charlesworth J, et al. Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nat Microbiol 2016; 1: 1–8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75.Motiwala AS, Dai Y, Jones-López EC, et al. Mutations in extensively drug-resistant Mycobacterium tuberculosis that do not code for known drug-resistance mechanisms. J Infect Dis 2010; 201: 881–888. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Jain SK, Paul-Satyaseela M, Lamichhane G, et al. Mycobacterium tuberculosis invasion and traversal across an in vitro human blood–brain barrier as a pathogenic mechanism for central nervous system tuberculosis. J Infect Dis 2006; 193: 1287–1295. [DOI] [PubMed] [Google Scholar]
  • 77.Li AHL. Identification of virulence determinants of Mycobacterium tuberculosis via genetic comparisons of a virulent and an attenuated strain of Mycobacterium tuberculosis. Doctoral dissertation, University of British Columbia, 2008.
  • 78.Muzondiwa D. Exploring the evolution of drug resistance in Mycobacterium using whole genome sequencing data. Doctoral dissertation, University of Pretoria, 2019.
  • 79.Domenech P, Reed MB, Barry III CE. Contribution of the Mycobacterium tuberculosis MmpL protein family to virulence and drug resistance. Infect Immun 2005; 73: 3492–3501. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Bhargavi G, Hassan S, Balaji S, et al. Protein–protein interaction of Rv0148 with Htdy and its predicted role towards drug resistance in Mycobacterium tuberculosis. BMC Microbiol 2020; 20: 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Klopper M, Heupink TH, Hill-Cawthorne G, et al. A landscape of genomic alterations at the root of a near-untreatable tuberculosis epidemic. BMC Med 2020; 18: 1–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Sharma D, Bisht D, Khan AU. Potential alternative strategy against drug resistant tuberculosis: a proteomics prospect. Proteomes 2018; 6: 26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 83.Perdigão J, Silva C, Maltez F, et al. Emergence of multidrug-resistant Mycobacterium tuberculosis of the Beijing lineage in Portugal and Guinea-Bissau: a snapshot of moving clones by whole-genome sequencing. Emerg Microbes Infect 2020; 9: 1342–1353. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Jouet A, Gaudin C, Badalato N, et al. Deep amplicon sequencing for culture-free prediction of susceptibility or resistance to 13 anti-tuberculous drugs. Eur Respir J 2021; 57: 2002338. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Phelan J. A bioinformatic analysis of Mycobacterium tuberculosis and host genomic data. Doctoral dissertation, London School of Hygiene & Tropical Medicine, 2018.
  • 86.Napier G, Khan AS, Jabbar A, et al. Characterisation of drug-resistant Mycobacterium tuberculosis mutations and transmission in Pakistan. Sci Rep 2022; 12: 7703. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.DeBarber AE, Mdluli K, Bosman M, et al. Ethionamide activation and sensitivity in multidrug-resistant Mycobacterium tuberculosis. Proc Natl Acad Sci USA 2000; 97: 9677–9682. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 88.Morlock GP, Metchock B, Sikes D, et al. ethA, inhA, and katG loci of ethionamide-resistant clinical Mycobacterium tuberculosis isolates. Antimicrob Agents Chemother 2003; 47: 3799–3805. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 89.Brossier F, Veziris N, Truffot-Pernot C, et al. Molecular investigation of resistance to the antituberculous drug ethionamide in multidrug-resistant clinical isolates of Mycobacterium tuberculosis. Antimicrob Agents Chemother 2011; 55: 355–360. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90.Machado D, Perdigão J, Ramos J, et al. High-level resistance to isoniazid and ethionamide in multidrug-resistant Mycobacterium tuberculosis of the Lisboa family is associated with inhA double mutations. J Antimicrob Chemother 2013; 68: 1728–1732. [DOI] [PubMed] [Google Scholar]
  • 91.Cardoso RF, Cooksey RC, Morlock GP, et al. Screening and characterization of mutations in isoniazid-resistant Mycobacterium tuberculosis isolates obtained in Brazil. Antimicrob Agents Chemother 2004; 48: 3373–3381. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92.Hazbón MH, Brimacombe M, Bobadilla del Valle M, et al. Population genetics study of isoniazid resistance mutations and evolution of multidrug-resistant Mycobacterium tuberculosis. Antimicrob Agents Chemother 2006; 50: 2640–2649. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93.Jeeves RE, Marriott AA, Pullan ST, et al. Mycobacterium tuberculosis is resistant to isoniazid at a slow growth rate by single nucleotide polymorphisms in katG codon Ser315. PLoS ONE 2015; 10: e0138253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 94.Hameed HA, Tan Y, Islam MM, et al. Phenotypic and genotypic characterization of levofloxacin-and moxifloxacin-resistant Mycobacterium tuberculosis clinical isolates in southern China. J Thorac Dis 2019; 11: 4613. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 95.Li J, Yang T, Hong C, et al. Whole-genome sequencing for resistance level prediction in multidrug-resistant tuberculosis. Microbiol Spectr 2022; 10: e02714. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 96.Farhat MR, Sixsmith J, Calderon R, et al. Rifampicin and rifabutin resistance in 1003 Mycobacterium tuberculosis clinical isolates. J Antimicrob Chemother 2019; 74: 1477–1483. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data


Supplementary Materials

sj-pdf-1-smm-10.1177_09622802231211010 - Supplemental material for Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations

Supplemental material, sj-pdf-1-smm-10.1177_09622802231211010 for Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations by Clara Grazian in Statistical Methods in Medical Research


Articles from Statistical Methods in Medical Research are provided here courtesy of SAGE Publications
