Multi-label classification for multi-drug resistance prediction of Escherichia coli

Yunxiao Ren; Trinad Chakraborty; Swapnil Doijad; Linda Falgenhauer; Jane Falgenhauer; Alexander Goesmann; Oliver Schwengers; Dominik Heider

Comput Struct Biotechnol J. 2022 Mar 10;20:1264–1270. doi: 10.1016/j.csbj.2022.03.007

PMCID: PMC8918850 PMID: 35317240

Keywords: Multi-drug resistance, Machine learning, Multi-label classification

Abbreviations: AMR, Antimicrobial Resistance; MDR, Multi-Drug Resistance; MLC, Multi-Label Classification

Highlights

• Multi-label classification (MLC) methods can be used to reliably predict multi-drug resistance in pathogens.
• The ECC model outperforms the other four MLC methods and can effectively predict MDR with high accuracy.
• Our study paves the way for improving the diagnostics of infections in patients.

Abstract

Antimicrobial resistance (AMR) is a global threat to health and development. In particular, multi-drug resistance (MDR) is increasingly common in pathogenic bacteria. It has become a serious public health problem, as MDR can lead to treatment failure. MDR typically results from mutations and the accumulation of multiple resistance genes within a single cell. Machine learning methods have a wide range of applications in AMR prediction. However, these approaches typically focus on predicting resistance to a single drug and do not incorporate information on antimicrobial resistance traits that accumulate over time. Thus, identifying multi-drug resistance rapidly and simultaneously remains an open challenge. In this study, we demonstrate that multi-label classification (MLC) methods can be used to model multi-drug resistance in pathogens. Importantly, we found that the ensemble of classifier chains (ECC) model achieves accurate MDR prediction and outperforms the other MLC methods. Our study thus extends the available tools for MDR prediction and paves the way for improving the diagnostics of infections in patients. Furthermore, the MLC methods introduced here could help reduce the threat of antimicrobial resistance and related deaths in the future by improving the speed and accuracy of pathogen and resistance identification.

1. Introduction

Antimicrobial resistance (AMR) is rapidly increasing and is therefore one of the greatest threats to global health, with significant economic consequences. According to WHO estimates, without countermeasures AMR could cause up to 10 million deaths per year by 2050, with cumulative economic costs of approximately US$100 trillion [1]. Infections caused by pathogens with multi-drug resistance (MDR) have become a particular threat to public health, as MDR can lead to treatment failure [2], [3]. For instance, the emergence of MDR in Escherichia coli (E. coli) has become a global health concern [4], [5], [6]. In general, bacteria become resistant to antibiotics through spontaneous mutations in existing genes or through the acquisition of foreign genes [6], [7]. Many previous AMR studies have focused on well-known resistance genes or on mutations in well-known genes, such as gyrA and parC in E. coli [8], [9]. However, few AMR studies have analyzed genome-wide mutations without relying on prior knowledge of resistance mechanisms.

While antimicrobial susceptibility testing (AST) is widely used to determine AMR profiles in clinical practice, machine learning models have been shown to produce highly reliable predictions with a shorter turnaround time. Typically, these models combine sequencing data with antibiotic resistance databases and phenotypic information [10], [11]. For instance, Yang et al. [12] and Kouchaki et al. [13] used different machine learning algorithms, namely support vector machines (SVM), logistic regression (LR), and random forests (RF), to predict AMR from whole-genome sequencing data and achieved high prediction accuracy. Other approaches have applied deep learning to predict new antibiotic drugs, AMR genes, and antimicrobial peptides [14], [15], [16], [17], [18], [19], [20]. However, all of these studies model resistance to each drug separately and do not take the MDR information of the bacteria into account.

Multi-label classification (MLC) offers a potential solution for AMR prediction based on MDR information. Traditionally, multi-label problems are transformed into single-label problems [21]. For instance, the widely known binary relevance (BR) approach is a simple and straightforward method that treats each label as an independent binary problem [22]. One limitation of BR is that it does not take dependencies between labels into account [23]. Unlike BR, the classifier chain (CC) takes correlations among labels into account by using the predictions of the previous classifiers as additional inputs for the following classifier [24]. Consequently, the order of labels in the chain can affect prediction accuracy. The ensemble of classifier chains (ECC) was therefore proposed; it combines several CCs with different label orders and can be applied to study the dependencies between labels [23], [24]. CCs and ECCs have been used for cross-resistance prediction in HIV based on protein sequences of the HIV-1 reverse transcriptase [25] and protease [26]; however, to our knowledge, they have not yet been applied to genomic data and MDR in bacteria. Other multi-label approaches include the label powerset (LP) method, which captures label dependencies by treating each label combination as a separate class [21]. The random label space partitioning with label powerset (RD) method is another effective ensemble method, which trains label powerset classifiers on random subsets of k labels [23], [24].

In this study, we applied MLC methods to multi-drug resistance prediction. We aimed to identify secondary mutations that contribute to resistance directly or indirectly, e.g., compensatory mutations; known resistance genes were therefore not included. Our approach requires no AMR expert knowledge and can predict resistance without knowledge of the resistance genes by exploiting secondary mutations. Our results show that the ECC model significantly improves overall resistance prediction in bacteria compared with the other four MLC methods. MLC models could improve patient care, in particular the treatment of patients, help reduce the threat of antimicrobial resistance and related deaths in the future, and improve the speed and accuracy of pathogen and resistance identification.

2. Materials and methods

2.1. Dataset

In our analysis, we used whole-genome sequencing (WGS) data from 987 E. coli isolates with resistance information for four antibiotics, namely ciprofloxacin (CIP), cefotaxime (CTX), ceftazidime (CTZ), and gentamicin (GEN). These data were collected by our partner institution, the University of Giessen. The isolates were obtained from human and animal clinical samples. Antimicrobial susceptibility testing was performed using the VITEK® 2 system (bioMérieux, Nürtingen, Germany) and interpreted following EUCAST guidelines. DNA isolation and whole-genome sequencing were performed as described in Falgenhauer et al. [27].

Because MLC requires a complete label vector for every isolate, isolates with missing antibiotic resistance information were excluded. The final dataset with complete MDR information contains 809 E. coli isolates (see Table 1). CIP is a fluoroquinolone that is widely used to treat infections with Gram-negative bacteria, e.g., gastroenteritis, respiratory tract infections, or urinary tract infections [28]. CTX and CTZ are broad-spectrum cephalosporins that are widely used to treat infections with Gram-positive and Gram-negative bacteria, such as meningitis, pneumonia, urinary tract infections, sepsis, and gonorrhea [29], [30]. GEN is an aminoglycoside that is widely used to treat various infections with Gram-negative bacteria, including meningitis, pneumonia, urinary tract infections, and sepsis [31].

Table 1.

Overview of the dataset.

Antibiotics	CIP	CTX	CTZ	GEN
Resistant	366	358	276	188
Susceptible	443	451	533	621
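The complete-case filtering step and the class balance in Table 1 can be made concrete with a minimal Python sketch. The record layout (one dictionary per isolate mapping each drug to 1 for resistant, 0 for susceptible, or None for not tested) is a hypothetical assumption for illustration, not the format used in the authors' repository; the counts are copied from Table 1.

```python
from typing import Dict, List, Optional

# Hypothetical per-isolate label record: 1 = resistant, 0 = susceptible, None = not tested.
Labels = Dict[str, Optional[int]]


def complete_cases(records: Dict[str, Labels], drugs: List[str]) -> Dict[str, Labels]:
    """Keep only isolates with a resistance call for every drug (MLC needs full label vectors)."""
    return {iso: lab for iso, lab in records.items() if all(lab.get(d) is not None for d in drugs)}


# Counts copied from Table 1 as (resistant, susceptible).
TABLE_1 = {"CIP": (366, 443), "CTX": (358, 451), "CTZ": (276, 533), "GEN": (188, 621)}


def resistance_rates(table: Dict[str, tuple], n_isolates: int) -> Dict[str, float]:
    """Fraction of resistant isolates per drug, rounded to three decimals."""
    rates = {}
    for drug, (resistant, susceptible) in table.items():
        # After filtering, every isolate has exactly one call per drug.
        assert resistant + susceptible == n_isolates, drug
        rates[drug] = round(resistant / n_isolates, 3)
    return rates


print(resistance_rates(TABLE_1, 809))
# {'CIP': 0.452, 'CTX': 0.443, 'CTZ': 0.341, 'GEN': 0.232}
```

Every column of Table 1 sums to 809, and GEN has the lowest proportion of resistant isolates (about 23%), making it the most imbalanced label.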

2.2. Dataset pre-processing and encoding

Pre-processing of the raw WGS data followed our previous study [20]. Briefly, we filtered out low-quality reads with fastp (v0.23.2) [32] and then mapped the clean reads to the E. coli reference genome (E. coli K-12 strain MG1655) using BWA-MEM with default parameters [33]. We called single nucleotide polymorphisms (SNPs) using the bcftools (v1.14) ‘call’ command with default parameters [34], [35]. We extracted the reference alleles, the variant alleles, and their positions, and merged all isolates based on the positions of the reference alleles. We retained only variant alleles that were present in more than half of the samples. This yielded an SNP matrix in which rows represent samples and columns represent variant alleles. Finally, we used one-hot encoding to transform the SNP matrix into a binary matrix for subsequent machine learning.
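The one-hot encoding step can be illustrated with a small Python sketch. It assumes, hypothetically, that each isolate is represented as a mapping from genomic position to the allele called at the retained sites; every observed (position, allele) pair then becomes one binary column. The positions and alleles below are made up, and the function name is illustrative rather than taken from the authors' code.

```python
from typing import Dict, List, Tuple


def one_hot_encode(calls: Dict[str, Dict[int, str]]) -> Tuple[List[str], List[Tuple[int, str]], List[List[int]]]:
    """Turn per-isolate allele calls into a binary sample x (position, allele) matrix."""
    samples = sorted(calls)
    # Every (position, allele) pair observed in any isolate becomes one binary feature.
    columns = sorted({(pos, allele) for site_calls in calls.values() for pos, allele in site_calls.items()})
    col_index = {col: j for j, col in enumerate(columns)}
    matrix = []
    for sample in samples:
        row = [0] * len(columns)
        for pos, allele in calls[sample].items():
            row[col_index[(pos, allele)]] = 1
        matrix.append(row)
    return samples, columns, matrix


# Hypothetical toy input: three isolates, two variable sites.
toy = {
    "iso1": {100: "A", 250: "G"},
    "iso2": {100: "T", 250: "G"},
    "iso3": {100: "A", 250: "C"},
}
samples, columns, X = one_hot_encode(toy)
print(columns)  # [(100, 'A'), (100, 'T'), (250, 'C'), (250, 'G')]
print(X)        # [[1, 0, 0, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
```

Encoding alleles rather than raw nucleotide codes keeps the features binary, which all three base classifiers (RF, LR, SVM) can consume directly.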

2.3. Multi-label classification

In the current study, we used BR, CC, ECC, LP, and RD for the multi-label classification of MDR in bacteria. BR is typically used as a baseline against which other multi-label classification models are compared. Let $L := \{\lambda_1, \ldots, \lambda_m\}$ with $m > 1$ be a finite set of class labels (here: resistance to the four antibiotics, so $m = 4$), and let $X$ be the instance space, i.e., the SNP profiles. The training set $S$ in MLC is then defined as $S := \{(x_1, y_1), \ldots, (x_n, y_n)\}$, with examples generated independently and identically according to a probability distribution $P(X, Y)$ on $X \times Y$, where $Y$ is the set of possible label combinations, i.e., the powerset of $L$ (Fig. 1A).
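The label space and the two powerset-based transformations can be sketched in Python as follows. With $m = 4$ antibiotics, $Y$ contains $2^4 = 16$ resistance profiles. The LP transform maps each distinct label vector seen in training to one class of a multi-class problem. The RD helper assumes disjoint partitions, in the spirit of disjoint random k-labelsets (RAkELd), which matches "partitions of size k" but is our interpretation. All function names and toy labels are hypothetical.

```python
import random
from itertools import product
from typing import Dict, List, Sequence, Tuple

LABELS = ("CIP", "CTX", "CTZ", "GEN")


def label_space(m: int) -> List[Tuple[int, ...]]:
    """All 2**m resistance profiles, i.e. the powerset Y of a label set of size m."""
    return list(product((0, 1), repeat=m))


def label_powerset(Y: Sequence[Sequence[int]]) -> Tuple[List[int], Dict[int, Tuple[int, ...]]]:
    """LP transform: each distinct label vector seen in training becomes one class id."""
    class_of: Dict[Tuple[int, ...], int] = {}
    encoded: List[int] = []
    for y in Y:
        key = tuple(y)
        if key not in class_of:
            class_of[key] = len(class_of)
        encoded.append(class_of[key])
    decode = {cid: key for key, cid in class_of.items()}
    return encoded, decode


def random_disjoint_partition(m: int, k: int, seed: int = 0) -> List[List[int]]:
    """Split label indices 0..m-1 into disjoint random subsets of size k (last may be smaller)."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i:i + k]) for i in range(0, m, k)]


print(len(label_space(len(LABELS))))  # 16
Y_train = [[1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
classes, decode = label_powerset(Y_train)
print(classes)    # [0, 1, 0, 2]
print(decode[1])  # (1, 1, 1, 0)
print(random_disjoint_partition(len(LABELS), k=2))
```

A plain LP classifier can only predict profiles present in the training data (three of the 16 here). RD reduces this effect by fitting one LP classifier per label subset and concatenating their outputs.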

BR divides a dataset with $m$ labels into $m$ binary classification problems (Fig. 1B). Accordingly, we split the data into four binary classification problems, one for each antibiotic (CIP, CTX, CTZ, and GEN). In contrast, the CC approach links the binary classifiers into a “chain” such that the prediction of each classifier is used as an additional input for all subsequent classifiers, which allows CC to capture possible dependencies between labels (Fig. 1C). Because the performance of CC can depend heavily on the order of the chain, Read et al. [23] proposed ECC, which aggregates several chains with different orders by majority vote (Fig. 1D). The LP approach transforms a multi-label problem into a single-label multi-class problem, in which each class corresponds to a unique label combination found in the training data [36] (Fig. 1E). The RD method divides the label space into partitions of size k, trains one LP classifier per partition, and predicts test data by aggregating the outputs of all LP classifiers (Fig. 1F). Any standard classifier can serve as the base learner in these multi-label approaches (binary for BR, CC, and ECC; multi-class for LP and RD). In the current study, we evaluated RF, LR, and SVM base classifiers for the multi-label classification of MDR in bacteria.
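The difference between BR, CC, and ECC is easiest to see in code. The following pure-Python sketch is not the authors' implementation: a 1-nearest-neighbour learner under Hamming distance (the class `NearestNeighbour`, hypothetical) stands in for RF, LR, or SVM so that the example runs without external libraries. CC is trained on the true values of earlier labels and fed its own predictions at test time. ECC takes a per-label majority vote over chains with random orders. The original ECC formulation also trains each chain on a resampled training set, a step omitted here for brevity.

```python
import random
from typing import List, Sequence


class NearestNeighbour:
    """Hypothetical stand-in base learner: 1-nearest neighbour under Hamming distance."""

    def fit(self, X: Sequence[Sequence[int]], y: Sequence[int]) -> "NearestNeighbour":
        self.X = [list(x) for x in X]
        self.y = list(y)
        return self

    def predict_one(self, x: Sequence[int]) -> int:
        dists = [sum(a != b for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]  # ties -> first training sample


def fit_br(X, Y):
    """Binary relevance: one independent classifier per label."""
    return [NearestNeighbour().fit(X, [y[j] for y in Y]) for j in range(len(Y[0]))]


def predict_br(models, x) -> List[int]:
    return [model.predict_one(x) for model in models]


def fit_cc(X, Y, order: List[int]):
    """Classifier chain: label order[i] is learned from x plus the true labels order[:i]."""
    models = []
    for i, j in enumerate(order):
        X_aug = [list(x) + [y[k] for k in order[:i]] for x, y in zip(X, Y)]
        models.append(NearestNeighbour().fit(X_aug, [y[j] for y in Y]))
    return order, models


def predict_cc(chain, x) -> List[int]:
    order, models = chain
    pred = [0] * len(order)
    previous: List[int] = []
    for j, model in zip(order, models):
        # Earlier *predicted* labels are fed forward, so early mistakes can propagate.
        p = model.predict_one(list(x) + previous)
        pred[j] = p
        previous.append(p)
    return pred


def fit_ecc(X, Y, n_chains: int = 10, seed: int = 0):
    """Ensemble of classifier chains, each with its own random label order."""
    rng = random.Random(seed)
    chains = []
    for _ in range(n_chains):
        order = list(range(len(Y[0])))
        rng.shuffle(order)
        chains.append(fit_cc(X, Y, order))
    return chains


def predict_ecc(chains, x) -> List[int]:
    votes = [predict_cc(chain, x) for chain in chains]
    # Per-label majority vote; a tie counts as susceptible (0).
    return [int(2 * sum(v[j] for v in votes) > len(votes)) for j in range(len(votes[0]))]


# Toy data: 3 binary SNP features, 2 labels (hypothetical values).
X = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
print(predict_br(fit_br(X, Y), [1, 1, 0]))                # [1, 1]
print(predict_ecc(fit_ecc(X, Y, n_chains=3), [0, 0, 1]))  # [0, 0]
```

Swapping `NearestNeighbour` for any object with `fit` and `predict_one` methods changes the base classifier without touching the chain logic, mirroring how the study pairs each MLC method with RF, LR, and SVM.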

2.4. Evaluation metrics

In MLC, the prediction for each instance is a set of labels, so classifier performance can be summarized either by averaging a metric over labels or by reporting it separately for each label. In this study, we used seven widely used metrics to evaluate the classifiers: Hamming loss, 0/1 loss, F-score, accuracy, precision, recall, and Jaccard similarity.

The Hamming loss and the 0/1 loss are commonly used to evaluate MLC models [37]. The Hamming loss is defined as the fraction of individual labels that are incorrectly predicted. The 0/1 loss checks whether the complete label set of a sample is predicted correctly and is reported as the fraction of samples whose label set is not predicted exactly.
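The two losses, together with exact-match (subset) accuracy, can be written in a few lines of Python. In every row of Tables 2–4, accuracy and 0/1 loss sum to 1 (e.g., 0.72 and 0.28 for ECC with RF), which suggests that the reported accuracy is subset accuracy. That reading is our inference, not a statement from the text. The toy label matrices are hypothetical.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]


def hamming_loss(Y_true: Matrix, Y_pred: Matrix) -> float:
    """Fraction of individual (sample, label) entries that are wrong."""
    n_labels = len(Y_true[0])
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / (len(Y_true) * n_labels)


def zero_one_loss(Y_true: Matrix, Y_pred: Matrix) -> float:
    """Fraction of samples whose label vector is not predicted exactly."""
    return sum(list(yt) != list(yp) for yt, yp in zip(Y_true, Y_pred)) / len(Y_true)


def subset_accuracy(Y_true: Matrix, Y_pred: Matrix) -> float:
    """Exact-match accuracy, the complement of the 0/1 loss."""
    return 1.0 - zero_one_loss(Y_true, Y_pred)


# Label order: CIP, CTX, CTZ, GEN (toy values, not from the study).
Y_true = [[1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 1]]
Y_pred = [[1, 1, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
print(hamming_loss(Y_true, Y_pred))   # 0.25
print(zero_one_loss(Y_true, Y_pred))  # 0.6666666666666666
```

The example shows why the 0/1 loss is much stricter: only 3 of 12 labels are wrong, yet 2 of 3 isolates count as errors.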

Accuracy is defined as the proportion of correct predictions, while precision is defined as the number of correctly predicted resistant samples divided by the total number of samples predicted to be resistant. Recall (also called sensitivity) is defined as the number of correctly predicted resistant samples divided by the total number of resistant samples. The F-score is the harmonic mean of precision and recall. The Jaccard similarity measures the overlap between the ground truth and the predictions, focusing on true positives and ignoring true negatives [38]. The classifiers were trained and evaluated using five-times repeated 5-fold cross-validation: the dataset is randomly divided into five equally sized subsets, one of which is used as the test set while the remaining four form the training set. The model is trained on the training set and scored on the test set, and the process is repeated until each subset has been used once as the test set. This entire procedure was repeated five times with different random splits. Statistical significance was assessed using the Wilcoxon signed-rank test and the t-test.
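The per-antibiotic metrics and the repeated cross-validation scheme described above can be sketched in Python as follows (resistant = 1). The splits are unstratified because the text does not mention stratification, and the function names and seed are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def binary_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float, float]:
    """Precision, recall, F1 and Jaccard for one antibiotic, with 1 = resistant."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0  # true negatives ignored
    return precision, recall, f1, jaccard


def repeated_kfold(n_samples: int, k: int = 5, repeats: int = 5,
                   seed: int = 0) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_idx, test_idx) for `repeats` independent shuffles into k folds."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # near-equal fold sizes
        for f in range(k):
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield r, train, sorted(folds[f])


# tp=2, fp=1, fn=1 -> precision = recall = F1 = 2/3, Jaccard = 2/4
print(binary_scores([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
print(sum(1 for _ in repeated_kfold(809)))  # 25 train/test splits
```

For real experiments, scikit-learn's `RepeatedKFold(n_splits=5, n_repeats=5)` in `sklearn.model_selection` provides the same repeated 5-fold scheme.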

3. Results

3.1. Performance of different MLC methods on RF base classifier

We first constructed five MLC models (BR, CC, ECC, LP, and RD) with RF base classifiers for MDR prediction across four antibiotics (CIP, CTX, CTZ, and GEN) and compared their performance using the F-score, precision, recall, and Jaccard score. As shown in Fig. 2, the ECC model achieved the highest F-score, precision, recall, and Jaccard score for resistance prediction against the four antibiotics. For instance, on the CIP dataset the ECC model reached an F-score, precision, recall, and Jaccard score of 0.93 ± 0.04, 0.94 ± 0.05, 0.98 ± 0.03, and 0.92 ± 0.06, respectively. In particular, based on the F-score, the ECC model significantly outperformed BR, CC, LP, and RD in predicting resistance against CIP, CTZ, and GEN. Moreover, the recall of the ECC model was significantly higher than that of the other models, indicating that ECC is more sensitive in detecting resistant samples. In addition, with RF the ECC model generally achieved the highest accuracy and the lowest Hamming loss and 0/1 loss (Table 2). Taken together, these results indicate that ECC models can significantly improve MDR prediction in E. coli.

Fig. 2 — Performance of different MLC methods with RF base classifiers for resistance prediction for each antibiotic. (A) F-scores, (B) Precision, (C) Recall, and (D) Jaccard score of five MLC methods with RF base classifiers for predicting resistance against each antibiotic. ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ns: not significant.

Table 2.

Accuracy, Hamming loss, and 0/1 loss of five MLC methods with the RF base classifier for predicting resistance against four antibiotics. Values are mean ± standard deviation, with the significance level in parentheses; each group was compared against the mean of all groups (base-mean). ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ns: not significant.

MLC	Accuracy	Hamming Loss	0/1 Loss
BR	0.51 ± 0.07 (ns)	0.20 ± 0.03 (ns)	0.49 ± 0.07 (ns)
CC	0.52 ± 0.07 (ns)	0.20 ± 0.04 (ns)	0.48 ± 0.06 (ns)
ECC	0.72 ± 0.13 (ns)	0.11 ± 0.05 (*)	0.28 ± 0.13 (ns)
LP	0.53 ± 0.08 (ns)	0.11 ± 0.05 (ns)	0.47 ± 0.08 (ns)
RD	0.51 ± 0.09 (ns)	0.21 ± 0.04 (ns)	0.49 ± 0.09 (ns)

3.2. Performance of different MLC methods on LR base classifier

We also compared the performance of the five MLC methods (BR, CC, ECC, LP, and RD) with the LR base classifier. The ECC model again achieved a higher F-score, precision, recall, and Jaccard score (Fig. 3), consistent with its performance with the RF base classifier. Based on the F-score, the ECC model was significantly better than the other models for CIP, CTZ, and GEN, reaching 0.94 ± 0.04, 0.80 ± 0.15, and 0.64 ± 0.13, respectively (p < 0.05). Recall showed a similar trend, with the ECC model achieving higher sensitivity for MDR prediction. Moreover, the ECC model significantly outperformed the other four MLC methods on CIP and GEN in terms of recall (0.98 ± 0.03 and 0.87 ± 0.23, p < 0.05) and Jaccard score (0.89 ± 0.07 and 0.48 ± 0.14, p < 0.05). The ECC model also achieved the highest accuracy and the lowest Hamming loss and 0/1 loss with the LR base classifier (Table 3). These results demonstrate that the ECC model remains robust for MDR prediction with a different base classifier.

Fig. 3 — Performance of different MLC methods with LR base classifiers for resistance prediction for each antibiotic. (A) F-scores, (B) Precision, (C) Recall, and (D) Jaccard score of five MLC methods with LR base classifiers for predicting resistance against each antibiotic. ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ns: not significant.

Table 3.

Accuracy, Hamming loss, and 0/1 loss of five MLC methods with the LR base classifier for predicting resistance against four antibiotics. Values are mean ± standard deviation, with the significance level in parentheses; each group was compared against the mean of all groups (base-mean). ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ns: not significant.

MLC	Accuracy	Hamming Loss	0/1 Loss
BR	0.45 ± 0.08 (ns)	0.24 ± 0.04 (ns)	0.55 ± 0.08 (ns)
CC	0.47 ± 0.08 (ns)	0.23 ± 0.04 (ns)	0.53 ± 0.08 (ns)
ECC	0.65 ± 0.11 (ns)	0.14 ± 0.05 (*)	0.35 ± 0.11 (ns)
LP	0.50 ± 0.08 (ns)	0.23 ± 0.04 (ns)	0.50 ± 0.08 (ns)
RD	0.47 ± 0.07 (ns)	0.24 ± 0.05 (ns)	0.53 ± 0.07 (ns)

3.3. Performance of different MLC methods on SVM base classifier

With SVM, the F-score of the ECC model was significantly better than that of BR, CC, LP, and RD only for CIP (Fig. 4A), with an F-score of 0.93 ± 0.04 for ECC versus 0.86 ± 0.03, 0.86 ± 0.03, 0.88 ± 0.03, and 0.87 ± 0.04 for BR, CC, LP, and RD, respectively. There were, however, no significant differences among the BR, CC, LP, and RD models. CC, LP, and RD did not significantly improve precision or recall and in some cases even performed worse than BR (Fig. 4B–C). For CC, this might be due to the known problem of error propagation [39]. The Jaccard score led to the same conclusion: the ECC model performed better than the other four MLC methods, with Jaccard scores ranging from 0.42 ± 0.18 for GEN to 0.88 ± 0.07 for CIP (Fig. 4D). Moreover, consistent with the RF results, the ECC model with the SVM base classifier achieved the highest accuracy and the lowest Hamming loss and 0/1 loss (Table 4). In summary, the results with the SVM classifier also indicate that ECC models can improve MDR prediction in E. coli.

Table 4.

Accuracy, Hamming loss, and 0/1 loss of five MLC methods with the SVM base classifier for predicting resistance against four antibiotics. Values are mean ± standard deviation, with the significance level in parentheses; each group was compared against the mean of all groups (base-mean). ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001, ns: not significant.

MLC	Accuracy	Hamming Loss	0/1 Loss
BR	0.37 ± 0.08 (ns)	0.28 ± 0.05 (ns)	0.63 ± 0.08 (ns)
CC	0.39 ± 0.08 (ns)	0.28 ± 0.05 (ns)	0.61 ± 0.08 (ns)
ECC	0.57 ± 0.12 (ns)	0.18 ± 0.07 (ns)	0.43 ± 0.12 (ns)
LP	0.47 ± 0.07 (ns)	0.24 ± 0.03 (ns)	0.53 ± 0.07 (ns)
RD	0.41 ± 0.09 (ns)	0.26 ± 0.05 (ns)	0.59 ± 0.09 (ns)

4. Discussion

In this study, we compared five MLC methods (BR, CC, ECC, LP, and RD) with three base classifiers (RF, LR, and SVM) for MDR prediction in E. coli and evaluated their performance with seven metrics. Our results show that the ECC model outperforms the other MLC methods and can effectively predict MDR.

The ECC multi-label classification model has a wide range of applications, e.g., for cancers, chronic diseases, and viruses. For instance, Zhou et al. [40] reported that ECC performed best in the diagnosis of four diabetic complications. ECCs have also been used for cross-resistance prediction in viral infections, e.g., in HIV-1 [25], [26]. Here, to our knowledge, we applied ECC models for the first time to multi-label drug resistance prediction based on genome-wide mutations, which could help improve MDR prediction in other model organisms or in poorly characterized organisms.

Our results also showed that ECC obtained the highest accuracy with all three base classifiers compared with the other four MLC methods, which suggests that the ECC approach is flexible and could be combined with other base classifiers, such as neural networks. Among these, the ECC model with the RF base classifier performed better than with LR or SVM, which is consistent with our previous results [20].

The performance of the five MLC methods differed between drugs. In general, all MLC methods performed well on CIP and worse on GEN. The comparatively lower performance for GEN may be explained by the fact that bacterial resistance to GEN is predominantly mediated by plasmids carrying resistance genes. Here, we focused solely on chromosomal sequences and did not take into account the effect of other genetic elements, such as plasmids, transposons, and integrons, on MDR [41], [42]. This is one limitation of our study. Another limitation is that our MLC models cover only four drugs; more antibiotics should be integrated in future work to further investigate MDR prediction.

5. Conclusions

In summary, our study evaluated five MLC methods with three base classifiers for MDR prediction and found that they achieve accurate predictions. Our results suggest that ECC is a promising MLC method for MDR identification, which could serve as a reference approach for clinical staff to improve diagnostics and patient treatment and thus help reduce the threat of antimicrobial resistance and related deaths in the future.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank de.NBI (German Network for Bioinformatics Infrastructure) for providing the cloud computing platform.

Funding

This work is financially supported by the German Federal Ministry of Education and Research (BMBF) under grant number 031L0209B (Deep-iAMR).

Authors’ contributions

D. H. conceived and supervised the study; Y. R. analyzed the data and drafted the manuscript; S. D., L. F., and J. F. collected the raw sequencing data and the clinical data. O. S. preprocessed the sequencing data and clinical data. D. H., T. C., and A. G. revised the manuscript, and all authors read and approved the final manuscript.

Data availability

Source code for data preparation and model training is available on GitHub at https://github.com/YunxiaoRen/Multi_Label-Classification.

The final SNP matrix datasets used for model training in this paper are also available at https://github.com/YunxiaoRen/Multi_Label-Classification.

References

1. Naylor N.R., Atun R., Zhu N., et al. Estimating the burden of antimicrobial resistance: a systematic literature review. Antimicrob Resist Infect Control. 2018;7:58. doi: 10.1186/s13756-018-0336-y.
2. Obolski U., Dellus-Gur E., Stein G.Y., et al. Antibiotic cross-resistance in the lab and resistance co-occurrence in the clinic: Discrepancies and implications in E. coli. Infect Genet Evol. 2016;40:155–161. doi: 10.1016/j.meegid.2016.02.017.
3. Vivas R., Barbosa A.A.T., Dolabela S.S., et al. Multidrug-resistant bacteria and alternative methods to control them: an overview. Microb Drug Resist. 2019;25:890–908. doi: 10.1089/mdr.2018.0319.
4. Tanwar J., Das S., Fatima Z., et al. Multidrug resistance: an emerging crisis. Interdisc Perspect Infect Dis. 2014;2014:1–7. doi: 10.1155/2014/541340.
5. Magiorakos A.-P., Srinivasan A., Carey R.B., et al. Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance. Clin Microbiol Infect. 2012;18:268–281. doi: 10.1111/j.1469-0691.2011.03570.x.
6. Nikaido H. Multidrug resistance in bacteria. Annu Rev Biochem. 2009;78:119–146. doi: 10.1146/annurev.biochem.78.082907.145923.
7. Ramadan H., Soliman A.M., Hiott L.M., et al. Emergence of multidrug-resistant Escherichia coli producing CTX-M, MCR-1, and FosA in retail food from Egypt. Front Cell Infect Microbiol. 2021;11. doi: 10.3389/fcimb.2021.681588.
8. Ramírez Castillo F.Y., Avelar González F.J., Garneau P., et al. Presence of multi-drug resistant pathogenic Escherichia coli in the San Pedro River located in the State of Aguascalientes, Mexico. Front Microbiol. 2013;4. doi: 10.3389/fmicb.2013.00147.
9. Cag Y., Caskurlu H., Fan Y., et al. Resistance mechanisms. Ann Transl Med. 2016;4:326.
10. Boolchandani M., D’Souza A.W., Dantas G. Sequencing-based methods and resources to study antimicrobial resistance. Nat Rev Genet. 2019;20:356–370. doi: 10.1038/s41576-019-0108-4.
11. Liu Z., Deng D., Lu H., et al. Evaluation of machine learning models for predicting antimicrobial resistance of Actinobacillus pleuropneumoniae from whole genome sequences. Front Microbiol. 2020;11:48. doi: 10.3389/fmicb.2020.00048.
12. Yang Y., Niehaus K.E., Walker T.M., et al. Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data. Bioinformatics. 2018;34:1666–1671. doi: 10.1093/bioinformatics/btx801.
13. Kouchaki S., Yang Y., Walker T.M., et al. Application of machine learning techniques to tuberculosis drug resistance analysis. Bioinformatics. 2019;35:2276–2282. doi: 10.1093/bioinformatics/bty949.
14. Radha M., Fonseca P., Moreau A., et al. A deep transfer learning approach for wearable sleep stage classification with photoplethysmography. NPJ Digit Med. 2021;4:135. doi: 10.1038/s41746-021-00510-8.
15. Arango-Argoty G.A., Garner E., Pruden A., et al. DeepARG: A deep learning approach for predicting antibiotic resistance genes from metagenomic data. 2017.
16. Veltri D., Kamath U., Shehu A. Deep learning improves antimicrobial peptide recognition. Bioinformatics. 2018;34:2740–2747. doi: 10.1093/bioinformatics/bty179.
17. Her H.-L., Wu Y.-W. A pan-genome-based machine learning approach for predicting antimicrobial resistance activities of the Escherichia coli strains. Bioinformatics. 2018;34:i89–i95. doi: 10.1093/bioinformatics/bty276.
18. Kavvas E.S., Catoiu E., Mih N., et al. Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance. Nat Commun. 2018;9:4306. doi: 10.1038/s41467-018-06634-y.
19. Khaledi A., Weimann A., Schniederjans M., et al. Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics. EMBO Mol Med. 2020. doi: 10.15252/emmm.201910264.
20. Ren Y., Chakraborty T., Doijad S., et al. Prediction of antimicrobial resistance based on whole-genome sequencing and machine learning. Bioinformatics. 2021:btab681. doi: 10.1093/bioinformatics/btab681.
21. Tsoumakas G., Katakis I., Vlahavas I. Mining Multi-label Data. Data Mining and Knowledge Discovery Handbook. 2009:667–685.
22. Rokach L., Schclar A., Itach E. Ensemble methods for multi-label classification. Expert Syst Appl. 2014;41:7507–7523.
23. Read J., Pfahringer B., Holmes G., et al. Classifier chains: A review and perspectives. JAIR. 2021;70:683–718.
24. Read J., Pfahringer B., Holmes G., et al. Classifier chains for multi-label classification. 2011;16.
25. Heider D., Senge R., Cheng W., et al. Multilabel classification for exploiting cross-resistance information in HIV-1 drug resistance prediction. Bioinformatics. 2013;29:1946–1952. doi: 10.1093/bioinformatics/btt331.
26. Riemenschneider M., Senge R., Neumann U., et al. Exploiting HIV-1 protease and reverse transcriptase cross-resistance information for improved drug resistance prediction by means of multi-label classification. BioData Mining. 2016;9:10. doi: 10.1186/s13040-016-0089-1.
27. Falgenhauer L., Nordmann P., Imirzalioglu C., et al. Cross-border emergence of clonal lineages of ST38 Escherichia coli producing the OXA-48-like carbapenemase OXA-244 in Germany and Switzerland. Int J Antimicrob Agents. 2020;56. doi: 10.1016/j.ijantimicag.2020.106157.
28. Heeb S., Fletcher M.P., Chhabra S.R., et al. Quinolones: from antibiotics to autoinducers. FEMS Microbiol Rev. 2011;35:247–274. doi: 10.1111/j.1574-6976.2010.00247.x.
29. Sharma M. Prevalence and antibiogram of Extended Spectrum β-Lactamase (ESBL) producing Gram negative bacilli and further molecular characterization of ESBL producing Escherichia coli and Klebsiella spp. JCDR. 2013. doi: 10.7860/JCDR/2013/6460.3462.
30. Gums J.G., Boatwright D.W., Camblin M., et al. Differences between ceftriaxone and cefotaxime: microbiological inconsistencies. Ann Pharmacother. 2008;42:71–79. doi: 10.1345/aph.1H620.
31. Garneau-Tsodikova S., Labby K.J. Mechanisms of resistance to aminoglycoside antibiotics: overview and perspectives. Medchemcomm. 2016;7:11–27. doi: 10.1039/C5MD00344J.
32. Chen S., Zhou Y., Chen Y., et al. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
33. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
34. Danecek P., Bonfield J.K., Liddle J., et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10:giab008. doi: 10.1093/gigascience/giab008.
35. Li H., Handsaker B., Wysoker A., et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
36. Junior J.D.C., Faria E.R., Silva J.A., et al. Label powerset for multi-label data streams. Classification with Concept Drift. 2017;9.
37. Dembczyński K., Waegeman W., Cheng W., et al. Regret analysis for performance metrics in multi-label classification: the case of hamming and subset zero-one loss. Mach Learn Knowl Disc Datab. 2010;6321:280–295.
38. Shikalgar N.R. JIBCA: Jaccard index based clustering algorithm for mining online review. Int J Comput Appl. 105:6.
39. Senge R., del Coz J.J., Hüllermeier E. On the problem of error propagation in classifier chains for multi-label classification. Data Anal Mach Learn Knowl Discov. 2014:163–170.
40. Zhou H., Beltrán J.F., Brito I.L. Functions predict horizontal gene transfer and the emergence of antibiotic resistance. Sci Adv. 2021;7:eabj5056. doi: 10.1126/sciadv.abj5056.
41. Alekshun M.N., Levy S.B. Molecular mechanisms of antibacterial multidrug resistance. Cell. 2007;128:1037–1050. doi: 10.1016/j.cell.2007.03.004.
42. Karczmarczyk M., Abbott Y., Walsh C., et al. Characterization of multidrug-resistant Escherichia coli isolates from animals presenting at a university veterinary hospital. Appl Environ Microbiol. 2011;77:7104–7112. doi: 10.1128/AEM.00599-11.

Data Availability Statement

Source code for data preparation and model training is available on GitHub at https://github.com/YunxiaoRen/Multi_Label-Classification.

The final SNP matrix datasets used for model training in this paper are available in the same repository, https://github.com/YunxiaoRen/Multi_Label-Classification.
