Scientific Reports. 2023 May 6;13:7396. doi: 10.1038/s41598-023-34438-8

A novel microbe-drug association prediction model based on stacked autoencoder with multi-head attention mechanism

Liu Fan 1,2, Lei Wang 2,3, Xianyou Zhu 1
PMCID: PMC10164153  PMID: 37149692

Abstract

Microbes are intimately tied to the occurrence of various diseases that cause serious hazards to human health, and play an essential role in drug discovery, clinical application, and drug quality control. In this manuscript, we put forward a novel prediction model named MDASAE based on a stacked autoencoder (SAE) with multi-head attention mechanism to infer potential microbe-drug associations. In MDASAE, we first constructed three kinds of microbe-related and drug-related similarity matrices based on known microbe-disease-drug associations respectively. And then, we fed two kinds of microbe-related and drug-related similarity matrices respectively into the SAE to learn node attribute features, and introduced a multi-head attention mechanism into the output layer of the SAE to enhance feature extraction. Thereafter, we further adopted the remaining microbe and drug similarity matrices to derive inter-node features by using the Restart Random Walk algorithm. After that, the node attribute features and inter-node features of microbes and drugs would be fused together to predict scores of possible associations between microbes and drugs. Finally, intensive comparison experiments and case studies based on different well-known public databases under 5-fold cross-validation and 10-fold cross-validation respectively, proved that MDASAE can effectively predict the potential microbe-drug associations.

Subject terms: Computational biology and bioinformatics, Genetics, Mathematics and computing

Introduction

Microbial communities are mainly composed of bacteria, viruses, and other microorganisms1. They normally survive on various human tissues, provide protection against pathogens, help maintain the homeostasis of the body's internal environment, and regulate the gastrointestinal tract to promote the body's metabolic capacity2,3. Conversely, ecological dysbiosis, or imbalance of microbes, may lead to disease in the human host. For example, there are several pathways through which an imbalance of intestinal bacteria increases the risk of hypertension4. Microbes are therefore important to human health, and many microbes present in the human body can regulate host physiology and disease development5,6.

In recent years, as the variety of drugs investigated by the medical field increases, the resistance of microbes is becoming more and more intense7. Previous research in the pharmaceutical industry has involved culturing some microbe species under greenhouse conditions and subsequently using them in drugs8. However, this process is usually time-consuming and costly. Researchers therefore urgently need new computational methods to discover potential relationships between microbes and drugs, which would contribute to drug development assays and human disease diagnosis.

Until now, researchers have constructed a series of microbe-drug association databases in order to calculate potential links between microbes and drugs. For instance, Sun et al.9 established MDAD, a database consisting of 5505 associations between 180 microbes and 1388 drugs. Rajput et al.10 compiled a database called aBiofilm, which records the resistance of microbes to drugs and contains biological, chemical, and structural details of 5027 anti-biofilm agents. Moreover, Andersen et al. built a dataset called DrugVirus, which includes 1281 associations between 118 compound drugs and 83 human viruses11. Based on the above databases, many computational models have been proposed in the past few years to infer potential microbe-drug associations. For example, Zhu et al. designed a computational method, HMDAKATZ, based on the KATZ measure to predict latent microbe-drug associations12. Long et al.13 proposed a method called EGATMDA to predict associations between microbes and drugs by using a graph convolutional network with node-level attention to learn embeddings of nodes and graph-level attention to learn the importance of different graphs. In 2020, Long et al. introduced a method named GCNMDA, based on a Graph Convolutional Network and a Conditional Random Field with an attention mechanism, to detect latent microbe-drug associations14. Deng et al.15 devised a method called Graph2MDA in 2021, which predicts potential associations of microbes with drugs by constructing a multimodal property graph as input to a variational graph autoencoder to learn information about each node and the whole graph. In 2022, Tan et al. constructed a model named GSAMDA based on a graph attention network and a sparse autoencoder to compute microbe-drug correlations16.
In 2023, Ma applied two heterogeneous microbe-drug networks as inputs of a graph attention network to learn feature representations of microbes and drugs, and then used a convolutional neural network classifier to obtain potential microbe-drug associations17. Predicting associations between biological entities is one of the fundamental tasks in bioinformatics. Researchers have achieved excellent performance not only in microbe-drug association prediction; there are also many notable methods in areas such as microbe-disease association prediction, circRNA-disease association prediction, and small molecule-miRNA interaction prediction. For example, in 2022, Chen proposed a method called MATHNMDA18, which is based on a heterogeneous network and a metapath aggregated graph neural network to predict microbe-disease associations. Peng et al. designed a model called GATCL2CD19, which builds a heterogeneous network by computing multiple similarities between circRNAs and diseases and applies feature convolution learning with a heterogeneous graph attention network to predict circRNA-disease associations. In addition, Peng et al. used a deep autoencoder to obtain latent feature representations of each small molecule-miRNA pair, together with a scalable tree boosting model to predict potential associations between them20.

Although the above models perform reliably in some respects, they still have certain limitations. With the rapid development of deep learning in the last few years, numerous techniques have become available for extracting data features. In this paper, we present a new approach named MDASAE to infer potential microbe-drug associations based on a stacked autoencoder (SAE) with a multi-head attention mechanism. In MDASAE, we first adopt the random walk with restart (RWR) algorithm to learn inter-node features for microbes and drugs respectively based on the Gaussian kernel similarity. We then apply the SAE with multi-head attention to extract node attribute features for microbes and drugs based on multiple similarity metrics in an unsupervised manner. Ultimately, we fuse these microbe- and drug-related features to estimate association scores of different microbe-drug pairs. Besides, to evaluate the prediction performance of MDASAE, intensive comparison experiments are conducted on two well-known public databases, and the experimental results show that MDASAE outperforms representative competitive methods, which indicates that applying a multi-head attention mechanism to the stacked autoencoder is practical and effective for predicting latent microbe-drug associations.

Materials and methods

As shown in Fig. 1, MDASAE comprises three main components. The first part is the preparation of data sources, including downloading known drug-microbe associations, microbe-disease associations, and drug-disease associations from well-known public databases. The second part is the construction of multi-view similarity matrices for microbes and drugs based on different similarity measures. Some of these matrices are used as inputs to an SAE with multi-head attention to learn node attribute features for microbes and drugs separately, while the remaining ones are used to learn inter-node features for microbes and drugs respectively by adopting RWR. In the third part, the newly obtained node attribute features and inter-node features of microbes and drugs are integrated to obtain the final predicted scores of microbe-drug associations.

Figure 1. Flowchart of MDASAE.

Data sources

In this section, we first download known microbe-drug associations from MDAD (http://www.chengroup.cumt.edu.cn/MDAD/), which contains 5505 known microbe-drug associations between 1388 drugs and 180 microorganisms collected from 993 papers. After applying the de-duplication operations proposed by Wang et al.21, we obtain 2470 known microbe-drug associations between 173 microbes and 1373 drugs. For validation, we further download known microbe-drug associations from aBiofilm (http://bioinfo.imtech.res.in/manojk/abiofilm/) and, after removing duplicates, obtain 2884 associations between 1720 drugs and 140 microbes. The statistics of the datasets downloaded from these two public databases are shown in Table 1. For convenience, we provide the downloaded datasets of diseases, drugs, microbes, drug-disease associations, drug-drug interactions, microbe-drug associations, microbe-disease associations and microbe-microbe interactions in Supplementary Information 1–8, respectively.

Table 1. Statistics of two well-known public databases including MDAD and aBiofilm.

| Datasets | Microbes | Drugs | Associations |
|---|---|---|---|
| MDAD | 173 | 1373 | 2470 |
| aBiofilm | 140 | 1720 | 2884 |

Methods

Based on the newly downloaded microbe-drug association data, let $N_r$ and $N_m$ denote the numbers of different drugs and microbes in the downloaded datasets. We can then build a microbe-drug adjacency matrix $S \in \mathbb{R}^{N_r \times N_m}$ as follows: if and only if there is a known association between a given drug $r_i$ and microbe $m_j$, we define $S(i,j)=1$; otherwise, we define $S(i,j)=0$.
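The definition of S can be illustrated with a minimal sketch. The function name `build_adjacency` and the toy drug and microbe identifiers are hypothetical and are not taken from MDAD or aBiofilm; rows index drugs and columns index microbes, as in the definition above.

```python
import numpy as np

def build_adjacency(pairs, drugs, microbes):
    """Return S with S[i, j] = 1 iff (drugs[i], microbes[j]) is a known association."""
    drug_index = {d: i for i, d in enumerate(drugs)}
    microbe_index = {m: j for j, m in enumerate(microbes)}
    S = np.zeros((len(drugs), len(microbes)), dtype=float)
    for drug, microbe in pairs:
        S[drug_index[drug], microbe_index[microbe]] = 1.0
    return S

# Toy data (hypothetical identifiers, for illustration only)
drugs = ["r1", "r2", "r3"]
microbes = ["m1", "m2"]
pairs = [("r1", "m1"), ("r2", "m1"), ("r2", "m2")]
S = build_adjacency(pairs, drugs, microbes)
```

Here `S` has shape (3, 2), and the row for `r3` is all zeros because no association is listed for it.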

Construction of the drug similarity network

Firstly, for any two given drugs $r_i$ and $r_j$, let $R_s(i)$ and $R_s(j)$ represent the i-th row and j-th row of $S$ respectively. The Gaussian kernel similarity $G_r(r_i, r_j)$ between $r_i$ and $r_j$ is then calculated as follows:

$$G_r(r_i, r_j) = \exp\left(-\mu \left\| R_s(i) - R_s(j) \right\|^2\right) \tag{1}$$

Here, $\mu$ denotes the normalized kernel bandwidth, which can be calculated as follows:

$$\mu = 1 \Big/ \left( \frac{1}{N_r} \sum_{i=1}^{N_r} \left\| R_s(i) \right\|^2 \right) \tag{2}$$
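As a rough illustration of Eqs. (1) and (2), the following NumPy sketch computes the Gaussian kernel similarity over the rows of a profile matrix. The function name `gaussian_kernel_similarity` and the toy matrix are assumptions made for this example; passing the transpose of S applies the same computation to microbe profiles.

```python
import numpy as np

def gaussian_kernel_similarity(profiles):
    """Gaussian kernel similarity between the rows of `profiles` (Eqs. 1-2)."""
    profiles = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(profiles ** 2, axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("all profiles are zero, so the bandwidth is undefined")
    mu = 1.0 / mean_sq_norm                      # Eq. (2)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, clipped against round-off
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dist = np.maximum(sq_dist, 0.0)
    return np.exp(-mu * sq_dist)                 # Eq. (1)

# Toy adjacency matrix (3 drugs x 2 microbes), for illustration only
S = np.array([[1, 0], [1, 1], [0, 0]], dtype=float)
Gr = gaussian_kernel_similarity(S)      # drug-drug similarity from rows
Gm = gaussian_kernel_similarity(S.T)    # microbe-microbe similarity from columns
```

The resulting matrices are symmetric with ones on the diagonal, since every profile has zero distance to itself.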

Next, the drug Cosine similarity $C_r(r_i, r_j)$ between $r_i$ and $r_j$ is obtained as follows:

$$C_r(r_i, r_j) = \frac{R_s(i) \cdot R_s(j)}{\left\| R_s(i) \right\| \times \left\| R_s(j) \right\|} \tag{3}$$

Finally, based on the chemical structural information of drugs, for any two given drugs $r_i$ and $r_j$, we estimate the structural similarity $S_r(r_i, r_j)$ between them by adopting the SIMCOMP2 method proposed by Hattori et al.22.

Construction of the microbe similarity network

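In a similar way, for any given microbes $m_i$ and $m_j$, let $C_s(i)$ and $C_s(j)$ represent the i-th column and j-th column of $S$ respectively. We first calculate the Gaussian kernel similarity $G_m(m_i, m_j)$ between $m_i$ and $m_j$ as follows:

$$G_m(m_i, m_j) = \exp\left(-\gamma \left\| C_s(i) - C_s(j) \right\|^2\right) \tag{4}$$

$$\gamma = 1 \Big/ \left( \frac{1}{N_m} \sum_{i=1}^{N_m} \left\| C_s(i) \right\|^2 \right) \tag{5}$$

We then calculate the Cosine similarity $C_m(m_i, m_j)$ between $m_i$ and $m_j$ as follows:

$$C_m(m_i, m_j) = \frac{C_s(i) \cdot C_s(j)}{\left\| C_s(i) \right\| \times \left\| C_s(j) \right\|} \tag{6}$$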
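The Cosine similarities of Eqs. (3) and (6) can be sketched in the same style. The function name `cosine_similarity` is an assumption for this example, and so is the convention that any pair involving an all-zero profile receives similarity 0; the paper does not state how such profiles are handled.

```python
import numpy as np

def cosine_similarity(profiles):
    """Cosine similarity between the rows of `profiles` (Eqs. 3 and 6)."""
    profiles = np.asarray(profiles, dtype=float)
    norms = np.linalg.norm(profiles, axis=1)
    denom = np.outer(norms, norms)
    dots = profiles @ profiles.T
    # Assumption: pairs involving an all-zero profile get similarity 0
    sim = np.zeros_like(dots)
    np.divide(dots, denom, out=sim, where=denom > 0)
    return sim

# Toy adjacency matrix (3 drugs x 2 microbes), for illustration only
S = np.array([[1, 0], [1, 1], [0, 0]], dtype=float)
Cr = cosine_similarity(S)      # drugs: rows of S
Cm = cosine_similarity(S.T)    # microbes: columns of S
```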

Moreover, we obtain the functional similarity $F_m(m_i, m_j)$ between $m_i$ and $m_j$ based on the method proposed by Kamneva et al.23.

Feature extraction for drugs and microbes based on RWR

Because of the imbalance between positive and negative samples, the newly obtained microbe and drug Gaussian kernel similarities may contain noise. Inspired by NTSHMDA24, in this section we apply the Random Walk with Restart (RWR) algorithm to derive inter-node features for microbes and drugs respectively based on the Gaussian kernel similarity, since RWR has been effectively utilized in miRNA-disease association prediction25, lncRNA-disease association prediction26, and target gene identification27. Here, the RWR adopted in MDASAE is defined as follows28:

$$q_i^{l+1} = \lambda M q_i^{l} + (1-\lambda) e_i \tag{7}$$

where $\lambda$ is the restart probability, which is set to 0.1 in MDASAE, and $M$ is the transition probability matrix. Besides, $e_i \in \mathbb{R}^{1 \times m}$ is the initial probability vector of node $i$ in the microbe or drug Gaussian kernel similarity network, which is defined as follows:

$$e_{i,j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

Based on above methods, it is easy to see that we can obtain a drug probability profile matrix DR and a microbe probability profile matrix MR eventually.
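A minimal sketch of the iteration in Eq. (7), run for all start nodes at once, is given below. Several details are assumptions rather than statements from the paper: the transition matrix M is taken to be the column-normalized similarity matrix, the iteration stops on a simple tolerance, and row i of the returned matrix is treated as the profile of node i. The code follows Eq. (7) literally, so `lam` multiplies the walk term.

```python
import numpy as np

def rwr_profiles(similarity, lam=0.1, tol=1e-10, max_iter=1000):
    """Iterate q <- lam * M q + (1 - lam) * e_i for every start node i."""
    W = np.asarray(similarity, dtype=float)
    # Assumption: M is the column-normalized similarity (column sums must be positive)
    M = W / W.sum(axis=0, keepdims=True)
    E = np.eye(W.shape[0])          # column i is the indicator vector e_i of Eq. (8)
    Q = E.copy()
    for _ in range(max_iter):
        Q_next = lam * M @ Q + (1.0 - lam) * E
        converged = np.abs(Q_next - Q).max() < tol
        Q = Q_next
        if converged:
            break
    return Q.T                      # row i holds the probability profile of node i

# Toy similarity matrix (hypothetical values)
G = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
DR = rwr_profiles(G, lam=0.1)
```

Because M is column-stochastic and each $e_i$ sums to one, every profile sums to one, and the fixed point agrees with the closed form $(1-\lambda)(I-\lambda M)^{-1}$.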

Learning attribute features for microbes and drugs based on SAE

A stacked autoencoder (SAE) is composed of several stacked autoencoders (AEs), each of which mainly consists of three layers: the input layer, the hidden layer, and the output layer. In order to learn the attribute features of nodes, the SAE is first pre-trained in an unsupervised manner, and then a supervised method is employed to fine-tune its parameters. Specifically, the SAE learns the most important attributes of the input data and reconstructs the input data in the output layer through encoding and decoding. Generally, mapping the input data to the hidden layer through a nonlinear activation function is called encoding, and mapping the hidden layer to the output layer is called decoding. In this section, in order to extract the attribute features of drugs and microbes more efficiently, we adopt $C_r$ and $S_r$ separately as inputs of the SAE to learn the attribute features of drugs, and $C_m$ and $F_m$ separately as inputs of the SAE to learn the attribute features of microbes.

For convenience, let X denote the input of SAE, then the output of the hidden layer in the encoding process of SAE can be defined as follows:

$$Y = \sigma(WX + b) \tag{9}$$

where σ, W and b represent the activation function, the weight matrix and the bias of the hidden layer in the encoding phase of SAE respectively.

Additionally, in the decoding process, the input $X$ is reconstructed as $\hat{X}$ according to the following formula:

$$\hat{X} = \sigma(W'Y + b') \tag{10}$$

where $W'$ and $b'$ denote the weight matrix and the bias of the hidden layer in the decoding phase of SAE, respectively.

Considering that the input X may not be fully reconstructed in the decoding phase, we further add a multi-head attention mechanism in SAE to capture critical features and improve the efficiency and accuracy of the attribute feature extraction process.
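The paper does not spell out how the attention layer is wired into the SAE, so the following is only a generic sketch of standard multi-head scaled dot-product self-attention applied to a feature matrix. The function names, the random projection matrices `Wq`, `Wk`, `Wv`, `Wo`, and the toy dimensions are all assumptions for illustration and are not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Generic multi-head self-attention over the rows of X (n x d_model)."""
    n, d_model = X.shape
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split(Z):                               # (n, d_model) -> (heads, n, d_head)
        return Z.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)    # (heads, n, n)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                     # (heads, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)   # concatenate the heads
    return concat @ Wo, weights

rng = np.random.default_rng(0)
n, d_model, h = 5, 8, 4                          # h = 4 heads, toy sizes
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv, Wo = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_self_attention(X, Wq, Wk, Wv, Wo, h)
```

Each head attends over the same nodes in a different learned subspace; in a trained model the projection matrices would be learned jointly with the autoencoder weights rather than sampled at random.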

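Finally, we train the SAE with the Adam optimizer, which is generally more efficient than the traditional SGD optimizer. The parameters are optimized by minimizing the following reconstruction loss:

$$L(X, \hat{X}) = \left\| X - \hat{X} \right\|^2 = \left\| X - \sigma\left(W'\sigma(WX + b) + b'\right) \right\|^2 \tag{11}$$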
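The encoding, decoding, and loss computations above can be sketched for a single autoencoder layer as follows. This is a forward pass only: the sigmoid activation, the column-per-sample layout, the random initial weights, and the names `W_dec` and `b_dec` (standing in for the decoder weight and bias) are assumptions for illustration, and the Adam update itself is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(X, W, b):
    """Hidden representation Y = sigma(W X + b)."""
    return sigmoid(W @ X + b)

def decode(Y, W_dec, b_dec):
    """Reconstruction X_hat = sigma(W' Y + b')."""
    return sigmoid(W_dec @ Y + b_dec)

def reconstruction_loss(X, X_hat):
    """Squared reconstruction error ||X - X_hat||^2."""
    return float(np.sum((X - X_hat) ** 2))

rng = np.random.default_rng(0)
d, k, n = 6, 3, 5                         # input size, hidden size, number of samples
X = rng.random((d, n))                    # each column is one input vector
W, b = rng.normal(scale=0.1, size=(k, d)), np.zeros((k, 1))
W_dec, b_dec = rng.normal(scale=0.1, size=(d, k)), np.zeros((d, 1))

Y = encode(X, W, b)
X_hat = decode(Y, W_dec, b_dec)
loss = reconstruction_loss(X, X_hat)
```

In a stacked autoencoder, the hidden output `Y` of one layer would serve as the input of the next, for example with hidden sizes 128, 64 and 32.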

Based on the above methods, by adopting the SAE we obtain two different drug attribute feature matrices and two different microbe attribute feature matrices. For simplicity, we denote the two drug attribute feature matrices as $SAE_{r1}$ and $SAE_{r2}$, and the two microbe attribute feature matrices as $SAE_{m1}$ and $SAE_{m2}$, respectively.

Predicting association scores of microbe-drug pairs

Firstly, we stack the drug probability profile matrix DR and the two different drug attribute feature matrices SAEr1 and SAEr2 horizontally. Meanwhile, in order to fuse more raw data information about drugs and to ensure a matched dimension of the integrated drug feature matrix, we also add an adjacency matrix S. It is easy to see that we can obtain an integrated drug feature matrix IFMr as follows:

$$IFM_r = [\,DR;\ S;\ SAE_{r1};\ S;\ SAE_{r2};\ S\,] \tag{12}$$

Then, in a similar way, by combining the microbe probability profile matrix $MR$ and the two different microbe attribute feature matrices $SAE_{m1}$ and $SAE_{m2}$ with the adjacency matrix $S$, we can obtain an integrated microbe feature matrix $IFM_m$ as follows:

$$IFM_m = [\,S^T;\ MR;\ S^T;\ SAE_{m1};\ S^T;\ SAE_{m2}\,] \tag{13}$$

where $IFM_r \in \mathbb{R}^{N_r \times ((N_r+N_m)\times 3)}$ and $IFM_m \in \mathbb{R}^{N_m \times ((N_m+N_r)\times 3)}$.

Finally, in order to simulate the interaction of drugs and microbes, we can obtain the predicted scores of associations between them by adopting the inner product of IFMr and IFMm. In general, the multiplication of two vectors is a means of simulating the interaction of two different objects, which on the one hand weakens the information about their differences and on the other hand highlights the commonality of the interactions. The formula is as follows:

$$Score = \mathrm{Sigmoid}\left(IFM_r \cdot IFM_m^{T}\right) \tag{14}$$

where $IFM_m^{T}$ is the transpose of $IFM_m$ and Sigmoid is an activation function.
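The fusion and scoring steps can be checked for shape consistency with a short sketch. The random matrices below merely stand in for the RWR profiles and SAE features; the assumption that each SAE feature matrix is square (drugs by drugs, microbes by microbes) is inferred from the stated dimensions of the integrated matrices and is not confirmed by the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_and_score(S, DR, MR, SAEr1, SAEr2, SAEm1, SAEm2):
    """Horizontally stack the feature blocks, then score all drug-microbe pairs."""
    IFMr = np.hstack([DR, S, SAEr1, S, SAEr2, S])           # Nr x 3(Nr + Nm)
    IFMm = np.hstack([S.T, MR, S.T, SAEm1, S.T, SAEm2])     # Nm x 3(Nr + Nm)
    score = sigmoid(IFMr @ IFMm.T)                          # Nr x Nm
    return IFMr, IFMm, score

rng = np.random.default_rng(0)
Nr, Nm = 4, 3                                               # toy sizes
S = (rng.random((Nr, Nm)) < 0.4).astype(float)              # placeholder adjacency matrix
DR, SAEr1, SAEr2 = (rng.random((Nr, Nr)) for _ in range(3))  # placeholder drug features
MR, SAEm1, SAEm2 = (rng.random((Nm, Nm)) for _ in range(3))  # placeholder microbe features
IFMr, IFMm, score = fuse_and_score(S, DR, MR, SAEr1, SAEr2, SAEm1, SAEm2)
```

Note that the block order differs between the two matrices so that the drug-indexed and microbe-indexed columns line up before the inner product is taken.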

Results

In this section, we first analyze the effects of relevant parameters on the predictive performance of MDASAE for model optimization. Then, an ablation experiment is used to analyze the effect of the multi-head attention mechanism on the prediction performance of MDASAE, followed by a performance comparison between MDASAE and five state-of-the-art competitive prediction methods. Finally, some specific drugs and microbes are selected as case studies to confirm the validity of MDASAE.

Hyperparameter sensitivity analysis

From the above descriptions, there are several key parameters in MDASAE, such as the learning rate lr for model training, the number of attention heads h in the SAE, the number of neurons n1 in the hidden layers, and the number n2 of stacked AEs in the SAE. In this section, we employ 5-fold cross-validation to evaluate the prediction performance of MDASAE and adjust the parameter values by observing the mean AUC achieved on MDAD. When implementing 5-fold cross-validation, we randomly divide the downloaded microbe-drug pairs into five equal subsets; each subset is then selected in turn as the test set, while the remaining four subsets serve as the training set.
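One common way to realize such a split is sketched below. It assumes that the known associations (the ones in S) are shuffled and partitioned, and that held-out associations are set to 0 in the training matrix; the paper does not specify these details, and the generator name `kfold_masks` is hypothetical.

```python
import numpy as np

def kfold_masks(S, k=5, seed=0):
    """Yield (S_train, test_pairs) with one fold of known associations hidden."""
    rng = np.random.default_rng(seed)
    positives = np.argwhere(S == 1)        # (row, col) index pairs of known associations
    rng.shuffle(positives)                 # shuffle the pairs in place
    for test_pairs in np.array_split(positives, k):
        S_train = S.copy()
        S_train[test_pairs[:, 0], test_pairs[:, 1]] = 0   # hide the held-out pairs
        yield S_train, test_pairs

# Toy adjacency matrix with 10 known associations (hypothetical)
S = np.ones((4, 3))
S[0, 0] = 0
S[3, 2] = 0
splits = list(kfold_masks(S, k=5, seed=0))
```

Similarities and features would then be recomputed from each `S_train`, so that held-out associations do not leak into training.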

During the experiments, we first evaluated the effect of the learning rate lr on the prediction performance of MDASAE, with its value chosen from {0.0001, 0.0005, 0.001, 0.01, 0.1}. As shown in Fig. 2a, MDASAE achieved the highest AUC value when lr was set to 0.01. We then evaluated the effect of the number of attention heads h in the SAE, with h chosen from {2, 4, 8, 16}. As shown in Fig. 2b, the AUC value reached its maximum when h was set to 4. Finally, through intensive experiments we found that the number of hidden-layer neurons n1 and the number n2 of stacked AEs had little effect on the predictive performance of MDASAE; therefore, we set n1 and n2 to their default values of {128, 64, 32} and 3, respectively.

Figure 2. Analysis of the effect of hyperparameters on prediction performance of MDASAE. (a) and (b) show respectively the AUC values achieved by MDASAE with different learning rates and head numbers in multi-head attention mechanism.

Analysis of the multi-head attention mechanism

In MDASAE, we incorporate a multi-head attention mechanism into SAE to help the model jointly focus on information from different representation subspaces at different positions29, which will be helpful for the model to capture fruitful feature information. In this section, ablation experiments will be performed based on MDAD and aBiofilm under the 5-fold cross-validation and the 10-fold cross-validation to evaluate the impact of the multi-head attention mechanism on the predictive performance of MDASAE. In the ablation experiment, we will compare the predictive performance between MDASAE (with the multi-head attention mechanism) and MDASAE W/O attention (without the multi-head attention mechanism). As shown in Fig. 3, it is easy to see that MDASAE can achieve higher AUC values when the attention mechanism is introduced.

Figure 3. Effect of the multi-head attention mechanism on model performance of MDASAE. (a) and (b) show the AUC values achieved by MDASAE (with the multi-head attention mechanism) and MDASAE W/O attention (without the multi-head attention mechanism) under the 5-fold cross-validation and the 10-fold cross-validation on MDAD and aBiofilm, respectively.

Comparison with state-of-the-art predictive methods

In order to further validate the prediction performance of MDASAE, in this section, we will compare it with the following five representative competitive methods based on MDAD and aBiofilm separately:

  • LRLSHMDA30 adopted the Laplace regularized least squares classifier, a semi-supervised computational model, to predict potential microbe-disease associations.

  • HMDAKATZ12 proposed a KATZ measure-based calculation method to infer latent associations between microorganisms and drugs.

  • BIRWMP31 designed a relevance of multi-path based bi-random walk to detect possible microbe-disease associations.

  • NTSHMDA32 established a random walk algorithm to infer potential microbe-disease associations by integrating network topological similarity.

  • LAGCN33 combined embeddings from multiple graph convolutional layers with an attention mechanism to predict latent microbe-disease associations.

During the experiments, we employed the AUC value as the performance metric while performing 5-fold CV and 10-fold CV for all competitive methods, and report the comparison results in Tables 2 and 3 and Fig. 4. To ensure a fair comparison, all five competing methods were run with their original parameters.
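For reference, the AUC used as the metric here equals the probability that a randomly chosen positive pair is scored higher than a randomly chosen negative pair, with ties counted as one half. A minimal sketch of that definition follows; the function name `auc_score` and the toy labels and scores are assumptions, and the pairwise comparison is only practical for small test sets.

```python
import numpy as np

def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    diff = pos[:, None] - neg[None, :]            # all positive-negative score gaps
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Toy example (hypothetical labels and scores)
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = auc_score(labels, scores)
```

In this toy case three of the four positive-negative pairs are ordered correctly, so `auc` is 0.75.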

Table 2. Comparison of AUCs achieved by MDASAE and 5 competitive methods under 5-fold CV and 10-fold CV based on MDAD.

| Methods | AUC (5-fold) | AUC (10-fold) |
|---|---|---|
| LRLSHMDA | 0.9259 ± 0.0031 | 0.9392 ± 0.0014 |
| HMDAKATZ | 0.8718 ± 0.0032 | 0.8928 ± 0.0024 |
| BIRWMP | 0.8140 ± 0.0049 | 0.8172 ± 0.0022 |
| NTSHMDA | 0.8495 ± 0.0028 | 0.8715 ± 0.0020 |
| LAGCN | 0.8544 ± 0.0042 | 0.8637 ± 0.0036 |
| MDASAE (our model) | 0.9665 ± 0.0016 | 0.9659 ± 0.0044 |

Significant values are in [bold].

Table 3. Comparison of AUCs achieved by MDASAE and 5 competitive methods under 5-fold CV and 10-fold CV based on aBiofilm.

| Methods | AUC (5-fold) | AUC (10-fold) |
|---|---|---|
| LRLSHMDA | 0.9371 ± 0.0023 | 0.9520 ± 0.0007 |
| HMDAKATZ | 0.8988 ± 0.0034 | 0.9193 ± 0.0014 |
| BIRWMP | 0.8491 ± 0.0027 | 0.8486 ± 0.0026 |
| NTSHMDA | 0.8625 ± 0.0023 | 0.8845 ± 0.0015 |
| LAGCN | 0.8615 ± 0.0084 | 0.8722 ± 0.0074 |
| MDASAE (our model) | 0.9634 ± 0.0007 | 0.9636 ± 0.0003 |

Significant values are in [bold].

Figure 4. ROC curves achieved by competitive methods based on the MDAD database. (a) and (b) show the comparison results under 5-fold CV and 10-fold CV respectively.

As shown in Table 2, MDASAE achieves the highest AUC values of 0.9665 ± 0.0016 and 0.9659 ± 0.0044 under 5-fold CV and 10-fold CV, respectively, followed by LRLSHMDA with AUC values of 0.9259 ± 0.0031 and 0.9392 ± 0.0014, whereas BIRWMP has the lowest AUC values.

To further evaluate the predictive performance of MDASAE, we conducted validation on the aBiofilm database as well. As shown in Table 3, MDASAE achieves AUC values of 0.9634 ± 0.0007 and 0.9636 ± 0.0003 under 5-fold CV and 10-fold CV, respectively. Similarly, LRLSHMDA ranked second, and BIRWMP was the lowest.

These results show that MDASAE exhibits the best prediction performance on both datasets and outperforms all five competing methods.

Case study

To further demonstrate the validity of MDASAE, we perform case studies of two popular drugs (Pefloxacin and Ciprofloxacin) and a microbe (Mycobacterium tuberculosis) in this section. Pefloxacin is a fluorinated quinolone that has a broad spectrum of activity against a vast array of Gram-negative and Gram-positive bacteria34. In addition, it has been shown that pefloxacin penetrates into cells and is highly effective in the treatment of infections caused by intracellular pathogens35, and it is also able to inhibit the activity of a variety of bacteria. For example, El-Sukhon et al. verified that a pefloxacin combination was synergistic against E. coli36, Juvin et al. demonstrated the in vivo bactericidal effect of pefloxacin in an experimental model of Serratia marcescens endocarditis37, and Moin et al. proposed using pefloxacin as an alternative marker to detect the susceptibility of Salmonella enterica serovars Typhi and Paratyphi to ciprofloxacin38. As shown in Table 4, among the top 20 predicted candidate microbes, 16 have been confirmed to be associated with Pefloxacin by previously published literature.

Table 4. The top 20 predicted Pefloxacin-associated microbes.

| Microbe | Evidence | Microbe | Evidence |
|---|---|---|---|
| Staphylococcus aureus | PMID: 2940215 | Vibrio harveyi | Unconfirmed |
| Pseudomonas aeruginosa | PMID: 1645509 | Bacillus subtilis | PMID: 12024980 |
| Escherichia coli | PMID: 14659660 | Human immunodeficiency virus 1 | PMID: 9495677 |
| Candida albicans | PMID: 11563831 | Actinomyces oris | Unconfirmed |
| Streptococcus mutans | Unconfirmed | Streptococcus sanguis | PMID: 1666667 |
| Staphylococcus epidermidis | PMID: 2606159 | Serratia marcescens | PMID: 8031065 |
| Staphylococcus epidermis | PMID: 2640275 | Clostridium perfringens | PMID: 3162143 |
| Haemophilus influenzae | PMID: 2940213 | Streptococcus pneumoniae | PMID: 20384283 |
| Salmonella enterica | PMID: 31954032 | Mycobacterium tuberculosis | PMID: 1909062 |
| Enterococcus faecalis | PMID: 2258345 | Candida glabrata | Unconfirmed |

The first column records top 10 microbes, while the third column records top 11–20 microbes.
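A ranking of this kind can be produced from a predicted score matrix by sorting one drug's row. The sketch below is only illustrative: the function name `top_k_microbes`, the toy scores, and the optional `exclude_known` switch (which drops associations already present in S) are assumptions, since the paper does not state whether known associations are removed before ranking.

```python
import numpy as np

def top_k_microbes(score, S, drug_idx, microbe_names, k=20, exclude_known=False):
    """Return the k highest-scoring microbes for one drug as (name, score) pairs."""
    candidates = np.arange(score.shape[1])
    if exclude_known:
        # Assumption: optionally keep only microbes without a known association
        candidates = candidates[S[drug_idx] == 0]
    order = np.argsort(-score[drug_idx, candidates], kind="stable")[:k]
    return [(microbe_names[j], float(score[drug_idx, j])) for j in candidates[order]]

# Toy data (hypothetical scores and names)
score = np.array([[0.2, 0.9, 0.5, 0.7]])
S = np.array([[0, 1, 0, 0]])
names = ["m1", "m2", "m3", "m4"]
ranked_all = top_k_microbes(score, S, 0, names, k=2)
ranked_new = top_k_microbes(score, S, 0, names, k=2, exclude_known=True)
```

The same function applied to a column of the transposed score matrix would rank candidate drugs for a given microbe.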

In addition, Ciprofloxacin, a newer-generation fluoroquinolone and a potent, well-tolerated antibacterial drug39, has enormous potential for antibacterial activity against both Gram-positive and Gram-negative bacteria, as does pefloxacin. For example, Rehman et al. mentioned that ciprofloxacin is frequently used for the treatment of various infections caused by the opportunistic pathogen Pseudomonas aeruginosa in their research on the resistance mechanism of ciprofloxacin40, Gould investigated the inhibitory effect of ciprofloxacin on pneumococci41, and Gollapudi validated that ciprofloxacin inhibits TNF-α induced HIV expression in U1 cells42. As presented in Table 5, among the top 20 predicted candidate microbes, 19 have been verified to be associated with Ciprofloxacin by published literature. This suggests that MDASAE is helpful for both the clinical application of drugs and the prediction of potential drug-associated microbes.

Table 5. The top 20 predicted Ciprofloxacin-associated microbes.

| Microbe | Evidence | Microbe | Evidence |
|---|---|---|---|
| Escherichia coli | PMID: 2325984 | Salmonella enterica | PMID: 32747937 |
| Pseudomonas aeruginosa | PMID: 30605076 | Mycobacterium tuberculosis | PMID: 30020039 |
| Candida albicans | PMID: 35404123 | Vibrio harveyi | PMID: 27247095 |
| Staphylococcus aureus | PMID: 35301951 | Stenotrophomonas maltophilia | PMID: 18510823 |
| Streptococcus mutans | PMID: 33402618 | Human immunodeficiency virus 1 | PMID: 9566552 |
| Staphylococcus epidermidis | PMID: 2327776 | Proteus vulgaris | PMID: 34638966 |
| Staphylococcus epidermis | PMID: 10632381 | Serratia marcescens | PMID: 2071875 |
| Haemophilus influenzae | PMID: 29655917 | Actinomyces oris | Unconfirmed |
| Bacillus subtilis | PMID: 30758259 | Clostridium perfringens | PMID: 16701569 |
| Enterococcus faecalis | PMID: 23789048 | Streptococcus pneumoniae | PMID: 15155208 |

The first column records top 10 microbes, while the third column records top 11–20 microbes.

As far as microbes are concerned, Mycobacterium tuberculosis is a strictly aerobic bacterium and, as the causative agent of human tuberculosis, is the leading cause of death from a single infectious agent43. The pathogen is commonly latent in the human body and can threaten human health at any time. Researchers are also searching for various drugs to combat its resistance. For instance, Gaidukevich et al. showed that liposomes of the non-antibiotic levofloxacin containing phospholipid cardiolipin affected the growth of extensively drug resistant Mycobacterium tuberculosis44, and Wang et al.45 mentioned that ethambutol, as one of the first-line antituberculosis drugs, has a resistance rate of 17.2% in multi-drug resistant tuberculosis. As shown in Table 6, all of the top 20 Mycobacterium tuberculosis-associated candidate drugs predicted by MDASAE have been confirmed by published reports.

Table 6. The top 20 predicted Mycobacterium tuberculosis-associated drugs.

| Drug | Evidence | Drug | Evidence |
|---|---|---|---|
| Ciprofloxacin | PMID: 30020039 | Capreomycin | PMID: 21678479 |
| Tobramycin | PMID: 19723387 | Viomycin | PMID: 14799786 |
| Pyrazinamide | PMID: 34181476 | Triclosan | PMID: 19130456 |
| Epigallocatechin Gallate | PMID: 33463343 | SQ109 | PMID: 22258923 |
| Limonene | PMID: 29288759 | Ceftazidime | PMID: 32773662 |
| Bedaquiline | PMID: 33055230 | Rifapentine | PMID: 33856282 |
| Isoniazid | PMID: 33132303 | Levofloxacin | PMID: 30029913 |
| Aminosalicylic Acid | PMID: 26033719 | Curcumin | PMID: 23305394 |
| Verapamil | PMID: 30648892 | Ethambutol | PMID: 27806932 |
| Colistin | PMID: 26183185 | Oxacillin | PMID: 33109697 |

The first column records top 10 drugs, while the third column records top 11–20 drugs.

From above results of case studies, it is easy to see that MDASAE may be a promising tool for predicting potential associations between microbes and drugs in the future.

Discussion and conclusion

Researchers have shown that humans and microbes are mutually constraining and interdependent, and that a substantial portion of microbes are closely related to human health. Under normal conditions, the probability of infectious disease in the human body is low, and most microbes residing in the human body are harmless and can even help resist pathogens. This close relationship motivates research on predicting associations between microbes and drugs.

In this work, we proposed a novel prediction model, MDASAE, for inferring latent microbe-drug associations. In MDASAE, we first constructed different microbe similarity networks and drug similarity networks based on known association data. Some of them were then used as input to the SAE to learn attribute features for microbes and drugs, and the remaining ones were combined with these newly learned attribute features to predict scores of possible associations between microbes and drugs. Results of case studies and intensive comparison experiments showed that MDASAE was superior to existing competitive state-of-the-art methods, which indicates that MDASAE might be a promising tool for identifying latent microbe-drug associations, with potential for both new drug discovery and clinical drug treatment. Furthermore, we may be able to apply MDASAE to other association prediction problems between biological entities, such as microbe-disease association prediction and circRNA-disease association prediction.

Although MDASAE has some advantages, it still has some limitations. For instance, for some of the microbes predicted by MDASAE to be related to specific drugs, little evidence of the association is available. Additionally, the SAE cannot stack too many AEs owing to the sparsity of the dataset, which tends to cause overfitting. In the future, we will consider data augmentation to address this problem.

Author contributions

L.F. and L.W. produced the main ideas, and did the modeling, computation and analysis and also wrote the manuscript. L.W. and X.Z. provided supervision and effective scientific advice and related ideas, research design guidance, and added value to the article through editing and contributing completions. All authors contributed to the article and approved the submitted version.

Funding

This work was partly sponsored by the National Natural Science Foundation of China (No. 62272064), the Natural Science Foundation of Hunan Province (No. 2022JJ50138) and the "14th Five-Year Plan" Key Disciplines and Application-oriented Special Disciplines of Hunan Province (Xiangjiaotong [2022] 351).

Data availability

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Lei Wang, Email: wanglei@xtu.edu.cn.

Xianyou Zhu, Email: zxy@hynu.edu.cn.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-023-34438-8.



Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
