Identification of Drug-Disease Associations Using a Random Walk with Restart Method and Supervised Learning

Xiaoqing Liu; Wenjing Yi; Baohang Xi; Qi Dai

2022 Oct 10;2022:7035634. doi: 10.1155/2022/7035634

PMCID: PMC9576438 PMID: 36262874

Abstract

Drug-disease associations play an important role in revealing disease mechanisms, finding new indications for available drugs, and drug repositioning. A variety of computational approaches have been proposed to find drug-disease associations and have achieved good performance. However, although these methods draw on many kinds of network information, integrated networks are rarely used. In addition, known drug-disease association data have not been fully exploited. In this work, we designed an algorithm that combines random walk with restart and supervised learning to find drug-disease associations. We used an integrated network, updated the model with known drug-disease association data, and selected a gene set as the starting point of the random walk based on those data. The experimental results show that the proposed method can effectively find associations between drugs and diseases, with an average AUC of 0.827. We found 8 drug-disease pairs that have not yet been reported, and 5 of them have pharmacodynamic effects on Parkinson's disease. We also found that α-synuclein and tau protein form a key linkage between Parkinson's disease and trihexyphenidyl, a drug used to treat Parkinson's disease, which provides a useful exploration of the effectiveness of Parkinson's disease treatment.

1. Introduction

With the prevalence of complex diseases, the existing drugs are far from meeting the needs of human beings to fight against diseases. At the same time, due to the rising cost of drug research and development, long research and development cycle, large difference in research and development success rate, and high loss rate of new drugs, the research and development of innovative drugs has become a major challenge in the medical field.

At present, reusing de-risked compounds to treat common or rare diseases has become a popular approach to drug research and development. This strategy is called drug repositioning or drug repurposing. It not only reduces the overall development cost but also substantially shortens research and development time [1–3]. Through drug repositioning, pharmaceutical companies have achieved many successes, such as Pfizer's Viagra for erectile dysfunction [4] and Celgene's thalidomide for severe erythema nodosum leprosum [5].

With the rapid expansion of large-scale genome, transcriptome, and proteome data, computational drug repositioning has emerged as one of the leading approaches. Huang et al. developed a drug repositioning pipeline that analyzes four lung cancer microarray datasets to identify enriched biological processes, potential therapeutic drugs, and target genes for the treatment of non-small-cell lung cancer (NSCLC) [6]. They integrated two methods: a machine learning algorithm and classification based on topological parameters. Zheng et al. designed a weighted ensemble similarity (WES) algorithm, which provides a new perspective for drug repositioning and discovery [7]. Wang et al. integrated two drug repositioning strategies and proposed a new method for drug repositioning [8]. Cheng et al. [9] integrated chemical, gene, and disease networks, inferred chemical hazard profiles, studied gaps in exposure data, and fully considered the gene and disease networks in chemical safety assessment [10]. A large number of genetic and molecular biology studies have shown that diseases reflect the interaction of multiple molecular components [11–14]. Therefore, drug repositioning studies should consider the interactions between different disease-related genes [15–18]. Luo et al. identified potential indications of a given drug based on comprehensive similarity measures and the Bi-Random walk (BiRW) algorithm [19]. Yu et al. inferred associations between drugs and diseases by studying the characteristics of known protein complexes [20]. PREDICT (PREdicting Drug IndiCaTions) assumes that similar drugs are suitable for similar diseases; the prediction task is achieved by designing multiple drug-drug and disease-disease similarity measures [21].

The above methods were successfully applied to drug-disease association studies and achieved good performance. However, although these methods use a variety of network information, integrated networks are still rarely used. As the amount of known drug-disease association data grows, a supervised learning method should be designed to exploit these data and further improve drug-disease association prediction. In this paper, we used an integrated network built from HPRD, BioGRID, STRING, and other databases. Unlike previous network-based studies, which applied random walk with restart directly on the network, we updated the model using known drug-disease association data and selected a gene set as the starting point of the random walk, thus realizing supervised learning with the random walk with restart method. We also evaluated the performance of the proposed method on various diseases and analyzed GO and KEGG functional enrichment.

2. Datasets and Methods

2.1. Protein–Protein Interaction (PPI) Network

We selected the human protein–protein interaction (PPI) network compiled by Menche et al. [22], which contains experimentally documented human physical interactions from TRANSFAC, IntAct, MINT, BioGRID, HPRD, KEGG, BIGG, CORUM, PhosphoSitePlus, and a large-scale signaling network. We used the largest connected component of this network in our analysis, consisting of 141,150 interactions between 13,329 proteins. Entrez Gene IDs were used to map disease-associated genes to the corresponding proteins in the network. The interaction and disease-gene association data are provided as a supplementary data set in Menche et al. [22].

2.2. Disease and Disease-Gene Data

Medical Subject Headings (MeSH) is an authoritative thesaurus compiled by the U.S. National Library of Medicine [23]. The disease terms in the MeSH vocabulary are organized in a well-defined classification. Our disease and disease-gene data are derived from Menche et al. [22], who integrated disease-gene associations from OMIM (Online Mendelian Inheritance in Man) and trait-gene association data from GWAS Central. Using the MeSH ontology [24], disease names from different nomenclatures were merged into a single name.

We screened the 1489 diseases in MeSH for those with at least 20 disease-related genes. We required at least 20 genes in order to study the role of the related genes in the interaction network, rather than diseases caused by the mutation of a single gene. This yielded 299 diseases and their 3173 genes. We further required at least one drug for each disease. By searching the DrugBank database, we obtained FDA-approved drug information for 79 diseases, and Metab2MeSH was used for text mining [25]. If the text mining results indicated a strong association between a disease and a drug, we added that drug-disease relationship to the known data set.
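The screening rule described above can be sketched as follows. This is a minimal illustration that assumes the disease-gene and disease-drug data are held in plain dictionaries; the function name `screen_diseases`, the dictionary layout, and the toy entries are hypothetical and not taken from the paper.

```python
def screen_diseases(disease_genes, disease_drugs, min_genes=20):
    """Keep diseases with at least `min_genes` genes and at least one known drug."""
    kept = []
    for disease, genes in disease_genes.items():
        has_enough_genes = len(set(genes)) >= min_genes
        has_drug = len(disease_drugs.get(disease, ())) > 0
        if has_enough_genes and has_drug:
            kept.append(disease)
    return sorted(kept)


# Toy data (hypothetical names, for illustration only).
toy_genes = {
    "disease_A": [f"gene_{i}" for i in range(25)],  # enough genes, has a drug
    "disease_B": [f"gene_{i}" for i in range(5)],   # too few genes
    "disease_C": [f"gene_{i}" for i in range(30)],  # enough genes, no drug
}
toy_drugs = {"disease_A": ["drug_1"], "disease_B": ["drug_2"]}

selected = screen_diseases(toy_genes, toy_drugs)
print(selected)
```

Only `disease_A` passes both filters in this toy example.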

2.3. Drug and Drug-Target Data

DrugBank is a comprehensive drug information database, which not only includes information on drug structures, drug targets, and drug mechanisms of action but also integrates information from drug experiments and clinical research. DrugBank is highly searchable and offers convenient web visualization, which greatly facilitates drug research and development, the exploration of drug mechanisms, and related work. DrugBank 5.0 contains information about 10971 drugs and 4900 protein targets, including 2391 FDA-approved small molecule drugs, 934 approved biotechnology drugs, 109 nutraceutical drugs, and more than 5090 experimental drugs. We collected FDA-approved drug and drug-target information from DrugBank, then searched for strong literature evidence of drug-disease associations through Metab2MeSH, and finally obtained 238 drugs that can treat the corresponding diseases.

2.4. Random Walk with Restart Method

The PPI network can be expressed as G = (V, E), where V denotes the set of proteins and E the set of protein–protein interactions. The n × n adjacency matrix A is used to represent the PPI network, where n is the total number of proteins. If there is an interaction between protein i and protein j, A_{[i, j]} is 1; otherwise it is 0. We then normalized the adjacency matrix A by column:

\begin{matrix} A_{[i, j]}^{'} = \frac{A_{[i, j]}}{\sum_{k = 1}^{n} A_{[k, j]}} . \end{matrix}

(1)
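As an illustration of equation (1), the column normalization can be sketched with NumPy. This is a minimal sketch, not the authors' code: the function name `column_normalize`, the handling of isolated nodes, and the toy network are assumptions made for the example.

```python
import numpy as np


def column_normalize(A):
    """Divide each column of A by its column sum, as in equation (1)."""
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0)
    # Assumption: a node with no interactions keeps an all-zero column
    # instead of triggering a division by zero.
    safe_sums = np.where(col_sums == 0, 1.0, col_sums)
    return A / safe_sums


# Toy 4-protein network (hypothetical edges, for illustration only).
A_toy = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
A_norm = column_normalize(A_toy)
print(A_norm)
```

After normalization every column of a connected network sums to 1, so multiplying by the matrix redistributes probability mass without creating or losing any.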

Random walk is used to find potential gene associations of diseases or drugs. When the random walk converges, it yields, for each disease or drug, a probability at every node of the PPI network. The relationship between a drug and a disease can then be calculated from the correlation between their probability distributions.

Random walk starts with a set of seed genes. The initial vector of seed genes is defined as follows:

\begin{matrix} P_{0} = {[ψ_{1}, ψ_{2}, \dots, ψ_{n}]}^{T} . \end{matrix}

(2)

For a disease, we listed all the drugs that can treat it, merged all the genes of these drugs into the disease-related genes, and took the combined gene set as the seed genes of the disease. The genes directly related to the disease are defined as

\begin{matrix} P_{{dis}_{dir}} = {[ψ_{{dis}_{{dir}_{1}}}, ψ_{{dis}_{{dir}_{2}}}, \dots, ψ_{{dis}_{{dir}_{n}}}]}^{T}, \end{matrix}

(3)

where ψ_{dis_{dir_i}} is set to 1 if gene i is directly related to the disease and to 0 otherwise. Then P_{dis_dir} is normalized as

\begin{matrix} P_{{dis}_{{dir}_{k}}}^{'} = \frac{P_{{dis}_{{dir}_{k}}}}{\sum_{j = 1}^{n} P_{{dis}_{{dir}_{j}}}} . \end{matrix}

(4)

Suppose that m drugs can treat the same disease. They are represented as P_{dis_drug₁}, P_{dis_drug₂}, ⋯, and P_{dis_{drug_m}}, where

\begin{matrix} P_{{dis}_{{drug}_{m}}} = {[ψ_{{dis}_{{drug}_{m_{1}}}}, ψ_{{dis}_{{drug}_{m_{2}}}}, \dots, ψ_{{dis}_{{drug}_{m_{n}}}}]}^{T} . \end{matrix}

(5)

The vectors of all drugs for a disease are then summed:

\begin{matrix} P_{{dis}_{drug}} = \sum_{k = 1}^{m} P_{{dis}_{{drug}_{k}}} . \end{matrix}

(6)

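Then we normalized P_{dis_drug} so that its entries sum to 1. Writing [k] for the k-th entry of the vector (to avoid confusion with the drug index k in equation (6)), we have

\begin{matrix} P_{{dis}_{drug}}^{'}[k] = \frac{P_{{dis}_{drug}}[k]}{\sum_{j = 1}^{n} P_{{dis}_{drug}}[j]} . \end{matrix}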

(7)

Finally, the seed vector for a given disease is

\begin{matrix} P_{disease} = t P_{{dis}_{dir}}^{'} + (1 - t) P_{{dis}_{drug}}^{'}, \end{matrix}

(8)

where the weight t is 0.5 (not to be confused with the time step t of the random walk).
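The construction in equations (3) to (8) can be sketched as follows. This is a minimal sketch under stated assumptions: genes are mapped to vector positions through a hypothetical `gene_index` dictionary, genes missing from the network are skipped, and the helper names and toy gene sets are invented for the example.

```python
import numpy as np


def indicator(genes, gene_index, n):
    """0/1 vector marking which network genes belong to `genes`."""
    v = np.zeros(n)
    for g in genes:
        if g in gene_index:  # assumption: genes absent from the network are skipped
            v[gene_index[g]] = 1.0
    return v


def normalize(v):
    """Scale a non-negative vector so that its entries sum to 1."""
    total = v.sum()
    return v / total if total > 0 else v


def disease_seed(disease_genes, drug_target_sets, gene_index, t=0.5):
    """Seed vector of equation (8): t * P'_dis_dir + (1 - t) * P'_dis_drug."""
    n = len(gene_index)
    p_dir = normalize(indicator(disease_genes, gene_index, n))   # eqs (3)-(4)
    p_drug = np.zeros(n)
    for targets in drug_target_sets:                             # eqs (5)-(6)
        p_drug += indicator(targets, gene_index, n)
    p_drug = normalize(p_drug)                                   # eq (7)
    return t * p_dir + (1 - t) * p_drug


# Toy example: 4 genes, one disease with 2 genes and 2 drugs (hypothetical).
gene_index = {"g1": 0, "g2": 1, "g3": 2, "g4": 3}
seed = disease_seed({"g1", "g2"}, [{"g2", "g3"}, {"g3"}], gene_index)
print(seed)
```

Note that a gene targeted by several drugs (here `g3`) receives proportionally more weight in the drug part, which follows from summing the drug vectors before normalizing.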

We also obtained the seed vector P_drug of a given drug in the same way. The random walk then starts from the seed vector and moves to adjacent genes at each time step (t ⟶ t + 1); the state probability P_{t+1} at time t + 1 is

\begin{matrix} P_{t + 1} = (1 - r) A^{'} P_{t} + r P_{0}, \end{matrix}

(9)

where P₀ is the initial (seed) vector, P_t is the probability vector at time t, and r is the restart probability. When the difference between P_t and P_{t+1} is less than 1e−6, the process is considered to have reached a stable state. After the stable state is reached, the correlations between drugs and diseases, between drugs, and between diseases are calculated from the probabilities with which the drugs and diseases visit each node of the network.
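The iteration in equation (9) can be sketched as below. This is an illustrative sketch rather than the authors' implementation: the restart probability `r=0.7` is a placeholder (the text does not state the value used), the L1 norm is assumed for the convergence check, and `max_iter` is an added safeguard.

```python
import numpy as np


def random_walk_with_restart(A_norm, p0, r=0.7, tol=1e-6, max_iter=1000):
    """Iterate P_{t+1} = (1 - r) * A' * P_t + r * P_0 until the change is below tol."""
    p0 = np.asarray(p0, dtype=float)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - r) * (A_norm @ p) + r * p0
        # Assumption: the "difference" between P_t and P_{t+1} is the L1 norm.
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy column-normalized 3-node path network 0 - 1 - 2 (hypothetical).
A_norm = np.array([
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
])
p0 = np.array([1.0, 0.0, 0.0])  # all seed mass on node 0
steady = random_walk_with_restart(A_norm, p0)
print(steady)
```

Because the matrix is column-normalized and the seed sums to 1, the stationary vector also sums to 1. For small networks it can be checked against the closed form r (I − (1 − r) A′)⁻¹ P₀.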

2.5. Supervised Learning

Cross-validation is a frequently used model validation technique. It divides the known data into two subsets, uses one subset to train the model, and validates the model on the remaining subset to evaluate its performance on unseen data. In k-fold cross-validation, for example, the known data set is randomly divided into k parts. In each round, k − 1 parts are used for model training and the remaining part is used for validation. This is repeated k times, with a different part held out each time, until every part has been used for validation once.

The goal of cross-validation is to test the predictive ability of the model on new data; it can also reveal problems such as overfitting or selection bias. In this paper, this idea is used for supervised learning of the random walk. For a given disease, all drugs in the data set that can treat the disease are listed, the genes associated with these drugs are merged into the disease-related genes, and the combined gene set is used as the starting point of the random walk. Drugs are treated in the same way as diseases. In this paper, 403 known drug-disease associations between 78 diseases and 238 drugs were randomly divided into 10 parts. Nine parts were used to update the model, and the updated model was applied to the remaining part, so as to achieve the effect of supervised learning.
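The 10-fold partition of the known associations can be sketched as follows. This is a minimal sketch using only the standard library; the function name `k_fold_splits`, the fixed random seed, and the placeholder pairs are assumptions, and the real 403 associations are not reproduced here.

```python
import random


def k_fold_splits(associations, k=10, seed=0):
    """Yield (training, held_out) lists for each of k random folds."""
    items = list(associations)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        training = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        yield training, held_out


# 403 placeholder (disease, drug) pairs standing in for the known associations.
pairs = [(f"disease_{i % 7}", f"drug_{i}") for i in range(403)]
splits = list(k_fold_splits(pairs, k=10))

for training, held_out in splits[:2]:
    print(len(training), len(held_out))
```

In each round only the `training` pairs would be used to enlarge the seed gene sets, so that the `held_out` pairs remain unseen when the model is evaluated.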

2.6. Evaluation Method

The receiver operating characteristic (ROC) curve plots the true positive rate (TPR) against the false positive rate (FPR) under various threshold settings. The area under the ROC curve, known as the AUC, reflects the performance of a classifier. The AUC varies between 0 and 1. An AUC of 0.5 means that the classifier performs no better than random guessing. The larger the AUC, the better the classifier; an AUC of 1 corresponds to perfect classification.
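For reference, the AUC can be computed directly from scores and labels using its rank interpretation: it equals the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair, with ties counted as one half. The sketch below uses this equivalent formulation; the text does not say how the AUC was computed in the study, and the function name and toy values are illustrative.

```python
def auc_score(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy scores: label 1 marks a known drug-disease association.
toy_auc = auc_score([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(toy_auc)
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a sketch; a sort-based rank sum gives the same value more efficiently on large data.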

3. Results

3.1. Performance Evaluation of the Random Walk with Restart Method Based on Supervised Learning

To evaluate the effectiveness of the proposed method, we first took the known drug-disease associations as an independent validation data set. Using the related genes of 78 diseases and the targets of 238 drugs, we obtained the correlation between diseases and drugs through random walk with restart on the PPI network. The drug-disease pairs were ranked by correlation, and the AUC was calculated from this ranking. Three PPI networks, BioGRID, HPRD, and STRING, were evaluated independently, and their AUC values were 0.64, 0.52, and 0.66, respectively.

To further explore the performance of the method on different diseases, MeSH was used to classify all diseases. Some diseases belong to more than one category; colorectal tumors, for example, belong to both C04 (tumors) and C06 (digestive system diseases). In such cases, the disease was counted under only one of its categories when calculating the average AUC. The AUC was calculated on the basis of the PPI network and the optimal parameters. The results for each disease category are shown in Table 1.

Table 1.

Average AUC value of various diseases based on the random walk with restart method and three PPI networks BioGrid, HPRD, and STRING.

Disease classification based on MeSH	Number of diseases	Average AUC
Viral diseases C02	2	0.70467
Tumor C04	16	0.764632
Musculoskeletal diseases C05	5	0.748488
Digestive system diseases C06	9	0.760698
Respiratory diseases C08	2	0.68288
Nervous system diseases C10	9	0.62232
Eye diseases C11	2	0.83772
Male genitourinary system C12	1	0.81407
Cardiovascular disease C14	11	0.66685
Blood and lymphatic system C15	4	0.876637
Skin and connective tissue diseases C17	5	0.675556
Nutritional and metabolic diseases C18	6	0.734407
Endocrine system diseases C19	2	0.87446
Immune system diseases C20	4	0.62869


Table 1 shows that the performance of the random walk with restart method differs among disease categories. It performs well on blood and lymphatic system diseases (C15), endocrine system diseases (C19), eye diseases (C11), and male genitourinary system diseases (C12), with AUC values above 0.8. The highest is C15, with an AUC of 0.877. The AUC for nervous system diseases (C10) is low, only 0.62.

To further verify the performance of the random walk with restart method combined with supervised learning, we randomly divided all known drug-disease relationships into ten parts; nine parts were used as the training set and the remaining part was used to calculate the AUC. For a given disease, we listed all the drugs in the training set that can treat it, merged all the related genes of these drugs into the disease-related genes, and took the combined gene set as the starting point of the random walk [26]. The same method was used for drugs; that is, the related genes of the diseases that a drug can treat in the training set were merged into the target information of the drug. Each experiment yielded ten AUC values. To reduce random effects, the experiment was repeated 10 times, giving a total of 100 AUC values, as shown in Figure 1.

Figure 1.

The AUC distribution of the random walk with restart method and supervised learning.

The results show that the average of the 100 AUC values is 0.827, indicating that the proposed method can find relationships between drugs and diseases. With the help of training data on known drug-disease relationships, the sensitivity of drug-disease prediction was further improved. Adding the targets of drugs that can treat a disease to the disease-gene information indirectly adds potential disease genes, which enriches the disease-gene information. Similarly, adding the genes of all diseases treatable by a drug to the drug's target information enriches the drug-target information, so that more drug-disease relationships can be discovered and prediction improves.

3.2. Analysis of the Relationship between Drugs and Diseases

In this work, disease-related genes were taken as the starting point of the random walk on one side, and the target genes of drugs were taken as the starting point on the other side. Through random walk with restart on the whole PPI network, we obtained the probability distribution of each disease and each drug on the network and then calculated their correlation coefficients. This gave 18564 correlations between 78 diseases and 238 drugs. Based on the correlation coefficients, 61 disease-drug pairs with a correlation above 0.8 were found, of which 53 pairs have been confirmed by research and 8 pairs are unknown drug-disease relationships. Information on these 8 pairs is shown in Table 2.
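The pairwise scoring step can be sketched as follows, assuming (as the "Pearson" column of Table 2 suggests) that the Pearson correlation between the stationary probability vectors is used. The dictionary layout, the names, and the toy vectors are hypothetical; with 78 diseases and 238 drugs the same double loop produces 78 × 238 = 18564 values.

```python
import numpy as np


def correlation_table(disease_profiles, drug_profiles):
    """Pearson correlation between every disease and drug stationary vector."""
    table = {}
    for disease, p_dis in disease_profiles.items():
        for drug, p_drug in drug_profiles.items():
            table[(disease, drug)] = float(np.corrcoef(p_dis, p_drug)[0, 1])
    return table


# Toy stationary distributions over 4 network nodes (hypothetical values).
disease_profiles = {"disease_A": np.array([0.5, 0.3, 0.1, 0.1])}
drug_profiles = {
    "drug_X": np.array([0.45, 0.35, 0.1, 0.1]),  # similar profile
    "drug_Y": np.array([0.1, 0.1, 0.3, 0.5]),    # opposite profile
}

table = correlation_table(disease_profiles, drug_profiles)
# Keep pairs above the 0.8 threshold used in the text.
strong_pairs = sorted(pair for pair, rho in table.items() if rho > 0.8)
print(strong_pairs)
```

In the toy example only the pair with the similar profile exceeds the threshold; the pair with the opposite profile has a strongly negative correlation and is discarded.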

Table 2.

The relevant information of the eight disease-drug pairs.

Disease	Drug	Pearson
Parkinsonian disorders	Apomorphine	0.876
Parkinsonian disorders	Cabergoline	0.876
Bone diseases metabolic	Calcitriol	0.841
Parkinsonian disorders	Bromocriptine	0.840
Leukemia lymphoid	Mitoxantrone	0.834
Hematologic diseases	Methylprednisolone	0.811
Parkinsonian disorders	Rotigotine	0.806
Autoimmune diseases	Prednisolone	0.806


Methylprednisolone (DB00959) can treat autoimmune diseases, but we found that it is also strongly associated with hematologic diseases. According to the MeSH definition, hematologic diseases include blood tumors, bone marrow diseases, and other diseases. Methylprednisolone is a synthetic glucocorticoid and a steroid derivative. It crosses the cell membrane and affects the expression of some genes, thus interfering with the inflammatory response and inhibiting the humoral immune response, and it has a strong anti-inflammatory effect. Bowen et al. found that high-dose methylprednisolone has a certain effect in patients with relapsed chronic lymphocytic leukemia [27]. Yao et al. found that methylprednisolone inhibits the Wnt signaling pathway by downregulating the expression of LEF-1 protein, and the Wnt signaling pathway is highly related to relapsed chronic lymphocytic leukemia [28].

Mitoxantrone (DB01204) is associated with non-Hodgkin's lymphoma (NHL) and multiple sclerosis (MS). We found that it is also strongly correlated with lymphoid leukemia [29, 30]. Mitoxantrone has significant benefits for tumor control and overall survival in patients with recurrent acute lymphoblastic leukemia.

Prednisolone (DB00860) is a typical steroid drug that can treat a variety of diseases, including rheumatoid arthritis, asthma, allergies, psoriasis, and multiple sclerosis [31]. These diseases are largely autoimmune or immune-mediated, which is consistent with the strong connection we found between prednisolone and autoimmune diseases.

We also found that apomorphine (DB00714), cabergoline (DB00248), bromocriptine (DB01200), and rotigotine (DB05271) are related to Parkinsonian disorders. Querying DrugBank showed that these four drugs have therapeutic effects on Parkinson's disease, but they were not included in our known data set.

3.3. GO Function Enrichment Analysis

Eight candidate drugs for repositioning were found in this work, five of which have pharmacodynamic effects on Parkinson's disease. We further performed GO function enrichment analysis on the disease-related genes of Parkinson's disease before drug action. The results are shown in Figure 2(a). The genes are mainly enriched in functional modules such as chromosome breakage (GO:0031052), positive regulation of cell migration (GO:0030335), and strand displacement (GO:0000732).

Figure 2.

The distribution of the related genes of Parkinson's disease. (a) GO function enrichment of the disease-related genes of Parkinson's disease before drug action; (b) GO function enrichment of the disease-related genes of Parkinson's disease after drug action.

We then analyzed the related genes of Parkinson's disease after drug action. The results are shown in Figure 2(b). The genes are enriched in functional modules such as regulation of locomotion (GO:0040012), dopamine binding (GO:0035240), and serotonin binding (GO:0051378).

The GO enrichment modules of Parkinson's disease changed significantly after the random walk. Before the random walk, the main enriched modules are related to gene expression and cell movement, which may be related to the pathogenesis of Parkinson's disease. After the random walk, the related genes are mainly enriched in neural transmission modules, which are closely related to the treatment of Parkinson's disease.

3.4. KEGG Pathway Analysis

We further analyzed the genes related to Parkinson's disease by KEGG pathway analysis. The results are shown in Figure 3. Figure 3(a) shows that the genes are mainly enriched in pancreatic secretion (hsa04972), the PI3K-Akt signaling pathway (hsa04151), and other pathways. After adding drug information and performing the random walk, we conducted KEGG pathway analysis on the relevant genes. The results show that the genes are mainly enriched in neuroactive ligand-receptor interaction (hsa04080), the calcium signaling pathway (hsa04020), serotonergic synapse (hsa04726), and dopaminergic synapse (hsa04728).

Figure 3.

KEGG pathways of the related genes of Parkinson's disease. (a) KEGG pathways of the disease-related genes of Parkinson's disease before drug action; (b) KEGG pathways of the disease-related genes of Parkinson's disease after drug action.

The enriched KEGG pathways of Parkinson's disease also changed significantly after the random walk, in a way similar to the GO enrichment results. Before the random walk, the main enriched pathways are related to intracellular signaling. After the random walk, the related genes are mainly enriched in neural transmission pathways, which are closely related to the treatment of Parkinson's disease.

3.5. Key Gene Analysis

To further study the key genes of Parkinson's disease, we examined the local relationship on the network between Parkinson's disease, trihexyphenidyl (a drug that can treat Parkinson's disease), and their related genes (Figure 4). Figure 4 shows that the key genes of Parkinson's disease are those encoding α-synuclein (Gene ID: 6622) and tau protein (Gene ID: 4137). α-Synuclein is found mainly at the synapses of nerve cells and plays a key role in the transmission of neurotransmitters. Tau protein is a microtubule-associated protein found mainly in nerve cells. These two proteins are closely related to the pathogenesis of Parkinson's disease.

Figure 4.

The gene network between Parkinson's disease and trihexyphenidyl.

4. Conclusion

With the prevalence of complex diseases, the existing drugs are far from meeting the needs of human beings to fight diseases. At the same time, due to the rising cost and long cycle of drug research and development, developing innovative drugs has become a major challenge in the medical field. In recent years, with the continuous enrichment of disease and drug databases, researchers have achieved drug repositioning through the correlation analysis of disease-related genes, drugs, and drug-target data. This is a new approach in pharmaceutical research and development, which reduces the cost of developing innovative drugs and saves resources. Most diseases are not caused by single gene defects; they often involve the disruption of coordinated function between genes [32]. Therefore, we explored the relationship between drugs and diseases based on the biological function network. The protein–protein interaction (PPI) network was constructed using HPRD, BioGRID, STRING, and other databases. We designed an algorithm that combines random walk with restart and supervised learning to predict drug-disease associations. The average AUC of the prediction is 0.827.

With the help of the proposed method, we found 8 drug-disease pairs that have not been reported, 5 of which have pharmacodynamic effects on Parkinson's disease. For Parkinson's disease, we identified changes in its functional modules by adding drug information and comparing the results of GO and KEGG function enrichment analysis before and after the random walk. Using the network diagram of disease- and drug-related genes after the random walk, we found that α-synuclein and tau protein form the key linkage between Parkinson's disease and trihexyphenidyl, a drug used to treat Parkinson's disease, which provides a useful exploration of the effectiveness of Parkinson's disease treatment.

Acknowledgments

The authors thank all the anonymous referees for their valuable suggestions and support. This work is supported by the National Natural Science Foundation of China (62172369); the Key Research and Development Plan of Zhejiang Province (2021C02039); and the Natural Science Foundation of Zhejiang Province (LY20F020016).

Data Availability

The data used to support the findings of this study are available from the Protein–Protein Interaction (PPI) Network: https://ppi-net.org

Conflicts of Interest

The authors declare no conflict of interest, financial or otherwise.

References

1.Joseph A. New drug development in the United States from 1963 to 1999. Clinical Pharmacology & Therapeutics . 2001;69(5):286–296. doi: 10.1067/mcp.2001.115132. [DOI] [PubMed] [Google Scholar]
2.Adams C. P., Brantner V. V. Estimating the cost of new drug development: is it really $802 million? Health Affairs . 2006;25(2):420–428. doi: 10.1377/hlthaff.25.2.420. [DOI] [PubMed] [Google Scholar]
3.Sleigh S. H., Barton C. L. Repurposing strategies for therapeutics. Pharmaceutical Medicine . 2010;24(3):151–159. doi: 10.1007/BF03256811. [DOI] [Google Scholar]
4.Novac N. Challenges and opportunities of drug repositioning. Trends in Pharmacological Sciences . 2013;34(5):267–272. doi: 10.1016/j.tips.2013.03.004. [DOI] [PubMed] [Google Scholar]
5.Walker S. L., Waters M. F. R., Lockwood D. N. J. The role of thalidomide in the management of erythema nodosum leprosum. Leprosy Review . 2007;78(3):197–215. doi: 10.47276/lr.78.3.197. [DOI] [PubMed] [Google Scholar]
6.Huang C. H., Chang P. M., Hsu C. W., Huang C. Y., Ng K. L. Drug repositioning for non-small cell lung cancer by using machine learning algorithms and topological graph theory. BMC Bioinformatics . 2016;32(17):2664–2671. doi: 10.1186/s12859-015-0845-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Zheng C., Guo Z., Huang C., et al. Large-scale direct targeting for drug repositioning and discovery. Scientific Reports . 2015;5, article 11970 doi: 10.1038/srep11970. [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Wang H., Gu Q., Wei J., Cao Z., Liu Q. Mining drug–disease relationships as a complement to medical genetics-based drug repositioning: where a recommendation system meets genome-wide association studies. Clinical Pharmacology and Therapeutics . 2015;97(5):451–454. doi: 10.1002/cpt.82. [DOI] [PubMed] [Google Scholar]
9.Cheng F., Li W., Zhou Y., et al. Prediction of human genes and diseases targeted by xenobiotics using predictive toxicogenomic-derived models (PTDMs) Molecular BioSystems . 2013;9(6):1316–1325. doi: 10.1039/c3mb25309k. [DOI] [PubMed] [Google Scholar]
10.Zhou J. P., Chen L., Guo Z. H. iATC-NRAKEL: an efficient multi-label classifier for recognizing anatomical therapeutic chemical classes of drugs. Bioinformatics . 2020;36(5):1391–1396. doi: 10.1093/bioinformatics/btz757. [DOI] [PubMed] [Google Scholar]
11.Schadt E. E. Molecular networks as sensors and drivers of common human diseases. Nature . 2018;461(7261):218–223. doi: 10.1038/nature08454. [DOI] [PubMed] [Google Scholar]
12.Califano A., Butte A. J., Friend S., Ideker T., Schadt E. Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nature Genetics . 2012;44(8):841–847. doi: 10.1038/ng.2355. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Zanzoni A., Soler-López M., Aloy P. A network medicine approach to human disease. 22nd IUBMB Congress/37th FEBS Congress . 2012;583(11):1759–1765. doi: 10.1016/j.febslet.2009.03.001. [DOI] [PubMed] [Google Scholar]
14.Ghiassian S. D. Network medicine: a network-based approach to human diseases. Dissertations & Theses-Gradworks . 2015;12(1):56–68. [Google Scholar]
15.Goh K., Cusick M. E., Valle D., Childs B., Vidal M., Barabási A. L. The human disease network. Proceedings of the National Academy of Sciences of the United States of America . 2007;104(21):8685–8690. doi: 10.1073/pnas.0701361104. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Lage K., Møllgård K., Greenway S., et al. Dissecting spatio-temporal protein networks driving human heart development and related disorders. Molecular Systems Biology . 2010;6(1):381–381. doi: 10.1038/msb.2010.36. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Chuang H. Y., Lee E., Liu Y. T., Lee D., Ideker T. Network-based classification of breast cancer metastasis. Molecular Systems Biology . 2007;3:p. 140. doi: 10.1038/msb4100180. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Rolland T., Taşan M., Charloteaux B., et al. A proteome-scale map of the human interactome network. Cell Cambridge Ma . 2014;159(5):1212–1226. doi: 10.1016/j.cell.2014.10.050. [DOI] [PMC free article] [PubMed] [Google Scholar]
19. Luo H., Wang J., Li M., et al. Drug repositioning based on comprehensive similarity measures and bi-random walk algorithm. Bioinformatics. 2016;32(17):2664–2671. doi: 10.1093/bioinformatics/btw228.
20. Yu L., Huang J., Ma Z., Zhang J., Zou Y., Gao L. Inferring drug-disease associations based on known protein complexes. BMC Medical Genomics. 2015;8(S2):p. 2. doi: 10.1186/1755-8794-8-S2-S2.
21. Gottlieb A., Stein G. Y., Ruppin E., Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Molecular Systems Biology. 2011;7(1):p. 496. doi: 10.1038/msb.2011.26.
22. Menche J., Sharma A., Kitsak M., et al. Disease networks. Uncovering disease-disease relationships through the incomplete interactome. Science. 2015;347(6224), article 1257601. doi: 10.1126/science.1257601.
23. Peng S., You R., Wang H., Zhai C., Mamitsuka H., Zhu S. DeepMeSH: deep semantic representation for improving large-scale MeSH indexing. Bioinformatics. 2016;32(12):i70–i79. doi: 10.1093/bioinformatics/btw294.
24. Jani S. D., Argraves G. L., Barth J. L., Argraves W. S. GeneMesh: a web-based microarray analysis tool for relating differentially expressed genes to MeSH terms. BMC Bioinformatics. 2010;11(1):166–166. doi: 10.1186/1471-2105-11-166.
25. Sartor M. A., Ade A., Wright Z., et al. Metab2MeSH: annotating compounds with medical subject headings. Bioinformatics. 2012;28(10):1408–1410. doi: 10.1093/bioinformatics/bts156.
26. Pan X., Chen L., Liu, Niu Z., Huang T., Cai Y. D. Identifying protein subcellular locations with embeddings-based node2loc. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2021;19(1):1–675. doi: 10.1109/TCBB.2021.3080386.
27. Bowen D. A., Call T. G., Jenkins G. D., et al. Methylprednisolone-rituximab is an effective salvage therapy for patients with relapsed chronic lymphocytic leukemia including those with unfavorable cytogenetic features. Leukemia & Lymphoma. 2007;48(12):2412–2417. doi: 10.1080/10428190701724801.
28. Yao Q. M., Li P. P., Liang S. M., et al. Methylprednisolone suppresses the Wnt signaling pathway in chronic lymphocytic leukemia cell line MEC-1 regulated by LEF-1 expression. International Journal of Clinical and Experimental Pathology. 2015;8(7):7921–7928.
29. Rosen P. J., Rankin C., Head D. R., et al. A phase II study of high dose ARA-C and mitoxantrone for treatment of relapsed or refractory adult acute lymphoblastic leukemia. Leukemia Research. 2000;24(3):183–187. doi: 10.1016/S0145-2126(99)00148-4.
30. Parker C., Waters R., Leighton C., et al. Effect of mitoxantrone on outcome of children with first relapse of acute lymphoblastic leukaemia (ALL R3): an open-label randomised trial. The Lancet. 2010;376(9757):2009–2017. doi: 10.1016/S0140-6736(10)62002-8.
31. Yang Y., Chen L. Identification of drug-disease associations by using multiple drug and disease networks. Current Bioinformatics. 2022;17(1):48–59. doi: 10.2174/1574893616666210825115406.
32. Li X., Lu L., Chen L. Identification of protein functions in mouse with a label space partition method. Mathematical Biosciences and Engineering. 2021;19(4):3820–3824. doi: 10.3934/mbe.2022176.

Data Availability Statement

The data used to support the findings of this study are available from the Protein–Protein Interaction (PPI) Network: https://ppi-net.org
