PLOS Computational Biology. 2019 Dec 23;15(12):e1007541. doi: 10.1371/journal.pcbi.1007541

Overlap matrix completion for predicting drug-associated indications

Mengyun Yang 1,2, Huimin Luo 1, Yaohang Li 3, Fang-Xiang Wu 4, Jianxin Wang 1,*
Editor: Edwin Wang 5
PMCID: PMC6946175  PMID: 31869322

Abstract

Identification of potential drug-associated indications is critical in drug repositioning, for both approved and novel drugs. Computational methods based on drug similarity and disease similarity have been developed to predict drug–disease associations, and their prediction precision can be continuously improved as more reliable drug- or disease-related information becomes available and is integrated. However, it is challenging to effectively incorporate multiple types of prior information, representing different characteristics of drugs and diseases, to identify promising drug–disease associations. In this study, we propose overlap matrix completion (OMC) for bilayer networks (OMC2) and tri-layer networks (OMC3) to predict potential drug-associated indications. OMC efficiently exploits the underlying low-rank structures of the drug–disease association matrices. In OMC2, we first construct one bilayer network from the drug side and one from the disease side, and obtain their corresponding block adjacency matrices. We then propose the OMC2 algorithm to fill in the missing entries of these two adjacency matrices and predict the scores of unknown drug–disease pairs. We further extend OMC2 to OMC3 to handle tri-layer networks. Computational experiments on various datasets indicate that our OMC methods can effectively predict potential drug–disease associations. Compared with other state-of-the-art approaches, our methods yield higher prediction accuracy in 10-fold cross-validation and de novo experiments. In addition, case studies confirm the effectiveness of our methods in identifying promising indications for existing drugs in practical applications.

Author summary

This work introduces a computational approach, overlap matrix completion (OMC), to predict potential associations between drugs and diseases. The novelty of OMC lies in an efficient framework for incorporating multiple types of prior information in bilayer and tri-layer networks. OMC for bilayer networks (OMC2) approximates the low-rank structures of the drug–disease association matrices from both the drug side and the disease side. We further improve prediction accuracy by extending OMC to tri-layer networks and developing the corresponding algorithm (OMC3). To evaluate the performance of OMC2 and OMC3, we conduct 10-fold cross-validation and de novo experiments on three datasets. Our computational results demonstrate that both OMC2 and OMC3 generally outperform five state-of-the-art methods in terms of ROC curves, PR curves, and top-ranked predictions.

Introduction

The development of new drugs is extremely time-consuming and expensive [1]. The average time to develop a new drug is reported to exceed 13.5 years, at a cost of more than $1.8 billion [2], while only a relatively small number of novel drugs are approved by the US Food and Drug Administration (FDA) each year. Identifying new uses for existing drugs, known as drug repositioning, has been widely adopted by the pharmaceutical industry and the research community. Because existing drugs already have safety, efficacy, and tolerability data from numerous experiments and clinical trials, identifying new and reliable indications for commercialized drugs can sharply reduce time and costs. In addition, some successfully repositioned drugs, such as raloxifene, sildenafil, and thalidomide, have generated substantial revenue for their patent holders. Hence, drug repositioning is an important drug-discovery strategy in the pharmaceutical industry.

Computational methods for drug repositioning have recently received much attention, as traditional experimental investigation is complicated and inefficient. Many types of computational approaches have been proposed, including semantic inference, network-based analysis, and machine learning. Network-based methods are among the most widely used approaches for identifying potential drug–disease associations. Based on the guilt-by-association principle, Wang et al. constructed a heterogeneous graph of drugs and targets and proposed the HGBI (Heterogeneous Graph Based Inference) algorithm to predict potential drug–target interactions [3]. HGBI has also been used to predict drug–disease associations [4]. Based on the propagation flow algorithm, Martinez et al. proposed a network-based prioritization method named DrugNet for drug repositioning [5]. DrugNet can perform both disease–drug and drug–disease prioritization by integrating drug, disease, and target information. In [6], the MBiRW method addressed drug repositioning by applying a bi-random walk algorithm on a heterogeneous network with comprehensive similarity measures for drugs and diseases, obtained by using a logistic function [7] and ClusterONE [8].

Machine learning methods have also attracted considerable attention in recent years. Based on the common assumption that similar drugs tend to be associated with similar diseases, Gottlieb et al. calculated five drug–drug similarity measures and two disease–disease similarity measures and presented a method (PREDICT) to identify potential indications for approved drugs [9]. Integrating chemical structure, drug–target interaction, and side-effect data, Wang et al. presented an approach called PreDR for drug–disease association prediction [10]. PreDR treats the prediction as a binary classification problem by defining a kernel function and applying an SVM-based learning algorithm. In [11], a matrix factorization model was developed to predict new indications for known drugs by incorporating the gene interaction network. Luo et al. proposed a drug repositioning recommendation system (DRRS) [12], in which a heterogeneous network is constructed by integrating drug similarities, disease similarities, and drug–disease associations, and the adjacency matrix of this large-scale heterogeneous network is treated as a low-rank matrix. The singular value thresholding (SVT) algorithm [13] is then used to complete the missing entries of the drug–disease association matrix. Yang et al. further proposed a bounded nuclear norm regularization (BNNR) model [14], which tolerates noisy drug and disease similarities through regularization and ensures that all predicted values lie in the interval [0, 1]. However, the computational cost of both DRRS and BNNR increases sharply when target (protein/gene) information is incorporated into the heterogeneous drug–disease network.

In this study, we propose overlap matrix completion for bilayer networks (OMC2) and tri-layer networks (OMC3) to predict potential indications for approved and new drugs. Instead of constructing a single large-scale heterogeneous drug–disease network, we design two separate networks, one from the drug side and one from the disease side, which can significantly reduce the computational complexity of matrix completion. The BNNR model [14] developed in our previous work is used to fill in the missing entries of the block adjacency matrices of these networks. We evaluate the performance of OMC2 and OMC3 on three different datasets and compare them with five recent drug-repositioning approaches. Our computational results show that our methods achieve better accuracy in predicting potential drug–disease associations.

Materials and methods

In this section, we introduce OMC for bilayer networks (OMC2) and tri-layer networks (OMC3) to identify potential indications for both known and novel drugs. We first describe the experimental datasets. The bilayer heterogeneous networks integrate drug–drug, disease–disease, and drug–disease information; the tri-layer heterogeneous networks additionally include drug–protein and disease–protein associations. We then present the OMC2 algorithm for drug–disease bilayer networks to predict novel drug–disease associations. Finally, we extend OMC2 to the OMC3 algorithm, which handles tri-layer networks that also incorporate target-related information.

Datasets

To construct the bilayer and tri-layer networks, we collected drug, disease, and target protein information from the published literature and authoritative databases. The procedures for collecting association information and computing similarities are described below.

Drug–disease associations. Confirmed drug–disease associations were obtained from the supplementary material of [9], which is widely regarded as the gold standard dataset. It contains 1,933 associations between 593 drugs registered in DrugBank [15] and 313 diseases listed in the Online Mendelian Inheritance in Man (OMIM) database [16].

Drug–drug similarity. Drug–drug similarities were calculated from chemical structures. The canonical Simplified Molecular Input Line Entry System (SMILES) [17] strings of the 593 drugs were downloaded from DrugBank. The Chemistry Development Kit (CDK) [18] was then used to compute hashed fingerprints for each drug with default parameters. Finally, the similarity between two drugs was measured by the Tanimoto score [19], which lies in the range [0, 1].
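The Tanimoto score of two binary fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below is a minimal illustration, assuming each fingerprint is represented as a Python set of on-bit indices (such as a hashed fingerprinter like CDK's would produce); the function names and toy fingerprints are hypothetical, not real drugs.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto score of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Assumed convention: two empty fingerprints are treated as dissimilar.
        return 0.0
    return len(fp_a & fp_b) / union


def similarity_matrix(fps: list[set[int]]) -> list[list[float]]:
    """Symmetric m x m drug similarity matrix A_RR with values in [0, 1]."""
    m = len(fps)
    return [[tanimoto(fps[i], fps[j]) for j in range(m)] for i in range(m)]


# Toy fingerprints (hypothetical on-bit indices).
fps = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3}]
A_RR = similarity_matrix(fps)
print(A_RR[0][1])  # 2 shared bits / 5 bits in the union = 0.4
```

Every drug has similarity 1 with itself, so the diagonal of the resulting matrix is all ones.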

Disease–disease similarity. Disease–disease similarities were computed by MimMiner [20], which measures the similarity between two diseases based on the occurrence of MeSH (Medical Subject Headings) vocabulary terms in their medical descriptions in the OMIM database. MimMiner normalizes the disease–disease similarity to the interval [0, 1].

Drug–protein interactions. Interactions between drugs and proteins were collected from DrugBank, yielding 3,184 drug–target (protein) interactions between 576 drugs of the gold standard dataset and 975 proteins.

Disease–protein associations. We collected disease–protein associations in two steps. First, we downloaded from CTD [21] the interactions between genes and the diseases in the gold standard dataset, obtaining a total of 475 disease–gene interactions. Second, these genes were mapped to 849 proteins in the UniProtKB database [22]. This yielded 1,066 associations between 166 diseases and 849 proteins.

OMC algorithm for bilayer networks

Two drug–disease bilayer networks and corresponding adjacency matrices

We construct two heterogeneous drug–disease bilayer networks. One is composed of a drug–drug network and a drug–disease network, and the other of a disease–disease network and a drug–disease network. Fig 1 shows the workflow for constructing the two bilayer networks and their corresponding block adjacency matrices.

Fig 1. The workflow of constructing the DrNet-Dis network and the DisNet-Dr network.


(a) Drug–drug network and its similarity matrix. (b) Drug–disease associations and KNN preprocessing. (c) Disease–disease network and its similarity matrix. (d) DrNet-Dis network and its block adjacency matrix. (e) DisNet-Dr network and its block adjacency matrix.

For the drug–drug network with m drug nodes, let $A_{RR} \in \mathbb{R}^{m\times m}$ be its adjacency matrix, where the element $(A_{RR})_{ij}$ represents the similarity between drugs $r_i$ and $r_j$. Similarly, $A_{DD} \in \mathbb{R}^{n\times n}$ is the adjacency matrix of the disease–disease network with n disease nodes, where $(A_{DD})_{ij}$ denotes the similarity between diseases $d_i$ and $d_j$. For the drug–disease network, let $A_{DR} \in \mathbb{R}^{n\times m}$ be its adjacency matrix (the drug–disease association matrix), where $(A_{DR})_{ij}$ is set to 1 if there is an experimentally validated association between $d_i$ and $r_j$, and 0 otherwise.

DrNet-Dis network. The DrNet-Dis network, illustrated in Fig 1(a), 1(b) and 1(d), is constructed by integrating the drug–drug network and the drug–disease network. For generality, we also account for novel disease nodes that are not associated with any known drug. For instance, $d_4$ is a new disease node in Fig 1(b); the corresponding row of $A_{DR}$ is a zero vector, which hinders matrix completion and degrades prediction performance. To address this cold-start problem, we apply a K-nearest neighbor (KNN) preprocessing step to these new diseases. Specifically, for each novel disease $d_p$, we select the K diseases most similar to $d_p$ (in descending order of disease similarity) and fill the row of $d_p$ in the drug–disease association matrix with a similarity-weighted average of these neighbors' association rows. The KNN preprocessing algorithm is detailed in Algorithm 1. After the KNN preprocessing step, we obtain an updated drug–disease association matrix $A_{DR_1}$, and the block adjacency matrix $M_1 \in \mathbb{R}^{(m+n)\times m}$ of the DrNet-Dis network is defined as follows,

$$M_1 = \begin{bmatrix} A_{RR} \\ A_{DR_1} \end{bmatrix}.$$

DisNet-Dr network. The DisNet-Dr network, illustrated in Fig 1(b), 1(c) and 1(e), is constructed by integrating the disease–disease network and the drug–disease network. For novel drugs (e.g., drug $r_2$ in Fig 1(b)), the corresponding columns of $A_{DR}$ are zero vectors. The KNN preprocessing step in Algorithm 1 is likewise applied to these new drugs, yielding an updated association matrix $A_{DR_2}$. Finally, the block adjacency matrix $M_2 \in \mathbb{R}^{n\times(m+n)}$ of the DisNet-Dr network is defined as follows,

$$M_2 = \begin{bmatrix} A_{DR_2} & A_{DD} \end{bmatrix}.$$

The KNN preprocessing step is unnecessary if there are no novel disease or drug nodes. $M_1$ and $M_2$ are the matrices to be completed.

Algorithm 1: KNN Preprocessing Algorithm

Input: The drug similarity matrix $A_{RR} \in \mathbb{R}^{m\times m}$, the disease similarity matrix $A_{DD} \in \mathbb{R}^{n\times n}$, the disease–drug association matrix $A_{DR} \in \mathbb{R}^{n\times m}$ (which may contain zero rows or columns), and the neighborhood size K.

Output: Updated $A_{DR_1}$ and $A_{DR_2}$.

1. Initialize $A_{DR_1} = A_{DR}$ and $A_{DR_2} = A_{DR}$;

2. Find the indices of all zero rows of $A_{DR_1}$, denoted $\{i_1, i_2, \ldots, i_s\} \subset \{1, 2, \ldots, n\}$. $D_0 = \{d_{i_1}, d_{i_2}, \ldots, d_{i_s}\}$ is the corresponding disease set. /* The elements of $D_0$ are the novel diseases, where $d_{i_1}$ denotes the $i_1$-th disease. */

for each disease $d_p \in D_0$ do

3. $U = \mathrm{KNN}(A_{DD}, K, d_p)$; /* KNN returns the K nearest neighbors of disease node $d_p$ according to the similarity matrix $A_{DD}$, in descending order of similarity. */

4. $S_d = \sum_{d_u \in U} A_{DD}(d_p, d_u)$;

5. $A_{DR_1}(p,:) = \sum_{d_u \in U} \frac{A_{DD}(d_p, d_u)}{S_d} A_{DR}(d_u,:)$;

/* $A_{DR_1}(p,:)$ denotes the p-th row of $A_{DR_1}$, and the denominator $S_d$ is the normalization term. */

end for

6. Find the indices of all zero columns of $A_{DR_2}$, denoted $\{j_1, j_2, \ldots, j_t\} \subset \{1, 2, \ldots, m\}$. $R_0 = \{r_{j_1}, r_{j_2}, \ldots, r_{j_t}\}$ is the corresponding drug set. /* The elements of $R_0$ are the novel drugs, where $r_{j_1}$ denotes the $j_1$-th drug. */

for each drug $r_q \in R_0$ do

7. $V = \mathrm{KNN}(A_{RR}, K, r_q)$; /* KNN returns the K nearest neighbors of drug node $r_q$ according to the similarity matrix $A_{RR}$, in descending order of similarity. */

8. $S_r = \sum_{r_v \in V} A_{RR}(r_q, r_v)$;

9. $A_{DR_2}(:,q) = \sum_{r_v \in V} \frac{A_{RR}(r_q, r_v)}{S_r} A_{DR}(:,r_v)$;

/* $A_{DR_2}(:,q)$ denotes the q-th column of $A_{DR_2}$, and the denominator $S_r$ is the normalization term. */

end for

10. return $A_{DR_1}$ and $A_{DR_2}$.
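Algorithm 1 can be sketched in NumPy as follows. This is an illustration, not the authors' code: the name knn_preprocessing is hypothetical, both updates are done in one call (Algorithm 3 invokes the preprocessing once per side), and two details are assumptions not stated in the source. A node is excluded from its own neighbor list, and a row or column is left unchanged if its neighbors' similarities sum to zero.

```python
import numpy as np


def knn_preprocessing(A_DR, A_DD, A_RR, K):
    """Fill zero rows (novel diseases) and zero columns (novel drugs) of the
    n x m disease-drug matrix with similarity-weighted neighbor profiles."""
    A_DR1 = A_DR.astype(float)  # astype returns a copy
    A_DR2 = A_DR.astype(float)

    # Steps 2-5: novel diseases are rows of A_DR with no known association.
    for p in np.flatnonzero(A_DR.sum(axis=1) == 0):
        sims = A_DD[p].astype(float)
        sims[p] = -np.inf                    # assumption: exclude the disease itself
        U = np.argsort(-sims)[:K]            # K nearest neighbors, descending similarity
        S_d = A_DD[p, U].sum()               # normalization term
        if S_d > 0:
            A_DR1[p] = (A_DD[p, U] / S_d) @ A_DR[U]

    # Steps 6-9: novel drugs are columns of A_DR with no known association.
    for q in np.flatnonzero(A_DR.sum(axis=0) == 0):
        sims = A_RR[q].astype(float)
        sims[q] = -np.inf                    # assumption: exclude the drug itself
        V = np.argsort(-sims)[:K]
        S_r = A_RR[q, V].sum()
        if S_r > 0:
            A_DR2[:, q] = A_DR[:, V] @ (A_RR[q, V] / S_r)

    return A_DR1, A_DR2


A_DD = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.5], [0.2, 0.5, 1.0]])
A_RR = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.3], [0.1, 0.3, 1.0]])
A_DR = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])  # disease 2 and drug 2 are novel
A_DR1, A_DR2 = knn_preprocessing(A_DR, A_DD, A_RR, K=2)
print(A_DR1[2], A_DR2[:, 2])
```

In this toy case, disease 2's nearest neighbors are diseases 1 and 0 (similarities 0.5 and 0.2), so its filled row is [0.2/0.7, 0.5/0.7, 0]. Drug 2's neighbors are drugs 1 and 0 (0.3 and 0.1), giving the column [0.25, 0.75, 0].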

BNNR model

Matrix completion, which aims to recover the missing entries of a matrix from a small number of observed entries, has been widely used in many applications. Under the low-rank assumption, matrix completion is generally formulated as the following nuclear norm minimization problem:

$$\min_X \|X\|_* \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M), \tag{1}$$

where $\|X\|_*$ denotes the nuclear norm of X, defined as the sum of all singular values of X; M is the incomplete matrix; Ω is the set of index pairs (i, j) of all known entries of M; and $P_\Omega$ is the projection operator that projects X onto Ω, defined as

$$(P_\Omega(X))_{ij} = \begin{cases} X_{ij}, & (i,j) \in \Omega, \\ 0, & (i,j) \notin \Omega. \end{cases}$$

In the drug–disease association matrix, an entry value of 1 denotes an experimentally validated indication, while 0 indicates that the association has not yet been validated. The predicted drug–disease association values are therefore expected to fall in the interval [0, 1], indicating the likelihood of a true association; a predicted value outside [0, 1] is meaningless in this application. To enforce predicted values within [0, 1], a bounded constraint is added to the matrix completion model. In addition, because the computed drug and disease similarities contain considerable noise, we relax the equality constraint by incorporating a regularization term. This yields the bounded nuclear norm regularization (BNNR) model proposed in [14]:

$$\min_X \|X\|_* + \frac{\alpha}{2}\|P_\Omega(X) - P_\Omega(M)\|_F^2 \quad \text{s.t.} \quad 0 \le X \le 1, \tag{2}$$

where α > 0 is a parameter that balances the nuclear norm and the error term, and 0 ≤ X ≤ 1 means $0 \le X_{ij} \le 1$ for all i, j. A simple and effective algorithm based on the alternating direction method of multipliers (ADMM) is designed to solve model (2). By introducing an auxiliary splitting matrix W, (2) can be rewritten in the following equivalent form:

$$\min_{X,W} \|X\|_* + \frac{\alpha}{2}\|P_\Omega(W) - P_\Omega(M)\|_F^2 \quad \text{s.t.} \quad X = W,\; 0 \le W \le 1. \tag{3}$$

The augmented Lagrangian function of model (3) is

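$$L(W, X, Y, \alpha, \beta) = \|X\|_* + \frac{\alpha}{2}\|P_\Omega(W) - P_\Omega(M)\|_F^2 + \operatorname{Tr}\big(Y^T (X - W)\big) + \frac{\beta}{2}\|X - W\|_F^2, \tag{4}$$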

where Y is the Lagrange multiplier and β > 0 is the penalty parameter. By applying ADMM, we can obtain the following iterative scheme:

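$$W_{k+1} = \arg\min_{0 \le W \le 1} L(W, X_k, Y_k, \alpha, \beta), \tag{5}$$
$$X_{k+1} = \arg\min_{X} L(W_{k+1}, X, Y_k, \alpha, \beta), \tag{6}$$
$$Y_{k+1} = Y_k + \beta (X_{k+1} - W_{k+1}). \tag{7}$$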

We use the inverse operator [23] to solve Eq (5) and acquire a closed-form solution W* as follows,

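$$W^* = \Big(I - \frac{\alpha}{\alpha+\beta} P_\Omega\Big)\Big(\frac{1}{\beta} Y_k + \frac{\alpha}{\beta} P_\Omega(M) + X_k\Big),$$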

where I denotes the identity operator. To keep the entries of $W_{k+1}$ within [0, 1], we then apply the following projection operator:

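$$W_{k+1} = Q_{[0,1]}(W^*), \tag{8}$$

where $Q_{[0,1]}$ is defined as

$$(Q_{[0,1]}(W^*))_{ij} = \begin{cases} 1, & W^*_{ij} > 1, \\ W^*_{ij}, & 0 \le W^*_{ij} \le 1, \\ 0, & W^*_{ij} < 0. \end{cases}$$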
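The closed form for W* and the role of the projection can be checked directly. The short derivation below was added for illustration. It ignores the box constraint and sets the gradient of (4) with respect to W to zero, using the fact that the gradient of $\operatorname{Tr}(Y_k^T(X_k - W))$ in W is $-Y_k$:

```latex
\begin{aligned}
\nabla_W L &= \alpha P_\Omega(W - M) - Y_k - \beta\,(X_k - W) = 0 \\
\Longrightarrow\;\; \alpha P_\Omega(W) + \beta W &= \alpha P_\Omega(M) + Y_k + \beta X_k, \\
W_{ij} &= \frac{\alpha M_{ij} + (Y_k)_{ij} + \beta (X_k)_{ij}}{\alpha + \beta}, \qquad (i,j) \in \Omega, \\
W_{ij} &= \frac{1}{\beta}(Y_k)_{ij} + (X_k)_{ij}, \qquad (i,j) \notin \Omega.
\end{aligned}
```

Both cases agree with W*: off Ω the operator $I - \frac{\alpha}{\alpha+\beta}P_\Omega$ acts as the identity, and on Ω it multiplies $\frac{1}{\beta}Y_k + \frac{\alpha}{\beta}M + X_k$ by $\frac{\beta}{\alpha+\beta}$, which gives the first case. Because the objective in (5) separates into one convex quadratic per entry, clipping each unconstrained minimizer to [0, 1] yields the exact constrained minimizer. This is why the projection (8) solves (5).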

By rearranging the terms of (6), we have

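$$X_{k+1} = \arg\min_X \|X\|_* + \frac{\beta}{2}\Big\|X - \Big(W_{k+1} - \frac{1}{\beta}Y_k\Big)\Big\|_F^2 = \mathcal{D}_{1/\beta}\Big(W_{k+1} - \frac{1}{\beta}Y_k\Big), \tag{9}$$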

where $\mathcal{D}_\tau(\cdot)$ is the singular value shrinkage (SVT) operator [13] [24], defined as

$$\mathcal{D}_\tau(X) = \sum_{i:\,\sigma_i > \tau} (\sigma_i - \tau)\, u_i v_i^T,$$

where $\sigma_i$ is the i-th singular value of X that is larger than the threshold τ, and $u_i$ and $v_i$ are the corresponding left and right singular vectors, respectively.
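The shrinkage operator maps directly onto a thin singular value decomposition. The minimal NumPy sketch below uses an illustrative function name, svt. Singular values at or below τ are set to zero, and the remaining ones are reduced by τ.

```python
import numpy as np


def svt(X: np.ndarray, tau: float) -> np.ndarray:
    """Singular value shrinkage D_tau(X): soft-threshold the singular values of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)  # singular values <= tau contribute nothing
    return (U * s_shrunk) @ Vt           # equivalent to U @ diag(s_shrunk) @ Vt


X = np.diag([3.0, 1.0, 0.5])
D = svt(X, 1.0)
```

Here `D` equals diag(2, 0, 0). The singular value 3 shrinks to 2, while 1 and 0.5 are zeroed, so the result has rank 1. This rank-reducing effect is what drives the completed matrix toward a low-rank structure.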

Algorithm 2 presents the iterative BNNR scheme for solving model (2). After running the BNNR algorithm, we obtain a completed matrix $M^*$ in which all unknown entries of M have been filled in.

Algorithm 2: BNNR Algorithm

Input: The matrix M to be completed and the parameters α and β.

Output: Completed matrix M*.

1. Initialize $X_1 = P_\Omega(M)$, $W_1 = X_1$, $Y_1 = X_1$;

2. k ← 1;

repeat

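3. $W_{k+1} \leftarrow Q_{[0,1]}(W^*)$;

4. $X_{k+1} \leftarrow \mathcal{D}_{1/\beta}\big(W_{k+1} - \frac{1}{\beta}Y_k\big)$;

5. $Y_{k+1} \leftarrow Y_k + \beta(X_{k+1} - W_{k+1})$;

6. $k \leftarrow k + 1$;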

until convergence

7. $M^* = W_k$;

8. return M*.
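The following is a minimal NumPy sketch of Algorithm 2. The observed set Ω is passed as a boolean mask. The source does not specify the convergence test, so the relative-change rule on X, the tolerance, and the iteration cap are assumptions. The function and parameter names are illustrative. The defaults α = 1 and β = 10 match the values selected later in Parameter settings.

```python
import numpy as np


def bnnr(M, mask, alpha=1.0, beta=10.0, tol=1e-4, max_iter=300):
    """ADMM iterations of Algorithm 2 (illustrative sketch).

    M    : matrix to be completed; entries outside mask are ignored
    mask : boolean array, True on the observed index set Omega
    Assumed stopping rule: relative change of X below tol (guarded against a
    zero norm), or max_iter iterations.
    """
    PM = np.where(mask, M, 0.0)              # P_Omega(M)
    X = PM.copy()                            # step 1: X_1 = P_Omega(M)
    W = X.copy()                             #         W_1 = X_1
    Y = X.copy()                             #         Y_1 = X_1
    for _ in range(max_iter):
        # Step 3: closed-form minimizer W* of (5), then projection (8) onto [0, 1].
        Z = Y / beta + (alpha / beta) * PM + X
        W_star = np.where(mask, Z * beta / (alpha + beta), Z)
        W = np.clip(W_star, 0.0, 1.0)
        # Step 4: singular value shrinkage D_{1/beta}(W_{k+1} - Y_k / beta), Eq (9).
        U, s, Vt = np.linalg.svd(W - Y / beta, full_matrices=False)
        X_new = (U * np.maximum(s - 1.0 / beta, 0.0)) @ Vt
        # Step 5: multiplier update, Eq (7).
        Y = Y + beta * (X_new - W)
        converged = np.linalg.norm(X_new - X) <= tol * max(np.linalg.norm(X), 1.0)
        X = X_new
        if converged:
            break
    return W                                 # step 7: M* = W_k


rng = np.random.default_rng(1)
M = np.outer(rng.uniform(0.2, 1.0, 8), rng.uniform(0.2, 1.0, 6))  # toy rank-1 matrix
mask = rng.random(M.shape) > 0.2                                  # roughly 80% observed
M_star = bnnr(M, mask)
```

The function returns W rather than X, as in step 7. Because W is the clipped iterate, every entry of `M_star` is guaranteed to lie in [0, 1], which is the property the bounded constraint is meant to provide.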

OMC2 algorithm

We propose the OMC algorithm for bilayer networks (OMC2) to predict potential drug–disease associations; its goal is to recover low-rank matrices of drug–disease relationships from both drug-side and disease-side information. First, from the drug side, we combine the updated disease–drug association matrix with the drug similarity matrix to create the block adjacency matrix $M_1$, as illustrated in Fig 1(d). Meanwhile, from the disease side, we combine the updated disease–drug association matrix with the disease similarity matrix to generate the block adjacency matrix $M_2$, as illustrated in Fig 1(e). Second, the BNNR algorithm is applied to fill in the unknown entries of $M_1$ and $M_2$. Finally, we average the two predicted drug–disease association matrices to obtain the final predicted matrix $A_{DR}^*$. Each element $(A_{DR}^*)_{ij}$ represents the predicted score between disease $d_i$ and drug $r_j$; the higher the score, the more likely the association exists. To identify promising candidate indications for a specific drug, we rank all candidate diseases by their scores in descending order. The OMC2 algorithm is detailed in Algorithm 3.

Algorithm 3: OMC2 Algorithm

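Input: The drug similarity matrix $A_{RR} \in \mathbb{R}^{m\times m}$, the disease similarity matrix $A_{DD} \in \mathbb{R}^{n\times n}$, the disease–drug association matrix $A_{DR} \in \mathbb{R}^{n\times m}$, and parameters K, α, and β.

Output: Predicted association matrix $A_{DR}^*$.

1. $A_{DR_1} \leftarrow \mathrm{KNNpreprocessing}(A_{DR}, A_{DD}, K)$;

2. $M_1 = \begin{bmatrix} A_{RR} \\ A_{DR_1} \end{bmatrix}$;

3. $A_{DR_2} \leftarrow \mathrm{KNNpreprocessing}(A_{DR}, A_{RR}, K)$;

4. $M_2 = \begin{bmatrix} A_{DR_2} & A_{DD} \end{bmatrix}$;

5. $\begin{bmatrix} A_{RR}^* \\ A_{DR_1}^* \end{bmatrix} \leftarrow \mathrm{BNNR}(M_1, \alpha, \beta)$;

6. $\begin{bmatrix} A_{DR_2}^* & A_{DD}^* \end{bmatrix} \leftarrow \mathrm{BNNR}(M_2, \alpha, \beta)$;

7. $A_{DR}^* = \frac{A_{DR_1}^* + A_{DR_2}^*}{2}$;

8. return $A_{DR}^*$.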
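The overlap step of Algorithm 3 can be sketched in NumPy with the completion routine passed in as a function, so that any implementation of BNNR can be plugged in. Two assumptions are labeled in the code. First, the observed set Ω is taken to be the nonzero entries of each block matrix; the source only says that Ω indexes the known entries. Second, the demonstration uses a deliberately trivial stand-in completer, mean_fill, which is hypothetical and not BNNR. It is included only so that the block slicing and averaging are easy to check by hand.

```python
from typing import Callable

import numpy as np

# A completion routine takes (M, mask) and returns a completed copy of M.
Completer = Callable[[np.ndarray, np.ndarray], np.ndarray]


def omc2(A_RR, A_DD, A_DR1, A_DR2, complete: Completer) -> np.ndarray:
    """Steps 2 and 4-7 of Algorithm 3, given KNN-preprocessed A_DR1 and A_DR2."""
    m = A_RR.shape[0]
    M1 = np.vstack([A_RR, A_DR1])          # DrNet-Dis block matrix, (m + n) x m
    M2 = np.hstack([A_DR2, A_DD])          # DisNet-Dr block matrix, n x (m + n)
    # Assumption: nonzero entries form the observed set Omega; zeros are unknown.
    M1_star = complete(M1, M1 != 0)
    M2_star = complete(M2, M2 != 0)
    A_DR1_star = M1_star[m:, :]            # lower block of the completed M1
    A_DR2_star = M2_star[:, :m]            # left block of the completed M2
    return (A_DR1_star + A_DR2_star) / 2   # step 7: average the two predictions


def mean_fill(M, mask):
    """Hypothetical stand-in (NOT BNNR): fill unknown entries with the mean observed value."""
    return np.where(mask, M, M[mask].mean())


A_RR = np.array([[1.0, 0.5], [0.5, 1.0]])  # 2 drugs
A_DD = np.array([[1.0, 0.4], [0.4, 1.0]])  # 2 diseases
A_DR = np.array([[1.0, 0.0], [0.0, 1.0]])  # no novel nodes, so A_DR1 = A_DR2 = A_DR
pred = omc2(A_RR, A_DD, A_DR, A_DR, mean_fill)
```

With the stand-in completer, the drug side fills each unknown pair with 5/6 (the mean of the six observed entries of $M_1$). The disease side fills it with 0.8 (the mean for $M_2$). The averaged prediction for each unknown pair is therefore (5/6 + 0.8)/2. The same disease–drug pair receives two independent estimates, one from each network, and their average is the final score.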

OMC algorithm for tri-layer networks

OMC can be readily extended from bilayer networks (OMC2) to tri-layer networks (OMC3) by incorporating disease–protein and drug–protein association information to further improve prediction accuracy. First, we collect drug–protein (target) interactions and disease–protein associations from different databases, as described in the Datasets subsection. Second, based on the two bilayer networks, i.e., the DrNet-Dis network and the DisNet-Dr network, we design two corresponding tri-layer networks. We integrate protein nodes and drug–protein associations into the DrNet-Dis network to construct a drug–protein–disease network called DrNet-Pro-Dis, as shown in Fig 2(e). The block adjacency matrix of this tri-layer network is defined as

$$M_1 = \begin{bmatrix} A_{RR} \\ A_{PR} \\ A_{DR_1} \end{bmatrix}.$$

Similarly, we integrate protein nodes and disease–protein associations into the DisNet-Dr network and create another tri-layer network called DisNet-Pro-Dr, as illustrated in Fig 2(f). The block adjacency matrix of DisNet-Pro-Dr network is defined as

$$M_2 = \begin{bmatrix} A_{DR_2} & A_{DP} & A_{DD} \end{bmatrix}.$$

Third, the BNNR algorithm is applied to fill in the missing entries of $M_1$ and $M_2$, yielding two predicted drug–disease association matrices. Finally, we average these two matrices to obtain the final output. The OMC3 algorithm is detailed in Algorithm 4.

Fig 2. The workflow of constructing the DrNet-Pro-Dis network and the DisNet-Pro-Dr network.


(a) DrNet-Dis network and its similarity matrix. (b) Drug–protein interactions and corresponding adjacency matrix. (c) Disease–protein associations and corresponding adjacency matrix. (d) DisNet-Dr network and its block adjacency matrix. (e) DrNet-Pro-Dis network and its block adjacency matrix. (f) DisNet-Pro-Dr network and its block adjacency matrix.

Algorithm 4: OMC3 Algorithm

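Input: Drug similarity matrix $A_{RR} \in \mathbb{R}^{m\times m}$, disease similarity matrix $A_{DD} \in \mathbb{R}^{n\times n}$, protein–drug association matrix $A_{PR} \in \mathbb{R}^{s\times m}$, disease–protein association matrix $A_{DP} \in \mathbb{R}^{n\times t}$, disease–drug association matrix $A_{DR} \in \mathbb{R}^{n\times m}$, and parameters K, α, and β.

Output: Predicted association matrix $A_{DR}^*$.

1. $A_{DR_1} \leftarrow \mathrm{KNNpreprocessing}(A_{DR}, A_{DD}, K)$;

2. $M_1 = \begin{bmatrix} A_{RR} \\ A_{PR} \\ A_{DR_1} \end{bmatrix}$;

3. $A_{DR_2} \leftarrow \mathrm{KNNpreprocessing}(A_{DR}, A_{RR}, K)$;

4. $M_2 = \begin{bmatrix} A_{DR_2} & A_{DP} & A_{DD} \end{bmatrix}$;

5. $\begin{bmatrix} A_{RR}^* \\ A_{PR}^* \\ A_{DR_1}^* \end{bmatrix} \leftarrow \mathrm{BNNR}(M_1, \alpha, \beta)$;

6. $\begin{bmatrix} A_{DR_2}^* & A_{DP}^* & A_{DD}^* \end{bmatrix} \leftarrow \mathrm{BNNR}(M_2, \alpha, \beta)$;

7. $A_{DR}^* = \frac{A_{DR_1}^* + A_{DR_2}^*}{2}$;

8. return $A_{DR}^*$.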
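Compared with OMC2, the only new bookkeeping in OMC3 is the position of the drug–disease blocks inside the enlarged matrices. In $M_1$ they sit below the m drug rows and the s protein rows; in $M_2$ they occupy the first m columns. The NumPy sketch below uses hypothetical function names and toy random inputs to check these offsets. The BNNR completion of steps 5 and 6 is deliberately skipped, so passing the uncompleted matrices to the averaging step should simply return $(A_{DR_1} + A_{DR_2})/2$.

```python
import numpy as np


def omc3_blocks(A_RR, A_PR, A_DD, A_DP, A_DR1, A_DR2):
    """Steps 2 and 4 of Algorithm 4: assemble the two tri-layer block matrices."""
    M1 = np.vstack([A_RR, A_PR, A_DR1])  # DrNet-Pro-Dis: (m + s + n) x m
    M2 = np.hstack([A_DR2, A_DP, A_DD])  # DisNet-Pro-Dr: n x (m + t + n)
    return M1, M2


def omc3_average(M1_star, M2_star, m, s):
    """Step 7 of Algorithm 4: average the drug-disease blocks of the completed matrices."""
    A_DR1_star = M1_star[m + s:, :]  # rows below the drug (m) and protein (s) blocks
    A_DR2_star = M2_star[:, :m]      # the first m columns belong to drugs
    return (A_DR1_star + A_DR2_star) / 2


# Toy sizes: m drugs, n diseases, s and t proteins (illustrative values only).
rng = np.random.default_rng(0)
m, n, s, t = 3, 2, 4, 5
A_RR, A_DD = rng.random((m, m)), rng.random((n, n))
A_PR = (rng.random((s, m)) < 0.5).astype(float)
A_DP = (rng.random((n, t)) < 0.5).astype(float)
A_DR1 = (rng.random((n, m)) < 0.5).astype(float)
A_DR2 = (rng.random((n, m)) < 0.5).astype(float)
M1, M2 = omc3_blocks(A_RR, A_PR, A_DD, A_DP, A_DR1, A_DR2)
print(M1.shape, M2.shape)  # (9, 3) (2, 10)
```

Appending the protein block grows $M_1$ by s rows and $M_2$ by t columns. This growth is the source of the extra computational cost discussed for tri-layer networks.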

Results

In this section, we systematically evaluate the performance of our proposed methods (OMC2 and OMC3) in predicting drug-associated indications. We first introduce the evaluation metrics and discuss the parameter settings. To compare our methods with several state-of-the-art approaches, we perform 10-fold cross-validation and de novo tests on the gold standard dataset. Case studies are conducted to confirm the reliability of OMC3 in practical applications. We then discuss the effectiveness of OMC and compare the methods on bilayer and tri-layer networks. Finally, we perform the same experiments on two other datasets to further illustrate the effectiveness and robustness of OMC2 and OMC3.

Evaluation metrics

To evaluate the performance of our approaches, we conduct 10-fold cross-validation experiments to identify candidate diseases for specific drugs. In the gold standard dataset, all approved drug–disease associations are randomly divided into ten parts of approximately equal size. Each part is used in turn as the testing set, and the remaining nine parts form the training set. To obtain reliable results, the 10-fold cross-validation is repeated 10 times and the results are averaged. After prediction, all candidate diseases for each test drug are ranked by their predicted scores in descending order. For a given rank threshold, a test disease (one whose association with the drug was held out) is counted as a true positive (TP) if it is ranked above the threshold and as a false negative (FN) otherwise. Conversely, a candidate disease with no known association with the test drug is counted as a false positive (FP) if it is ranked above the threshold and as a true negative (TN) otherwise. By varying the rank threshold, we calculate the true positive rate (TPR) and false positive rate (FPR) as

$$\mathrm{TPR} = \frac{\#\mathrm{TP}}{\#\mathrm{TP} + \#\mathrm{FN}}, \qquad \mathrm{FPR} = \frac{\#\mathrm{FP}}{\#\mathrm{FP} + \#\mathrm{TN}},$$

and draw the receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) is used to evaluate the overall performance of a method. Precision and recall (equivalent to TPR) can also be computed to plot the precision-recall (PR) curve [25]. Because top-ranked predictions matter most in real-life drug-repositioning applications, we also report the number of correctly retrieved associations under different top-ranking thresholds.
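The rank-threshold evaluation can be sketched in NumPy as follows. The source does not spell out several details, so they are assumptions here: counts are pooled over all test drugs at each rank threshold k, the AUC is computed with the trapezoidal rule, and training associations are excluded from the ranked candidates. All function names are hypothetical.

```python
import numpy as np


def rank_candidates(scores, cand_mask):
    """Rank each drug's candidate diseases by score (rank 1 = highest score).

    scores    : n x m predicted disease-drug scores
    cand_mask : True for ranked pairs (held-out test pairs plus unknown pairs);
                training associations should be False. Non-candidates get rank n + 1.
    """
    n, m = scores.shape
    ranks = np.full((n, m), n + 1)
    for j in range(m):
        cand = np.flatnonzero(cand_mask[:, j])
        order = cand[np.argsort(-scores[cand, j], kind="stable")]
        ranks[order, j] = np.arange(1, len(order) + 1)
    return ranks


def roc_auc(ranks, test_mask, cand_mask):
    """TPR and FPR at every rank threshold k (counts pooled over drugs) and trapezoidal AUC."""
    n = ranks.shape[0]
    pos = test_mask & cand_mask    # held-out true associations
    neg = cand_mask & ~test_mask   # candidates with no known association
    tpr = [0.0] + [np.sum(ranks[pos] <= k) / pos.sum() for k in range(1, n + 1)]
    fpr = [0.0] + [np.sum(ranks[neg] <= k) / neg.sum() for k in range(1, n + 1)]
    tpr, fpr = np.array(tpr), np.array(fpr)
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    return fpr, tpr, auc


def top_k_hits(ranks, test_mask, k):
    """Number of held-out associations ranked within the top k for their drug."""
    return int(np.sum(ranks[test_mask] <= k))


# One drug, four diseases; disease 2 is the held-out association (toy numbers).
scores = np.array([[0.9], [0.1], [0.8], [0.3]])
test_mask = np.array([[False], [False], [True], [False]])
cand_mask = np.ones_like(test_mask)
ranks = rank_candidates(scores, cand_mask)
fpr, tpr, auc = roc_auc(ranks, test_mask, cand_mask)
```

In this toy case, the held-out disease is ranked second, and one of the three negatives outranks it. The AUC is therefore 2/3. The held-out association is not retrieved in the top 1, but it is retrieved in the top 2.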

Parameter settings

In the OMC2 and OMC3 algorithms, three hyperparameters must be determined: α, β, and K. In this subsection, we use OMC2 as an example to explain how these parameters are determined; the same procedure applies to OMC3.

For α and β, we use grid search with 10-fold cross-validation to find the most appropriate values, choosing each from {0.1, 1, 10, 100}. With the neighborhood size K fixed to 1, S1 Table shows the AUC values of OMC2 for different values of α and β on the gold standard dataset. The best performance is achieved with α = 1 and β = 10.

For K, we first set α = 1 and β = 10 and then use cross-validation to select K from {1, 5, 10, 15, 20, 25, 30}. S1 Fig shows the AUC values of OMC2 under this setting; the best AUC is achieved when K = 10. Because K has little effect on AUC, for simplicity we use K = 10 as prior knowledge for the other datasets. Moreover, with K fixed to 10, the optimal values of α and β remain 1 and 10, respectively (S2 Table), which further illustrates the stability of these parameter values.

Based on the above analysis, we finally choose α = 1, β = 10, and K = 10 for the gold standard dataset as the default parameters.

Comparison with other methods

To ensure a convincing and fair comparison, OMC2 and OMC3 are compared with five state-of-the-art approaches: BNNR [14], DRRS [12], MBiRW [6], DrugNet [5], and HGBI [3]. The parameters of the compared approaches are set to the default values given in their papers or, where no defaults are provided, to the best values found by grid search. We rank the predicted indications and plot ROC and PR curves to analyze the 10-fold cross-validation results.

As shown in Fig 3, OMC2 and OMC3 outperform the other methods in terms of ROC curves, PR curves, and top-ranked results. More specifically, OMC2 and OMC3 obtain AUC values of 0.939 and 0.945, while BNNR, DRRS, MBiRW, DrugNet, and HGBI yield AUC values of 0.932, 0.930, 0.917, 0.868, and 0.829, respectively. In real-life drug-repositioning applications, researchers particularly care about precision, because precise predictions can significantly reduce experimental cost and time. The PR curves show that OMC3 and OMC2 achieve the best and second-best precisions of 0.461 and 0.449, respectively, while BNNR, DRRS, MBiRW, DrugNet, and HGBI have precisions of 0.440, 0.375, 0.304, 0.192, and 0.130, respectively. Notably, OMC3 successfully prioritizes 46.1% of the true drug–disease associations at the top rank. A true drug–disease association counts as correctly retrieved when its predicted rank is higher than the specified ranking threshold. Fig 3(c) shows the numbers of correct associations retrieved by all methods under different top-ranking thresholds; both OMC2 and OMC3 retrieve more associations than the competing approaches. Specifically, of the 1,933 true drug–disease associations, 1,493 (77.2%) and 1,529 (79.1%) are correctly predicted in the top 10 by OMC2 and OMC3, respectively, compared with 1,475 (76.3%), 1,413 (73.1%), 1,232 (63.7%), 900 (46.6%), and 752 (38.9%) for BNNR, DRRS, MBiRW, DrugNet, and HGBI, respectively.

Fig 3. The performance of all methods for predicting drug–disease associations in the 10-fold cross-validation.


(a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds.

Prediction of potential indications for new drugs

To evaluate the performance of OMC2 and OMC3 in identifying indications for novel drugs, we conduct a de novo test on drugs that have only one known drug–disease association. For each such drug, its single disease association is removed in turn and used as the test sample, and all other known drug–disease associations are used as training samples. The gold standard dataset contains 171 drugs with only one known associated disease.
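The de novo setting can be sketched with a small hypothetical helper. It assumes a binary n × m matrix with diseases as rows and drugs as columns. Removing a drug's only association leaves that drug's column all zeros, which is exactly the cold-start case handled by the KNN preprocessing of Algorithm 1.

```python
import numpy as np


def de_novo_cases(A_DR):
    """Yield (drug index, held-out disease index, training matrix) for every drug
    with exactly one known association; that association is hidden."""
    for j in np.flatnonzero(A_DR.sum(axis=0) == 1):
        i = int(np.flatnonzero(A_DR[:, j])[0])
        train = A_DR.copy()
        train[i, j] = 0                    # the drug's column becomes all zeros
        yield int(j), i, train


A_DR = np.array([[1, 0, 1],
                 [0, 0, 1],
                 [1, 1, 0]])               # toy matrix: rows diseases, columns drugs
cases = list(de_novo_cases(A_DR))
print([(j, i) for j, i, _ in cases])       # [(1, 2)]
```

Only drug 1 has a single known association (with disease 2), so it yields the only test case here. In the gold standard dataset, the same selection rule gives the 171 test drugs.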

As shown in Fig 4, OMC2 and OMC3 achieve AUC values of 0.851 and 0.871, while BNNR, DRRS, MBiRW, DrugNet, and HGBI perform worse, with AUC values of 0.830, 0.824, 0.818, 0.782, and 0.746, respectively. OMC3 also shows an advantage in terms of PR curves. For top-ranked results, OMC3 outperforms all other methods at all ranking thresholds. OMC2 surpasses the compared approaches at the top 5, 10, 30, 50, and 100, but is inferior to DRRS at the top 1. Specifically, 74 and 88 drugs are correctly identified in the top 5 by OMC2 and OMC3, respectively, compared with 73, 62, 71, 52, and 36 drugs for BNNR, DRRS, MBiRW, DrugNet, and HGBI, respectively. These results indicate that our OMC methods effectively address the cold-start problem of identifying potential indications for novel drugs. In particular, OMC3 further improves on OMC2, indicating the benefit of incorporating target association information in the tri-layer network.

Fig 4. Performance of all methods in predicting potential diseases for new drugs.


(a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) Number of correctly retrieved drug–disease associations for various rank thresholds.

Case studies

We apply OMC3 to predict new uses for already approved drugs in real applications. To predict novel indications for existing drugs in the gold standard dataset, we consider all known associations between drugs and diseases as the training samples and the unknown drug–disease pairs as the candidate samples. By carrying out the OMC3 algorithm, the predicted scores of all candidate pairs are obtained and sorted for each specific drug.

To verify the predicted diseases, we choose three representative drugs: Doxorubicin, Flecainide, and Levodopa. We confirm the potential diseases associated with each drug by searching authoritative public databases such as CTD [21], DrugBank, and KEGG [26]. The newly predicted indications and their supporting evidence are listed in Table 1. For each representative drug, at least three of the top-5 predicted indications are supported by evidence. This case study shows that OMC3 can be an effective method for identifying new indications for specific drugs in practical applications. To provide more helpful references for medical researchers, the top-30 candidate indications of each drug are listed in S3 Table.

Table 1. The top-5 candidate diseases for Doxorubicin, Flecainide, and Levodopa.

| Drug (DrugBank ID) | Top-5 candidate diseases (OMIM ID) | Evidence |
|---|---|---|
| Doxorubicin (DB00997) | Dohle bodies (223350) | |
| | Reticulum cell sarcoma (267730) | CTD |
| | Small cell cancer of the lung (182280) | CTD |
| | Leukemia (109543) | CTD/DB/KEGG |
| | Testicular germ cell tumor (273300) | CTD |
| Flecainide (DB01195) | Atrial fibrillation (608583) | CTD |
| | Cardiac arrhythmia (115000) | CTD/DB |
| | Diastolic hypertension (608622) | CTD |
| | Hyperplastic myelinopathy (147530) | |
| | Nephropathy-hypertension (161900) | |
| Levodopa (DB01235) | Parkinson disease (168600) | CTD/DB/KEGG |
| | Dementia (125320) | CTD/DB |
| | Schizophrenia (181500) | CTD |
| | Optic atrophy, hearing loss, and peripheral neuropathy (165199) | CTD |
| | Hyperplastic myelinopathy (147530) | |

Effectiveness of OMC on performance

To evaluate the effectiveness of OMC, we compare OMC2 in 10-fold cross-validation with two variants that use only drug-side or only disease-side information. The first, called OMC-drug, recovers ADR1 by applying BNNR to the DrNet-Dis network; the second, called OMC-disease, recovers ADR2 by applying BNNR to the DisNet-Dr network. As shown in S2 Fig, both OMC-drug and OMC-disease are inferior to OMC2 in every fold in terms of AUC. Consolidating drug-side and disease-side associations, as OMC2 does, is therefore a better way to predict drug–disease associations than using information from only one side.

Comparison on tri-layer networks

In this subsection, we compare the prediction performance and computational efficiency of different approaches on tri-layer networks. BNNR, DRRS, DrugNet, and HGBI are extended from bilayer networks to tri-layer networks and compared with OMC3. Because protein association information is incorporated, the resulting affinity matrix of the tri-layer network is significantly larger. This poses computational challenges for the matrix factorizations used in matrix completion, whose cost often grows cubically with the matrix size. The running time of each approach is measured on a Linux server with a 2.30 GHz CPU and 128 GB of memory.
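To make the cubic growth concrete, the sketch below compares the cost of a dense factorization of the bilayer block matrix (side length n_drugs + n_diseases) with that of the tri-layer matrix (side length n_drugs + n_diseases + n_proteins). It assumes a cost proportional to n³ for an n × n matrix, as for a full dense SVD; the sizes in the example are hypothetical round numbers, not those of any dataset used here.

```python
def cubic_cost_ratio(n_drugs, n_diseases, n_proteins):
    """Estimated cost ratio of factorizing the tri-layer vs the bilayer matrix,
    assuming cost grows as n**3 for an n x n matrix (e.g. a dense full SVD)."""
    n_bi = n_drugs + n_diseases            # side length of the bilayer block matrix
    n_tri = n_bi + n_proteins              # side length of the tri-layer block matrix
    return (n_tri / n_bi) ** 3

# Hypothetical sizes: 600 drugs, 300 diseases, 900 proteins
print(cubic_cost_ratio(600, 300, 900))
```

Here the side length doubles from 900 to 1,800, so under this assumption each factorization becomes about 8 times as expensive. SVD-based completion methods typically repeat such a factorization in every iteration, so the gap accumulates over a run.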

As described in our previous works, BNNR and DRRS construct the same bilayer network between drugs and diseases. To construct a tri-layer heterogeneous network, we integrate protein-related information into this network, including protein–protein similarities, drug–protein interactions, and disease–protein associations. Accordingly, we obtain the corresponding square, symmetric adjacency matrix defined as follows:

$$M=\begin{bmatrix} A_{RR} & A_{DR}^{T} & A_{PR}^{T} \\ A_{DR} & A_{DD} & A_{PD}^{T} \\ A_{PR} & A_{PD} & A_{PP} \end{bmatrix},$$

where A_PP represents the protein–protein similarity matrix, which is calculated from amino acid sequence alignment using Rcpi [27]. The programs that complete the matrix M with BNNR and DRRS are called BNNR3 and DRRS3, respectively. DrugNet has also been applied to tri-layer networks by integrating target-related information [5]; we denote this version DrugNet3. DrugNet3 predicts drug–disease relationships by propagating information through the drug–target–disease network. Based on the guilt-by-association principle, the authors of HGBI extended bilayer networks to tri-layer networks by integrating drug, target, and disease information [4]; this method, called TL-HGBI, is denoted HGBI3 here.
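Assembling M from its blocks is mechanical. The NumPy sketch below uses `np.block`; the function name and argument order are illustrative assumptions, and the block shapes (for example A_DR as diseases × drugs) are inferred from the layout of M rather than stated in this section. Because each off-diagonal block appears together with its transpose, the result is symmetric, which the example checks.

```python
import numpy as np

def build_trilayer_matrix(A_RR, A_DD, A_PP, A_DR, A_PR, A_PD):
    """Assemble the symmetric tri-layer block matrix M.

    Shapes inferred from the block layout (r drugs, d diseases, p proteins):
      A_RR (r, r) drug similarities      A_DR (d, r) disease-drug associations
      A_DD (d, d) disease similarities   A_PR (p, r) protein-drug interactions
      A_PP (p, p) protein similarities   A_PD (p, d) protein-disease associations
    """
    return np.block([
        [A_RR, A_DR.T, A_PR.T],
        [A_DR, A_DD,   A_PD.T],
        [A_PR, A_PD,   A_PP],
    ])

# Toy example with random symmetric similarities and 0/1 associations
rng = np.random.default_rng(0)

def random_similarity(n):
    S = rng.random((n, n))
    return (S + S.T) / 2                     # symmetrize

r, d, p = 3, 2, 4
M = build_trilayer_matrix(random_similarity(r), random_similarity(d),
                          random_similarity(p),
                          rng.integers(0, 2, (d, r)),
                          rng.integers(0, 2, (p, r)),
                          rng.integers(0, 2, (p, d)))
print(M.shape, np.allclose(M, M.T))
```

For r = 3, d = 2, and p = 4 this prints `(9, 9) True`: the side length of M is r + d + p, which is why adding the protein layer enlarges the matrix that BNNR3 and DRRS3 must complete.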

The 10-fold cross-validation is conducted uniformly on the same gold standard dataset for OMC3, BNNR3, DRRS3, DrugNet3, and HGBI3. As shown in Fig 5(a) and 5(b), OMC3 outperforms the other approaches in terms of both the AUC of the ROC curves and the precision of the PR curves. Specifically, OMC3 obtains the best AUC value of 0.945, while BNNR3, DRRS3, DrugNet3, and HGBI3 obtain AUC values of 0.932, 0.932, 0.835, and 0.855, respectively. The PR curves show that OMC3 obtains the best precision of 0.460, compared with 0.431, 0.329, 0.093, and 0.227 for BNNR3, DRRS3, DrugNet3, and HGBI3, respectively.
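A common way to run 10-fold cross-validation on an association matrix is to split the known drug–disease pairs into ten groups and hide one group per round. The sketch below assumes this protocol; details such as how unknown pairs are treated and how folds are seeded are not described in this section, so treat it as illustrative. `kfold_masks` is a hypothetical helper.

```python
import numpy as np

def kfold_masks(A, n_folds=10, seed=0):
    """Yield (train, test_pairs) for k-fold CV over the known associations of A.

    A is a 0/1 drug-disease matrix. Known pairs (A == 1) are shuffled and split
    into n_folds groups; each round hides one group by zeroing it in a copy of A.
    """
    rng = np.random.default_rng(seed)
    pos = np.argwhere(A == 1)                 # (n_known, 2) row/column indices
    rng.shuffle(pos)                          # shuffles the rows (pairs) in place
    for test_pairs in np.array_split(pos, n_folds):
        train = A.copy()
        train[test_pairs[:, 0], test_pairs[:, 1]] = 0   # hide this fold
        yield train, test_pairs

# Toy example: a random sparse 20 x 15 association matrix
A = (np.random.default_rng(1).random((20, 15)) < 0.2).astype(int)
folds = list(kfold_masks(A, n_folds=10))
print(len(folds), sum(len(t) for _, t in folds) == A.sum())
```

Every known pair is hidden in exactly one fold. In each round, a completion method would be run on `train`, and the hidden pairs would then be scored against the remaining unknown (zero) entries to compute AUC and precision.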

Fig 5. Method comparison in bilayer networks and tri-layer networks.


(a) ROC curves of prediction results. (b) PR curves of prediction results. (c) The average running time of each fold in the 10-fold cross-validation.

Surprisingly, extending a method from bilayer to tri-layer networks does not necessarily improve its prediction performance. Only OMC and HGBI improve on tri-layer networks relative to bilayer ones; BNNR3, DRRS3, and DrugNet3 perform even worse than their corresponding bilayer algorithms. This is because the protein–protein similarities computed from sequence alignment contain a large amount of "noise", which degrades the prediction performance of BNNR3, DRRS3, and DrugNet3. In contrast, OMC3 does not use protein–protein similarities; the information it uses, such as drug–protein interactions and disease–protein associations, is experimentally supported, which in turn yields a performance improvement over OMC2 on bilayer networks.

As shown in Fig 5(c), the average running time of BNNR and DRRS increases sharply from bilayer networks (BNNR2 and DRRS2) to tri-layer networks (BNNR3 and DRRS3) because of the enlarged affinity matrix. In contrast, moving to tri-layer networks has a much smaller impact on the running time of OMC, DrugNet, and HGBI. For OMC, this is because the matrix completion computation is kept at the bilayer network level. As a result, OMC is not only better in prediction performance but also more computationally efficient.

Experiments on the other datasets

We apply OMC2 and OMC3 to two other datasets, Cdataset [6] and DNdataset [5], to demonstrate their robustness. Cdataset contains 663 drugs collected from DrugBank, 409 diseases obtained from the OMIM database, and 2,352 known drug–disease associations. In addition, we collected drug–protein interactions for the drugs in Cdataset from DrugBank, retrieving a total of 3,251 associations between 637 drugs and 891 proteins. For disease–protein associations, we downloaded disease–gene interactions for the diseases in Cdataset from the CTD database and mapped the genes to proteins using the UniProtKB database, yielding 1,280 associations between 226 diseases and 1,002 proteins. Drug similarity and disease similarity are calculated in the same way as described in the previous section. DNdataset includes 1,490 drugs registered in DrugBank, 4,516 diseases annotated with Disease Ontology (DO) terms, 18,107 proteins extracted from BioGRID, 11,658 disease–protein associations extracted directly from the disease and gene annotations (DGA), 4,026 drug–protein interactions collected from DrugBank, and 1,008 known drug–disease associations. We evaluate our methods on Cdataset and DNdataset with 10-fold cross-validation and de novo experiments.

For Cdataset, as shown in S3(a)–S3(c) Fig, OMC2 and OMC3 demonstrate superior performance in terms of ROC curves, PR curves, and top-ranked results in 10-fold cross-validation. Specifically, OMC2 and OMC3 obtain AUC values of 0.953 and 0.957, while BNNR, DRRS, MBiRW, DrugNet, and HGBI obtain 0.948, 0.947, 0.933, 0.903, and 0.858, respectively. In the PR curves, OMC3 achieves the best precision (0.489) and OMC2 the second best (0.476), while the precision values of BNNR, DRRS, MBiRW, DrugNet, and HGBI are 0.471, 0.403, 0.351, 0.239, and 0.168, respectively. In addition, OMC2 and OMC3 outperform the other methods in top-ranked results at the different ranking thresholds. In the de novo test, Cdataset contains 177 drugs with only one known associated disease. As shown in S4(a)–S4(c) Fig, OMC2 and OMC3 obtain AUC values of 0.830 and 0.846, respectively, exceeding those of BNNR, DRRS, MBiRW, DrugNet, and HGBI (0.812, 0.819, 0.804, 0.785, and 0.732, respectively). For top-ranked results, among the 177 test drugs, OMC3 correctly identifies 100 (56.5%) drugs at top 10, while BNNR, DRRS, MBiRW, DrugNet, and HGBI identify only 87 (49.2%), 78 (44.1%), 80 (45.2%), 61 (34.5%), and 48 (27.1%) drugs, respectively.
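The AUC values reported here summarize how well test positives are ranked above unknown pairs. As a minimal sketch, the function below uses the Mann–Whitney formulation (the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half), which equals the area under the ROC curve. It is an illustrative stand-in, not the evaluation script used in the paper; scikit-learn's `roc_auc_score` gives the same value for binary labels.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a random positive scores higher than a random
    negative, with ties counted as one half.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    # All positive/negative pairs: O(n_pos * n_neg) memory, fine for a sketch
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

Three of the four positive/negative pairs in the example are correctly ordered, giving an AUC of 0.75. The pairwise comparison is quadratic in memory; for full drug–disease matrices a sort-based implementation would be preferable.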

For DNdataset, in the 10-fold cross-validation results shown in S5(a)–S5(c) Fig, OMC2 and OMC3 obtain AUC values of 0.957 and 0.965, while BNNR, DRRS, MBiRW, DrugNet, and HGBI obtain 0.955, 0.934, 0.956, 0.950, and 0.921, respectively. As on Cdataset, OMC3 obtains the best precision (0.369) and OMC2 the second best (0.360) in the PR curves. Moreover, OMC2 and OMC3 outperform the other methods on top-ranked results at the different ranking thresholds. In the de novo test, OMC3 also outperforms the other methods. As shown in S6(a)–S6(c) Fig, OMC2 and OMC3 obtain AUC values of 0.963 and 0.972, while BNNR, DRRS, MBiRW, DrugNet, and HGBI obtain 0.956, 0.946, 0.970, 0.969, and 0.928, respectively. For top-ranked results, among 347 test drugs, OMC2 and OMC3 correctly identify 228 (65.7%) and 231 (66.6%) drugs at top 1, while BNNR, DRRS, MBiRW, DrugNet, and HGBI identify only 218 (62.8%), 213 (61.4%), 219 (63.1%), 156 (45.0%), and 150 (43.2%) drugs, respectively. In summary, the results on Cdataset and DNdataset demonstrate the robustness and generalizability of OMC.

Discussion

In this study, we have proposed a novel OMC method for predicting drug-associated indications, which effectively integrates multiple types of drug and disease information. Our method extends readily from bilayer networks to tri-layer networks by incorporating drug–target associations, and in doing so it avoids the use of noisy data such as protein–protein similarities. The performance of our methods (OMC2 and OMC3) is validated by cross-validation, de novo experiments, and case studies. The experimental results indicate that our methods are effective compared with the latest approaches, particularly for de novo drugs.

However, OMC has two potential limitations. First, the drug and disease similarity computations in this work may not be optimal. More reliable similarity measures, for example consensus measures that integrate multiple similarity computations from different aspects, could improve the performance of OMC. Second, OMC must perform matrix completion twice, from both the drug side and the disease side, before the final predicted score is obtained.

OMC could also be applied to other drug-related prediction tasks, such as synergistic drug combination prediction and small molecule–miRNA association prediction. Synergistic drug combination prediction is based on the assumption that principal drugs that achieve a synergistic effect with similar adjuvant drugs are often similar, and vice versa [28]. This implies that the drug combination matrix is also low-rank. Therefore, OMC could be applied to predict potential synergistic drug combinations by integrating the drug similarity matrix and the drug–target interaction matrix. In addition, it may avoid the need to classify principal and adjuvant drugs before obtaining the final score of drug combinations. MiRNAs play an important role in the initiation and development of various human diseases. Several drug-like compound libraries targeting different miRNAs have been successfully screened in cell assays, further demonstrating the possibility of targeting miRNAs with small molecules. Hence, developing computational models for drug repositioning based on drug-related miRNAs is both meaningful and promising. Several novel methods have been proposed in recent years [29]. In particular, methods based on tri-layer heterogeneous networks use more prior information to obtain better prediction performance [30]. In the future, we plan to extend OMC to explore drug combinations and miRNA–small molecule associations for drug repositioning.

Supporting information

S1 Fig. AUC values obtained by OMC2 in cross-validation when the neighborhood size K is chosen from {1, 5, 10, 15, 20, 25, 30}.

(TIF)

S2 Fig. Performance comparison of the OMC2, OMC-drug and OMC-disease in the 10-fold cross-validation in terms of AUC.

The result of each fold is presented.

(TIF)

S3 Fig

(a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds. The performance of all methods for predicting drug–disease associations in the 10-fold cross-validation on Cdataset.

(TIF)

S4 Fig

(a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds. The performance of all methods in predicting potential diseases for new drugs on Cdataset.

(TIF)

S5 Fig

(a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds. The performance of all methods for predicting drug–disease associations in the 10-fold cross-validation on DNdataset.

(TIF)

S6 Fig

(a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds. The performance of all methods in predicting potential diseases for new drugs on DNdataset.

(TIF)

S1 Table. The AUC values under different values of α and β in the 10-fold cross-validation for the gold standard dataset.

(DOC)

S2 Table. The AUC values for different values of α and β with K = 10 in the 10-fold cross-validation for the gold standard dataset.

(DOC)

S3 Table. The top-30 candidate indications of all drugs listed by OMC3.

(XLSX)

Data Availability

All relevant data can be obtained from the website https://github.com/BioinformaticsCSU/OMC.

Funding Statement

This research was supported by the National Natural Science Foundation of China (Grant No. 61972423, 61802113, and 61420106009), the Graduate Research Innovation Project of Hunan (Grant No. CX20190125), the General Project of Hunan Education Department (Grant No. 17C1434), Hunan Provincial Science and technology Program (No. 2018wk4001), and 111 Project (No. B18059). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Chong CR, Sullivan DJ. New uses for old drugs. Nature. 2007; 448(7154): 645–646. doi: 10.1038/448645a
2. Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nature Reviews Drug Discovery. 2010; 9(3): 203–214. doi: 10.1038/nrd3078
3. Wang W, Yang S, Li J. Drug target predictions based on heterogeneous graph inference. Pacific Symposium on Biocomputing. 2013; 18: 53–64.
4. Wang W, Yang S, Zhang X, Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics. 2014; 30(20): 2923–2930. doi: 10.1093/bioinformatics/btu403
5. Martinez V, Navarro C, Cano C, Fajardo W, Blanco A. DrugNet: network-based drug-disease prioritization by integrating heterogeneous data. Artificial Intelligence in Medicine. 2015; 63(1): 41–49. doi: 10.1016/j.artmed.2014.11.003
6. Luo H, Wang J, Li M, Luo J, Peng X, Wu FX, et al. Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm. Bioinformatics. 2016; 32(17): 2664–2671. doi: 10.1093/bioinformatics/btw228
7. Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLOS Computational Biology. 2010; 6(1): e1000641. doi: 10.1371/journal.pcbi.1000641
8. Nepusz T, Yu H, Paccanaro A. Detecting overlapping protein complexes in protein-protein interaction networks. Nature Methods. 2012; 9(5): 471–472. doi: 10.1038/nmeth.1938
9. Gottlieb A, Stein GY, Ruppin E, Sharan R. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Molecular Systems Biology. 2011; 7(1): 496. doi: 10.1038/msb.2011.26
10. Wang Y, Chen S, Deng N, Wang Y. Drug repositioning by kernel-based integration of molecular structure, molecular activity, and phenotype data. PLOS One. 2013; 8(11): e78518. doi: 10.1371/journal.pone.0078518
11. Dai W, Liu X, Gao Y, Chen L, Song J, Chen D, et al. Matrix factorization-based prediction of novel drug indications by integrating genomic space. Computational and Mathematical Methods in Medicine. 2015; 2015: 275045. doi: 10.1155/2015/275045
12. Luo H, Li M, Wang S, Liu Q, Li Y, Wang J. Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics. 2018; 34(11): 1904–1912. doi: 10.1093/bioinformatics/bty013
13. Cai JF, Candès EJ, Shen Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization. 2010; 20(4): 1956–1982. doi: 10.1137/080738970
14. Yang M, Luo H, Li Y, Wang J. Drug repositioning based on bounded nuclear norm regularization. Bioinformatics. 2019; 35(14): i455–i463. doi: 10.1093/bioinformatics/btz331
15. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Research. 2006; 34(Database issue): D668–D672. doi: 10.1093/nar/gkj067
16. Hamosh A, Scott AF, Amberger J, Bocchini C, Valle D, McKusick VA. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research. 2002; 30(1): 52–55. doi: 10.1093/nar/30.1.52
17. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences. 1988; 28(1): 31–36. doi: 10.1021/ci00057a005
18. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E. The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. Journal of Chemical Information and Computer Sciences. 2003; 43(2): 493–500. doi: 10.1021/ci025584y
19. Tanimoto TT. An elementary mathematical theory of classification and prediction. New York: International Business Machines Corporation; 1958.
20. Van Driel MA, Bruggeman J, Vriend G, Brunner HG, Leunissen JA. A text-mining analysis of the human phenome. European Journal of Human Genetics. 2006; 14(5): 535–542. doi: 10.1038/sj.ejhg.5201585
21. Davis AP, Murphy CG, Johnson R, Lay JM, Lennon-Hopkins K, Saraceni-Richards C, et al. The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Research. 2013; 41(Database issue): D1104–D1114. doi: 10.1093/nar/gks994
22. The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Research. 2015; 43(Database issue): D204–D212. doi: 10.1093/nar/gku989
23. Yang J, Yuan X. Linearized augmented Lagrangian and alternating direction methods for nuclear norm minimization. Mathematics of Computation. 2012; 82(281): 301–329. doi: 10.1090/S0025-5718-2012-02598-1
24. Ma S, Goldfarb D, Chen L. Fixed point and Bregman iterative methods for matrix rank minimization. Mathematical Programming. 2011; 128(1-2): 321–353. doi: 10.1007/s10107-009-0306-5
25. Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. In: ICML 2006: Proceedings of the International Conference on Machine Learning. New York, NY, USA; 2006. pp. 233–240. doi: 10.1145/1143844.1143874
26. Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Research. 2014; 42(Database issue): D199–D205. doi: 10.1093/nar/gkt1076
27. Cao DS, Xiao N, Xu QS, Chen AF. Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions. Bioinformatics. 2015; 31(2): 279. doi: 10.1093/bioinformatics/btu624
28. Chen X, Ren B, Chen M, Wang Q, Zhang L, Yan G. NLLSS: predicting synergistic drug combinations based on semi-supervised learning. PLOS Computational Biology. 2016; 12(7): e1004975. doi: 10.1371/journal.pcbi.1004975
29. Chen X, Guan NN, Sun YZ, Li JQ, Qu J. MicroRNA-small molecule association identification: from experimental results to computational models. Briefings in Bioinformatics. 2018; 16: 1–15. doi: 10.1093/bib/bby098
30. Qu J, Chen X, Sun Y, Zhou L, Jian Q, Ming Z. Inferring potential small molecule-miRNA association based on triple layer heterogeneous network. Journal of Cheminformatics. 2018; 10(1): 30. doi: 10.1186/s13321-018-0284-9
PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007541.r001

Decision Letter 0

Edwin Wang

4 Sep 2019

Dear Dr Wang,

Thank you very much for submitting your manuscript 'Overlap matrix completion for predicting drug-associated indications' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible to show clearly where changes have been made to their manuscript e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Edwin Wang

Benchmarking Editor

PLOS Computational Biology

Nir Ben-Tal

Deputy Editor

PLOS Computational Biology


Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: This paper formulates computational drug repositioning problem based on matrix completion. An overlap matrix completion method is proposed to predict the potential drug-associated indications by integrating reliable drug-related and disease-related information in bilayer and tri-layer networks. Experimental studies on three different datasets show that the proposed approach can outperform several baseline methods, especially in tri-layer networks.

1. When I take a look at case studies, only three drugs are used as examples. In order to give more references for medical researchers, the predicted top-30 indications of each drug need to be listed in the supplementary file.

2. Could MBiRW algorithm be used for comparison in tri-layer networks? Please clarify it.

3. The first letter of 'hyperplastic' should be capitalized in Table 2.

4. There are some writing and grammatical errors in the manuscript. The author should examine and revise the article carefully.

5. The quality of pictures in the manuscript need to be improved.

6. Drug combination can be regarded as a new way of drug repositioning. Could you give some discussions whether your method could be used for synergistic drug combination prediction in the future (PMID: 27415801 should be discussed)? In addition, could you consider the possibility of developing computational models for drug repositioning based on drug related miRNA, a new kind of drug target? Some important studies should be discussed (PMIDs: 30325405 and 29943160).

7. Please analyze the novelty and limitations of the method in the section of Conclusions.

Reviewer #2: This manuscript proposed a new method, namely the overlap matrix completion (OMP) method, to find potential drug-associated relationship. Numerical results show that the new algorithm can achieve accurate results with high efficiency. This is a well-written and well-organized work to address an important issue in computational biology.

Major issue:

The computation to find the optimal values of alpha, beta and K may not be optimal. First the samples are not evenly distributed, for example, alpha ∈ [0.1, 1, 10, 100]. It may be possible the optimal value is actually around 2 or around 5. In addition, the work first fixed K=1 and found the optimal value of alpha=1, beta=10. Then based on alpha=1, beta=10, found the optimal value was K=10. The question is, if K=10, what is the optimal values of alpha and beta? Is there any possibility to develop an iterated scheme to search for the optimal values?

Minor issue

There are many curves in each figure, and 11 figures in this work. It may be better to put some figures in supplementary information and show the remaining figures in larger size.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007541.r003

Decision Letter 1

Edwin Wang

12 Nov 2019

Dear Dr Wang,

We are pleased to inform you that your manuscript 'Overlap matrix completion for predicting drug-associated indications' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process.

One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org).

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.

Sincerely,

Edwin Wang

Benchmarking Editor

PLOS Computational Biology

Nir Ben-Tal

Deputy Editor

PLOS Computational Biology


Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: satisfactory revisions have been implemented

Reviewer #2: Authors have addressed the issues in the first report very well. The quality of this manuscript has improved much. There is no further comment.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.


If you choose “no”, your identity will remain anonymous but your review may still be made public.


Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS Comput Biol. doi: 10.1371/journal.pcbi.1007541.r004

Acceptance letter

Edwin Wang

11 Dec 2019

PCOMPBIOL-D-19-01173R1

Overlap matrix completion for predicting drug-associated indications

Dear Dr Wang,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Sarah Hammond

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Associated Data


    Supplementary Materials

    S1 Fig. The AUC values achieved by the OMC2 algorithm in the cross-validation when the neighborhood size K is chosen from {1, 5, 10, 15, 20, 25, 30}.

    (TIF)

    S2 Fig. Performance comparison of the OMC2, OMC-drug and OMC-disease in the 10-fold cross-validation in terms of AUC.

    The result of each fold is presented.

    (TIF)

    S3 Fig

    (a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds. The performance of all methods for predicting drug–disease associations in the 10-fold cross-validation on CDataset.

    (TIF)

    S4 Fig

    (a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds. The performance of all methods in predicting potential diseases for new drugs on CDataset.

    (TIF)

    S5 Fig

    (a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds. The performance of all methods for predicting drug–disease associations in the 10-fold cross-validation on DNdataset.

    (TIF)

    S6 Fig

    (a) ROC curves of prediction results. (b) PR curves of predicting candidate diseases for drugs. (c) The number of correctly retrieved drug–disease associations for various rank thresholds. The performance of all methods in predicting potential diseases for new drugs on DNdataset.

    (TIF)

    S1 Table. The AUC values under different values of α and β in the 10-fold cross-validation for the gold standard dataset.

    (DOC)

    S2 Table. The AUC values for different values of α and β with K = 10 in the 10-fold cross-validation for the gold standard dataset.

    (DOC)

    S3 Table. The top-30 candidate indications of all drugs listed by OMC3.

    (XLSX)

    Attachment

    Submitted filename: Response to reviewers.doc

    Data Availability Statement

    All relevant data can be obtained from the website https://github.com/BioinformaticsCSU/OMC.


    Articles from PLoS Computational Biology are provided here courtesy of PLOS
