Graph regularized L2,1-nonnegative matrix factorization for miRNA-disease association prediction

Zhen Gao; Yu-Tian Wang; Qing-Wen Wu; Jian-Cheng Ni; Chun-Hou Zheng

doi:10.1186/s12859-020-3409-x

2020 Feb 18;21:61. doi: 10.1186/s12859-020-3409-x


PMCID: PMC7029547 PMID: 32070280

Abstract

Background

The aberrant expression of microRNAs is closely connected to the occurrence and development of a large number of human diseases. To study these diseases, researchers have presented numerous effective computational models.

Results

Here, we present a computational framework based on graph Laplacian regularized L_{2, 1}-nonnegative matrix factorization (GRL_{2, 1}-NMF) for inferring possible human disease-connected miRNAs. First, manually validated disease-connected microRNAs were integrated, and microRNA functional similarity information along with two kinds of disease semantic similarities were calculated. Next, we measured Gaussian interaction profile (GIP) kernel similarities for both diseases and microRNAs. Then, we adopted a preprocessing step, namely, weighted K nearest known neighbours (WKNKN), to decrease the sparsity of the miRNA-disease association matrix. Finally, the GRL_{2, 1}-NMF framework was used to predict links between microRNAs and diseases.

Conclusions

The new method (GRL_{2, 1}-NMF) achieved AUC values of 0.9280 and 0.9276 in global leave-one-out cross validation (global LOOCV) and five-fold cross validation (5-CV), respectively, showing that GRL_{2, 1}-NMF can powerfully discover potential disease-related miRNAs, even if there is no known associated disease.

Keywords: miRNA; Disease; miRNA-disease associations; NMF; L_{2, 1}-norm

Background

MicroRNAs (miRNAs), which play crucial roles in the regulation of gene expression after transcription in organisms and vegetation, are 17–24 nt noncoding endogenous RNAs [1–3]. In 1993, Lee et al. [4] identified the first microRNA (miRNA) called lin-4 in Caenorhabditis elegans. Thereafter, a large number of miRNAs have been identified from a wide variety of species, such as plants, animals, and viruses [5, 6]. MiRNAs are associated with key biological processes, including development, differentiation, programmed cell death and cell proliferation [7, 8]. Past studies have indicated that abnormal miRNA expression participates in the development process of a variety of human diseases [9–11]. However, inferring microRNA-disease connections through manual experiments is tremendously costly, laborious, prone to failure and time consuming. Thus, the development of computation-based methods to infer disease-connected microRNAs is urgently needed, as they could solve the above problems and greatly facilitate human disease diagnosis and treatment [12–15].

In the past few years, in order to explore the pathogenic mechanisms of human disease at the small-molecule level and to design specific molecular instruments for diagnosis, treatment and prevention, considerable efforts have been made to develop computational algorithms for inferring disease-associated microRNAs, based on the assumption that microRNAs with similar functions are highly likely to be connected with similar diseases, and vice versa. Numerous similarity measurement-based approaches that draw on heterogeneous biological information have been proposed to identify the interactions between microRNAs and diseases. Jiang et al. [16] inferred disease-related miRNAs by prioritizing the whole human miRNAome for the investigated disease based on miRNA functional similarity information as well as the human phenome-microRNAome network. Li et al. [17] proposed a computation-based model to infer possible disease-related miRNAs via calculations of FCS between the disease genes and the target genes, which had been verified. There is an assumption that if two different diseases have phenotypic connections, they have similar molecular machinery and similar molecular mechanisms. Xu et al. [18] prioritized human disease-connected microRNAs by fusing experimentally verified human disease genes with context-dependent miRNA-target interactions. Based on weighted k nearest neighbours, HDMP was proposed by Xuan et al. [19] for identifying potential miRNA-disease associations. They presented a measurement method that uses the details of the disease terms along with phenotypic similarities among diseases to measure the miRNA functional similarities. In addition, miRNAs belonging to the same miRNA family or cluster were given a higher weight, considering their relationship to a group of diseases. However, HDMP is not appropriate for diseases that have sparse connections with miRNAs.
Chen et al. [20] developed miRPD, in which experimentally verified or predicted interactions between miRNAs and proteins as well as text-extracted associations between proteins and diseases were explicitly utilized to calculate the probability that a microRNA-disease association exists. Chen et al. [21] developed WBSMDA, which calculates the within-scores and between-scores of every miRNA-disease pair to identify potential disease-related miRNAs. Taking a miRNA as an example, let A be the set of miRNAs that have known connections with the investigated disease d; the within-score is the highest similarity between the investigated miRNA and any miRNA in set A. Likewise, let B be the set of miRNAs that have no known connection with the investigated disease d; the between-score is the highest similarity between the investigated miRNA and any miRNA in set B. Chen et al. [22] developed HGIMDA through an iterative approach on a graph that consists of many different types of bioinformatics information, such as the functional similarities of microRNAs, semantic similarities of diseases, kernel similarity of Gaussian interaction profiles and experimentally verified microRNA-disease connections. Yu et al. [23] proposed an assembled identification approach to infer potential microRNA-disease associations by modifying the existing maximizing information flow methods based on integrated microRNA functional similarity information, disease semantic information and phenotypic similarity information; these potential associations along with manually validated microRNA-disease interactions were placed into a phenome-microRNAome network. Chen et al. [24] presented a novel framework called RKNNMDA that utilizes ranking and k nearest neighbours.
They integrated the functional similarity of microRNAs, semantic similarity of diseases, kernel similarity of Gaussian interaction profiles and experimental verification of microRNA-disease association and obtained miRNA’s (disease’s) k nearest neighbours via the KNN model. Next, they implemented the SVM ranking model to re-rank the above k nearest neighbours and thus obtained the eventual rankings of all possible microRNA-disease associations. In addition, RKNNMDA could also predict possible microRNA-disease connections for human diseases that don’t have manually validated associated miRNAs. Chen et al. [25] introduced Jaccard similarity among microRNAs and diseases in the BLHARMDA model to identify potential miRNA-disease interactions and then introduced an improved KNN framework into the bipartite local model method. Chen et al. [26] defined all paths between a given miRNA and disease as prediction scores, based on the assumption that if there are more paths between the miRNA and disease, the two are more likely to be related.

In addition, a host of studies in accordance with random walk with restart have been proposed for identifying potential microRNA-disease connections and finally obtained good predictive behaviour. A random walk with restart was presented by Chen et al. [27], who also integrated the manually verified microRNA-disease association information and functional similarity information of miRNAs. Considering the functional links among microRNA targets and human disease genes in a protein association network, Shi et al. [28] devised a computational model to infer likely microRNA-disease connections. This method utilized global network distance measurement, random walk analysis, and the construction of a microRNA-disease network to investigate microRNA-disease connections from a global perspective. Xuan et al. [29] designed a novel framework named MIDP, which predicted disease-connected miRNAs for diseases with known associated microRNAs in line with random walks. They analysed the attributes of the labelled and unlabelled nodes of the miRNA network and then established transition matrices, whose transition weights between the nodes were proportionate to the similarity between them. Furthermore, they presented an extension method called MIDPE, especially for diseases that don’t have manually verified connected microRNAs. Liu et al. [30] proposed a method to identify possible disease-connected microRNAs by utilizing a random walk with restart in accordance with a heterogeneous graph, which was established by combining disease semantic similarities and disease functional similarities, as well as the miRNA similarities that were obtained utilizing microRNA-target gene and microRNA-long noncoding RNA connections. Luo and Xiao [13] first established a heterogeneous network containing microRNA and disease information and then adopted a bi-random walk model to identify possible microRNA-disease connections. Finally, all microRNA candidates of an investigated disease were ranked.

Furthermore, machine learning-based algorithms, such as support vector machines (SVMs), have been applied to bioinformatics and computational biology and have improved the prediction performance to some extent [31]. Xu et al. [32] presented MTDN to infer potential microRNA-disease associations. They distinguished positive disease-related miRNAs from negative samples with an SVM classifier based on the topological characteristics of the microRNA target-dysregulated network. Chen et al. [33] presented RLSMDA, which identifies miRNA-disease links based on regularized least squares (RLS). RLSMDA integrates known disease-microRNA connections, a disease semantic similarity dataset and a miRNA functional similarity network and is thus suitable for predicting novel miRNAs for diseases without any manually validated connections with microRNAs. Li et al. [34] utilized a matrix completion model (MCMDA) based on manually validated microRNA-disease connections to infer candidates for diseases that did not have any experimentally proven connected microRNAs. In addition, MCMDA does not need negative associations. Chen et al. [35] proposed a random forest-based framework (RFMDA) for microRNA-disease connection prediction. RFMDA identifies possible disease-associated microRNAs by employing the random forest model to identify robust attributes from the miRNA-disease attribute collection. Chen et al. [36] predicted disease-associated miRNAs based on heterogeneous label propagation (HLPMDA), in which heterogeneous data were integrated into a heterogeneous network. Chen et al. [37] inferred disease-associated miRNAs with a restricted Boltzmann machine (RBM) in the RBMMMDA model; this model can acquire both disease-connected miRNAs and the corresponding forms of their links. However, this method is not suitable for diseases that do not have any known miRNA-disease associations, and selecting the right parameter values remains a significant issue for RBMMMDA. Chen et al. [38] first integrated a heterogeneous network, then fed it into a stacked autoencoder to detect the deep representation of the heterogeneous information, and finally utilized an SVM classifier to prioritize all the candidates. Chen et al. [39] first constructed a feature vector according to the statistics, graph theory and matrix decomposition of the bioinformatics data and then fed this vector into EGBMMDA to obtain a regression tree. Chen et al. [40] extracted three kinds of features, namely, statistical features from similarity measurements, graph theoretical features from similarity networks, and matrix factorization results from miRNA-disease associations. Then, disease-related miRNAs were discovered based on a decision tree classifier. Chen et al. [41] predicted disease-connected miRNAs by adopting sparse subspace learning with Laplacian regularization and the L₁-norm. Interestingly, they extracted features and constructed objective functions from the miRNA and disease perspectives separately. Chen et al. [42] used a decision tree as a weak classifier and then integrated these weak classifiers into a strong classifier according to weights. It is worth noting that they implemented k-means to balance positive samples and negative samples.

Moreover, many researchers have made promising models with recommendation systems for microRNA-disease connection prediction purposes. Zou et al. [43] proposed two approaches, namely, KATZ and CATAPULT, for identifying miRNA-disease links. In line with the manually verified microRNA-disease link network, microRNA similarities network and disease similarities network, KATZ integrates the social network analysis approach with machine learning. Chen et al. [44] inferred disease-related miRNAs based on ensemble learning and link prediction (ELLPMDA). According to global similarity measures, ELLPMDA uses ensemble learning for integrating ranking results, which were obtained via three typical similarity-measurement approaches. Chen et al. [45] constructed a heterogeneous network and predicted disease-connected miRNAs in line with the rating-integrated bipartite network recommendation as well as experimentally verified miRNA–disease connections.

In addition, a fair number of studies based on matrix factorization have been presented for possible disease-connected microRNA prediction purposes. Zhao et al. [46] presented symmetric nonnegative matrix factorization (SNMFMDA) to infer disease-connected microRNAs with the NMF and Kronecker regularized least square (KronRLS) approaches. Zhong et al. [47] proposed a nonnegative matrix factorization (NMF)-based algorithm to predict disease-related microRNA candidates based on a bilayer network that was constructed with regard to the intricate links among microRNAs, among human diseases and between microRNAs and human diseases. Xiao et al. [48] introduced graph Laplacian regularized into NMF (GRNMF) based on heterogeneous data for inferring potential disease-connected microRNAs, particularly for many diseases without known associations. They introduced a pre-processing step, weighted k nearest neighbour (WKNKN) profiles, for both microRNAs and diseases, into GRNMF. Chen et al. [49] designed an effective algorithm, MDHGI, according to matrix decomposition as well as a heterogeneous graph inference method for inferring potential miRNA-disease connections.

However, these approaches based on matrix factorization ignored the sparsity of the miRNA-disease association matrix Y, so we utilized a pre-processing step named weighted K nearest known neighbours (WKNKN) [50] to convert the value of the miRNA-disease associations matrix Y into a decimal between 0 and 1. In addition, unlike the traditional nonnegative matrix factorization (NMF) methods, we added L_{2, 1}-norm as well as GIP (Gaussian interaction profile) kernels into the NMF model. The L_{2, 1}-norm was added to increase the disease matrix sparsity and eliminate unattached disease pairs [51–53]. Moreover, Tikhonov regularization was added to penalize the non-smoothness of W and H [48, 54, 55], and the graph regularization was primarily intended to assure local-based representation by leveraging the geometry of the data [56].

In this study, we present a computational algorithm based on graph regularized L_{2, 1}-nonnegative matrix factorization (GRL_{2, 1}-NMF) to infer the possible connections between microRNAs and diseases in heterogeneous omics data. First, we integrated manually validated microRNA-disease connection information, miRNA functional similarity information and two kinds of disease semantic similarity information, and then we calculated the GIP kernel similarities for the diseases and miRNAs. Then, we utilized WKNKN to decrease the sparsity of matrix Y. Furthermore, we added Tikhonov (L₂), graph Laplacian regularization terms and the L_{2, 1}-norm to the standard NMF model for predicting disease-associated miRNAs. Finally, five-fold cross validation and global leave-one-out cross validation were implemented to evaluate the effectiveness of our model, and we obtained AUCs of 0.9276 and 0.9280, respectively. Furthermore, we performed case studies on three high-risk human diseases (prostate neoplasms, lung neoplasms and breast neoplasms). As a result, 48, 45 and 45 out of the top 50 likely connected miRNAs of prostate neoplasms, lung neoplasms and breast neoplasms, respectively, were confirmed by HMDD [10] and dbDEMC [57]. Based on the experimental results, we can clearly see that GRL_{2, 1}-NMF is a valuable approach for inferring possible miRNA-disease connections.

Results

Effect of parameters on the performance of GRL_{2, 1}-NMF

In this work, we measured two disease semantic similarities, miRNA functional similarity and GIP similarities for miRNAs and diseases. The two disease semantic similarities were integrated as in Eq. (1), and the final disease similarity and miRNA similarity were measured as in Eq. (2) and Eq. (3), respectively. We defined six parameters, namely, α₁, α₂, γ₁, γ₂, θ₁ and θ₂, to balance the items in Eq. (1), Eq. (2) and Eq. (3). The values of α₁ and α₂ ranged from 0.1 to 0.9 in steps of 0.1, and γ₁, γ₂, θ₁ and θ₂ ranged from 0 to 1 in steps of 0.1; in every tested setting, the two weights of a pair sum to 1. We conducted a series of experiments to assess the effects of these parameters. The experimental results are shown in Table 1 and Table 2.

SD1(d_{i}, d_{j}) = \alpha_{1} S_{1}^{d}(d_{i}, d_{j}) + \alpha_{2} S_{2}^{d}(d_{i}, d_{j}) \tag{1}

SD(d_{i}, d_{j}) = \begin{cases} \gamma_{1} SD1(d_{i}, d_{j}) + \gamma_{2} GD(d_{i}, d_{j}), & \text{if } d_{i} \text{ and } d_{j} \text{ have semantic similarity} \\ GD(d_{i}, d_{j}), & \text{otherwise} \end{cases} \tag{2}

SM(m_{i}, m_{j}) = \begin{cases} \theta_{1} S^{m}(m_{i}, m_{j}) + \theta_{2} GM(m_{i}, m_{j}), & \text{if } m_{i} \text{ and } m_{j} \text{ have functional similarity} \\ GM(m_{i}, m_{j}), & \text{otherwise} \end{cases} \tag{3}
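As a rough illustration of Eqs. (1) to (3), the following Python sketch combines similarity matrices stored as nested lists. The function names, the toy values and the rule that a zero entry means "no semantic (or functional) similarity is available" are assumptions made for this example; the text does not specify how missing similarities are encoded.

```python
def combine_semantic(s1, s2, alpha1=0.5, alpha2=0.5):
    """Eq. (1): weighted sum of the two disease semantic similarity matrices."""
    return [[alpha1 * a + alpha2 * b for a, b in zip(r1, r2)]
            for r1, r2 in zip(s1, s2)]


def integrate_with_gip(primary, gip, w_primary=1.0, w_gip=0.0):
    """Eq. (2)/(3): blend primary and GIP similarity where the primary
    similarity exists; otherwise fall back to the GIP similarity alone.

    Assumption: a zero entry in `primary` means no similarity is available.
    """
    return [[w_primary * p + w_gip * g if p != 0 else g
             for p, g in zip(rp, rg)]
            for rp, rg in zip(primary, gip)]


# Toy 2 x 2 example (values are made up for illustration).
s1 = [[1.0, 0.0], [0.0, 1.0]]
s2 = [[1.0, 0.0], [0.0, 1.0]]
gd = [[1.0, 0.3], [0.3, 1.0]]

sd1 = combine_semantic(s1, s2)               # Eq. (1) with alpha1 = alpha2 = 0.5
sd = integrate_with_gip(sd1, gd, 1.0, 0.0)   # Eq. (2) with gamma1 = 1, gamma2 = 0
print(sd)
```

The same `integrate_with_gip` helper covers Eq. (3) when it is called with the miRNA functional similarity matrix, the miRNA GIP matrix and the weights θ₁ and θ₂.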

Table 1.

The effects of parameters α₁ and α₂ on the results of GRL_{2, 1}-NMF (γ₁ = 1, γ₂ = 0, θ₁ = 1 and θ₂ = 0)

Setting	α₁	α₂	AUC of 5-CV
SD12_1	0.1	0.9	0.9276
SD12_2	0.2	0.8	0.9276
SD12_3	0.3	0.7	0.9276
SD12_4	0.4	0.6	0.9276
SD12_5	0.5	0.5	0.9276
SD12_6	0.6	0.4	0.9276
SD12_7	0.7	0.3	0.9276
SD12_8	0.8	0.2	0.9276
SD12_9	0.9	0.1	0.9276


Table 2.

The effects of parameters θ₁, θ₂, γ₁ and γ₂ on the results of GRL_{2, 1}-NMF: (a) α₁ = 0.5, α₂ = 0.5, γ₁ = 1 and γ₂ = 0; (b) α₁ = 0.5, α₂ = 0.5, θ₁ = 1 and θ₂ = 0

(a)	θ₁	θ₂	AUC of 5-CV	(b)	γ₁	γ₂	AUC of 5-CV
SMGM_1	0.1	0.9	0.9263	SDGD_1	0.1	0.9	0.9276
SMGM_2	0.2	0.8	0.9264	SDGD_2	0.2	0.8	0.9276
SMGM_3	0.3	0.7	0.9267	SDGD_3	0.3	0.7	0.9276
SMGM_4	0.4	0.6	0.9268	SDGD_4	0.4	0.6	0.9276
SMGM_5	0.5	0.5	0.9270	SDGD_5	0.5	0.5	0.9276
SMGM_6	0.6	0.4	0.9270	SDGD_6	0.6	0.4	0.9276
SMGM_7	0.7	0.3	0.9271	SDGD_7	0.7	0.3	0.9276
SMGM_8	0.8	0.2	0.9272	SDGD_8	0.8	0.2	0.9276
SMGM_9	0.9	0.1	0.9272	SDGD_9	0.9	0.1	0.9276
SMGM_10	1	0	0.9276	SDGD_10	1	0	0.9276


In Table 1, we can see that regardless of how α₁ and α₂ change, the AUC of 5-CV remains 0.9276. Thus, for convenience, we set α₁ = α₂ = 0.5. The experimental results for parameters θ₁ and θ₂, which balance miRNA functional similarity (S^m) and GIP similarity for miRNAs (GM), are shown in Table 2 (a), and the results for parameters γ₁ and γ₂, which balance disease semantic similarity (SD1) and GIP similarity for diseases (GD), are shown in Table 2 (b). In Table 2 (a), the AUC rises from 0.9263 at θ₁ = 0.1 to its maximum of 0.9276 at θ₁ = 1, whereas in Table 2 (b) the AUC stays at 0.9276 for every value of γ₁. Thus, we set θ₁ = 1, θ₂ = 0, γ₁ = 1, and γ₂ = 0.

Performance evaluation

To evaluate our model’s ability to predict disease-related miRNAs, we compared it with three state-of-the-art methods (ICFMDA [58], SACMDA [59] and IMCMDA [60]) under two validation frameworks, global leave-one-out cross validation (global LOOCV) and five-fold cross validation (5-CV), using the experimentally validated disease-related miRNAs in HMDD v2.0, which collects a large number of known miRNA-disease associations [10].

For the global LOOCV, every known miRNA-disease connection was selected in turn as the test sample, and the remaining experimentally verified connections were used as the training set. In addition, all miRNA-disease pairs without evidence were regarded as candidate samples. Next, we calculated the prediction scores of all associations with GRL_{2, 1}-NMF and thus obtained the ranking of each test sample relative to the candidate samples. A test sample was considered successfully predicted if its ranking was higher than a given threshold. We obtained the corresponding true positive rate (TPR, sensitivity) and false positive rate (FPR, 1-specificity) by setting various thresholds. Sensitivity is the proportion of test samples whose ranking was higher than the threshold, while specificity is the proportion of candidate (negative) samples whose ranking was lower than the threshold. Thus, the receiver operating characteristic (ROC) curve can be plotted from the TPRs and FPRs obtained at different thresholds. Finally, to evaluate the performance and compare it with that of the other models, the areas under the ROC curve (AUCs) were computed. The AUC value lies between 0 and 1, and a higher AUC indicates better performance. The results showed that GRL_{2, 1}-NMF, ICFMDA, SACMDA and IMCMDA achieved AUC values of 0.9280, 0.9072, 0.8777 and 0.8384, respectively (see Fig. 1). Clearly, GRL_{2, 1}-NMF obtained the best performance among the four explored methods.
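The threshold sweep described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that the prediction scores of the left-out test samples and of the candidate samples have already been collected into two lists, and the names `roc_points` and `auc` as well as the toy scores are hypothetical.

```python
def roc_points(test_scores, candidate_scores):
    """Sweep a threshold over all observed scores and return (FPR, TPR) points."""
    thresholds = sorted(set(test_scores) | set(candidate_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # A sample is "ranked above the threshold" when its score is >= t.
        tpr = sum(s >= t for s in test_scores) / len(test_scores)
        fpr = sum(s >= t for s in candidate_scores) / len(candidate_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy scores (made up): three left-out known associations, four candidates.
test_scores = [0.9, 0.8, 0.4]
candidate_scores = [0.7, 0.3, 0.2, 0.1]
print(auc(roc_points(test_scores, candidate_scores)))
```

In this toy case, 11 of the 12 (test, candidate) pairs are ordered correctly, so the area equals 11/12.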

Fig. 1. AUC of GRL_{2, 1}-NMF in global LOOCV compared with those of IMCMDA, ICFMDA and SACMDA

For 5-CV, all known connections between microRNAs and diseases were randomly divided into five parts; each part was selected in turn for testing, and the other four parts were used for training. Moreover, all unknown samples were treated as candidate samples. As in the global LOOCV, we then calculated the ranking of each test sample relative to the candidate set. To reduce the possible bias caused by random sample partitioning, we repeated the division of the known miRNA-disease associations 100 times and obtained the corresponding ROC curves and AUCs in the same manner as for LOOCV. The results showed that GRL_{2, 1}-NMF had the best predictive performance with an average AUC of 0.9276, while ICFMDA, SACMDA and IMCMDA achieved AUC values of 0.9046, 0.8773 and 0.8330, respectively (see Fig. 2).
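A minimal sketch of this partitioning step is given below, assuming the known associations are available as (miRNA index, disease index) pairs. The helper names `five_fold_splits` and `mask_matrix` are hypothetical; repeating the procedure with different `seed` values would correspond to the repeated random divisions.

```python
import random


def five_fold_splits(known_pairs, n_folds=5, seed=0):
    """Shuffle the known associations and yield (train, test) lists per fold."""
    pairs = list(known_pairs)
    random.Random(seed).shuffle(pairs)
    folds = [pairs[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, test


def mask_matrix(Y, test_pairs):
    """Return a copy of Y in which the test associations are hidden (set to 0)."""
    masked = [row[:] for row in Y]
    for i, j in test_pairs:
        masked[i][j] = 0
    return masked


# Toy example: ten known associations among 10 miRNAs and 3 diseases.
known = [(i, i % 3) for i in range(10)]
Y = [[1 if (i, j) in known else 0 for j in range(3)] for i in range(10)]
for train, test in five_fold_splits(known):
    Y_train = mask_matrix(Y, test)   # the model would be trained on Y_train
    print(len(train), len(test))
```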

Fig. 2. AUC of GRL_{2, 1}-NMF in 5-fold cross validation compared with those of IMCMDA, ICFMDA and SACMDA

Case studies

We constructed a simulation experiment to further demonstrate the effectiveness of GRL_{2, 1}-NMF for inferring likely disease-connected miRNAs. Here, all manually validated miRNA-disease connections were utilized for prediction, and the associations without evidence were regarded as candidate connections for validation. For every disease, the candidate miRNAs were ranked based on the prediction scores. We used two miRNA-disease databases, namely, HMDD [10] and dbDEMC [57], to verify the inferred miRNAs for the investigated diseases: prostate neoplasms, lung neoplasms and breast neoplasms. The top 50 disease-related miRNAs predicted via GRL_{2, 1}-NMF are listed in Table 3, Table 4 and Table 5. Of the top 50 inferred miRNAs, 48, 45 and 45 were confirmed to be associated with prostate neoplasms, lung neoplasms and breast neoplasms, respectively, by the dbDEMC and HMDD v3.0 databases.
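The per-disease ranking step can be illustrated as follows. This is a hedged sketch with made-up names and scores: `top_candidates` is a hypothetical helper, `scores` stands for a predicted score matrix and `known` for the association matrix used for training.

```python
def top_candidates(scores, known, mirna_names, disease_idx, k=50):
    """Rank the miRNAs without a known link to one disease by predicted score."""
    candidates = [i for i in range(len(mirna_names))
                  if known[i][disease_idx] == 0]
    candidates.sort(key=lambda i: scores[i][disease_idx], reverse=True)
    return [mirna_names[i] for i in candidates[:k]]


# Toy example with one disease (column 0); names and scores are invented.
names = ["mir-a", "mir-b", "mir-c", "mir-d"]
known = [[1], [0], [0], [0]]            # mir-a is already a known association
scores = [[0.95], [0.20], [0.80], [0.50]]
top2 = top_candidates(scores, known, names, disease_idx=0, k=2)
print(top2)
```

Known associations are excluded before ranking, so only unverified pairs can appear in the list that is later checked against external databases.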

Table 3.

The top 50 potential miRNAs associated with Prostate Neoplasms

miRNA	Evidence	miRNA	Evidence
hsa-mir-1	HMDD; dbDEMC	hsa-mir-32	HMDD; dbDEMC
hsa-mir-21	HMDD; dbDEMC	hsa-let-7i	dbDEMC
hsa-mir-22	HMDD; dbDEMC	hsa-mir-375	HMDD; dbDEMC
hsa-mir-155	HMDD; dbDEMC	hsa-let-7c	HMDD; dbDEMC
hsa-mir-9	HMDD	hsa-mir-200c	HMDD; dbDEMC
hsa-mir-221	HMDD; dbDEMC	hsa-mir-214	HMDD; dbDEMC
hsa-let-7a	dbDEMC	hsa-mir-182	HMDD; dbDEMC
hsa-mir-133a	HMDD; dbDEMC	hsa-mir-106b	HMDD; dbDEMC
hsa-mir-146a	HMDD	hsa-mir-23a	HMDD; dbDEMC
hsa-mir-222	HMDD; dbDEMC	hsa-mir-17	HMDD; dbDEMC
hsa-mir-34a	HMDD; dbDEMC	hsa-let-7e	dbDEMC
hsa-mir-29a	HMDD; dbDEMC	hsa-mir-181	unconfirmed
hsa-mir-142	unconfirmed	hsa-mir-200b	HMDD; dbDEMC
hsa-mir-223	HMDD; dbDEMC	hsa-mir-10b	dbDEMC
hsa-mir-126	HMDD; dbDEMC	hsa-mir-200a	HMDD; dbDEMC
hsa-mir-31	HMDD; dbDEMC	hsa-mir-34c	HMDD
hsa-mir-146b	HMDD; dbDEMC	hsa-mir-205	HMDD; dbDEMC
hsa-mir-29b	HMDD; dbDEMC	hsa-let-7d	HMDD; dbDEMC
hsa-mir-200	HMDD	hsa-mir-210	HMDD; dbDEMC
hsa-mir-143	HMDD; dbDEMC	hsa-mir-192	HMDD; dbDEMC
hsa-mir-16	HMDD; dbDEMC	hsa-mir-196a	HMDD; dbDEMC
hsa-mir-20a	HMDD; dbDEMC	hsa-mir-195	HMDD; dbDEMC
hsa-mir-30a	HMDD	hsa-let-7f	dbDEMC
hsa-let-7b	HMDD; dbDEMC	hsa-mir-181b	HMDD; dbDEMC
hsa-mir-199a	HMDD; dbDEMC	hsa-mir-34b	HMDD


Table 4.

The top 50 potential miRNAs associated with Lung Neoplasms

miRNA	Evidence	miRNA	Evidence
hsa-mir-1	HMDD	hsa-mir-139	HMDD; dbDEMC
hsa-mir-181	unconfirmed	hsa-mir-193b	dbDEMC
hsa-mir-200	HMDD	hsa-mir-204	dbDEMC
hsa-mir-26	HMDD	hsa-mir-708	dbDEMC
hsa-mir-195	dbDEMC	hsa-mir-378a	unconfirmed
hsa-mir-92	dbDEMC	hsa-mir-625	dbDEMC
hsa-mir-141	dbDEMC	hsa-mir-367	dbDEMC
hsa-mir-122	HMDD; dbDEMC	hsa-mir-149	HMDD; dbDEMC
hsa-mir-16	HMDD; dbDEMC	hsa-mir-148b	HMDD; dbDEMC
hsa-mir-99a	HMDD; dbDEMC	hsa-mir-328	HMDD; dbDEMC
hsa-mir-129	HMDD; dbDEMC	hsa-mir-302b	dbDEMC
hsa-mir-429	dbDEMC	hsa-mir-302a	dbDEMC
hsa-mir-130a	HMDD; dbDEMC	hsa-mir-373	HMDD; dbDEMC
hsa-mir-451	HMDD; dbDEMC	hsa-mir-92b	dbDEMC
hsa-mir-451a	HMDD; dbDEMC	hsa-mir-23b	dbDEMC
hsa-mir-15b	dbDEMC	hsa-mir-152	HMDD; dbDEMC
hsa-mir-151	unconfirmed	hsa-mir-196b	HMDD; dbDEMC
hsa-mir-15a	HMDD; dbDEMC	hsa-mir-302c	dbDEMC
hsa-mir-151a	unconfirmed	hsa-mir-452	dbDEMC
hsa-mir-296	unconfirmed	hsa-mir-215	HMDD; dbDEMC
hsa-mir-320a	dbDEMC	hsa-mir-302d	dbDEMC
hsa-mir-20b	dbDEMC	hsa-mir-28	dbDEMC
hsa-mir-342	HMDD; dbDEMC	hsa-mir-520a	dbDEMC
hsa-mir-194	HMDD; dbDEMC	hsa-mir-130b	HMDD; dbDEMC
hsa-mir-106b	dbDEMC	hsa-mir-372	HMDD; dbDEMC


Table 5.

The top 50 potential miRNAs associated with Breast Neoplasms

miRNA	Evidence	miRNA	Evidence
hsa-mir-1	HMDD; dbDEMC	hsa-mir-330	dbDEMC
hsa-mir-32	HMDD; dbDEMC	hsa-mir-192	HMDD; dbDEMC
hsa-mir-106a	HMDD; dbDEMC	hsa-mir-28	dbDEMC
hsa-mir-26	unconfirmed	hsa-mir-130b	HMDD; dbDEMC
hsa-mir-99a	HMDD; dbDEMC	hsa-mir-211	dbDEMC
hsa-mir-151	HMDD; dbDEMC	hsa-mir-181c	HMDD; dbDEMC
hsa-mir-451	HMDD; dbDEMC	hsa-mir-449a	HMDD; dbDEMC
hsa-mir-92	HMDD; dbDEMC	hsa-mir-449b	dbDEMC
hsa-mir-130a	HMDD; dbDEMC	hsa-mir-99b	dbDEMC
hsa-mir-15b	HMDD; dbDEMC	hsa-mir-208a	HMDD; dbDEMC
hsa-mir-150	HMDD; dbDEMC	hsa-mir-650	dbDEMC
hsa-mir-185	HMDD; dbDEMC	hsa-mir-491	HMDD
hsa-mir-142	HMDD	hsa-mir-532	unconfirmed
hsa-mir-378a	HMDD	hsa-mir-144	HMDD; dbDEMC
hsa-mir-186	dbDEMC	hsa-mir-181d	dbDEMC
hsa-mir-95	dbDEMC	hsa-mir-494	HMDD; dbDEMC
hsa-mir-92b	HMDD; dbDEMC	hsa-mir-362	unconfirmed
hsa-mir-196b	HMDD; dbDEMC	hsa-mir-517a	dbDEMC
hsa-mir-98	HMDD; dbDEMC	hsa-mir-371	dbDEMC
hsa-mir-372	dbDEMC	hsa-mir-371a	unconfirmed
hsa-mir-574	HMDD	hsa-mir-381	HMDD; dbDEMC
hsa-mir-542	unconfirmed	hsa-mir-216a	dbDEMC
hsa-mir-370	HMDD; dbDEMC	hsa-mir-433	dbDEMC
hsa-mir-212	HMDD; dbDEMC	hsa-mir-134	HMDD; dbDEMC
hsa-mir-30e	HMDD	hsa-mir-376a	HMDD; dbDEMC


Discussion

According to the experimental results, our method, GRL_{2, 1}-NMF, is an efficient tool for predicting miRNA-disease associations. The main contributions of this study are as follows. First, we added GIP kernel similarities for miRNAs and diseases into the similarity measurement, which improved the dataset reliability. Second, considering the sparsity of observed miRNA-disease associations, we performed a pre-processing step (WKNKN) to reduce this sparsity, thus enhancing the prediction performance of our model. Third, as a common model in recommendation systems, NMF also plays a crucial role in bioinformatics. However, standard NMF did not achieve satisfactory performance. Therefore, we added the Tikhonov (L₂) and graph Laplacian regularization terms and the L_{2, 1}-norm to the standard NMF, which makes the model more reliable and robust. Finally, the AUCs of GRL_{2, 1}-NMF are higher than those of the three compared models (ICFMDA, SACMDA and IMCMDA).

Note that DNSGRMF [53], which also predicts miRNA-disease connections, is a graph regularized method similar to GRL_{2, 1}-NMF. Both methods decompose the original matrix Y into two matrices W and H, from which a recovered matrix Y^{*} = WH can be obtained. It is worth noting that GRL_{2, 1}-NMF is based on nonnegative matrix factorization, while DNSGRMF is based on graph regularized matrix factorization. DNSGRMF has no such constraints, while GRL_{2, 1}-NMF has the two constraints W ≥ 0 and H ≥ 0.
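To make the role of the constraints W ≥ 0 and H ≥ 0 concrete, the sketch below runs plain NMF with the standard multiplicative update rules on a small matrix. It is not the GRL_{2, 1}-NMF objective: the Tikhonov, graph Laplacian and L_{2, 1} terms are omitted, and the shapes (W of size n × k, H of size k × m), function names and toy matrix are assumptions made for illustration.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(Y, rank, n_iter=200, seed=0, eps=1e-9):
    """Plain NMF, Y ~ W H, via multiplicative updates (keeps W, H >= 0)."""
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        Wt = transpose(W)
        num, den = matmul(Wt, Y), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        Ht = transpose(H)
        num, den = matmul(Y, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return W, H


def frobenius_error(Y, W, H):
    R = matmul(W, H)
    return sum((y - r) ** 2 for ry, rr in zip(Y, R) for y, r in zip(ry, rr)) ** 0.5


Y = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
W, H = nmf(Y, rank=2)
Y_star = matmul(W, H)   # recovered matrix Y* = W H
print(frobenius_error(Y, W, H))
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, the factors never leave the nonnegative orthant, which is the property that distinguishes this family of methods from unconstrained matrix factorization.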

Nevertheless, our model still has room for improvement. First, miRNA information and disease information did not integrate perfectly, and we will improve this in future studies. Second, there may be more appropriate regularization terms that can improve the performance for miRNA-disease association prediction.

Conclusions

It is meaningful and significant to predict disease-related miRNAs in studying the intrinsic aetiological factors of human diseases. A new model named GRL_{2, 1}-NMF was developed in this work for potential miRNA-disease association prediction. First, we integrated experimentally validated connections between miRNAs and disease as well as miRNA functional similarities along with two kinds of disease semantic similarities, and then we calculated the GIP kernel similarities of microRNAs and diseases. Moreover, we used WKNKN to convert the value of matrix Y into a decimal between 0 and 1 and decrease the sparsity of matrix Y. Furthermore, the Tikhonov (L₂), graph Laplacian regularization terms and the L_{2, 1}-norm were added into the traditional NMF model for predicting miRNA-disease connections. In addition, the Tikhonov regularization was utilized to penalize the non-smoothness of W and H, and the graph Laplacian regularization was primarily intended to guarantee local-based representation by leveraging the geometric structure of the data. The L_{2, 1}-norm was added to increase the disease matrix sparsity and eliminate unattached disease pairs.

Our method performs well in global LOOCV, 5-CV and case studies in heterogeneous omics data. The experimental results indicate that GRL_{2, 1}-NMF can effectively and powerfully infer disease-related miRNAs, even if there are no known miRNA-disease associations. However, this method still has limitations that need further research. First, our similarity measurement for GRL_{2, 1}-NMF might not be perfect, and other miRNA information still needs to be taken into account. Moreover, there is still room for improvement in the predictive performance of our method.

Methods

Human miRNA-disease associations

We collected information on all experimentally validated human miRNA-disease associations stored in the HMDD v2.0 database [10]. An adjacency matrix Y ∈ R^{n × m} was established to represent the manually verified human miRNA-disease associations, where row i of Y corresponds to miRNA m_i and column j corresponds to disease d_j. In this study, Y had n = 495 rows (miRNAs) and m = 383 columns (diseases). If a miRNA m_i has a known connection with a disease d_j, then Y_ij = 1; otherwise, Y_ij = 0.
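Constructing Y from a list of known associations can be sketched as below. The function name `build_adjacency` and the placeholder miRNA and disease labels are hypothetical; a real run would use the 495 miRNAs and 383 diseases mentioned above.

```python
def build_adjacency(pairs, mirnas, diseases):
    """Build the binary matrix Y: rows are miRNAs, columns are diseases."""
    row = {m: i for i, m in enumerate(mirnas)}
    col = {d: j for j, d in enumerate(diseases)}
    Y = [[0] * len(diseases) for _ in mirnas]
    for m, d in pairs:
        Y[row[m]][col[d]] = 1   # Y_ij = 1 for a known association
    return Y


# Toy input (placeholder labels, not taken from HMDD).
mirnas = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
pairs = [("m1", "d1"), ("m3", "d2"), ("m3", "d1")]
Y = build_adjacency(pairs, mirnas, diseases)
print(Y)
```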

MiRNA functional similarity

There is a hypothesis that if two miRNAs are functionally similar, they are more likely to be associated with diseases that have high similarity, and vice versa [61, 62]. Wang et al. [63] shared their investigation results, and researchers can download miRNA functional similarity information at http://www.cuilab.cn/files/images/cuilab/misim.zip. Here, we established a matrix S^m to represent the miRNA functional similarities. The entry S^m(m_i, m_j) denotes the functional similarity between miRNAs m_i and m_j.

Disease semantic similarity method 1

In this study, we used hierarchical directed acyclic graphs (DAGs) to measure disease similarity, following the strategy of Wang et al. [63]; the disease DAGs can be downloaded from the Medical Subject Headings (MeSH) database. DAG_d = (d, T_d, E_d) denotes the hierarchical DAG of disease d, where T_d denotes the disease collection and E_d denotes the set of links in the DAG. According to the DAGs, the semantic value of disease D can be computed as Eq. (4).

$$DV1(D) = \sum_{d \in T(D)} D1_{D}(d) \tag{4}$$

where D1_D(d) denotes the semantic contribution of disease d to disease D, and Δ denotes the semantic contribution factor (Δ = 0.5) [63]. The contribution D1_D(d) is defined as Eq. (5).

$$D1_{D}(d) = \begin{cases} 1 & \text{if } d = D \\ \max\{\Delta \cdot D1_{D}(d') \mid d' \in \text{children of } d\} & \text{if } d \neq D \end{cases} \tag{5}$$

Therefore, two diseases would likely have greater similarities if they share a larger part of their DAGs, and we can calculate semantic similarities between disease d_i and d_j as follows:

$$S_{1}^{d}(d_{i}, d_{j}) = \frac{\sum_{t \in T(d_{i}) \cap T(d_{j})} \left(D1_{d_{i}}(t) + D1_{d_{j}}(t)\right)}{DV1(d_{i}) + DV1(d_{j})} \tag{6}$$
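The first semantic similarity can be sketched on a toy DAG as follows. This is a minimal illustration, not the authors' code: the DAG is assumed to be given as a child-to-parents dictionary, the node names are hypothetical, and the maximum over children is obtained by a level-by-level upward traversal (with Δ < 1, the shortest upward path gives the largest contribution).

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of `disease` and each of its ancestors to `disease`."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                value = delta * contrib[node]
                # Keep the largest contribution reaching this ancestor.
                if value > contrib.get(parent, 0.0):
                    contrib[parent] = value
                    next_frontier.append(parent)
        frontier = next_frontier
    return contrib

def semantic_similarity(d_i, d_j, parents, delta=0.5):
    """Shared contributions divided by the sum of both semantic values."""
    c_i = semantic_contributions(d_i, parents, delta)
    c_j = semantic_contributions(d_j, parents, delta)
    shared = set(c_i) & set(c_j)
    numerator = sum(c_i[t] + c_j[t] for t in shared)
    return numerator / (sum(c_i.values()) + sum(c_j.values()))

# Toy DAG (hypothetical): root is the parent of A and B, and A is the parent of C.
toy_parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A"]}
sim_ab = semantic_similarity("A", "B", toy_parents)
sim_ca = semantic_similarity("C", "A", toy_parents)
```

In this toy DAG, A and B share only the root, whereas C and A share both A and the root, so `sim_ca` is larger than `sim_ab`.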

Disease semantic similarity method 2

In the strategy for calculating disease semantic similarities above, diseases that shared one layer of DAG_d shared a common contribution value. However, if some diseases merely exist in fewer DAGs, then these diseases are called more specific diseases and should have a higher semantic contribution to disease d. In view of the algorithm presented by [19, 45], we can calculate the semantic contributions of disease d to disease D and the semantic values of disease D as Eq. (7) and Eq. (8), respectively.

$$D2_{D}(d) = -\log\left(\frac{\text{the number of DAGs including } d}{\text{the number of diseases}}\right) \tag{7}$$

$$DV2(D) = \sum_{d \in T(D)} D2_{D}(d) \tag{8}$$

where d denotes any investigated disease. Finally, we could calculate the semantic similarities of diseases d_i and d_j as Eq. (9).

$$S_{2}^{d}(d_{i}, d_{j}) = \frac{\sum_{t \in T(d_{i}) \cap T(d_{j})} \left(D2_{d_{i}}(t) + D2_{d_{j}}(t)\right)}{DV2(d_{i}) + DV2(d_{j})} \tag{9}$$

where the numerator of Equation (9) sums the semantic contributions of the nodes shared by the DAGs of diseases d_i and d_j, and the denominator sums the semantic contributions of all nodes in the DAGs of d_i and d_j.
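A minimal sketch of the second semantic similarity on the same kind of toy DAG is given below. The child-to-parents dictionary, the node names and the zero-denominator guard are assumptions made for illustration; every disease is assumed to appear as a key of the dictionary.

```python
import math

def dag_nodes_of(disease, parents):
    """Return the disease together with all of its ancestors."""
    seen = {disease}
    stack = [disease]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def frequency_contributions(parents):
    """Contribution of each term: -log(number of DAGs including it / number of diseases)."""
    diseases = list(parents)
    nodes = {d: dag_nodes_of(d, parents) for d in diseases}
    counts = {d: 0 for d in diseases}
    for node_set in nodes.values():
        for term in node_set:
            counts[term] += 1
    contrib = {t: -math.log(counts[t] / len(diseases)) for t in diseases}
    return nodes, contrib

def semantic_similarity2(d_i, d_j, nodes, contrib):
    shared = nodes[d_i] & nodes[d_j]
    # The contribution of a term does not depend on the target disease,
    # so each shared term is counted once for d_i and once for d_j.
    numerator = sum(2.0 * contrib[t] for t in shared)
    denominator = sum(contrib[t] for t in nodes[d_i]) + sum(contrib[t] for t in nodes[d_j])
    return numerator / denominator if denominator > 0 else 0.0

# Toy DAG (hypothetical): root is the parent of A and B, and A is the parent of C.
toy_parents = {"root": [], "A": ["root"], "B": ["root"], "C": ["A"]}
toy_nodes, toy_contrib = frequency_contributions(toy_parents)
sim2_ca = semantic_similarity2("C", "A", toy_nodes, toy_contrib)
sim2_ab = semantic_similarity2("A", "B", toy_nodes, toy_contrib)
```

Here the root appears in every DAG and therefore contributes nothing, so A and B, which share only the root, receive a similarity of zero, while the more specific shared term A drives the similarity of C and A.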

Gaussian interaction profile kernel similarity for diseases and MiRNAs

If two diseases are similar, they are likely to have associations with microRNAs that are functionally approximate, and vice versa [61–64]. Gaussian interaction profile (GIP) kernel similarities have been adopted to quantify disease similarities and miRNA similarities [60, 65, 66]. We also calculated GIP kernel similarities for diseases and miRNAs in this work. First, based on whether disease d_i(m_j) has a known connection with each miRNA (disease) of the adjacency matrix Y, the interaction profiles IP(d_i) and IP(m_j) were constructed for disease d_i and miRNA m_j, respectively. Then, the GIP kernel similarity between a disease pair and a miRNA pair is computed as Equation (10) and Equation (11), respectively.

$$GD(d_{i}, d_{j}) = \exp\left(-\beta_{d} \|IP(d_{i}) - IP(d_{j})\|^{2}\right) \tag{10}$$

$$GM(m_{i}, m_{j}) = \exp\left(-\beta_{m} \|IP(m_{i}) - IP(m_{j})\|^{2}\right) \tag{11}$$

Here, the kernel bandwidths β_m and β_d are described as Equation (12) and Equation (13), respectively, where $β_{m}^{'}$ and $β_{d}^{'}$ are the original bandwidths, and nm and nd are the numbers of miRNAs and diseases, respectively.

$$\beta_{m} = \beta_{m}^{'} \Big/ \left(\frac{1}{nm} \sum_{i=1}^{nm} \|IP(m_{i})\|^{2}\right) \tag{12}$$

$$\beta_{d} = \beta_{d}^{'} \Big/ \left(\frac{1}{nd} \sum_{i=1}^{nd} \|IP(d_{i})\|^{2}\right) \tag{13}$$

In summary, the matrix GD and GM denote the GIP kernel similarity for diseases and miRNAs, respectively.
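The GIP kernel computation can be sketched with NumPy as follows. The original bandwidth is set to 1 here as an assumption (its value is not stated in this section), the function name `gip_kernel` and the toy matrix are hypothetical, and at least one known association is assumed so that the normalization is well defined.

```python
import numpy as np

def gip_kernel(profiles, bandwidth_prime=1.0):
    """GIP kernel between the rows of `profiles` (one interaction profile per row)."""
    sq_norms = np.sum(profiles ** 2, axis=1)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    beta = bandwidth_prime / np.mean(sq_norms)
    # Squared Euclidean distance between every pair of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-beta * sq_dists)

# Toy association matrix (hypothetical): 4 miRNAs (rows) x 3 diseases (columns).
Y_toy = np.array([[1, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1]], dtype=float)

GM_toy = gip_kernel(Y_toy)    # miRNA profiles are the rows of Y
GD_toy = gip_kernel(Y_toy.T)  # disease profiles are the columns of Y
```

Both kernels are symmetric with ones on the diagonal, since every profile has zero distance to itself.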

Integrated similarity for diseases and MiRNAs

According to the various similarity measurement methods mentioned above, we combined the GIP kernel similarities with the two disease semantic similarities and the miRNA functional similarities to obtain integrated disease similarities and integrated miRNA similarities, respectively. The weight setting problem of the above similarities is described in detail in the Results section, and we chose the following measurement strategy according to the experimental results. Specifically, if two miRNAs m_i and m_j had a functional similarity, then the final similarity was the functional similarity; otherwise, the final similarity was the GIP kernel similarity. The disease similarities were integrated in the same way, using the average of the two semantic similarities where available. Hence, the integrated miRNA similarity matrix SM and the integrated disease similarity matrix SD are computed as follows:

$$SM(m_{i}, m_{j}) = \begin{cases} S^{m}(m_{i}, m_{j}) & m_{i} \text{ and } m_{j} \text{ have functional similarity} \\ GM(m_{i}, m_{j}) & \text{otherwise} \end{cases} \tag{14}$$

$$SD(d_{i}, d_{j}) = \begin{cases} \frac{S_{1}^{d}(d_{i}, d_{j}) + S_{2}^{d}(d_{i}, d_{j})}{2} & d_{i} \text{ and } d_{j} \text{ have semantic similarity} \\ GD(d_{i}, d_{j}) & \text{otherwise} \end{cases} \tag{15}$$
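The piecewise integration can be sketched as an element-wise selection. This sketch assumes that a missing functional or semantic similarity is encoded as a zero entry, which is an assumption about the data encoding rather than something stated in the text; the function name and toy values are hypothetical.

```python
import numpy as np

def integrate_similarity(primary, gip):
    """Use the primary similarity where it is available (nonzero), else the GIP kernel value."""
    return np.where(primary != 0, primary, gip)

# Toy miRNA example (hypothetical values): miRNA 2 has no functional similarity to the others.
S_func = np.array([[1.0, 0.7, 0.0],
                   [0.7, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
GM_toy = np.array([[1.0, 0.4, 0.2],
                   [0.4, 1.0, 0.3],
                   [0.2, 0.3, 1.0]])
SM_toy = integrate_similarity(S_func, GM_toy)
```

For diseases, the same function would be called with the average of the two semantic similarity matrices as `primary` and the disease GIP kernel as `gip`.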

Weighted K nearest known Neighbours (WKNKN) for MiRNAs and diseases

Let M = {m₁, m₂, …, m_n} and D = {d₁, d₂, …, d_m} represent the sets of n miRNAs and m diseases, respectively. We established an association matrix Y ∈ R^{n × m} to denote the known human miRNA-disease associations according to the HMDD v2.0 database [10]. If a miRNA m_i has been manually validated to be related to a disease d_j, then Y_ij is equal to 1; otherwise, it is equal to 0. Y(m_i) = {Y_i1, Y_i2, …, Y_im}, namely, the ith row vector of matrix Y, represents the interaction profile of miRNA m_i. Similarly, Y(d_j) = {Y_1j, Y_2j, …, Y_nj}, the jth column vector of matrix Y, represents the interaction profile of disease d_j. In this study, we investigated 495 (n) miRNAs and 383 (m) diseases, yet the adjacency matrix Y has only 5430 known associations among its 495 × 383 = 189,585 entries; thus, Y is a sparse matrix. Here, we performed a pre-processing procedure named weighted K nearest known neighbours (WKNKN) [50] for miRNAs and diseases without any known associations to alleviate this sparsity and thus improve the prediction accuracy. After executing WKNKN, each entry Y_ij = 0 was replaced with a continuous value ranging from 0 to 1; the specific steps are as follows.

First, we acquired the interaction profile of each miRNA m_q according to the functional similarity between m_q and its K nearest known miRNAs as follows:

$$Y_{m}(m_{q}) = \frac{1}{Q_{m}} \sum_{i=1}^{K} w_{i} Y(m_{i}) \tag{16}$$

where m₁ to m_K are the miRNAs sorted in descending order based on their similarities to m_q; w_i is the weight factor, with w_i = α^{i − 1} · S^m(m_i, m_q), so that the higher the similarity between m_i and m_q is, the higher the weight; α ∈ [0, 1] is a decay term; and Q_m = ∑_{1 ≤ i ≤ K} S^m(m_i, m_q) is the normalization coefficient.

Second, we acquired the interaction profile of each disease d_p according to the semantic similarity between d_p and its K nearest known diseases as follows:

$$Y_{d}(d_{p}) = \frac{1}{Q_{d}} \sum_{j=1}^{K} w_{j} Y(d_{j}) \tag{17}$$

where d₁ to d_K are the diseases sorted in descending order based on their similarities to d_p; w_j is the weight factor, with w_j = α^{j − 1} · S^d(d_j, d_p), so that the higher the similarity between d_j and d_p is, the higher the weight; and Q_d = ∑_{1 ≤ j ≤ K} S^d(d_j, d_p) is the normalization term.

Finally, we integrated the two matrices Y_m and Y_d obtained above by taking their weighted average, which indicates the overall likelihood of an interaction between m_i and d_j. We then replaced each Y_ij = 0 with the corresponding likelihood score and updated the original adjacency matrix Y as follows:

$$Y_{md} = \frac{a_{1} Y_{m} + a_{2} Y_{d}}{\sum_{i=1}^{2} a_{i}} \tag{18}$$

$$Y = \max(Y, Y_{md}) \tag{19}$$

where a_i is the weight coefficient and a₁ = a₂ = 1.
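The WKNKN steps above can be sketched as follows. This is a minimal illustration under stated assumptions: K = 2 and α = 0.5 are arbitrary toy values, each miRNA or disease is excluded from its own neighbour list, neighbours are drawn from all other rows or columns, K is smaller than both n and m, and the similarity matrices are nonnegative. The function name `wknkn` and the toy data are hypothetical.

```python
import numpy as np

def wknkn(Y, S_m, S_d, K=2, alpha=0.5):
    n, m = Y.shape
    Y_m = np.zeros((n, m))
    Y_d = np.zeros((n, m))
    decay = alpha ** np.arange(K)  # alpha^(i-1) for i = 1..K

    # miRNA side: weighted sum of the profiles (rows) of the K most similar miRNAs.
    for q in range(n):
        sims = S_m[q].astype(float)
        sims[q] = -np.inf                    # do not pick m_q as its own neighbour
        nearest = np.argsort(-sims)[:K]      # descending similarity
        s = S_m[q, nearest]
        if s.sum() > 0:
            Y_m[q] = (decay * s) @ Y[nearest] / s.sum()

    # Disease side: weighted sum of the profiles (columns) of the K most similar diseases.
    for p in range(m):
        sims = S_d[p].astype(float)
        sims[p] = -np.inf
        nearest = np.argsort(-sims)[:K]
        s = S_d[p, nearest]
        if s.sum() > 0:
            Y_d[:, p] = Y[:, nearest] @ (decay * s) / s.sum()

    Y_md = (Y_m + Y_d) / 2.0                 # a_1 = a_2 = 1
    return np.maximum(Y, Y_md)

# Toy data (hypothetical): miRNA 1 has no known association.
Y_toy = np.array([[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
S_m_toy = np.array([[1.0, 0.8, 0.1, 0.2],
                    [0.8, 1.0, 0.3, 0.1],
                    [0.1, 0.3, 1.0, 0.5],
                    [0.2, 0.1, 0.5, 1.0]])
S_d_toy = np.array([[1.0, 0.6, 0.2],
                    [0.6, 1.0, 0.4],
                    [0.2, 0.4, 1.0]])
Y_new = wknkn(Y_toy, S_m_toy, S_d_toy)
```

Because the element-wise maximum is taken at the end, known associations keep the value 1, while the empty row of miRNA 1 is filled with likelihood scores borrowed from its most similar neighbours.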

Standard NMF

In recent years, as one of the common methods of recommendation systems, nonnegative matrix factorization (NMF) has been widely used as an effective prediction algorithm in the field of bioinformatics [67, 68]. Two non-negative matrices W and H, which are optimal approximations to the original matrix Y, can be found by NMF, where W and H satisfy Equation (20).

$$Y \approx WH^{T}, \quad \text{s.t. } W \geq 0,\ H \geq 0 \tag{20}$$

In this work, matrix Y ∈ R^{n × m} was used to represent the known miRNA-disease associations, and NMF can decompose this matrix into two matrices, namely, W ∈ R^{n × k} and H ∈ R^{m × k}. Here, we formulate the miRNA-disease association identification problem as the objective function in Equation (21).

$$\min_{W, H} \|Y - WH^{T}\|_{F}^{2} \quad \text{s.t. } W \geq 0,\ H \geq 0 \tag{21}$$

where ‖∙‖_F represents the Frobenius norm of a matrix. Equation (21) can be optimized with the iterative update algorithm presented by Lee and Seung [69].

However, standard NMF does not ensure the sparsity of decomposition; therefore, local-based representations are not always generated [70, 71]. Some researchers have developed sparse constraints on NMF [46–48].
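For reference, standard NMF with multiplicative updates in the style of Lee and Seung can be sketched as below for the factorization Y ≈ WH^T. The random initialization, the iteration count, the small constant `eps` that avoids division by zero, and the toy matrix are assumptions chosen for illustration.

```python
import numpy as np

def nmf(Y, k, n_iter=200, eps=1e-10, seed=0):
    """Factorize a nonnegative n x m matrix Y as W @ H.T with W (n x k) and H (m x k)."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    W = rng.random((n, k))
    H = rng.random((m, k))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        W *= (Y @ H) / (W @ (H.T @ H) + eps)
        H *= (Y.T @ W) / (H @ (W.T @ W) + eps)
    return W, H

# Toy nonnegative matrix of rank at most 2 (hypothetical data).
rng_demo = np.random.default_rng(1)
Y_demo = rng_demo.random((6, 2)) @ rng_demo.random((2, 5))
W_demo, H_demo = nmf(Y_demo, k=2)
residual = np.linalg.norm(Y_demo - W_demo @ H_demo.T) ** 2
```

Nothing in these updates encourages sparse or locally structured factors, which is the gap that the regularized variants are meant to address.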

GRL_{2, 1}-NMF

Here, a new nonnegative matrix factorization method was presented to identify underlying miRNA-disease connections. The flow chart of GRL_{2, 1}-NMF is shown in Fig. 3. We incorporated Tikhonov (L₂), graph Laplacian regularization terms and the L_{2, 1}-norm into the traditional NMF model for predicting miRNA-disease connections. The Tikhonov regularization is utilized to penalize the non-smoothness of W and H [48, 54, 55], and the graph Laplacian regularization is primarily intended to ensure local-based representation by leveraging the geometric structure of the data [56]. The L_{2, 1}-norm was added to increase the disease matrix sparsity and eliminate unattached disease pairs [30, 52, 53]. The optimization problem of GRL_{2, 1}-NMF can be formularized as follows:

$$\begin{aligned} \min_{W, H}\ & \|Y - WH^{T}\|_{F}^{2} + \lambda_{l}\left(\|W\|_{F}^{2} + \|H\|_{F}^{2}\right) + \lambda_{l}\|H\|_{2,1} \\ & + \lambda_{m} \mathrm{Tr}(W^{T} L_{m} W) + \lambda_{d} \mathrm{Tr}(H^{T} L_{d} H) \\ & \text{s.t. } W \geq 0,\ H \geq 0 \end{aligned} \tag{22}$$

where ‖∙‖_F represents the Frobenius norm of a matrix; ‖·‖_{2, 1} represents the L_{2, 1}-norm; Tr(∙) denotes the trace of a matrix; and λ_l, λ_m and λ_d are regularization coefficients. Let S^m and S^d be the miRNA and disease similarity networks, and let D_m and D_d be the diagonal matrices whose diagonal elements are the row (or column) sums of S^m and S^d, respectively. We define L_m = D_m − S^m and L_d = D_d − S^d as the graph Laplacian matrices for S^m and S^d [72], respectively. The first term measures how well the product WH^T approximates Y and drives the search for the matrices W and H. The second term is the Tikhonov regularization. The third term applies the L_{2, 1}-norm to matrix H. The last two terms are the graph regularization of miRNAs and diseases.
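The two less familiar building blocks of this objective, the graph Laplacian and the L_{2, 1}-norm, can be sketched as follows. The function names and toy values are hypothetical, and the similarity matrix is assumed to be symmetric.

```python
import numpy as np

def graph_laplacian(S):
    """L = D - S, where D is the diagonal matrix of the row sums of S."""
    return np.diag(S.sum(axis=1)) - S

def l21_norm(H):
    """Sum of the Euclidean norms of the rows of H."""
    return np.sum(np.sqrt(np.sum(H ** 2, axis=1)))

# Toy symmetric similarity matrix and factor matrix (hypothetical values).
S_toy = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.1],
                  [0.2, 0.1, 1.0]])
W_toy = np.array([[1.0, 0.0],
                  [0.8, 0.1],
                  [0.0, 1.0]])

L_toy = graph_laplacian(S_toy)
graph_penalty = np.trace(W_toy.T @ L_toy @ W_toy)
row_sparsity = l21_norm(np.array([[3.0, 4.0], [0.0, 0.0]]))
```

For a symmetric S, the trace term equals half of the similarity-weighted sum of squared distances between the rows of W, so it is small when similar miRNAs (or diseases) receive similar latent representations. The L_{2, 1}-norm sums row norms, so a row that is entirely zero adds nothing to the penalty.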

Optimization

Considering the two nonnegative constraints of the objective function, namely, W ≥ 0 and H ≥ 0, we utilized Lagrange multipliers to address the optimization problem in Equation (22). First, the Lagrange function L_f is as follows:

$$\begin{aligned} L_{f} ={} & \mathrm{Tr}(YY^{T}) - 2\mathrm{Tr}(YHW^{T}) + \mathrm{Tr}(WH^{T}HW^{T}) \\ & + \lambda_{l} \mathrm{Tr}(WW^{T}) + \lambda_{l} \mathrm{Tr}(HH^{T}) + \lambda_{l}\|H\|_{2,1} \\ & + \lambda_{m} \mathrm{Tr}(W^{T} L_{m} W) + \lambda_{d} \mathrm{Tr}(H^{T} L_{d} H) \\ & + \mathrm{Tr}(\phi W^{T}) + \mathrm{Tr}(\varphi H^{T}) \end{aligned} \tag{23}$$

where ϕ and φ are the Lagrange multiplier matrices for the constraints W ≥ 0 and H ≥ 0, respectively. The partial derivatives of L_f with respect to W and H are:

$$\frac{\partial L_{f}}{\partial W} = -2YH + 2WH^{T}H + 2\lambda_{l} W + 2\lambda_{m} L_{m} W + \phi \tag{24}$$

$$\frac{\partial L_{f}}{\partial H} = -2Y^{T}W + 2HW^{T}W + 2\lambda_{l} H + 2\lambda_{l} AH + 2\lambda_{d} L_{d} H + \varphi \tag{25}$$

where A is a diagonal matrix whose sth diagonal element is the reciprocal of the L₂-norm of H^s, the sth row of H ∈ R^{m × k}:

$$[A]_{s,s} = \frac{1}{\|H^{s}\|_{2}} = \frac{1}{\sqrt{\sum_{j=1}^{k} |H_{s,j}|^{2}}} \tag{26}$$

Therefore, we obtained the updating rules expressed as Equations (27) and (28):

$$w_{ik} \leftarrow w_{ik} \frac{(YH + \lambda_{m} S^{m} W)_{ik}}{(WH^{T}H + \lambda_{l} W + \lambda_{m} D_{m} W)_{ik}} \tag{27}$$

$$h_{ik} \leftarrow h_{ik} \frac{(Y^{T}W + \lambda_{d} S^{d} H)_{ik}}{(HW^{T}W + \lambda_{l} H + \lambda_{l} AH + \lambda_{d} D_{d} H)_{ik}} \tag{28}$$
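The step from the partial derivatives to the multiplicative rules is not spelled out in the text. One common route, sketched here for W as an assumption about the intended derivation, uses the Karush-Kuhn-Tucker complementary slackness condition together with L_m = D_m − S^m. The symbol Λ^W is an ad hoc label for the multiplier matrix attached to the constraint W ≥ 0.

```latex
% Complementary slackness for the constraint W >= 0
\Lambda^{W}_{ik}\, w_{ik} = 0

% Setting the derivative with respect to W to zero and multiplying element-wise by w_{ik}
\left(-2YH + 2WH^{T}H + 2\lambda_{l} W + 2\lambda_{m} L_{m} W\right)_{ik} w_{ik} = 0

% Substituting L_m = D_m - S^m and separating the two nonnegative parts
\left(WH^{T}H + \lambda_{l} W + \lambda_{m} D_{m} W\right)_{ik} w_{ik}
  = \left(YH + \lambda_{m} S^{m} W\right)_{ik} w_{ik}

% Fixed-point iteration that satisfies the equality at convergence
w_{ik} \leftarrow w_{ik}\,
  \frac{\left(YH + \lambda_{m} S^{m} W\right)_{ik}}
       {\left(WH^{T}H + \lambda_{l} W + \lambda_{m} D_{m} W\right)_{ik}}
```

Splitting the Laplacian into D_m and S^m keeps both the numerator and the denominator nonnegative, so a nonnegative W stays nonnegative after every update. The rule for H follows in the same way with L_d = D_d − S^d and the additional term λ_l AH in the denominator.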

According to Equation (27) and Equation (28), the nonnegative matrices W and H are updated until convergence. Eventually, we obtained the predicted association matrix Y^∗ = WH^T, whose entries score the associations between miRNAs and diseases. We ranked the predicted disease-connected miRNAs according to the elements of matrix Y^∗. In theory, the higher-ranking miRNAs in each column of Y^∗ are more likely to be associated with the corresponding disease.
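The update rules and the final ranking can be sketched with NumPy as follows. This is an illustrative sketch rather than the authors' implementation: the rank k, the regularization coefficients, the fixed iteration count used in place of a convergence test, the random initialization and the small constant `eps` are all assumptions, and the toy data are hypothetical.

```python
import numpy as np

def grl21_nmf(Y, S_m, S_d, k, lam_l, lam_m, lam_d, n_iter=200, eps=1e-10, seed=0):
    """Iterate the multiplicative updates for W and H and return Y* = W @ H.T."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    W = rng.random((n, k))
    H = rng.random((m, k))
    D_m = np.diag(S_m.sum(axis=1))
    D_d = np.diag(S_d.sum(axis=1))
    for _ in range(n_iter):
        # Update of W.
        W *= (Y @ H + lam_m * S_m @ W) / (
            W @ (H.T @ H) + lam_l * W + lam_m * D_m @ W + eps)
        # A is diagonal with entries 1 / ||row of H||_2, so A @ H rescales each row of H.
        row_norms = np.sqrt(np.sum(H ** 2, axis=1)) + eps
        AH = H / row_norms[:, None]
        # Update of H.
        H *= (Y.T @ W + lam_d * S_d @ H) / (
            H @ (W.T @ W) + lam_l * H + lam_l * AH + lam_d * D_d @ H + eps)
    return W @ H.T

# Toy data (hypothetical): 4 miRNAs x 3 diseases with symmetric similarity matrices.
Y_toy = np.array([[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
S_m_toy = np.array([[1.0, 0.8, 0.1, 0.2],
                    [0.8, 1.0, 0.3, 0.1],
                    [0.1, 0.3, 1.0, 0.5],
                    [0.2, 0.1, 0.5, 1.0]])
S_d_toy = np.array([[1.0, 0.6, 0.2],
                    [0.6, 1.0, 0.4],
                    [0.2, 0.4, 1.0]])
Y_star = grl21_nmf(Y_toy, S_m_toy, S_d_toy, k=2, lam_l=0.1, lam_m=0.1, lam_d=0.1)

# Rank the miRNAs for disease 0 from the highest to the lowest predicted score.
ranking_d0 = np.argsort(-Y_star[:, 0])
```

In practice, Y would be the matrix produced by the WKNKN pre-processing step, and the loop would stop once the change in W and H falls below a chosen tolerance.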

Supplementary information

12859_2020_3409_MOESM1_ESM.txt (604.5 KB, txt)

Additional file 1. disease semantic similarity1.txt. This is disease semantic similarity 1, which integrated 383 disease semantic similarities.

12859_2020_3409_MOESM2_ESM.txt (812.7 KB, txt)

Additional file 2. disease semantic similarity2.txt. This is disease semantic similarity 2, which integrated 383 disease semantic similarities.

12859_2020_3409_MOESM3_ESM.txt (718.2 KB, txt)

Additional file 3. miRNA functional similarity.txt. This is the miRNA functional similarity, which integrated 495 miRNA functional semantic similarities.

12859_2020_3409_MOESM4_ESM.txt (43.1 KB, txt)

Additional file 4. knownassociation.txt. This is a known miRNA-disease association matrix that was downloaded from HMDD v2.0. It includes 5430 known miRNA-disease associations between 495 miRNAs and 383 diseases.

12859_2020_3409_MOESM5_ESM.xlsx (17.9 KB, xlsx)

Additional file 5. diseases_list.xlsx. This file lists 383 disease names.

12859_2020_3409_MOESM6_ESM.xlsx (17.5 KB, xlsx)

Additional file 6. miRNAs_list.xlsx. This file lists 495 miRNA names.

Acknowledgements

Not applicable.

Abbreviations

5-CV: Five-fold cross validation
AUC: The area under the ROC curve
DAG: Directed acyclic graph
dbDEMC: Database of differentially expressed miRNAs in human cancers
FPR: False positive rate
GIP: Gaussian interaction profile
HMDD: Human microRNA disease database
LOOCV: Leave-one-out cross validation
miRNA: MicroRNA
NMF: Nonnegative matrix factorization
ROC: Receiver operating characteristic
TPR: True positive rate
WKNKN: Weighted K nearest known neighbours

Authors’ contributions

ZG and YTW collected the data. ZG, YTW, QWW, JCN and CHZ conceived and designed the experiments. ZG implemented the experiments. ZG and CHZ analysed the results. ZG and CHZ wrote the paper. All authors read and approved the final manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. U19A2064, 61873001, 61872220, 61672037, 61861146002 and 61732012). The funding bodies did not play any role in the design of the study or collection, analysis and interpretation of data or in writing the manuscript.

Availability of data and materials

The dataset(s) supporting the conclusions of this article is (are) included within the article (and its additional files).

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Zhen Gao, Email: 17853711618@163.com.

Yu-Tian Wang, Email: 253667119@qq.com.

Qing-Wen Wu, Email: wqwcyf@126.com.

Jian-Cheng Ni, Email: nijch@163.com.

Chun-Hou Zheng, Email: zhengch99@126.com.

Supplementary information

Supplementary information accompanies this paper at 10.1186/s12859-020-3409-x.

References

1. Victor A. microRNAs: tiny regulators with great potential. Cell. 2001;107:823–826. doi: 10.1016/S0092-8674(01)00616-X.
2. Ambros V. The functions of animal microRNAs. Nature. 2004;431:350–355. doi: 10.1038/nature02871.
3. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297. doi: 10.1016/S0092-8674(04)00045-5.
4. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–854. doi: 10.1016/0092-8674(93)90529-Y.
5. Jopling CL, Yi M, Lancaster AM, Lemon SM, Sarnow P. Modulation of hepatitis C virus RNA abundance by a liver-specific MicroRNA. Science. 2005;309:1577–1581. doi: 10.1126/science.1113329.
6. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39:D152–D157. doi: 10.1093/nar/gkq1027.
7. Bartel DP. MicroRNAs. Target recognition and regulatory functions. Cell. 2009;136:215–233. doi: 10.1016/j.cell.2009.01.002.
8. Harfe BD. MicroRNAs in vertebrate development. Curr Opin Genet Dev. 2005;15:410–415. doi: 10.1016/j.gde.2005.06.012.
9. Chou CH, Chang NW, Shrestha S, Hsu SD, Lin YL, et al. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 2016;44:D239–D247. doi: 10.1093/nar/gkv1258.
10. Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, Cui Q. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42:D1070–D1074. doi: 10.1093/nar/gkt1023.
11. Meola N, Gennarino VA, Banfi S. microRNAs and genetic diseases. Pathogenetics. 2009;2:7. doi: 10.1186/1755-8417-2-7.
12. Calin GA, Croce CM. MicroRNA signatures in human cancers. Nat Rev Cancer. 2006;6:857–866. doi: 10.1038/nrc1997.
13. Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J Biomed Inform. 2017;66:194–203. doi: 10.1016/j.jbi.2017.01.008.
14. Zeng X, Zhang X, Zou Q. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Brief Bioinform. 2016;17:193–203. doi: 10.1093/bib/bbv033.
15. Chen X, Xie D, Zhao Q, You ZH. MicroRNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2019;20:515–539. doi: 10.1093/bib/bbx130.
16. Jiang Q, Hao Y, Wang G. Prioritization of disease microRNAs through a human phenome-microRNAome network. BMC Syst Biol. 2010;4:S2. doi: 10.1186/1752-0509-4-S1-S2.
17. Li X, Wang Q, Zheng Y, Lv S, Ning S, Sun J, Huang T, Zheng Q, Ren H, Xu J, et al. Prioritizing human cancer microRNAs based on genes' functional consistency between microRNA and cancer. Nucleic Acids Res. 2011;39:e153. doi: 10.1093/nar/gkr770.
18. Xu C, Ping Y, Li X, Zhao H, Wang L, Fan H, Xiao Y, Li X. Prioritizing candidate disease miRNAs by integrating phenotype associations of multiple diseases with matched miRNA and mRNA expression profiles. Mol BioSyst. 2014;10:2800–2809. doi: 10.1039/C4MB00353E.
19. Xuan P, Han K, Guo M, Guo Y, Li J, Ding J, Liu Y, Dai Q, Li J, Teng Z, et al. Prediction of microRNAs associated with human diseases based on weighted k most similar neighbors. PLoS One. 2013;8:e70204. doi: 10.1371/journal.pone.0070204.
20. Mork S, Pletscher-Frankild S, Palleja Caro A, Gorodkin J, Jensen LJ. Protein-driven inference of miRNA-disease associations. Bioinformatics. 2014;30:392–397. doi: 10.1093/bioinformatics/btt677.
21. Chen X, Yan CC, Zhang X, You ZH, Deng L, Liu Y, Zhang Y, Dai Q. WBSMDA: within and between score for MiRNA-disease association prediction. Sci Rep. 2016;6:21106. doi: 10.1038/srep21106.
22. Chen X, Yan CC, Zhang X, You ZH, Huang YA, Yan GY. HGIMDA: heterogeneous graph inference for miRNA-disease association prediction. Oncotarget. 2016;7(40):65257–65269. doi: 10.18632/oncotarget.11251.
23. Yu H, Chen X, Lu L. Large-scale prediction of microRNA-disease associations by combinatorial prioritization algorithm. Sci Rep. 2017;7:43792. doi: 10.1038/srep43792.
24. Chen X, Wu QF, Yan GY. RKNNMDA: ranking-based KNN for MiRNA-disease association prediction. RNA Biol. 2017;14:952–962. doi: 10.1080/15476286.2017.1312226.
25. Chen X, Cheng JY, Yin J. Predicting microRNA-disease associations using bipartite local models and hubness-aware regression. RNA Biol. 2018;15:1192–1205. doi: 10.1080/15476286.2018.1517010.
26. You Z-H, Huang Z-A, Zhu Z, Yan G-Y, Li Z-W, Wen Z, Chen X. PBMDA: a novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput Biol. 2017;13:e1005455. doi: 10.1371/journal.pcbi.1005455.
27. Chen X, Liu MX, Yan GY. RWRMDA: predicting novel human microRNA-disease associations. Mol BioSyst. 2012;8:2792–2798. doi: 10.1039/c2mb25180a.
28. Shi H, Xu J, Zhang G. Walking the interactome to identify human miRNA-disease associations through the functional link between miRNA targets and disease genes. BMC Syst Biol. 2013;7:101. doi: 10.1186/1752-0509-7-101.
29. Xuan P, Han K, Guo Y, Li J, Li X, Zhong Y, Zhang Z, Ding J. Prediction of potential disease-associated microRNAs based on random walk. Bioinformatics. 2015;31:1805–1815. doi: 10.1093/bioinformatics/btv039.
30. Liu Y, Zeng X, He Z, Zou Q. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Trans Comput Biol Bioinform. 2017;14:905–915. doi: 10.1109/TCBB.2016.2550432.
31. Wong L, You ZH, Ming Z, Li J, Chen X, Huang YA. Detection of interactions between proteins through rotation Forest and local phase quantization descriptors. Int J Mol Sci. 2016;17:21. doi: 10.3390/ijms17010021.
32. Xu J, Li CX, Lv JY, Li YS, Xiao Y, Shao TT, Huo X, Li X, Zou Y, Han QL, et al. Prioritizing candidate disease miRNAs by topological features in the miRNA target-dysregulated network: case study of prostate cancer. Mol Cancer Ther. 2011;10:1857–1866. doi: 10.1158/1535-7163.MCT-11-0055.
33. Chen X, Yan GY. Semi-supervised learning for potential human microRNA-disease associations inference. Sci Rep. 2014;4:5501. doi: 10.1038/srep05501.
34. Li JQ, Rong ZH, Chen X. MCMDA: matrix completion for MiRNA-disease association prediction. Oncotarget. 2017;8:21187–21199. doi: 10.18632/oncotarget.15061.
35. Chen X, Wang CC, Yin J, You ZH. Novel human miRNA-disease association inference based on random Forest. Mol Ther Nucleic Acids. 2018;13:568–579. doi: 10.1016/j.omtn.2018.10.005.
36. Chen X, Zhang DH, You ZH. A heterogeneous label propagation approach to explore the potential associations between miRNA and disease. J Transl Med. 2018;16:348. doi: 10.1186/s12967-018-1722-1.
37. Chen X, Yan CC, Zhang X, Li Z, Deng L, Zhang Y, Dai Q. RBMMMDA: predicting multiple types of disease-microRNA associations. Sci Rep. 2015;5:13877. doi: 10.1038/srep13877.
38. Chen X, Gong Y, Zhang DH, You ZH, Li ZW. DRMDA: deep representations-based miRNA-disease association prediction. J Cell Mol Med. 2018;22:472–485. doi: 10.1111/jcmm.13336.
39. Chen X, Huang L, Xie D, Zhao Q. EGBMMDA: extreme gradient boosting machine for MiRNA-disease association prediction. Cell Death Dis. 2018;9:3. doi: 10.1038/s41419-017-0003-x.
40. Chen X, Zhu C-C, Yin J. Ensemble of decision tree reveals potential miRNA-disease associations. PLoS Comput Biol. 2019;15:e1007209. doi: 10.1371/journal.pcbi.1007209.
41. Chen X, Huang L. LRSSLMDA: Laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput Biol. 2017;13:e1005912. doi: 10.1371/journal.pcbi.1005912.
42. Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics. 2019;35:4730–4738. doi: 10.1093/bioinformatics/btz297.
43. Zou Q, Li J, Hong Q, Lin Z, Wu Y, Shi H, Ju Y. Prediction of MicroRNA-disease associations based on social network analysis methods. Biomed Res Int. 2015;2015:810514. doi: 10.1155/2015/810514.
44. Chen X, Zhou Z, Zhao Y. ELLPMDA: ensemble learning and link prediction for miRNA-disease association prediction. RNA Biol. 2018;15:807–818. doi: 10.1080/15476286.2018.1517010.
45. Chen X, Xie D, Wang L, Zhao Q, You ZH, Liu H. BNPMDA: bipartite network projection for MiRNA-disease association prediction. Bioinformatics. 2018;34:3178–3186. doi: 10.1093/bioinformatics/bty333.
46. Zhao Y, Chen X, Yin J. A novel computational method for the identification of potential miRNA-disease association based on symmetric non-negative matrix factorization and Kronecker regularized Least Square. Front Genet. 2018;9:324. doi: 10.3389/fgene.2018.00324.
47. Zhong Y, Xuan P, Wang X, Zhang T, Li J. A non-negative matrix factorization based method for predicting disease-associated miRNAs in miRNA-disease bilayer network. Bioinformatics. 2018;34:267–277. doi: 10.1093/bioinformatics/btx546.
48. Xiao Q, Luo J, Liang C, Cai J, Ding P. A graph regularized non-negative matrix factorization method for identifying microRNA-disease associations. Bioinformatics. 2018;34:239–248. doi: 10.1093/bioinformatics/btx545.
49. Chen X, Yin J, Qu J, Huang L. MDHGI: matrix decomposition and heterogeneous graph inference for miRNA-disease association prediction. PLoS Comput Biol. 2018;14:e1006418. doi: 10.1371/journal.pcbi.1006418.
50. Ezzat A, Zhao PL. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Trans Comput Biol Bioinform. 2017;14:646–656. doi: 10.1109/TCBB.2016.2530062.
51. Liu JX, Wang D, Gao YL, Zheng CH, Shang JL, Liu F, Xu Y. A joint-L2,1-norm-constraint-based semi-supervised feature extraction for RNA-Seq data analysis. Neurocomputing. 2017;228:263–269. doi: 10.1016/j.neucom.2016.09.083.
52. Cui Z, Gao YL, Liu JX, Wang J, Shang J, Dai LY. The computational prediction of drug-disease interactions using the dual-network L2,1-CMF method. BMC Bioinformatics. 2019;20:5. doi: 10.1186/s12859-018-2575-6.
53. Gao MM, Cui Z, Gao YL, Liu JX, Zheng CH. Dual-network sparse graph regularized matrix factorization for predicting miRNA-disease associations. Mol Omics. 2019;15:130–137. doi: 10.1039/C8MO00244D.
54. Pauca VP, Shahnaz F, Berry MW. Text mining using non-negative matrix factorization. In: Proceedings of the 2004 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics; 2004. pp. 452–456.
55. Guan N, Tao D, Luo Z, Yuan B. Manifold regularized discriminative nonnegative matrix factorization with fast gradient descent. IEEE Trans Image Process. 2011;20:2030–2048. doi: 10.1109/TIP.2011.2105496.
56. Cai D, He X, Han J. Graph regularized non-negative matrix factorization for data representation. IEEE Trans Pattern Anal Mach Intell. 2011;33:1548–1560. doi: 10.1109/TPAMI.2010.231.
57. Yang Z, Wu L, Wang A, Tang W, Zhao Y, Zhao H, Teschendorff AE. dbDEMC 2.0: updated database of differentially expressed miRNAs in human cancers. Nucleic Acids Res. 2016;45:D812–D818. doi: 10.1093/nar/gkw1079.
58. Jiang Y, Liu B, Yu L, Yan C, Bian H. Predict MiRNA-disease association with collaborative filtering. Neuroinformatics. 2018;16:363–372. doi: 10.1007/s12021-018-9386-9.
59. Shao B, Liu B, Yan C. SACMDA: MiRNA-disease association prediction with short acyclic connections in heterogeneous graph. Neuroinformatics. 2018;16:373–382. doi: 10.1007/s12021-018-9373-1.
60. Chen X, Wang L, Qu J, Guan NN, Li JQ. Predicting miRNA-disease association based on inductive matrix completion. Bioinformatics. 2018;34:4256–4265. doi: 10.1093/bioinformatics/bty503.
61. Goh KI, Cusick ME, Valle D. The human disease network. Proc Natl Acad Sci. 2007;104:8685–8690. doi: 10.1073/pnas.0701361104.
62. Lu M, Zhang Q, Deng M. An analysis of human microRNA and disease associations. PLoS One. 2008;3:e3420. doi: 10.1371/journal.pone.0003420.
63. Wang D, Wang JY, Lu M. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26:1644–1650. doi: 10.1093/bioinformatics/btq241.
64. Bandyopadhyay S, Mitra R, Maulik U. Development of the human cancer microRNA network. Silence. 2010;1:6. doi: 10.1186/1758-907X-1-6.
65. Chen X, Huang YA, You ZH, Yan GY, Wang XS. A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics. 2018;34:1440. doi: 10.1093/bioinformatics/btx773.
66. Chen X, Yan CC, Zhang X, Zhang X, Dai F, Yin J, Zhang Y. Drug-target interaction prediction: databases, web servers and computational models. Brief Bioinform. 2016;17:696–712. doi: 10.1093/bib/bbv066.
67. Zheng CH, Huang DS, Zhang L, Kong XZ. Tumor clustering using nonnegative matrix factorization with gene selection. IEEE Trans Inf Technol Biomed. 2009;13:599–607. doi: 10.1109/TITB.2009.2018115.
68. Huang DS, Zheng CH. Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics. 2006;22:1855–1862. doi: 10.1093/bioinformatics/btl190.
69. Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature. 1999;401:788–791. doi: 10.1038/44565.
70. Li X, Cui G, Dong Y. Graph regularized non-negative low-rank matrix factorization for image clustering. IEEE Trans Cybern. 2017;47:3840–3853. doi: 10.1109/TCYB.2016.2585355.
71. Wang JY, Almasri I, Gao X. Adaptive graph regularized nonnegative matrix factorization via feature selection. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). IEEE; 2012. pp. 963–966.
72. Liu X, Zhai D, Zhao D, Zhai G, Gao W. Progressive image denoising through hybrid graph Laplacian regularization: a unified framework. IEEE Trans Image Process. 2014;23:1491–1503. doi: 10.1109/TIP.2014.2303638.

Associated Data


Supplementary Materials

12859_2020_3409_MOESM1_ESM.txt (604.5 KB, txt)

Additional file 1. disease semantic similarity1.txt. This is disease semantic similarity 1, which integrated 383 disease semantic similarities.

12859_2020_3409_MOESM2_ESM.txt (812.7 KB, txt)

Additional file 2. disease semantic similarity2.txt. This is disease semantic similarity 2, which integrated 383 disease semantic similarities.

12859_2020_3409_MOESM3_ESM.txt (718.2 KB, txt)

Additional file 3. miRNA functional similarity.txt. This is the miRNA functional similarity, which integrated 495 miRNA functional semantic similarities.

12859_2020_3409_MOESM4_ESM.txt (43.1 KB, txt)

12859_2020_3409_MOESM5_ESM.xlsx (17.9 KB, xlsx)

Additional file 5. diseases_list.xlsx. This file lists 383 disease names.

12859_2020_3409_MOESM6_ESM.xlsx (17.5 KB, xlsx)

Additional file 6. miRNAs_list.xlsx. This file lists 495 miRNA names.

Data Availability Statement

The datasets supporting the conclusions of this article are included within the article and its additional files.

