Skip to main content
BMC Bioinformatics logoLink to BMC Bioinformatics
. 2019 Feb 19;20:87. doi: 10.1186/s12859-019-2675-y

Prediction of lncRNA-disease associations by integrating diverse heterogeneous information sources with RWR algorithm and positive pointwise mutual information

Xiao-Nan Fan 1,2, Shao-Wu Zhang 1,, Song-Yao Zhang 1, Kunju Zhu 2,3, Songjian Lu 2,
PMCID: PMC6381749  PMID: 30782113

Abstract

Background

Long non-coding RNAs play an important role in human complex diseases. Identification of lncRNA-disease associations will gain insight into disease-related lncRNAs and benefit disease diagnoses and treatment. However, using experiments to explore the lncRNA-disease associations is expensive and time consuming.

Results

In this study, we developed a novel method to identify potential lncRNA-disease associations by Integrating Diverse Heterogeneous Information sources with positive pointwise Mutual Information and Random Walk with restart algorithm (namely IDHI-MIRW). IDHI-MIRW first constructs multiple lncRNA similarity networks and disease similarity networks from diverse lncRNA-related and disease-related datasets, then implements the random walk with restart algorithm on these similarity networks for extracting the topological similarities which are fused with positive pointwise mutual information to build a large-scale lncRNA-disease heterogeneous network. Finally, IDHI-MIRW implemented random walk with restart algorithm on the lncRNA-disease heterogeneous network to infer potential lncRNA-disease associations.

Conclusions

Compared with other state-of-the-art methods, IDHI-MIRW achieves the best prediction performance. In case studies of breast cancer, stomach cancer, and colorectal cancer, 36/45 (80%) novel lncRNA-disease associations predicted by IDHI-MIRW are supported by recent literatures. Furthermore, we found lncRNA LINC01816 is associated with the survival of colorectal cancer patients. IDHI-MIRW is freely available at https://github.com/NWPU-903PR/IDHI-MIRW.

Electronic supplementary material

The online version of this article (10.1186/s12859-019-2675-y) contains supplementary material, which is available to authorized users.

Keywords: Long noncoding RNA, Disease, lncRNA-disease association, Heterogeneous network, Random walk with restart algorithm

Background

Long non-coding RNAs (lncRNAs) are the biggest part of non-coding RNAs with at least 200 nucleotides and no observed potential to encode proteins [1, 2]. To date, 15,778 lncRNA genes and 27,908 lncRNA transcripts have been annotated in human genome by the GENCODE v27. Increasing evidences have revealed that lncRNAs have key roles in gene regulations, affecting cellular proliferation, survival, migration and genomic stability [37]. Therefore, there is no surprise that mutation and dysregulation of lncRNAs could contribute to the development of various human complex diseases [810], such as HOTAIR in breast cancer [11] and MALAT1 in early-stage non-small cell lung cancer [12]. On the other hand, lncRNAs can drive many important cancer phenotypes through their interactions with other cellular macromolecules including DNA, protein, and RNA [4]. For example, lncRNA PCGEM1 and PRNCR1 are associated with androgen receptor in prostate cancer cells [6]. And lncRNA PTCSC3 could be a tumor suppressor in thyroid cancer cells by interacting with miR-574-5p [13].

In recent years, the number of experimentally verified lncRNA-disease associations is gradually increasing. Several databases for lncRNA functions and disease associations have been published, such as LncRNAdb [14], LncRNADisease [15], Lnc2Cancer [16] and NONCODE [17]. However, known lncRNA-disease associations still involve a small part of lncRNAs and diseases. Computational methods have been developed to predict the potential lncRNA-disease associations that can be used as candidates for biological experiment verifications, which would greatly reduce the experiment cost and save time for finding new lncRNA-disease associations. Existing computational methods can mainly be categorized into machine learning-based methods [1829] and network-based methods [3041]. The machine learning-based methods, such as LRLSLDA [18], LDAP [26], and MFLDA [27], have been developed to predict the potential lncRNA-disease associations. LRLSLDA [18] combined optimal classifiers in lncRNA space and disease space into a single classifier to predict lncRNA-disease associations based on lncRNA expression profiles and known lncRNA-disease associations. But how to combine the classifiers reasonably needs to further study. LDAP [26] employed two lncRNA similarity measures and five disease similarity measures to calculate lncRNA similarities and disease similarities, respectively, then used the bagging SVM to predict lncRNA-disease associations. However, this method suffered from fusing multiple similarities effectively. Fu et al. [27] developed a lncRNA-disease associations prediction model (MFLDA) with matrix factorization by integrating seven relational data sources between six object types (e.g. lncRNAs, miRNAs, genes, Gene Ontology, Disease Ontology, and drugs). Yet, MFLDA can only predict the potential lncRNA-disease associations which share both lncRNAs and diseases with known associations in training set.

The network-based methods, such as RWRlncD [30], RWRHLD [32], KATZLDA [33] and GrwLDA [40], use lncRNA-disease association, disease similarity, lncRNA similarity, and other molecular similarity to construct the lncRNA similarity networks, or lncRNA-disease heterogeneous network, then implement global network models (such as random walk and various propagation algorithms) to predict potential lncRNA-disease associations [10]. RWRlncD [30] constructed a lncRNA similarity network based on known lncRNA-disease associations, i.e., each lncRNA in their network has at least one known lncRNA-disease association, for predicting potential lncRNA-disease associations. So, the major limitation of RWRlncD is that it cannot predict lncRNA-disease associations for lncRNAs and diseases without any known lncRNA-disease associations. RWRHLD [32] calculated lncRNA similarities and disease similarities based on crosstalk between lncRNAs and miRNAs and directed acyclic graph in the disease ontology, respectively. One weakness of RWRHLD is that lncRNAs interacting with similar miRNAs do not always mean related with similar diseases, and only a small fraction of lncRNA-miRNA interactions is used [25]. KATZLDA [33] integrated lncRNA expression similarity, lncRNA functional similarity, Gaussian interaction profile kernel similarity for diseases and lncRNAs, disease semantic similarity, and known lncRNA-disease associations to build a lncRNA-disease heterogeneous network, then used KATZ algorithm to calculate potential association probability of each lncRNA-disease pair. GrwLDA [40] introduced a global network random walk method to predict potential lncRNA-diseases association by integrating disease semantic similarity, lncRNA functional similarity and known lncRNA-disease associations. Overall, the results of existing network-based methods show that integrating diverse lncRNA-related and disease-related information can boost the prediction accuracy of the lncRNA-disease association. However, most existing methods are limited to a small number of lncRNAs and diseases. For example, the network built in RWRHLD involves 697 lncRNAs and 126 diseases, while the network built in GrwLDA just involves 78 lncRNAs and 113 diseases. In addition, most existing methods calculate the lncRNA/disease similarities only on those that have at least one known lncRNA-disease association.

To address the aforementioned issues (or limitations) and further improve the prediction accuracy, we proposed a novel network-based method, namely IDHI-MIRW, to predict the potential lncRNA-disease associations by constructing a large-scale lncRNA-disease heterogeneous network with Random Walk with Restart (RWR) algorithm and the positive pointwise mutual information (PPMI). Instead of constraining lncRNA and disease on those with at least one known lncRNA-disease association, IDHI-MIRW calculates the lncRNA similarities for all the lncRNAs involved in lncRNA expression profiles, lncRNA-miRNA interactions, and lncRNA-protein interactions, and also calculates the diseases similarities for all the diseases involved in disease ontology, disease-miRNA associations, and disease-gene associations. Then, IDHI-MIRW uses the RWR algorithm on each similarity network to capture network topological structural features for measuring the lncRNA/disease topological similarity through the PPMI. By integrating the lncRNA/disease topological similarity, and introducing the known lncRNA-disease association information, a large-scale lncRNA-disease heterogeneous network is built. Finally, the random walk with restart on heterogeneous network (RWRH) algorithm [42] is applied on the lncRNA-disease heterogeneous network to predict the potential lncRNA-disease associations. The computational results show that IDHI-MIRW cannot only better predict the known lncRNA-disease associations, but also can effectively predict the potential lncRNA-disease associations, providing more candidates for experimental verification. Most of the new predicted lncRNA-disease associations are supported by recent literatures. By analyzing nine unvalidated lncRNAs, we found that six lncRNAs were differentially expressed in corresponding cancers. We also found that lncRNA LINC01816 is associated with the survival of colorectal cancer patients, which provides evidence that this lncRNA is disease-related.

Results

In this section, we first introduced the evaluation method and metrices for evaluating the performance of the IDHI-MIRW method. Then, we compared our IDHI-MIRW method with other existing state-of-the art methods on a small-scale lncRNA-disease heterogeneous network, explored the predictive power of IDHI-MIRW on a large-scale lncRNA-disease heterogeneous network, and discussed the effect of different parameters. In the end, we analyzed several predicted potential lncRNA-disease associations with our IDHI-MIRW.

Evaluation method and metrices

The leave-one-out cross validation (LOOCV) test method was used to evaluate the performance of the IDHI-MIRW method. In LOOCV test method, each known lncRNA-disease association in the dataset is singled out in turn as a test sample, and the remaining lncRNA-disease associations are used as training samples. That is, for a given disease di, each known lncRNA associated with di is left out in turn as a test sample, and corresponding association edge between test lncRNA and di is removed, and the remaining lncRNAs associated with di are considered as training samples.

The area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall (PR) curve (AUPR) were used as evaluation metrices in our experiments. The ROC curve is the plot of the true-positive rate (TPR, or Recall) versus the false-positive rate (FPR) at different rank cutoffs. The PR curve is the plot of the ratio of true positives among all positive predictions for each given recall rate.

Comparison with other methods

We compared our IDHI-MIRW method with other six state-of-the-art methods of LRLSLDA [18], LNCSIM [19], RWRlncD [30], IRWRLDA [34], KATZLDA [33] and GrwLDA [40] on the small-scale lncRNA-disease heterogeneous network (HNetS) which contains 362 lncRNAs, 370 diseases, and 2169 known lncRNA-disease associations. Most existing methods often built this small-scale lncRNA-disease heterogeneous network in which each lncRNA (or disease) has at least an associated disease (or lncRNA) to predict the potential lncRNA-disease associations. LRLSLDA [18] and LNCSIM [19] adopt the semi-supervised learning frameworks with Laplacian regularized least squares. RWRlncD [30], IRWRLDA [34], KATZLDA [33] and GrwLDA [40] are the network-based methods. All methods were executed on a win10 system pc with i7–6700 CPU and 16.0G memory. Figure 1 shows the AUC and AUPR values of IDHI-MIRW and other six methods. IDHI-MIRW achieved a better performance than other six methods in terms of AUC and AUPR. The AUC of IDHI-MIRW is 0.866, which is 0.337, 0.108, 0.350, 0.245, 0.197 and 0.061 higher than that of LRLSLDA, LNCSIM, RWRlncD, IRWRLDA, KATZLDA and GrwLDA, respectively. The AUCPR of IDHI-MIRW is 0.318, which is 0.143, 0.213, 0.296, 0.172, 0.194 and 0.166 higher than that of LRLSLDA, LNCSIM, RWRlncD, IRWRLDA, KATZLDA and GrwLDA, respectively. The recall values of seven methods at different rank cutoffs are listed in Table 1, from which we can see that the recall value of IDHI-MIRW is higher than that of other six existing methods at 10, 20, 50, and 100 ran cutoff. These results show that our IDHI-MIRW can effectively predict the lncRNA-disease associations.

Fig. 1.

Fig. 1

Results of IDHI-MIRW, LRLSLDA, LNCSIM, RWRlncD, IRWRLDA, KATZLDA and GrwLDA on a small-scale lncRNA-disease heterogeneous network in LOOCV test. a AUC values. b AUPR values

Table 1.

Recalls of seven methods at different cutoffs on a small-scale lncRNA-disease heterogeneous network in LOOCV test

Top10 Top20 Top50 Top100
LRLSLDA 0.320 0.406 0.447 0.462
LNCSIM 0.217 0.402 0.595 0.704
RWRlncD 0.005 0.012 0.038 0.161
IRWRLDA 0.273 0.344 0.432 0.563
KATZLDA 0.251 0.382 0.554 0.661
GrwLDA 0.276 0.437 0.652 0.721
IDHI-MIRW 0.461 0.623 0.766 0.845

To further evaluate the performance of IDHI-MIRW for predicting the associated lncRNAs for new diseases without any known lncRNA association information, we removed all the known lncRNA associations for the query disease in the small-scale lncRNA-disease heterogeneous network. Due to RWRlncD implemented the RWR algorithm on an lncRNA similarity network, we just compared our IDHI-MIRW method with other five methods of LRLSLDA, LNCSIM, IRWRLDA, KATZLDA and GrwLDA for predicting the associated lncRNAs of the query diseases. The comparison results are shown in Fig. 2, which shows that our IDHI-MIRW method can better predict the associated lncRNAs for the new disease than other existing prediction methods.

Fig. 2.

Fig. 2

Prediction results for diseases without any known disease association information. a AUC values. b AUPR values

Effectiveness of introducing multiple information sources

In order to illustrate the effectiveness of introducing multiple information sources, we collected 7637 lncRNAs and 6453 diseases from EMBL-EBI (E-MTAB-5214), starBase v2.0 [43], NPInter v3.0 [44], RAID v2.0 [45], Diseases ontology [46], HMDD v2.0 [47], and DisGeNet [48] to construct a large-scale lncRNA-disease heterogeneous network (HNetL) by introducing 2169 known lncRNA-disease associations, then implemented our IDHI-MIRW method on HNetL. Additional files 1 and 2 provided the data processing procedure for lncRNAs and diseases. The results of IDHI-MIRW on HNetS and HNetL heterogeneous networks in LOOCV test are listed in Table 2, from which we can see that introducing more lncRNAs and diseases can effectively improve the predictive performance of IDHI-MIRW and can predict the potential lncRNAs/diseases for new disease/lncRNA without any known disease/lncRNA association information. All these results show that IDHI-MIRW can obtain a more reliable performance for predicting lncRNA-disease associations.

Table 2.

Results of IDHI-MIRW on the small-scale lncRNA-disease heterogeneous network and large-scale lncRNA-disease heterogeneous network in LOOCV test

Network AUC AUPR Recall
Top10 Top20 Top50 Top100
HNetS 0.866 0.318 0.461 0.623 0.766 0.845
HNetL 0.952 0.350 0.449 0.614 0.790 0.851

Effectiveness of using the topological similarity network to construct the lncRNA-disease heterogeneous network

In order to evaluate the effectiveness of using the topological similarity network to construct the lncRNA-disease heterogeneous network for improving the predictive performance, we designed another method of IDHI-AVG by adopting the strategy of averaging three lncRNA similarity matrices of LncNet1, LncNet2 and LncNet3 to form the lncRNA integration network (i.e., LncINet), averaging of three disease similarity matrices of DisNet1, DisNet2, and DisNet3 to form the disease integration network (i.e., DisINet). IDHI-AVG combines these two integration similarity networks of LncINet and DisINet with known lncRNA-disease bipartite network to construct the lncRNA-disease heterogeneous network on which RWRH algorithm is implemented to predict the potential lncRNA-disease associations. The compared results of IDHI-AVG and IDHI-MIRW on the small-scale lncRNA-disease heterogeneous network (HNetS) and large-scale ncRNA-disease heterogeneous network (HNetL) in LOOCV test are shown in Table 3. We can see the AUC and AUPR values of IDHI-MIRW are higher than that of IHDI-AVG. These results demonstrate that the strategy of using RWR and PPMI to form lncRNA/disease topological similarity networks and further constructing the lncRNA-disease heterogeneous network is effective. It can improve the performance of predicting lncRNA-disease associations.

Table 3.

Compared results of IDHI-MIRW and IDHI-AVG on the small-scale lncRNA-disease heterogeneous network and large-scale lncRNA-disease heterogeneous network in LOOCV test

HNetS HNetL
IDHI-AVG IDHI-MIRW IDHI-AVG IDHI-MIRW
AUC 0.829 0.866 0.942 0.952
AUPR 0.238 0.318 0.317 0.350

The effect of parameters

There are four main parameters in our method, which are the restart probability α in RWR, and the restart probability β, jumping probability γ, parameter η in RWRH. η is used to weight the importance of lncRNA topological similarity subnetwork and disease topological similarity subnetwork. To evaluate the effect of parameters, we implemented our IDHI-MIRW on HNetL heterogeneous network in LOOCV test with different αβ, γ, and η values (varying from 0.1 to 0.9 with scale 0.1). Additional file 3 shows the AUC and AUPR values of IDHI-MIRW with different parameters. We can see that the performance of IDHI-MIRW is robust to the value of these four parameters. Additional file 4 presents the AUC and AUPR values of IDHI-MIRW on HNetS heterogeneous network in LOOCV test. In this work, we selected α = 0.9, γ = 0.9, η = 0.2, and β = 0.6.

Case studies and the potential lncRNA-disease associations analysis

We used breast cancer, stomach cancer, and colorectal cancer as the cases to predict their potential associated lncRNAs with our IDHI-MIRW. For a given disease, all known lncRNAs associated with this given disease were considered as the seed nodes, and other remaining lncRNAs (i.e., without known association with the given disease) were considered as the candidates associated with the given disease. By implementing our IDHI-MIRW algorithm on the large-scale lncRNA-disease heterogeneous network, and according to the lncRNA-disease associations ranking scores from large to small, we extract top 15 potential association lncRNAs for each cancer. These top potential association lncRNAs are listed in Additional files 5, 6, and 7.

For breast cancer which is one of most common cancers and the second leading cause of cancer death [49], 13 out of 15 potential association lncRNAs are supported by recent literatures. For example, Diego Chacon-Cortes et al. [50] investigated six SNPs (i.e. rs1888138, rs7336610, rs9589207, rs17735387, rs4248505, rs1428) in the lncRNA MIR17HG, and identified significant association between rs4248505 at the allele level and rs4248505/ rs7336610 at the haplotype level susceptibility to breast cancer, which means that lncRNA MIR17HG plays the main role in the pathophysiology of breast cancer. Fu et al. [51] found lncRNA SNHG1, SNORD28 and sno-miR-28 are all significantly upregulated in breast tumors. LncRNA can be used as the biomarkers and therapeutic targets in combatting breast cancer [52].

For stomach cancer (or gastric cancer) which is the third leading cause of cancer mortality in the world [53, 54], 11 out of 15 potential association lncRNAs can be supported by recent literatures. For example, Hu et al. [55] discovered that lncRNA CRNDE increases gastric cancer cell viability and promotes proliferation by targeting miR-145. Pan et al. [56] found that lncRNA DANCR is activated by SALL4 and promotes the proliferation and invasion of gastric cancer cells. Specially, lncRNA LINC01816 (also known as LOC100133985) associated with stomach cancer has been confirmed by Tian et al. [57]. LncRNA LINC01816 is down-regulated and might be protective factor in gastric cancer.

For colorectal cancer which is the third most commonly diagnosed cancer in males and the second in females [58], 12 out of 15 potential association lncRNAs can be supported by recent literatures. For example, Zhao et al. [59] found that lncRNA SNHG1 promotes cell proliferation by affecting P53 in colorectal cancer. Zhang et al. [60] found that lncRNA CYTOR (also known as LINC00152) down-regulated by miR-376c-3p restricts viability and promotes apoptosis of colorectal cancer cells.

To further discover the evidences for the predicted lncRNAs associated with cancers, we analyzed the RNAseq and clinical data from TCGA for breast cancer, stomach cancer and colorectal cancer. For colorectal cancer, the RNASeq data including 19,676 protein coding genes, 15,513 lncRNA genes in 41 normal samples and 474 tumor samples were downloaded from TCGA. Using DESeq2 [61] algorithm, we found 1230 significantly upregulated lncRNAs and 568 downregulated lncRNAs by setting log2FC > 1 (or < − 1), FDR < 0.001. Among three unvalidated lncRNA, lncRNA SNHG7 (14th) is significantly upregulated in tumor samples (Fig. 3a). Meanwhile, we downloaded the clinical data of 448 tumor samples, and Kaplan-Meier survival analysis shows that lncRNA LINC01816 (10th) can divided the 448 colorectal cancer patients into high and low-risk groups with different survival times (Fig. 3b). The results of RNAseq and clinical data analysis for breast cancer and stomach cancer are shown in.

Fig. 3.

Fig. 3

Results of RNASeq and clinical data analysis for colorectal cancer. a boxplot of lncRNA SNHG7 expression in normal and tumor samples. b survival curve for lncRNA LINC01816

Additional files 8 and 9. 5/6 unvalidated lncRNAs are significantly differentially expressed in corresponding cancers.

In summary, 36 (13 for breast cancer, 11 for stomach cancer, 12 for colorectal cancer) out of 45 potential association lncRNAs have been supported by recent literatures. By analyzing the nine unvalidated potential association lncRNAs, we found that six lncRNAs are differentially expressed in corresponding cancers, and lncRNA LINC01816 is associated with the survival of patients with colorectal cancer. Results of these three case studies show that IDHI-MIRW can effectively predict the new association lncRNAs for a disease.

Discussion

LncRNAs play important roles in the development of human complex diseases. More and more attentions have been paid to discover the lncRNA functions related with human complex disease. Most previous computational methods only focus on the small-scale lncRNA-disease heterogeneous network (i.e., involving small numbers of lncRNAs and diseases) to predict the lncRNA-disease associations. To address this issue, IDHI-MIRW was developed to predict the potential lncRNA-disease associations based on a large-scale lncRNA-disease heterogeneous network (containing 7637 lncRNAs and 6453 diseases). Instead of calculating similarities of lncRNAs and diseases only involving in known lncRNA-disease associations, IDHI-MIRW used three lncRNA-related information (i.e., lncRNA expression profiles, lncRNA-miRNA interactions, and lncRNA-protein interactions) to form three lncRNA similarity networks, and three disease-related information (i.e., disease semantic similarity, disease-miRNA associations, and disease-gene associations) to form three disease similarity networks. Furthermore, instead of directly fusing those similarity networks, IDHI-MIRW applied the RWR algorithm on each lncRNA/disease similarity network to capture the topological similarity, and the PPMI to generate lncRNA/disease topological similarity network. The large-scale lncRNA-disease heterogeneous network was constructed by combing the lncRNA topological similarity network, disease topological similarity network, and the known lncRNA-disease bipartite graph. Then, the RWRH algorithm was used to prioritize candidate lncRNAs for each query disease. Our experiment results show that IDHI-MIRW achieves a better performance than other existing methods. We evaluated the effectiveness of introducing multiple information sources and capturing topological similarities, Tables 2 and 3 show that those strategies are effective for improving the performance of predicting lncRNA-disease associations. In addition, more novel lncRNA-disease associations predicted by IDHI-MIRW are supported by recent literatures, which means that IDHI-MIRW can effectively predict the novel association lncRNAs for a query disease. All the predicted lncRNA-disease associations are provided in Additional file 10.

Although IDHI-MIRW can effectively predict potential lncRNA-disease associations, there are still several issues need to be further addressed in the future. First, IDHI-MIRW used three lncRNA-related and three disease-related information to generate similarity matrices, we still expect to integrate more information (e.g., lncRNA GO annotations and disease MeSH annotation) to better predict lncRNA-disease association. Second, the averaging strategy was used to integrate the lncRNA/disease topological similarity matrices, we expect to design better integration approaches in future work to measure the different contributions of multiple lncRNA/disease similarities.

Conclusions

In this study, we proposed a novel network-based method (namely IDHI-MIRW) for identifying potential lncRNA-disease associations. We built a large-scale lncRNA-disease heterogeneous network by integrating multiple lncRNA-related information (i.e. lncRNA expression profiles, lncRNA-miRNA interactions, and lncRNA-protein interactions), multiple disease-related information (i.e. disease semantic similarity, disease-miRNA associations, and disease-gene associations), and known lncRNA-disease association information using RWR and PPMI. Our experimental results show that IDHI-MIRW can achieve higher performance than other state-of-the-art methods, and we found lncRNA LINC01816 is associated with the survival of colorectal cancer patients. These results indicate that IDHI-MIRW will contribute to the identification of potential lncRNA-disease associations.

Methods

Datasets

We collected lncRNA expression profile, lncRNA-miRNA interaction, and lncRNA-protein interaction data for constructing the lncRNA similarity networks, and Diseases Ontology (DO) information, disease-miRNA association, and disease-protein association data for constructing the disease similarity networks. All lncRNAs are annotated by ensembl gene ID, and all diseases are annotated by Disease Ontology ID.

LncRNA expression profiles were downloaded from EMBL-EBI (E-MTAB-5214), which includes the expression profiles in 53 human tissue samples. LncRNA-miRNA interactions and lncRNA-protein interactions were collected from starBase v2.0 [43], NPInter v3.0 [44], and RAID v2.0 [45] databases. Diseases ontology terms were collected from the Disease ontology [46]. Diseases-miRNAs associations were collected from HMDD v2.0 [47]. Disease-gene associations were collected from DisGeNet [48]. Known lncRNA-disease associations were collected from lncRNAdisease [15], lnc2Cancer [16], and GeneRIF [62]. Details and statistics of these data are shown in Additional file 11.

An overview of the IDHI-MIRW algorithm

Our IDHI-MIRW algorithm consists of the following four steps. Step 1, build three lncRNA similarity networks (i.e., LncNet1, LncNet2, LncNet3) based on lncRNA expression profiles, lncRNA-miRNA interactions, and lncRNA-protein interactions, and also build three disease similarity networks (i.e., DisNet1, DisNet2, DisNet3) based on disease ontology, disease-miRNA associations, and disease-gene associations. Step 2, form the lncRNA topological similarity network (LncTSNet) and disease topological similarity network (DisTSNet) by fusing lncRNA and disease multiple topological similarities obtained through implementing RWR on lncRNA similarity network (LncNet1, LncNet2, LncNet3) and disease similarity network (DisNet1, DisNet2, DisNet3), respectively. Step 3, construct a large-scale lncRNA-disease heterogeneous network by integrating lncRNA topological similarity network (LncTSNet), disease topological similarity network (DisTSNet), and known lncRNA-disease associations. Step 4, implement RWRH on the lncRNA-disease heterogeneous network for predicting the potential lncRNA-disease associations. The flowchart of IDHI-MIRW is shown in Fig. 4.

Fig. 4.

Fig. 4

Flowchart of the IDHI-MIRW. a building three lncRNA similarity networks and three disease similarity networks by calculating the Pearson correlation coefficient and Gaussian interaction profile kernel similarity. b forming the lncRNA/disease topological similarity networks with RWR and positive pointwise mutual information. c constructing the large-scale lncRNA-disease heterogeneous network by integrating lncRNA/disease topological similarities and known lncRNA-disease associations. d predicting the potential lncRNA-disease associations by implementing RWRH

Building lncRNA/disease similarity networks

By calculating the Pearson correlation coefficient of any lncRNA pair with expression profiles and fixing the P-value threshold (< 0.01), we built the LncNet1 lncRNA similarity weighted network. Based on Gaussian interaction profile kernel similarity [18, 63] of lncRNA-miRNA and lncRNA-protein interactions, we computed the Gaussian interaction profile kernel similarity between any pair of lncRNA li and lncRNA lj, then built the LncNet2 and LncNet3 lncRNA similarity weighted networks, respectively. Gaussian interaction profile kernel similarity between lncRNA li and lncRNA lj is calculated.

KDlilj=ExpκlIPliIPlj 1
κl=1/1Nli=1NlIPli2 2

where, the interaction profile IP(li) is the binary vector of lncRNA-miRNA (or lncRNA-protein) interactions encoding the presence or absence of interactions between lncRNA li and miRNA (or protein) in the lncRNA-miRNA (or lncRNA-protein) interaction dataset, κl controls the kernel bandwidth, and Nl is the total number of lncRNAs.

Based on the structure of a directed acyclic graph (DAG) in Disease Ontology, we used the function “doSim” form R package “DOSE” [64] to obtain the similarity between any disease pair, then built the DisNet1 disease similarity weighted network. Based on Gaussian interaction profile kernel similarity of disease-miRNA and disease-gene associations, we computed the Gaussian interaction profile kernel similarity between any pair of disease di and dj, then built the DisNet2 and DisNet3 disease similarity weighted networks, respectively.

KDdidj=expκdIPdiIPdj 3
κd=1/1Ndi=1NdIPdi2 4

where, the interaction profile IP(di) is the binary vector of disease-miRNA (or disease-gene) associations encoding the presence or absence of associations between di and miRNA (or gene) in the disease-miRNA (or disease-gene) association dataset. κd controls the kernel bandwidth, and Nd is the total number of diseases.

Generating lncRNA/disease topological similarity networks

Instead of directly fusing six similarity networks (i.e., LncNet1, LncNet2, LncNet3, DisNet1, DisNet2, and DisNet3), we captured the network topological structural features by implementing the RWR algorithm on each similarity network. The RWR algorithm is a network diffusion algorithm, which has been extensively applied to analyze the complex biological network [6569]. By considering both local and global topological connectivity patterns within network, the RWR algorithm can fully exploit the direct or indirect relation between nodes [65]. The RWR algorithm can be formulated as:

St+1=1αStW+αS0 5
Wij=BijjBij 6

where, St is the distribution matrix in which the (i, j)-th element denotes the distribution probability of node j being visited from node i after t iterations in the random walk process and S0 is the initial distribution matrix in which S0(i, i) = 1, S0(i, j) = 0, ∀j ≠ iα is restart probability controlling the relative influence of local and global topological information. B is the weighted adjacency matrix of lncRNA (or disease).

When the L1 norm of ΔS = St + 1 − Stis less than a small positive ε (we set ε = 10−10), we can obtain a stationary distribution matrix S, which was referred as the diffusion state of each node [70]. The element S(i, j) in diffusion state matrix S represents the probability of RWR starting node i and ending up at node j in equilibrium. When the diffusion states of two nodes are close, which suggests that they may have similar positions with respect to other nodes in the network and they probably share similar functions.

Motivated by Gligorijevic et.al. [69], we then calculated the topological similarity of each node pair by using PPMI, which is defined as:

MIij=max0log2SijijSijiSijjSij 7

The matrix MI is a non-symmetric matrix, thus we use the average of MI(i, j) and MI(j, i) to represent the topological similarity of node i and node j. After obtaining three lncRNA topological similarity matrices XL1, XL2, XL3 of LncNet1, LncNet2, LncNet3, and three disease topological similarity matrices XD1, XD2, XD3 of DisNet1, DisNet2, DisNet3, we can form the integration lncRNA topological similarity matrix XL by averaging three lncRNA topological similarity matrices, and the disease topological similarity matrix XD by averaging three disease topological similarity matrices, that is, XL=XL1+XL2+XL3/3, XD=XD1+XD2+XD3/3. Thus, we generated the lncRNA topological similarity network LncTSNet, and disease topological similarity network DisTSNet.

Constructing the lncRNA-disease heterogeneous network

By integrating the LncTSNet and DisTSNet networks with known lncRNA-disease bipartite network, we can construct the lncRNA-disease heterogeneous network whose adjacency matrix can be defined as:

A=ALALDADLAD 8

where, AL and AD represent the weighted adjacency matrices of LncTSNet and DisTSNet, respectively; ALD is the adjacency matrix of the lncRNA-disease bipartite graph; ADL represents the transpose of ALD. If there is association between lncRNA i and disease j in known lncRNA-disease associations, ALD(i, j) = 1, otherwise, ALD(i, j) = 0.

Implementing RWRH algorithm for predicting lncRNA-disease associations

To predict the association between lncRNA and disease, we adopted the RWRH (random walk with restart on heterogeneous network) algorithm [42] to prioritize candidate lncRNAs associated with a given disease. The RWRH algorithm is well-known heterogeneous network-based algorithm to infer the gene-phenotype relationship. It can effectively capture the complementarity of two kinds of node within heterogeneous network, which is widely used to predict the association problem [42, 71, 72]. The RWRH algorithm on the lncRNA-disease heterogeneous network can be formulated as:

pt+1=1βptM+βp0 9

where, pt is a probability vector in which the i-th element holds the probability of finding the random walker at node i at step t; β ∈ (0, 1) is restart probability; p0 is the initial probability vector for lncRNA-disease heterogeneous network which is defined as p0=ηu01ηv0. u0 and v0 represent the initial probability of LncTSNet and DisTSNet, respectively. The initial probability u0 of LncTSNet network is set such that all the seed nodes are assigned to the equal probabilities with the sum of probabilities equal to 1. Similarity, the initial probability v0 of DisTSNet network is given. The parameter η ∈ (0, 1) is used to weight the importance of each subnetwork.

M=MLMLDMDLMD is the transition matrix of the lncRNA-disease heterogenous network, where ML and MD are the intra-subnetwork transition matrices, MLD and MDL are the inter-subnetwork transition matrices. Let γ be the jumping probability, that is, the probability of random walker jumping from lncRNA network to disease network or vice versa. Thus, the transition probability ML(i, j) from lncRNA li to lncRNA lj and the transition probability MD (i, j) from disease di to disease dj are defined as

MLij=ALijjALijifjALDji=01γALijjALijotherwise 10
MDij=ADijjADijifjALDij=01γADijjADijotherwise 11

The transition probability from lncRNA li to disease dj and the transition probability from disease di to lncRNA lj are described as:

MLDij=γALDijjALDijifjALDij00otherwise 12
MDLij=γADLijjADLijifjADLij00otherwise 13

After some steps, the steady state probability vector p = p can be obtained by performing the iteration until the difference between pt and pt + 1 (measured by the L1 norm) fall below 10−10. p gives the ranking score of every lncRNA for a query disease. The lncRNAs with maximum in p are considered as the most probable associated lncRNAs of the query disease.

Additional files

Additional file 1: (1.4MB, tif)

LncRNA data processing procedure. (TIF 1447 kb)

Additional file 2: (1.3MB, tif)

Disease data processing procedure. (TIF 1340 kb)

Additional file 3: (3.4MB, tif)

AUPR values of IDHI-MIRW on the large-scale lncRNA-disease heterogeneous with different parameters in LOOCV test. (A) AUC values with different α. (B) AUC values with different γ. (C) AUC values with different η. (D) AUC values with different β. (E) AUPR values with different α. (F) AUPR values with different γ. (G) AUPR values with different η. (H) AUPR values with different β. (TIF 3520 kb)

Additional file 4: (3.6MB, tif)

AUC and AUPR values of IDHI-MIRW on the small-scale lncRNA-disease heterogeneous with different parameters in LOOCV test. (A) AUC values with different α. (B) AUC values with different γ. (C) AUC values with different η. (D) AUC values with different β. (E) AUPR values with different α. (F) AUPR values with different γ. (G) AUPR values with different η. (H) AUPR values with different β. (TIF 3705 kb)

Additional file 5: (9.6KB, xlsx)

The top 15 predicted associated lncRNAs for breast cancer. (XLSX 9 kb)

Additional file 6: (9.6KB, xlsx)

The top 15 predicted associated lncRNAs for stomach cancer. (XLSX 9 kb)

Additional file 7: (9.5KB, xlsx)

The top 15 predicted associated lncRNAs for colorectal cancer. (XLSX 9 kb)

Additional file 8: (9.6MB, tif)

The results of RNASeq data analysis for breast cancer. (A) heatmap of top 200 most significantly dysregulated lncRNA expression values. (B) heatmap of lncRNA AL157395.1 expression values. (C) boxplot of lncRNA AL157395.1 expression in normal and tumor samples. (D) heatmap of lncRNA AP001528.1 expression values. (E) boxplot of lncRNA AP001528.1 expression in normal and tumor samples. (TIF 9850 kb)

Additional file 9 (9MB, tif)

The results of RNASeq data analysis for stomach cancer. (A) heatmap of top 200 most significantly dysregulated lncRNA expression values. (B) heatmap of lncRNA KCNQ1OT1 expression values. (C) boxplot of lncRNA KCNQ1OT1 expression in normal and tumor samples. (D) heatmap of lncRNA DLEU2 expression values. (E) boxplot of lncRNA DLEU2 expression in normal and tumor samples. (F) heatmap of lncRNA LINC00299 expression values. (G) boxplot of lncRNA LINC00299 expression in normal and tumor samples. (TIF 9211 kb)

Additional file 10: (180KB, txt)

The predicted lncRNA-disease associations. (TXT 180 kb)

Additional file 11: (34.4KB, docx)

Details and statistics of collected data. (DOCX 34 kb)

Acknowledgements

Not applicable.

Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 61873202, No. 61473232 and No. 91430111; and the National Library of Medicine grants of United States under Grant No. R00LM011673. The funding bodies did not play any roles in the design of the study, in the collection, analysis, or interpretation of data, or in writing the manuscript.

Availability of data and materials

IDHI-MIRW is available at https://github.com/NWPU-903PR/IDHI-MIRW, and the datasets used and/or analyzed during the current study are available from the corresponding references.

Abbreviations

AUC

The area under the receiver operating characteristic curve

AUPR

The area under the precision-recall curve

DAG

Directed acyclic graph

DO

Disease ontology

FPR

False-positive rate

lncRNAs

Long noncoding RNAs

LOOCV

Leave-one-out cross validation; ROC: receiver operating characteristic

PPMI

Positive pointwise mutual information

PR

Precision-recall

RWR

Random walk with restart

RWRH

Random walk with restart on heterogeneous network

TPR

True-positive rate

Authors’ contributions

XNF collected the dataset, performed the experiments, and wrote the initial manuscript. SWZ and SL conceived and designed the experiments. XNF, SYZ and KZ analyzed the results. XNF and SYZ developed the codes. SWZ revised the manuscript. All authors participated in the definition of the process, the discussion of relevant aspects, and approved the final manuscript.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Xiao-Nan Fan, Email: fanxiaonan0223@gmail.com.

Shao-Wu Zhang, Email: zhangsw@nwpu.edu.cn.

Song-Yao Zhang, Email: zhangsongyaonwpu@sina.com.

Kunju Zhu, Email: kunju@pitt.edu.

Songjian Lu, Email: songjian@pitt.edu.

References

  • 1.Quinn JJ, Chang HY. Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet. 2016;17(1):47–62. doi: 10.1038/nrg.2015.10. [DOI] [PubMed] [Google Scholar]
  • 2.Rinn JL, Chang HY. Genome regulation by long noncoding RNAs. Annu Rev Biochem. 2012;81:145–166. doi: 10.1146/annurev-biochem-051410-092902. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Huarte M. The emerging role of lncRNAs in cancer. Nat Med. 2015;21(11):1253–1261. doi: 10.1038/nm.3981. [DOI] [PubMed] [Google Scholar]
  • 4.Schmitt AM, Chang HY. Long noncoding RNAs in Cancer pathways. Cancer Cell. 2016;29(4):452–463. doi: 10.1016/j.ccell.2016.03.010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Quinodoz S, Guttman M. Long noncoding RNAs: an emerging link between gene regulation and nuclear organization. Trends Cell Biol. 2014;24(11):651–663. doi: 10.1016/j.tcb.2014.08.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Yang L, Lin C, Jin C, Yang JC, Tanasa B, Li W, Merkurjev D, Ohgi KA, Meng D, Zhang J, et al. lncRNA-dependent mechanisms of androgen-receptor-regulated gene activation programs. Nature. 2013;500(7464):598–602. doi: 10.1038/nature12451. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Lee S, Kopp F, Chang TC, Sataluri A, Chen B, Sivakumar S, Yu H, Xie Y, Mendell JT. Noncoding RNA NORAD regulates genomic stability by sequestering PUMILIO proteins. Cell. 2016;164(1–2):69–80. doi: 10.1016/j.cell.2015.12.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Yan X, Hu Z, Feng Y, Hu X, Yuan J, Zhao SD, Zhang Y, Yang L, Shan W, He Q, et al. Comprehensive genomic characterization of long non-coding RNAs across human cancers. Cancer Cell. 2015;28(4):529–540. doi: 10.1016/j.ccell.2015.09.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol. 2011;21(6):354–361. doi: 10.1016/j.tcb.2011.04.001. [DOI] [PubMed] [Google Scholar]
  • 10.Chen X, Yan CC, Zhang X, You ZH. Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief Bioinform. 2017;18(4):558–576. doi: 10.1093/bib/bbw060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, Tsai MC, Hung T, Argani P, Rinn JL, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464(7291):1071–1076. doi: 10.1038/nature08975. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Ji P, Diederichs S, Wang W, Boing S, Metzger R, Schneider PM, Tidow N, Brandt B, Buerger H, Bulk E, et al. MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene. 2003;22(39):8031–8041. doi: 10.1038/sj.onc.1206928. [DOI] [PubMed] [Google Scholar]
  • 13.Fan M, Li X, Jiang W, Huang Y, Li J, Wang Z. A long non-coding RNA, PTCSC3, as a tumor suppressor and a target of miRNAs in thyroid cancer cells. Exp Ther Med. 2013;5(4):1143–1146. doi: 10.3892/etm.2013.933. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Quek XC, Thomson DW, Maag JL, Bartonicek N, Signal B, Clark MB, Gloss BS, Dinger ME. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res. 2015;43(Database issue):D168–D173. doi: 10.1093/nar/gku988. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Chen G, Wang Z, Wang D, Qiu C, Liu M, Chen X, Zhang Q, Yan G, Cui Q. LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 2013;41(Database issue):D983–D986. doi: 10.1093/nar/gks1099. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Ning S, Zhang J, Wang P, Zhi H, Wang J, Liu Y, Gao Y, Guo M, Yue M, Wang L, et al. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers. Nucleic Acids Res. 2016;44(D1):D980–D985. doi: 10.1093/nar/gkv1094. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Zhao Y, Li H, Fang S, Kang Y, Wu W, Hao Y, Li Z, Bu D, Sun N, Zhang MQ, et al. NONCODE 2016: an informative and valuable data source of long non-coding RNAs. Nucleic Acids Res. 2016;44(D1):D203–D208. doi: 10.1093/nar/gkv1252. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Chen X, Yan GY. Novel human lncRNA-disease association inference based on lncRNA expression profiles. Bioinformatics. 2013;29(20):2617–2624. doi: 10.1093/bioinformatics/btt426. [DOI] [PubMed] [Google Scholar]
  • 19.Chen X, Yan CC, Luo C, Ji W, Zhang Y, Dai Q. Constructing lncRNA functional similarity network based on lncRNA-disease associations and disease semantic similarity. Sci Rep. 2015;5:11338. doi: 10.1038/srep11338. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Huang YA, Chen X, You ZH, Huang DS, Chan KC. ILNCSIM: improved lncRNA functional similarity calculation model. Oncotarget. 2016;7(18):25902–25914. doi: 10.18632/oncotarget.8296. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Chen X, Huang YA, Wang XS, You ZH, Chan KC. FMLNCSIM: fuzzy measure-based lncRNA functional similarity calculation model. Oncotarget. 2016;7(29):45948–45958. doi: 10.18632/oncotarget.10008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Liu MX, Chen X, Chen G, Cui QH, Yan GY. A computational framework to infer human disease-associated long noncoding RNAs. PLoS One. 2014;9(1):e84408. doi: 10.1371/journal.pone.0084408. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Chen X. Predicting lncRNA-disease associations and constructing lncRNA functional similarity network based on the information of miRNA. Sci Rep. 2015;5:13186. doi: 10.1038/srep13186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Zhao T, Xu J, Liu L, Bai J, Xu C, Xiao Y, Li X, Zhang L. Identification of cancer-related lncRNAs through integrating genome, regulome and transcriptome features. Mol BioSyst. 2015;11(1):126–136. doi: 10.1039/c4mb00478g. [DOI] [PubMed] [Google Scholar]
  • 25.Wang J, Ma R, Ma W, Chen J, Yang J, Xi Y, Cui Q. LncDisease: a sequence based bioinformatics tool for predicting lncRNA-disease associations. Nucleic Acids Res. 2016;44(9):e90. doi: 10.1093/nar/gkw093. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Lan W, Li M, Zhao K, Liu J, Wu FX, Pan Y, Wang J. LDAP: a web server for lncRNA-disease association prediction. Bioinformatics. 2017;33(3):458–460. doi: 10.1093/bioinformatics/btw639. [DOI] [PubMed] [Google Scholar]
  • 27.Fu G, Wang J, Domeniconi C, Yu G. Matrix factorization-based data fusion for the prediction of lncRNA-disease associations. Bioinformatics. 2018;34(9):1529–1537. doi: 10.1093/bioinformatics/btx794. [DOI] [PubMed] [Google Scholar]
  • 28.Cheng L, Hu Y, Sun J, Zhou M, Jiang Q. DincRNA: a comprehensive web-based bioinformatics toolkit for exploring disease associations and ncRNA function. Bioinformatics. 2018;34(11):1953–1956. doi: 10.1093/bioinformatics/bty002. [DOI] [PubMed] [Google Scholar]
  • 29.Yu G, Wang Y, Wang J, Fu G, Guo M, Domeniconi C: Weighted matrix factorization based data fusion for predicting lncRNA-disease associations. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 2018: 572–577.
  • 30.Sun J, Shi H, Wang Z, Zhang C, Liu L, Wang L, He W, Hao D, Liu S, Zhou M. Inferring novel lncRNA-disease associations based on a random walk model of a lncRNA functional similarity network. Mol BioSyst. 2014;10(8):2074–2081. doi: 10.1039/c3mb70608g. [DOI] [PubMed] [Google Scholar]
  • 31.Yang X, Gao L, Guo X, Shi X, Wu H, Song F, Wang B. A network based method for analysis of lncRNA-disease associations and prediction of lncRNAs implicated in diseases. PLoS One. 2014;9(1):e87797. doi: 10.1371/journal.pone.0087797. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Zhou M, Wang X, Li J, Hao D, Wang Z, Shi H, Han L, Zhou H, Sun J. Prioritizing candidate disease-related long non-coding RNAs by walking on the heterogeneous lncRNA and disease network. Mol BioSyst. 2015;11(3):760–769. doi: 10.1039/c4mb00511b. [DOI] [PubMed] [Google Scholar]
  • 33.Chen X. KATZLDA: KATZ measure for the lncRNA-disease association prediction. Sci Rep. 2015;5:16840. doi: 10.1038/srep16840. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Chen X, You ZH, Yan GY, Gong DW. IRWRLDA: improved random walk with restart for lncRNA-disease association prediction. Oncotarget. 2016;7(36):57919–57931. doi: 10.18632/oncotarget.11141. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Cheng L, Shi H, Wang Z, Hu Y, Yang H, Zhou C, Sun J, Zhou M. IntNetLncSim: an integrative network analysis method to infer human lncRNA functional similarity. Oncotarget. 2016;7(30):47864–47874. doi: 10.18632/oncotarget.10012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Yu G, Fu G, Lu C, Ren Y, Wang J. BRWLDA: bi-random walks for predicting lncRNA-disease associations. Oncotarget. 2017;8(36):60429–60446. doi: 10.18632/oncotarget.19588. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Wang P, Guo Q, Gao Y, Zhi H, Zhang Y, Liu Y, Zhang J, Yue M, Guo M, Ning S, et al. Improved method for prioritization of disease associated lncRNAs based on ceRNA theory and functional genomics data. Oncotarget. 2017;8(3):4642–4655. doi: 10.18632/oncotarget.13964. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Yao Q, Wu L, Li J, Yang LG, Sun Y, Li Z, He S, Feng F, Li H, Li Y. Global prioritizing disease candidate lncRNAs via a multi-level composite network. Sci Rep. 2017;7:39516. doi: 10.1038/srep39516. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Ding L, Wang M, Sun D, Li A. TPGLDA: novel prediction of associations between lncRNAs and diseases via lncRNA-disease-gene tripartite graph. Sci Rep. 2018;8(1):1065. doi: 10.1038/s41598-018-19357-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Gu C, Liao B, Li X, Cai L, Li Z, Li K, Yang J. Global network random walk for predicting potential human lncRNA-disease associations. Sci Rep. 2017;7(1):12442. doi: 10.1038/s41598-017-12763-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Zhang J, Zhang Z, Chen Z, Deng L. Integrating multiple heterogeneous networks for novel LncRNA-disease association inference. IEEE/ACM Trans Comput Biol Bioinform. 2017. [DOI] [PubMed]
  • 42.Li Y, Patra JC. Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network. Bioinformatics. 2010;26(9):1219–1224. doi: 10.1093/bioinformatics/btq108. [DOI] [PubMed] [Google Scholar]
  • 43.Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014;42(Database issue):D92–D97. doi: 10.1093/nar/gkt1248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Hao Y, Wu W, Li H, Yuan J, Luo J, Zhao Y, Chen R: NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database (Oxford) 2016, 2016. [DOI] [PMC free article] [PubMed]
  • 45.Yi Y, Zhao Y, Li C, Zhang L, Huang H, Li Y, Liu L, Hou P, Cui T, Tan P, et al. RAID v2.0: an updated resource of RNA-associated interactions across organisms. Nucleic Acids Res. 2017;45(D1):D115–D118. doi: 10.1093/nar/gkw1052. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Schriml LM, Arze C, Nadendla S, Chang YW, Mazaitis M, Felix V, Feng G, Kibbe WA. Disease ontology: a backbone for disease semantic integration. Nucleic Acids Res. 2012;40(Database issue):D940–D946. doi: 10.1093/nar/gkr972. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Li Y, Qiu C, Tu J, Geng B, Yang J, Jiang T, Cui Q. HMDD v2.0: a database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42(Database issue):D1070–D1074. doi: 10.1093/nar/gkt1023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Pinero J, Bravo A, Queralt-Rosinach N, Gutierrez-Sacristan A, Deu-Pons J, Centeno E, Garcia-Garcia J, Sanz F, Furlong LI. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 2017;45(D1):D833–D839. doi: 10.1093/nar/gkw943. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Cancer Genome Atlas N: Comprehensive molecular portraits of human breast tumours. Nature 2012, 490(7418):61–70. [DOI] [PMC free article] [PubMed]
  • 50.Chacon-Cortes D, Smith RA, Lea RA, Youl PH, Griffiths LR. Association of microRNA 17-92 cluster host gene (MIR17HG) polymorphisms with breast cancer. Tumour Biol. 2015;36(7):5369–5376. doi: 10.1007/s13277-015-3200-1. [DOI] [PubMed] [Google Scholar]
  • 51.Yu F, Bracken CP, Pillman KA, Lawrence DM, Goodall GJ, Callen DF, Neilsen PM. p53 represses the oncogenic Sno-MiR-28 derived from a SnoRNA. PLoS One. 2015;10(6):e0129190. doi: 10.1371/journal.pone.0129190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Lin A, Li C, Xing Z, Hu Q, Liang K, Han L, Wang C, Hawke DH, Wang S, Zhang Y, et al. The LINK-A lncRNA activates normoxic HIF1alpha signalling in triple-negative breast cancer. Nat Cell Biol. 2016;18(2):213–224. doi: 10.1038/ncb3295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Siegel RL, Miller KD, Jemal A. Cancer statistics, 2016. CA Cancer J Clin. 2016;66(1):7–30. doi: 10.3322/caac.21332. [DOI] [PubMed] [Google Scholar]
  • 54.Ge S, Xia X, Ding C, Zhen B, Zhou Q, Feng J, Yuan J, Chen R, Li Y, Ge Z, et al. A proteomic landscape of diffuse-type gastric cancer. Nat Commun. 2018;9(1):1012. doi: 10.1038/s41467-018-03121-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Hu CE, Du PZ, Zhang HD, Huang GJ. Long noncoding RNA CRNDE promotes proliferation of gastric Cancer cells by targeting miR-145. Cell Physiol Biochem. 2017;42(1):13–21. doi: 10.1159/000477107. [DOI] [PubMed] [Google Scholar]
  • 56.Pan L, Liang W, Gu J, Zang X, Huang Z, Shi H, Chen J, Fu M, Zhang P, Xiao X, et al. Long noncoding RNA DANCR is activated by SALL4 and promotes the proliferation and invasion of gastric cancer cells. Oncotarget. 2018;9(2):1915–1930. doi: 10.18632/oncotarget.23019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Tian X, Zhu X, Yan T, Yu C, Shen C, Hong J, Chen H, Fang JY. Differentially expressed lncRNAs in gastric Cancer patients: a potential biomarker for gastric Cancer prognosis. J Cancer. 2017;8(13):2575–2586. doi: 10.7150/jca.19980. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Torre LA, Bray F, Siegel RL, Ferlay J, Lortet-Tieulent J, Jemal A. Global cancer statistics, 2012. CA Cancer J Clin. 2015;65(2):87–108. doi: 10.3322/caac.21262. [DOI] [PubMed] [Google Scholar]
  • 59.Zhao Y, Qin ZS, Feng Y, Tang XJ, Zhang T, Yang L. Long non-coding RNA (lncRNA) small nucleolar RNA host gene 1 (SNHG1) promote cell proliferation in colorectal cancer by affecting P53. Eur Rev Med Pharmacol Sci. 2018;22(4):976–984. doi: 10.26355/eurrev_201802_14379. [DOI] [PubMed] [Google Scholar]
  • 60.Zhang YH, Fu J, Zhang ZJ, Ge CC, Yi Y. LncRNA-LINC00152 down-regulated by miR-376c-3p restricts viability and promotes apoptosis of colorectal cancer cells. Am J Transl Res. 2016;8(12):5286–5297. [PMC free article] [PubMed] [Google Scholar]
  • 61.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Lu Z, Cohen KB, Hunter L. GeneRIF quality assurance as summary revision. Pac Symp Biocomput. 2007:269–80. [DOI] [PMC free article] [PubMed]
  • 63.van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug-target interaction. Bioinformatics. 2011;27(21):3036–3043. doi: 10.1093/bioinformatics/btr500. [DOI] [PubMed] [Google Scholar]
  • 64.Yu G, Wang LG, Yan GR, He QY. DOSE: an R/Bioconductor package for disease ontology semantic and enrichment analysis. Bioinformatics. 2015;31(4):608–609. doi: 10.1093/bioinformatics/btu684. [DOI] [PubMed] [Google Scholar]
  • 65.Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, Peng J, Chen L, Zeng J. A network integration approach for drug-target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun. 2017;8(1):573. doi: 10.1038/s41467-017-00680-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Cao M, Pietras CM, Feng X, Doroschak KJ, Schaffner T, Park J, Zhang H, Cowen LJ, Hescott BJ. New directions for diffusion-based network prediction of protein function: incorporating pathways with confidence. Bioinformatics. 2014;30(12):i219–i227. doi: 10.1093/bioinformatics/btu263. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Navlakha S, Kingsford C. The power of protein interaction networks for associating genes with diseases. Bioinformatics. 2010;26(8):1057–1063. doi: 10.1093/bioinformatics/btq076. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Liao CS, Lu K, Baym M, Singh R, Berger B. IsoRankN: spectral methods for global alignment of multiple protein networks. Bioinformatics. 2009;25(12):i253–i258. doi: 10.1093/bioinformatics/btp203. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Gligorijevic V, Barot M, Bonneau R. DeepNF: deep network fusion for protein function prediction. Bioinformatics. 2018;34(22):3873–3881. doi: 10.1093/bioinformatics/bty440. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Cho H, Berger B, Peng J. Diffusion component analysis: unraveling functional topology in biological networks. Res Comput Mol Biol. 2015;9029:62–64. doi: 10.1007/978-3-319-16706-0_9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Chen X, Liu MX, Yan GY. Drug-target interaction prediction by random walk on the heterogeneous network. Mol BioSyst. 2012;8(7):1970–1978. doi: 10.1039/c2mb00002d. [DOI] [PubMed] [Google Scholar]
  • 72.Valdeolivas A, Tichit L, Navarro C, Perrin S, Odelin G, Levy N, Cau P, Remy E, Baudot A. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics. 2018. [DOI] [PubMed]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Additional file 1: (1.4MB, tif)

LncRNA data processing procedure. (TIF 1447 kb)

Additional file 2: (1.3MB, tif)

Disease data processing procedure. (TIF 1340 kb)

Additional file 3: (3.4MB, tif)

AUPR values of IDHI-MIRW on the large-scale lncRNA-disease heterogeneous with different parameters in LOOCV test. (A) AUC values with different α. (B) AUC values with different γ. (C) AUC values with different η. (D) AUC values with different β. (E) AUPR values with different α. (F) AUPR values with different γ. (G) AUPR values with different η. (H) AUPR values with different β. (TIF 3520 kb)

Additional file 4: (3.6MB, tif)

AUC and AUPR values of IDHI-MIRW on the small-scale lncRNA-disease heterogeneous with different parameters in LOOCV test. (A) AUC values with different α. (B) AUC values with different γ. (C) AUC values with different η. (D) AUC values with different β. (E) AUPR values with different α. (F) AUPR values with different γ. (G) AUPR values with different η. (H) AUPR values with different β. (TIF 3705 kb)

Additional file 5: (9.6KB, xlsx)

The top 15 predicted associated lncRNAs for breast cancer. (XLSX 9 kb)

Additional file 6: (9.6KB, xlsx)

The top 15 predicted associated lncRNAs for stomach cancer. (XLSX 9 kb)

Additional file 7: (9.5KB, xlsx)

The top 15 predicted associated lncRNAs for colorectal cancer. (XLSX 9 kb)

Additional file 8: (9.6MB, tif)

The results of RNASeq data analysis for breast cancer. (A) heatmap of top 200 most significantly dysregulated lncRNA expression values. (B) heatmap of lncRNA AL157395.1 expression values. (C) boxplot of lncRNA AL157395.1 expression in normal and tumor samples. (D) heatmap of lncRNA AP001528.1 expression values. (E) boxplot of lncRNA AP001528.1 expression in normal and tumor samples. (TIF 9850 kb)

Additional file 9 (9MB, tif)

The results of RNASeq data analysis for stomach cancer. (A) heatmap of top 200 most significantly dysregulated lncRNA expression values. (B) heatmap of lncRNA KCNQ1OT1 expression values. (C) boxplot of lncRNA KCNQ1OT1 expression in normal and tumor samples. (D) heatmap of lncRNA DLEU2 expression values. (E) boxplot of lncRNA DLEU2 expression in normal and tumor samples. (F) heatmap of lncRNA LINC00299 expression values. (G) boxplot of lncRNA LINC00299 expression in normal and tumor samples. (TIF 9211 kb)

Additional file 10: (180KB, txt)

The predicted lncRNA-disease associations. (TXT 180 kb)

Additional file 11: (34.4KB, docx)

Details and statistics of collected data. (DOCX 34 kb)

Data Availability Statement

IDHI-MIRW is available at https://github.com/NWPU-903PR/IDHI-MIRW, and the datasets used and/or analyzed during the current study are available from the corresponding references.


Articles from BMC Bioinformatics are provided here courtesy of BMC

RESOURCES