Abstract
An increasing number of studies have indicated that long non-coding RNAs (lncRNAs) play crucial roles in biological processes and in the diagnosis, prognosis, and treatment of complex diseases. However, experimentally validated associations between lncRNAs and diseases are still very limited. Recently, computational models have been developed to discover potential associations between lncRNAs and diseases by integrating multiple heterogeneous biological data; this has become a hot topic in biological research. In this article, we constructed a global tripartite network by integrating a variety of biological information including miRNA–disease, miRNA–lncRNA, and lncRNA–disease associations and interactions. Then, we constructed a global quadruple network by appending gene–lncRNA interaction, gene–disease association, and gene–miRNA interaction networks to the global tripartite network. Subsequently, based on these two global networks, we proposed a novel approach based on the naïve Bayesian classifier to predict potential lncRNA–disease associations (NBCLDA). Compared with state-of-the-art methods, our new method does not entirely rely on known lncRNA–disease associations and achieves reliable performance, as measured by the area under the ROC curve (AUC), in leave-one-out cross validation. Moreover, to further estimate the performance of NBCLDA, case studies of colorectal cancer, prostate cancer, and glioma were implemented, and the simulation results demonstrated that NBCLDA can be an excellent tool for biomedical research in the future.
Keywords: lncRNA–disease associations, tripartite network, quadruple network, prediction model, Naïve Bayesian Classifier
1. Introduction
Long non-coding RNAs (lncRNAs), those with over 200 nucleotides in length [1,2,3], are considered a new class of non-protein-coding transcripts. Much research evidence has shown that lncRNAs participate in almost the entire cell life cycle through various mechanisms and play significant roles in multiple biological processes including transcription, translation, epigenetic regulation, splicing, differentiation, immune response, cell cycle control, and so on [4,5,6,7,8]. In particular, the mutations and dysregulations of lncRNAs have been proven to be closely related to various human complex diseases [9,10,11], including AIDS [12], diabetes [13], Alzheimer’s Disease (AD) [14], and many types of cancers such as breast [15], prostate [16], hepatocellular [17], and bladder cancer [18]. For instance, the expression of the lncRNA called HOTAIR was shown to be higher in primary breast tumors and metastases, and the HOTAIR expression level was proven to be a powerful predictor of eventual metastasis and death [19,20]. Additionally, the lncRNA MALAT1 was demonstrated as a prognostic indicator as well as a therapeutic target and acts as a potential therapeutic method for preventing lung cancer metastasis, which is targeted by antisense oligonucleotides (ASO) [21]. Moreover, recent studies have shown that the human gene is frequently overexpressed in the myometrium and stroma during pathological endometrial proliferative events [22].
Obviously, predicting potential associations between lncRNAs and diseases would contribute to systematically understanding the pathogenesis of complex diseases at the molecular level and facilitate the identification of biomarkers for disease diagnosis, treatment, and prediction of response to therapy. However, relatively few experiments have supported lncRNA–disease associations until now. Hence, developing effective computational methods to uncover the potential associations between lncRNAs and diseases has become a hot topic in recent years. In general, existing models for predicting potential associations between lncRNAs and diseases can be divided into three categories. Among them, the first kind of methods are based on known disease-related lncRNAs. For example, Sun et al. proposed a model named RWRlncD [23], which carried out a random walk with the restart method on an lncRNA functional similarity network. This method uncovered potential associations between lncRNAs and diseases by integrating the disease similarity network, the lncRNAs functional network, and known lncRNA–disease associations. Ping et al. developed a method based on a newly constructed bipartite network, which relies on the known associations between lncRNAs and diseases [24]. Yang et al. constructed a coding-non-coding gene–disease bipartite network based on known associations between diseases and disease-causing genes (including lncRNAs). Then, they developed an iterative algorithm to uncover the possible links in the newly constructed bipartite network [25]. Ding et al. proposed a new model named TPGLDA to predict potential lncRNA–disease associations by integrating gene–disease associations with lncRNA–disease associations [26].
Different from the first kind of methods based on known lncRNA–disease associations, the second category of prediction models does not rely on known disease-related lncRNAs. For example, Chen et al. proposed a new method called HGLDA by integrating micro-RNA (miRNA)–disease associations and lncRNA–miRNA interactions. A hypergeometric distribution test is then applied to identify potential lncRNA–disease associations [27]. Liu et al. developed a computational framework by integrating human lncRNA expression profiles, gene expression profiles, and human disease-associated gene data to predict potential human lncRNA–disease associations [28]. Li et al. put forward a prediction method on account of the information of genome location to globally discover potential human lncRNAs related to vascular disease [29]. Gu et al. proposed a random walk-based model to identify potential associations between lncRNAs and diseases, which can be applied for predicting a disease without known associated lncRNAs and for inferring an lncRNA without known associated diseases [30].
In recent years, an increasing number of studies have been developed for understanding the cellular process, molecular interactions, and the pathogenesis of complex diseases at the molecular level by integrating different types of data and molecular interaction networks [31]. Such research includes the prediction of gene–disease associations [32], and the prediction of potential disease-associated miRNAs [33,34]. An increasing number of researchers have also adopted various data frameworks to increase the reliability of association prediction between diseases and lncRNAs. Hence, a third kind of prediction models has been proposed, in which multiple data sources are integrated to identify disease-related lncRNAs. For example, Lu et al. proposed a new prediction of lncRNA–disease associations via inductive matrix completion (named SIMCLDA), by integrating known lncRNA–disease interactions, disease–gene, gene–gene ontology associations [35]. Zhang et al. developed a novel model named LncRDNetFlow, which utilized a flow propagation algorithm to integrate a variety of information including the similarity of lncRNAs, the protein–protein interactions, and the similarity of diseases to infer lncRNA–disease associations [36]. Fu et al. proposed a model called MFLDA to predict potential lncRNA–disease associations by considering the quality and relevance of different heterogeneous data sources, which can select and integrate the data sources by assigning different weights to them [37]. Chen developed a path-based approach named KATZLDA for discovering potential lncRNA–disease associations by integrating information including known lncRNA–disease associations, lncRNA expression profiles, lncRNA functional similarity, disease semantic similarity, and Gaussian interaction profile kernel similarity [38]. All of these above data fusion-based methods can achieve effective results.
In this paper, to effectively predict potential lncRNA–disease associations, we first constructed a global tripartite network by integrating three kinds of heterogeneous networks: an lncRNA–disease association network, an miRNA–disease association network, and an miRNA–lncRNA interaction network. Then, considering that more heterogeneous networks can boost the prediction performance, we constructed a global quadruple network by appending a gene–lncRNA interaction network, a gene–disease association network, and a gene–miRNA interaction network to the tripartite network. Thereafter, based on these two newly constructed global networks, we propose a novel probabilistic model, the Naïve Bayesian Classifier for predicting potential LncRNA–Disease Associations (NBCLDA), to uncover potential lncRNA–disease associations. Moreover, to evaluate the prediction performance of NBCLDA, the leave-one-out cross-validation (LOOCV) framework was implemented. The experimental results demonstrated the effective performance of NBCLDA and illustrated that it can achieve better predictive performance than several state-of-the-art methods in terms of LOOCV.
2. Data Collection and Preprocessing
Considering that more heterogeneous data sources can boost the performance of prediction models, we combined seven heterogeneous data sets to construct our prediction model NBCLDA, whose ultimate goal is to infer potential associations between lncRNAs and diseases. These include the sets of miRNA–disease, miRNA–lncRNA, lncRNA–disease, gene–disease, and gene–lncRNA associations, as well as the set of gene–miRNA interactions and the set of diseases with disease tree numbers. The sets were collected from various databases.
2.1. Construction of miRNA–Disease and miRNA–lncRNA Association Sets
In this article, the miRNA–disease and miRNA–lncRNA association sets were downloaded from the HMDD [39] and the starBase v2.0 [40] databases in January 2015. Once these two data sets were collected, we removed any duplicate associations with conflicting evidence. Then, we further unified the names of miRNAs, and, thereafter, manually selected the common miRNAs in both sets. Finally, we retained only the associations related with those selected miRNAs in these two data sets. As a result, we obtained a data set consisting of 4704 miRNA–disease interactions between 246 miRNAs and 373 diseases, and a data set consisting of 9086 miRNA–lncRNA interactions between 246 miRNAs and 1089 lncRNAs (see Supplementary Materials Tables S1 and S2).
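The filtering steps described above can be sketched in a few lines. This is a minimal illustration, assuming each data set has already been loaded as a list of (miRNA, partner) pairs; the function names and the name-unification rule (lowercasing and dropping an `hsa-` prefix) are hypothetical and are not taken from the authors' procedure.

```python
def normalize_mirna(name):
    """Hypothetical name unification: lowercase and drop a species prefix."""
    name = name.strip().lower()
    return name[4:] if name.startswith("hsa-") else name


def restrict_to_common_mirnas(mirna_disease, mirna_lncrna):
    """Deduplicate both pair lists and keep only pairs whose miRNA occurs in both."""
    md = {(normalize_mirna(m), d) for m, d in mirna_disease}
    ml = {(normalize_mirna(m), l) for m, l in mirna_lncrna}
    common = {m for m, _ in md} & {m for m, _ in ml}
    md_kept = sorted(p for p in md if p[0] in common)
    ml_kept = sorted(p for p in ml if p[0] in common)
    return md_kept, ml_kept, common


# Toy input (made-up records, not taken from HMDD or starBase)
md_raw = [("hsa-miR-21", "glioma"), ("hsa-mir-21", "glioma"), ("hsa-miR-7", "asthma")]
ml_raw = [("miR-21", "MALAT1"), ("hsa-miR-155", "HOTAIR")]
md, ml, common = restrict_to_common_mirnas(md_raw, ml_raw)
```

In this toy run, only the miRNA present in both sets survives, and the duplicated miRNA–disease record collapses into one pair.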
2.2. Construction of the lncRNA–Disease Association Set
In this paper, the set of lncRNA–disease associations was collected from the MNDR v2.0 database [41] in 2017. In a similar way, once the data set was collected, we removed the duplicate associations with conflicting evidence. Then, we selected the lncRNA–disease associations with diseases belonging to and lncRNAs belonging to simultaneously. As a result, we obtained a data set consisting of 407 lncRNA–disease associations between 77 lncRNAs and 95 diseases (see Supplementary Materials Table S3). The data set is utilized as the test sample in our following simulation experiments.
2.3. Construction of the Gene–Disease and Gene–lncRNA Association Sets
In this article, the set of gene–disease associations was gathered from the DisGeNET v5.0 database [42] in May 2017, and the set of gene–lncRNA associations was downloaded from the LncACTdb v1.0 database [43]. Again, we removed the duplicate associations with conflicting evidence. Then, we unified the names of genes and manually selected the genes common to both sets. Finally, we retained only the associations related to those selected genes in these two data sets. Additionally, we transformed some disease names included in the newly constructed set of gene–disease associations into their aliases in the , in order to keep the disease names uniform. For example, the disease names “pulmonary Emphysema” and “Bladder Neoplasm” in the newly collected set of gene–disease associations were converted into “pulmonary Embolism” and “Bladder Neoplasms” in the , respectively. Hence, we obtained a data set consisting of 3702 gene–disease associations between 171 genes and 227 diseases, and a data set consisting of 411 gene–lncRNA interactions between 171 genes and 66 lncRNAs (see Supplementary Materials Tables S4 and S5).
2.4. Construction of the Gene–miRNA Association Set
In this paper, the set of gene–miRNA interactions was obtained from the miRecords database [44], which was last updated in April 2013. Once the data set was collected, we removed the duplicate associations with conflicting evidence. Then, we selected the gene–miRNA interactions with genes belonging to or and miRNAs belonging to or , simultaneously. As a result, we obtained a data set consisting of 565 gene–miRNA associations between 109 genes and 174 miRNAs (see Supplementary Materials Table S6).
2.5. Construction of the Set of Diseases with Disease Tree Numbers
In this article, the set of diseases with disease tree numbers was gathered from the MeSH database [45]. In the MeSH database, the disease terms, described as directed acyclic graphs (DAGs), were classified and signified as disease tree numbers. We browsed the MeSH database and collected the disease tree numbers of diseases in . As a result, we obtained a data set consisting of 373 diseases with their disease tree numbers (see Supplementary Materials Table S7).
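MeSH tree numbers are dot-separated, and each prefix of a tree number identifies a more general term, so the ancestors needed to build a disease DAG can be read off the tree number itself. The helper below is a small sketch of that idea; the function name is hypothetical and the tree number in the example is purely illustrative.

```python
def ancestor_tree_numbers(tree_number):
    """Return the ancestors of a MeSH tree number, nearest parent first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


# Illustrative tree number with four levels
ancestors = ancestor_tree_numbers("C04.588.274.476")
```

A disease that carries several tree numbers would contribute the union of the ancestors of each of them.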
2.6. Analysis of Multi Relational Data Sources
In our model, four object types are considered: lncRNAs, diseases, miRNAs, and genes. Based on these four object types, we collect six relational data sources from different databases. Figure 1 illustrates the relationships between these data sources more directly. In Figure 1, denotes the different associations between these four object types, where represents one object, represents another object, and denotes the dataset that the two objects belong to. For example, denotes the associations between miRNAs and diseases, m represents miRNAs, d represents diseases, and ‘1’ indicates that all these miRNAs and diseases belong to the dataset . In addition, the numbers of the same objects in the different datasets and the relationships among them are shown in Figure 1. For instance, the number of diseases is 373 in , 95 (= 29 + 66) in and 227 (= 66 + 161) in , and both the 95 diseases in and the 227 diseases in are part of the 373 diseases in ; moreover, the intersection of the diseases in and includes 66 different diseases.
3. Method
As illustrated in Figure 2, our newly proposed model NBCLDA for predicting potential associations between lncRNAs and diseases can be mainly divided into the following steps:
Step 1: As illustrated in Figure 2a, on the basis of data sets , , and we can construct an miRNA–disease association network labeled MDN, an miRNA–lncRNA association network labeled MLN, and an lncRNA–disease association network labeled LDN.
Step 2: As illustrated in Figure 2b, by integrating the three association networks constructed in Step 1, we can easily obtain a global tripartite network of lncRNA–miRNA–disease relationships.
Step 3: As illustrated in Figure 2c, in order to utilize multiple data sources to improve the prediction performance, on the basis of data sets , , and obtained above, we can also construct a gene–disease association network labeled GDN, a gene–lncRNA association network labeled GLN, and a gene–miRNA association network labeled GMN.
Step 4: As illustrated in Figure 2d, by appending the three association networks constructed in Step 3 to constructed in Step 2, we can easily obtain a global quadruple network of lncRNA–miRNA–gene–disease relations.
Step 5: As illustrated in Figure 2e,f, after applying the naïve Bayesian classifier theory to and , we can obtain two kinds of prediction models: NBCLDA- and NBCLDA-.
Step 6: As illustrated in Figure 2g,h, in order to further improve the prediction performance of the NBCLDA, we implemented disease semantic similarity in NBCLDA- and NBCLDA-. Thus, we can obtain two new prediction models, NBCLDA--SD and NBCLDA--SD, to infer potential lncRNA–disease associations.
3.1. Construction of the MDN, MLN, LDN, and
Let L be the set of n lncRNAs in , be the set of lncRNAs in , D be the set of r diseases in , be the set of diseases in . Additionally, let be the set of t miRNAs in or . From Section 2.1 and Section 2.2, it is clear that and ; hence, we can let , , , and . Thus, we can represent the miRNA–disease association network, MDN, as , where denotes the set of known interactions between the miRNAs in M and the diseases in D. That is, the edge is associated with .
In the same way, we can further represent the miRNA–lncRNA interaction network, MLN, and the lncRNA–disease association network, LDN, as and , where denotes the set of known interactions between the miRNAs in M and the lncRNAs in L; represents the set of interactions between the lncRNAs in and the diseases in . Thus, the edge is associated with , and the edge is associated with . Finally, the global tripartite network, , is expressed as , where .
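The construction above amounts to taking the union of the three edge sets over a shared node set. The following sketch shows one way to do this with an adjacency map; the node labels, the type prefixes, and the function names are assumptions made for illustration only.

```python
from collections import defaultdict


def build_network(*edge_sets):
    """Merge edge sets into one undirected adjacency map {node: set(neighbors)}."""
    adjacency = defaultdict(set)
    for edges in edge_sets:
        for u, v in edges:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


def common_neighbors(adjacency, lncrna, disease):
    """Nodes linked to both the lncRNA and the disease."""
    return adjacency[lncrna] & adjacency[disease]


# Toy edge sets (made-up identifiers); prefixes keep node types distinct.
mdn = {("m:1", "d:1"), ("m:2", "d:1")}                  # miRNA-disease
mln = {("m:1", "l:1"), ("m:2", "l:1"), ("m:2", "l:2")}  # miRNA-lncRNA
ldn = {("l:2", "d:1")}                                  # lncRNA-disease

tripartite = build_network(mdn, mln, ldn)
shared = common_neighbors(tripartite, "l:1", "d:1")
```

Because an lncRNA is linked only to miRNAs and diseases, and a disease only to miRNAs and lncRNAs, the common neighbors of an lncRNA–disease pair in this network are always miRNAs.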
3.2. Construction of GDN, GLN, GMN, and
Let be the set of diseases in , be the set of lncRNAs in , G be the set of p genes in or , be the set of genes in , and be the set of miRNAs in . Additionally, from Section 2.3 and Section 2.4, it is clear that , , and ; hence, we can let , , , , and . We can thus represent the gene–disease association network, GDN, as , where denotes the set of known interactions between the genes in G and the diseases in . That is, the edge is associated with .
In the same way, we can further represent the gene–lncRNA interaction network, GLN, and gene–miRNA interaction network, GMN, as and , where and denote the set of known gene–lncRNA interactions and the set of known gene–miRNA interactions, respectively. In other words, the edge is associated with and the edge is associated with . Finally, it is evident that the global quadruple network can be expressed as , where .
3.3. Construction of NBCLDA
The naïve Bayesian classifier is a simple probabilistic classifier with a naïve independence assumption that any feature of a class is independent of the other features of the class. Abstractly, based on the Bayesian classifier probability model $p(C \mid F_1, \ldots, F_n)$, where $C$ is a dependent class variable and $F_1, \ldots, F_n$ are the feature variables of class $C$, the posterior probability can be described as follows:
$$p(C \mid F_1, \ldots, F_n) = \frac{p(C)\, p(F_1, \ldots, F_n \mid C)}{p(F_1, \ldots, F_n)} \tag{1}$$
Furthermore, according to the above assumption, since each feature $F_i$ is conditionally independent of every other feature $F_j$ ($j \neq i$), Equation (1) can be expressed as:
$$p(C \mid F_1, \ldots, F_n) = \frac{p(C)}{p(F_1, \ldots, F_n)} \prod_{i=1}^{n} p(F_i \mid C) \tag{2}$$
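The factorization in Equation (2) can be illustrated numerically. The sketch below computes the posterior for a two-class problem by multiplying each class prior with its per-feature likelihoods and normalizing by the evidence; the function name and all probability values are made up for illustration.

```python
from math import prod


def naive_bayes_posterior(priors, likelihoods):
    """priors: {class: p(C)}; likelihoods: {class: [p(F_i | C), ...]}."""
    # Numerator of Equation (2) for each class
    joint = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    # The evidence p(F_1, ..., F_n) is the sum of the numerators over classes
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Made-up numbers: class 1 = "associated", class 0 = "not associated"
posterior = naive_bayes_posterior(
    priors={1: 0.1, 0: 0.9},
    likelihoods={1: [0.8, 0.5], 0: [0.2, 0.1]},
)
```

Even with a small prior for class 1, two features that are each more likely under class 1 raise its posterior above that of class 0 in this example.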
Inspired by existing probabilistic models based on Bayesian theory to predict missing links in complex networks [46], we designed a prediction model NBCLDA to infer potential disease-related lncRNAs; we applied the naïve Bayesian theory to and , constructed in Section 3.1 and Section 3.2, respectively. In the context of Equation (1), in NBCLDA, the associations between lncRNAs and diseases in and are considered as the class of variables, while the common neighboring nodes of every lncRNA–disease pair in and are considered as the feature variables. In particular, when applying the naïve Bayesian theory to , for any given pair of lncRNA and disease nodes in , we will consider that their common neighboring miRNA nodes are all conditionally independent of each other, since all of the miRNAs are different, and, therefore, we assume that each of the miRNAs will not affect the others. To illustrate this assumption more intuitively, we provide an example in Figure 3a, in which the common neighboring nodes and between and will be assumed to be conditionally independent.
However, when applying the naïve Bayesian theory to , there are two types of common neighboring nodes, miRNAs and genes, between a pair of lncRNA and disease nodes. In this case, it is unreasonable to consider that all of these common neighbors are conditionally independent of each other, since there may exist interactions between genes and miRNAs. Therefore, for any given pair of lncRNA and disease nodes in , let be the set that consists of all their common neighboring nodes. Then, for any miRNA node , if there is a gene node that is associated with , we will consider the miRNA and its related gene as a whole, denote them as - and label this an miRNA–gene pair. In this way, there will be three kinds of features in : miRNAs, genes, and miRNA–gene pairs. Hence, we assume that these three kinds of elements in are conditionally independent of each other. To illustrate this assumption more intuitively, we present an example in Figure 3b, in which , , , and are the common neighboring nodes between and , and we will assume that -, , and are conditionally independent of each other.
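The grouping of common neighbors into miRNAs, genes, and miRNA–gene pairs can be sketched as follows. This is only one reading of the description: it assumes that a pair is formed whenever a common-neighbor miRNA interacts with a common-neighbor gene, and that paired nodes are no longer counted as single features. The function and variable names are hypothetical.

```python
def group_features(common_mirnas, common_genes, gene_mirna_edges):
    """Split common neighbors into miRNA-gene pairs, single miRNAs, single genes."""
    pairs = sorted(
        (m, g)
        for m in common_mirnas
        for g in common_genes
        if (g, m) in gene_mirna_edges
    )
    paired_mirnas = {m for m, _ in pairs}
    paired_genes = {g for _, g in pairs}
    single_mirnas = sorted(set(common_mirnas) - paired_mirnas)
    single_genes = sorted(set(common_genes) - paired_genes)
    return pairs, single_mirnas, single_genes


# Toy case: two miRNAs and two genes are common neighbors; g1 interacts with m1.
pairs, single_mirnas, single_genes = group_features(
    {"m1", "m2"}, {"g1", "g2"}, {("g1", "m1")}
)
```

The toy case yields one miRNA–gene pair, one unpaired miRNA, and one unpaired gene, which are then treated as three conditionally independent features.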
3.3.1. Method for Applying the Naïve Bayesian Theory into
For any given lncRNA node and disease node in , let and be the sets of neighboring nodes that are directly connected to and , respectively. From this, we construct , which denotes the set consisting of all common neighboring nodes between and in . Then, the prior probabilities for the existence of an relationship edge are calculated via:
(3) |
(4) |
where denotes the number of known associations between lncRNAs and diseases in LDN, and , where n denotes the number of lncRNAs in L and r denotes the number of diseases in D.
Based on the naïve Bayesian classifier, the posterior probabilities for an edge , representing whether the node is connected to in , are defined as follows:
(5) |
(6) |
From Equations (5) and (6), we can directly identify whether an lncRNA node is connected with a disease node or not in . However, since it is often too complicated to calculate the value of , we first define the probability of a potential association existing between and in as follows:
(7) |
where and are the conditional probabilities of a node belonging to ; they represent the possibilities of whether the node is a common neighboring node between and in or not, respectively. Moreover, according to Bayesian theory, these two conditional probabilities can be expressed as:
(8) |
(9) |
where and represent the conditional probability of whether the lncRNA node is connected to the disease node or not, respectively, and is one of the common neighboring nodes between and in . Thus, and are calculated via the following formulas:
(10) |
(11) |
where and denote the number of known and unknown associations between lncRNAs and diseases whose common neighbors include , respectively.
Hence, from Equations (8) and (9), Equation (7) can be modified as follows:
(12) |
Moreover, given any two nodes and in , the value of is a constant, which we denote as for convenience. Additionally, for each common neighboring node between and in , let denote the number of lncRNAs directly related to , and denote the number of diseases directly related to . Then, , and hence, Equation (7) can further be modified as follows:
(13) |
Considering that may equal zero, we will introduce the Laplace calibration to guarantee that the value of will not be zero:
(14) |
Furthermore, by introducing the logarithmic function for standardization, for any given lncRNA node and disease node in , we can finally define the probability of a potential association existing between them as:
(15) |
where is a constant utilized for normalization.
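The scoring procedure of this subsection can be sketched in code. The following is a minimal sketch, not the authors' implementation. It assumes that the prior is the fraction of known associations among all n × r lncRNA–disease pairs, that the constant is the ratio of the two priors, and that, for each common neighboring miRNA, the known and unknown pairs are counted among all lncRNA–disease pairs attached to that miRNA. It applies the Laplace correction and the logarithm, and omits the final normalization constant. All names (`lnb_scores`, `n_pos`, `n_neg`, `s`) are hypothetical.

```python
from itertools import product
from math import log


def lnb_scores(lnc_nbrs, dis_nbrs, known, n_lnc, n_dis):
    """Score every (lncRNA, disease) pair from its common miRNA neighbors.

    lnc_nbrs / dis_nbrs map each lncRNA / disease to its set of miRNAs;
    known is the set of known (lncRNA, disease) associations.
    """
    p_pos = len(known) / (n_lnc * n_dis)   # assumed prior of an association
    s = (1 - p_pos) / p_pos                # assumed constant: ratio of the priors

    # lncRNAs and diseases directly linked to each miRNA
    lnc_of, dis_of = {}, {}
    for l, ms in lnc_nbrs.items():
        for m in ms:
            lnc_of.setdefault(m, set()).add(l)
    for d, ms in dis_nbrs.items():
        for m in ms:
            dis_of.setdefault(m, set()).add(d)

    # Laplace-corrected ratio of known to unknown pairs around each miRNA
    ratio = {}
    for m in lnc_of.keys() & dis_of.keys():
        pairs = list(product(lnc_of[m], dis_of[m]))
        n_pos = sum(pair in known for pair in pairs)
        n_neg = len(pairs) - n_pos
        ratio[m] = (n_pos + 1) / (n_neg + 1)

    # Sum of log contributions over the common neighbors of each pair
    return {
        (l, d): sum(log(s * ratio[m]) for m in lnc_nbrs[l] & dis_nbrs[d])
        for l, d in product(lnc_nbrs, dis_nbrs)
    }


# Toy network with made-up identifiers
lnc_nbrs = {"l1": {"m1", "m2"}, "l2": {"m1"}}
dis_nbrs = {"d1": {"m1", "m2"}, "d2": {"m2"}}
scores = lnb_scores(lnc_nbrs, dis_nbrs, known={("l1", "d1")}, n_lnc=2, n_dis=2)
```

In the toy network, the pair with two common miRNAs receives twice the score of a pair with one, and a pair without common neighbors scores zero.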
3.3.2. Method for Applying the Naïve Bayesian Theory to
In the same manner as described in Section 3.3.1, for any given lncRNA node and disease node in , we construct the set consisting of all common neighboring nodes, . We then obtain the posterior probabilities of and , representing whether the node is connected to in or not, respectively. Similarly to Section 3.3.1, we can define the probability of a potential association existing between and in as follows (the detailed derivation is described in the Supplementary Materials):
(16) |
where and denote the number of known and unknown associations between and in , respectively, conditional on and being common neighboring nodes between and in and - is an miRNA–gene pair. In addition, and denote the number of known and unknown associations between and in , respectively, conditional on being a common neighboring node between and . In addition, and represent the number of known and unknown associations between and in , respectively, conditional on being a common neighboring node between and . Finally, following the example of Equation (15), we can finally define the probability of a potential association existing between and in as follows:
(17) |
3.3.3. Method of Appending the Disease Semantic Similarity into
The disease semantic similarity has been widely utilized as a valuable data source for discovering potential disease-related lncRNAs in many previous studies [30,38]. In this paper, we append the disease semantic similarity into our newly constructed prediction model NBCLDA to further uncover the potential relationships between lncRNAs and diseases.
From the description given in Section 2.5, we know that each disease term in the MeSH database can be described as a directed acyclic graph (DAG), in which the nodes represent the disease MeSH descriptors and all MeSH descriptors in the DAG are linked from more general terms (parent nodes) to more specific terms (child nodes) by a direct edge. Hence, in this paper, we first obtain the disease tree numbers according to the disease terms collected from the MeSH database. Thereafter, adopting the method proposed by Wang et al. [47], while supposing that disease is represented as the graph , where is the set of all ancestor nodes of including node , is the set of corresponding links, and the contribution of a disease t in to the semantic of disease can be calculated as follows:
(18) |
where is the semantic contribution factor for edges linking disease with child disease t. The disease is the most specific disease, and its own semantic score is defined as 1. Since nodes located farther from are more general diseases that contribute less to , based on Equation (18), we can define the semantic value of the disease as follows:
(19) |
Therefore, based on the assumption that the diseases share the nodes of their DAGs, the semantic similarity between disease and can be defined as:
(20) |
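The semantic similarity of Equations (18)–(20) can be sketched as follows, in the commonly used form of the measure of Wang et al. [47]: each ancestor's contribution is the largest product of contribution factors along a path from the disease, the semantic value is the sum of contributions, and the similarity is the shared contribution divided by the sum of both semantic values. The `parents` map, the function names, and the value 0.5 for the contribution factor are assumptions for illustration.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of every ancestor (and the disease itself) to the disease."""
    contrib = {disease: 1.0}   # the disease contributes 1 to itself
    frontier = [disease]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            value = delta * contrib[node]
            if value > contrib.get(parent, 0.0):   # keep the maximum over children
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def semantic_similarity(a, b, parents, delta=0.5):
    """Shared contributions divided by the sum of both semantic values."""
    ca = semantic_contributions(a, parents, delta)
    cb = semantic_contributions(b, parents, delta)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))


# Toy DAG: diseases A and B share parent P, whose parent is the root R.
parents = {"A": ["P"], "B": ["P"], "P": ["R"]}
sim_ab = semantic_similarity("A", "B", parents)
sim_aa = semantic_similarity("A", "A", parents)
```

In the toy DAG, the two sibling diseases share their parent and the root, giving a similarity of 1.5/3.5, while a disease compared with itself has similarity 1.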
Finally, based on the disease semantic similarity and the similarities between lncRNAs and diseases, we can reconstruct a new recommended measurement for inferring potential associations between lncRNAs and diseases as follows:
(21) |
where denotes either or and , which is computed via Equation (20) denotes the disease semantic similarity.
4. Results
4.1. Performance Evaluation
The performance of NBCLDA in inferring potential associations between lncRNAs and diseases is evaluated by implementing LOOCV on experimentally verified lncRNA–disease associations. In each round, one known lncRNA–disease association is used as the test sample, whereas all the remaining associations are taken as training cases for model learning. This step is repeated until each known association has been used as the test sample once. The area under the receiver operating characteristic (ROC) curve (AUC) is used to measure the overall performance of the method. The closer the AUC value is to 1, the better the performance; an AUC value of 0.5 corresponds to a random guess. We calculate a series of true positive rates (TPR, or sensitivity) and false positive rates (FPR, or 1−specificity) by setting different classification thresholds, and the ROC curve plots TPR against FPR. Specifically, TPR corresponds to the ratio of the successfully predicted lncRNA–disease associations to the total experimentally verified lncRNA–disease associations, and specificity refers to the percentage of candidate lncRNAs ranked below the threshold, so that FPR is the percentage of candidates ranked above it.
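The LOOCV loop can be sketched as follows. This is a simplified, rank-based estimate rather than the exact ROC construction used for the figures: in each round the held-out association is scored against all unlabeled candidate pairs, the fraction of candidates it outranks (ties counted as one half) is recorded, and these fractions are averaged. The scorer `toy_score` is a hypothetical stand-in for a prediction model, and all identifiers are made up.

```python
def loocv_auc(known, all_pairs, score_fn):
    """Average, over held-out associations, of the fraction of candidates outranked."""
    known = set(known)
    unknown = [p for p in all_pairs if p not in known]
    fold_values = []
    for held_out in sorted(known):
        scores = score_fn(known - {held_out})   # train without the test sample
        s_test = scores[held_out]
        wins = sum(
            1.0 if s_test > scores[p] else 0.5 if s_test == scores[p] else 0.0
            for p in unknown
        )
        fold_values.append(wins / len(unknown))
    return sum(fold_values) / len(fold_values)


all_pairs = [(l, d) for l in ("l1", "l2") for d in ("d1", "d2", "d3")]
known = {("l1", "d1"), ("l1", "d2"), ("l2", "d1")}


def toy_score(train):
    """Hypothetical scorer: count training pairs sharing the lncRNA or the disease."""
    return {
        (l, d): sum(1 for a, b in train if a == l or b == d)
        for l, d in all_pairs
    }


auc = loocv_auc(known, all_pairs, toy_score)
```

A value near 1 means held-out associations are consistently ranked above unlabeled pairs, while 0.5 corresponds to random ranking.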
First, in order to estimate the influence of the addition of new types of nodes and the introduction of the disease semantic similarity on the predictions of potential associations between lncRNAs and diseases, we implemented the NBCLDA on the two constructed global networks and in the framework of LOOCV. The simulation results are shown in Figure 4 and Figure 5. From Figure 4, the NBCLDA achieved an AUC of 0.8240 on and an AUC of 0.8604 on when the disease semantic similarity was not utilized. On the other hand, from Figure 5, an AUC of 0.8519 on and an AUC of 0.8819 on were achieved when the disease semantic similarity was included. This demonstrates that the prediction performance of our method not only benefits from the addition of the new types of nodes for predicting potential associations between lncRNAs and diseases, but also is significantly improved by the introduction of disease semantic similarity.
In order to further assess the performance of NBCLDA, we compared it with other state-of-the-art models including HGLDA [27], SIMCLDA [35], MFLDA [37], Yang et al.’s method [25], KATZLDA [38], and TPGLDA [26] in the framework of LOOCV. For the comparison with HGLDA, a data set consisting of 183 experimentally validated lncRNA–disease associations had previously been constructed and taken as the test set to evaluate its performance. Hence, for convenience, we compared our model with HGLDA on that data set using the framework of LOOCV. The simulation results are illustrated in Table 1 and Figure 6, from which it is evident that our approach outperformed HGLDA. For the comparison with SIMCLDA, a data set consisting of 101 known lncRNA–disease associations between 30 lncRNAs and 79 diseases was collected from the data set of 293 experimentally validated lncRNA–disease associations used by SIMCLDA. These selected lncRNAs and diseases all belong to in our paper. The simulation results are illustrated in Table 1, from which it is evident that our approach outperformed SIMCLDA. For the comparison with MFLDA, the six relational data sources used by MFLDA, namely lncRNA–miRNA associations, lncRNA–gene function associations, lncRNA–disease associations, miRNA–gene interactions, miRNA–disease associations, and gene–disease associations, were collected to implement NBCLDA. The data set of experimentally validated lncRNA–disease associations was taken as the test set to evaluate its performance. The simulation results are illustrated in Table 1, from which it is evident that our approach outperformed MFLDA.
Table 1.
Methods | AUCs | Methods | AUCs |
---|---|---|---|
NBCLDA--SD | 0.8982 | NBCLDA--SD | 0.9169 |
HGLDA | 0.7621 | Yang et al. method | 0.8568 |
NBCLDA--SD | 0.8897 | NBCLDA--SD | 0.8829 |
SIMCLDA | 0.8526 | KATZLDA | 0.8283 |
NBCLDA--SD | 0.8704 | NBCLDA--SD | 0.8897 |
MFLDA | 0.7945 | TPGLDA | 0.92 |
Furthermore, we compared NBCLDA with Yang et al.’s method based on the data set consisting of 407 lncRNA–disease associations between 77 lncRNAs and 95 diseases. To make this comparison, following their description, we first deleted the nodes with a degree equal to 1. As a result, we obtained a data set consisting of 319 lncRNA–disease associations between 37 lncRNAs and 52 diseases. Then, we took this data set as the test set to compare the two methods in the framework of LOOCV. The simulation results are shown in Figure 7, from which it can be seen that NBCLDA achieved an AUC of 0.9169 when implemented on , which is much better than the AUC of 0.8568 achieved by Yang et al.’s method. We also compared NBCLDA with KATZLDA, which is a path-based method designed to predict potential lncRNA–disease associations by integrating multiple pieces of information including known lncRNA–disease associations, lncRNA expression profiles, lncRNA functional similarity, disease semantic similarity, and the Gaussian interaction profile kernel similarity. When executing the simulation, we could not obtain the expression profiles of the corresponding lncRNAs; thus, we compared the two methods without this information. The simulation results are shown in Figure 8, which indicates that NBCLDA achieves higher AUCs (0.8519 and 0.8829) than KATZLDA, with a corresponding AUC of 0.8323. This also demonstrates the superiority of our newly constructed prediction model. Finally, for the comparison with TPGLDA, a data set consisting of 312 experimentally validated lncRNA–disease associations between 68 lncRNAs and 67 diseases and a data set consisting of 1941 gene–disease associations between 165 genes and 67 diseases were constructed. The data set of known lncRNA–disease associations was taken as the test set to evaluate its performance.
The simulation results are given in Table 1, which shows that TPGLDA achieves a better performance (AUC of 0.92) than our method (AUC of 0.8982). The main reason is probably that the consistence-based resource allocation algorithm of TPGLDA takes into consideration the contribution of resources moved in both directions. However, NBCLDA does not entirely rely on known lncRNA–disease associations and can integrate multiple data sources to predict potential associations.
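The degree-based filtering used for the comparison with Yang et al.’s method can be sketched as follows. This is a minimal illustration only: the function names and the toy edge list are hypothetical, and it assumes a single filtering pass over the bipartite lncRNA–disease network, whereas the text does not state whether the deletion was applied once or repeated until no degree-1 node remained.

```python
from collections import Counter

def prune_degree_one(edges):
    """Drop every association that touches a node of degree 1 (single pass; an assumption)."""
    lnc_degree = Counter(lnc for lnc, _ in edges)
    dis_degree = Counter(dis for _, dis in edges)
    return [(lnc, dis) for lnc, dis in edges
            if lnc_degree[lnc] > 1 and dis_degree[dis] > 1]

def summarize(edges):
    """Return (number of associations, number of lncRNAs, number of diseases)."""
    return (len(edges),
            len({lnc for lnc, _ in edges}),
            len({dis for _, dis in edges}))

# Toy data with placeholder names; these are not real associations.
toy_edges = [
    ("lnc1", "dis1"), ("lnc1", "dis2"),
    ("lnc2", "dis1"), ("lnc2", "dis2"),
    ("lnc3", "dis2"),   # lnc3 has degree 1
    ("lnc1", "dis3"),   # dis3 has degree 1
]
pruned = prune_degree_one(toy_edges)
print(summarize(toy_edges), summarize(pruned))
```

Applied to the real data, the same summary would be compared against the counts reported above (407 associations before filtering, 319 after); a mismatch would suggest that a different filtering rule, such as repeated pruning, was used.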
In order to further evaluate the performance of NBCLDA, 20 percent of the known lncRNA–disease associations were randomly chosen as the training set, while the remaining known associations and all the unknown associations were taken as the testing set. We then compared NBCLDA with the six methods on the predicted top-k associations by using the F1-score, which is a measure of a test’s accuracy [48]. Because the known lncRNA–disease associations are sparse, we set a different threshold k for each set of known associations when comparing with the other methods; the comparison results are given in Table 2. Table 2 shows that NBCLDA outperforms several other methods in terms of F1-score. TPGLDA achieves higher values than our approach at k = 20 and k = 40, which is likely because its consistence-based resource allocation algorithm takes resources moved in both directions into consideration. However, compared with TPGLDA, our new method does not entirely rely on known lncRNA–disease associations and can integrate multiple data sources to predict potential associations. These advantages may make it a useful addition to biomedical research in the future.
Table 2.
Methods | F1-Score | F1-Score | F1-Score |
---|---|---|---|
NBCLDA | 0.1536 (k = 20) | 0.1582 (k = 40) | n/a (k = 60) |
SIMCLDA | 0.0635 (k = 20) | 0.0482 (k = 40) | n/a (k = 60) |
NBCLDA | 0.1773 (k = 20) | 0.2415 (k = 40) | n/a (k = 60) |
MFLDA | 0.2012 (k = 20) | 0.1139 (k = 40) | n/a (k = 60) |
NBCLDA | 0.2575 (k = 20) | 0.2855 (k = 34) | n/a (k = 60) |
Yang et al.’s method | 0.2707 (k = 20) | 0.2769 (k = 34) | n/a (k = 60) |
NBCLDA | 0.1183 (k = 20) | 0.1088 (k = 40) | 0.1139 (k = 60) |
KATZLDA | 0.1274 (k = 20) | 0.0869 (k = 40) | 0.0779 (k = 60) |
NBCLDA | 0.1295 (k = 20) | 0.1510 (k = 40) | 0.1320 (k = 60) |
TPGLDA | 0.2070 (k = 20) | 0.1644 (k = 40) | 0.1301 (k = 60) |
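To make the top-k F1-score concrete, the sketch below computes it from a ranked list of candidate pairs. It is an illustration under stated assumptions rather than the evaluation code behind Table 2: the function name `f1_at_k` and the toy pairs are hypothetical, and the ranking is assumed to be a single global list over all candidate lncRNA–disease pairs, which the text does not specify.

```python
def f1_at_k(ranked_pairs, true_pairs, k):
    """F1-score of the top-k predictions against the held-out true associations."""
    top_k = ranked_pairs[:k]
    hits = sum(1 for pair in top_k if pair in true_pairs)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)      # fraction of the top-k that is correct
    recall = hits / len(true_pairs)    # fraction of true associations recovered
    return 2 * precision * recall / (precision + recall)

# Toy ranking with placeholder names, best-scored pair first.
ranked = [("lnc1", "dis1"), ("lnc2", "dis1"), ("lnc1", "dis2"),
          ("lnc3", "dis1"), ("lnc3", "dis2")]
held_out = {("lnc1", "dis1"), ("lnc1", "dis2"), ("lnc4", "dis4")}

print(f1_at_k(ranked, held_out, k=4))
```

With k predictions and a fixed set of held-out associations, the score reduces to 2·hits / (k + number of held-out associations), so it can fall as k grows when the additional predictions contain few true associations.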
4.2. Case Studies
To further evaluate the performance of the NBCLDA, case studies of three types of lncRNA-related diseases (colorectal cancer, prostate cancer, and glioma) are analyzed in this section. During the simulation experiment, the known lncRNA–disease associations in the data set were used as the training samples, while experimentally validated lncRNA–disease associations outside the data set were used for testing. The top 20 disease-related lncRNAs predicted by the NBCLDA were verified against the relevant literature, and the corresponding evidence is listed in Table 3. In addition, the full lists of the top 20 predicted disease-related lncRNAs are provided in Supplementary Table S8.
Table 3.
Disease | lncRNA | Evidence (PMID) | Rank |
---|---|---|---|
Colorectal cancer | XIST | 17143621 | 1 |
Colorectal cancer | MALAT1 | 25446987,25031737,21503572,25025966,24244343,26887056 | 3 |
Colorectal cancer | KCNQ1OT1 | 16965397 | 6 |
Colorectal cancer | H19 | 11120891,19926638,22427002,26068968,26989025 | 8 |
Colorectal cancer | NEAT1 | 26314847 | 9 |
Colorectal cancer | SNHG16 | 24519959 | 12 |
Colorectal cancer | TUG1 | 26856330 | 18 |
Prostate cancer | MALAT1 | 23845456,23726266,26516927,22349460 | 3 |
Prostate cancer | KCNQ1OT1 | 23728290 | 6 |
Prostate cancer | H19 | 24063685,24988946 | 8 |
Prostate cancer | NEAT1 | 23728290,25415230 | 10 |
Prostate cancer | TUG1 | 26975529 | 19 |
Glioma | MALAT1 | 26649278,25613066,26619802,27134488,26938295 | 4 |
Glioma | H19 | 24466011,26983719 | 6 |
Glioma | TUG1 | 25645334,27363339 | 10 |
Glioma | NEAT1 | 26582084 | 12 |
Colorectal cancer (CRC) is one of the most common cancer types in western countries, and its morbidity increases with age [49]. Accumulating studies have shown that lncRNAs play important roles in several steps of carcinogenesis and cancer metastasis and are involved in various cancers, including CRC [50,51]. Therefore, we applied the NBCLDA to discover possible CRC-associated lncRNAs. As shown in Table 3, seven of the top 20 predicted lncRNAs have been validated as related to colorectal cancer by recent biological literature; five of them rank in the top 10 of the prioritized prediction results, and the other two are SNHG16 (ranked 12th) and TUG1 (ranked 18th). For example, Chen et al. indicated that the lncRNA XIST can regulate CRC development by competing for miR-200b-3p and may therefore be considered a biomarker for prognosis [52]. Additionally, it has been demonstrated that the lncRNA MALAT1 can fulfill a chemoresistant function in colorectal cancer and may therefore be considered a potential prognostic marker and therapeutic target for colorectal cancer patients [53]. Nakano et al. found that the epigenetic disruption and loss of imprinting of the lncRNA KCNQ1OT1 play a significant role in the occurrence of colorectal cancer [54]. Han et al. suggested that H19, given its role as a growth regulator, can be considered a candidate therapeutic biomarker and a new target for human CRC therapy [55].
Prostate cancer is the second most common cause of cancer-related mortality in males worldwide [56]. A growing number of studies show that lncRNAs have become promising targets for the treatment of cancers, including prostate cancer [57,58]. Hence, we applied the NBCLDA to uncover possible prostate cancer-associated lncRNAs; five of the top 20 predicted lncRNAs were verified by the relevant literature and are listed in Table 3. For example, Ren et al. evaluated the expression of MALAT1 in prostate cancer and showed that it may be considered a prospective therapeutic target for refractory prostate cancer [59]. Zhu et al. found that the lncRNA H19 and its derived miRNA H19-miR-675 were significantly downregulated in advanced prostate cancer and may be used for the diagnosis and treatment of advanced prostate cancer, because H19-miR-675 could act as a suppressor of prostate cancer metastasis [60]. Additionally, Tian et al. showed that targeting the lncRNA NEAT1 axis could be a potential way to improve the chemotherapy of prostate cancer [61].
Glioma is one of the most common malignant forms of brain tumors, and 6 out of 100,000 people may have gliomas [62]. Accumulating research has shown that lncRNAs play a significant role in the process of glioma development [63]. Therefore, we applied the NBCLDA to predict potential lncRNAs associated with glioma. Four of the top 20 predicted glioma-related lncRNAs were validated by recent literature on biological experiments, and the results are listed in Table 3. For example, the lncRNA MALAT1 plays an important role in the progression and therapy of glioma and may be considered an effective prognostic biomarker for the treatment of glioma [64]. Zhang et al. demonstrated that the lncRNA H19 is overexpressed in glioma tissue and cell lines and promotes the proliferation of glioma cells [65]. Furthermore, Li et al. suggested that the lncRNA TUG1 can promote the apoptosis of glioma cells and may act as a tumor suppressor in human glioma [66].
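The case-study procedure (rank candidate lncRNAs for one disease, keep the top 20, and check them against independent evidence) can be sketched as below. This is a hypothetical illustration: the function names, scores, and lncRNA identifiers are placeholders, and it assumes that lncRNAs already associated with the disease in the training data are excluded from the candidate list, which the text implies but does not state explicitly.

```python
def top_candidates(scores, known, k=20):
    """Rank lncRNAs for one disease by predicted score, skipping known associations."""
    candidates = [(lnc, s) for lnc, s in scores.items() if lnc not in known]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [lnc for lnc, _ in candidates[:k]]

def validated_ranks(ranked, validated):
    """Map each literature-validated lncRNA in the ranking to its 1-based rank."""
    return {lnc: rank for rank, lnc in enumerate(ranked, start=1)
            if lnc in validated}

# Placeholder scores for one disease; not real model output.
toy_scores = {"lncA": 0.9, "lncB": 0.8, "lncC": 0.7, "lncD": 0.6}
toy_known = {"lncA"}        # already in the training data
toy_validated = {"lncC"}    # confirmed by independent literature

ranking = top_candidates(toy_scores, toy_known, k=3)
print(ranking, validated_ranks(ranking, toy_validated))
```

The Rank column of Table 3 corresponds to the values returned by a function like `validated_ranks` when the ranking is built from the NBCLDA scores for each disease.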
5. Discussion
Accumulating studies have indicated that lncRNAs play crucial roles in biological processes, complex disease diagnoses, prognoses, and treatments. Computational models that predict novel lncRNA–disease associations by integrating various kinds of biological data have therefore attracted considerable attention, since they help to improve the understanding of disease mechanisms at the lncRNA level. In this paper, we construct a global tripartite network and a global quadruple network by integrating various biological information, and we propose a novel approach, the NBCLDA, that predicts potential lncRNA–disease associations by applying the naïve Bayesian classifier to the two constructed networks. Compared with current models, the NBCLDA does not entirely rely on known lncRNA–disease associations and can achieve a reliable performance with effective AUCs in the LOOCV framework. This means that our method can predict not only possible associations between lncRNAs and diseases that are included in the set of known associations, but also potential associations whose elements are not in the known data set.
To evaluate the predictive performance of our method, the LOOCV was implemented based on the experimentally verified lncRNA–disease associations obtained from the MNDR database. The simulation results show that the NBCLDA performs strongly and that its predictive accuracy is significantly improved by the addition of new types of nodes and of the disease semantic similarity. They also show that the NBCLDA can achieve better performance than three other state-of-the-art models, with more effective AUCs in the framework of the LOOCV. Moreover, in order to further evaluate the performance of the NBCLDA, case studies of colorectal cancer, prostate cancer, and glioma were implemented in this paper. These simulation results demonstrate that the NBCLDA can be an excellent tool for future biomedical research.
Despite the reliable experimental results of the NBCLDA, our method also has some limitations. For example, the known experimentally validated lncRNA–disease associations are still limited, so the prediction performance of the NBCLDA would be improved by a more comprehensive data set. Furthermore, the data sources in this paper need to be strictly preprocessed according to the proposed method, which restricts the richness of the data sources to a certain extent.
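For readers who wish to reproduce a LOOCV-style evaluation, the following sketch shows one common way to turn leave-one-out rankings into an AUC. It is not the NBCLDA implementation: `neighbor_score` is a deliberately simple stand-in for the naïve Bayesian score, all names and toy pairs are hypothetical, and the AUC is computed as the mean, over held-out associations, of the fraction of candidate pairs ranked below the held-out pair (ties count one half), which is assumed here as an equivalent of the ROC-based AUC when every fold has the same candidate set.

```python
def neighbor_score(training, pair):
    """Stand-in score: training associations sharing the lncRNA or the disease."""
    lnc, dis = pair
    return (sum(1 for l, _ in training if l == lnc)
            + sum(1 for _, d in training if d == dis))

def loocv_auc(known, candidates, score_fn):
    """Leave each known association out in turn and rank it against the candidates."""
    fold_aucs = []
    for held_out in known:
        training = [p for p in known if p != held_out]
        target = score_fn(training, held_out)
        wins = 0.0
        for cand in candidates:
            s = score_fn(training, cand)
            if target > s:
                wins += 1.0
            elif target == s:
                wins += 0.5
        fold_aucs.append(wins / len(candidates))
    return sum(fold_aucs) / len(fold_aucs)

# Toy data with placeholder names; candidates are pairs with no known association.
toy_known = [("lnc1", "dis1"), ("lnc1", "dis2"),
             ("lnc2", "dis1"), ("lnc2", "dis2")]
toy_candidates = [("lnc3", "dis3"), ("lnc1", "dis3")]

print(loocv_auc(toy_known, toy_candidates, neighbor_score))
```

Replacing `neighbor_score` with a scoring function that recomputes the network-based probabilities on each training split would give the corresponding evaluation for a model of the kind described in this paper.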
6. Conclusions
The main contributions of this paper and directions for future work can be summarized as follows: (1) we constructed a global tripartite network by integrating a variety of biological information including miRNA–disease, miRNA–lncRNA, and lncRNA–disease associations and interactions; (2) we constructed a global quadruple network by appending gene–lncRNA interaction, gene–disease association, and gene–miRNA interaction networks to the global tripartite network; (3) we developed a novel approach, NBCLDA, based on the naïve Bayesian classifier and applied it to the two global networks to predict potential lncRNA–disease associations; (4) we incorporated the disease semantic similarity into NBCLDA to further uncover the potential relationships between lncRNAs and diseases; (5) NBCLDA can predict not only possible associations between lncRNAs and diseases that are included in the set of known associations, but also potential associations whose elements are not in the known data set; (6) NBCLDA can integrate multiple heterogeneous biological data for discovering potential relationships between lncRNAs and diseases; (7) in future work, more biological data can be collected and pre-processed for use in the proposed method to predict potential lncRNA–disease associations.
Acknowledgments
The authors thank the anonymous referees for suggestions that helped improve the paper substantially.
Supplementary Materials
The following are available at http://www.mdpi.com/2073-4425/9/7/345/s1. Supplementary Table S1: the data set of 4704 known miRNA–disease associations collected from the HMDD database; Supplementary Table S2: the data set of 9086 known miRNA–lncRNA interactions collected from the starBase v2.0 database; Supplementary Table S3: the data set of 407 known lncRNA–disease associations downloaded from the MNDR v2.0 database; Supplementary Table S4: the data set of 3702 known gene–disease associations gathered from the DisGeNET v5.0 database; Supplementary Table S5: the data set of 411 known gene–lncRNA interactions downloaded from the LncACTdb database; Supplementary Table S6: the data set of 565 known gene–miRNA associations obtained from the miRecords database; Supplementary Table S7: the data set of 373 diseases with their disease tree numbers gathered from the MeSH database; Supplementary Table S8: the top 20 predicted lncRNAs related to the three diseases in the case studies. Supplementary Materials: Deep representation of the probabilistic scheme in our method.
Author Contributions
Conceptualization, J.Y. and L.W.; Methodology, J.Y., P.P. and L.W.; Validation, L.K., X.L. and Z.W.; Formal Analysis, J.Y. and L.W.; Investigation, L.K. and Z.W.; Resources, P.P. and Z.W.; Data Curation, J.Y. and P.P.; Writing—Original Draft Preparation, J.Y. and P.P.; Writing—Review and Editing, L.W. and X.L.; Supervision, L.W.; Project Administration, L.K. and X.L.; Funding Acquisition, L.W.
Funding
This research is partly sponsored by the Natural Science Foundation of Hunan Province (No. 2018JJ4058, No. 2017JJ5036), the National Natural Science Foundation of China (No. 61640210, No. 61672447), the project of “12th Five-Year” planning of Education Science in Hunan Province (No. XJK015BZY031), and the CERNET Next Generation Internet Technology Innovation Project (No. NGII20160305, No. NGII20170109).
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
References
- 1. Li Y., Zhang J., Pan J., Feng X., Duan P., Yin X., Xu Y., Wang X., Zou S. Insights into the roles of lncRNAs in skeletal and dental diseases. Cell Biosci. 2018;8:8. doi: 10.1186/s13578-018-0208-4.
- 2. Garitano-Trojaola A., Agirre X., Prósper F., Fortes P. Long non-coding RNAs in haematological malignancies. Int. J. Mol. Sci. 2013;14:15386. doi: 10.3390/ijms140815386.
- 3. Guttman M., Russell P., Ingolia N.T., Weissman J.S., Lander E.S.R. Ribosome Profiling Provides Evidence that Large Noncoding RNAs Do Not Encode Proteins. Cell. 2013;154:240–251. doi: 10.1016/j.cell.2013.06.009.
- 4. Guttman M., Rinn J.L. Modular regulatory principles of large non-coding RNAs. Nature. 2012;482:339–346. doi: 10.1038/nature10887.
- 5. Wang K.C., Chang H.Y. Molecular mechanisms of long noncoding RNAs. Mol. Cell. 2011;43:904–914. doi: 10.1016/j.molcel.2011.08.018.
- 6. Wapinski O., Chang H.Y. Long noncoding RNAs and human disease. Trends Cell Biol. 2011;21:354–361. doi: 10.1016/j.tcb.2011.04.001.
- 7. Derrien T., Johnson R., Bussotti G., Tanzer A., Djebali A., Tilgner H., Guernec G., Martin D., Merkel A., Knowles D. The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res. 2012;22:1775–1789. doi: 10.1101/gr.132159.111.
- 8. Zhao W., Luo J., Jiao S. Comprehensive characterization of cancer subtype associated long non-coding RNAs and their clinical implications. Sci. Rep. 2014;4:6591. doi: 10.1038/srep06591.
- 9. Cheetham S.W., Gruhl F., Mattick J.S., Dinger M.E. Long noncoding RNAs and the genetics of cancer. Br. J. Cancer. 2013;108:2419–2425. doi: 10.1038/bjc.2013.233.
- 10. Mercer T.R., Dinger M.E., Mattick J.S. Long non-coding RNAs: Insights into functions. Nat. Rev. Genet. 2009;10:155–159. doi: 10.1038/nrg2521.
- 11. Taft R.J., Pang K.C., Mercer T.R., Dinger M., Mattick J.S. Non-coding RNAs: Regulators of disease. J. Pathol. 2010;220:126–139. doi: 10.1002/path.2638.
- 12. Zhang Q., Chen C.Y., Yedavalli V.S., Jeang K.T. NEAT1 long noncoding RNA and paraspeckle bodies modulate HIV-1 posttranscriptional expression. MBio. 2013;4:e00596-12. doi: 10.1128/mBio.00596-12.
- 13. Pasmant E., Sabbagh A., Vidaud M., Biéche I. ANRIL, a long, noncoding RNA, is an unexpected major hotspot in GWAS. FASEB J. 2011;25:444–448. doi: 10.1096/fj.10-172452.
- 14. Faghihi M.A., Modarresi F., Khalil A.M., Wood D.E., Sahagan B.G., Morgan T.E., Finch C.E., Laurent G.S., Kenny P.J., Wahlestedt C. Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of b-secretase. Nat. Med. 2008;14:723–730. doi: 10.1038/nm1784.
- 15. Malih S., Saidijam M., Malih N. A brief review on long noncoding RNAs: a new paradigm in breast cancer pathogenesis, diagnosis and therapy. Tumor Biology. 2016;37:1479–1485. doi: 10.1007/s13277-015-4572-y.
- 16. Cui Z., Ren S., Lu J., Wang F., Xu W., Sun Y., Wei M., Chen J., Gao X., Xu C., Mao J.H., Sun Y. The prostate cancer-up-regulated long noncoding RNA PlncRNA-1 modulates apoptosis and proliferation through reciprocal regulation of androgen receptor. Urol. Oncol. 2013;31:1117–1123. doi: 10.1016/j.urolonc.2011.11.030.
- 17. Wang J., Liu X., Wu H., Ni P., Gu Z., Qiao Y., Chen N., Sun F., Fan Q. CREB up-regulates long noncoding RNA, HULC expression through interaction with microRNA-372 in liver cancer. Nucleic Acids Res. 2010;38:5366–5383. doi: 10.1093/nar/gkq285.
- 18. Ma Z., Xue S., Zeng B., Qiu D. lncRNA SNHG5 is associated with poor prognosis of bladder cancer and promotes bladder cancer cell proliferation through targeting p27. Oncol. Lett. 2018;15:1924–1930. doi: 10.3892/ol.2017.7527.
- 19. Spizzo R., Almeida M.I., Colombatti A., Calin G.A. Long non-coding RNAs and cancer: A new frontier of translational research? Oncogene. 2012;31:4577–4587. doi: 10.1038/onc.2011.621.
- 20. Gupta R.A., Shah N., Wang K.C., Kim J., Horlings H.M., Wong D.J., Tsai M.C., Hung T., Argani P., Rinn J.L. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464:1071–1076. doi: 10.1038/nature08975.
- 21. Gutschner T., Heammerle M., Eissmann M., Hsu J., Kim Y., Hung G., Revenko A., Arun G., Stentrup M., Gross M. The noncoding RNA MALAT1 is a critical regulator of the metastasis phenotype of lung cancer cells. Cancer Res. 2013;73:1180–1189. doi: 10.1158/0008-5472.CAN-12-2850.
- 22. Lottin S., Adriaenssens E., Berteaux N., Leprêtre A., Vilain M.O., Denhez E., Coll J., Dugimont T., Curgy J.J. The human H19 gene is frequently overexpressed in myometrium and stroma during pathological endometrial proliferative events. Eur. J. Cancer. 2005;41:168–177. doi: 10.1016/j.ejca.2004.09.025.
- 23. Sun J., Shi H., Wang Z., Zhang C., Liu L., Wang L., He W., Hao D., Liu S., Zhou M. Inferring novel lncRNA–disease associations based on a random walk model of a lncRNA functional similarity network. Mol. Biosyst. 2014;10:2074–2081. doi: 10.1039/C3MB70608G.
- 24. Ping P., Wang L., Kuang L., Ye S., Lqbal M.F.B. A Novel Method based on lncRNA–disease association Network for LncRNA–Disease Association Prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018. doi: 10.1109/TCBB.2018.2827373.
- 25. Yang X., Gao L., Guo X., Shi X., Wu H., Song F., Wang B. A network based method for analysis of lncRNA-disease associations and prediction of lncRNAs implicated in diseases. PLoS ONE. 2014;9:e87797. doi: 10.1371/journal.pone.0087797.
- 26. Ding L., Wang M., Sun D., Li A. TPGLDA: Novel prediction of associations between lncRNAs and diseases via lncRNA–disease-gene tripartite graph. Sci. Rep. 2018;8:1065. doi: 10.1038/s41598-018-19357-3.
- 27. Chen X. Predicting lncRNA–disease associations and constructing lncRNA functional similarity network based on the information of miRNA. Sci. Rep. 2015;5:13186. doi: 10.1038/srep13186.
- 28. Liu M.X., Chen X., Chen G., Cui Q.H., Yan G.Y. A computational framework to infer human disease-associated long noncoding RNAs. PLoS ONE. 2014;9:e84408. doi: 10.1371/journal.pone.0084408.
- 29. Li J.W., Cheng G., Wang Y.C., Gao C., Wang Y., Ma W., Tu J., Wang J., Chen Z., Kong W., Cui Q. A bioinformatics method for predicting long noncoding RNAs associated with vascular disease. Sci. China Life Sci. 2014;57:852–857. doi: 10.1007/s11427-014-4692-4.
- 30. Gu C., Liao B., Li X., Cai L., Li Z., Li K., Yang J. Global network random walk for predicting potential human lncRNA–disease associations. Sci. Rep. 2017;7:12442. doi: 10.1038/s41598-017-12763-z.
- 31. Gligorijević V., Pržulj N. Methods for biological data integration: perspectives and challenges. J. R. Soc. Interface. 2015;12:20150571. doi: 10.1098/rsif.2015.0571.
- 32. Zeng X., Ding N., Rodríguez-Patón A., Zou Q. Probability-based collaborative filtering model for predicting gene–disease associations. BMC Med. Genom. 2017;10(Suppl. 5):76. doi: 10.1186/s12920-017-0313-y.
- 33. Zhao H., Kuang L., Wang L., Ping P., Xuan Z., Pei T., Wu Z. Prediction of microRNA-disease associations based on distance correlation set. BMC Bioinform. 2018;19:141. doi: 10.1186/s12859-018-2146-x.
- 34. Zeng X., Zhang X., Zou Q. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Brief. Bioinform. 2016;17:193. doi: 10.1093/bib/bbv033.
- 35. Lu C., Yang M., Luo F., Wu F.X., Li M., Pan Y., Li Y., Wang J. Prediction of lncRNA–disease associations based on inductive matrix completion. Bioinformatics. 2018. doi: 10.1093/bioinformatics/bty327.
- 36. Zhang J., Zhang Z., Chen Z., Deng L. Integrating multiple heterogeneous networks for novel lncRNA-disease association inference. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017. doi: 10.1109/TCBB.2017.2701379.
- 37. Fu G., Wang J., Domeniconi C., Yu G. Matrix factorization based data fusion for the prediction of lncRNA–disease associations. Bioinformatics. 2017. doi: 10.1093/bioinformatics/btx794.
- 38. Chen X. KATZLDA: KATZ measure for the lncRNA–disease association prediction. Sci. Rep. 2015;5:16840. doi: 10.1038/srep16840.
- 39. Li Y., Qiu C., Tu J., Geng B., Yang J., Jiang T., Cui Q. HMDD v2.0: A database for experimentally supported human microRNA and disease associations. Nucleic Acids Res. 2014;42:D1070. doi: 10.1093/nar/gkt1023.
- 40. Li J.H., Liu S., Zhou H., Qu L.H., Yang J.H. starBase v2.0: Decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014;42:D92. doi: 10.1093/nar/gkt1248.
- 41. Cui T., Lin Z., Yan H., Yi Y., Tan P., Zhao Y., Hu Y., Xu L., Li E., Wang D. MNDR v2.0: An updated resource of ncRNA–disease associations in mammals. Nucleic Acids Res. 2018;46:D371–D374. doi: 10.1093/nar/gkx1025.
- 42. Piñero J., Àlex B., Queraltrosinach N., Gutierrez-Sacristan A., Deu-Pons J., Centeno E., Garcia-Garcia J., Sanz F., Furlong L.I. DisGeNET: A comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Res. 2017;45:D833–D839. doi: 10.1093/nar/gkw943.
- 43. Wang P., Ning S., Zhang Y., Li R., Ye J., Zhao Z., Zhi H., Wang T., Guo Z., Li X. Identification of lncRNA-associated competing triplets reveals global patterns and prognostic markers for cancer. Nucleic Acids Res. 2015;43:3478. doi: 10.1093/nar/gkv233.
- 44. Xiao F., Zuo Z., Cai G., Kang S., Gao X., Li T. miRecords: An integrated resource for microRNA-target interactions. Nucleic Acids Res. 2008;37:D105–D110. doi: 10.1093/nar/gkn851.
- 45. U.S. National Library of Medicine Medical Subject Headings 2018 [Internet] [(accessed on 6 July 2018)]; Available online: https://meshb.nlm.nih.gov/search.
- 46. Liu Z., Zhang Q., Lu L., Zhou T. Link prediction in complex networks: A local naïve Bayes model. Europhys. Lett. 2011;96:48007. doi: 10.1209/0295-5075/96/48007.
- 47. Wang D., Wang J., Lu M., Song F., Cui Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics. 2010;26:1644–1650. doi: 10.1093/bioinformatics/btq241.
- 48. Luo J., Ding P., Liang C., Cao B., Chen X. Collective prediction of disease-associated miRNAs based on transduction learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016;14:1468–1475. doi: 10.1109/TCBB.2016.2599866.
- 49. Berger F.G. Interview: Screening and treatment for colorectal cancer. Colorectal Cancer. 2013;2:117–120. doi: 10.2217/crc.13.12.
- 50. Prensner J.R., Chinnaiyan A.M. The emergence of lncRNAs in cancer biology. Cancer Discov. 2011;1:391–407. doi: 10.1158/2159-8290.CD-11-0209.
- 51. Gutschner T., Diederichs S. The hallmarks of cancer: A long non-coding RNA point of view. RNA Biol. 2012;9:703–719. doi: 10.4161/rna.20481.
- 52. Chen D.L., Chen L.Z., Lu Y.X., Zhang D.S., Zeng Z.L., Pan Z.Z., Huang P., Wang F.H., Li Y.H., Ju H.Q. Long noncoding RNA XIST expedites metastasis and modulates epithelial-mesenchymal transition in colorectal cancer. Cell Death Dis. 2017;8:e3011. doi: 10.1038/cddis.2017.421.
- 53. Li P., Zhang X., Wang H., Wang L., Liu T., Du L., Yang Y., Wang C. MALAT1 is associated with poor response to oxaliplatin-based chemotherapy in colorectal cancer patients and promotes chemoresistance through EZH2. Mol. Cancer Ther. 2017;16:739–751. doi: 10.1158/1535-7163.MCT-16-0591.
- 54. Nakano S., Murakami K., Meguro M., Soejima H., Higashimoto K., Urano T., Kugoh H., Mukai T., Ikeguchi M., Oshimura M. Expression profile of LIT1/KCNQ1OT1, and epigenetic status at the KvDMR1 in colorectal cancers. Cancer Sci. 2006;97:1147–1154. doi: 10.1111/j.1349-7006.2006.00305.x.
- 55. Han D., Gao X., Wang M., Qiao Y., Xu Y., Yang J., Dong N., He J., Sun Q., Lv G. Long noncoding RNA H19 indicates a poor prognosis of colorectal cancer and promotes tumor growth by recruiting and binding to eIF4A3. Oncotarget. 2016;7:22159–22173. doi: 10.18632/oncotarget.8063.
- 56. Weiss M., Plass C., Gerhauser C. Role of lncRNAs in prostate cancer development and progression. Biol. Chem. 2014;395:1275–1290. doi: 10.1515/hsz-2014-0201.
- 57. Yang G., Lu X., Yuan L. LncRNA: A link between RNA and cancer. Biochim. Biophys. Acta. 2014;1839:1097–1109. doi: 10.1016/j.bbagrm.2014.08.012.
- 58. Chakravarty D., Sboner A., Nair S.S., Giannopoulou E., Li R., Hennig S., Mosquera J.M., Pauwels J., Park K., Kossai M. The oestrogen receptor alpha-regulated lncRNA NEAT1 is a critical modulator of prostate cancer. Nat. Commun. 2014;5:5383. doi: 10.1038/ncomms6383.
- 59. Ren S., Liu Y., Xu W., Sun Y., Lu J., Wang F., Wei M., Shen J., Hou J., Gao X., Xu C. Long noncoding RNA MALAT-1 is a new potential therapeutic target for castration resistant prostate cancer. J. Urol. 2013;190:2278–2287. doi: 10.1016/j.juro.2013.07.001.
- 60. Zhu M., Chen Q., Liu X., Sun Q., Zhao X., Deng R., Wang Y., Huang J., Xu M., Yan J., Yu J. lncRNA H19/miR-675 axis represses prostate cancer metastasis by targeting TGFBI. FEBS J. 2014;281:3766–3775. doi: 10.1111/febs.12902.
- 61. Tian X., Zhang G., Zhao H., Li Y., Zhu C. Long non-coding RNA NEAT1 contributes to docetaxel resistance of prostate cancer through inducing RET expression by sponging miR-34a. RSC Adv. 2017;7:42986–42996. doi: 10.1039/C7RA06107B.
- 62. Boele F.W., Rooney A.G., Grant R., Klein M. Psychiatric symptoms in glioma patients: from diagnosis to management. Neuropsychiatr. Dis. Treat. 2015;11:1413–1420. doi: 10.2147/NDT.S65874.
- 63. Zhou Q., Liu J., Quan J., Liu W., Tan H., Li W. lncRNAs as potential molecular biomarkers for the clinicopathology and prognosis of glioma: A systematic review and meta-analysis. Gene. 2018. doi: 10.1016/j.gene.2018.05.054.
- 64. Ma K.X., Wang H.J., Li X.R., Li T., Su G., Yang P., Wu J.W. Long noncoding RNA MALAT1 associates with the malignant status and poor prognosis in glioma. Tumour Biol. J. Int. Soc. Oncodev. Biol. Med. 2015;36:3355–3359. doi: 10.1007/s13277-014-2969-7.
- 65. Zhang T., Wang Y.R., Zeng F., Cao H.Y., Zhou H.D., Wang Y.J. LncRNA H19 is overexpressed in glioma tissue, is negatively associated with patient survival, and promotes tumor growth through its derivative miR-675. Eur. Rev. Med. Pharmacol. Sci. 2016;20:4891–4897.
- 66. Li J., Zhang M., An G., Ma Q. LncRNA TUG1 acts as a tumor suppressor in human glioma by promoting cell apoptosis. Exp. Biol. Med. 2016;241:644. doi: 10.1177/1535370215622708.