Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model

Ao Li; Mengqu Ge; Yao Zhang; Chen Peng; Minghui Wang

doi:10.1155/2015/671950

. 2015 Dec 29;2015:671950. doi: 10.1155/2015/671950

Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model

Ao Li ^1,^2,^*, Mengqu Ge ¹, Yao Zhang ¹, Chen Peng ¹, Minghui Wang ^1,²

PMCID: PMC4709602 PMID: 26839884

Abstract

Recent study shows that long noncoding RNAs (lncRNAs) are participating in diverse biological processes and complex diseases. However, at present the functions of lncRNAs are still rarely known. In this study, we propose a network-based computational method, which is called lncRNA-protein interaction prediction based on Heterogeneous Network Model (LPIHN), to predict the potential lncRNA-protein interactions. First, we construct a heterogeneous network by integrating the lncRNA-lncRNA similarity network, lncRNA-protein interaction network, and protein-protein interaction (PPI) network. Then, a random walk with restart is implemented on the heterogeneous network to infer novel lncRNA-protein interactions. The leave-one-out cross validation test shows that our approach can achieve an AUC value of 96.0%. Some lncRNA-protein interactions predicted by our method have been confirmed in recent research or database, indicating the efficiency of LPIHN to predict novel lncRNA-protein interactions.

1. Introduction

Long noncoding RNAs (lncRNAs), a class of important non-protein coding transcripts with lengths more than 200 nucleotides [1], have gained wide attention recently, and a large number of lncRNAs have been discovered by analysis of chromatin-state maps [2] and full-length complementary DNA (cDNA) [3] based on RNA-seq data [4]. Recent researches show that lncRNAs play critical roles in complex cellular processes, such as epigenetic regulation of gene expression [5–9], chromatin modification [10], and cell differentiation. Moreover, studies show that a number of lncRNAs are implicated in a range of human diseases [11–13]. Hence, uncovering the functions of lncRNAs is of great importance in understanding the mechanisms of biological processes.

Generally, almost all of the lncRNAs function through interactions with corresponding RNA binding proteins [14–16]. In turn, RNA binding proteins can interact with different lncRNAs to regulate diverse cellular processes [17, 18]. Thus, identifying the potential lncRNA-protein interactions is critical to understand the functions of lncRNAs. Since experimental detection of unknown lncRNA-protein interactions is time consuming and costly, some computational approaches have been proposed for lncRNA-protein interaction prediction. In 2011, CatRAPID was developed by Bellucci et al. [5], in which lncRNA-protein pairs are encoded into feature vectors and scored by using matrix computation. In the same year, a method named RPIseq was introduced by Muppirala et al. [19] using random forest (RF) and support vector machines (SVM) classifiers to predict lncRNA-protein interaction and RPIseq only uses the sequence information of lncRNAs and proteins. In 2013, Lu et al. [20] introduced a method named lncPro, which predicts lncRNA-protein interactions by using scores yielded by amino acid and nucleotide sequences and Fisher's linear discriminant method.

In this paper, we introduce a network-based method, lncRNA-protein interaction prediction based on Heterogeneous Network Model (LPIHN), to predict the interactions between lncRNAs and proteins. First, we construct a heterogeneous network with the use of protein-protein interaction (PPI), lncRNAs expression similarity, and known lncRNA-protein interactions. Then, a random walk with restart is implemented on the heterogeneous network to infer novel lncRNA-protein interactions. We compare the performance with two network-based methods including PRIoritizatioN and Complex Elucidation (PRINCE) [21] and the random walk based method (RWR) [22]. In the leave-one-out cross validation (LOOCV) test we implement, LPIHN outperforms PRINCE and RWR by a significant margin. Moreover, we identify several lncRNA-protein interactions that are supported by evidence in recent literature or database, which shows the practical value of our method.

2. Materials and Methods

2.1. lncRNA-Protein Interactions

The development of bioinformatics and experimental technologies has made the global lncRNA-protein interaction network available. NPinter (http://www.bioinfo.org/NPInter/) is the up-to-data database that has collected experimentally validated interactions between noncoding RNAs (ncRNAs) and other biomoleculars [23]. The research done by Shang et al. [24] has extracted lncRNA-protein interactions from NPinter and made detailed and comprehensive analysis about the lncRNA-protein network.

In this paper, we download known ncRNA-protein interaction dataset from Npinter 2.0 database in November 2013 and then filter the ncRNAs and their interaction proteins, by restricting the organism and the type of ncRNAs to “Homo sapiens” and “NONCODE,” respectively. Then we further select the lncRNAs from these ncRNAs according to human lncRNA dataset from NONCODE 4.0 database [25] and map the lncRNA ID and protein ID into NONCODE 4.0 ID and string ID separately. I is defined as the adjacency matrix of lncRNA-protein interactions, in which I(i, j) is 1 if there is an interaction between protein i and lncRNA j, otherwise 0.

2.2. lncRNA Expression Similarity

The lncRNA expression profiles are obtained from NONCODE 4.0 database, including the expression profiles of 89,369 lncRNA in 24 human tissues or cell types. Then Pearson correlation coefficient (PCC) [26–31] between the expression profiles of each pair of lncRNAs is calculated as the lncRNA expression similarity. We define X = {x ₁, x ₂,…, x ₂₄} and Y = {y ₁, y ₂,…, y ₂₄} as two expression profiles of lncRNA i and j, respectively, which contain expression value of 24 human tissues or cell types. The expression similarity matrix of the lncRNAs SL can be calculated as

\begin{matrix} SL (i, j) = |\frac{cov (X, Y)}{σ_{X} σ_{Y}}|, \end{matrix}

(1)

where SL(i, j) in row i and column j represents the absolute value of PCC between lncRNA i and j, cov(X, Y) is the covariance of X and Y, and σ _X and σ _Y are the standard deviation of X and Y, respectively. Calculate PCC between the expression profiles of each pair of nodes which is widely used in bioinformatics research. Hence, the similarity calculated based on the expression data of lncRNA can obtain reliable performance.

2.3. Protein-Protein Interactions

We obtain PPI data from STRING 9.1 database [32], which contains weighted protein interactions derived from computational prediction methods, high-throughput experiments, and text mining. Then, we remove the redundant PPI data, resulting in 804 PPI data and corresponding interaction scores according to the known lncRNA-protein dataset, and all PPI pairs are treated as identically reliable. The symmetric matrix SP is defined as the interaction matrix, in which SP_ij is the interaction score of vertices i and j. Formally, define a diagonal matrix M, in which M(j, j) is the sum of row j of SP; the normalization of SP is defined by the following function:

\begin{matrix} {SP}_{i j}^{'} = \frac{{SP}_{i j}}{\sqrt{M (i, i) M (j, j)}}, \end{matrix}

(2)

where SP′ is a normalized form of SP.

2.4. The Heterogeneous Network

G ₁(L, E ₁, SL) is defined as the lncRNA-lncRNA similarity network, in which L = {l ₁, l ₂,…, l _n} represents the set of n lncRNAs, E ₁ = {e ₁, e ₂,…, e _k} represents sets of edges between vertices, l _i and l _j are connected if the similarity SL_ij calculated by PCC between l _i and l _j is more than 0. The PPI network G ₂(P, E ₂, SP′) can be constructed analogously, and vertices set P = {p ₁, p ₂,…, p _m} represents the set of m proteins. E ₂ represents sets of edges between proteins; p _i and p _j will be connected if the normalized interaction score SP_ij′ between vertices p _i and p _j is more than 0. In the lncRNA-protein network, p _i and l _j are connected if I(i, j) is 1. lncRNA-protein heterogeneous network is constructed by connecting the aforementioned lncRNA-lncRNA similarity network and PPI network together with lncRNA-protein interaction network (Figure 1(b)). Then, a random walk with restart will be implemented on the network.

A simple example of the procedure of predicting lncRNA-protein interactions with LPIHN. (a) The lncRNA-lncRNA similarity matrix is calculated by using the expression profiles of lncRNAs to calculate the PCC of each pair of lncRNAs. The profile of known lncRNA-protein interactions is obtained where the value of p _i and l _j is 1 if there exists interaction between lncRNA l _j and protein p _i, otherwise 0. The PPI profile is obtained based on the normalized score of PPI. (b) The upper purple network is the lncRNA-lncRNA similarity network, the lower red network is the PPI network, and both of them are constructed based on the corresponding profile in (a). The heterogeneous network is constructed by connecting the lncRNA-lncRNA similarity network and PPI network together with the known lncRNA-protein interaction network. Purple triangles indicate lncRNAs, red circles proteins, purple edges lncRNA-lncRNA similarities, red edges protein-protein interactions, and black dotted edges known lncRNA-protein interactions. (c) Our method assigns a score to each of the candidate proteins of a query lncRNA, with the random walk with restart implemented on the heterogeneous network. The candidate proteins are ranked based on the score.

2.5. LPIHN Method

LPIHN is proposed to score proteins for each lncRNA by implementing random walk with restart on the heterogeneous network, based on the assumption that similar lncRNAs tend to exhibit similar interaction patterns with proteins. The procedure of random walk with restart is that an iterative walker starts at a source node with an initial probability and transits to a randomly selected direct neighbor; in the process of random walking, the walker can restart at source node with some probability in every time step. Hence, when implementing the random walk on the heterogeneous network, the initial probability, transition matrix, and restart probability should be determined based on the information supplied by the heterogeneous network. In the procedure of predicting the potential proteins for lncRNA l _i, let Y ₀ represent the initial probability of walker starting at each node, where l _i and the proteins that are known to interact with l _i are assigned positive values and the remaining nodes are assigned zero. This assignment suggests that the random walker starts at l _i or the proteins interact with l _i. Let Y _t represent the relevance of l _i to all other nodes, in which the jth element indicating the probability of the random walker is found at node j at step t. Y _t+1 can be decided by the following iterative equation:

\begin{matrix} Y_{t + 1} = (1 - δ) W^{T} Y_{t} + δ Y_{0}, \end{matrix}

(3)

where δ ∈ (0,1) represents the restart probability of random walk. W is the transition matrix and Y ₀ is the initial probability of the random walk. All of them are detailed later.

Given a query lncRNA l _i, l _i is the seed node in the lncRNA network, the probability of vertex l _i is 1, and other vertices in the lncRNA network are assigned 0, forming the initial probability of lncRNA network v ₀. If protein p _j interacts with lncRNA l _i, then p _j is the seed node in the protein network. The initial probability vector of protein network u ₀ is formed by assigning equal probabilities to the protein seed nodes, under the condition that the sum is equal to 1. For the heterogeneous network, the initial probability is

\begin{matrix} Y_{0} = [\begin{matrix} (1 - β) u_{0} \\ β v_{0} \end{matrix}] . \end{matrix}

(4)

We use the parameter β ∈ (0,1) to weight the importance of lncRNA network and protein network. If β = 0.5, lncRNA-lncRNA similarity network and PPI network are equally weighted. If β < 0.5, the random walk tends to return to the protein network.

In order to implement random walk on the heterogeneous network, the transition matrix W must be defined. We define $W = [\begin{matrix} W_{P} & W_{P L} \\ W_{L P} & W_{L} \end{matrix}]$ as the transition matrix, where W _P and W _L are the subnetwork transition matrix showing the probability of the random walker transiting from one protein (lncRNA) to another protein (lncRNA) in the process of random walk. W _PL indicates the probability of the random walker transiting from protein network to lncRNA network and W _LP indicates the movement from lncRNA network to protein network. In the process of transition, we define γ as the probability of random walker transiting from protein network to lncRNA network and vice versa. W is defined as follows.

The probability of the random walker transiting from protein p _i to p _j is defined as

\begin{matrix} W_{P} (i, j) p (p_{j} ∣ p_{i}) \\ = \{\begin{matrix} \frac{{SP}^{'} (i, j)}{\sum_{j} {SP}^{'} (i, j)} & if \sum_{k} I (i, k) = 0 \\ \frac{(1 - γ) {SP}^{'} (i, j)}{\sum_{j} {SP}^{'} (i, j)} & otherwise . \end{matrix} \end{matrix}

(5)

∑_k I(i, k) = 0 means that p _i only connects to proteins, and the walker can only transit randomly to the direct neighbor protein in the PPI network next step. Otherwise, the walker can transit to the lncRNA-lncRNA network from p _i with probability γ; under that condition, the probability of p _i transiting to p _j should multiply 1 − γ.

Analogously, the probability from lncRNA l _i to l _j can be defined as

\begin{matrix} W_{L} (i, j) p (l_{j} ∣ l_{i}) \\ = \{\begin{matrix} \frac{SL (i, j)}{\sum_{j} SL (i, j)} & if \sum_{k} I (k, i) = 0 \\ \frac{(1 - γ) SL (i, j)}{\sum_{j} SL (i, j)} & otherwise . \end{matrix} \end{matrix}

(6)

The probability from protein p _i to lncRNA l _j is defined as

\begin{matrix} W_{PL} (i, j) p (l_{j} ∣ p_{i}) \\ = \{\begin{matrix} \frac{γ I (i, j)}{\sum_{j} I (i, j)} & if \sum_{k} I (i, k) \neq 0 \\ 0 & otherwise . \end{matrix} \end{matrix}

(7)

∑_k I(i, k) ≠ 0 means that p _i connects to at least one lncRNA, and the walker can transit to lncRNA-lncRNA network from p _i with probability γ; under that condition, we can further calculate the probability of p _i transiting to l _j. Otherwise, the probability of p _i transiting to l _j is 0.

The probability from lncRNA l _i to protein p _j can be defined in a similar manner as

\begin{matrix} W_{LP} (i, j) p (p_{j} ∣ l_{i}) \\ = \{\begin{matrix} \frac{γ I (j, i)}{\sum_{j} I (i, j)} & if \sum_{k} I (k, i) \neq 0 \\ 0 & otherwise . \end{matrix} \end{matrix}

(8)

As the initial probability Y ₀ and the transition matrix W are defined, the random walk with restart can be implemented on the heterogeneous network. After several iterations, the change between Y _t and Y _t+1 is less than 10⁻¹⁰, indicating that a stable probability $Y_{\infty} = [\begin{matrix} (1 - β) u_{\infty} \\ β v_{\infty} \end{matrix}]$ is obtained.

2.6. Leave-One-Out Cross Validation Test

We implement a LOOCV procedure to test the performance of LPIHN. With each cross validation trial, each known lncRNA-protein interaction is used as test data and the rest are taken as training dataset. Then the method is evaluated by successfully reconstructing the hidden interaction.

ROC curves are used to evaluate the performance of the method; for a rank threshold s, sensitivity (Sn) and specificity (Sp) are defined as follows:

\begin{matrix} Sn = \frac{TP}{TP + FN}, \\ Sp = \frac{TN}{TN + FP} . \end{matrix}

(9)

TN and TP represent the number of negative sites and positive sites that are correctly predicted. FN and FP represent the number of positive sites and negative sites that are wrongly predicted. We plot Sn versus 1 − Sp at different thresholds separating the prediction [33], which is the ROC curve. We calculate the AUC, which is the area under the ROC curve. Meanwhile, some common used measurements, namely, accuracy (Acc), precision (Pre), and Matthew's correlation coefficient (MCC), are calculated as follows:

\begin{matrix} Acc = \frac{TN + TP}{TN + TP + FN + FP}, \\ Pre = \frac{TP}{TP + FP}, \\ MCC \\ = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN) \times (TP + FP) \times (TN + FN) \times (TN + FP)}} . \end{matrix}

(10)

We also use the precision versus recall and fold enrichment to measure the performance. For lncRNA l _i, the top k ranked proteins are considered to interact with l _i in our method. Precision means the fraction of true lncRNA-protein interactions that ranked among the top k in the procedure of cross validation. Recall means the fraction of hidden interaction is reconstructed that ranked within top k. In this paper, another measure for the evaluation of the method is fold enrichment. For a query lncRNA, the number of its candidate proteins is defined as N, the test protein is ranked n in the candidate protein set, and the fold enrichment can be calculated by the following formula: fold enrichment = N/2/n, and here we use the average fold enrichment of all test data for assessment.

3. Results

3.1. Comparison with Other Network-Based Methods

We compare the performance of LPIHN with other two network-based methods as follows: PRINCE [21] and RWR [22]. In RWR method, for one lncRNA, at least two proteins are required to perform LOOCV. Therefore, we only consider lncRNAs that are interacting with at least two proteins. After the preprocessing, we obtain 1,113 lncRNAs and 96 proteins. And 4,870 lncRNA-protein interactions are regarded as gold-standard dataset to be used in cross validation. Then, LOOCV is implemented to evaluate the performance of these methods. According to previous research [34], we set β = 0.5, γ = 0.5 here and fix δ to 0.3, as it has been reported that the restart probability δ has a very slight effect on the result [22, 35].

The ROC curves of LPIHN, PRINCE, and RWR are plotted in Figure 2, which clearly shows that the ROC curve of LPIHN is consistently above the other two methods. From Table 1, we can see that LPIHN achieves an AUC of 96.0%. The result is higher than PRINCE and RWR, which achieves AUC of 90.6% and 88.1%, respectively. This phenomenon indicates that the performance of LPIHN is better than PRINCE and RWR. To further evaluate that the prediction obtained by our method is not generated by chance, we perform the LOOCV test on random lncRNA-protein interaction network. The lncRNA-protein interaction network is randomized for 1000 times, which means we select seed proteins randomly for each lncRNA. The AUC value of randomization process is 53.0%, which is much lower than AUCs of other three methods. This indicates that our method can discover potential lncRNA-protein interactions. Besides AUC value, we also compare the Sn and Sp of these methods (Table 1). When the value of Sp is 99.0%, LPIHN achieves a Sn of 35.0%, which is 30.0% and 20.9% higher than other two methods, respectively. When the value of Sp decreases to 90.0%, the Sn value of LPIHN increases to 91.4%, which is 36.3% and 32.2% higher than PRINCE and RWR, respectively. Moreover, we download the update ncRNA-protein dataset (Npinter 3.0) and extract lncRNA-protein interactions according to the 1,113 lncRNAs from Npinter 2.0 dataset. The number of known lncRNA-protein interactions is increased from 4870 to 10232. The ROC curve and AUC value LPIHN, PRINCE, and RWR on the new dataset are displayed in Figure S1 (see Supplementary Material available online at http://dx.doi.org/10.1155/2015/671950), which indicates the same good performance of our method.

ROC curves of lncRNA-protein interaction predictions by different methods. The red, blue, and green curves are the ROC curves of LPIHN, PRINCE, and RWR, respectively. The red dotted line represents the ROC curve of LPIHN over randomized lncRNA-protein network. The largest area under the curve (AUC) indicates the best performance of potential lncRNA-protein interaction prediction.

Table 1.

Comparison of three different methods in terms of AUC, Sn and Sp.

	LPIHN	PRINCE	RWR
AUC	96.0%	90.6%	88.1%
Sn	35.0%	5.0%	14.1%
Sp	99.0%
Sn	73.1%	26.7%	35.3%
Sp	95.0%
Sn	91.4%	55.1%	59.2%
Sp	90.0%

Open in a new tab

In addition, the numbers of retrieved lncRNA-protein interactions in different percentiles are shown in Figure 3, in which the top-ranked reconstructions are especially important because of the lower number of false positives. The result shows that among the top 2% true lncRNA-protein interactions, 802 interactions are predicted successfully based on LPIHN. However, only 229 and 192 interactions are among the top 2% predictions based on PRINCE and RWR, respectively. Besides, LPIHN also achieves a higher number in all the other percentiles than PRINCE and RWR in this comparison.

The number of correctly retrieved lncRNA-protein interactions out of total 4,870 true interactions for different percentiles. The red, blue, and green bars represent LPIHN, PRINCE, and RWR, respectively.

The curves of precision and recall of LPIHN, PRINCE, and RWR with the varying threshold 1 ≤ k ≤ 96 are shown in Figure 4(a), which shows that LPIHN can achieve the highest precision of 72.1%, while PRINCE and RWR methods achieve lower results with 20.6% and 17.3%, respectively. Meanwhile, compared with PRINCE and RWR, the LPIHN method achieves a higher precision at every recall value. Moreover, the comparison of these methods in terms of average fold enrichment is shown in Figure 4(b). For all of the 96 proteins, LPIHN achieves an average enrichment score of 18.9, which is 10.6% and 10.1% compared to PRINCE and RWR, respectively.

Comparison of different methods in terms of precision versus recall and fold enrichment. (a) Precision versus recall of three different methods when considering the top k proteins for different values of threshold k. The red, blue, and green lines represent LPIHN, PRINCE, and RWR, respectively. Comparison between LPIHN, PRINCE, and RWR in terms of precision versus recall shows the performance advantage of our method. (b) Enrichment analysis for the heterogeneous network is shown; the comparison of LPIHN, PRINCE, and RWR in terms of fold enrichment shows that our method is outperforming the other two methods.

To further evaluate the performance of LPIHN, we implement case studies for two lncRNAs including NONHSAT010657 (HNRNPU-AS1) and NONHSAT022127 (MALAT1), which are related to 12 and 24 lncRNA-protein interactions, respectively. The comparison between LPIHN, PRINCE, and RWR in terms of Sn, Acc, Pre, and MCC is shown in Figure 5, which indicates that LPIHN achieves better performance than PRINCE and RWR. In particular, when Sp is 99.0%, for lncRNA HNRNPU-AS1, the Sn, Acc, Pre, and MCC values of LPIHN are increased by 16.7%, 2.1%, 25.0%, and 22.9% when compared with PRINCE, and 7.3%, 1.1%, 8.3%, and 10% when compared with RWR, respectively. For lncRNA MALAT1, the Sn, Acc, Pre, and MCC values of LPIHN are increased by 20.9%, 5.3%, 1.7%, and 25% when compared with PRINCE and 45.9%, 11.5%, 1.7%, and 35.4% when compared with RWR, respectively. Moreover, we reconstruct the interaction network of lncRNA HNRNPU-AS1 by using the prediction data of these three methods (Figure 6). Among the 12 true lncRNA-protein interactions of lncRNA HNRNPU-AS1, LPIHN successfully reconstructs 9 interactions, while PRINCE and RWR retrieve lower interactions of 7 and 6, respectively.

Comparison of three different methods on lncRNA HNRNPU-AS1 and MALAT1 in terms of Sn, Acc, Pre, and MCC. The x-axis represents sensitivity, accuracy, precision, and Matthew correlation coefficient, respectively. The left part is at Sp of 99.0% and the right part is at Sp of 95.0%.

The network of lncRNA HNRNPU-AS1 and the network reconstructed by using LPIHN, PRINCE, and RWR. (a) Known lncRNA-protein interaction network of lncRNA HNRNPU-AS1. (b) Network reconstructed by using LPIHN: solid line indicates known interactions that are correctly predicted and red dotted line new interactions that are not included in known lncRNA-protein interactions. (c) Network reconstructed by using PRINCE. (d) Network reconstructed based on RWR.

To verify the effect of the number of interactions on the performance of the proposed method, we group the lncRNAs into four equal intervals according to the different number of interactions. Then, AUC values of different intervals are plotted in Figure S2. The result shows that the more the proteins that interact with a query lncRNA are, the better the performance the proposed method can achieve.

3.2. Comparison with Existing Methods

We also evaluate the performance of LPIHN on lncRNA HNRNPU-AS1 and MALAT1 with existing methods: lncpro and RPIseq. RPIseq yields two types of scores based on support vector machine (SVM) and random forest (RF), respectively. ROC curves and AUC values of these methods are shown in Figure 7. It is obvious that the ROC curve of LPIHN is consistently above the other methods on both HNRNPU-AS1 and MALAT1. For lncRNA HNRNPU-AS1, the AUC value of LPIHN is 34.8%, 59.9%, and 39.4% higher than lncpro, RPIseq-RF, and RPIseq-SVM, respectively. The AUC value of LPIHN is 30.6%, 41.9%, and 35.2% higher than lncpro, RPIseq-RF, and RPIseq-SVM on MALAT1, respectively. All the evaluations above show that LPIHN outperforms the other two network-based methods and existing methods, which indicates that LPIHN is a powerful method to predict the interactions between lncRNAs and proteins.

ROC curves and AUC values on lncRNA HNRNPU-AS1 and MALAT1. (a) The ROC curves and AUC values of LPIHN, lncpro, RPIseq-RF, and RPIseq-SVM on lncRNA HNRNPU-AS1. (b) The ROC curves and AUC values on lncRNA MALAT1 by LPIHN, lncpro, RPIseq-RF, and RPIseq-SVM.

3.3. Case Studies

The proposed method is able to predict novel lncRNA-protein interactions for the query lncRNA. For each lncRNA, the proteins ranked within top 10 (this is a user-defined threshold) are considered as the potential proteins interacting with the query lncRNA. To further evaluate the efficiency of LPIHN to predict novel lncRNA-protein interactions, we present case studies of five lncRNAs, including NONHSAT137627 (FTX), HNRNPU-AS1, MALAT1, NONHSAT004412 (RP4-665J23.1), and NONHSAT016118 (RP11-18I14.10). Figure 8 shows the predicted network for these lncRNAs, where the known lncRNA-protein interactions and top 5 ranked predictions are displayed. For lncRNA FTX, HNRNPU-AS1, and MALAT1, the top 10 predictive proteins are listed in Table 2 (Table S1 for lncRNA RP4-665J23.1 and RP11-18I14.10). In the prediction result, 9606.ENSP00000258729 (IGF2BP3), 9606.ENSP00000254108 (FUS), 9606.ENSP00000371634 (IGF2BP2), 9606.ENSP00000401371 (TIA1), and 9606.ENSP00000258962 (SFRS1) are predicted to interact with FTX. 9606.ENSP00000290341 (IGF2BP1) and SFRS1 are predicted to interact with HNRNPU-AS1. FUS are predicted to interact with RP4-665J23.1. 9606.ENSP00000349428 (PTB) and FUS are predicted to interact with RP11-18I14.10. The predictions above are all confirmed by starBase, a database for known protein-RNA and miRNA-target interactions [36]. In our prediction result, 9606.ENSP00000283179 (HNRNPU) is predicted to interact with MALAT1, which is confirmed by the research done by Xiao et al. [37]. Moreover, the top 3 ranked proteins of lncRNA FTX in our study are IGF2BP3, FUS, and IGF2BP2. For the above predictions confirmed by evidence in research or database, we compare their ranks by LPIHN, PRINCE, and RWR (Table S2), which shows that LPIHN achieves a higher rank of almost every candidate protein. This further indicates the efficiency of our method to predict novel proteins for lncRNAs.

Case study results on lncRNA-protein interaction predictions. Purple and red nodes indicate lncRNAs and proteins, respectively, blue edges known interactions, and red dotted edges newly predicted interactions with 5 highest scores.

Table 2.

The top 10 ranked proteins for lncRNA FTX, HNRNPU-AS1 and MALAT1.

Gene	String ID	Rank	Gene	String ID	Rank
FTX (NONCODE ID: NONHSAT137627)
IGF2BP3	9606.ENSP00000258729	1	TIA1	9606.ENSP00000401371	6
FUS	9606.ENSP00000254108	2	SFRS1	9606.ENSP00000258962	7
IGF2BP2	9606.ENSP00000371634	3	RBFOX2	9606.ENSP00000413035	8
TARDBP	9606.ENSP00000240185	4	EIF2C1	9606.ENSP00000362300	9
EIF2C2	9606.ENSP00000220592	5	QKI	9606.ENSP00000354951	10

HNRNPU-AS1 (NONCODE ID: NONHSAT010657)
IGF2BP1	9606.ENSP00000290341	1	MOV10	9606.ENSP00000350028	6
TARDBP	9606.ENSP00000240185	2	IGF2	9606.ENSP00000338297	7
SFRS1	9606.ENSP00000258962	3	HNRNPU	9606.ENSP00000283179	8
TNRC6B	9606.ENSP00000338371	4	STAU1	9606.ENSP00000360922	9
TNRC6A	9606.ENSP00000379144	5	SFPQ	9606.ENSP00000349748	10

MALAT1 (NONCODE ID: NONHSAT022127)
HNRNPU	9606.ENSP00000283179	1	CDK9	9606.ENSP00000362361	6
RBFOX2	9606.ENSP00000413035	2	CTCF	9606.ENSP00000264010	7
IGF2	9606.ENSP00000338297	3	NXF1	9606.ENSP00000294172	8
STAU1	9606.ENSP00000360922	4	LRRK2	9606.ENSP00000298910	9
MEPCE	9606.ENSP00000308546	5	SSB	9606.ENSP00000260956	10

Open in a new tab

4. Conclusion

With the development of the research of lncRNA, computational methods have been published for the predictions of lncRNA-protein interactions. In this paper, we introduce a network-based method LPIHN to predict the proteins interacting with lncRNAs. First, a heterogeneous network is constructed by connecting PPI and lncRNA-lncRNA similarity network using known lncRNA-protein interactions. Then, an iteratively random walk is implemented on the heterogeneous network, which can score proteins for each lncRNA. Finally, LOOCV is implemented to evaluate the performance of our method. The results show that LPIHN obtains an AUC of 96.0%, which is much higher than PRINCE and RWR. Moreover, when focusing on the top 2% (4870) predicted lncRNA-protein interactions, LPIHN successfully reconstructs 802 interactions, while PRINCE and RWR retrieve much lower interactions of 229 and 192, respectively. Meanwhile, the other measures also show that LPIHN algorithm outperforms PRINCE and RWR method, which propagate information only in protein network. We also demonstrate the efficiency of LPIHN to predict novel lncRNA-protein interactions; some top-ranked lncRNA-protein interactions predicted by our method are supported by existing literature or database. The good performance and the practical value show that our approach is a promising way to predict potential lncRNA-protein interactions.

While the results are promising, the LPIHN method shows some limitations. Firstly, we test our method only on one database (i.e., NPinter 2.0). From the known lncRNA-protein interaction dataset, we observe that each lncRNA interacts with about 4.37 proteins on average. Due to the relative sparsity of the known lncRNA-protein interactions, the network-based method may produce biased predictions. This situation can be improved by the increase of comprehensive lncRNA-protein interactions datasets. Secondly, skewed degree distribution of the network may affect the result of our prediction; adding some appropriate resistance in the process of random walk may improve the performance of our method. Thirdly, the proposed method can only predict similarity between lncRNAs that have expression profile, which indicates that the increase of lncRNA-protein interaction datasets may lead to the incomplete coverage of the lncRNA-lncRNA similarity network. This situation can be improved by adding information such as known lncRNA-protein interactions.

Supplementary Material

Figure S1 -The ROC curve and AUC value LPIHN, PRINCE and RWR on the new dataset.

Figure S2 - Comparsion of AUC by LPIHN on different intervals .

lncRNAs are grouped into four equal intervals according to the different number of interactions. Then, AUC values of different intervals are displayed.

Table S1 - The top 10 ranked proteins for lncRNA RP4-665J23.1 and RP11-18I14.10.

Table S2 - Top candidate proteins predicted by LPIHN with reference support and their ranks predicted by PRINCE and RWR.

671950.f1.docx^{(162.4KB, docx)}

Acknowledgment

This work is supported by the National Natural Science Foundation of China (no. 61571414 and no. 61471331).

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

1.Wang K. C., Chang H. Y. Molecular mechanisms of long noncoding RNAs. Molecular Cell. 2011;43(6):904–914. doi: 10.1016/j.molcel.2011.08.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Guttman M., Amit I., Garber M., et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458(7235):223–227. doi: 10.1038/nature07672. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Okazaki Y., Furuno M., Kasukawa T., et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature. 2002;420(6915):563–573. doi: 10.1038/nature01266. [DOI] [PubMed] [Google Scholar]
4.Mortazavi A., Williams B. A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008;5(7):621–628. doi: 10.1038/nmeth.1226. [DOI] [PubMed] [Google Scholar]
5.Bellucci M., Agostini F., Masin M., Tartaglia G. G. Predicting protein associations with long noncoding RNAs. Nature Methods. 2011;8(6):444–445. doi: 10.1038/nmeth.1611. [DOI] [PubMed] [Google Scholar]
6.Ponting C. P., Oliver P. L., Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136(4):629–641. doi: 10.1016/j.cell.2009.02.006. [DOI] [PubMed] [Google Scholar]
7.Morlando M., Ballarino M., Fatica A., Bozzoni I. The role of long noncoding RNAs in the epigenetic control of gene expression. ChemMedChem. 2014;9(3):505–510. doi: 10.1002/cmdc.201300569. [DOI] [PubMed] [Google Scholar]
8.Pang Q., Ge J., Shao Y., et al. Increased expression of long intergenic non-coding RNA LINC00152 in gastric cancer and its clinical significance. Tumor Biology. 2014;35(6):5441–5447. doi: 10.1007/s13277-014-1709-3. [DOI] [PubMed] [Google Scholar]
9.Wapinski O., Chang H. Y. Long noncoding RNAs and human disease. Trends in Cell Biology. 2011;21(6):354–361. doi: 10.1016/j.tcb.2011.04.001. [DOI] [PubMed] [Google Scholar]
10.Mercer T. R., Dinger M. E., Mattick J. S. Long non-coding RNAs: insights into functions. Nature Reviews Genetics. 2009;10(3):155–159. doi: 10.1038/nrg2521. [DOI] [PubMed] [Google Scholar]
11.Taft R. J., Pang K. C., Mercer T. R., Dinger M., Mattick J. S. Non-coding RNAs: regulators of disease. The Journal of Pathology. 2010;220(2):126–139. doi: 10.1002/path.2638. [DOI] [PubMed] [Google Scholar]
12.Gupta R. A., Shah N., Wang K. C., et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464(7291):1071–1076. doi: 10.1038/nature08975. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Calin G. A., Liu C.-G., Ferracin M., et al. Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas. Cancer Cell. 2007;12(3):215–229. doi: 10.1016/j.ccr.2007.07.027. [DOI] [PubMed] [Google Scholar]
14.Zhu J., Fu H., Wu Y., Zheng X. Function of lncRNAs and approaches to lncRNA-protein interactions. Science China Life Sciences. 2013;56(10):876–885. doi: 10.1007/s11427-013-4553-6. [DOI] [PubMed] [Google Scholar]
15.Khalil A. M., Rinn J. L. RNA-protein interactions in human health and disease. Seminars in Cell & Developmental Biology. 2011;22(4):359–365. doi: 10.1016/j.semcdb.2011.02.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.da Sacco L., Baldassarre A., Masotti A. Bioinformatics tools and novel challenges in long non-coding RNAs (lncRNAs) functional analysis. International Journal of Molecular Sciences. 2012;13(1):97–114. doi: 10.3390/ijms13010097. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Kishore S., Luber S., Zavolan M. Deciphering the role of RNA-binding proteins in the post-transcriptional control of gene expression. Briefings in Functional Genomics. 2010;9(5-6):391–404. doi: 10.1093/bfgp/elq028. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Zhang C., Frias M. A., Mele A., et al. Integrative modeling defines the nova splicing-regulatory network and its combinatorial controls. Science. 2010;329(5990):439–443. doi: 10.1126/science.1191150. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Muppirala U. K., Honavar V. G., Dobbs D. Predicting RNA-protein interactions using only sequence information. BMC Bioinformatics. 2011;12(1, article 489) doi: 10.1186/1471-2105-12-489. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Lu Q., Ren S., Lu M., et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics. 2013;14(1, article 651) doi: 10.1186/1471-2164-14-651. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Vanunu O., Magger O., Ruppin E., Shlomi T., Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Computational Biology. 2010;6(1) doi: 10.1371/journal.pcbi.1000641.e1000641 [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Köhler S., Bauer S., Horn D., Robinson P. N. Walking the interactome for prioritization of candidate disease genes. The American Journal of Human Genetics. 2008;82(4):949–958. doi: 10.1016/j.ajhg.2008.02.013. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Yuan J., Wu W., Xie C., Zhao G., Zhao Y., Chen R. NPInter v2.0: an updated database of ncRNA interactions. Nucleic Acids Research. 2014;42(1):D104–D108. doi: 10.1093/nar/gkt1057. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Shang D., Yang H., Xu Y., et al. A global view of network of lncRNAs and their binding proteins. Molecular BioSystems. 2015;11(2):656–663. doi: 10.1039/c4mb00409d. [DOI] [PubMed] [Google Scholar]
25.Xie C., Yuan J., Li H., et al. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Research. 2014;42(1):D98–D103. doi: 10.1093/nar/gkt1222. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Chen X., Ba Y., Ma L., et al. Characterization of microRNAs in serum: a novel class of biomarkers for diagnosis of cancer and other diseases. Cell Research. 2008;18(10):997–1006. doi: 10.1038/cr.2008.282. [DOI] [PubMed] [Google Scholar]
27.Wang F., Zheng Z., Guo J., Ding X. Correlation and quantitation of microRNA aberrant expression in tissues and sera from patients with breast tumor. Gynecologic Oncology. 2010;119(3):586–593. doi: 10.1016/j.ygyno.2010.07.021. [DOI] [PubMed] [Google Scholar]
28.Ganegoda G. U., Wang J., Wu F.-X., Li M. Prioritization of candidate genes based on disease similarity and protein's proximity in PPI networks. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM '13); December 2013; Shanghai, China. IEEE; pp. 103–108. [DOI] [Google Scholar]
29.Tang X., Wang J., Zhong J., Pan Y. Predicting essential proteins based on weighted degree centrality. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014;11(2):407–418. doi: 10.1109/tcbb.2013.2295318. [DOI] [PubMed] [Google Scholar]
30.Li M., Zheng R., Zhang H., Wang J., Pan Y. Effective identification of essential proteins based on priori knowledge, network topology and gene expressions. Methods. 2014;67(3):325–333. doi: 10.1016/j.ymeth.2014.02.016. [DOI] [PubMed] [Google Scholar]
31.Li M., Zhang H., Wang J.-X., Pan Y. A new essential protein discovery method based on the integration of protein-protein interaction and gene expression data. BMC Systems Biology. 2012;6, article 15 doi: 10.1186/1752-0509-6-15. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Franceschini A., Szklarczyk D., Frankild S., et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research. 2013;41(1):D808–D815. doi: 10.1093/nar/gks1094. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Aerts S., Lambrechts D., Maity S., et al. Gene prioritization through genomic data fusion. Nature Biotechnology. 2006;24(5):537–544. doi: 10.1038/nbt1203. [DOI] [PubMed] [Google Scholar]
34.Li Y., Patra J. C. Genome-wide inferring gene–phenotype relationship by walking on the heterogeneous network. Bioinformatics. 2010;26(9):1219–1224. doi: 10.1093/bioinformatics/btq108. [DOI] [PubMed] [Google Scholar]
35.Chen X., Liu M.-X., Yan G.-Y. Drug–target interaction prediction by random walk on the heterogeneous network. Molecular BioSystems. 2012;8(7):1970–1978. doi: 10.1039/c2mb00002d. [DOI] [PubMed] [Google Scholar]
36.Li J.-H., Liu S., Zhou H., Qu L.-H., Yang J.-H. StarBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein—RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Research. 2014;42(1):D92–D97. doi: 10.1093/nar/gkt1248. [DOI] [PMC free article] [PubMed] [Google Scholar]
37.Xiao R., Tang P., Yang B., et al. Nuclear matrix factor hnRNP U/SAF-A exerts a global control of alternative splicing by regulating U2 snRNP maturation. Molecular Cell. 2012;45(5):656–668. doi: 10.1016/j.molcel.2012.01.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Figure S1 -The ROC curve and AUC value LPIHN, PRINCE and RWR on the new dataset.

Figure S2 - Comparsion of AUC by LPIHN on different intervals .

lncRNAs are grouped into four equal intervals according to the different number of interactions. Then, AUC values of different intervals are displayed.

Table S1 - The top 10 ranked proteins for lncRNA RP4-665J23.1 and RP11-18I14.10.

Table S2 - Top candidate proteins predicted by LPIHN with reference support and their ranks predicted by PRINCE and RWR.

671950.f1.docx^{(162.4KB, docx)}

[B1] 1.Wang K. C., Chang H. Y. Molecular mechanisms of long noncoding RNAs. Molecular Cell. 2011;43(6):904–914. doi: 10.1016/j.molcel.2011.08.018. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B2] 2.Guttman M., Amit I., Garber M., et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature. 2009;458(7235):223–227. doi: 10.1038/nature07672. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B3] 3.Okazaki Y., Furuno M., Kasukawa T., et al. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature. 2002;420(6915):563–573. doi: 10.1038/nature01266. [DOI] [PubMed] [Google Scholar]

[B4] 4.Mortazavi A., Williams B. A., McCue K., Schaeffer L., Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008;5(7):621–628. doi: 10.1038/nmeth.1226. [DOI] [PubMed] [Google Scholar]

[B5] 5.Bellucci M., Agostini F., Masin M., Tartaglia G. G. Predicting protein associations with long noncoding RNAs. Nature Methods. 2011;8(6):444–445. doi: 10.1038/nmeth.1611. [DOI] [PubMed] [Google Scholar]

[B6] 6.Ponting C. P., Oliver P. L., Reik W. Evolution and functions of long noncoding RNAs. Cell. 2009;136(4):629–641. doi: 10.1016/j.cell.2009.02.006. [DOI] [PubMed] [Google Scholar]

[B7] 7.Morlando M., Ballarino M., Fatica A., Bozzoni I. The role of long noncoding RNAs in the epigenetic control of gene expression. ChemMedChem. 2014;9(3):505–510. doi: 10.1002/cmdc.201300569. [DOI] [PubMed] [Google Scholar]

[B8] 8.Pang Q., Ge J., Shao Y., et al. Increased expression of long intergenic non-coding RNA LINC00152 in gastric cancer and its clinical significance. Tumor Biology. 2014;35(6):5441–5447. doi: 10.1007/s13277-014-1709-3. [DOI] [PubMed] [Google Scholar]

[B9] 9.Wapinski O., Chang H. Y. Long noncoding RNAs and human disease. Trends in Cell Biology. 2011;21(6):354–361. doi: 10.1016/j.tcb.2011.04.001. [DOI] [PubMed] [Google Scholar]

[B10] 10.Mercer T. R., Dinger M. E., Mattick J. S. Long non-coding RNAs: insights into functions. Nature Reviews Genetics. 2009;10(3):155–159. doi: 10.1038/nrg2521. [DOI] [PubMed] [Google Scholar]

[B11] 11.Taft R. J., Pang K. C., Mercer T. R., Dinger M., Mattick J. S. Non-coding RNAs: regulators of disease. The Journal of Pathology. 2010;220(2):126–139. doi: 10.1002/path.2638. [DOI] [PubMed] [Google Scholar]

[B12] 12.Gupta R. A., Shah N., Wang K. C., et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature. 2010;464(7291):1071–1076. doi: 10.1038/nature08975. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B13] 13.Calin G. A., Liu C.-G., Ferracin M., et al. Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas. Cancer Cell. 2007;12(3):215–229. doi: 10.1016/j.ccr.2007.07.027. [DOI] [PubMed] [Google Scholar]

[B14] 14.Zhu J., Fu H., Wu Y., Zheng X. Function of lncRNAs and approaches to lncRNA-protein interactions. Science China Life Sciences. 2013;56(10):876–885. doi: 10.1007/s11427-013-4553-6. [DOI] [PubMed] [Google Scholar]

[B15] 15.Khalil A. M., Rinn J. L. RNA-protein interactions in human health and disease. Seminars in Cell & Developmental Biology. 2011;22(4):359–365. doi: 10.1016/j.semcdb.2011.02.016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B16] 16.da Sacco L., Baldassarre A., Masotti A. Bioinformatics tools and novel challenges in long non-coding RNAs (lncRNAs) functional analysis. International Journal of Molecular Sciences. 2012;13(1):97–114. doi: 10.3390/ijms13010097. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B17] 17.Kishore S., Luber S., Zavolan M. Deciphering the role of RNA-binding proteins in the post-transcriptional control of gene expression. Briefings in Functional Genomics. 2010;9(5-6):391–404. doi: 10.1093/bfgp/elq028. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B18] 18.Zhang C., Frias M. A., Mele A., et al. Integrative modeling defines the nova splicing-regulatory network and its combinatorial controls. Science. 2010;329(5990):439–443. doi: 10.1126/science.1191150. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B19] 19.Muppirala U. K., Honavar V. G., Dobbs D. Predicting RNA-protein interactions using only sequence information. BMC Bioinformatics. 2011;12(1, article 489) doi: 10.1186/1471-2105-12-489. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B20] 20.Lu Q., Ren S., Lu M., et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics. 2013;14(1, article 651) doi: 10.1186/1471-2164-14-651. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B21] 21.Vanunu O., Magger O., Ruppin E., Shlomi T., Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Computational Biology. 2010;6(1) doi: 10.1371/journal.pcbi.1000641.e1000641 [DOI] [PMC free article] [PubMed] [Google Scholar]

[B22] 22.Köhler S., Bauer S., Horn D., Robinson P. N. Walking the interactome for prioritization of candidate disease genes. The American Journal of Human Genetics. 2008;82(4):949–958. doi: 10.1016/j.ajhg.2008.02.013. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B23] 23.Yuan J., Wu W., Xie C., Zhao G., Zhao Y., Chen R. NPInter v2.0: an updated database of ncRNA interactions. Nucleic Acids Research. 2014;42(1):D104–D108. doi: 10.1093/nar/gkt1057. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B24] 24.Shang D., Yang H., Xu Y., et al. A global view of network of lncRNAs and their binding proteins. Molecular BioSystems. 2015;11(2):656–663. doi: 10.1039/c4mb00409d. [DOI] [PubMed] [Google Scholar]

[B25] 25.Xie C., Yuan J., Li H., et al. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Research. 2014;42(1):D98–D103. doi: 10.1093/nar/gkt1222. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B26] 26.Chen X., Ba Y., Ma L., et al. Characterization of microRNAs in serum: a novel class of biomarkers for diagnosis of cancer and other diseases. Cell Research. 2008;18(10):997–1006. doi: 10.1038/cr.2008.282. [DOI] [PubMed] [Google Scholar]

[B27] 27.Wang F., Zheng Z., Guo J., Ding X. Correlation and quantitation of microRNA aberrant expression in tissues and sera from patients with breast tumor. Gynecologic Oncology. 2010;119(3):586–593. doi: 10.1016/j.ygyno.2010.07.021. [DOI] [PubMed] [Google Scholar]

[B28] 28.Ganegoda G. U., Wang J., Wu F.-X., Li M. Prioritization of candidate genes based on disease similarity and protein's proximity in PPI networks. Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine (BIBM '13); December 2013; Shanghai, China. IEEE; pp. 103–108. [DOI] [Google Scholar]

[B29] 29.Tang X., Wang J., Zhong J., Pan Y. Predicting essential proteins based on weighted degree centrality. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014;11(2):407–418. doi: 10.1109/tcbb.2013.2295318. [DOI] [PubMed] [Google Scholar]

[B30] 30.Li M., Zheng R., Zhang H., Wang J., Pan Y. Effective identification of essential proteins based on priori knowledge, network topology and gene expressions. Methods. 2014;67(3):325–333. doi: 10.1016/j.ymeth.2014.02.016. [DOI] [PubMed] [Google Scholar]

[B31] 31.Li M., Zhang H., Wang J.-X., Pan Y. A new essential protein discovery method based on the integration of protein-protein interaction and gene expression data. BMC Systems Biology. 2012;6, article 15 doi: 10.1186/1752-0509-6-15. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B32] 32.Franceschini A., Szklarczyk D., Frankild S., et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Research. 2013;41(1):D808–D815. doi: 10.1093/nar/gks1094. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B33] 33.Aerts S., Lambrechts D., Maity S., et al. Gene prioritization through genomic data fusion. Nature Biotechnology. 2006;24(5):537–544. doi: 10.1038/nbt1203. [DOI] [PubMed] [Google Scholar]

[B34] 34.Li Y., Patra J. C. Genome-wide inferring gene–phenotype relationship by walking on the heterogeneous network. Bioinformatics. 2010;26(9):1219–1224. doi: 10.1093/bioinformatics/btq108. [DOI] [PubMed] [Google Scholar]

[B35] 35.Chen X., Liu M.-X., Yan G.-Y. Drug–target interaction prediction by random walk on the heterogeneous network. Molecular BioSystems. 2012;8(7):1970–1978. doi: 10.1039/c2mb00002d. [DOI] [PubMed] [Google Scholar]

[B36] 36.Li J.-H., Liu S., Zhou H., Qu L.-H., Yang J.-H. StarBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein—RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Research. 2014;42(1):D92–D97. doi: 10.1093/nar/gkt1248. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B37] 37.Xiao R., Tang P., Yang B., et al. Nuclear matrix factor hnRNP U/SAF-A exerts a global control of alternative splicing by regulating U2 snRNP maturation. Molecular Cell. 2012;45(5):656–668. doi: 10.1016/j.molcel.2012.01.009. [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

Predicting Long Noncoding RNA and Protein Interactions Using Heterogeneous Network Model

Ao Li

Mengqu Ge

Yao Zhang

Chen Peng

Minghui Wang

Abstract

1. Introduction

2. Materials and Methods

2.1. lncRNA-Protein Interactions

2.2. lncRNA Expression Similarity

2.3. Protein-Protein Interactions

2.4. The Heterogeneous Network

Figure 1.

2.5. LPIHN Method

2.6. Leave-One-Out Cross Validation Test

3. Results

3.1. Comparison with Other Network-Based Methods

Figure 2.

Table 1.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

3.2. Comparison with Existing Methods

Figure 7.

3.3. Case Studies

Figure 8.

Table 2.

4. Conclusion

Supplementary Material

Acknowledgment

Conflict of Interests

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases