2013 Feb;8(1):37–45. doi: 10.2174/1574893611308010008

Integrative Approaches for microRNA Target Prediction: Combining Sequence Information and the Paired mRNA and miRNA Expression Profiles

Naifang Su 1,2, Minping Qian 1,3, Minghua Deng 1,2,3,*
PMCID: PMC3583062  PMID: 23467572

Abstract

Gene regulation is a key factor in gaining a full understanding of molecular biology. microRNAs (miRNAs), a novel class of non-coding RNAs, have recently been found to be a crucial class of post-transcriptional regulators and play important roles in cancer. One essential step in understanding the regulatory effect of miRNAs is the reliable prediction of their target mRNAs. Typically, the predictions are based solely on sequence information, which unavoidably leads to high false detection rates. Recently, some novel approaches have been developed to predict miRNA targets by integrating the typical algorithms with paired expression profiles of miRNA and mRNA. Here we review and discuss these integrative approaches and propose a new algorithm called HCtarget. Applying HCtarget to expression data from multiple myeloma, we predict target genes for ten specific miRNAs. Experimental verification and a loss-of-function study validate our predictions. Therefore, the integrative approach is a reliable and effective way to predict miRNA targets, and could improve our comprehensive understanding of gene regulation.

Keywords: Expression profile, integrative analysis, miRNA, target prediction.

INTRODUCTION

Discovering gene regulation is one of the main goals in molecular biology. Specifically, uncovering the mechanisms underlying the expression of tumor related genes is a key factor in gaining a full understanding of cancer biology [1], which is also of great therapeutic significance.

While a great deal of previous study has focused on transcription factors (TFs), a crucial class of regulators at the transcriptional level, the post-transcriptional regulator microRNA (miRNA) has attracted much attention recently. miRNAs are a novel class of endogenous ~22 nt noncoding RNAs. They down-regulate gene expression through the following steps. First, primary miRNAs are transcribed from “miRNA genes” or spliced from the intronic regions of their host genes. The primary miRNAs are then processed into miRNA precursors and finally into mature miRNAs. These miRNAs combine with Argonaute (Ago) proteins to form RNA-induced silencing complexes (RISCs). RISCs bind to the 3’ untranslated region (3’ UTR) of target mRNAs, which leads to their translational repression or degradation [2]. Hundreds of miRNAs have been annotated in the human genome, and they are predicted to regulate up to one third of all protein-coding genes [3].

Experimental analysis has recognized that miRNAs control the key cellular processes such as growth, development and apoptosis [4]. It has been established that miRNAs make an important contribution to gene regulation in embryonic development and human disease, especially cancer [5-8]. Previous studies have verified that miRNAs can act as tumor suppressors or oncogenes and their dysregulation is widely involved in cancer initiation and progression [9], which enables their inhibition to be a novel therapeutic strategy for cancer [10].

An essential step and major challenge in understanding miRNA regulatory function is the identification of their target genes [11]. Since high-throughput experimental validation is infeasible, only a small fraction of miRNA targets have experimental support [12, 13]. Typically, target prediction is achieved by computational approaches based on sequence analysis. A large number of target prediction programs have been developed [14-18]. Among them, TargetScan [3, 19], PicTar [20] and miRanda [21] are the most common. Generally, they use the following principles to recognize miRNA targets: 1) seed match: the 6-8 nt seed in the 5’ part of the miRNA pairs with the 3’ UTR of its target mRNA; 2) thermodynamic stability: the free energy of the miRNA-mRNA hybrid is low; 3) conservation: miRNA target sites are conserved among several species. However, these sequence-based approaches have a high false-positive rate. It has been demonstrated that the false-positive rate of TargetScan predictions is about 22-31% [22]. Since seed-match complementarity alone cannot discern the real targets effectively, a large number of false targets are included among the predictions.

With the development of high-throughput technology, more and more miRNA and mRNA expression profiles have been generated to investigate the role of miRNAs in biological processes, especially cancer [9, 23]. Previous studies have revealed that miRNAs strongly repress their target mRNAs, and that mRNAs show significant expression changes after miRNA transfection or inhibition. It has also been verified that the expression of mRNAs targeted by highly expressed miRNAs is shifted downward compared with other mRNAs. Therefore, significantly negatively correlated miRNA-mRNA pairs have high potential to be real target pairs [24].

Based on this idea, some novel strategies have been developed to predict miRNA targets by integrative analysis. They mainly use the paired miRNA and mRNA expression data, which profile miRNA and mRNA expression levels simultaneously from the same sample, to supplement the sequence prediction for the detection of actual miRNA targets.

In this article, we review and discuss the most recent integrative approaches for miRNA target predictions. We also develop a new method called HCtarget. We apply HCtarget to the expression data in multiple myeloma and evaluate the performance of our predictions.

REVIEW OF PREVIOUS APPROACHES

In the recently developed integrative approaches, there are roughly three ways to incorporate miRNA and mRNA expression profiles into the sequence prediction (Table 1): 1) directly consider the correlation between miRNA and mRNA expression; 2) formulate mRNA and miRNA expression with a linear model with latent variables; 3) use a Bayesian network to model the miRNA-mRNA regulatory network.

Table 1.

Integrative Approach for miRNA Target Prediction

Name URL Reference
Correlation Based Approach
MMIA http://cancer.informatics.indiana.edu/mmia [25]
Peng et al. - [26]
mirConnX http://www.benoslab.pitt.edu/mirconnx [27]
MAGIA http://gencomp.bio.unipd.it/magia [28]
TargetMiner http://www.isical.ac.in/ bioinfo_miu/ [29]
ExprTarget http://www.scandb.org/apps/microrna/ [30]
miRGator http://genome.ewha.ac.kr/miRGator/ [31]
MirZ http://www.mirz.unibas.ch [32]
mimiRNA http://mimirna.centenary.org.au [33]
HOCTAR - [34]
Linear Model Approach
GenMiR++ http://www.psi.toronto.edu/genmir/ [24, 35]
F. Stingo et al. - [36]
J. Li et al. - [37]
Y. Lu et al. - [38]
Bayesian Network Approach
B. Liu et al. - [39]

CORRELATION BASED APPROACH

Since miRNAs generally repress their target mRNAs, a straightforward way to validate a miRNA-mRNA target relation is to test whether their expression levels are inversely correlated. Based on this idea, some recent approaches integrate the correlations between miRNA and mRNA pairs into target prediction.

MMIA [25] (miRNA and mRNA Integrated Analysis) is an integrated miRNA and mRNA analyzing web server. It incorporates the common miRNA target prediction algorithms TargetScan, PITA and PicTar, and restricts the predictions on the significantly up (down) expressed miRNAs and the corresponding down (up) expressed mRNAs. MMIA is a feasible and simple tool for integrating miRNAs and mRNA expression profiles. However, it only takes into account the significantly up and down expression features, and loses the information of their whole expression patterns and their correlations.

X. Peng et al. [26] develop this approach by considering the inverse expression relationships between miRNAs and mRNA. They calculate the Pearson correlations between every miRNA-mRNA pair, and select the significant inverse expression pairs to construct a binary miRNA-mRNA correlation network. Meanwhile, they build a miRNA-mRNA target network based on sequence analysis. Here they relax the prediction criteria to the seed match principle, without demanding phylogenetic conservation or thermodynamic stability, to provide a larger set of candidate targets. Finally, the correlation network and the target prediction network are intersected to provide an integrative miRNA-mRNA regulatory network. This approach proposes a new point of view for miRNA target prediction, which replaces some sequence criteria by the inverse expression relationships.
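The intersection of a correlation network with a seed-match network can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name, the array layout and the fixed cutoff `r_cutoff` are hypothetical, and a plain threshold on the Pearson coefficient stands in for the significance test used by Peng et al.

```python
import numpy as np

def inverse_correlation_network(mirna, mrna, seed_match, r_cutoff=-0.5):
    """Intersect an inverse-correlation network with a seed-match network.

    mirna: (M, T) miRNA expression, mrna: (N, T) mRNA expression,
    seed_match: (N, M) binary matrix of sequence-based candidates.
    r_cutoff is a hypothetical threshold replacing a significance test.
    """
    # Standardize each profile so that a scaled dot product is Pearson's r.
    zm = (mirna - mirna.mean(axis=1, keepdims=True)) / mirna.std(axis=1, keepdims=True)
    zg = (mrna - mrna.mean(axis=1, keepdims=True)) / mrna.std(axis=1, keepdims=True)
    corr = zg @ zm.T / mirna.shape[1]          # (N, M) Pearson correlations
    # Keep a pair only if it is both inversely correlated and a seed match.
    network = (corr <= r_cutoff) & (seed_match == 1)
    return network, corr
```

A pair survives only when both sources of evidence agree, which is why the sequence criteria can be relaxed to the seed match alone.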

G. Huang et al. provide mirConnX [27], a web interface for inferring and displaying mRNA and miRNA regulatory networks. It combines five prediction algorithms, PITA, miRanda, TargetScan, RNAhybrid and PicTar, to obtain an integrative target prediction score for each miRNA-mRNA pair. The experimentally verified miRNA targets [12] are also incorporated. Meanwhile, mirConnX integrates the miRNA-mRNA expression profiles by calculating the correlations (Pearson, Spearman or Kendall) between miRNA-mRNA pairs. These correlations are converted to probabilities of association. The final prediction score is a weighted sum of the target score and the association probability, with a user-defined weight. mirConnX has two innovations. First, besides the Pearson correlation, it considers non-parametric coefficients (Spearman or Kendall) and converts them to probabilities. When the sample size is small or there are outliers in the expression data, these rank-based correlations are more reliable. Second, the correlation network and the target network are integrated by weighting instead of by simple intersection.

MAGIA [28] (miRNA and genes integrated analysis) is a similar web tool for integrative analysis. It extracts the target predictions from miRanda, PITA and TargetScan, and provides four approaches to integrate miRNA and mRNA expression profiles: 1) Similar to mirConnX, compute the Pearson or Spearman correlation coefficient for each predicted miRNA-mRNA pair and convert it to a false discovery rate. 2) Calculate the mutual information between a miRNA expression profile and an mRNA expression profile based on nearest-neighbor distances, which can be regarded as a generalization of the Pearson correlation. 3) GenMiR++, which is described in the following section. 4) Meta-analysis when miRNA and mRNA profiles are not paired. Users can select one or several approaches and take the intersection or union to display the combined regulatory network.

S. Bandyopadhyay et al. propose a new way to integrate the expression data [29]. Their approach, TargetMiner, is a support vector machine (SVM) classifier for miRNA target prediction. It incorporates expression profiles to construct a reliable training set. Previously, training sets were extracted from experimentally verified miRNA targets (from Tarbase [12] and miRecords [13]) or from sequence-based predictions (from miRanda, TargetScanS, PicTar and DIANA-microT). However, the number of verified targets is small, and the predictions contain a significant number of false-positive targets. TargetMiner proposes a multi-stage filtering approach to identify the non-targets in these predictions. It first identifies tissue-specific miRNAs and mRNAs by analyzing miRNA and mRNA expression profiles across several tissues, and then selects an mRNA as a non-target if it is over-expressed in the same tissue as its corresponding miRNA. These candidate non-targets are further filtered by removing mRNAs with feasible miRNA-mRNA duplex stability or seed-site conservation. Combining these with the experimentally verified miRNA targets, TargetMiner obtains an integrative training set of miRNA targets and non-targets. An SVM classification model is built on this data, with 30 features extracted and selected from sequence site context information. The learned SVM classifier can then predict miRNA targets. In short, TargetMiner provides an integrative training set for learning a classifier. However, it only considers the expression patterns in the training procedure, without using them as classification features in the SVM model.

E. Gamazon et al. develop a new approach, ExprTarget [30], which combines the sequence-based predictions with expression features in the classification. Focusing on a given miRNA, ExprTarget constructs a logistic model as:

\[ \operatorname{logit}(p_i) = \log\frac{p_i}{1-p_i} = \beta_0 + \beta_p X_{ip} + \beta_t X_{it} + \beta_m X_{im} + \beta_e X_{ie} \]

Here pi is the probability that mRNA i is a real target. Xip, Xit and Xim are the target prediction scores of mRNA i from PicTar, TargetScan and miRanda respectively. Xie is the expression feature, defined as the p value of the general linear model between mRNA i and the miRNA. Note that if the estimated coefficient in that model is positive, Xie is set to 1. The coefficients β describe the contribution weights of the different prediction algorithms. Using experimentally validated miRNA targets as training data, β can be estimated by logistic regression. By this means, ExprTarget provides a target probability for each mRNA. ExprTarget extends TargetMiner by incorporating the expression features in the classifier. The feature Xie can be regarded as a generalization of the Pearson correlation, so ExprTarget is also an extension of mirConnX, with the weights learned from experimentally validated targets.
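To make the logistic model concrete, the short sketch below evaluates the target probability for one mRNA once the coefficients are known. The function name and the dictionary keys are hypothetical, and the coefficient values in any call are placeholders rather than estimates from ExprTarget.

```python
import math

def exprtarget_probability(x_pictar, x_targetscan, x_miranda, x_expr, beta):
    """Probability that an mRNA is a real target under a logistic model.

    beta is a dict with hypothetical keys "0", "p", "t", "m", "e" holding
    the intercept and the weights of PicTar, TargetScan, miRanda and the
    expression feature. In practice these would be fitted on validated targets.
    """
    eta = (beta["0"] + beta["p"] * x_pictar + beta["t"] * x_targetscan
           + beta["m"] * x_miranda + beta["e"] * x_expr)
    # Inverse of the logit link.
    return 1.0 / (1.0 + math.exp(-eta))
```

With a negative weight on the expression feature, a smaller p value (stronger inverse association) raises the predicted probability, which is the behaviour the model is meant to capture.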

Besides the above approaches, there are some other web tools that combine miRNA-mRNA expression profiles with target predictions. miRGator [31] integrates miRanda, PicTar and TargetScan target predictions, and displays the expression correlations between miRNA-mRNA pairs. It can also provide a ranked list of target mRNAs sorted by their correlation with the corresponding miRNA. MirZ [32] incorporates smiRNAdb, a database containing miRNA sequencing profiles, and the ElMMo miRNA target prediction algorithm. It also integrates mRNA expression data and allows users to restrict the target prediction to mRNAs that are expressed in a given cell type. mimiRNA [33] integrates expression data from human miRNAs and mRNAs across multiple tissues or cell types. It groups miRNA or mRNA expression data by tissue and cell type, and the paired expression data can be visualized. mimiRNA also incorporates TargetScan, miRBase, RNA22 and PicTar. Users can search for the targets and the inversely expressed mRNAs of a given miRNA.

In addition, when miRNA expression data are not available, HOCTAR [34] (host gene oppositely correlated targets) could be employed. It considers that most human miRNAs are intragenic and are transcribed as part of their hosting transcription units, so the expression of miRNA host genes could be used as a proxy of the expression of the miRNA itself. Based on this idea, HOCTAR extracts a great deal of mRNA expression profiles and provides an average inverse correlated score between each mRNA and miRNA host gene pair. These scores are then integrated with the miRanda, TargetScan, and PicTar predictions.

LINEAR MODEL APPROACH

The previous approaches only consider the pairwise expression correlation between miRNA and mRNA. However, an mRNA may be regulated by multiple miRNAs, and its expression is affected jointly by all of its targeting miRNAs. Based on this idea, some novel methods have been developed to model the combinatorial effect of miRNAs on their target mRNAs.

Among them, GenmiR++ (Generative model for miRNA regulation) is the most widely used approach [24, 35]. It characterizes mRNA expressions as a linear combination of the regulatory effects of their targeting miRNAs, and a variational Bayesian algorithm is used to learn the latent miRNA target indicators. It has been successfully applied on the paired miRNA and mRNA expression data among multiple tissues.

Let yit denote the expression level of mRNA i in tissue t and zjt denote the expression level of miRNA j in the same tissue, where i = 1, …, N, j = 1, …, M and t = 1, …, T. GenMiR++ uses a linear model to relate the mRNA expressions to the regulatory effects of their targeting miRNAs. A latent binary variable R indicates the target relations, where rij = 1 if mRNA i is targeted by miRNA j, and 0 otherwise. The relationship between mRNA and miRNA expressions is formulated as:

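\[ y_{it} = \mu_t - \gamma_t \sum_{j=1}^{M} \lambda_j r_{ij} z_{jt} + \varepsilon_{it} \]

or, in vector form with Γ = diag(γ1, …, γT),

\[ y_i = \mu - \sum_{j=1}^{M} \lambda_j r_{ij} \Gamma z_j + \varepsilon_i, \quad \varepsilon_i \sim N(0, \Sigma) \]

here λj represents the regulatory effect of miRNA j, γt accounts for the expression scaling in tissue t, and µt is the background effect of tissue t.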
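As a small illustration of this linear model, the sketch below computes the noise-free expected mRNA expression for given parameters. The function name and array layout are assumptions made for the example; it does not reproduce the variational inference of GenMiR++.

```python
import numpy as np

def genmir_expected_expression(mu, gamma, lam, R, Z):
    """Expected mRNA expression mu_t - gamma_t * sum_j lam_j r_ij z_jt.

    mu, gamma: (T,) tissue background and scaling; lam: (M,) miRNA effects;
    R: (N, M) binary target indicators; Z: (M, T) miRNA expression.
    Returns an (N, T) array.
    """
    # (R * lam) weights each target indicator by its miRNA effect,
    # and the matrix product sums over miRNAs for every tissue.
    repression = (R * lam[None, :]) @ Z
    return mu[None, :] - gamma[None, :] * repression
```

Because λj is shared by all mRNAs and γt by all miRNAs, the effect of miRNA j in tissue t is the product γt λj, which is the constraint discussed later in the text.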

The latent variable R indicates the target relations between miRNA and mRNA. Let C denote the target predictions from TargetScan, with cij = 1 if mRNA i is predicted to be targeted by miRNA j, and 0 otherwise. GenMiR++ assigns R a Bernoulli distribution that depends on C: rij ~ Bernoulli(π) when cij = 1, and rij = 0 when cij = 0.

Assigning the priors γt ~ N(1, s^2) and λj ~ exp(α), GenMiR++ uses a variational Bayesian algorithm to estimate the posterior distribution of rij. Since the exact posterior has a complicated form, the variational Bayesian algorithm approximates it with a factorized variational posterior [40]. By this means, the computation is simplified and the target probability can be obtained.

GenMiR++ has also been extended to GenMiR3 [41], which uses an alternative prior distribution and modifies the parameter π by integrating sequence information such as the hybridization energy and context score.

GenMiR++ has been widely used to integrate miRNA-mRNA expression data with target predictions. However, it has several restrictions. First, having been designed for experiments across different tissues, GenMiR++ characterizes the relative effects of miRNAs as constant across all tissues. This assumption may not hold for experiments across different cancer patients: since patients are much more heterogeneous, the relative effects of their miRNAs can no longer be regarded as constant. Second, GenMiR++ uses a variational Bayesian algorithm to learn the parameters. The variational posterior may deviate from the real posterior, and its convergence rate depends strongly on the form of the likelihood and priors and may be extremely slow.

F. Stingo et al. [36] propose a similar linear approach. Unlike GenMiR++, they do not take the tissue effect into account, and they allow each miRNA to have a distinct regulatory effect on different mRNAs. Based on this idea, they propose a linear model to formulate miRNA and mRNA expressions:

\[ y_i = -\sum_{j=1}^{M} \beta_{ij} r_{ij} z_j + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma_i^2), \quad i = 1, \ldots, N \]

here yi is the expression of mRNA i and zj is the expression of miRNA j. βij represents the effect of miRNA j on mRNA i; in GenMiR++ this term is shared across all mRNAs as λj. Meanwhile, the target indicator rij is assigned a Bernoulli distribution with a modified parameter:

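\[ \pi_{ij} = \frac{\exp\left(\eta + \tau_1 c_{ij}^{1} + \tau_2 c_{ij}^{2} + \tau_3 c_{ij}^{3} + \tau_4 c_{ij}^{4} + \tau_5 c_{ij}^{5}\right)}{1 + \exp\left(\eta + \tau_1 c_{ij}^{1} + \tau_2 c_{ij}^{2} + \tau_3 c_{ij}^{3} + \tau_4 c_{ij}^{4} + \tau_5 c_{ij}^{5}\right)} \]

where $c_{ij}^{1}$, $c_{ij}^{2}$, $c_{ij}^{3}$, $c_{ij}^{4}$ and $c_{ij}^{5}$ are the prediction scores of PicTar, miRanda, aggregate TargetScan, total TargetScan and PITA respectively.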

With the priors βij ~ Gamma(1, i) and $\sigma_i^{-1} \sim \mathrm{Gamma}\left(\frac{\delta+M}{2}, \frac{d}{2}\right)$, the posterior distribution can be estimated using the Metropolis-Hastings algorithm. The posterior target scores p(rij = 1 | data) are thus obtained to construct the miRNA regulatory network.

However, since the effects βij are distinct for different miRNAs and mRNAs, the model has a large number of parameters. Therefore, this approach is limited by its high computational complexity.

J. Li et al. [37] also modify the model. They discretize mRNA expression to a binary value yit = 1 or 0, representing high or low expression, and then assume that yit follows a logistic model: let qit = P(yit = 1),

\[ \log\frac{q_{it}}{1-q_{it}} = -\sum_{j=1}^{M} \phi_j r_{ij} z_{jt} + \varepsilon_t, \quad i = 1, \ldots, N \]

Similar to GenMiR++, rij follows a Bernoulli distribution with parameter π that depends on the TargetScan prediction cij.

With the priors фj ~ exp(ф), ф ~ U(0,∞), εt ~ U(-50, 50) and π ~ Beta(1, 1), the posterior can be estimated using Gibbs sampling. They also apply a similar approach to study the relation between miRNA expression and protein abundance.

In this approach, the binary mRNA values lose the information of the whole expression profile.

The above approaches use Bayesian methodology for parameter estimation. In contrast, Y. Lu et al. [38] use a lasso regression model to predict miRNA targets. Moreover, they pay attention to the role of RISCs and assume that mRNA expression follows a linear model of its targeting RISCs. The RISC level is obtained from the expression of its constituent miRNA and Ago proteins. There are four Ago proteins in human, and Ago2 is the essential one. Therefore, the model is:

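\[ y_i = \beta_{i0} + \sum_{j=1}^{M} c_{ij} \beta_{ij} z_j \,\mathrm{Ago2} + \sum_{j=1}^{M} c_{ij} \phi_{ij} z_j \,\mathrm{Ago134} + \varepsilon_i \]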

Here yi is the expression of mRNA i, zj is the expression of miRNA j, Ago2 is the expression of Ago2 mRNA and Ago134 is the combined expression of Ago1, Ago3 and Ago4 mRNA. cij indicates the target prediction relation from TargetScan and PicTar.

Then a multi-run lasso regression procedure is performed, and miRNAs are ranked by their estimated coefficients. With these ranked scores, the targeting miRNAs can be identified.

However, this approach performs a lasso regression for each mRNA separately, which is time-consuming when applied to a large number of mRNAs.

BAYESIAN NETWORK APPROACH

Besides the linear model approach, some recent studies model the whole miRNA-mRNA regulatory network. The Bayesian network, a probabilistic graphical model, has been widely used to discover the structure of gene networks [42]. It can also be applied to study the regulation between miRNA and mRNA [43]. Liu et al. [39] develop a new approach which uses a Bayesian network to learn the miRNA-mRNA regulatory network by integrating miRNA target prediction and expression profiles.

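Denoting miRNAs and mRNAs as nodes and their target relations as directed edges, the regulatory network can be modeled as a discrete Bayesian network G. The miRNA and mRNA expressions X are discretized to binary values 1 and 0, indicating high and low expression. Let Nijk be the number of times that mRNA Xi is observed in state k (k = 1, …, ri, here ri = 2) with its parent miRNAs in state j (j = 1, …, qi); then X follows a multinomial distribution with parameter θijk = P(Xi = k | parent(Xi) = j):

\[ p(X \mid \theta, G) \propto \prod_{i=1}^{n} \prod_{j=1}^{q_i} \prod_{k=1}^{r_i} \theta_{ijk}^{N_{ijk}} \]

Assigning the Dirichlet prior to θ as

\[ p(\theta \mid G) \propto \prod_{i=1}^{n} \prod_{j=1}^{q_i} \prod_{k=1}^{r_i} \theta_{ijk}^{\alpha_{ijk}-1} \]

the Bayesian score of the network p(X|G) is given by [44]:

\[ p(X \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(N_{ij}+\alpha_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(N_{ijk}+\alpha_{ijk})}{\Gamma(\alpha_{ijk})} \]

Here $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$, $\alpha_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk}$ and $\alpha_{ijk} = N / (r_i q_i)$, where N is the sample size.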
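The contribution of a single node to this score is easy to evaluate on the log scale, which avoids overflow of the Gamma functions. The sketch below is a minimal illustration using only the standard library; the function name and the argument name `ess` (standing for N in αijk = N / (ri qi)) are hypothetical.

```python
from math import lgamma

def log_bayesian_score_node(counts, ess):
    """Log of one node's factor in the Bayesian score p(X|G).

    counts[j][k] is N_ijk: how often the node is in state k while its
    parents are in configuration j. ess plays the role of N in
    alpha_ijk = N / (r_i * q_i).
    """
    q = len(counts)            # number of parent configurations q_i
    r = len(counts[0])         # number of node states r_i
    a_ijk = ess / (r * q)
    a_ij = r * a_ijk           # alpha_ij is the sum of alpha_ijk over k
    score = 0.0
    for row in counts:
        n_ij = sum(row)
        score += lgamma(a_ij) - lgamma(n_ij + a_ij)
        for n_ijk in row:
            score += lgamma(n_ijk + a_ijk) - lgamma(a_ijk)
    return score
```

The log score of a whole network is the sum of these node terms, so a local search only needs to recompute the nodes whose parent sets change.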

The network with the maximum score is selected as the learned Bayesian network, which is typically found by a heuristic search algorithm such as hill climbing. The search space can be reduced by constraining the target relations to the miRBase, PicTar and TargetScan predictions. By this means, Liu et al. analyze miRNA-mRNA expression profiles from multiple cell types and build a Bayesian network for each cell type. These networks are then integrated to provide the significant miRNA-mRNA target relations.

Bayesian network is a reliable and accurate model for the regulatory network [42]. However, its learning algorithm has high computational complexity and is time consuming. Therefore, Bayesian network could not be applied to learn large-scale networks.

HCTARGET METHOD

Based on the above discussion, we propose a new algorithm called HCtarget (High Confident targets) that integrates expression and sequence information to detect miRNA targets. Our approach extends GenMiR++ and overcomes its restrictions in the following two ways. First, GenMiR++ characterizes the relative effects of miRNAs as constant across all tissues. We relax this constraint by redefining the parameters of the miRNA effects. Second, GenMiR++ uses a variational Bayesian algorithm to approximate the real posterior. Its convergence may be slow and the estimation is not stable. We use a classical Markov chain Monte Carlo (MCMC) algorithm to learn the posterior directly.

MODEL

Incorporating the notations in GenmiR++, we propose a linear model to formulate the relations between mRNA expressions and the regulatory effects of their targeting miRNAs as:

\[ y_{it} = \beta_{0t} + \sum_{j=1}^{M} r_{ij} z_{jt} \beta_{jt} + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma_t^2) \]

here βjt represents the regulatory effects of miRNA j at sample t (in GenMiR++, this term is factored into the product of the tissue effect and the miRNA effect γt λj), and β0t is the background effect of sample t.

The goal of our model is to estimate the latent indicators R. As in GenMiR++, R follows a Bernoulli distribution that depends on the sequence prediction C. In the following discussion, we focus on the pairs with cij = 1. The likelihood of R is:

\[ p(R \mid \pi) \propto \prod_{i,j} \pi^{c_{ij} r_{ij}} (1-\pi)^{c_{ij}(1-r_{ij})} \]

here π can be regarded as the accuracy of the sequence-based predictions. This assumption enables our model to cut down the false-positive rate of the previous prediction.

Let Bt = (rij zjt), At = [1, Bt], Yt = (y1t, …, yNt)^T, Z = (zjt), βt = (β0t, …, βMt)^T and εt = (ε1t, …, εNt)^T. We then have the vector representation of our model:

\[ Y_t = A_t \beta_t + \varepsilon_t \]

MCMC ALGORITHM FOR STATISTICAL INFERENCE

Based on the above model, the likelihood of the observed data p(Y, Z, C, R | β, σ2, π) is proportional to:

\[ \prod_{i,t} \exp\left(-\frac{1}{\sigma_t^2}\Big(y_{it} - \sum_{j=1}^{M} z_{jt} r_{ij} \beta_{jt} - \beta_{0t}\Big)^2\right) \prod_{i,j} \pi^{c_{ij} r_{ij}} (1-\pi)^{c_{ij}(1-r_{ij})} \]

To estimate the parameters θ = (β, σ2, π) and the latent variables R, we apply Bayesian methodology and an MCMC algorithm [45]. With proper prior assumptions, the posteriors of R and θ have simple forms and can be sampled directly by iterating the following two steps [46,47]: (i) sample the parameters θ conditional on the updated latent variable; (ii) sample the latent variable R conditional on the updated parameters.

UPDATE THE PARAMETERS

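Given the non-informative prior $p(\beta_t, \sigma_t^2) \propto \sigma_t^{-2}$, the posterior distributions of βt and σt2 are

\[ \beta_t \mid \sigma_t^2, Y \sim N\left(\hat{\beta}_t, (A_t^T A_t)^{-1} \sigma_t^2\right), \quad \sigma_t^2 \mid Y \sim \nu s_t^2 \chi_\nu^{-2} \]

where ν = N - M - 1 and

\[ \hat{\beta}_t = (A_t^T A_t)^{-1} A_t^T Y_t, \quad \hat{Y}_t = A_t \hat{\beta}_t, \quad s_t^2 = \frac{1}{\nu} (Y_t - \hat{Y}_t)^T (Y_t - \hat{Y}_t) \]

For π, with the conjugate prior π ~ Beta(a0, b0), the posterior distribution is π | R ~ Beta(n1 + a0, n0 + b0), where $n_1 = \sum_{i,j} c_{ij} r_{ij}$ and $n_0 = \sum_{i,j} c_{ij} (1 - r_{ij})$.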
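The parameter update for one sample t can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and argument layout are hypothetical, the default a0 = b0 = 1 is a placeholder, and the design matrix is assumed to have full column rank so that the inverse exists.

```python
import numpy as np

def update_regression(y_t, z_t, R, rng):
    """Draw (beta_t, sigma_t^2) for one sample t from their posteriors.

    y_t: (N,) mRNA expression, z_t: (M,) miRNA expression in sample t,
    R: (N, M) current binary target indicators.
    Assumes A_t^T A_t is invertible (every miRNA has some targets).
    """
    N, M = R.shape
    A = np.column_stack([np.ones(N), R * z_t[None, :]])   # A_t = [1, B_t]
    AtA_inv = np.linalg.inv(A.T @ A)
    beta_hat = AtA_inv @ A.T @ y_t
    resid = y_t - A @ beta_hat
    nu = N - M - 1
    s2 = resid @ resid / nu
    sigma2 = nu * s2 / rng.chisquare(nu)      # scaled inverse chi-square draw
    beta = rng.multivariate_normal(beta_hat, AtA_inv * sigma2)
    return beta, sigma2

def update_pi(R, C, rng, a0=1.0, b0=1.0):
    """Draw pi from Beta(n1 + a0, n0 + b0); a0 and b0 are placeholder values."""
    n1 = np.sum(C * R)
    n0 = np.sum(C * (1 - R))
    return rng.beta(n1 + a0, n0 + b0)
```

Drawing σt2 first and then βt given σt2 follows the factorization of the joint posterior written above.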

UPDATE THE LATENT VARIABLE

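The marginal distribution of the latent variable is

\[ p(r_{ij} \mid c_{ij} = 1, Y, Z, \theta) \propto \exp\left(-\sum_{t=1}^{T} \frac{1}{\sigma_t^2}\Big(y_{it} - \sum_{k} z_{kt} r_{ik} \beta_{kt} - \beta_{0t}\Big)^2\right) \pi^{c_{ij} r_{ij}} (1-\pi)^{c_{ij}(1-r_{ij})} \]

Since rij is binary, so that rij^2 = rij, the squared residual separates as

\[ \Big(y_{it} - \sum_{k} z_{kt} r_{ik} \beta_{kt} - \beta_{0t}\Big)^2 = \Big(y_{it} - \sum_{k \neq j} z_{kt} r_{ik} \beta_{kt} - \beta_{0t}\Big)^2 + q_{ijt} r_{ij} \]

where qijt denotes

\[ q_{ijt} = z_{jt}^2 \beta_{jt}^2 - 2 y_{it} z_{jt} \beta_{jt} + 2 \sum_{k \neq j} z_{kt} \beta_{kt} z_{jt} \beta_{jt} r_{ik} + 2 z_{jt} \beta_{jt} \beta_{0t} \]

The first term does not contain rij, so

\[ p(r_{ij} \mid \cdot) \propto \exp\left(-\sum_{t=1}^{T} \frac{q_{ijt}}{\sigma_t^2} r_{ij}\right) \pi^{c_{ij} r_{ij}} (1-\pi)^{c_{ij}(1-r_{ij})} \]

that is, rij has a Bernoulli marginal distribution p(rij | ·) ~ Bernoulli(pij) with updated probability

\[ p_{ij} = \frac{\left(\frac{\pi}{1-\pi}\right)^{c_{ij}}}{\left(\frac{\pi}{1-\pi}\right)^{c_{ij}} + \exp\left(\sum_{t=1}^{T} \frac{q_{ijt}}{\sigma_t^2}\right)} \]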
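The updated probability can be computed directly from the current parameter values. The sketch below follows the expression for qijt and pij as written above for a pair with cij = 1; the function name and array layout are hypothetical. Note that it uses qijt / σt2 exactly as stated in the text, whereas a Gaussian density written with its usual factor 1/2 would give qijt / (2 σt2).

```python
import numpy as np

def posterior_target_probability(i, j, Y, Z, R, beta, beta0, sigma2, pi):
    """p_ij for a pair with c_ij = 1, following the formulas in the text.

    Y: (N, T) mRNA expression, Z: (M, T) miRNA expression,
    R: (N, M) current indicators, beta: (M, T) miRNA effects,
    beta0: (T,) backgrounds, sigma2: (T,) noise variances.
    """
    eff = Z * beta                                  # z_kt * beta_kt, (M, T)
    # Sum over k != j of z_kt * beta_kt * r_ik, one value per sample t.
    others = R[i] @ eff - R[i, j] * eff[j]
    q = (eff[j] ** 2 - 2.0 * Y[i] * eff[j]
         + 2.0 * others * eff[j] + 2.0 * eff[j] * beta0)
    # With c_ij = 1, p_ij is a logistic function of these log odds.
    log_odds = np.log(pi / (1.0 - pi)) - np.sum(q / sigma2)
    log_odds = np.clip(log_odds, -700.0, 700.0)     # avoid overflow in exp
    return float(1.0 / (1.0 + np.exp(-log_odds)))
```

Working with the log odds rather than the ratio in the formula gives the same value and is numerically safer when the sum over samples is large.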

THE ALGORITHM OF HCTARGET

Based on the above discussion, we use a traditional MCMC approach to estimate the parameters and the latent variable iteratively:

  1. Initialize βt, σt and R as βt = 1, σt = 1 and rij | cij = 1 ~ Bernoulli(0.5).

  2. Update σt2 by sampling from $\nu s_t^2 \chi_\nu^{-2}$, update βt by sampling from $N\left(\hat{\beta}_t, (A_t^T A_t)^{-1} \sigma_t^2\right)$ and update π by sampling from Beta(n1 + a0, n0 + b0).

  3. Given the updated parameters, sample the latent variable rij from Bernoulli(pij).

  4. Repeat steps 2 and 3 until convergence. Here the convergence is evaluated by the Gelman and Rubin criterion [47].
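The convergence check in step 4 can be illustrated with the usual potential scale reduction factor computed from several parallel chains. The sketch below is one common form of the Gelman and Rubin statistic for a single scalar quantity; the function name is hypothetical, and the choice of which quantity to monitor (for example π or a single pij) is left open here.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one scalar quantity.

    chains: (m, n) array holding m parallel chains of n post-burn-in draws.
    Values close to 1 suggest that the chains have mixed.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()          # W
    between = n * chains.mean(axis=1).var(ddof=1)       # B
    var_hat = (n - 1) / n * within + between / n        # pooled variance estimate
    return float(np.sqrt(var_hat / within))
```

In practice the sampler would be run from several dispersed starting points and stopped once this factor falls below a chosen cutoff for all monitored quantities.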

We output pij, which represents the probability that miRNA j targets mRNA i given the data, for our final prediction. miRNA-mRNA pairs with pij larger than a certain threshold are the putative target pairs of our model. In the analysis of cancer expression data, we specify the threshold as 0.8, so that our selected miRNA targets covered nearly 50% of the sequence-based predictions, and they are comparable with GenMiR++ targets.

RESULTS

We applied HCtarget to study miRNA’s role in cancer. The computational predictions were extracted from TargetScanHuman (release 5.1). Several paired miRNA-mRNA expression datasets, such as breast cancer data (GSE19783), prostate cancer data(GSE7055) and multiple myeloma data(GSE17306) were downloaded from GEO database [48]. Since their results are similar, we took the multiple myeloma data as an example in our analysis. It profiled miRNA and mRNA expressions from 52 patients with multiple myeloma [9].

We selected multiple myeloma related miRNAs and mRNAs for our predictions. The ten miRNAs with the highest expression levels were selected: hsa-let-7g, hsa-miR-142-3p, hsa-miR-148a, hsa-miR-16, hsa-miR-19b, hsa-miR-21, hsa-miR-26a, hsa-miR-29c, hsa-miR-370 and hsa-miR-494. Meanwhile, 1000 mRNAs were selected, half with the highest expression and half with the lowest expression, since miRNAs putatively repress gene expression and may have secondary up-regulatory effects [49].

PERFORMANCE OF HCTARGET ON THE SIMULATION DATA

First, we generated a simulated dataset to compare the performance of GenMiR++ and HCtarget. The expression data Z of the ten miRNAs were extracted from the real data from patients with multiple myeloma, while the expressions Y of 1000 mRNAs were simulated from

\[ y_{it} = \beta_{0t} + \sum_{j=1}^{10} r_{ij} z_{jt} \beta_{jt} + \varepsilon_{it}, \quad i = 1, \ldots, 1000, \quad t = 1, \ldots, 52 \]

here β0t, β0t and ε were generated from N(-0.3, 0.1), N(1,1) and N(0,1) respectively. The real target relations rij were drawn from Bernoulli(0.5) conditional on cij = 1, where cij represents the predictions in TargetScan.
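A simulation of this kind can be sketched as follows. Several choices here are assumptions, because the text lists β0t twice: the sketch takes the regulatory effects βjt from N(-0.3, 0.1) and the background β0t from N(1, 1), and it reads the second argument of each normal distribution as a variance. The function name is hypothetical, and Z and C must be supplied by the caller.

```python
import numpy as np

def simulate_expression(Z, C, rng):
    """Simulate mRNA expression Y and true target indicators R.

    Z: (M, T) miRNA expression, C: (N, M) binary sequence predictions.
    Assumed parameter assignment (not fixed by the text):
    beta_jt ~ N(-0.3, 0.1), beta_0t ~ N(1, 1), eps ~ N(0, 1),
    with the second argument treated as a variance.
    """
    M, T = Z.shape
    N = C.shape[0]
    R = C * rng.binomial(1, 0.5, size=C.shape)        # r_ij = 0 wherever c_ij = 0
    beta = rng.normal(-0.3, np.sqrt(0.1), size=(M, T))
    beta0 = rng.normal(1.0, 1.0, size=T)
    eps = rng.normal(0.0, 1.0, size=(N, T))
    Y = beta0[None, :] + R @ (Z * beta) + eps          # sum_j r_ij z_jt beta_jt
    return Y, R
```

The returned R serves as the ground truth against which predicted target probabilities can be scored.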

Applying GenmiR++ and HCtarget on the simulation data, we computed their true positive rate and false positive rate with different cutoffs. Their ROC (Receiver operating characteristic) curves and AUC (the area under the ROC curve) values are shown in Fig. (1), which indicate that HCTarget has higher accuracy than GenMiR++.
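For reference, the ROC curve and its AUC can be obtained from predicted probabilities and known target labels with a few lines of NumPy. This is a generic sketch with hypothetical function names; it treats every score as its own cutoff and does not apply special handling to tied scores.

```python
import numpy as np

def roc_curve(scores, labels):
    """False and true positive rates as the cutoff sweeps from high to low.

    scores: predicted target probabilities, labels: 1 for true targets, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(-scores, kind="stable")
    sorted_labels = labels[order]
    tp = np.cumsum(sorted_labels)
    fp = np.cumsum(1 - sorted_labels)
    tpr = np.concatenate([[0.0], tp / tp[-1]])
    fpr = np.concatenate([[0.0], fp / fp[-1]])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of targets from non-targets.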

Fig. (1).

The ROC curves of HCTarget and GenMiR++ for simulation data. Their AUC values are 0.95 and 0.91 respectively.

PREDICT miRNA TARGETS BASED ON CANCER EXPRESSION DATA

We then applied our HCtarget approach to the real miRNA-mRNA expression data to detect miRNA targets in cancer. TargetScan provides 1401 target pairs for our selected miRNAs and genes. HCtarget cuts down these predictions to 647, while 699 target pairs are obtained by GenMiR++.

To assess the robustness of HCtarget, we performed a series of permutation tests [24]. We permuted the gene labels 1000 times and generated 1000 random data sets. In these sets, the relationships between miRNAs and mRNAs are destroyed, so their predicted target probabilities can be regarded as background. These permutations allow us to evaluate whether our model would be affected by introducing a large number of false targets into the candidates. Comparing the predictions of HCtarget for the permuted and original data, we found that the probabilities learned from the real data are significantly higher than the background. The p value of the one-sided Wilcoxon test is 0.1. In addition, the proportion of probabilities larger than 0.8 for the real data (46.2%) is higher than for the permuted data (44.1%). This illustrates that HCtarget can discriminate the real targets from the false ones, which supports its robustness in target prediction.
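The permutation scheme can be sketched independently of the scoring method. In the illustration below the gene labels are permuted by shuffling the rows of the mRNA expression matrix relative to the sequence predictions, which is one reading of the procedure described above. The function names are hypothetical, and `neg_corr_score` is only a simple stand-in for the HCtarget probabilities so that the example is self-contained.

```python
import numpy as np

def neg_corr_score(Y, Z, C):
    """Stand-in score: negative Pearson correlation for every mRNA-miRNA pair."""
    zy = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, keepdims=True)
    zz = (Z - Z.mean(axis=1, keepdims=True)) / Z.std(axis=1, keepdims=True)
    return -(zy @ zz.T) / Y.shape[1]

def permutation_background(Y, Z, C, score_fn, n_perm=1000, rng=None):
    """Scores of candidate pairs on real data and on label-permuted data.

    Y: (N, T) mRNA expression, Z: (M, T) miRNA expression,
    C: (N, M) binary candidates, score_fn(Y, Z, C) -> (N, M) scores.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = score_fn(Y, Z, C)[C == 1]
    background = []
    for _ in range(n_perm):
        perm = rng.permutation(Y.shape[0])     # shuffle gene labels
        background.append(score_fn(Y[perm], Z, C)[C == 1])
    return observed, np.concatenate(background)
```

The two returned samples can then be compared with a one-sided rank test or by the fraction of scores above a threshold.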

Furthermore, we extracted experimentally supported miRNA targets from Tarbase (v.5c) [12] to evaluate the accuracy of our approach. To compare Tarbase with our predictions, miRNAs were all mapped to miRNA families using the annotations in miRBase [50]. For the multiple myeloma related miRNAs and mRNAs, three miRNAs and their 17 target genes have biological verifications. Nine of them are detected by HCtarget, while GenMiR++ only identifies two. The numbers of verified targets predicted by TargetScan, GenMiR++ and HCtarget as well as their precisions are listed in Table 2, which shows that HCtarget identifies more accurate targets than GenMiR++. For example, mir-15 has nine supported targets; seven of them are detected by HCtarget, while GenMiR++ fails to identify any of them. It also indicates that HCtarget has higher precision (2.78%, 18 out of 647) than the original TargetScan (2.43%, 34 out of 1401).

Table 2.

Comparison with Tarbase

miRNA family    TargetScan    GenMiR++     HCtarget
let-7           7 (3.57%)     2 (2.02%)    2 (2.15%)
mir-15          9 (4.02%)     0 (0)        7 (6.67%)
mir-29          1 (0.51%)     0 (0)        0 (0)
total           17 (2.76%)    2 (0.95%)    9 (3.01%)

VALIDATION OF HSA-MIR-16 TARGETS

Previous analysis suggests that hsa-miR-16 can act as a tumor suppressor in multiple myeloma [51]. We extracted a loss-of-function expression profile of hsa-miR-16 from the GEO database (GSE24522), which provides gene expression levels before and after hsa-miR-16 deletion [51]. We defined genes with a fold change larger than 1.5 as differentially expressed genes. Of our 1000 genes, 132 were selected.

To validate our predictions, we compared our detected targets with these differentially expressed genes. TargetScan identifies 224 targets for hsa-miR-16, 34 of which change their expression levels when hsa-miR-16 is deleted (hypergeometric test, p = 0.14). HCtarget, which cuts down the target genes to 105, provides 22 validated targets (p = 0.006) (Fig. 2). Thus, HCtarget yields a higher proportion of confirmed targets (22 of 105) than TargetScan (34 of 224). In addition, GenMiR++ detects only 11 differentially expressed genes (p = 0.72), which further supports the accuracy of HCtarget.

Fig. (2).

Venn diagram showing the overlap of the differentially expressed genes with the predicted targets of TargetScan and HCtarget.
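The enrichment test used in this validation can be sketched with an upper-tail hypergeometric probability. The sketch below assumes that the background is the 1000 selected genes, that 132 of them are differentially expressed, and that the p value is P(X >= observed); the text does not spell out these details, so the computed values need not reproduce the reported p values exactly. The function name is illustrative.

```python
from math import comb

def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: number of background genes
    successes:  number of differentially expressed genes in the background
    draws:      number of predicted targets
    observed:   number of predicted targets that are differentially expressed
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Counts taken from the text; the background size is an assumption.
p_targetscan = hypergeom_upper_tail(1000, 132, 224, 34)
p_hctarget = hypergeom_upper_tail(1000, 132, 105, 22)
print(f"TargetScan: {p_targetscan:.3f}, HCtarget: {p_hctarget:.3f}")
```

Exact integer arithmetic with `math.comb` avoids overflow for gene sets of this size; only the final ratio is converted to a floating-point number.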

GENE ONTOLOGY ENRICHMENT ANALYSIS

To further investigate our predicted targets, we analyzed their functional annotations in Gene Ontology (GO) [52, 53]. For each miRNA target set detected by TargetScan and HCtarget, respectively, we computed its GO enrichment p value using the hypergeometric test. To account for multiple testing, these p values were corrected using FDR adjustment. For TargetScan, we found 107 (2.5%) functional target sets (FDR<0.1). GenMiR++ yields 135 (3.1%) functional sets, and HCtarget increases the number to 158 (3.7%). The comparison shows that the targets of HCtarget have more consistent functional annotations.

Meanwhile, we selected the GO functions that are significantly enriched (FDR<0.01) in the target set of hsa-miR-19b, which has been experimentally verified to be a key regulator in multiple myeloma [54]. They are: GO:0034612 (response to tumor necrosis factor), GO:0000723 (telomere maintenance), GO:0006289 (nucleotide-excision repair), GO:0006302 (double-strand break repair) and GO:0045732 (positive regulation of protein catabolic process). The first annotation is significantly associated with multiple myeloma; the next three are crucial functions in cell division, a key cellular process in cancer; and the last one is putatively important in metabolism. These findings indicate that HCtarget can successfully identify functional miRNA targets.
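The multiple-testing step can be illustrated with the Benjamini-Hochberg procedure. Note that the text only refers to an FDR correction without naming the method, so the choice of Benjamini-Hochberg here is an assumption, and the p values in the example are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical enrichment p values for five GO terms of one target set.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = benjamini_hochberg(raw_p)
n_functional = sum(q < 0.1 for q in fdr)
print(fdr, n_functional)
```

A target set would then be counted as functional when at least one of its GO terms passes the chosen FDR threshold (0.1 in the text).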

EXAMPLE

Based on the above findings, we further focused on a specific target pair to explore the role of miRNAs in multiple myeloma. We selected hsa-miR-19b; one of its targets detected by HCtarget is SULF1, which has been found to be a potent inhibitor of myeloma tumor growth [55]. We focused on the patients with higher hsa-miR-19b expression (expression level larger than the average) and found that the expression levels of SULF1 are significantly lower in these patients than in the others (one-sided Wilcoxon test, p = 0.1). Their cumulative distributions (Fig. 3) show that the expression of SULF1 is shifted downward when hsa-miR-19b is highly expressed. This example further confirms the down-regulatory effect of hsa-miR-19b and provides a reliable target gene, SULF1. We believe that this target pair plays a crucial role in multiple myeloma and could serve as an effective candidate for therapeutic treatment.

Fig. (3).

The down-regulatory effect of hsa-miR-19b on SULF1. Cumulative distributions of the expression level of SULF1 in samples with and without highly expressed hsa-miR-19b (red solid line and blue dashed line, respectively).
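The comparison in this example can be sketched as a split of the samples by mean miRNA expression followed by a one-sided Wilcoxon rank-sum test. The sketch below uses a normal approximation without tie or continuity correction, which is a simplification; the expression values are toy numbers, not data from the study, and the function names are illustrative.

```python
from math import erf, sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_less(x, y):
    """One-sided p value for the alternative 'x tends to be lower than y'."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U of sample x
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

def split_by_mean(mirna, mrna):
    """Split mRNA values by whether the paired miRNA value exceeds its mean."""
    cutoff = sum(mirna) / len(mirna)
    high = [g for m, g in zip(mirna, mrna) if m > cutoff]
    low = [g for m, g in zip(mirna, mrna) if m <= cutoff]
    return high, low

# Toy paired profiles over eight samples (invented values).
mir19b = [1, 2, 3, 4, 9, 10, 11, 12]
sulf1 = [8.1, 7.9, 8.4, 7.6, 5.2, 5.9, 4.8, 5.5]

sulf1_high, sulf1_low = split_by_mean(mir19b, sulf1)
p_value = rank_sum_less(sulf1_high, sulf1_low)
print(f"one-sided p = {p_value:.4f}")
```

For real data with many ties or small groups, an exact test or a tie-corrected variance would be preferable to this approximation.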

CONCLUSION

In this paper, we review and discuss the integrative approaches that predict miRNA target genes by combining the sequence information and expression profiles.

We also propose a new algorithm, HCtarget. The simulation study and the robustness assessment confirm the accuracy of our approach. The analysis of the expression profiles in multiple myeloma also demonstrates the good performance of HCtarget. Our model provides reliable miRNA targets, which improve our understanding of the roles of miRNAs in cancer. For example, the disease-related target pair hsa-miR-19b and SULF1 may benefit further research on, and clinical treatment of, multiple myeloma. Furthermore, given other suitable miRNA and mRNA expression profiles, HCtarget could be generalized to provide genome-wide miRNA target predictions, which would help in comprehensively discovering the regulatory effects of miRNAs.

In general, integrative approaches improve miRNA target prediction. They could be directly generalized to detect the target genes of TFs. In addition, previous studies have demonstrated that TFs, or their cis-regulatory modules, cooperate widely with miRNAs, and that their combinatorial regulatory modules play important parts in gene regulation [56]. With accurate target predictions for miRNAs and TFs, integrative approaches could effectively construct gene regulatory networks, which would help us to uncover the mechanisms underlying gene expression.

ACKNOWLEDGEMENTS

This work is supported by the National Natural Science Foundation of China (No.31171262, No.11021463), and the National Key Basic Research Project of China (No.2009CB918503).

CONFLICT OF INTEREST

The authors declare that they have no competing interests.

REFERENCES

  • 1. Volinia S, Calin GA, Liu CG, et al. A microRNA expression signature of human solid tumors defines cancer gene targets. Proc Natl Acad Sci USA. 2006;103(7):2257–61. doi: 10.1073/pnas.0510565103.
  • 2. Bartel DP. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell. 2004;116(2):281–97. doi: 10.1016/s0092-8674(04)00045-5.
  • 3. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120(1):15–20. doi: 10.1016/j.cell.2004.12.035.
  • 4. Brosh R, Shalgi R, Liran A, et al. p53-repressed miRNAs are involved with E2F in a feed-forward loop promoting proliferation. Mol Syst Biol. 2008;4. doi: 10.1038/msb.2008.65.
  • 5. Marson A, Levine SS, Cole MF, et al. Connecting microRNA genes to the core transcriptional regulatory circuitry of embryonic stem cells. Cell. 2008;134(3):521–33. doi: 10.1016/j.cell.2008.07.020.
  • 6. Calin GA, Croce CM. MicroRNA signatures in human cancers. Nat Rev Cancer. 2006;6:857–66. doi: 10.1038/nrc1997.
  • 7. Care A, Catalucci D, Felicetti F, et al. MicroRNA-133 controls cardiac hypertrophy. Nat Med. 2007;13(5):613–8. doi: 10.1038/nm1582.
  • 8. Wang X, Tang S, Le SY, et al. Aberrant expression of oncogenic and tumor-suppressive microRNAs in cervical cancer is required for cancer cell growth. PLoS One. 2008;3(7):e2557. doi: 10.1371/journal.pone.0002557.
  • 9. Zhou Y, Chen L, Barlogie B, et al. High-risk myeloma is associated with global elevation of miRNAs and overexpression of EIF2C2/AGO2. Proc Natl Acad Sci USA. 2010;107(17):7904–9. doi: 10.1073/pnas.0908441107.
  • 10. Obad S, dos Santos CO, Petri A, et al. Silencing of microRNA families by seed-targeting tiny LNAs. Nat Genet. 2011;43(4):371–8. doi: 10.1038/ng.786.
  • 11. Li L, Xu J, Yang D, Tan X, Wang H. Computational approaches for microRNA studies: a review. Mamm Genome. 2010;21(1-2):1–12. doi: 10.1007/s00335-009-9241-2.
  • 12. Sethupathy P, Corda B, Hatzigeorgiou AG. TarBase: A comprehensive database of experimentally supported animal microRNA targets. RNA. 2006;12(2):192–7. doi: 10.1261/rna.2239606.
  • 13. Xiao F, Zuo Z, Cai G, Kang S, Gao X, Li T. miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res. 2009;37(Database issue):D105–10. doi: 10.1093/nar/gkn851.
  • 14. Barbato C, Arisi I, Frizzo ME, Brandi R, Da Sacco L, Masotti A. Computational challenges in miRNA target predictions: to be or not to be a true target? J Biomed Biotechnol. 2009;2009:803069. doi: 10.1155/2009/803069.
  • 15. Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136(2):215–33. doi: 10.1016/j.cell.2009.01.002.
  • 16. Min H, Yoon S. Got target? Computational methods for microRNA target prediction and their extension. Exp Mol Med. 2010;42(4):233–44. doi: 10.3858/emm.2010.42.4.032.
  • 17. Saito T, Saetrom P. MicroRNAs--targeting and target prediction. N Biotechnol. 2010;27(3):243–9. doi: 10.1016/j.nbt.2010.02.016.
  • 18. Dai Y, Zhou X. Computational methods for the identification of microRNA targets. Open Access Bioinformatics. 2010;2:29–39. doi: 10.2147/OAB.S6902.
  • 19. Friedman RC, Farh KK, Burge CB, Bartel DP. Most mammalian mRNAs are conserved targets of microRNAs. Genome Res. 2009;19(1):92–105. doi: 10.1101/gr.082701.108.
  • 20. Krek A, Grun D, Poy MN, et al. Combinatorial microRNA target predictions. Nat Genet. 2005;37(5):495–500. doi: 10.1038/ng1536.
  • 21. Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS. MicroRNA targets in Drosophila. Genome Biol. 2003;5(1):R1. doi: 10.1186/gb-2003-5-1-r1.
  • 22. Bentwich I. Prediction and validation of microRNAs and their targets. FEBS Lett. 2005;579(26):5904–10. doi: 10.1016/j.febslet.2005.09.040.
  • 23. Wang L, Oberg AL, Asmann YW, et al. Genome-wide transcriptional profiling reveals microRNA-correlated genes and biological processes in human lymphoblastoid cell lines. PLoS One. 2009;4(6):e5878. doi: 10.1371/journal.pone.0005878.
  • 24. Huang JC, Morris QD, Frey BJ. Bayesian inference of MicroRNA targets from sequence and expression data. J Comput Biol. 2007;14(5):550–63. doi: 10.1089/cmb.2007.R002.
  • 25. Nam S, Li M, Choi K, Balch C, Kim S, Nephew KP. MicroRNA and mRNA integrated analysis (MMIA): a web tool for examining biological functions of microRNA expression. Nucleic Acids Res. 2009;37(Web Server issue):W356–62. doi: 10.1093/nar/gkp294.
  • 26. Peng X, Li Y, Walters KA, et al. Computational identification of hepatitis C virus associated microRNA-mRNA regulatory modules in human livers. BMC Genomics. 2009;10:373. doi: 10.1186/1471-2164-10-373.
  • 27. Huang GT, Athanassiou C, Benos PV. mirConnX: condition-specific mRNA-microRNA network integrator. Nucleic Acids Res. 2011;39(Web Server issue):W416–23. doi: 10.1093/nar/gkr276.
  • 28. Sales G, Coppe A, Bisognin A, Biasiolo M, Bortoluzzi S, Romualdi C. MAGIA, a web-based tool for miRNA and Genes Integrated Analysis. Nucleic Acids Res. 2010;38(Web Server issue):W352–9. doi: 10.1093/nar/gkq423.
  • 29. Bandyopadhyay S, Mitra R. TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples. Bioinformatics. 2009;25(20):2625–31. doi: 10.1093/bioinformatics/btp503.
  • 30. Gamazon ER, Im HK, Duan S, et al. Exprtarget: an integrative approach to predicting human microRNA targets. PLoS One. 2010;5(10):e13534. doi: 10.1371/journal.pone.0013534.
  • 31. Nam S, Kim B, Shin S, Lee S. miRGator: an integrated system for functional annotation of microRNAs. Nucleic Acids Res. 2008;36(Database issue):D159–64. doi: 10.1093/nar/gkm829.
  • 32. Hausser J, Berninger P, Rodak C, Jantscher Y, Wirth S, Zavolan M. MirZ: an integrated microRNA expression atlas and target prediction resource. Nucleic Acids Res. 2009;37(Web Server issue):W266–72. doi: 10.1093/nar/gkp412.
  • 33. Ritchie W, Flamant S, Rasko JE. mimiRNA: a microRNA expression profiler and classification resource designed to identify functional correlations between microRNAs and their targets. Bioinformatics. 2010;26(2):223–7. doi: 10.1093/bioinformatics/btp649.
  • 34. Gennarino VA, Sardiello M, Avellino R, et al. MicroRNA target prediction by expression analysis of host genes. Genome Res. 2009;19(3):481–90. doi: 10.1101/gr.084129.108.
  • 35. Huang JC, Babak T, Corson TW, et al. Using expression profiling data to identify human microRNA targets. Nat Methods. 2007;4(12):1045–9. doi: 10.1038/nmeth1130.
  • 36. Stingo FC, Chen YA, Vannucci M, et al. A Bayesian graphical modeling approach to microRNA regulatory network inference. Ann Appl Stat. 2010;4:2024–48. doi: 10.1214/10-AOAS360.
  • 37. Li J, Min R, Bonner A, Zhang Z. A probabilistic framework to improve microrna target prediction by incorporating proteomics data. J Bioinform Comput Biol. 2009;7(6):955–72. doi: 10.1142/s021972000900445x.
  • 38. Lu Y, Zhou Y, Qu W, Deng M, Zhang C. A Lasso regression model for the construction of microRNA-target regulatory networks. Bioinformatics. 2011;27(17):2406–13. doi: 10.1093/bioinformatics/btr410.
  • 39. Liu B, Li J, Tsykin A, Liu L, Gaur AB, Goodall GJ. Exploring complex miRNA-mRNA interactions with Bayesian networks by splitting-averaging strategy. BMC Bioinformatics. 2009;10:408. doi: 10.1186/1471-2105-10-408.
  • 40. Attias H. A variational Bayesian framework for graphical models. In: Solla SA, Leen TK, Müller KR, editors. Advances in Neural Information Processing Systems 12. Cambridge: MIT Press; 2000. pp. 209–15.
  • 41. Huang JC, Frey BJ, Morris Q. Comparing Sequence and Expression for Predicting microRNA Targets Using GenMIR3. Pacific Symposium on Biocomputing 2008. 2008:52–63. doi: 10.1142/9789812776136_0007.
  • 42. Friedman N, Linial M, Nachman I, Pe'er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7(3-4):601–20. doi: 10.1089/106652700750050961.
  • 43. Lu L, Li J. A combinatorial approach to determine the context-dependent role in transcriptional and posttranscriptional regulation in Arabidopsis thaliana. BMC Syst Biol. 2009;3:43. doi: 10.1186/1752-0509-3-43.
  • 44. Neapolitan RE. Learning Bayesian networks. Upper Saddle River, NJ: Pearson Prentice Hall; 2004.
  • 45. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian data analysis. Chapman and Hall; 2004.
  • 46. Sun N, Carroll RJ, Zhao H. Bayesian error analysis model for reconstructing transcriptional regulatory networks. Proc Natl Acad Sci USA. 2006;103(21):7988–93. doi: 10.1073/pnas.0600164103.
  • 47. Tanner MA. Tools for statistical inference: Methods for the exploration of posterior distributions and likelihood functions. Springer Verlag; 1996.
  • 48. NCBI: http://www.ncbi.nlm.nih.gov
  • 49. Tu K, Yu H, Hua YJ, et al. Combinatorial network of primary and secondary microRNA-driven regulatory mechanisms. Nucleic Acids Res. 2009;37(18):5969–80. doi: 10.1093/nar/gkp638.
  • 50. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36:D154–D8. doi: 10.1093/nar/gkm952.
  • 51. Gatt ME, Zhao JJ, Ebert MS, et al. MicroRNAs 15a/16-1 function as tumor suppressor genes in multiple myeloma. Blood. 2011;117(26):7188. doi: 10.1182/blood-2011-04-348722.
  • 52. GO: http://www.geneontology.org/
  • 53. Deng M, Tu Z, Sun F, Chen T. Mapping Gene Ontology to proteins based on protein-protein interaction data. Bioinformatics. 2004;20(6):895–902. doi: 10.1093/bioinformatics/btg500.
  • 54. Pichiorri F, Suh SS, Ladetto M, et al. MicroRNAs regulate critical genes associated with multiple myeloma pathogenesis. Proc Natl Acad Sci USA. 2008;105(35):12885–90. doi: 10.1073/pnas.0806202105.
  • 55. Dai Y, Yang Y, MacLeod V, et al. HSulf-1 and HSulf-2 are potent inhibitors of myeloma tumor growth in vivo. J Biol Chem. 2005;280(48):40066–73. doi: 10.1074/jbc.M508136200.
  • 56. Su N, Wang Y, Qian M, Deng M. Combinatorial regulation of transcription factors and microRNAs. BMC Syst Biol. 2010;4:150. doi: 10.1186/1752-0509-4-150.

Articles from Current Bioinformatics are provided here courtesy of Bentham Science Publishers