Increasing MicroRNA Target Prediction Confidence by the Relative R-squared Method

Hsiuying Wang; Wen-Hsiung Li

doi:10.1016/j.jtbi.2009.05.007

Author manuscript; available in PMC: 2010 Aug 21.

Published in final edited form as: J Theor Biol. 2009 May 20;259(4):793–798. doi: 10.1016/j.jtbi.2009.05.007

PMCID: PMC2744435 NIHMSID: NIHMS119512 PMID: 19463832

Abstract

MicroRNAs (miRNAs) are short noncoding RNAs involved in post-transcriptional gene regulation via binding to mRNAs. Studies show that in a multicellular organism miRNAs downregulate a large number of target mRNAs. However, predicting the target genes of a miRNA is challenging. Microarray expression profiling has been proposed as a complementary method to increase the confidence of miRNA target prediction, but it can become computationally costly or even intractable when many miRNAs and their effects across multiple tissues are to be considered. Here, we propose a statistical method, the relative R² method, to find high-confidence targets among the set of potential targets predicted by a computational method such as TargetScanS or by microarray analysis, when expression data of both miRNAs and mRNAs are available for multiple tissues. Applying this method to existing data, we obtain many high-confidence targets in mouse.

Keywords: microRNA, microarray, regression model, TargetScanS

1. Introduction

MicroRNAs (miRNAs), which are single-stranded RNAs of ~20–23 nucleotides, posttranscriptionally regulate gene expression. Computational and molecular cloning approaches have revealed hundreds of miRNAs in a variety of organisms (Ambros et al. 2003; Houbaviy et al. 2003; Lim et al. 2003; Kim et al. 2004). Many computational methods have been developed to predict miRNA targets; for example, TargetScanS predicts the targets of a miRNA by searching for the presence of conserved 8mer or 7mer sites that match the seed region of the miRNA. A small portion of these predicted targets have been experimentally validated, showing a relatively high accuracy for target prediction (Sethupathy et al. 2006).

Computational approaches currently use sequence complementarity and most of them also use evolutionary conservation to identify potential targets. It has been suggested that the false-discovery rate for computationally predicted targets is ~50% (Lewis et al. 2005; Farh et al. 2005). Besides the sequence complementarity approach, Grimson et al. (2007) pointed out that a crux for target recognition is ~7 nt sites that match the seed region of the miRNA. Since these seed matches are not always sufficient for repression, they uncovered five general features of site context that boost binding efficacy: AU-rich nucleotide composition near the site, proximity to sites for coexpressed miRNAs, proximity to residues pairing to miRNA nucleotides 13–16, positioning within the 3’UTR at least 15 nt from the stop codon, and positioning away from the center of long UTRs.

Profiling miRNA expression is very helpful for studying the biological functions of miRNAs, so it has been used as a complementary method for discovering miRNA targets (Lim et al. 2005). However, this method can become computationally complicated when multiple miRNAs and their effects across multiple tissues are to be considered. To overcome this difficulty, we use statistical methods to build up a network of associations between the miRNAs and their target mRNAs.

In this study, the relative R² method is proposed to select high-confidence targets from predicted targets. The relative R² method can be explained statistically from the degree of fitness of a model in terms of a subset of independent variables.

A method for finding miRNA targets using Bayesian variation analysis was recently proposed by Huang et al. (2007). This method is complicated and requires extensive calculations. We apply our method to the same dataset used in Huang et al. (2007) and select 448 high-confidence targets such that the relative R² for each target reaches 0.995. This is considerably higher than the relative R² of the targets predicted by Huang et al. (2007), which averages less than 0.9.

2. Methods

We introduce our method with the usual linear regression model, before considering the general model.

2.1. The regression model

Consider n miRNAs, z₁, …, z_n, and l tissues, t₁, …, t_l. Assume that the expression levels of the n miRNAs in tissue t_j are z_1j, …, z_nj. Prediction methods, such as TargetScanS and microarray analysis, can be used to predict potential targets for each of these n miRNAs. Our method selects high-confidence miRNA targets from the set of predicted miRNA targets, using microarray expression data.

First, for a given mRNA, we find from the set of potential targets the miRNAs, say z₁, …, z_k, that each have this mRNA as a potential target. Our goal is to find among these k miRNAs those that have a significant effect on the expression level of this mRNA. Assume that the expression levels of the mRNA in the l tissues are y₁, y₂, …, y_l. We fit the microarray expression data of the mRNA in terms of the microarray expression data of the k miRNAs using the regression model

y_{j} = b_{0} z_{0j} + b_{1} z_{1j} + b_{2} z_{2j} + \dots + b_{k} z_{kj} + ε_{j}, \quad j = 1, \dots, l,

(1)

where ε_j is the error term.

In model (1), the best (least-squares) estimator of β = (b₀, b₁, b₂, …, b_k)' is β̂ = (b̂₀, b̂₁, …, b̂_k)' = (Z^TZ)⁻¹Z^TY, where Y = (y₁, y₂, …, y_l)', Z = (z_ij) is the l × (k+1) matrix whose row j is (z_0j, z_1j, …, z_kj), and (z₀₁, …, z_0l) = (1, …, 1). Note that b₀ in (1) is the basal expression level of the mRNA and b_i is the weight with which miRNA z_i affects the expression level of the mRNA.

Let f_i = (Zβ̂)_i be an estimator of y_i. Define $SS_{total} = \sum_{i} (y_{i} - \bar{y})^{2}$ and $SS_{reg} = \sum_{i} (f_{i} - \bar{y})^{2}$, where ȳ is the mean of y₁, y₂, …, y_l. The R² for a linear regression model is defined as SS_reg/SS_total, which is a statistic that gives information about the goodness of fit of a model. In regression, the R² coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R² of 1.0 indicates that the regression line perfectly fits the data. R² is often interpreted as the proportion of response variation explained by the regressors in the model. Thus, R² = 1 indicates that the fitted model explains all variability in y, while R² = 0 indicates no linear relationship between the response variable and the regressors. However, we do not directly use R² in this study, but use the relative R² values as a criterion to choose high-confidence targets. The definition of relative R² is given later.
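As a minimal sketch of the fit in model (1) and the R² defined above, the following Python code builds the design matrix with a constant column, estimates β̂ by least squares, and computes SS_reg/SS_total. The function name `fit_ols_r2` and the toy data are illustrative assumptions, not part of the original study.

```python
import numpy as np

def fit_ols_r2(y, mirna_expr):
    """Fit model (1) by least squares and return (beta_hat, fitted, r2).

    y          : length-l vector, expression of one mRNA across l tissues
    mirna_expr : l x k array, column i holds the expression of miRNA z_i
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(mirna_expr, dtype=float).reshape(len(y), -1)
    # Prepend the constant column z_0j = 1 so that b_0 is the basal level.
    Z = np.column_stack([np.ones(len(y)), x])
    # lstsq gives the same solution as (Z^T Z)^{-1} Z^T Y when Z has full
    # column rank, without forming the inverse explicitly.
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ beta_hat
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_reg = np.sum((fitted - y.mean()) ** 2)
    return beta_hat, fitted, ss_reg / ss_total


# Toy data (made up): one miRNA that represses the mRNA exactly linearly.
z1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 5.0 - 2.0 * z1
beta_hat, fitted, r2 = fit_ols_r2(y, z1.reshape(-1, 1))
print(beta_hat, r2)
```

Because the toy response is an exact linear function of the single regressor, the fit recovers the intercept and slope and the R² equals 1 up to rounding.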

Our method of selecting miRNAs that significantly affect the level of the mRNA is first to rank the k miRNAs according to their p-values: the smaller the p-value, the higher the rank. The p-value of miRNA z_i is defined as the probability

P (| W | \geq \frac{| {\hat{b}}_{i} |}{\sqrt{Var ({\hat{b}}_{i})}}),

which is the p-value used to test H₀: b_i = 0, where W denotes the standard normal random variable. Note that $\hat{β} \sim N(β, (Z^{T} Z)^{-1} σ^{2})$ and σ² can be estimated by the sample variance $\hat{σ}^{2} = \sum_{i = 1}^{l} (y_{i} - f_{i})^{2} / (l - r)$, where r denotes the rank of Z. Thus, Var(b̂_i) can be approximated by the diagonal element of (Z^TZ)⁻¹σ̂² corresponding to b_i. Note that if the number of tissues l is small, a more accurate probability approximation may be obtained by replacing the standard normal random variable W with the T statistic, which follows the t distribution.
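The p-value computation described above can be sketched as follows, under the normal approximation. The function name `coefficient_p_values` and the simulated data are assumptions made for illustration; the sketch also assumes that Z has full column rank so that Z^TZ is invertible.

```python
import math
import numpy as np

def coefficient_p_values(y, mirna_expr):
    """Return (b_hat, p_values) for the k miRNA coefficients (intercept dropped)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(mirna_expr, dtype=float).reshape(len(y), -1)
    Z = np.column_stack([np.ones(len(y)), x])
    beta_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)
    residuals = y - Z @ beta_hat
    l, r = len(y), np.linalg.matrix_rank(Z)
    # Estimate of sigma^2 with l - r degrees of freedom.
    sigma2_hat = residuals @ residuals / (l - r)
    # Var(b_hat_i) is approximated by the matching diagonal element.
    std_err = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(Z.T @ Z)))
    w = np.abs(beta_hat) / std_err
    # For standard normal W, P(|W| >= w) equals erfc(w / sqrt(2)).
    p = np.array([math.erfc(v / math.sqrt(2.0)) for v in w])
    return beta_hat[1:], p[1:]


# Simulated data (made up): 17 tissues, miRNA 0 represses the mRNA,
# miRNA 1 is unrelated to it.
rng = np.random.default_rng(0)
x = rng.normal(size=(17, 2))
y = 3.0 - 2.0 * x[:, 0] + rng.normal(scale=0.1, size=17)
b_hat, p_values = coefficient_p_values(y, x)
print(b_hat, p_values)
```

For a small number of tissues, the normal tail could be replaced by a t-distribution tail with l − r degrees of freedom, as the text notes; that variant is not shown here.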

Rank a miRNA as the j th most significant miRNA if its p-value is the j th smallest. Calculate the R² for model (1), say g_k. Consider the miRNAs that have a p-value less than a critical value, say p₀. Since a p-value indicates the significance of the effect of the miRNA on the mRNA, it is reasonable to require that the p-values of the selected miRNAs not be large; for example, we can choose p₀ as 0.5 or less. Because p-values range between 0 and 1, the midpoint 0.5 is a natural upper threshold, and we suggest choosing p₀ near 0.5 to reduce the chance that too few miRNAs are included in the analysis. Note that p₀ is not the main criterion in this approach; the main criterion is the relative R² value.

Assume that there are m (m ≤ k) miRNAs, z₁, …, z_m, whose p-values are less than p₀. We can use the m miRNAs to fit the microarray expression data of the mRNA. The model is

y_{i} = c_{0} z_{0i} + c_{1} z_{1i} + c_{2} z_{2i} + \dots + c_{m} z_{mi} + ε_{i}, \quad i = 1, \dots, l,

(2)

Let c = (c₀, c₁, …, c_m)', let Z_r = (z_ij) be the l × (m+1) matrix whose row i is (z_0i, z_1i, …, z_mi), and

\hat{c} = (\hat{c}_{0}, \hat{c}_{1}, \dots, \hat{c}_{m})' = (Z_{r}^{T} Z_{r})^{-1} Z_{r}^{T} Y .

Denote the R² for the regression model (2) as g_m. If g_m / g_k ≥ s, then the m miRNAs are included in the set of miRNAs each of which has a significant effect on the mRNA, where s can be chosen as 0.95 or larger. If g_m / g_k < s, then the m miRNAs are not included. Basically, the selection of the p₀ and s values can be based on the proportion of high-confidence targets that we intend to obtain from the set of potential targets.
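The full selection step for one mRNA (fit model (1), keep the miRNAs with p-value below p₀, refit model (2), compare g_m / g_k with s) can be sketched as below. The names `select_mirnas` and `_fit`, the default thresholds (taken from the values used later in Section 3), and the simulated data are illustrative assumptions; the sketch also assumes a full-rank design matrix and g_k > 0.

```python
import math
import numpy as np

def _fit(y, x):
    """Least-squares fit with intercept; returns (Z, beta, fitted, R^2)."""
    Z = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ beta
    r2 = np.sum((fitted - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
    return Z, beta, fitted, r2

def select_mirnas(y, mirna_expr, p0=0.47, s=0.995):
    """Return (selected column indices, g_m / g_k), or ([], None) if no
    miRNA has a p-value below p0."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(mirna_expr, dtype=float).reshape(len(y), -1)
    Z, beta, fitted, g_k = _fit(y, x)               # model (1), all k miRNAs
    dof = len(y) - np.linalg.matrix_rank(Z)
    sigma2 = np.sum((y - fitted) ** 2) / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(Z.T @ Z)))
    w = np.abs(beta) / std_err
    p = np.array([math.erfc(v / math.sqrt(2.0)) for v in w])[1:]
    keep = [i for i in range(x.shape[1]) if p[i] < p0]
    if not keep:
        return [], None
    g_m = _fit(y, x[:, keep])[3]                    # model (2), m miRNAs
    ratio = g_m / g_k
    return (keep if ratio >= s else []), ratio


# Simulated data (made up): only miRNA 0 truly affects the mRNA.
rng = np.random.default_rng(1)
x = rng.normal(size=(17, 3))
y = 3.0 - 2.0 * x[:, 0] + rng.normal(scale=0.01, size=17)
selected, ratio = select_mirnas(y, x)
print(selected, ratio)
```

Because model (2) is nested in model (1) and both contain an intercept, g_m cannot exceed g_k, so the ratio lies between 0 and 1.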

We define the value g_m / g_k as the relative R². Instead of using the standard R², we use the relative R² values to evaluate the fitness of model (2). Since, within the potential target set, the k miRNAs are the only miRNAs considered to have effects on the mRNA, and the best R² that can be derived from the linear regression model using the k miRNAs, z₁, …, z_k, is g_k, it is reasonable to use the value of g_k as a base to evaluate the fitness of a regression model that uses some of the variables in the set {z₁, …, z_k} as independent variables. Therefore, we can use the criterion of comparing g_m with g_k to select high-confidence miRNAs. It is possible that g_k is not high, such as in the situation discussed in Section 4 later. Basically, if the correlations between an mRNA and each miRNA are not high across the l tissues, it is unlikely that a model with high g_k can be found, because there is no strong dependence between the expression of the mRNA and the expression of the miRNAs.

However, even if g_k is not high, it is still possible that the mRNA is the true target for some miRNAs among these k miRNAs. Thus, we can use the relative R² method to select a set of more significant miRNAs, which simultaneously affect the expression of the mRNA.

Using the method, for each mRNA, we can assign a set of miRNAs such that these miRNAs significantly affect the expression of the mRNA in terms of the linear model. For a miRNA, we can collect the set of mRNAs such that each mRNA in the set is a significant potential target of this miRNA by the relative R² method. Then the mRNAs collected are the high-confidence targets of this miRNA.
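The last step, turning per-mRNA selections into per-miRNA target lists, can be sketched as a simple inversion of a mapping. The function name `targets_by_mirna` is hypothetical, and the example dictionary is illustrative only: the HIPK1 entry follows the selection reported in Section 4, while the MAP2K4 and BCL2 entries list only the miRNAs named in the text, not necessarily the complete selected sets.

```python
from collections import defaultdict

def targets_by_mirna(selected_by_mrna):
    """Invert {mRNA: [selected miRNAs]} into {miRNA: [high-confidence mRNAs]}."""
    targets = defaultdict(list)
    for mrna, mirnas in selected_by_mrna.items():
        for mirna in mirnas:
            targets[mirna].append(mrna)
    # Sort each target list so the output order is deterministic.
    return {mirna: sorted(mrnas) for mirna, mrnas in targets.items()}


# Illustrative input (partial; see the note above).
selected_by_mrna = {
    "HIPK1": ["miR-124a", "miR-181a", "miR-92"],
    "MAP2K4": ["miR-92"],
    "BCL2": ["miR-16", "miR-181a"],
}
high_confidence = targets_by_mirna(selected_by_mrna)
print(high_confidence["miR-92"])
```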

Note that in order to eliminate the difference between the different tissues used in the analyses, we can first transform the expression data of mRNAs by normalizing the data in each tissue such that the scale of the expression data used in each tissue is the same. We use the normalized expression data when we apply the above approach to select high-confidence targets.

2.2. General model

The linear model used in Section 2.1 can be replaced by another kind of model, such as a nonlinear regression model. For a general model that fits the y_i using {z₁, …, z_k}, let f_i' be the estimator of y_i derived under this model. Define $SS'_{total} = \sum_{i} (y_{i} - \bar{y})^{2}$ and $SS'_{reg} = \sum_{i} (f'_{i} - \bar{y})^{2}$ for the model. For a subset of m miRNAs from the set {z₁, …, z_k}, denoted z₁, …, z_m, we can likewise use the m miRNAs to derive the form of the model and the estimators of y_i. We then calculate the R² for the model based on the m miRNAs and compare it with the R² for the model based on the k miRNAs to derive the relative R². Note that if the model is not a linear regression model, it may not be straightforward to derive the significant miRNAs for an mRNA by the p-value approach. How to establish a test to select the significant miRNAs will depend on the form of the model. However, to avoid the heavy calculation needed to derive such a test, for a set of miRNAs we may directly calculate its relative R² and choose the set of miRNAs corresponding to the highest relative R² as the set of significant miRNAs. By a similar argument, the relative R² can be used as a criterion to select high-confidence targets of a miRNA.

Furthermore, it is feasible to apply the relative R² method to other criteria such as the adjusted R² , etc. Since the value of adjusted R² may be negative, a situation that requires more consideration, the application of the relative method under other criteria is currently under investigation.

3. Data Analyses

We now apply the new method to the data of Babak et al. (2004), which was used by Huang et al. (2007). The data set includes 1770 potential targets for 22 miRNAs across 17 tissues, which were predicted by TargetScanS in a dataset of 41699 mouse mRNAs in Babak et al. (2004) and Zhang et al. (2004). The 1770 potential targets are from 788 different mRNAs because some miRNAs have the same mRNA as their targets. The microarray expression levels of the 41699 mRNAs across the 17 tissues can be represented by a 41699×17 matrix, and the microarray expression levels of the 22 miRNAs across the 17 tissues can be represented by a 22×17 matrix. (The 22 miRNAs studied are let-7a, miR-1, miR-101, miR-107, miR-122a, miR-124a, miR-125b, miR-126, miR-133a, miR-16, miR-181a, miR-183, miR-194, miR-205, miR-22, miR-23b, miR-24, miR-26a, miR-29b, miR-34a, miR-92, and miR-93; and the 17 tissues studied are brain, femur, lung, heart, skeletal muscle, mammary gland, teeth, bladder, stomach, ES, spleen, embryo 12.5, embryo, placenta 9.5, embryo 9.5, small intestine, liver.)

To apply the relative R² method to an mRNA among the 788 mRNAs, we first normalize the expression data of the mRNAs using the 41699 expression values for each tissue. The normalization first calculates the mean and standard deviation of the 41699 expression values for each tissue. Then, for each tissue, the normalized expression value is the original expression value minus the mean, divided by the standard deviation. Since the data set includes 41699 expression values per tissue, we can use it as a reference for the normalization so that the scale of the mRNA expression data is the same for each tissue.
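The per-tissue normalization described above is a column-wise z-score. A minimal sketch follows; the function name `normalize_per_tissue` and the random stand-in matrix are assumptions, and the text does not say whether the population or sample standard deviation was used (with 41699 values per tissue the difference is negligible).

```python
import numpy as np

def normalize_per_tissue(expr):
    """Z-score each column (tissue) of an mRNA-by-tissue expression matrix."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=0)   # one mean per tissue
    std = expr.std(axis=0)     # population SD (ddof=0); an assumption
    return (expr - mean) / std


# Stand-in data (made up): 1000 mRNAs in 3 tissues with different scales.
rng = np.random.default_rng(0)
raw = rng.normal(loc=[5.0, 8.0, 2.0], scale=[1.0, 3.0, 0.5], size=(1000, 3))
normalized = normalize_per_tissue(raw)
print(normalized.mean(axis=0), normalized.std(axis=0))
```

After normalization every tissue column has mean 0 and standard deviation 1, so expression values are comparable across tissues.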

We find the miRNAs such that the mRNA is one of the potential targets of these miRNAs from the 1770-target dataset, and then use a regression model to fit the normalized expression data of the mRNA. Huang et al. (2007) used the Bayesian variation method to derive high-confidence targets. This method is more complicated and computationally more costly than the relative R² method.

Using the present method with p₀ = 0.47, we can select high-confidence targets such that the relative R² for each target reaches 0.995. A total of 448 high-confidence targets are found, and the average relative R² for these 448 targets is 0.999. Here the p₀ value is chosen so that about one-fourth of the 1770 potential targets are selected with the relative R² reaching 0.995.

The above dataset can also be used to conduct a random permutation test of our method. We test whether our method can select more high-confidence targets from the set of the 1770 potential targets predicted by TargetScanS than from a set that is constructed by randomly assigning each one of the 1770 targets to one of the 22 miRNAs. The random permutation was repeated ten times, and the average number of selected high-confidence targets over the ten times was 336 (s.d. ≈ 27), which is significantly lower than 448, the number of high-confidence targets found by our method (see above), with a p-value of 1.4×10⁻¹⁰. The p-value is derived by viewing the proportions of targets selected under random permutation and by the relative R² method as the proportions of two binomial distributions and testing the equality of the two proportions by the normal approximation. In addition, we compare the targets shared between the random permutation case and the selected high-confidence targets. The average number of shared targets from several comparisons is 25, so most of the selected high-confidence targets are not the same as the targets selected by random permutation. This result supports the relative R² method because (i) the method selects more targets than random permutation, and most of the high-confidence targets are not the same as the targets selected by random permutation, and (ii) the method gives consistent results between the expression data analyses and the TargetScanS analyses.
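The test of two binomial proportions by the normal approximation mentioned above can be sketched as follows. This is a generic two-sided test with a pooled standard error; the function name `two_proportion_p_value` is hypothetical, and the text does not state the exact sample sizes, pooling, or sidedness the authors used, so the output of this sketch should not be expected to reproduce the reported p-value.

```python
import math

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided z-test for equality of two binomial proportions.

    x1, x2 : numbers of selected targets in the two settings
    n1, n2 : numbers of potential targets in the two settings
    Returns (z, p_value) using a pooled standard error.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # For standard normal W, P(|W| >= |z|) equals erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Counts from the text; using n = 1770 for both groups is an assumption.
z, p = two_proportion_p_value(448, 1770, 336, 1770)
print(z, p)
```

Whatever the exact setup, the sketch shows the direction of the comparison: the proportion selected from the TargetScanS predictions (448/1770) exceeds the average proportion selected after random permutation (336/1770).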

To make a more extensive comparison with the random permutation case, we conduct simulations for different cases by varying the values of p₀ and s. Figure 1 and Figure 2 show that the numbers of high-confidence targets selected from the 1770 potential targets are always greater than the numbers selected from the random permutation case. Figure 1 shows that the difference between the two numbers increases with p₀ , which reinforces the argument in Section 2 that the condition about the constraint of p₀ should not be too strict; otherwise, the advantage of the relative R² method is limited by this constraint.

Fig 1 — Relationship between the number of selected high-confidence targets and the p₀ threshold used. The solid and dotted lines denote the numbers of high-confidence targets selected by the relative R² method from the 1770 potential targets and from the dataset constructed by random permutation, respectively.

Fig 2 — Relationship between the number of selected high-confidence targets and the relative R² value threshold used. The solid and dotted lines denote the numbers of high-confidence targets selected by the relative R² method for the 1770 potential targets and the dataset constructed by random permutation, respectively, as s increases from 0.6 to 0.9, where s is the threshold of the relative R² value as defined in the text.

In addition, in Table 1 we present several sets of p₀ and s values for each of which the number of targets selected by the relative R² method is close to 450. It is seen that p₀ is an increasing function of s when the proportion of selected targets is fixed. To obtain a fixed proportion of targets, more than one set of p₀ and s values can be chosen, and the targets selected may differ between sets. One may ask which set is more appropriate here. As mentioned above, it is not recommended to choose a small p₀ because it may limit the overall performance of the relative R² method. Besides, as the correlation analysis in Section 4 shows, the confirmed targets and miRNAs may not have a high correlation when we investigate their relations individually, but the relation can be revealed when we consider their overall performance by the relative R² method. This corroborates the view that the selection of p₀ should be more relaxed, while the selection of s can be more strict. Therefore, we select p₀ as 0.47 and s as 0.995.

Table 1.

The 6 sets of p₀ and s values for which the number of targets selected by the relative R² method is close to 450.

p₀	0.47	0.45	0.4	0.35	0.3	0.25
s	0.995	0.99	0.97	0.95	0.85	0.7


For each of the 22 miRNAs, the ratio of the number of high-confidence targets found by our method to the number of its potential targets predicted by TargetScanS (1770 in total over all miRNAs) is shown in Figure 3.

Fig 3 — The ratios of high-confidence targets for the 22 miRNAs selected by the relative R² method to the targets for the 22 miRNAs selected by TargetScanS.

We can also check our analysis against the literature. We recover several confirmed targets from the literature (Bartel 2004, Cimmino et al. 2005, Farh et al. 2005, Lim et al. 2005, and Lewis et al. 2005). These include the relationships between miR-92 and the mitogen-activated protein kinase kinase 4 (MAP2K4) gene, between miR-16 and the B-cell CLL/lymphoma 2 (BCL2) gene, between miR-124a and the solute carrier family 15 member 4 protein (SLC15A4) gene, and between miR-124a and the homeodomain interacting protein kinase 1 (HIPK1) gene. We also recover the relationship between miR-181a and the B-cell CLL/lymphoma 2 (BCL2) gene from TarBase (Sethupathy et al. 2006).

4. Discussion

The relative R² method is proposed to analyze the data from a relative rather than an absolute statistical point of view. If the correlation between the mRNA and miRNA is high, then we can directly adopt a standard statistical method to explore the high-confidence targets. However, when the correlation between the mRNA and miRNA is not high, it is challenging to develop a statistical method to select correct targets. In such a case, if we use a standard statistical criterion, such as a high R², to select the high-confidence target mRNAs, then no confirmed target mRNAs may be selected. In this case, it would be better to use a variable standard that depends on the mRNA and miRNAs under study, rather than a single standard for all genes. The relative R² method provides such a variable standard.

We now discuss the relation between the confirmed targets and the miRNAs by exploring their correlation coefficients and relative R² values. Before modeling the data, it is useful to investigate the relation of each miRNA with each individual potential target. Although this study aims to find the effect of multiple miRNAs on a target, rather than the effect of a single miRNA, understanding the connection between the target mRNA and a single miRNA helps to reinforce the validity of the proposed method.

For example, let us consider the three mRNAs, HIPK1, MAP2K4 and BCL2, mentioned in Section 3 and explore the relationship between their correlation coefficient and the R² value.

First, consider the HIPK1 mRNA, which is a potential target for the four miRNAs, miR-124a, miR-181a, miR-26a and miR-92. Although we are interested in knowing how the four miRNAs affect the expression of the mRNA, we also can investigate the relationship between the expression of each miRNA and the expression of HIPK1. The correlation coefficients of the expression for the four miRNAs with the expression of HIPK1, across the 17 tissues, are shown in Figure 4.

By using the relative R² method, three of the four miRNAs are selected such that their relative R² reaches 0.995. The three selected miRNAs are miR-124a, miR-181a and miR-92, and their correlation coefficients with the HIPK1 mRNA are −0.566, 0.151 and −0.116, respectively (Figure 4(a)). Note that miR-26a, which is not selected, has the largest (most positive) correlation coefficient, 0.333. The correlations of two of the three selected miRNAs are negative, in agreement with the expectation that a miRNA usually downregulates its target mRNAs (Farh et al. 2005; Lim et al. 2005). The standard R² values for the linear model fitting the expression data of HIPK1 using the expression levels of the four miRNAs and of the three selected miRNAs are 0.622 and 0.621, respectively. In this case, we can see from Figure 4 that the correlation coefficients between HIPK1 and the four miRNAs are not high. Therefore, it is hard to construct a model to fit the expression data of HIPK1 in terms of the expression data of the four miRNAs. So, if we use the standard R² value, we may not be able to select any one of the three miRNAs, though one of the relations is confirmed in the literature. On the other hand, if we use the relative R² method by comparing the ratio 0.621/0.622 with 1, the confirmed relationship can be selected.
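As a worked check of the HIPK1 example, using the rounded R² values quoted above (g_k = 0.622 for the four miRNAs, g_m = 0.621 for the three selected miRNAs) and the threshold s = 0.995 from Section 3:

```latex
\frac{g_m}{g_k} = \frac{0.621}{0.622} \approx 0.998 \geq s = 0.995
```

The ratio is computed from rounded values, so it only approximates the relative R² obtained from the unrounded fits, but it illustrates why the three miRNAs pass the relative criterion even though the absolute R² of about 0.62 is modest.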

For the confirmed MAP2K4 mRNA and its corresponding miRNA miR-92, their correlation coefficient is −0.030 and the R² is 0.198 (Fig. 4(b)). For the confirmed BCL2 mRNA and its corresponding miRNA miR-16 (Fig. 4(c)), their correlation coefficient is −0.027 and the R² is 0.104. Therefore, if we use the correlation coefficient or R² to select high confidence targets, these two confirmed mRNAs will not be selected. Because the two correlation coefficients are −0.030 and −0.027, we do not expect the two confirmed mRNAs to be selected by any statistical method from the absolute point of view. Instead, we need to use the relative criterion to select the targets because the coefficients of other miRNAs in the potential targets dataset are also not high. By including the other miRNAs in the potential targets dataset, we can construct the relative criterion to select the miRNAs such that the effects of the miRNAs on the expression level of the mRNA are found to be significant.

In addition, we present the overlap between the targets selected by Huang et al. (2007) and those selected by the relative R² method in Figure 5. The total number of overlapping targets for the 22 miRNAs is 142. The overlap numbers for three miRNAs, miR-126, miR-183 and miR-122a, are zero. The total overlap is not large, perhaps because the correlations between the miRNAs and their target mRNAs are not high, as seen for the several confirmed relationships discussed above. This may lead to variation between different statistical approaches.

Fig. 5 — The number of overlaps between the predictions by Huang et al. (2007) and the relative R² method.

In addition, to estimate the accuracy of the relative R² method, we count the relationships among the 1770 potential relationships that appear in TarBase and check which of them are found by the relative R² method. We found only two TarBase interactions among the 1770 potential targets: the relationship between miR-181a and BCL2 mRNA and the relationship between miR-181a and HOXA11 mRNA. Only the relationship between miR-181a and BCL2 mRNA appears among the 448 targets selected by the relative R² method and among those selected by Huang et al. (2007). However, the relationship between miR-181a and HOXA11 mRNA can be selected by the relative R² method if we relax the criteria by choosing p₀ = 0.67 and s = 0.9999, which leads to 715 selected targets. Note that we initially set p₀ to 0.47 and s to 0.995 because we intended to obtain about 25% (~450) of the 1770 potential targets as high-confidence targets. To match the TarBase interactions, we can instead use the new thresholds. The ratio of the number of selected targets to the number of potential targets is then 715/1770 ≈ 0.4. Thus, if we relax the thresholds so that 40% of the potential targets are selected, both relationships found in TarBase are selected.

In summary, from the above discussions, combining results from the confirmed targets and theoretical statistical inference to develop methods for exploring the relationship between miRNA and mRNA can be more useful.

Acknowledgements

We thank Han Liang, Tsunglin Liu and Henry Lu for valuable suggestions. This study was supported by Academia Sinica, Taiwan, and by NIH grants GM30998 and GM081724.

References

1. Ambros V, et al. A uniform system for microRNA annotation. RNA. 2003;9:277–279. doi: 10.1261/rna.2183803.
2. Babak T, Zhang W, Morris Q, Blencowe BJ, Hughes TR. Probing microRNAs with microarrays: tissue specificity and functional inference. RNA. 2004;10:1813–1819. doi: 10.1261/rna.7119904.
3. Bartel DP. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297. doi: 10.1016/s0092-8674(04)00045-5.
4. Cimmino A, et al. miR-15 and miR-16 induce apoptosis by targeting BCL2. Proc Natl Acad Sci U S A. 2005;102:13944–13949. doi: 10.1073/pnas.0506654102.
5. Farh KKH, Grimson A, Jan C, Lewis BP, Johnston WK, Lim LP, Burge CB, Bartel DP. Science. 2005;310:1817–1821. doi: 10.1126/science.1121158.
6. Grimson A, Farh KKH, Johnston WK, Garrett-Engele P, Lim LP, Bartel DP. MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Molecular Cell. 2007;27:91–105. doi: 10.1016/j.molcel.2007.06.017.
7. Houbaviy HB, Murray MF, Sharp PA. Embryonic stem cell-specific microRNAs. Dev Cell. 2003;5:351–358. doi: 10.1016/s1534-5807(03)00227-2.
8. Huang JC, Morris QD, Frey BJ. Bayesian Inference of MicroRNA Targets from Sequence and Expression Data. Journal of Computational Biology. 2007;14:550–563. doi: 10.1089/cmb.2007.R002.
9. Kim J, Krichevsky A, Grad Y, Hayes GD, Kosik KS, Church GM, Ruvkun G. Identification of many microRNAs that copurify with polyribosomes in mammalian neurons. Proc Natl Acad Sci U S A. 2004;101:360–365. doi: 10.1073/pnas.2333854100.
10. Lagos-Quintana M, Rauhut R, Yalcin A, Meyer J, Lendeckel W, Tuschl T. Identification of tissue-specific microRNAs from mouse. Curr Biol. 2002;12:735–739. doi: 10.1016/s0960-9822(02)00809-6.
11. Lau NC, Lim LP, Weinstein EG, Bartel DP. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science. 2001;294:858–862. doi: 10.1126/science.1065062.
12. Lee RC, Ambros V. An extensive class of small RNAs in Caenorhabditis elegans. Science. 2001;294:862–864. doi: 10.1126/science.1065329.
13. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993;75:843–854. doi: 10.1016/0092-8674(93)90529-y.
14. Lewis BP, Burge CB, Bartel DP. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell. 2005;120:15–20. doi: 10.1016/j.cell.2004.12.035.
15. Lim LP, Glasner ME, Yekta S, Burge CB, Bartel DP. Vertebrate microRNA genes. Science. 2003;299:1540. doi: 10.1126/science.1080372.
16. Lim LP, et al. Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature. 2005;433:769–773. doi: 10.1038/nature03315.
17. Saunders MA, Liang H, Li WH. Human polymorphism at microRNAs and microRNA target sites. Proc Natl Acad Sci U S A. 2007;104:3300–3305. doi: 10.1073/pnas.0611347104.
18. Sethupathy P, Corda B, Hatzigeorgiou AG. TarBase: A comprehensive database of experimentally supported animal microRNA targets. RNA. 2006;12:192–197. doi: 10.1261/rna.2239606.
19. Supplemental Data for Lewis et al. Cell. 120:15–20. http://web.wi.mit.edu/bartel/pub/Supplemental%20Material/Lewis%20et%20al%202005%20Supp/
20. Zhang W, et al. The functional landscape of mouse gene expression. J Biol. 2004;3:21–43. doi: 10.1186/jbiol16.
