Proceedings of the National Academy of Sciences of the United States of America. 2008 Sep 24;105(39):14885–14890. doi: 10.1073/pnas.0803169105

Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes

Michal Rabani, Michael Kertesz, Eran Segal
PMCID: PMC2567462  PMID: 18815376

Abstract

Messenger RNA molecules are tightly regulated, mostly through interactions with proteins and other RNAs, but the mechanisms that confer the specificity of such interactions are poorly understood. It is clear, however, that this specificity is determined by both the nucleotide sequence and secondary structure of the mRNA. Here, we develop RNApromo, an efficient computational tool for identifying structural elements within mRNAs that are involved in specifying posttranscriptional regulations. By analyzing experimental data on mRNA decay rates, we identify common structural elements in fast-decaying and slow-decaying mRNAs and link them with binding preferences of several RNA binding proteins. We also predict structural elements in sets of mRNAs with common subcellular localization in mouse neurons and fly embryos. Finally, by analyzing pre-microRNA stem–loops, we identify structural differences between pre-microRNAs of animals and plants, which provide insights into the mechanism of microRNA biogenesis. Together, our results reveal unexplored layers of posttranscriptional regulations in groups of RNAs and are therefore an important step toward a better understanding of the regulatory information conveyed within RNA molecules. Our new RNA motif discovery tool is available online.

Keywords: bioinformatics, motif prediction, posttranscriptional regulation, RNA secondary structure, SCFGs


RNA molecules undergo diverse posttranscriptional regulation of gene expression, including regulation of RNA transport and localization, mRNA translation, and RNA decay (1–3). In many cases, such posttranscriptional regulation occurs through elements on the mRNA molecule that interact with the hundreds of RNA binding proteins (RBPs) that exist in the cell (4). A well-known example is the iron-responsive element (IRE), a secondary structure RNA motif located on UTRs of members of the iron metabolism and transport pathway (5). The binding of the RBPs Irp1 and Irp2 to IRE elements affects the translation rate of the mRNA, thereby coordinating the response to changing levels of iron in the environment. Other examples include a 118-nucleotide stem–loop structure through which mRNAs are transported to the yeast bud tip by the RBP She2 (6) and the RBP Sbp2, which is involved in mediating UGA redefinition from a stop codon to selenocysteine by binding specific stem–loop structures, termed selenocysteine insertion site (SECIS) elements, in the 3′ UTR of selenoproteins (7). In other cases, elements on the mRNA molecule interact with other RNAs that direct the regulatory effect. For example, the recognition and binding affinity of a microRNA to its mRNA target is determined by both the sequence and structure of the target mRNA (8–11).

The examples above suggest that the posttranscriptional regulation of an mRNA is determined not only by its linear nucleotide sequence but also by its secondary structure. Thus, a key goal is to understand the involvement of mRNA secondary structures in such regulation. One approach is to identify recurring patterns, termed motifs. However, linear sequence motifs, which are commonly found in DNA sequences, are not suitable in this case. Instead, we wish to identify motifs that combine primary and secondary structural elements and are therefore better suited to describe functional elements in RNA molecules.

Here we develop RNApromo (RNA prediction of motifs), a new computational method to identify short structural RNA motifs in sets of long unaligned RNAs. Using our method, we identify putative motifs in sets of mRNAs with substantial experimental evidence for a common posttranscriptional regulation, and we support these findings with cross-validation analysis. In some cases, sequence conservation of the putative motifs provides strong independent support for our findings. The identified motifs include motifs for mRNAs with similar decay rates, mRNAs that are bound by the same RBP, and mRNAs with a common cellular localization. Additionally, analysis of pre-microRNA structures identified differences between animals and plants in the sizes of stem and loop structures of pre-microRNAs, suggesting that animals and plants use different mechanisms for microRNA biogenesis.

Results and Discussion

A New Computational Scheme for RNA Motif Discovery.

Several tools exist for finding local structural motifs in a set of long input RNAs. Many existing works use free energy minimization considerations for predicting local motifs, by applying one of three major schemes: use of a sequence-based local alignment to build a motif consensus structure (12, 13); identification of common structures in the RNAs' predicted minimal free energy folds (14–16); or simultaneous alignment of the RNAs and prediction of their secondary structure (17–20). Recently, some graph theoretical techniques were also proposed for this task (21, 22). Yet some of the most successful approaches for motif discovery are based on advanced probabilistic models, which are highly suitable to capture the observed variation in the input set.

Stochastic context-free grammars (SCFGs) are a class of probabilistic models proposed for modeling common sequence and structure in a set of input RNAs (23, 24), which replaced the thermodynamic considerations in several existing RNA motif discovery tools (25, 26). However, currently available SCFG applications optimize the model's parameters by essentially considering all possible secondary structures of the input sequences. This approach is successful mainly when it is possible to exploit covariation between organisms to infer the secondary structure at the motif position. However, when one wishes to identify short motifs, using data from a single organism, covariation data cannot be used. Moreover, enumerating all possible secondary structures results in a rather high time complexity of the algorithm, making it unfeasible to scan large RNA sets or long RNA sequences.

Here, we devised a new SCFG-based method for finding local motifs in a set of unaligned RNAs, which restricts the search space to a predefined and limited number of structures for each input RNA. Yet which structures should be considered by the algorithm? Ideally, experimental information about RNA structure can be used, and only the few structures that are consistent with such structural data are considered. Even though such experimental structural information currently exists for only a small number of cases, such information may soon be available. Alternatively, using existing thermodynamic-based secondary structure prediction programs (14, 27) (with ∼50%–90% prediction accuracy [28]) for predicting a small set of thermodynamically stable folds and restricting the algorithm to those, allows both reduction of the search space and integration of thermodynamic considerations into our model, which are not fully embedded into standard covariance model applications. Certainly, other structure prediction tools, such as those based on probabilistic models (29), can also be used to derive a set of highly probable folds as an input to our method.

Our algorithm, called RNApromo, takes as input a set of RNA sequences assumed to share a motif, and their suggested secondary structures. The algorithm first identifies specific and relatively short candidate structures that appear in as many inputs as possible. These candidates are then used as seeds for a probabilistic inference algorithm that refines the predicted motif using statistical estimation (see supporting information (SI) for a full description).
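The seed-finding step described above can be illustrated with a minimal sketch. It is not the RNApromo implementation (the SI gives the actual procedure): it assumes the input folds are dot-bracket strings, restricts candidates to perfectly stacked hairpins, and summarizes each one by a hypothetical "shape" of (stem base pairs, loop length). The function names `pair_table`, `hairpins`, and `candidate_shapes` are invented for this illustration.

```python
from collections import Counter


def pair_table(structure):
    """Map each paired position to its partner for a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def hairpins(structure):
    """Yield (start, end, stem_bp, loop_len) for each perfectly stacked hairpin."""
    pairs = pair_table(structure)
    for i, j in sorted(pairs.items()):
        if i > j:
            continue  # visit each base pair once, from its 5' side
        # A hairpin loop is closed by a pair with no paired base inside it.
        if any(k in pairs for k in range(i + 1, j)):
            continue
        loop_len = j - i - 1
        stem_bp = 1
        # Extend outwards while the enclosing pair stacks directly on this one.
        while pairs.get(i - 1) == j + 1:
            i, j = i - 1, j + 1
            stem_bp += 1
        yield i, j, stem_bp, loop_len


def candidate_shapes(structures, min_stem=3):
    """Count in how many input folds each (stem_bp, loop_len) shape occurs."""
    support = Counter()
    for s in structures:
        shapes = {(stem, loop) for _, _, stem, loop in hairpins(s) if stem >= min_stem}
        support.update(shapes)  # each input contributes at most once per shape
    return support.most_common()
```

Shapes supported by the most inputs would be natural seeds for a later refinement step; a real seed search would also need to tolerate bulges and internal loops and to take sequence into account.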

To examine the performance of RNApromo, we first tested its ability to identify known RNA structural motifs in a large collection of validation sets from the Rfam database (30). We used a fivefold cross-validation scheme, in which we partition the input set into five parts, learn a model from each of the possible combinations of four sets, and use this model to assign likelihood scores to the RNAs that were held out while learning it. We use the standard receiver operating characteristic (ROC) curve and its associated area under the curve (AUC) measure to evaluate the significance of the input RNAs' likelihood scores compared with shuffled sequences. We filtered each set to include only sequences with <90% sequence similarity, because high sequence similarity may produce high AUC scores even when a functional motif is not present. The maximal sequence similarity in our validation sets ranges from 30% to 90% (see SI for details). Although high AUC scores indicate that the input RNAs share a biological signal, we still have to validate that this signal results from a common motif. To test that, we applied the same motif discovery scheme to a collection of random sets that include randomly selected sequences from all of the validation sets. Because we do not expect to identify motifs in these random sets, their AUC scores should be close to 0.5, indicating that in the absence of a common motif in the input set, no signal is detected.
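The evaluation scheme in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the fold assignment, the mononucleotide shuffle used for the background, and the names `kfold_splits`, `shuffle_sequence`, and `auc` are all hypothetical, and the model that produces likelihood scores is left out. The AUC is computed directly as the probability that a held-out input RNA scores higher than a shuffled sequence.

```python
import random


def kfold_splits(n_items, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


def shuffle_sequence(seq, rng):
    """Return a shuffled copy of seq with the same nucleotide composition."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def auc(positive_scores, negative_scores):
    """Area under the ROC curve: P(positive > negative), ties counted as 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

In use, a model would be trained on each `train` index set, the held-out RNAs and their shuffled counterparts would be scored with it, and the pooled scores would be passed to `auc`. A value near 0.5 means the scores do not separate real from shuffled sequences.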

The results (Fig. 1A) show that whereas the AUC scores of the random sets are indeed distributed around 0.5, with only 5% of the scores above 0.6, more than 60% of the AUC scores of the true sets are above 0.6. This indicates that in most cases, RNApromo indeed detects the biological signal when it is present, yet it rarely detects such a signal if there is no common motif in the input RNAs. Similar results were obtained using a different method to predict the structure of the input RNAs (29) (see SI).

Fig. 1.

Validation of motif discovery scheme. (A) Distribution of AUC scores for motifs in the true (dark blue) vs. permuted (light blue) sets (using ViennaRNA predicted folds). AUC scores of permuted sets are distributed around 0.5, whereas AUC for motif-containing sets are usually higher. (B) Prediction of three known human motifs: HFD (Top), IRE (Middle), and SECIS (Bottom). Motifs are represented using a structural logo of the motif's most probable structure. Specific positions are color coded according to their probability (green-to-red scale for sequence, and gray scale for structure). Shown is the predicted motif logo and the known consensus structure (Left), with several examples of correctly classified motifs (Right). The motif position is annotated in green, and the 5′ end of the motif is circled. The FTH1 UTR (in the IRE part), which was misfolded at the IRE loop, is shown in red. Sequence conservation profile (average across all of the motif instances) is also shown (Rightmost Column).

Repeating the same analysis with other available tools (26) shows that RNApromo is comparable to these tools in terms of results but requires shorter running times (see SI). This result is notable, given that the Rfam database includes motif instances from several organisms, and in such a setting, tools that exploit covariation information have an advantage over RNApromo.

Next, we tested our method's ability to identify and correctly describe known biological motifs when covariation data is minimal. We analyzed three well-known motifs from human: histone-fold domain (HFD), iron-responsive element (IRE), and selenocysteine insertion site (SECIS), each sharing a different level of sequence similarity. Although three examples cannot provide global statistics, they can still allow a better evaluation of our method's abilities in such a setting.

Using RNApromo, we detected a motif in all three sets (Fig. 1B): HFD (AUC = 0.95, P < 5 × 10⁻¹⁶), IRE (AUC = 0.86, P < 7 × 10⁻⁴), and SECIS (AUC = 0.59, P < 0.05). The predicted consensus structures are highly similar to those described in the literature and usually match the known motif positions (94% of the HFD instances, 60% of the IRE instances, and 47% of the SECIS instances). In the misclassified cases, the motif position was not folded into the correct structure by the prediction algorithm. Note that the SECIS consensus structure includes two noncanonical G-A base pairs, which standard folding algorithms cannot predict. Nonetheless, RNApromo identifies a motif in this set and predicts its structure quite accurately. The predicted SECIS motif obviously does not include the noncanonical base pairs, yet the sequence specificity of these positions is detected. In two cases, the sequence conservation at the motif positions is also remarkably high: HFD (P < 5 × 10⁻¹⁶) and IRE (P < 2 × 10⁻⁴). In the third case, the conservation signal is relatively weak, demonstrating that structural motifs are not always associated with high sequence conservation.

Repeating the analysis of those three sets using CMfinder (26) results in the identification of only two of the three motifs (the SECIS motif is not identified). Moreover, the HFD consensus includes only a 5-bp (rather than 6-bp) stem (see SI). Because the comparison is done only on three sets, we cannot draw definite conclusions but only suggest that in the absence of covariation data, RNApromo may be better suited to predict RNA structural motifs.

Taken together, these examples demonstrate the ability of RNApromo to correctly identify and characterize the right (short) RNA motif from an input of (long) unaligned RNAs, including both its structure and sequence elements, without any prior knowledge of its length, location, or structure, and in the presence of noise in the folding input. Importantly, RNApromo also performs well with minimal covariation information and therefore can be used to predict motifs for a single organism.

Motifs Involved in Modulating mRNA Decay Rates.

Having validated that our computational scheme can detect known biological motifs, we used it to predict novel motifs. We applied RNApromo to 3′ and 5′ UTR sequences of several sets of genes for which substantial evidence suggests that they share common posttranscriptional regulation.

We first filtered these sets to include only sequences with <90% sequence similarity. Using a similar cross-validation scheme, we then assigned an AUC score to each set and evaluated its statistical significance relative to a background distribution. For sets with significant AUC scores, we also build a model of the motif. As independent support for our findings, we evaluate the statistical significance of the average sequence conservation at the predicted motif positions. Because the learning process itself is done on mRNAs from a single organism, no evolutionary information is used during the training process. Therefore, a significant level of conservation provides an independent biological signal that further supports the findings and suggests that the motif that was identified in one organism could be functional in additional organisms. However, motifs with low sequence conservation may still be functional, either because they are conserved at the structure level rather than the sequence level or because they are specific to the tested organism.
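The redundancy filter mentioned at the start of this paragraph can be sketched as a greedy pass over the input set. The similarity measure actually used is described in the SI; here `difflib.SequenceMatcher.ratio()` from the Python standard library stands in for it purely as an assumption, and the names `similarity` and `filter_redundant` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1]; not the measure used in the paper."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_redundant(seqs, max_similarity=0.9):
    """Greedily keep sequences that are < max_similarity to every kept one."""
    kept = []
    for s in seqs:
        if all(similarity(s, k) < max_similarity for k in kept):
            kept.append(s)
    return kept
```

The greedy pass depends on input order: the first member of each group of near-duplicates is kept and later ones are dropped. An alignment-based identity score would be the more usual choice for real UTR sequences.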

Recently, genome-wide decay rates of yeast mRNAs were measured (3), but the investigators were unable to detect any significant relationships between the measured mRNA half-lives and codon usage, ORF lengths, or primary sequence motifs. It is therefore possible that mRNAs with similar decay rates exhibit a common motif through which their costability is controlled. To test this hypothesis, we created sets of genes with similar decay rates and applied our motif discovery scheme to them.

Intriguingly, we identify a motif in 3′ UTRs of 75 mRNAs with a measured short half-life of ≤6 min (AUC = 0.61, P < 10⁻³). The predicted motif sequence is AU rich and folds into a stem–loop structure with a relatively short loop (Fig. 2A). Furthermore, this motif shows a high conservation profile (P < 5 × 10⁻³), as an independent support for its biological relevance. We also identify a motif in the 3′ UTRs of 240 mRNAs with a measured long half-life of ≥60 min (AUC = 0.59, P < 8 × 10⁻⁴). Once again, the motif sequence is AU rich, yet the structural context is different and includes a large unstructured U-rich loop followed by a short stem (Fig. 2B).

Fig. 2.

mRNA decay motifs. (A and B) Fast- and slow-decaying mRNAs. Shown is the predicted motif (Left) and the sequence conservation profile, along with top-scoring motif examples (Right). The motif position is annotated in green, and the 5′ end of the motif is circled. (C) Predicted motifs for RBPs mRNA targets. Shown are the motif consensus and sequence conservation profile. (D) Half-life distribution for the top 20% of targets of the Puf proteins (blue line) and Pub1 (red line). Puf targets have faster decay rates than Pub1 targets.

The role of AU-rich elements located on the 3′ UTR of mRNAs in modulating mRNA stability, both as stabilizing and destabilizing elements, has long been known (31). Our results suggest that the structural context within which these sequence elements are embedded determines their activity: a small loop will induce destabilization, whereas a long U-rich loop will stabilize the mRNA.

Many studies have shown that much of the posttranscriptional regulation of mRNAs in the cell occurs through their interactions with the hundreds of different cellular RBPs. It is therefore possible that the motifs we identified in the fast- and slow-decaying mRNAs are bound by specific proteins that modulate mRNA stability. In an attempt to identify such proteins, we selected several available sets of genome-wide measurements of mRNAs bound by the same RBP and tried to identify common motifs in them.

Proteins from the Puf family of RBPs have been reported to bind UGUR motifs located in the 3′ UTR of their targets and thereby repress gene expression by affecting mRNA stability and translation rate. A recent study in yeast (32) measured the set of RNAs bound by five RBPs from the Puf family, resulting in groups of 40–220 bound mRNAs per protein. By applying a DNA motif discovery tool to the 3′ UTR primary sequence of the mRNAs bound by each Puf protein, the investigators were able to identify distinct 10-nucleotide RNA sequence motifs containing the UGUR element in mRNAs interacting with Puf3, Puf4, and Puf5. Applying RNApromo to the targets of each of the five Puf proteins, we identify a significant motif in the 3′ UTRs of Puf3 (AUC = 0.57, P < 2 × 10⁻³), Puf4 (AUC = 0.58, P < 8 × 10⁻⁵), and Puf5 (AUC = 0.59, P < 4 × 10⁻⁵) targets. The three motifs are somewhat similar (Fig. 2C) and include an AU-rich sequence that is folded into a stem–loop structure with a relatively short loop. For the Puf5 motif, we find a significant level of sequence conservation (P < 10⁻⁴), as an independent support for the predicted motif. Moreover, the Puf5 consensus structure includes the UGU element, which is part of the previously proposed Puf5 sequence motif (in 63% of the predicted Puf5 target sites).

The yeast polyU binding protein, Pub1, an embryonic lethal abnormal visual (ELAV)-like RBP with mammalian homologues, is known to play key roles in cellular mRNA decay. Pub1 was shown to bind with high affinity different target mRNAs with AU-rich sequences in their 3′ UTR and stabilize them (33). In a recent study, a genome-wide measurement of Pub1 targets identified 368 target transcripts. Applying RNApromo to the 3′ UTRs of Pub1 targets, we predict a significant motif (AUC = 0.6, P < 2 × 10⁻⁵), which is also highly conserved (P < 5 × 10⁻¹⁰). The predicted motif is AU rich and folds into a stem–loop structure that includes a long and highly U-rich loop (Fig. 2C).

Looking at the motifs we predict for these RBPs, we notice their similarity to the mRNA decay motifs: in targets of Pub1, a protein that is known to stabilize its targets, we identify a very similar motif to the slow-decay motif, whereas in targets of the Puf proteins, which were suggested to increase mRNA degradation rates, we identify a motif with a similar structure to the fast-decay motif. Indeed, fast-decaying mRNAs (half-life <6 min) are enriched for Puf protein targets (P < 8 × 10⁻¹⁶), whereas slow-decaying mRNAs (half-life >40 min) are enriched for Pub1 targets (P < 2 × 10⁻³). Moreover, looking at the decay rates of the top 20% of targets of each protein (scored by the predicted motifs), we see (Fig. 2D) that the half-lives of the Puf targets are lower (18 ± 13 min on average) than those of the Pub1 targets (28 ± 32 min on average).
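The "top 20% of targets" summary can be sketched as below. This is an illustrative reading, assuming each target has a motif score and a measured half-life, and that the reported "±" is a sample standard deviation (the text does not say). The function name `top_fraction_half_lives` and its parameters are hypothetical.

```python
from statistics import mean, stdev


def top_fraction_half_lives(scores, half_lives, fraction=0.2):
    """Mean and sample SD of half-lives for the best-scoring fraction of targets."""
    ranked = sorted(zip(scores, half_lives), key=lambda t: t[0], reverse=True)
    # stdev needs at least two values, so never take fewer than two targets.
    n_top = max(2, round(len(ranked) * fraction))
    top = [hl for _, hl in ranked[:n_top]]
    return mean(top), stdev(top)
```

Applied separately to the targets of each RBP, this yields one mean and spread per protein, which can then be compared as in the text.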

Finally, Sam68 is a human RBP involved in cell growth regulation, which was shown to bind to AU-rich motifs within a stem–loop context (34). Analyzing its eight known targets, we predict a stem–loop motif (AUC = 0.73, P < 2 × 10⁻³). On the basis of the previous observations and the structure of this motif, we can hypothesize that Sam68 promotes the degradation of its mRNA targets. If true, this would indicate that this destabilization motif is conserved between human and yeast.

Overall, our results suggest a role for mRNA secondary structure in controlling mRNA stability. The detailed secondary structure of the well-known AU-rich elements may play a significant role in determining their function as stabilizing or destabilizing elements. By binding different RBPs, AU-rich elements with different structures can change the mRNA stability and therefore affect its translation rates.

Motifs Involved in mRNA Localization.

RNA transport and local translation have now been documented in vertebrates, invertebrates, and unicellular organisms, enabling cells to control gene expression at small localized regions. Several studies suggest that cellular localization of mRNAs is specified by RNA motifs on the localized mRNA and bound by RBPs involved in trafficking (5). The RBPs involved in several localization processes were identified experimentally and shown to interact with elements composed of both sequence and structure (6). We thus applied our motif discovery scheme to several sets of mRNAs that share similar localization patterns.

Recently, a large study on mRNA localization during fly embryonic development was performed (35). This study provides a large set of colocalized mRNA transcripts, both to specific parts of the embryo and to distinct cellular compartments. Although some of the mechanisms for mRNA localization during development are known and involve morphogen gradients that induce local transcription of mRNA, these are not active at the subcellular level or for maternal mRNAs. Therefore, it is possible that some of the documented localization events are a result of signals on the mRNA itself. Applying RNApromo to the 94 sets of colocalized mRNAs, we detect significant motifs in nine sets of colocalized maternal and embryonic transcripts (Fig. 3A). Interestingly, whereas mRNA stability motifs are located on 3′ UTRs, most predicted localization motifs are located on the 5′ UTR of the transcripts.

Fig. 3.

Localization motifs. (A) Fly embryonic localization patterns. Consensus structure logos of the statistically significant RNA motifs. The location of each set in the localization annotation hierarchy is indicated. (B) RNA motif predicted for mRNAs that are localized to mouse dendrites.

Of the nine predicted motifs, five (56%) are identified in sets of maternal transcripts with a common cellular localization, although such sets constitute only 7% of the localization patterns (P < 5 × 10⁻⁵). One interesting example is the motif predicted in maternal transcripts localized to the spindle midzone, the central area of the spindle where microtubules from opposite poles overlap. This localization pattern is evident during developmental stages 4 to 5 (at the beginning of blastoderm cellularization), and it is possible that these maternal transcripts are involved in early cellularization processes in the embryo. Quite surprisingly, very similar motifs were independently identified in both 3′ and 5′ UTRs of these transcripts, suggesting that they both function in the same localization event.
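An over-representation of this kind is commonly assessed with a hypergeometric tail probability. The sketch below is an assumption on two counts: the text does not name the test used, and the number of maternal cellular-localization sets is taken as 7, which is only our rounding of "7% of 94". The function name `hypergeom_tail` is hypothetical.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return hits / total


# Assumed counts: 94 localization sets, 7 of them maternal/cellular (assumption),
# 9 sets with a significant motif, 5 of which fall in the maternal/cellular group.
p_value = hypergeom_tail(population=94, successes=7, draws=9, observed=5)
```

Exact integer arithmetic with `math.comb` avoids the rounding problems of multiplying many small probabilities, which matters for tail values this far from 1.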

Because only posttranscriptional mechanisms can induce embryonic localization of maternal mRNAs, we can also expect to identify common RNA motifs in maternal mRNAs with similar embryonic localization patterns. Indeed, we identified a motif in apically localized maternal transcripts, providing a possible mechanism for the localization of maternal transcripts to distinct embryonic locations.

Finally, we identified motifs in three sets of zygotic transcripts that localize to unique subsets of the blastodermal nuclei. One motif also shows a significant sequence conservation profile. Because it is less likely that RNA motifs are involved in localization of transcripts to specific parts of the embryo, especially after cellularization processes begin, we hypothesize that these motifs are involved in other posttranscriptional regulations, whereas the expression in specific groups of blastoderm cells can be determined by another mechanism.

We also predicted motifs for other (non-embryonic) mRNA localization events. mRNA localization is particularly important in neurons, where the plastic modulation of synaptic connections requires local changes of gene expression. Indeed, we predict a motif in 97 dendrite-localized mRNAs that were recently identified experimentally in mouse hippocampal neurons (AUC = 0.61, P < 2 × 10⁻⁵) (36), suggesting that it may be involved in the localization of these transcripts to dendrites (Fig. 3B).

Overall, we were able to predict motifs in several sets of mRNAs that are localized to similar cellular compartments, both in fly embryos and mouse neurons. These results demonstrate the potential use for mRNA structural motifs in producing local mRNA concentration in the cell, which may be beneficial for the local translation of proteins.

Target Recognition by the Pre-microRNA Processing Machinery.

MicroRNAs are a class of small (21–24 nucleotides) noncoding RNAs that play a significant role in regulating gene expression and mRNA stability (37). Mature microRNAs are produced from endogenous transcripts (pri-microRNAs) with a stem–loop structure. During microRNA biogenesis, the Drosha RNase recognizes these transcripts and cleaves them to produce the pre-microRNA stem–loops. The pre-microRNA is further processed into a mature microRNA through a second cleavage event by the Dicer family of RNases. It is known that Drosha recognizes its targets by their stem–loop structure, and several elements of this structure were demonstrated to be particularly important for this recognition (38), yet the exact features are still unclear. To produce a more accurate model of this structural motif, we applied RNApromo to pre-microRNAs of different organisms. We expect this collection to represent the preferences of the pre-microRNA recognition mechanism.

The identified motifs take, as expected, the shape of a stem loop with almost no bulges or internal loops, and with relatively weak sequence signals (Fig. 4A). This does not mean that the pre-microRNA structures do not contain bulges (in fact they usually do), but rather that there is no tendency across the different pre-microRNA structures for a bulge in a specific position. Yet the pre-microRNA consensus structures of different organisms are not identical and can be divided into two groups on the basis of two key features: the stem length and the loop length. Intriguingly, this division matches two distinct groups of organisms: animals (metazoa) and plants (Fig. 4A). Plant pre-microRNAs have a small loop (5.6 bases on average) and a long stem (average 38.1 bp), whereas animal pre-microRNAs usually have a longer loop (average 10.9 bases) and shorter stem (average 32.8 bp). Another evident difference between animal and plant pre-microRNAs is their overall length, which is much greater in plants (average 160 bases) than in animals (average 88 bases). Finally, consistent with the fact that viruses are known to use the host cell machinery to process their microRNAs (39), the loop size of animal viruses' pre-microRNAs is closer to that of animals (average 9.4 bases). No data for plant viruses were available for comparison.
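The two features compared here can be read off a dot-bracket string, as in the following sketch. The paper derives them from the predicted consensus motif of each organism; this illustration instead measures individual single-hairpin folds and averages them, which is a simplification. Stem length is taken as the number of base pairs and loop length as the number of unpaired bases enclosed by the innermost pair. The names `stem_loop_lengths` and `mean_stem_loop` are hypothetical.

```python
from statistics import mean


def stem_loop_lengths(structure):
    """Return (stem_bp, loop_len) for a single-hairpin dot-bracket string."""
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    # In a single hairpin every '(' precedes every ')', bulges included.
    if last_open == -1 or first_close == -1 or first_close < last_open:
        raise ValueError("expected exactly one stem-loop")
    if structure.count("(") != structure.count(")"):
        raise ValueError("unbalanced structure")
    return structure.count("("), first_close - last_open - 1


def mean_stem_loop(structures):
    """Average stem length (bp) and loop length (bases) over a set of hairpins."""
    stems, loops = zip(*(stem_loop_lengths(s) for s in structures))
    return mean(stems), mean(loops)
```

Bulges and internal loops shorten neither count, so a stem with mismatches is still measured by its paired bases only.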

Fig. 4.

Pre-microRNA motifs. (A) Bar graphs showing the loop and stem size of the motifs predicted for pre-microRNAs of different organisms. (B) Examples of pre-microRNA–predicted consensus structure for selected species. (C) Examples of genomic transcript stem–loops predicted consensus structure for selected species.

The Drosha enzyme is active inside the nucleus, where many other transcripts besides those of pri-microRNAs are produced. Assuming it recognizes and cleaves only pri-microRNA stems, we would expect pri-microRNA stems to have specific structural features, distinguishing them from stem–loop structures that appear in other genomic transcripts. To test this hypothesis, we applied RNApromo to stem–loops that appear in arbitrary genomic transcripts. Strikingly, we find that in plants, the length of the stem and loop of arbitrary stem–loop-containing transcripts is very similar to the lengths of the plant pri-microRNAs, whereas in animals, stem–loops of arbitrary transcripts and stem–loops of animal pri-microRNAs have markedly different lengths. Thus, our results suggest that in animals, the nuclear RNase Drosha recognizes and cleaves its pri-microRNA targets, perhaps by measuring the length of their loop, among other features. Longer loops, which are characteristic of pri-microRNAs, enable the enzyme to specifically recognize these transcripts. These results further suggest that in plants, Drosha recognition is not part of the microRNA biogenesis process, and thus a different mechanism may exist.

These intriguing results are supported by the observation that although in animals cleavage by Drosha is essential for pre-microRNA export to the cytoplasm and further processing by Dicer, most Dicer homologues in plants are already localized to the nucleus (37). Moreover, the existing literature provides no evidence of a plant Drosha homologue and no evidence for accumulation of pre-microRNAs after a Dicer knockout in plants (40). Thus, it is possible that plant pri-microRNAs, in a similar way to siRNA precursors, are directly recognized and cleaved by Dicer. Indeed, plant pre-microRNAs have long stems that are similar to the long dsRNA precursors that generate siRNAs, and the mature microRNAs, like mature siRNAs, usually induce degradation of their targets rather than a translational arrest.

Conclusions

In summary, we have developed and applied a novel computational tool, RNApromo, that identifies short RNA motifs from the full-length RNA regions in which they are embedded. Unlike other available tools, RNApromo restricts the motif search to a predefined set of input structures, which allows it to perform well even on sets of mRNAs from a single organism, reduces the time complexity, and allows the use of thermodynamic-based methods for predicting RNA secondary structure.

Using RNApromo, we predict two structurally different and AU-rich motifs in 3′ UTRs of fast-decaying and slow-decaying mRNAs and identify proteins that may bind these motifs and modulate the stability of the bound mRNAs. Although the involvement of AU-rich elements in this process is known (31), our analysis reveals the importance of the structural context in which these elements are embedded. Next, we predict motifs in sets of colocalized transcripts in mouse neurons and fly embryos, which can be involved in establishing local cellular concentrations of mRNAs. Finally, we analyze pre-microRNA sequences and reveal that in animals Drosha may recognize its pri-microRNA targets by measuring, among other features, the length of their loop. Moreover, our analysis suggests that plant microRNAs are processed similarly to other siRNAs, in a mechanism that requires only the Dicer enzyme. Overall, the predicted motifs represent novel and experimentally testable findings and demonstrate the potential of RNApromo for uncovering posttranscriptional regulatory mechanisms. Our work thus represents a step toward the long-term goal of making the genome-wide computational prediction of RNA secondary structure elements as routine and robust as is currently practiced for linear DNA elements, thereby revealing some of the mechanisms by which RNA molecules are regulated within the cell.

Methods

See the SI for full details on our methods and on the datasets we used, as well as for additional results. An online version and the implementation of the RNApromo algorithm are available from the authors' website or upon request.

Supplementary Material

Supporting Information

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. S.R.E. is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/cgi/content/full/0803169105/DCSupplemental.

References

  1. Arava Y, et al. Genome-wide analysis of mRNA translation profiles in Saccharomyces cerevisiae. Proc Natl Acad Sci USA. 2003;100:3889–3894. doi: 10.1073/pnas.0635171100.
  2. Shepard KA, et al. Widespread cytoplasmic mRNA transport in yeast: Identification of 22 bud-localized transcripts using DNA microarray analysis. Proc Natl Acad Sci USA. 2003;100:11429–11434. doi: 10.1073/pnas.2033246100.
  3. Wang Y, et al. Precision and functional specificity in mRNA decay. Proc Natl Acad Sci USA. 2002;99:5860–5865. doi: 10.1073/pnas.092538799.
  4. Anantharaman V, Koonin EV, Aravind L. Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res. 2002;30:1427–1464. doi: 10.1093/nar/30.7.1427.
  5. Hentze MW, Muckenthaler MU, Andrews NC. Balancing acts: Molecular control of mammalian iron metabolism. Cell. 2004;117:285–297. doi: 10.1016/s0092-8674(04)00343-5.
  6. Olivier C, et al. Identification of a conserved RNA motif essential for She2p recognition and mRNA localization to the yeast bud. Mol Cell Biol. 2005;25:4752–4766. doi: 10.1128/MCB.25.11.4752-4766.2005.
  7. Krol A. Evolutionarily different RNA motifs and RNA-protein complexes to achieve selenoprotein synthesis. Biochimie. 2002;84:765–774. doi: 10.1016/s0300-9084(02)01405-0.
  8. Kertesz M, et al. The role of site accessibility in microRNA target recognition. Nat Genet. 2007;39:1278–1284. doi: 10.1038/ng2135.
  9. Robins H, Li Y, Padgett RW. Incorporating structure to predict microRNA targets. Proc Natl Acad Sci USA. 2005;102:4006–4009. doi: 10.1073/pnas.0500775102.
  10. Long D, et al. Potent effect of target structure on microRNA function. Nat Struct Mol Biol. 2007;14:287–294. doi: 10.1038/nsmb1226.
  11. Zhao Y, Samal E, Srivastava D. Serum response factor regulates a muscle-specific microRNA that targets Hand2 during cardiogenesis. Nature. 2005;436:214–220. doi: 10.1038/nature03817.
  12. Hofacker IL, Fekete M, Stadler PF. Secondary structure prediction for aligned RNA sequences. J Mol Biol. 2002;319:1059–1066. doi: 10.1016/S0022-2836(02)00308-X.
  13. Gautheret D, Lambert A. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol. 2001;313:1003–1011. doi: 10.1006/jmbi.2001.5102.
  14. Hofacker IL, et al. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie. 1994;125:167–188.
  15. Hochsmann M, et al. Local similarity in RNA secondary structures. Proc IEEE Comput Soc Bioinform Conf. 2003;2:159–168.
  16. Pavesi G, et al. RNAProfile: An algorithm for finding conserved secondary structure motifs in unaligned RNA sequences. Nucleic Acids Res. 2004;32:3258–3269. doi: 10.1093/nar/gkh650.
  17. Sankoff D. Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J Appl Math. 1985;45:810–825.
  18. Gorodkin J, Heyer LJ, Stormo GD. Finding the most significant common sequence and structure motifs in a set of RNA sequences. Nucleic Acids Res. 1997;25:3724–3732. doi: 10.1093/nar/25.18.3724.
  19. Harmanci AO, Sharma G, Mathews DH. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinformatics. 2007;8:130. doi: 10.1186/1471-2105-8-130.
  20. Will S, et al. Inferring noncoding RNA families and classes by means of genome-scale structure-based clustering. PLoS Comput Biol. 2007;3:e65. doi: 10.1371/journal.pcbi.0030065.
  21. Ji Y, Xu X, Stormo GD. A graph theoretical approach for predicting common RNA secondary structure motifs including pseudoknots in unaligned sequences. Bioinformatics. 2004;20:1591–1602. doi: 10.1093/bioinformatics/bth131.
  22. Hamada M, et al. Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics. 2006;22:2480–2487. doi: 10.1093/bioinformatics/btl431.
  23. Eddy SR, Durbin R. RNA sequence analysis using covariance models. Nucleic Acids Res. 1994;22:2079–2088. doi: 10.1093/nar/22.11.2079.
  24. Sakakibara Y, et al. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Res. 1994;22:5112–5120. doi: 10.1093/nar/22.23.5112.
  25. Holmes I. Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics. 2005;6:73. doi: 10.1186/1471-2105-6-73.
  26. Yao Z, Weinberg Z, Ruzzo WL. CMfinder—A covariance model based RNA motif finding algorithm. Bioinformatics. 2006;22:445–452. doi: 10.1093/bioinformatics/btk008.
  27. Wuchty S, et al. Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers. 1999;49:145–165. doi: 10.1002/(SICI)1097-0282(199902)49:2<145::AID-BIP4>3.0.CO;2-G.
  28. Wiese KC, Hendriks A. Comparison of P-RnaPredict and mfold—Algorithms for RNA secondary structure prediction. Bioinformatics. 2006;22:934–942. doi: 10.1093/bioinformatics/btl043.
  29. Do CB, Woods DA, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics. 2006;22:e90–e98. doi: 10.1093/bioinformatics/btl246.
  30. Griffiths-Jones S, et al. Rfam: Annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005;33:D121–D124. doi: 10.1093/nar/gki081.
  31. Barreau C, Paillard L, Osborne HB. AU-rich elements and associated factors: Are there unifying principles? Nucleic Acids Res. 2005;33:7138–7150. doi: 10.1093/nar/gki1012.
  32. Gerber AP, Herschlag D, Brown PO. Extensive association of functionally and cytotopically related mRNAs with Puf family RNA-binding proteins in yeast. PLoS Biol. 2004;2:E79. doi: 10.1371/journal.pbio.0020079.
  33. Duttagupta R, et al. Global analysis of Pub1p targets reveals a coordinate control of gene expression through modulation of binding and stability. Mol Cell Biol. 2005;25:5499–5513. doi: 10.1128/MCB.25.13.5499-5513.2005.
  34. Itoh M, et al. Identification of cellular mRNA targets for RNA-binding protein Sam68. Nucleic Acids Res. 2002;30:5452–5464. doi: 10.1093/nar/gkf673.
  35. Lecuyer E, et al. Global analysis of mRNA localization reveals a prominent role in organizing cellular architecture and function. Cell. 2007;131:174–187. doi: 10.1016/j.cell.2007.08.003.
  36. Zhong J, Zhang T, Bloch LM. Dendritic mRNAs encode diversified functionalities in hippocampal pyramidal neurons. BMC Neurosci. 2006;7:17. doi: 10.1186/1471-2202-7-17.
  37. He L, Hannon GJ. MicroRNAs: Small RNAs with a big role in gene regulation. Nat Rev Genet. 2004;5:522–531. doi: 10.1038/nrg1379.
  38. Han J, et al. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell. 2006;125:887–901. doi: 10.1016/j.cell.2006.03.043.
  39. Cullen BR. Viruses and microRNAs. Nat Genet. 2006;38(Suppl):S25–S30. doi: 10.1038/ng1793.
  40. Jones-Rhoades MW, Bartel DP, Bartel B. MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol. 2006;57:19–53. doi: 10.1146/annurev.arplant.57.032905.105218.


Supplementary Materials

Supporting Information
0803169105_ST1.xls (87KB, xls)
0803169105_ST2.xls (63.5KB, xls)
