Abstract
Despite an increasing number of experimentally determined RNA structures, the gap between the number of structures and that of RNA families is still growing. To overcome this limitation, efficient and reliable RNA modeling methodologies must be developed. Toward this goal, we show how triloop sequence–structure relationships have been inferred through a systematic analysis of all triloops found in available high-resolution structures. The structural annotation of all triloops allowed us to define discrete states of the triloop's conformational space, and therefore an explicit sequence-to-structure relation. The sequence–structure relationships inferred from this explicit relation are presented in a convenient modeling table that provides a limited set of possible three-dimensional structures given any triloop sequence. The table is indexed by the two nucleotides that form the triloop's flanking base pair, since they are shown to provide the most information about the triloop three-dimensional structures. We also report important conformational variations observed in the X-ray crystallographic structures, which we believe might result from RNA dynamics.
Keywords: RNA, triloop, motif, structure, comparative analysis, three-dimensional modeling
INTRODUCTION
Single-stranded RNAs fold in three-dimensional (3D) space and adopt a diversity of conformations conferring various biological functions. A recent accumulation of high-resolution X-ray crystallographic structures (Berman et al. 2000) offers an opportunity to study further RNA architecture. In particular, the various ribosomal RNA (rRNA) structures (Ban et al. 2000; Wimberly et al. 2000; Harms et al. 2001; Nissen et al. 2001; Brodersen et al. 2002) show many repeated structural motifs in the context of their hosts. Generally acknowledged and studied motifs include the loop E (Varani et al. 1989; Wimberly et al. 1993; Szewczak and Moore 1995), the GNRA tetraloop (Jucker and Pardi 1995; Jucker et al. 1996), pseudoknots (Giedroc et al. 2000), the kink-turn (Klein et al. 2001), the A-minor motif (Nissen et al. 2001), the T-loop (Nagaswamy and Fox 2002), the UNR U-turn (Gutell et al. 2000), and the lone-pair triloop (Lee et al. 2003).
In the meantime, in order to study RNA structures systematically, our laboratory developed a series of computational tools that are compliant with the RNA ontology (Leontis et al. 2006). Given a set of 3D structures, the MC-Annotate computer program interprets and labels the RNA base-pairing and base-stacking interactions (Gendron et al. 2001; Lemieux and Major 2002), whereas the MC-Search computer program determines the locations of user-defined structural motifs (Hoffmann et al. 2003; Olivier et al. 2005). Given the large and increasing amount of high-resolution RNA structural data, it is now difficult to design sound and complete motif studies without such systematic tools. For instance, unforeseen examples of the so-called GNRA tetraloop motif do not adhere to the G-N-R-A consensus sequence (Huang et al. 2005), or were found in interior loops rather than tetraloops. However, these dissident examples fulfill the same role of stabilizing tertiary interactions between two adjacent stems (Lemieux and Major 2006).
The latter observation was facilitated by a systematic root-mean-square deviation (RMSD) classification of cyclic motifs in the 23S rRNA of Haloarcula marismortui (Ban et al. 2000). The cyclic motifs were shown to preserve the same base-pairing and base-stacking interactions (Lemieux and Major 2006). Consequently, classifying RNA motifs according to their base interactions or using RMSD is equivalent, and defines a discrete structure space that enables sequence–structure mapping, i.e., sequences can be linked with a number of distinct structures.
Here, taking advantage of the recent X-ray crystallographic structures and of our computational tools, we study the triloop motif in different RNAs, species, and contexts. First, we systematically search for all triloops. Second, we classify them according to the types and positions of their base interactions. Then, we build a discrete and explicit sequence-to-structure relation, which we use to link triloop sequences with base-pairing and base-stacking constraints. Such constraints are useful to narrow down the size of triloops' conformational space in the context of structure prediction and 3D modeling (Major et al. 1991; Massire and Westhof 1998; Jossinet and Westhof 2005; Major and Thibault 2007).
The triloop motif is particularly interesting due to its major role in a variety of organisms and pathways, for instance, in the promotion of some virus replication (Huang et al. 2001; Olsthoorn and Bol 2002), viral synthesis (Haasnoot et al. 2003), and iron response (McCallum and Pardi 2003). Furthermore, a previous triloop study revealed an important structural diversity (Lee et al. 2003), making this motif an ideal candidate for a base interaction characterization. This previous work identified triloops in which the only base pair was the closing one, whereas here all triloops are identified regardless of the presence of intraloop base pairs.
RESULTS
Triloop motif
We define the triloop motif as a sequence of five adjacent nucleotides, where the first and last form a base pair in the 3D structure (Fig. 1A). This definition corresponds to the classical view of a triloop: three free nucleotides closed by a base pair. Our choice to accept all possible base-pairing types for the flanking base pair (indicated by the circled “P” in Fig. 1A) comes from recent observations showing that nearly 50% of large rRNA base pairs are non-Watson–Crick (Lemieux and Major 2002), and also from the results of a previous study indicating that only a small fraction of lone-pair triloops are flanked by canonical Watson–Crick base pairs (Lee et al. 2003). The dotted lines in Figure 1A indicate that any intraloop interactions are allowed. This flexibility is necessary to account for potentially different structures for any triloop sequence.
FIGURE 1.
Triloop motif. (A) Interaction graph (above) and MC-Search input descriptor (below). The nucleotides are numbered from 1 to 5 and from 5′ to 3′. The lines indicate nucleotide interactions. The bold lines indicate the presence of phosphodiester linkages (backbone). The dotted lines indicate possible intraloop interactions. Any nucleotide type is accepted, represented by the letter “N” (IUPAC code) at each node. Any flanking base pairing type is tolerated, indicated by the circled letter “P” between nucleotides 1 and 5. (B) Three-dimensional structure of a typical triloop in the 23S rRNA of H. marismortui (PDB code 1JJ2). This triloop is flanked by a Watson–Crick cis base pair. The H-bonds in the base pair are shown by dotted lines. Nucleotide A2 stacks with both nucleotides involved in the flanking base pair, indicated by the arrows. Nucleotides A3 and U4 bulge out of the triloop structure. (C) Interaction graph of the triloop instance shown in B. The flanking GC Watson–Crick cis base pair is represented by the black dot. The arrows indicate base stacking as in B. Nucleotide G1 corresponds to the residue identifier 118 in chain “0” of the PDB file 1JJ2.
Triloop sequence–structure relationships
Applying the MC-Search program with our definition of the triloop and the high-resolution RNA structures available at the PDB (Berman et al. 2000) results in 922 triloops that were found in 86 different PDB files (for the complete list, see Table S-I in the online Supplemental Materials). Triloops were found in a wide variety of RNA families, including the 5S, 16S, and 23S rRNAs with and without antibiotics, transfer RNAs (tRNA), ribonuclease P RNAs, many viruses, riboswitches, group I introns, as well as several RNA aptamers bound to proteins. The indistinguishable triloops in sequence and structure (see Materials and Methods) were grouped together, yielding 104 different specimens (or groups). Figure 1B shows the 3D structure of a typical example of a 23S rRNA triloop specimen. Figure 1C shows the base interaction graph of the typical triloop specimen shown in Figure 1B. Fifty-five different triloop structures define 55 such distinct and discrete interaction graphs, in contrast to the continuous RNA 3D space. Sixty different triloop sequences fold in these 55 distinct structures. The set of triloop interaction graphs can also be seen as a partition of the triloop conformational space, which is convenient to define a straightforward sequence–structure relation, as shown in Figure 2.
FIGURE 2.
Sequence–structure relation. Only subsets of triloop sequences and structures are shown. The sequences are linked to the structures in which they are found. Three different symbols represent the base-pairing types according to the Leontis and Westhof nomenclature (see Materials and Methods). The base-stacking types are shown using arrowheads, according to the Major and Thibault nomenclature (see Materials and Methods).
Triloop dynamics
This explicit sequence–structure relation allows us to study the sequence–structure relation in both directions. First, we can take a fixed sequence and study the different structures it has folded into. Second, although not explored in this work, we can pick a given structure and compare the sequences that were threaded in it. In the context of our studies, the sequence–structure relation reveals the many-to-many relationships between some triloop sequences and structures, i.e., some sequences fold into different structures, and some structures accommodate multiple sequences.
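The two directions of the relation can be illustrated with a minimal sketch. It assumes each observed triloop has been reduced to a pair of its five-nucleotide sequence and a label for its interaction graph. The labels `S1` to `S3` and the observations are hypothetical placeholders, not data from this study.

```python
from collections import defaultdict

# Hypothetical observations: (five-nucleotide sequence, structure label).
# A structure label stands for one discrete interaction graph.
observations = [
    ("CUAAG", "S1"), ("CUAAG", "S2"), ("CGCGA", "S3"),
    ("GAAUC", "S1"), ("CGCGA", "S3"),
]

# Build the relation in both directions.
seq_to_structs = defaultdict(set)
struct_to_seqs = defaultdict(set)
for seq, struct in observations:
    seq_to_structs[seq].add(struct)
    struct_to_seqs[struct].add(seq)

# Sequences found in more than one structure, and structures that
# accommodate more than one sequence (the many-to-many cases).
multi_structure_seqs = sorted(s for s, st in seq_to_structs.items() if len(st) > 1)
multi_sequence_structs = sorted(t for t, sq in struct_to_seqs.items() if len(sq) > 1)
```

Fixing a sequence amounts to reading `seq_to_structs`, whereas fixing a structure amounts to reading `struct_to_seqs`.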
Fixing the sequence
Interestingly, when we take a close look at given sequences, we observe small structural changes that occur in different crystals of the same RNAs and sites. Three types of such small structural variations were noticed: (1) stack movement along the backbone, (2) base stacking, and (3) base-pairing formation/disappearance. These three types have been observed in the 32 available X-ray crystal structures of the 23S rRNA of H. marismortui.
The first type was detected in a CUAAG triloop found at position 1186, where four different triloop structures are observed (see Fig. 3A). The base stacking between nucleotides 3 and 4 is present in all 32 X-ray structures, whereas another stack interaction moves along the phosphodiester linkages. It appears between nucleotides 4 and 5 in the PDB 1VQN (Fig. 3A, second from left), and between nucleotides 1 and 2 in 1YJW and 1VQ8 (Fig. 3A, third and fourth from left). In the latter X-ray crystallographic structure, another base-stacking interaction is observed between nucleotides 4 and 5 (Fig. 3A, right), also present in 1VQN (Fig. 3A, second from left).
FIGURE 3.
Interaction dynamics. (A) Base-stacking interactions along the backbone. Four structures for the same 23S rRNA site (“0”1186) differ by the positions of base-stacking interactions. (B) Intraloop stacking interactions. Three structures for the same 23S rRNA site (“0”218) differ by intraloop base-stacking interactions. The leftmost structure has no intraloop interaction. Then, an intraloop base stacking appears between nucleotides 2 and 4. Finally, a second intraloop base-stacking interaction appears between nucleotides 1 and 3. (C) Intraloop base-pairing interactions. Two structures for the same 23S rRNA site (“0”138) differ by the appearance of an intraloop Hoogsteen cis base-pairing interaction between nucleotides 1 and 3.
The second type of triloop dynamics is intraloop base-stacking formation. For instance, three different structures are observed in a CGCGA triloop found at position 218 (Fig. 3B). No intraloop base stacking is present in 1YIT (Fig. 3B, left). However, base stacking between nucleotides 3 and 5 appears in 1FFK (Fig. 3B, middle), and then a second base-stacking interaction between nucleotides 1 and 3 is observed in 1VQ4 (Fig. 3B, right).
Finally, intraloop base-pairing formation is observed in a triloop at position 138 (Fig. 3C). The triloop in 1VQ8 and in five other 23S rRNA structures does not show any intraloop base pairing (Fig. 3C, left), whereas a Hoogsteen U1–G3 cis base pair that participates in the formation of a base triple involving the flanking base pair is observed in 1VQM (Fig. 3C, right) and in another 23S rRNA structure. In the 24 other X-ray crystal structures of the 23S rRNA of H. marismortui, this site shows pentaloops, not triloops.
This base-pairing formation inside a triloop site of the rRNA is not common, since only four triloop structures among the 55 have this feature. In addition to the example shown above, a UUAAG triloop is found at position 1966 of an X-ray crystallographic structure of the 23S rRNA of H. marismortui (data not shown). In this case, all 32 X-ray crystallographic structures maintain a cis Watson–Crick/Hoogsteen U2–A4 base pair. On the other hand, a CACAA triloop is found at position 934 of the X-ray crystallographic structure of the 16S rRNA of Thermus thermophilus. In 2J00, a cis Watson–Crick/Hoogsteen C1–A4 base pair is observed, whereas it disappears in 1FJG (data not shown). This latter case falls into our third type of triloop dynamics. Finally, the last case is a CCCGG triloop found at position 1028 in the 16S rRNA of T. thermophilus, where a cis Watson–Crick C2–G5 base pair participates in the formation of a base triple with the flanking base pair (data not shown).
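Comparisons of this kind, between structures of the same site in different crystals, can be sketched as set operations on interaction graphs. The sketch below assumes each graph is stored as a set of `(i, j, kind)` tuples. The function name `compare_graphs`, the `kind` labels, and the two toy graphs are hypothetical and only mimic the appearance of an extra stack.

```python
from typing import FrozenSet, Tuple

Interaction = Tuple[int, int, str]  # (nucleotide i, nucleotide j, kind)


def compare_graphs(a: FrozenSet[Interaction], b: FrozenSet[Interaction]) -> dict:
    """Report which interactions are kept, lost, or gained going from a to b."""
    return {"kept": a & b, "lost": a - b, "gained": b - a}


# Toy graphs for one site in two crystals (hypothetical data): both keep the
# flanking pair and a 3-4 stack; the second one gains a 4-5 stack.
crystal_1 = frozenset({(1, 5, "pair"), (3, 4, "stack")})
crystal_2 = frozenset({(1, 5, "pair"), (3, 4, "stack"), (4, 5, "stack")})

diff = compare_graphs(crystal_1, crystal_2)
```

Because the graphs are discrete, two structures either share an interaction or they do not, so no RMSD threshold is needed for this comparison.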
Ligand influence
The sequence–structure relation can also serve to measure the effect of ligand binding on triloop structures. We studied an interaction among two triloops and a GNRA tetraloop in the X-ray crystal structure of the 16S rRNA of T. thermophilus (see Fig. 4A). In 2J00, the rRNA is bound to the antibiotic paromomycin. Notably, the second nucleotide of the triloop capping helix 10 (H10) and the third nucleotide of the triloop capping helix 17 (H17) (nucleotides 202 and 461, respectively) bulge out of their triloops, toward each other's triloop. This junction is significantly different in 1FKA, where the 16S rRNA is not bound to antibiotics (Fig. 4B). In this case, both nucleotides fold back toward their respective loops. The triloop capping H17 in 1FKA is no longer a triloop, but a rather large loop, in which even the last Watson–Crick base pair in the stem is broken, as shown in the interaction graph of Figure 4C. Furthermore, the triloop capping H10 loses the stack between nucleotides 1 and 3. The GNRA tetraloop of helix 15 (H15) switches to a triloop.
FIGURE 4.
Antibiotics and triloop structure. Two views of a triloop found in the 16S rRNA of T. thermophilus. The triloop is represented in a red cartoon, where the third nucleotide of the triloop is shown using sticks. A nucleotide of an adjacent loop is shown in blue, also using sticks. (A) Stereoview of the triloop found in PDB 2J00 at position 459 with two neighbor loops. The third nucleotide of the triloop and a nearby one occupy the cavity created by the three RNA loops. (B) Stereoview of the triloop found in PDB 1FKA at position 454 (resolution of 3.3 Å). The third nucleotide of the triloop and the nearby one moved toward their respective loops and outside the cavity. (C) Structural graphs of the triloop observed in 2J00 (left) and in 1FKA (right). The triloop and the base pair next to the flanking base pair are disrupted by the conformational changes, possibly induced by the presence of antibiotics.
Interestingly, an antibiotic-bound X-ray crystal structure of the 23S rRNA of T. thermophilus (2J01) exhibits two overlapping triloops at positions 475 and 476 (see Fig. 5). We were curious about the implication of the antibiotics in this peculiar triloop arrangement. The same overlapping triloops are found at the same site in the X-ray crystallographic structure of Escherichia coli (2AW4) (Fig. 5B, middle). However, this structure is not bound to antibiotics. We thus inspected other E. coli X-ray crystallographic structures and found one bound to the antibiotic kasugamycin, 1VS6 (Fig. 5B, left). This one, surprisingly, shows only the triloop at position 476; the triloop at position 475 is lost.
FIGURE 5.
Ribosomal overlapping triloops. (A) Stereoview of the six nucleotides forming the overlapping triloops, as found in PDB file 2J01 (T. thermophilus with antibiotics). Nucleotides 4 and 5 cross each other to form the overlapping triloops. (B) Structural graphs of the overlapping triloops shown in A (right), the single triloop of PDB file 1VS6 (E. coli with antibiotics) (left), and the overlapping triloops in PDB file 2AW4 (E. coli without antibiotics) (middle).
Sequence distributions
Out of the 60 sequences found, 20 are not specific to a single structure, confirming that no trivial, or direct, triloop sequence–structure relationships exist. However, we noticed that the three nucleotides of the loop, nucleotides 2, 3, and 4, display a greater variation than the complete sequence, and most triplets are found in many different triloop structures (data not shown).
We further confirmed this by using an information theory approach (see Materials and Methods). Out of nearly 2 bits of mutual sequence–structure information available in our data, >1 bit (1.13 of 1.88 bits, or about 60% of the information) is provided by the flanking base-pair nucleotides (nucleotides 1 and 5). In other words, the flanking base pair is the most informative sequence element of the triloop 3D structure.
We therefore focused our interest on the flanking base pair, and indexed the structural information by it (see Table S-II in the online Supplemental Materials), yielding a modeling table (see Table S-III in the online Supplemental Materials). The table returns, from the knowledge of the flanking base-pair sequence of a triloop, the possible types of the flanking base pair and, for each, the possible interactions among the nucleotides of the triloop. Figure 6 shows the example of all triloops flanked by a CG base pair (CNNNG). Here, 10 different alternatives are proposed. In general, small numbers of structures (fewer than 20) are proposed, instead of almost 20,000 theoretical types: three types of interaction for each of the nine possible intraloop nucleotide interactions gives 3^9 = 19,683 for any given triloop when no a priori information is available.
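The size of the unconstrained space can be checked with a short calculation. The sketch assumes that the nine intraloop interactions are all pairs among the five nucleotides except the flanking pair (1, 5), and that each pair takes one of three states. This reading is an interpretation that is merely consistent with the count of nine, and the variable names are illustrative.

```python
from itertools import combinations

nucleotides = range(1, 6)   # positions 1 to 5 of the triloop motif
flanking = (1, 5)           # the flanking base pair is fixed by the motif

# All nucleotide pairs other than the flanking pair (assumed reading of
# "nine possible intraloop nucleotide interactions").
intraloop_pairs = [p for p in combinations(nucleotides, 2) if p != flanking]

states_per_pair = 3         # three interaction types per pair, as in the text
n_graphs = states_per_pair ** len(intraloop_pairs)
```

Under these assumptions, a table entry that proposes fewer than 20 alternatives reduces the candidate set by roughly three orders of magnitude.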
FIGURE 6.
Section of the modeling table. The triloop specimens with a CG flanking base pair are shown using interaction graphs. For each graph, the observed triplet (nucleotides 2–4) sequences are given. Three groups are defined by the flanking base-pair types: W/H trans, W/W trans, and W/W cis.
DISCUSSION
Systematic search
The large number of triloops found in this study indicates the increasing importance of using a systematic tertiary structure approach to search for RNA motifs. As an example, here, MC-Search and MC-Annotate helped us find nine additional triloops in the 23S rRNA subunit of H. marismortui, bringing the total number to 22. This represents an increase of ∼40% in comparison to the previous study (Lee et al. 2003) (see Figure S-1 in the online Supplemental Materials).
Nucleotide dynamics
Fourteen triloop sites fold in more than one structure. A comparative analysis of these triloop sites revealed three types of nucleotide interaction interplay: (1) a movement of base-stacking and base-pairing interactions along the backbone path, (2) the appearance/disappearance of intraloop base-stacking and base-pairing interactions, and (3) changes in interaction types. Among these, more stability is observed in the base-pairing types than in any other nucleotide interplay. Interestingly, these observations correspond to regions in the X-ray crystallographic structures that, if the RNA was static rather than dynamic, should have folded similarly. They can indeed be the effect of a competition between different folds, of different folding pathways that depend on the environment and conditions, or of an oscillation among several possible conformations, possibly due to nucleotide dynamics, which would occur until a late folding step or when a ligand approaches.
The analysis of the counterpart of the 16S rRNA triloop at position A459 in T. thermophilus (PDB file 2J00 versus 1FKA) allowed us to observe a degenerated triloop conformation (see Fig. 4C). The triloop structure observed in 2J00 was resolved bound to the antibiotic paromomycin (Fig. 4A,C, left), whereas the degenerated one was resolved in a transcription-activated state, free of any antibiotics. In the structure bound to the antibiotic, the two triloops (in H10 and H17) expose a nucleotide inside the cavity created by the junction of the three loops. In the second structure, the two exposed nucleotides have moved toward their respective loop, leaving the cavity empty. The reason why the cavity is left empty in the activated state is unknown. Nevertheless, it shows that the two triloops were affected in a similar fashion by the experimental conditions.
The interactions found in the triloop of the X-ray crystallographic structure with the antibiotics can be obtained from the degenerated one by simple 3D modeling manipulations. First, flipping out C3 would allow for the stacking between G4 and A2. Second, bringing A5 inside the loop would allow for a pairing with G1. Finally, C0 and G6 can be slightly altered to retrieve the last Watson–Crick cis base pair of the adjacent stem. The degenerated structure observed in 1FKA can possibly be a preliminary state of the triloop, or the result of conformational changes induced by the activated state of the ribosome. Besides, the investigators of this X-ray crystallographic structure mentioned that long-range conformational changes could be transmitted along what they call a long structural pillar that includes H17 (Schluenzen et al. 2000).
Recently, RNA dynamics have also been reported in the literature in two interior loops. The X-ray crystallographic structures of the 23S rRNA of Deinococcus radiodurans and E. coli contain an interior loop in helix 40. Turner and coworkers have solved this interior loop by NMR and found a lower ground state than that of the X-ray crystals (Shankar et al. 2006). An important difference between the two states is the disappearance of a noncanonical A–A base pair in the X-ray crystal structures. Similarly, the cytoplasmic A site in the small rRNA of Homo sapiens has been solved by X-ray crystallography, showing two distinct structures (Kondo et al. 2006). The two structures are referred to as the OFF and ON states, which correspond to a free A site and to a loaded one, respectively, where a tRNA brings a new amino acid to a nascent protein. The ON state shows two bulged-out nucleotides that pair inside the helix in the OFF state. The two examples above show clearly the appearance/disappearance of intraloop base-stacking and base-pairing interactions, such as those observed in the triloops.
The comparative analysis also revealed the presence of two overlapping triloops in two different species, E. coli and T. thermophilus, which share four of their five nucleotides (Fig. 5). The effect created by the presence of antibiotics differs in each species, and could thus be species specific. The T. thermophilus structure with antibiotics (2J01) shows the presence of two triloops. The E. coli structure with antibiotics (1VS6) only shows one triloop. Finally, the E. coli structure without antibiotics (2AW4) restores the second triloop, a structure identical to that of the T. thermophilus with the antibiotics. Worth mentioning, this triloop site is also located in H17, but in E. coli.
These results show that ligand binding has an extremely important effect on RNA structure, which can be structure specific, and cannot be accounted for in any context-free RNA structure predictor.
Three-dimensional modeling
Our first approach to capture sequence–structure relationships in triloops focused on a structural classification based on the 5-nucleotide (nt) sequence. However, the 60 different sequences spanning 102 specimens were insufficient to derive valid statistics. We then focused our attention on the middle triplets (nucleotides 2–4), based on the argument that isosteric base-pair substitutions can introduce noise in the statistics, since the partners can be almost any nucleotide with little influence on the geometry of the triloop (by isostericity definition) (Lescoute et al. 2005). Unfortunately, most triplets are found across more than one class, and thus are not invariants of particular structures, as we wished. We then focused on the last alternative to find an invariant in triloop structures: the flanking base-pairing type, which was previously linked to tetraloop structures (Cheong et al. 1990; Woese et al. 1990; Antao et al. 1991; Heus and Pardi 1991).
As a matter of fact, the flanking bases bring more than half (1.13/1.88 bits) of the mutual information between triloop sequence and structure. As an example, consider the following diverse sequence and structure triplets: AGU, AGA, CAU, and AAC. These triplets are found in one, three, six, and two different structures, respectively. However, when these triplets are combined with a flanking GA base pair, they all fold in a single structure, and the base-pairing type is either sugar/Hoogsteen or sugar/Watson–Crick. Furthermore, in general, the structural class of one specific triplet changes as a function of its combined flanking base-pairing type. Consider, in particular, GCG, which folds in a structure flanked by a sugar/Hoogsteen trans C–A base pair, in a different one when flanked by a Watson–Crick trans U–U base pair, in another one when it is flanked by a sugar/Hoogsteen trans G–A base pair, and finally in yet another one when flanked by a sugar/Bifurcated-sugar cis U–A base pair.
The correlation between the flanking base-pairing type, sequence, and triloop structure applies to all sequences but five, which preserve the flanking base-pairing type while changing structures. However, we find that such a small number of “dynamic” sequences suggests that the nucleotide interplay is not a general phenomenon unless we do not have enough structural data to observe it further. Or perhaps, only a limited number of sequences exhibit the interaction dynamics, and all other structures can be predicted precisely from sequence. The former hypothesis is supported by NMR data. Consider the PDB file 1NBR, which contains 15 NMR models of an iron responsive element (IRE). The 15 models classify into three structural graphs, two of which show nucleotide interplay: a base-stacking movement along the backbone path and the appearance of an intraloop base-stacking interaction (data not shown). Corroborating our modeling hypothesis, the flanking base-pairing type does not change among the 15 NMR models. We find that the IRE solution exhibits a sampling of different triloop conformations.
In terms of 3D modeling, the few possible flanking base-pairing types for each partner pair, in general, imply the consideration of multiple alternative structures for any given triloop sequence (see Table S-III in the online Supplemental Materials). Applied to the IRE triloop sequence, CAGUG, for instance, we find 11 possible subclasses with two different flanking base-pairing types, Watson–Crick/Hoogsteen and Watson–Crick. Among the triplets in the CG entry, we do not find the IRE sequence. However, one is very close, CAGAG, which would be a good first guess. In fact, it actually points to the structure that includes the right flanking base-pairing type and two of the three intraloop interactions of the NMR models. In a real modeling application, however, all alternatives need to be considered.
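The lookup described above can be sketched as follows. The table excerpt is a hypothetical placeholder and does not reproduce Table S-III: the triplets and their assignment to flanking base-pairing types are invented for illustration. Ranking by Hamming distance is one simple way to express "very close", not a method stated in this work.

```python
# Hypothetical excerpt of a modeling table, indexed by the flanking
# nucleotides: flanking pair -> list of (flanking base-pairing type,
# triplets observed with that type). Placeholder data only.
modeling_table = {
    "CG": [
        ("W/H trans", ["AGA", "UAA"]),
        ("W/W cis", ["UUU", "GCA"]),
    ],
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length triplets differ."""
    return sum(x != y for x, y in zip(a, b))


def ranked_alternatives(sequence: str, table: dict) -> list:
    """Return every alternative for the flanking pair of `sequence`,
    ordered by the distance of its closest observed triplet."""
    flank = sequence[0] + sequence[-1]
    triplet = sequence[1:4]
    ranked = []
    for pair_type, triplets in table.get(flank, []):
        closest = min(hamming(triplet, t) for t in triplets)
        ranked.append((closest, pair_type))
    return sorted(ranked)


alternatives = ranked_alternatives("CAGUG", modeling_table)
```

The function keeps every alternative and only orders them, in line with the remark that all alternatives need to be considered in a real modeling application.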
The importance of the flanking base pair in the sequence–structure relationships can be explained by the identity and positioning of the flanking bases, which strongly direct the backbone conformation and thereby influence the positions of the other bases, which, in turn, define their interactions.
The work presented here highlights the structural diversity of the RNA triloop motif in different contexts. Despite this diversity, we were able to extract sequence–structure relationships. The flanking base-pair sequence provides useful RNA modeling information. The systematic classification approach developed in this work, based on discrete folding states defined by base interactions, is easily scalable to any RNA motif. It would provide valuable information about any RNA motif and modeling approach. Our intention is to build similar classifications for other RNA motifs.
MATERIALS AND METHODS
Triloop database
All RNA X-ray crystallographic structures with a resolution of 3 Å or better, which were available in the PDB (Berman et al. 2000) in October 2006, were considered. Classical RNA triloops are composed of five contiguous nucleotides, where the first and last form a flanking base pair, and the three others form the loop (see Fig. 1). This information was input to MC-Search to find all triloop sites and to produce the triloop database. MC-Search takes an RNA motif description (such as in Fig. 1A, bottom) and a database of 3D structures, and returns the sites that match the motif in the database by applying a classical graph isomorphism algorithm.
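The search criterion itself can be illustrated with a minimal sketch. This is not MC-Search, which matches a motif graph against 3D structures. The sketch only assumes that an annotation step has already produced the base-paired positions of a single chain. The function name `find_triloops` and the toy chain are hypothetical.

```python
from typing import List, Set, Tuple


def find_triloops(sequence: str, pairs: Set[Tuple[int, int]]) -> List[Tuple[int, str]]:
    """Return (start_index, five_nt_sequence) for every window of five
    contiguous nucleotides whose first and last nucleotides are base paired.

    `pairs` holds 0-based index pairs (i, j) with i < j. Any base-pairing
    type is accepted, mirroring the permissive flanking pair of the motif.
    """
    hits = []
    for i in range(len(sequence) - 4):
        if (i, i + 4) in pairs:
            hits.append((i, sequence[i:i + 5]))
    return hits


# Toy hairpin (hypothetical data): a four-base-pair stem capped by a triloop.
toy_sequence = "GGCGAAUCGCC"
toy_pairs = {(0, 10), (1, 9), (2, 8), (3, 7)}
triloops = find_triloops(toy_sequence, toy_pairs)
```

Since every window is tested independently, two triloops that share four of their five nucleotides are both reported by this scan.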
Triloop annotation
The triloops were analyzed by the MC-Annotate computer program (Gendron et al. 2001), which labels nucleotide interactions from atomic coordinates. MC-Annotate applies a base-pairing classifier (Lemieux and Major 2002), which returns the type of each base pair found, using the Leontis and Westhof nomenclature (Leontis and Westhof 2001) (see below). Base stacks are found using the Gabb method (Gabb et al. 1996).
Nomenclature
We consider three kinds of nucleotide interactions: base pairing, base stacking, and nucleotide linkage. The Leontis and Westhof nomenclature is used for labeling the base-pairing types (Leontis and Westhof 2001). We use three different symbols to represent the three edges of a base: the Watson–Crick (W; • cis; ○ trans), Hoogsteen (H; ■ cis; □ trans) and sugar (S; ◀ cis; ◁ trans). The cis/trans indicates the relative orientation of the backbone across the median of the plane formed by the two partners. For instance, a sheared sugar/Hoogsteen trans G–A base pair is written G◁□A, or S/H. We use a single symbol when the groups involved in H bonds in the two bases are on the same edge. For instance, we write X□Y instead of X□□Y. Bifurcated base-pairing types are indicated by Bs and Bh, where Bs points to the bifurcating group between the sugar and Watson–Crick edges, and Bh between the Watson–Crick and Hoogsteen edges (Lemieux and Major 2002).
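The symbol convention above can be captured in a small formatting helper. This is an illustrative sketch, not part of MC-Annotate. The function name `pair_notation` is hypothetical, and bifurcated types (Bs, Bh) are deliberately left out.

```python
# Edge symbols as listed in the text: (edge, orientation) -> symbol.
EDGE_SYMBOLS = {
    ("W", "cis"): "•", ("W", "trans"): "○",
    ("H", "cis"): "■", ("H", "trans"): "□",
    ("S", "cis"): "◀", ("S", "trans"): "◁",
}


def pair_notation(nt1: str, edge1: str, nt2: str, edge2: str, orientation: str) -> str:
    """Format a base pair, e.g. a sugar/Hoogsteen trans G-A pair as G◁□A.

    A single symbol is used when both bases pair through the same edge.
    Bifurcated pairs are not handled in this sketch.
    """
    first = EDGE_SYMBOLS[(edge1, orientation)]
    second = EDGE_SYMBOLS[(edge2, orientation)]
    middle = first if edge1 == edge2 else first + second
    return f"{nt1}{middle}{nt2}"


sheared_ga = pair_notation("G", "S", "A", "H", "trans")
same_edge = pair_notation("X", "H", "Y", "H", "trans")
```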
Base-stacking types are shown using arrows. The tip of the arrows indicates the normal to the plane of the base, defined so that any base in a classical A-RNA type double helix has its normal vector oriented toward the 3′-strand endpoint (Major and Thibault 2006). In pyrimidines, we use a right-handed axis system to define a normal by the rotational vector around atoms N1 to N6. In purines, the normal is reversed to that of pyrimidines, since the atoms of their pyrimidine rings are numbered in the reversed order. Two arrows pointing in the same direction indicate A-RNA double-helix type: up (>>) or down (<<), depending on which base is named first (i.e., B1>>B2 means B2 is stacked upward of B1, or B1 is stacked downward of B2). Two other types are possible, but less frequent in RNAs: inward (B1> <B2; B1 or B2 is stacked inward of B2 or B1, respectively) and outward (B1< >B2; B1 or B2 is stacked outward of B2 or B1, respectively).
In the structural graphs, thick lines represent the presence of phosphodiester linkages, whereas a thin line, or the absence of a line, indicates 2 nt that are not linked covalently. The crystallographic structure numbering system is used throughout the article.
Mutual sequence–structure information
The mutual sequence–structure information has been obtained from building the first layer of an ID3 decision tree from an information table (Genesereth and Nilsson 1987), in which we assigned one of the eight structural triloop topology to each sequence (see Table S-III in the online Supplemental Materials). The mutual information is calculated by:
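$$I(\mathrm{data}) = -\sum_{i} p_i \log_2 p_i$$
where $p_i$ is the probability of class $i$, that is:
$$p_i = \frac{n_i}{|\mathrm{data}|}$$
with $n_i$ the number of triloops assigned to class $i$ and $|\mathrm{data}|$ the total number of triloops in the table.
We then calculate the gain in information of the flanking base-pair nucleotides by first calculating the information (as above) for each value $v$ of this criterion, $I(\mathrm{data}_v)$, where $i$ takes all the class values found among the triloops with flanking base pair $v$, and $p_{i|v}$ is the fraction of those triloops that belong to class $i$:
$$I(\mathrm{data}_v) = -\sum_{i} p_{i|v} \log_2 p_{i|v}$$
Then, the gain in information of this criterion (flanking base pair) in our data is calculated as:
$$\mathrm{Gain}(\mathrm{data}) = I(\mathrm{data}) - \sum_{v} \frac{|\mathrm{data}_v|}{|\mathrm{data}|}\, I(\mathrm{data}_v)$$
where $|\mathrm{data}_v|$ is the number of occurrences of a particular flanking base pair.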
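The calculation can be illustrated with a short Python sketch of the standard ID3 information and gain, assuming the information table is available as (flanking base pair, topology) pairs. The function names and the toy records below are hypothetical and are not the data of Table S-III.

```python
from collections import Counter
from math import log2


def information(classes):
    """Shannon information of a list of class labels, in bits."""
    total = len(classes)
    return -sum((n / total) * log2(n / total) for n in Counter(classes).values())


def information_gain(records):
    """Information gain of the flanking base pair.

    `records` is a list of (flanking_pair, topology) tuples.
    """
    topologies = [topology for _, topology in records]
    # Group the topologies by flanking base pair (the subsets data_v).
    by_pair = {}
    for pair, topology in records:
        by_pair.setdefault(pair, []).append(topology)
    # Weighted information remaining after splitting on the flanking pair.
    remainder = sum(
        len(subset) / len(records) * information(subset)
        for subset in by_pair.values()
    )
    return information(topologies) - remainder


if __name__ == "__main__":
    # Hypothetical toy table: the flanking pair fully determines the topology.
    toy = [("G-C", "T1"), ("G-C", "T1"), ("C-G", "T2"), ("C-G", "T2")]
    print(information([t for _, t in toy]))
    print(information_gain(toy))
```

In this toy table the flanking base pair determines the topology completely, so the gain equals the full information of the table; a criterion unrelated to the topology would give a gain of zero.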
SUPPLEMENTAL DATA
All Supplemental Materials are available at www.major.iric.ca.
ACKNOWLEDGMENTS
This work was supported by a grant from the Canadian Institutes of Health Research (CIHR) (MT-14604) to F.M. F.M. is a CIHR investigator and a member of the Centre Robert-Cedergren of the Université de Montréal. V.L., at the time of this work, held a CIHR scholarship to encourage higher education in bioinformatics (Université de Montréal, programme biT). The authors thank Dr. Robin Gutell for providing them with the image of the secondary structure of the 23S rRNA of H. marismortui.
Footnotes
Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.597507.
REFERENCES
- Antao, V.P., Lai, S.Y., Tinoco, I., Jr. A thermodynamic study of unusually stable RNA and DNA hairpins. Nucleic Acids Res. 1991;19:5901–5905. doi: 10.1093/nar/19.21.5901.
- Ban, N., Nissen, P., Hansen, J., Moore, P.B., Steitz, T.A. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science. 2000;289:905–920. doi: 10.1126/science.289.5481.905.
- Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E. The protein data bank. Nucleic Acids Res. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
- Brodersen, D.E., Clemons, W.M., Jr., Carter, A.P., Wimberly, B.T., Ramakrishnan, V. Crystal structure of the 30S ribosomal subunit from Thermus thermophilus: Structure of the proteins and their interactions with 16S RNA. J. Mol. Biol. 2002;316:725. doi: 10.1006/jmbi.2001.5359.
- Cheong, C., Varani, G., Tinoco, I., Jr. Solution structure of an unusually stable RNA hairpin, 5′-GGAC(UUCG)GUCC. Nature. 1990;346:680–682. doi: 10.1038/346680a0.
- Gabb, H.A., Sanghani, S.R., Robert, C.H., Prevost, C. Finding and visualizing nucleic acid base stacking. J. Mol. Graph. 1996;14:6–11, 23–24. doi: 10.1016/0263-7855(95)00086-0.
- Gendron, P., Lemieux, S., Major, F. Quantitative analysis of nucleic acid three-dimensional structures. J. Mol. Biol. 2001;308:919. doi: 10.1006/jmbi.2001.4626.
- Genesereth, M.R., Nilsson, N.J. Logical foundations of artificial intelligence. Morgan Kaufmann Publishers; Los Altos, CA: 1987.
- Giedroc, D.P., Theimer, C.A., Nixon, P.L. Structure, stability, and function of RNA pseudoknots involved in stimulating ribosomal frameshifting. J. Mol. Biol. 2000;298:167–185. doi: 10.1006/jmbi.2000.3668.
- Gutell, R.R., Cannone, J.J., Konings, D., Gautheret, D. Predicting U-turns in ribosomal RNA with comparative sequence analysis. J. Mol. Biol. 2000;300:791–803. doi: 10.1006/jmbi.2000.3900.
- Haasnoot, P.C.J., Bol, J.F., Olsthoorn, R.C.L. A plant virus replication system to assay the formation of RNA pseudotriloop motifs in RNA–protein interactions. Proc. Natl. Acad. Sci. 2003;100:12596–12600. doi: 10.1073/pnas.2135413100.
- Harms, J., Schluenzen, F., Zarivach, R., Bashan, A., Gat, S., Agmon, I., Bartels, H., Franceschi, F., Yonath, A. High resolution structure of the large ribosomal subunit from a mesophilic eubacterium. Cell. 2001;107:679. doi: 10.1016/s0092-8674(01)00546-3.
- Heus, H.A., Pardi, A. Structural features that give rise to the unusual stability of RNA hairpins containing GNRA loops. Science. 1991;253:191–194. doi: 10.1126/science.1712983.
- Hoffmann, B., Mitchell, G.T., Gendron, P., Major, F., Andersen, A.A., Collins, R.A., Legault, P. NMR structure of the active conformation of the Varkud satellite ribozyme cleavage site. Proc. Natl. Acad. Sci. 2003;100:7003–7008. doi: 10.1073/pnas.0832440100.
- Huang, H., Alexandrov, A., Chen, X.Y., Barnes, T.W., Zhang, H., Dutta, K., Pascal, S.M. Structure of an RNA hairpin from HRV-14. Biochemistry. 2001;40:8055–8064. doi: 10.1021/bi010572b.
- Huang, H.-C., Nagaswamy, U., Fox, G.E. The application of cluster analysis in the intercomparison of loop structures in RNA. RNA. 2005;11:412–423. doi: 10.1261/rna.7104605.
- Jossinet, F., Westhof, E. Sequence to structure (S2S): Display, manipulate, and interconnect RNA data from sequence to structure. Bioinformatics. 2005;21:3320–3321. doi: 10.1093/bioinformatics/bti504.
- Jucker, F.M., Pardi, A. GNRA tetraloops make a U-turn. RNA. 1995;1:219–222.
- Jucker, F.M., Heus, H.A., Yip, P.F., Moors, E.H.M., Pardi, A. A network of heterogeneous hydrogen bonds in GNRA tetraloops. J. Mol. Biol. 1996;264:968–980. doi: 10.1006/jmbi.1996.0690.
- Klein, D.J., Schmeing, T.M., Moore, P.B., Steitz, T.A. The kink-turn: A new RNA secondary structure motif. EMBO J. 2001;20:4214–4221. doi: 10.1093/emboj/20.15.4214.
- Kondo, J., Urzhumtsev, A., Westhof, E. Two conformational states in the crystal structure of the Homo sapiens cytoplasmic ribosomal decoding A site. Nucleic Acids Res. 2006;34:676–685. doi: 10.1093/nar/gkj467.
- Lee, J.C., Cannone, J.J., Gutell, R.R. The lone-pair triloop: A new motif in RNA structure. J. Mol. Biol. 2003;325:65. doi: 10.1016/s0022-2836(02)01106-3.
- Lemieux, S., Major, F. RNA canonical and noncanonical base-pairing types: A recognition method and complete repertoire. Nucleic Acids Res. 2002;30:4250–4263. doi: 10.1093/nar/gkf540.
- Lemieux, S., Major, F. Automated extraction and classification of RNA tertiary structure cyclic motifs. Nucleic Acids Res. 2006;34:2340–2346. doi: 10.1093/nar/gkl120.
- Leontis, N.B., Westhof, E. Geometric nomenclature and classification of RNA base pairs. RNA. 2001;7:499–512. doi: 10.1017/s1355838201002515.
- Leontis, N.B., Altman, R.B., Berman, H.M., Brenner, S.E., Brown, J.W., Engelke, D.R., Harvey, S.C., Holbrook, S.R., Jossinet, F., Lewis, S.E., et al. The RNA Ontology Consortium: An open invitation to the RNA community. RNA. 2006;12:533–541. doi: 10.1261/rna.2343206.
- Lescoute, A., Leontis, N.B., Massire, C., Westhof, E. Recurrent structural RNA motifs, isostericity matrices, and sequence alignments. Nucleic Acids Res. 2005;33:2395–2409. doi: 10.1093/nar/gki535.
- Major, F., Thibault, P. RNA tertiary structure prediction. In: Lengauer T., editor. Bioinformatics: From genomes to therapies. Wiley-VCH; Weinheim, Germany: 2007. pp. 491–539.
- Major, F., Turcotte, M., Gautheret, D., Lapalme, G., Fillion, E., Cedergren, R. The combination of symbolic and numerical computation for three-dimensional modeling of RNA. Science. 1991;253:1255–1260. doi: 10.1126/science.1716375.
- Massire, C., Westhof, E. MANIP: An interactive tool for modeling RNA. J. Mol. Graph. Model. 1998;16:197–205, 255–257. doi: 10.1016/s1093-3263(98)80004-1.
- McCallum, S.A., Pardi, A. Refined solution structure of the iron-responsive element RNA using residual dipolar couplings. J. Mol. Biol. 2003;326:1037–1050. doi: 10.1016/s0022-2836(02)01431-6.
- Nagaswamy, U., Fox, G.E. Frequent occurrence of the T-loop RNA folding motif in ribosomal RNAs. RNA. 2002;8:1112–1119. doi: 10.1017/s135583820202006x.
- Nissen, P., Ippolito, J.A., Ban, N., Moore, P.B., Steitz, T.A. RNA tertiary interactions in the large ribosomal subunit: The A-minor motif. Proc. Natl. Acad. Sci. 2001;98:4899–4903. doi: 10.1073/pnas.081082398.
- Olivier, C., Poirier, G., Gendron, P., Boisgontier, A., Major, F., Chartrand, P. Identification of a conserved RNA motif essential for She2p recognition and mRNA localization to the yeast bud. Mol. Cell. Biol. 2005;25:4752–4766. doi: 10.1128/MCB.25.11.4752-4766.2005.
- Olsthoorn, R.C.L., Bol, J.F. Role of an essential triloop hairpin and flanking structures in the 3′-untranslated region of alfalfa mosaic virus RNA in in vitro transcription. J. Virol. 2002;76:8747–8756. doi: 10.1128/JVI.76.17.8747-8756.2002.
- Schluenzen, F., Tocilj, A., Zarivach, R., Harms, J., Gluehmann, M., Janell, D., Bashan, A., Bartels, H., Agmon, I., Franceschi, F., et al. Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell. 2000;102:615–623. doi: 10.1016/s0092-8674(00)00084-2.
- Shankar, N., Kennedy, S.D., Chen, G., Krugh, T.R., Turner, D.H. The NMR structure of an internal loop from 23S ribosomal RNA differs from its structure in crystals of 50S ribosomal subunits. Biochemistry. 2006;45:11776–11789. doi: 10.1021/bi0605787.
- Szewczak, A.A., Moore, P.B. The sarcin/ricin loop, a modular RNA. J. Mol. Biol. 1995;247:81–98. doi: 10.1006/jmbi.1994.0124.
- Varani, G., Wimberly, B., Tinoco, I., Jr. Conformation and dynamics of an RNA internal loop. Biochemistry. 1989;28:7760–7772. doi: 10.1021/bi00445a036.
- Wimberly, B., Varani, G., Tinoco, I., Jr. The conformation of loop E of eukaryotic 5S ribosomal RNA. Biochemistry. 1993;32:1078–1087. doi: 10.1021/bi00055a013.
- Wimberly, B.T., Brodersen, D.E., Clemons, W.M., Morgan-Warren, R.J., Carter, A.P., Vonrhein, C., Hartsch, T., Ramakrishnan, V. Structure of the 30S ribosomal subunit. Nature. 2000;407:327. doi: 10.1038/35030006.
- Woese, C.R., Winker, S., Gutell, R.R. Architecture of ribosomal RNA: Constraints on the sequence of “tetra-loops.” Proc. Natl. Acad. Sci. 1990;87:8467–8471. doi: 10.1073/pnas.87.21.8467.