Abstract
The diverse functional roles of RNA are determined by its underlying structure. Accurate and comprehensive knowledge of RNA structure would inform a broader understanding of RNA biology and facilitate exploiting RNA as a biotechnological tool and therapeutic target. Determining the pattern of base pairing, or secondary structure, of RNA is a first step in these endeavors. Advances in experimental, computational, and comparative analysis approaches for analyzing secondary structure have yielded accurate structures for many small RNAs, but only a few large (>500 nts) RNAs. In addition, most current methods for determining a secondary structure require considerable effort, analytical expertise, and technical ingenuity. In this review, we outline an efficient strategy for developing accurate secondary structure models for RNAs of arbitrary length. This approach melds structural information obtained using SHAPE chemistry with structure prediction using nearest-neighbor rules and the dynamic programming algorithm implemented in the RNAstructure program. Prediction accuracies reach ≥95% for RNAs on the kilobase scale. This approach facilitates both development of new models and refinement of existing RNA structure models, which we illustrate using the Gag-Pol frameshift element in an HIV-1 M-group genome. Most promisingly, integrated experimental and computational refinement brings closer the ultimate goal of efficiently and accurately establishing the secondary structure for any RNA sequence.
1. Introduction
RNA is a uniquely versatile macromolecule with diverse functions. In addition to its classically understood role as the intermediary between genome and proteome, RNA plays direct roles in fundamental cellular processes including biological catalysis, gene regulation and host defense. RNA also serves as the genome for many viruses. All of these functions depend on, or are modulated by, the ability of RNA to fold into higher order structures. Accurate models for the underlying structure are therefore critical for proposing and confirming hypotheses regarding RNA function.
Determining the complete three-dimensional (termed the tertiary) structure is the ultimate goal for many RNAs. However, only limited sets of RNAs are candidates for current high resolution crystallography and NMR approaches. A simpler problem is to determine the base pairing pattern (termed the secondary structure) of an RNA. Secondary structure determination, independent of higher order structural information, is possible because the hydrogen bonding and stacking interactions that collectively form secondary structure are usually stronger than tertiary interactions [1-4], and because RNA folding is often hierarchical [5, 6], with many secondary structural motifs forming prior to tertiary contacts. Additionally, knowledge of the secondary structure greatly restricts possible three-dimensional conformations and facilitates tertiary structure prediction [7-9]. Moreover, a subset of RNA functions may depend more directly on secondary structural motifs than on global folds.
Insight into the secondary structure can be gleaned using computer-based predictions performed using the sequence alone, or in combination with sequence alignment information or experimental data. Sequence-based folding generally includes two main elements: an energy function based on experimentally derived thermodynamic parameters, and an algorithm that explores the conformational space available to the RNA and ranks computed structures. Most energy functions use the Turner et al. [10, 11] set of nearest neighbor parameters, derived from optical melting experiments. A summary of these parameters is available at the Nearest-Neighbor Database [12]. Exploring conformational space is challenging because of the vast number of possible secondary structures, which is estimated to scale exponentially as ~1.8^N, where N is the number of nucleotides in the RNA [13]. This means that a “brute force” approach that samples every possible conformation is impossible both from a computational standpoint and from the perspective of efficient RNA folding in vivo. Consequently, the intrinsic thermodynamics and kinetics of RNA folding must conspire to restrict the folding pathway to a narrow subset of these structures, only one (or perhaps a few) of which is likely to dominate the equilibrium ensemble. Especially for short RNAs, thermodynamic considerations are likely paramount and thus the structure with the lowest free energy is the biologically active one.
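To give a sense of scale, the short calculation below evaluates the ~1.8^N estimate for RNAs of a few lengths. It is a minimal sketch: the function name is ours, and the estimate should be read as an order-of-magnitude guide only.

```python
import math


def log10_structure_count(n_nucleotides, base=1.8):
    """Return log10 of the approximate number of secondary structures (~base**N)."""
    return n_nucleotides * math.log10(base)


# Lengths chosen to match a tRNA-sized RNA, a mid-sized domain, and a 16S-sized rRNA.
for n in (75, 155, 1542):
    print(f"N = {n:5d}: roughly 10^{log10_structure_count(n):.0f} possible structures")
```

Working in log10 avoids overflowing floating-point numbers for kilobase-scale RNAs.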
1.1 Dynamic Programming Algorithms for RNA Secondary Structure Prediction
Programs based on the Zuker dynamic programming algorithm [14, 15] are widely used to search for the minimum free energy structure [16-22]. These algorithms are deterministic, meaning that given a defined set of energy rules, they always find the lowest free energy structure. The Zuker algorithm scales as O(N^3) in time, where N is the number of nucleotides in the sequence. This means that doubling the sequence length requires eight times as much time to predict the structure. Nevertheless, on modern computers, predictions are reasonably fast. The guarantee that the optimal structure can be computed and the relative computational efficiency are made possible, first, by incorporating simplifying assumptions into the energy function, and second, by limiting the types of allowed RNA folds.
The total energy is assumed to be a simple sum over all energetic components that characterize local structural elements. Two features primarily contribute to the total energy: negative (favorable) free energies arising from stabilizing base stacking and hydrogen bonding interactions in and adjacent to helices, and positive (unfavorable) free energies arising from the entropic cost of restricting conformational freedom in loops. Helix energy terms are sequence-dependent, reflect the energetic bonus of adding a base pair to a helix, and implicitly include both canonical hydrogen bonding and base stacking. These terms depend solely on interactions involving adjacent base pairs or interactions at the ends of helices. This local interaction model is termed the nearest-neighbor approximation [23].
The dynamic programming algorithm calculates the energy of the lowest free energy structure (but does not compute the complete structure itself) for all possible subsequences of an RNA. This approach is efficient because the solution for each subsequence is computed from solutions for pre-computed smaller subsequences, allowing the energies for each structural element to be computed only once. The results are stored in triangular N × N arrays whose elements i,j represent the optimal folding energy for an RNA subsequence from nucleotide i to nucleotide j. The structure for the entire RNA sequence is obtained by tracing a structure through an optimal combination of component subsequences in the array [24].
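The fill-then-trace logic can be illustrated with a deliberately simplified stand-in. The sketch below maximizes the number of nested base pairs (a Nussinov-style recursion) rather than minimizing a nearest-neighbor free energy, so it is not the Zuker algorithm and not the RNAstructure implementation. The function names and the minimum hairpin loop size of 3 are our assumptions.

```python
# Canonical and G-U wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fill_table(seq, min_loop=3):
    """best[i][j] = maximum number of nested pairs in the subsequence seq[i..j]."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    # Short subsequences are solved first; longer ones reuse their stored scores.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: nucleotide j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best


def trace_pairs(seq, best, min_loop=3):
    """Recover one optimal set of pairs (0-based indices) from the filled table."""
    found = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    found.append((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return sorted(found)


seq = "GGGAAAUCCC"
table = fill_table(seq)
print(table[0][-1], trace_pairs(seq, table))
```

The three nested loops over i, j, and k are the origin of the cubic time scaling. As in the text, the table stores only scores, and the structure itself is recovered afterwards by tracing back through the stored entries.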
Thermodynamics-based dynamic programming algorithms have several limitations. First, computing the minimum free energy structure in a relatively efficient O(N^3) manner excludes consideration of non-nested topologies. These include the biologically important case of pseudoknots, in which a loop in one helix forms the stem of another helix. Second, the assumption that the minimum free energy structure is the biologically active one may not always hold for larger RNAs, where folding kinetics may play a prominent role. Third, the biologically relevant ensemble may be dominated by several interconverting states, making a single structural model inadequate. Finally, incomplete thermodynamic rules and the simplifications inherent in the nearest neighbor model introduce uncertainty to the energy calculations.
The net effect of these limitations is that the current best-performing algorithms achieve prediction accuracies of 50-70% [11, 25-29]. Accuracies tend to be especially poor for larger RNAs. For example, for Escherichia coli 16S rRNA, which is probably the most thoroughly studied large RNA, the prediction accuracy based on sequence alone is less than 50% [26, 30].
1.2 Comparative Sequence Analysis
One way of overcoming these limitations is to use information from RNA sequence alignments [31-33]. Termed comparative sequence or covariation analysis, this approach is grounded in the principle that homologous RNAs have secondary structures that are much more conserved than their primary sequences. An alignment of homologous RNAs is used to propose base pairing interactions based on patterns of sequence variation, assuming a common consensus secondary structure. Candidate base pairs are favored or disfavored depending on whether sequence variations tend to maintain canonical base pairing or tend to occur independently, respectively.
A model with good covariation support commands strong confidence in its accuracy and such models are often the gold standard in the absence of crystallographic models. However, comparative sequence analysis cannot be applied to many RNAs of interest because the method requires multiple divergent sequences with a common secondary structure. The sequences must be similar enough to admit a multiple sequence alignment, yet divergent enough to show sufficient variation. Sequences corresponding to open reading frames are especially recalcitrant to analysis because selective pressure at the protein coding level further restricts the degree of variation. Finally, constructing a model from a sequence alignment is an iterative process that requires considerable user effort and skill.
1.3 Incorporating Experimental Data
In cases where comparative analysis is of limited use, significant improvements to RNA secondary structure prediction can be achieved when computer predictions are constrained by experimental data derived from structure-sensitive enzymatic cleavage and chemical probing reagents [11, 34, 35]. However, the net improvement gained from using traditional reagents is often modest. First, traditional reagents tend to react with only a subset of nucleotides, so the absence of reactivity cannot usually be taken as evidence for likely base pairing. Second, different reagents are required to react with all four RNA nucleotides and some of the more useful reagents, like dimethyl sulfate (DMS), react at different base functional groups depending on the nucleotide. Third, the dynamic range for many reagents is low, making it difficult to distinguish levels of reactivity beyond a qualitative “low,” “medium,” and “high” scale. Finally, while alternative chemistries such as in-line probing [36] and hydroxyl radical footprinting [37] provide valuable insight into higher order structures and react broadly with all four RNA nucleotides, they less directly report the intrinsic nucleotide flexibilities that largely characterize secondary structure. Thus, it is challenging to create quantitative relationships between reagent reactivity and RNA secondary structure.
1.4 Towards Accurate SHAPE-Directed Secondary Structure Prediction
Selective 2′-Hydroxyl Acylation analyzed by Primer Extension (SHAPE) [38, 39] chemical probing technology largely addresses these challenges. SHAPE yields quantitative reactivity information for nearly every nucleotide in an RNA. Advantageously, SHAPE is not limited by RNA size and is remarkably insensitive to solvent accessibility [38, 40, 41]. Additionally, SHAPE can be applied both to in vitro transcripts and to RNAs from native-like cellular and viral environments. Combining SHAPE information with a thermodynamics-based dynamic programming algorithm, as implemented in RNAstructure [11], results in highly accurate secondary structure models [30]. This approach has been benchmarked and shown to yield secondary structures for diverse RNAs, including the E. coli 16S rRNA (1542 nucleotides), with >95% accuracy as judged by sensitivity (percentage of known base pairs predicted correctly) and positive predictive value (PPV, percentage of predicted base pairs in the known structure) [30] (Table 1). SHAPE has been used to propose experimentally informed secondary structural models for many RNAs and RNA states whose structures are unlikely to be determinable by covariation or high resolution experimental approaches [30, 39, 41-57]. In this work, we briefly review the SHAPE experimental protocol and data processing steps. We then describe in detail how SHAPE experimental information is incorporated into a nearest neighbor dynamic programming algorithm to create accurate secondary structure models. We close with an analysis of a novel SHAPE-supported model for the HIV-1 frameshift element.
Table 1.
RNA | Size (nts) | Sensitivity, no constraints (%) | PPV, no constraints (%) | Sensitivity, with SHAPE (%) | PPV, with SHAPE (%) |
---|---|---|---|---|---|
Yeast tRNA(Asp) | 75 | 95 | 95 | 100 | 100 |
HCV IRES domain II | 95 | 57 | 59 | 96 | 100 |
B. subtilis RNase P, specificity domain | 154 | 53 | 51 | 75 | 83 |
T. thermophila group I intron, P546 domain | 155 | 43 | 44 | 96 | 98 |
E. coli 16S rRNA | 1542 | 50 | 46 | 97 | 95 |
Sensitivity and PPV are the percentage of known base pairs predicted correctly and the percentage of predicted base pairs in the known structure, respectively. Calculations were performed using RNAstructure [11]. Accuracies for SHAPE-constrained structures are typically ≥95%. However, accuracy for the RNase P specificity domain is significantly lower, likely because many base pairs in this RNA only form in concert with the tertiary structure [62]. Data are from refs. [30, 62].
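The two accuracy measures reported in Table 1 can be computed from sets of base pairs as sketched below. This is an illustrative helper under our own naming. It scores exact matches only, and the base pairs in the usage example are made up for demonstration.

```python
def sensitivity_and_ppv(known_pairs, predicted_pairs):
    """Return (sensitivity, PPV) in percent for two collections of (i, j) base pairs."""
    # Store each pair as (smaller index, larger index) so that (i, j) == (j, i).
    known = {tuple(sorted(p)) for p in known_pairs}
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    correct = len(known & predicted)
    sensitivity = 100.0 * correct / len(known) if known else float("nan")
    ppv = 100.0 * correct / len(predicted) if predicted else float("nan")
    return sensitivity, ppv


# Hypothetical example: 4 known pairs, 5 predicted pairs, 3 in common.
known = [(1, 10), (2, 9), (3, 8), (4, 7)]
predicted = [(1, 10), (9, 2), (3, 8), (12, 20), (13, 19)]
print(sensitivity_and_ppv(known, predicted))
```

Sensitivity penalizes missed pairs and PPV penalizes spurious ones, so both are needed to judge a model.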
2.1 Overview of SHAPE Technology
SHAPE technology involves covalently modifying RNA in a structure-dependent manner (selective 2′-hydroxyl acylation), followed by detecting the sites of modification by primer extension (original protocols described in [58, 59]). The RNA modification involves the nucleophilic attack of the 2′-hydroxyl group of the RNA ribose moiety on an electrophilic SHAPE reagent to form a 2′-O-adduct (Fig. 1A) [38]. This reaction occurs more readily with conformationally unconstrained or flexible nucleotides such as those in single stranded regions, loops, or bulges (spheres, Fig. 1B). Flexible nucleotides react preferentially because they more readily sample conformations conducive to nucleophilic attack. In contrast, nucleotides in highly structured regions are conformationally constrained and less frequently achieve an optimal geometry, making them less reactive towards SHAPE reagents. In general, solvent inaccessible, but unconstrained, nucleotides are still reactive by SHAPE.
Following modification of the RNA, modified positions are detected by primer extension using end-labeled, target-specific primers and a thermostable reverse transcriptase (Fig. 1C). Since the reverse transcriptase enzyme cannot proceed past 2′-O-modified sites in RNA, the lengths of the resulting cDNA products correspond to the distance between the primer binding and 2′-O-adduct sites. Due to differential modification of structured versus unstructured nucleotides, the frequency of producing a given cDNA product reflects the underlying RNA structure. Comparison with dideoxy nucleotide sequencing ladders allows each SHAPE reagent-dependent peak to be matched with the corresponding nucleotide position (Fig. 1D).
SHAPE technology can be implemented in an efficient and high-throughput way by automated capillary electrophoresis using DNA sequencing instruments (Fig. 1E). The capillary electrophoresis data are analyzed using the software program ShapeFinder [60], which processes them to yield normalized SHAPE reactivity values (Fig. 1F). These reactivities can be converted to ΔGSHAPE pseudo-free energy terms and used with the energy function in the RNAstructure program to yield secondary structure models for RNA that are generally highly accurate (Table 1, and see section 3.1 below) [11, 30].
2.2 SHAPE Experimental Protocol
The experimental component of a SHAPE analysis has been recently reviewed in detail [59, 61]. Briefly, RNA is modified in a structure-selective way using an electrophilic SHAPE reagent. While SHAPE has been most commonly performed on in vitro RNA transcripts or RNAs extracted from biological environments, SHAPE reagents readily cross biological membranes and, for example, react with RNAs inside authentic HIV-1 particles [39].
Approximately 2 pmol of RNA is needed in each primer extension reaction to obtain adequate signal intensity in the capillary electrophoresis detection step, using commercially available instruments. We routinely achieve read lengths of 300 – 650 nucleotides in each primer extension reaction [50, 60]. For longer RNAs, information obtained from multiple primers, with overlapping read windows, can be combined to create datasets spanning arbitrarily long lengths [30, 39, 41].
To maintain a native-like conformation, the RNA must be renatured (in vitro transcripts) or maintained (RNAs from cellular or viral sources) in a physiological-like folding buffer. We typically use a simple standard solution [50 mM HEPES (pH 8.0), 200 mM potassium acetate (pH 8.0), 3 mM MgCl2], and incubate at 37 °C for 10 – 30 min prior to modification. SHAPE works well under a wide variety of conditions, including in the presence of biological amines and carbohydrates and proteins that bind RNA. The main requirement for SHAPE is that the pH be maintained in the 7.6 – 8.3 range [38].
RNA structure is interrogated by adding a SHAPE reagent. Initial work in our laboratory used the commercially available NMIA reagent [58]; more recent work has utilized the faster-reacting 1M7 reagent, whose synthesis is described in [62]. The SHAPE reagent is dissolved in DMSO and added to the RNA solution to a final concentration of about 5 mM. The optimal reagent concentration varies and can be system-specific: too high a concentration of SHAPE reagent results in significant signal decay and reduced read lengths, while too low a concentration yields data with a poor signal. Background signals in the primer extension reaction are measured by performing a no-reagent control in which DMSO is added in place of the SHAPE reagent, in an otherwise identical reaction. Both reactions should be incubated at 37 °C for either 35 min if using NMIA or 70 sec if using 1M7. Both reagents self-quench by reacting with water in the aqueous solution.
Following an ethanol precipitation step, fluorescently-labeled primers are annealed to the (+) and (−) reagent-treated RNA and to untreated RNAs (the latter are used for sequencing). A thermostable reverse transcriptase enzyme is used for the primer extension reactions to convert the structural information into cDNA libraries. We perform the separation step in a single capillary by employing 3-4 different dyes for the (+) reagent, (−) reagent, and dideoxy sequencing ladder(s) [39, 59]. The dyes are chosen to have similar electrophoretic mobilities, which simplifies the alignment of the electropherograms during the data processing steps. The cDNA products are recovered by ethanol precipitation, resuspended in formamide, and resolved on a commercial capillary electrophoresis DNA sequencing instrument.
2.3 Data Analysis to Create Normalized SHAPE Reactivities
The ShapeFinder software has been described in detail [60] and is freely available for download, with tutorials [63]. Here we briefly outline the steps required to convert capillary electrophoresis electropherograms into quantitative reactivity measurements (Fig. 2).
The raw sequencer data file is opened in ShapeFinder and saved as a .shape folder. The first step is to correct baselines using the Fitted Baseline Adjust tool. Second, fluorescence intensity decays exponentially with increasing cDNA length due to incomplete processivity of the reverse transcriptase enzyme during primer extension [44, 60]. This is corrected using the Signal Decay Correction tool. Third, the Mobility Shift tool is used to align (+) reagent, (−) reagent, and dideoxy sequencing traces, since the different fluorescent dyes introduce small offsets in the raw electropherogram in their respective labeled cDNA fragments. Mobility shifts are performed manually using the sliding traces function in ShapeFinder. Fourth, the (+) and (−) reagent traces are scaled to each other to account for differences in signal intensity between the dyes. In general, the lowest (+) reagent peaks, corresponding to low or no SHAPE reactivity, should be scaled to overlap with their corresponding (−) reagent peaks. Finally, the Align and Integrate tool is used to align all peaks with the known primary sequence (supplied as a .seq text file), to make minor adjustments in peak alignments, and to integrate all peaks in the (+) and (−) reagent traces. When the calculation is complete, a text file called the peaks file is generated (Fig. 2). This file contains information about each nucleotide, including integrated (+) and (−) reagent peak areas (labeled RX and BG, respectively) and their subtracted, normalized SHAPE reactivities.
3.1 SHAPE-Constrained RNAstructure Folding: Theory
A major challenge in RNA biology is to consistently and efficiently develop correct secondary structure models for RNAs of arbitrary length and complexity. The thermodynamics-based computational methods outlined above (Section 1.1) are highly useful for rapid computation of candidate structural models. However, prediction accuracies are inconsistent for many RNAs and tend to be particularly poor for large RNAs. These limitations can be broadly attributed to simplifications inherent in the nearest-neighbor model and incomplete knowledge of RNA energetics. However, for many RNAs, it is possible to obtain robust secondary structure predictions by incorporating SHAPE reactivities into the energy function used in a nearest neighbor dynamic programming algorithm. This approach has been implemented in the RNAstructure program [64].
The RNAstructure energy function is modified by adding pseudo-free energy change terms derived from SHAPE reactivities. This approach is grounded in the observation that SHAPE reactivities correlate strongly with local nucleotide flexibility [38, 40] and, thus, also with the probability that a nucleotide is single stranded. The NMIA and 1M7 SHAPE reagents react with all four RNA nucleotides with limited base-dependent preferences [65]. It is therefore possible to create a softer, continuous, and more physically grounded restraint function than is typically used with conventional chemical mapping reagents that exhibit strong idiosyncratic and nucleotide-specific reactivities. In essence, these additional energetic terms provide a knowledge-based correction to the nearest-neighbor energy function.
We derive a pseudo-free energy change term for each base-paired residue i from its SHAPE reactivity:
$$\Delta G_{\mathrm{SHAPE}}(i) = m \,\ln\left[\text{SHAPE reactivity}(i) + 1\right] + b \tag{1}$$
The empirical parameters m and b serve to scale the strength of the experimental contribution to the energy function. The intercept b represents the pseudo-free energy contribution of a base-paired nucleotide whose SHAPE reactivity is zero. The sign of b is negative to reflect an energetic bonus for base pairing by constrained nucleotides. In contrast, the slope m represents the strength of the energetic penalty assigned for pairing nucleotides with high SHAPE reactivities and consequently has a positive sign.
Optimal values for m and b were determined by assessing the prediction accuracy for E. coli 23S rRNA over a range of slope and intercept values [30]. This work identified m = 2.6 kcal/mol and b = −0.8 kcal/mol as optimal values for folding large ribosomal RNAs and, importantly, also established these values as being located at the center of a “sweet spot” of a broad set of m and b values that yields accurate SHAPE-directed structure predictions [30] (emphasized in red, Fig. 3). Given the large size (2,904 nts) of the E. coli 23S rRNA and the diversity of structural motifs it contains, these parameter values are also likely to work well for other RNAs. We empirically find this to be the case, although slightly different parameter values, still in the sweet spot (Fig. 3), can be chosen heuristically to refine predictions for some RNAs [41]. The logarithmic relationship between SHAPE reactivities and the derived ΔGSHAPE term has the effect of forgiving differences among the most highly reactive nucleotides. The usefulness of this behavior reflects the observation that highly reactive nucleotides are the most sensitive to signal processing artifacts and have the highest variance. Furthermore, the logarithmic relationship between SHAPE reactivity and pseudo-free energy change loosely reflects a statistical mechanical interpretation of SHAPE reactivity, which indirectly measures the number of conformational states accessible to each nucleotide.
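The per-nucleotide term can be evaluated directly. The sketch below assumes that Eqn. 1 has the logarithmic form ΔGSHAPE(i) = m ln[SHAPE reactivity(i) + 1] + b, consistent with the slope, intercept, and logarithmic behavior described above. The function name is ours, and negative reactivities are clamped to zero.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free energy (kcal/mol) for one base-paired nucleotide.

    Assumed form: m * ln(reactivity + 1) + b, with negative reactivities set to 0.
    """
    reactivity = max(reactivity, 0.0)
    return m * math.log(reactivity + 1.0) + b


for r in (0.0, 0.2, 0.5, 1.0, 1.5):
    print(f"reactivity {r:.1f} -> {shape_pseudo_energy(r):+.2f} kcal/mol")
```

With m = 2.6 and b = −0.8 kcal/mol, the term equals b at zero reactivity and changes sign near a reactivity of 0.36. Because of the logarithm, doubling an already high reactivity adds comparatively little extra penalty.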
We illustrate the combined nearest-neighbor and SHAPE energy function, as implemented in RNAstructure, for a short fragment of an HIV-1 RNA sequence (Fig. 4). Nucleotides are color-coded by their SHAPE reactivities as reported in [41]. The energy function [12] includes favorable nearest-neighbor energy terms for helix stacking (in green, Fig. 4) and entropic penalties for anchoring loops (in red, Fig. 4). Stacking terms are added for all helical interactions, including terminal mismatches and dangling ends at helix termini, as well as for coaxial stacking between adjacent helices [25, 66]. Stacking terms depend on the sequence identity of all nucleotides participating in the stack (the nearest-neighbors), while loop entropy terms depend primarily on the number of nucleotides in the loop.
In contrast to the nearest-neighbor thermodynamics-based energy parameters, pseudo-free energy terms (ΔGSHAPE) are calculated for each nucleotide individually (Fig. 4, black and gray numbers). Nucleotides with high SHAPE reactivities have positive pseudo-free energies and those with low SHAPE reactivities have negative pseudo-free energies (Eqn. 1). ΔGSHAPE terms are only added to the free energy calculation for base paired nucleotides (Fig. 4, black numbers). ΔGSHAPE terms for nucleotides at the ends of helices are counted once and those in the interior of helices are counted twice since they contribute to two stacks (Fig. 4, blue ×1 and ×2 symbols, respectively). Base paired nucleotides with high SHAPE reactivities contribute large positive pseudo-free energies (for example, see the red G in Fig. 4). Such nucleotides are more likely to be allowed at the end, as opposed to the interior, of a helix because they are added to the total free energy only once. This is consistent with the observation that nucleotides at the ends of helices are more dynamic, and experience greater fraying, than interior nucleotides. On the other hand, unpaired nucleotides with low SHAPE reactivities represent an incomplete model and could suggest non-canonical interactions that are not currently predicted by the algorithm (for example, see the tandem black G residues in the apical loop of Fig. 4). The total folding energy (ΔGtotal) is simply the sum of all nearest neighbor thermodynamic terms (ΔGNN) and pseudo-free energy (ΔGSHAPE) contributions (Fig. 4). This sum is used to rank RNA structures and should not be interpreted as a physical energy because it includes both thermodynamic terms and SHAPE-derived pseudo-free energy change terms.
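The counting rule for helix ends and interiors follows naturally if the pseudo-free energy is added once per base pair stack. The sketch below makes that assumption explicit. It also assumes the logarithmic form m ln(reactivity + 1) + b for the per-nucleotide term, and the function names and example reactivities are hypothetical.

```python
import math


def pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Per-nucleotide pseudo-free energy (kcal/mol), assumed logarithmic form."""
    return m * math.log(max(reactivity, 0.0) + 1.0) + b


def helix_pseudo_energy(pair_reactivities, m=2.6, b=-0.8):
    """Sum pseudo-free energies over a helix, applied once per stack.

    pair_reactivities lists (5' nucleotide, 3' nucleotide) reactivities for
    consecutive stacked base pairs. Terminal pairs belong to one stack and are
    counted once; interior pairs belong to two stacks and are counted twice.
    """
    total = 0.0
    for lower, upper in zip(pair_reactivities, pair_reactivities[1:]):
        for r in (*lower, *upper):
            total += pseudo_energy(r, m, b)
    return total


# A three-pair helix with one reactive nucleotide placed at the end or in the interior.
reactive_at_end = [(1.5, 0.0), (0.0, 0.0), (0.0, 0.0)]
reactive_inside = [(0.0, 0.0), (1.5, 0.0), (0.0, 0.0)]
print(helix_pseudo_energy(reactive_at_end), helix_pseudo_energy(reactive_inside))
```

The reactive nucleotide is penalized twice in the interior but only once at the helix end, which is why such nucleotides are tolerated more readily at helix termini. The ranking energy is then ΔGtotal = ΔGNN + ΔGSHAPE, where ΔGNN must come from a nearest-neighbor calculation that is not reproduced here.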
3.2 SHAPE-Constrained RNAstructure Folding: Procedure
The final output of ShapeFinder peak integration is a tab-delimited text file termed the peaks file (Fig. 2, top). Columns in the file include integrated peak areas for the (+) and (−) reagent traces, their subtracted areas, and absolute SHAPE reactivities.
3.2.1 Normalization
SHAPE reactivities are normalized to a uniform scale that is valid for diverse RNAs. Some RNAs are highly structured, with relatively few unconstrained nucleotides, while other RNAs contain large flexible loop regions. In developing a normalization procedure, we make the fundamental assumption that all RNAs will have at least a few unreactive and also a few highly reactive positions, corresponding to strongly constrained and highly dynamic nucleotides, respectively. Experience in our laboratory has found that secondary structure calculations are tolerant of variation in the absolute normalization scale, and instead depend primarily on the relative differences in SHAPE reactivities.
A normalized reactivity of 1.0 is defined as the average intensity of the top 10% most reactive peaks, excluding a few highly reactive nucleotides taken to be outliers. We use two distinct approaches to identify outlier peaks, the choice of which varies depending on the system under study. In the simple normalization scheme, the most reactive 2% of all intensities are removed from the pool. The intensities of the next 8% most reactive peaks are averaged and all reactivities are divided by this average value. This heuristic rule is based on general experience in our laboratory.
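The simple normalization scheme can be sketched as follows. The function name, the use of None for positions without data, and the integer truncation used to count the top 2% and 10% of peaks are our assumptions. ShapeFinder or other tools may round differently.

```python
def normalize_simple(reactivities):
    """Normalize reactivities with the 2% / 8% rule; None marks no-data positions."""
    measured = sorted((r for r in reactivities if r is not None), reverse=True)
    n_outliers = len(measured) * 2 // 100   # most reactive 2%, treated as outliers
    n_top = len(measured) * 10 // 100       # most reactive 10%, outliers included
    pool = measured[n_outliers:n_top]       # the "next 8%" that defines the scale
    if not pool:
        raise ValueError("too few measurements to define a normalization pool")
    scale = sum(pool) / len(pool)
    # Every measured position, outliers included, is divided by the same factor.
    return [None if r is None else r / scale for r in reactivities]


raw = [float(v) for v in range(1, 101)]  # hypothetical intensities 1..100
normalized = normalize_simple(raw)
print(round(normalized[-1], 3), round(normalized[0], 3))
```

Outliers are excluded only when computing the scale factor. They remain in the output and can exceed 1.0 after normalization.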
In the box-plot normalization scheme, peaks greater than 1.5 times the interquartile range (numerical distance between the 25th and 75th percentiles) above the 75th percentile are removed. This definition of outliers is consistent with common practice in model-free statistics [67]. After excluding these outliers, the next 10% of intensities are averaged and all reactivities, including outliers, are divided by this value. Generally, we suggest using the box-plot method if the sequence is long enough for meaningful statistics to be calculated (typically > 300 reactivity measurements). Advanced users may opt to calculate their own normalized SHAPE reactivities if a particular experiment has a large number of very reactive peaks. The net result of normalization is to place all reactivities on a scale spanning 0 to ~1.5, where 0 indicates no reactivity (and a highly constrained nucleotide) and reactivities >0.7 typically indicate highly flexible nucleotides.
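A corresponding sketch of the box-plot scheme is given below. The quartile convention (the default of Python's statistics.quantiles), the use of None for no-data positions, and the integer truncation used to count the top 10% are our assumptions, so values may differ slightly from those produced by other implementations.

```python
import statistics


def normalize_boxplot(reactivities):
    """Normalize reactivities after removing box-plot outliers from the scaling pool."""
    measured = sorted(r for r in reactivities if r is not None)
    q1, _median, q3 = statistics.quantiles(measured, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)            # 1.5 x IQR above the 75th percentile
    kept = [r for r in measured if r <= cutoff]
    n_top = max(1, len(kept) // 10)          # top 10% of the remaining peaks
    top = kept[-n_top:]
    scale = sum(top) / len(top)
    # Outliers are excluded from the scale factor but are still normalized.
    return [None if r is None else r / scale for r in reactivities]


raw = [float(v) for v in range(1, 21)] + [1000.0]  # hypothetical data with one outlier
normalized = normalize_boxplot(raw)
print(round(normalized[19], 3), round(normalized[20], 1))
```

In this toy data set the single extreme peak does not affect the scale factor, whereas under a plain top-10% average it would dominate it.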
3.2.2 Maximum Pairing Distance in Large RNAs
For large RNAs, we typically disallow base pairing between nucleotides more than 600 positions apart in the primary sequence. More than 99% of all known ribosomal RNA pairings span fewer than 600 nucleotides, and applying this restriction increases prediction accuracy for the 16S and 23S rRNAs [30]. Applying this constraint is also attractive from the perspective of RNA folding kinetics, since RNA folding likely occurs co-transcriptionally, and nucleotides located very far from each other are unlikely to have the opportunity to base pair. This constraint thus represents a very approximate way of accounting for RNA folding kinetics, which are otherwise ignored in both the conventional thermodynamic nearest-neighbor algorithm and our SHAPE pseudo-free energy folding algorithm.
3.2.3 File Preparation
SHAPE-constrained RNA secondary structure calculations using the RNAstructure program require two input text files: (1) a sequence file with a .seq extension that contains the primary sequence, and (2) a SHAPE reactivity file with a .shape extension (Fig. 2). The sequence file format has at least one comment line, each preceded by a semicolon, followed by a one-line title, followed by the RNA sequence. The numeral 1 signals the end of the sequence. The sequence should be entered in uppercase; lowercase letters may be included and indicate nucleotides that the user specifically wishes to prohibit from base pairing (an alternative method using the .shape file is also described below). Any T’s present in the sequence are interpreted as U’s.
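The sequence file layout described above can be generated programmatically. The helper below is a minimal sketch with hypothetical names. It writes one comment line, a title line, and the sequence terminated by the numeral 1, and it leaves letter case untouched so that lowercase nucleotides keep their meaning.

```python
def format_seq_file(title, sequence, comment="illustrative example"):
    """Return the text of a .seq file: ';' comment line, title line, sequence + '1'."""
    cleaned = "".join(sequence.split())  # drop spaces and line breaks only
    return f"; {comment}\n{title}\n{cleaned}1\n"


# Lowercase 'ccc' marks nucleotides that should not be allowed to base pair.
text = format_seq_file("example RNA", "GGGAAAcccUUU")
print(text, end="")

# To save the file, one could write, for example:
# with open("example.seq", "w") as handle:
#     handle.write(text)
```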
The user creates a .shape file as a text file containing two columns: the numerical nucleotide position and the SHAPE reactivity for that position. It is important to differentiate positions where the measured reactivity is zero from positions where no data was obtained or SHAPE reactivities could not be determined. The measurement of zero is a critical one and indicates that a position is highly structured. If the reagent and background traces were properly scaled in the ShapeFinder analysis step, there should be no, or very few, negative SHAPE reactivity values. Negative peaks are treated as having a SHAPE reactivity of zero (provided they are > −500).
SHAPE reactivities for a few nucleotides typically need to be excluded from the folding calculation. These no-data positions include nucleotides with high background in the no-reagent control and difficult-to-resolve peaks either near the 3′ primer annealing site or at the 5′ end of the trace. For such positions, one of two methods is used to signal the RNAstructure program to use only thermodynamic parameters when calculating energies involving these nucleotides. The row containing the nucleotide number and its reactivity can be deleted from the .shape file or the SHAPE reactivity can be replaced with a value ≤ −500. We typically use the latter approach and set uncertain nucleotides to -999. For a carefully performed experiment, only a small number of positions typically need to be excluded. For example, out of >9000 nucleotides in the NL4-3 HIV-1 genome, only 53 nucleotides needed to be excluded from the ΔGSHAPE pseudo-free energy calculation. Finally, known single stranded regions or those that interact with another RNA or protein can be prohibited from forming base pairs by assigning these nucleotides a high SHAPE reactivity value (by convention, we set these to 100). This was important, for example, in folding calculations for an HIV-1 genome at positions that form intermolecular base pairs with the tRNA primer [41].
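The conventions above for no-data positions and forced single-stranded positions can be sketched as follows. The helper name and its arguments are hypothetical, and the tab separator between the two columns is an assumption; the values −999 and 100 are the conventions stated in the text.

```python
NO_DATA = -999.0                 # any value <= -500 signals "no SHAPE data"
FORCED_SINGLE_STRANDED = 100.0   # convention for prohibiting base pairing


def format_shape_file(reactivities, single_stranded=()):
    """Build the text of a two-column .shape file.

    `reactivities` is a list indexed from nucleotide 1; use None where no
    reliable reactivity was measured. Positions in `single_stranded`
    (1-based) are assigned a high reactivity so that they remain unpaired.
    """
    forced = set(single_stranded)
    lines = []
    for position, value in enumerate(reactivities, start=1):
        if position in forced:
            value = FORCED_SINGLE_STRANDED
        elif value is None:
            value = NO_DATA
        lines.append(f"{position}\t{value:g}")
    return "\n".join(lines) + "\n"


# Hypothetical reactivities used only for illustration.
shape_text = format_shape_file([0.0, None, 0.85, -0.05], single_stranded=[4])
```

Note that a measured reactivity of zero is written as 0 and is kept distinct from the −999 marker, which preserves the distinction between "highly structured" and "no data".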
3.2.4 RNAstructure
After preparing the sequence and SHAPE text files, the user initiates folding in RNAstructure via RNA/Fold RNA Single Strand and then selects the input sequence file and the output connectivity file (.ct file, Fig. 2). The sequence can also be input by hand via File/New Sequence. For large RNAs, we usually restrict base pair distances to less than 600 nucleotides via Force/Maximum Pairing Distance. SHAPE data are then read via Force/Read SHAPE Reactivity – Pseudo-Energy Constraint, at which point the slope (m) and intercept (b) (Eqn. 1) are chosen. Optimal values of m = 2.6 kcal/mol and b = −0.8 kcal/mol were obtained by optimizing structural predictions for 23S rRNA, but there is a range of values that yield high prediction accuracies (Fig. 3) [30]. We find empirically that different weights within this range may be optimal for other RNAs. For example, in our current HIV-1 work, we use values of m = 3.0 kcal/mol and b = −0.6 kcal/mol [41]. The user accepts the SHAPE file and initiates the folding calculation by selecting START.
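To show how the slope and intercept act on a reactivity value, the sketch below evaluates the pseudo-free energy term, assuming that Eqn. 1 has the logarithmic form published in [30], ΔG_SHAPE = m ln(SHAPE reactivity + 1) + b. The function name is hypothetical, and the handling of negative and no-data values follows the conventions described for the .shape file; this is an illustration, not the RNAstructure implementation.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free energy change (kcal/mol) for one nucleotide.

    Assumes the form m * ln(reactivity + 1) + b. Values <= -500 (or None)
    mean "no data" and contribute nothing; other negative values are
    treated as a reactivity of zero.
    """
    if reactivity is None or reactivity <= -500:
        return 0.0
    reactivity = max(reactivity, 0.0)
    return m * math.log(reactivity + 1.0) + b


# Defaults are the 23S rRNA-optimized parameters; the second call uses the HIV-1 values.
unreactive = shape_pseudo_energy(0.0)
reactive_hiv = shape_pseudo_energy(1.0, m=3.0, b=-0.6)
```

With this form, an unreactive nucleotide receives a favorable (negative) bonus equal to b, whereas a highly reactive nucleotide receives a positive penalty for being base paired.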
3.2.5 Model Visualization
The completed calculation generates a .ct file and the user is prompted with the option of drawing the resulting secondary structures. Viewing perspectives are manipulated under the Draw tab in RNAstructure. The structure can be colored by SHAPE reactivity via Draw/Add SHAPE annotation and choosing the appropriate .shape file. Nucleotides are drawn using the following convention [30]: SHAPE reactivities < 0.3 are black; those ≥ 0.7, red; those in between, orange; and those without SHAPE data, gray (Fig. 4).
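The coloring convention above can be written as a small classification function. The function name is hypothetical; the thresholds are those stated in the text, and values of −500 or below are taken to mean "no data", following the .shape file convention.

```python
def shape_color(reactivity):
    """Map a SHAPE reactivity to the display color used for annotation."""
    if reactivity is None or reactivity <= -500:
        return "gray"      # no SHAPE data
    if reactivity < 0.3:
        return "black"     # low reactivity, likely constrained
    if reactivity >= 0.7:
        return "red"       # high reactivity, likely flexible
    return "orange"        # intermediate reactivity


# Hypothetical reactivities used only for illustration.
colors = [shape_color(r) for r in (0.1, 0.5, 0.9, -999)]
```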
An RNA secondary structure model is consistent with the input SHAPE reactivities if double stranded regions are generally black and single stranded nucleotides red or orange. While the RNAstructure viewer is useful for analyzing predicted structures, for presentation quality images and large RNAs, we recommend exporting the structures as helix text files (Draw/Export Structure to Text File) that can be read by viewing software such as XRNA [68]. The .ct file displays the total folding energy corresponding to the sum of both thermodynamic and SHAPE-derived pseudo-free energy change contributions (see Fig. 4). The folding energy that corresponds solely to the sum of thermodynamic terms can be obtained by running RNA/Efn2 RNA on the .ct file.
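The visual consistency check described above can also be approximated numerically. The sketch below is one possible summary and is not a feature of RNAstructure: the function name is hypothetical, and counting a nucleotide as "consistent" when it is paired with reactivity below 0.3, or unpaired with reactivity of 0.3 or above, is an assumption based on the color convention.

```python
def shape_consistency(partners, reactivities, low=0.3):
    """Fraction of nucleotides whose pairing state agrees with SHAPE data.

    `partners[k]` is the 1-based pairing partner of nucleotide k+1, or 0 if
    unpaired (the convention used in .ct files). Positions with no data
    (None or <= -500) are skipped.
    """
    agree = 0
    total = 0
    for partner, reactivity in zip(partners, reactivities):
        if reactivity is None or reactivity <= -500:
            continue
        total += 1
        is_paired = partner != 0
        is_unreactive = reactivity < low
        if is_paired == is_unreactive:
            agree += 1
    return agree / total if total else float("nan")


# Hypothetical four-nucleotide example: nucleotides 1 and 4 are paired.
fraction = shape_consistency([4, 0, 0, 1], [0.1, 0.9, None, 0.5])
```

A low fraction for a particular region would flag it for closer inspection; it does not by itself show that the model is wrong, because unpaired nucleotides can be constrained by other interactions.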
4. Example: A SHAPE-Supported Model for the HIV-1 Gag-Pol Frameshift Element
We conclude this review with an example from HIV-1 biology describing how SHAPE-constrained RNAstructure calculations can be used to propose new structural models for RNA domains. The human immunodeficiency virus maximizes coding efficiency through the use of overlapping reading frames in its RNA genome. The gene coding for Pol, the polyprotein precursor for viral enzymes, does not have its own start codon but, instead, is encoded in an open reading frame that is offset by −1 nucleotide relative to the upstream Gag reading frame. In order to translate Pol, the ribosome initially translates Gag before pausing, backing up 1 nucleotide, and proceeding to translate the Pol reading frame [69]. This process is called frameshifting and occurs at a conserved heptanucleotide UUUUUUA “slippery” sequence with a frequency of approximately 5–10% [70, 71]. The precise level of frameshifting is crucial for viral replication, and the ratio of Gag to Gag-Pol polyprotein products appears to be tightly regulated [72]. The HIV-1 frameshift element is thus an intriguing target for antiretroviral drug development [71, 73].
The Gag-Pol frameshift element has traditionally been drawn as consisting of a single stranded slippery sequence followed by a downstream stimulatory element consisting of a 12 base pair hairpin structure (Fig. 5A). This stem-loop RNA structural element functions to enhance ribosomal pausing and to increase the frequency of frameshifting [74]. However, comparisons with ribosomal frameshift structures from other retroviruses and experimental evidence that this classical stem-loop is necessary, but not sufficient, for frameshifting [75, 76] have motivated alternative proposals for this element. These alternative structures include pseudoknots [77-79] and a two-stem model (Fig. 5B) [80]. The two-stem model was confirmed by NMR studies performed on 41 and 45 nucleotide transcripts containing precisely this region [81, 82].
However, SHAPE probing of the full-length HIV-1 RNA genome, as extracted from authentic viruses, suggests yet another, more complex, structure (Fig. 5C) [41]. Most strikingly, nucleotides in the slippery sequence (blue boxes, Fig. 5) have mostly low SHAPE reactivities. These experimental measurements indicate that the slippery sequence is base paired (or otherwise constrained) rather than being single stranded in the intact genome as isolated from viruses. Furthermore, when SHAPE reactivities are used to direct RNAstructure folding calculations of the entire intact genome, analysis of the frameshift region in its global context suggests that this functional element is one part of a much larger, 140-nucleotide long, structural unit (Fig. 5C). Further work will clearly be needed to discriminate among these models and to determine whether the frameshift element might adopt multiple conformations during HIV-1 replication. However, this example illustrates the ability of SHAPE-constrained folding to identify elements of current models that may be incomplete and to facilitate development of new RNA structure models in the context of their global, native-like, sequence and structural environments.
5. Conclusions
The SHAPE-constrained RNA folding approach outlined here provides a straightforward way of proposing, validating, and refining accurate secondary structure models for nearly any RNA. Current limitations in the SHAPE approach remain active research focuses, including the requirement for pmol-scale amounts of RNA, which can be difficult to obtain in some cases, and the inability to directly predict pseudoknots and other tertiary interactions. SHAPE-constrained RNA folding is particularly valuable for the large universe of functionally important RNAs for which there is little evolutionary data and for which high-resolution structure determination is unrealizable. In addition, the ability to probe RNA structures in cellular and viral environments or in native-like extracted forms can provide biological insights that are not obtainable using simplified in vitro models. Continued development of SHAPE reagents and of algorithms for using experimental information to constrain RNA structure prediction will expand the classes of RNA motifs and structure-function relationships that can be understood at a molecular level.
Acknowledgements
This work was supported by National Institutes of Health grant AI068462 (to K.M.W.), National Research Service Award F30DA027364 (to J.T.L.), and Medical Scientist Training Program T32GM008719. Work in our laboratory on experimentally-directed RNA secondary structure prediction benefits from a close and lively collaboration with David Mathews (University of Rochester). We thank David Mauger and David Mathews for critically reviewing this manuscript and members of the Weeks laboratory for helpful comments.
References
- 1. Crothers DM, Cole PE, Hilbers CW, Shulman RG. The molecular mechanism of thermal unfolding of Escherichia coli formylmethionine transfer RNA. J Mol Biol. 1974;87:63–88. doi: 10.1016/0022-2836(74)90560-9.
- 2. Banerjee AR, Jaeger JA, Turner DH. Thermal unfolding of a group I ribozyme: the low-temperature transition is primarily disruption of tertiary structure. Biochemistry. 1993;32:153–63. doi: 10.1021/bi00052a021.
- 3. Mathews DH, Banerjee AR, Luan DD, Eickbush TH, Turner DH. Secondary structure model of the RNA recognized by the reverse transcriptase from the R2 retrotransposable element. RNA. 1997;3:1–16.
- 4. Onoa B, Dumont S, Liphardt J, Smith SB, Tinoco I, Jr., Bustamante C. Identifying kinetic barriers to mechanical unfolding of the T. thermophila ribozyme. Science. 2003;299:1892–5. doi: 10.1126/science.1081338.
- 5. Tinoco I, Jr., Bustamante C. How RNA folds. J Mol Biol. 1999;293:271–81. doi: 10.1006/jmbi.1999.3001.
- 6. Greenleaf WJ, Frieda KL, Foster DA, Woodside MT, Block SM. Direct observation of hierarchical folding in single riboswitch aptamers. Science. 2008;319:630–3. doi: 10.1126/science.1151298.
- 7. Holbrook SR. Structural principles from large RNAs. Annu Rev Biophys. 2008;37:445–64. doi: 10.1146/annurev.biophys.36.040306.132755.
- 8. Bailor MH, Sun X, Al-Hashimi HM. Topology links RNA secondary structure with global conformation, dynamics, and adaptation. Science. 2010;327:202–6. doi: 10.1126/science.1181085.
- 9. Hajdin CE, Ding F, Dokholyan NV, Weeks KM. RNA. doi: 10.1261/rna.1837410. (under revision)
- 10. Xia T, SantaLucia J, Jr., Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C, Turner DH. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry. 1998;37:14719–35. doi: 10.1021/bi9809425.
- 11. Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci U S A. 2004;101:7287–92. doi: 10.1073/pnas.0401799101.
- 12. Turner DH, Mathews DH. NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 2010;38:D280–2. doi: 10.1093/nar/gkp892.
- 13. Zuker M, Sankoff D. RNA secondary structures and their prediction. Bull. Math. Biol. 1984;46:591–621.
- 14. Zuker M, Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981;9:133–48. doi: 10.1093/nar/9.1.133.
- 15. Zuker M. On finding all suboptimal foldings of an RNA molecule. Science. 1989;244:48–52. doi: 10.1126/science.2468181.
- 16. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press; Cambridge: 1998.
- 17. Mathews DH, Zuker M. In: Predictive Methods Using RNA Sequences. Baxevanis AD, Ouellette BFF, editors. Wiley; Hoboken, N.J.: 2005. pp. 143–70.
- 18. Mathews DH, Schroeder SJ, Turner DH, Zuker M. In: Predicting RNA Secondary Structure. Gesteland RF, Cech T, Atkins JF, editors. Cold Spring Harbor Laboratory Press; Cold Spring Harbor, N.Y.: 2006. pp. 631–57.
- 19. Mathews DH, Turner DH. Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol. 2006;16:270–8. doi: 10.1016/j.sbi.2006.05.010.
- 20. Reeder J, Hochsmann M, Rehmsmeier M, Voss B, Giegerich R. Beyond Mfold: recent advances in RNA bioinformatics. J Biotechnol. 2006;124:41–55. doi: 10.1016/j.jbiotec.2006.01.034.
- 21. Shapiro BA, Yingling YG, Kasprzak W, Bindewald E. Bridging the gap in RNA structure prediction. Curr Opin Struct Biol. 2007;17:157–65. doi: 10.1016/j.sbi.2007.03.001.
- 22. Schroeder SJ. Advances in RNA structure prediction from sequence: new tools for generating hypotheses about viral RNA structure-function relationships. J Virol. 2009;83:6326–34. doi: 10.1128/JVI.00251-09.
- 23. Turner DH. Thermodynamics of base pairing. Curr Opin Struct Biol. 1996;6:299–304. doi: 10.1016/s0959-440x(96)80047-9.
- 24. Eddy SR. How do RNA folding algorithms work? Nat Biotechnol. 2004;22:1457–8. doi: 10.1038/nbt1104-1457.
- 25. Mathews DH, Sabina J, Zuker M, Turner DH. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol. 1999;288:911–40. doi: 10.1006/jmbi.1999.2700.
- 26. Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics. 2004;5:105. doi: 10.1186/1471-2105-5-105.
- 27. Dowell RD, Eddy SR. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics. 2004;5:71. doi: 10.1186/1471-2105-5-71.
- 28. Dima RI, Hyeon C, Thirumalai D. Extracting stacking interaction parameters for RNA from the data set of native structures. J Mol Biol. 2005;347:53–69. doi: 10.1016/j.jmb.2004.12.012.
- 29. Do CB, Woods DA, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics. 2006;22:e90–8. doi: 10.1093/bioinformatics/btl246.
- 30. Deigan KE, Li TW, Mathews DH, Weeks KM. Accurate SHAPE-directed RNA structure determination. Proc Natl Acad Sci U S A. 2009;106:97–102. doi: 10.1073/pnas.0806929106.
- 31. Pace NR, Smith DK, Olsen GJ, James BD. Phylogenetic comparative analysis and the secondary structure of ribonuclease P RNA--a review. Gene. 1989;82:65–75. doi: 10.1016/0378-1119(89)90031-0.
- 32. Michel F, Westhof E. Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. J Mol Biol. 1990;216:585–610. doi: 10.1016/0022-2836(90)90386-Z.
- 33. Gutell RR, Lee JC, Cannone JJ. The accuracy of ribosomal RNA comparative structure models. Curr Opin Struct Biol. 2002;12:301–10. doi: 10.1016/s0959-440x(02)00339-1.
- 34. Ehresmann C, Baudin F, Mougel M, Romby P, Ebel JP, Ehresmann B. Probing the structure of RNAs in solution. Nucleic Acids Res. 1987;15:9109–28. doi: 10.1093/nar/15.22.9109.
- 35. Knapp G. Enzymatic approaches to probing of RNA secondary and tertiary structure. Methods Enzymol. 1989;180:192–212. doi: 10.1016/0076-6879(89)80102-8.
- 36. Regulski EE, Breaker RR. In-line probing analysis of riboswitches. Methods Mol Biol. 2008;419:53–67. doi: 10.1007/978-1-59745-033-1_4.
- 37. Tullius TD, Greenbaum JA. Mapping nucleic acid structure by hydroxyl radical cleavage. Curr Opin Chem Biol. 2005;9:127–34. doi: 10.1016/j.cbpa.2005.02.009.
- 38. Merino EJ, Wilkinson KA, Coughlan JL, Weeks KM. RNA structure analysis at single nucleotide resolution by selective 2′-hydroxyl acylation and primer extension (SHAPE). J Am Chem Soc. 2005;127:4223–31. doi: 10.1021/ja043822v.
- 39. Wilkinson KA, Gorelick RJ, Vasa SM, Guex N, Rein A, Mathews DH, Giddings MC, Weeks KM. High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states. PLoS Biol. 2008;6:e96. doi: 10.1371/journal.pbio.0060096.
- 40. Gherghe CM, Shajani Z, Wilkinson KA, Varani G, Weeks KM. Strong correlation between SHAPE chemistry and the generalized NMR order parameter (S2) in RNA. J Am Chem Soc. 2008;130:12244–5. doi: 10.1021/ja804541s.
- 41. Watts JM, Dang KK, Gorelick RJ, Leonard CW, Bess JW, Jr., Swanstrom R, Burch CL, Weeks KM. Architecture and secondary structure of an entire HIV-1 RNA genome. Nature. 2009;460:711–6. doi: 10.1038/nature08237.
- 42. Badorrek CS, Weeks KM. RNA flexibility in the dimerization domain of a gamma retrovirus. Nat Chem Biol. 2005;1:104–11. doi: 10.1038/nchembio712.
- 43. Badorrek CS, Gherghe CM, Weeks KM. Structure of an RNA switch that enforces stringent retroviral genomic RNA dimerization. Proc Natl Acad Sci U S A. 2006;103:13640–5. doi: 10.1073/pnas.0606156103.
- 44. Badorrek CS, Weeks KM. Architecture of a gamma retroviral genomic RNA dimer. Biochemistry. 2006;45:12664–72. doi: 10.1021/bi060521k.
- 45. Chen Y, Fender J, Legassie JD, Jarstfer MB, Bryan TM, Varani G. Structure of stem-loop IV of Tetrahymena telomerase RNA. EMBO J. 2006;25:3156–66. doi: 10.1038/sj.emboj.7601195.
- 46. Gherghe C, Weeks KM. The SL1-SL2 (stem-loop) domain is the primary determinant for stability of the gamma retroviral genomic RNA dimer. J Biol Chem. 2006;281:37952–61. doi: 10.1074/jbc.M607380200.
- 47. Lynch SA, Desai SK, Sajja HK, Gallivan JP. A high-throughput screen for synthetic riboswitches reveals mechanistic insights into their function. Chem Biol. 2007;14:173–84. doi: 10.1016/j.chembiol.2006.12.008.
- 48. Vicens Q, Gooding AR, Laederach A, Cech TR. Local RNA structural changes induced by crystallization are revealed by SHAPE. RNA. 2007;13:536–48. doi: 10.1261/rna.400207.
- 49. Costantino DA, Pfingsten JS, Rambo RP, Kieft JS. tRNA-mRNA mimicry drives translation initiation from a viral IRES. Nat Struct Mol Biol. 2008;15:57–64. doi: 10.1038/nsmb1351.
- 50. Duncan CD, Weeks KM. SHAPE analysis of long-range interactions reveals extensive and thermodynamically preferred misfolding in a fragile group I intron RNA. Biochemistry. 2008;47:8504–13. doi: 10.1021/bi800207b.
- 51. Jones CN, Wilkinson KA, Hung KT, Weeks KM, Spremulli LL. Lack of secondary structure characterizes the 5′ ends of mammalian mitochondrial mRNAs. RNA. 2008;14:862–71. doi: 10.1261/rna.909208.
- 52. Wang B, Wilkinson KA, Weeks KM. Complex ligand-induced conformational changes in tRNA(Asp) revealed by single-nucleotide resolution SHAPE chemistry. Biochemistry. 2008;47:3454–61. doi: 10.1021/bi702372x.
- 53. Edwards AL, Batey RT. A structural basis for the recognition of 2′-deoxyguanosine by the purine riboswitch. J Mol Biol. 2009;385:938–48. doi: 10.1016/j.jmb.2008.10.074.
- 54. Wang Z, Treder K, Miller WA. Structure of a viral cap-independent translation element that functions via high affinity binding to the eIF4E subunit of eIF4F. J Biol Chem. 2009;284:14189–202. doi: 10.1074/jbc.M808841200.
- 55. Weil JE, Hadjithomas M, Beemon KL. Structural characterization of the Rous sarcoma virus RNA stability element. J Virol. 2009;83:2119–29. doi: 10.1128/JVI.02113-08.
- 56. Gherghe C, Leonard CW, Gorelick RJ, Weeks KM. Secondary structure of the mature ex virio Moloney murine leukemia virus genomic RNA dimerization domain. J Virol. 2010;84:898–906. doi: 10.1128/JVI.01602-09.
- 57. Pfingsten JS, Castile AE, Kieft JS. Mechanistic role of structurally dynamic regions in Dicistroviridae IGR IRESs. J Mol Biol. 2010;395:205–17. doi: 10.1016/j.jmb.2009.10.047.
- 58. Wilkinson KA, Merino EJ, Weeks KM. Selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE): quantitative RNA structure analysis at single nucleotide resolution. Nat Protoc. 2006;1:1610–6. doi: 10.1038/nprot.2006.249.
- 59. McGinnis JL, Duncan CDS, Weeks KM. High-Throughput Shape and Hydroxyl Radical Analysis of RNA Structure and Ribonucleoprotein Assembly. Methods Enzymol. 2009;468:67–89. doi: 10.1016/S0076-6879(09)68004-6.
- 60. Vasa SM, Guex N, Wilkinson KA, Weeks KM, Giddings MC. ShapeFinder: a software system for high-throughput quantitative analysis of nucleic acid reactivity information resolved by capillary electrophoresis. RNA. 2008;14:1979–90. doi: 10.1261/rna.1166808.
- 61. Mortimer SA, Weeks KM. Time-resolved RNA SHAPE chemistry: quantitative RNA structure analysis in one-second snapshots and at single-nucleotide resolution. Nat Protoc. 2009;4:1413–21. doi: 10.1038/nprot.2009.126.
- 62. Mortimer SA, Weeks KM. A fast-acting reagent for accurate analysis of RNA secondary and tertiary structure by SHAPE chemistry. J Am Chem Soc. 2007;129:4144–5. doi: 10.1021/ja0704028.
- 63. http://bioinfo.unc.edu/Downloads/index.html.
- 64. http://rna.urmc.rochester.edu/rnastructure.html.
- 65. Wilkinson KA, Vasa SM, Deigan KE, Mortimer SA, Giddings MC, Weeks KM. Influence of nucleotide identity on ribose 2′-hydroxyl reactivity in RNA. RNA. 2009;15:1314–21. doi: 10.1261/rna.1536209.
- 66. Serra MJ, Turner DH. Predicting thermodynamic properties of RNA. Methods Enzymol. 1995;259:242–61. doi: 10.1016/0076-6879(95)59047-1.
- 67. Chernick MR, Friis RH. Introductory biostatistics for the health sciences: modern applications including bootstrap. Wiley-Interscience; Hoboken, N.J.: 2003.
- 68. http://rna.ucsc.edu/rnacenter/xrna/
- 69. Jacks T, Power MD, Masiarz FR, Luciw PA, Barr PJ, Varmus HE. Characterization of ribosomal frameshifting in HIV-1 gag-pol expression. Nature. 1988;331:280–3. doi: 10.1038/331280a0.
- 70. Brierley I, Dos Ramos FJ. Programmed ribosomal frameshifting in HIV-1 and the SARS-CoV. Virus Res. 2006;119:29–42. doi: 10.1016/j.virusres.2005.10.008.
- 71. Gareiss PC, Miller BL. Ribosomal frameshifting: an emerging drug target for HIV. Curr Opin Investig Drugs. 2009;10:121–8.
- 72. Shehu-Xhilaga M, Crowe SM, Mak J. Maintenance of the Gag/Gag-Pol ratio is important for human immunodeficiency virus type 1 RNA dimerization and viral infectivity. J Virol. 2001;75:1834–41. doi: 10.1128/JVI.75.4.1834-1841.2001.
- 73. Dulude D, Theberge-Julien G, Brakier-Gingras L, Heveker N. Selection of peptides interfering with a ribosomal frameshift in the human immunodeficiency virus type 1. RNA. 2008;14:981–91. doi: 10.1261/rna.887008.
- 74. Wilson W, Braddock M, Adams SE, Rathjen PD, Kingsman SM, Kingsman AJ. HIV expression strategies: ribosomal frameshifting is directed by a short sequence in both mammalian and yeast systems. Cell. 1988;55:1159–69. doi: 10.1016/0092-8674(88)90260-7.
- 75. Somogyi P, Jenner AJ, Brierley I, Inglis SC. Ribosomal pausing during translation of an RNA pseudoknot. Mol Cell Biol. 1993;13:6931–40. doi: 10.1128/mcb.13.11.6931.
- 76. Kontos H, Napthine S, Brierley I. Ribosomal pausing at a frameshifter RNA pseudoknot is sensitive to reading phase but shows little correlation with frameshift efficiency. Mol Cell Biol. 2001;21:8657–70. doi: 10.1128/MCB.21.24.8657-8670.2001.
- 77. Taylor EW, Ramanathan CS, Jalluri RK, Nadimpalli RG. A basis for new approaches to the chemotherapy of AIDS: novel genes in HIV-1 potentially encode selenoproteins expressed by ribosomal frameshifting and termination suppression. J Med Chem. 1994;37:2637–54. doi: 10.1021/jm00043a004.
- 78. Du Z, Giedroc DP, Hoffman DW. Structure of the autoregulatory pseudoknot within the gene 32 messenger RNA of bacteriophages T2 and T6: a model for a possible family of structurally related RNA pseudoknots. Biochemistry. 1996;35:4187–98. doi: 10.1021/bi9527350.
- 79. Baril M, Dulude D, Steinberg SV, Brakier-Gingras L. The frameshift stimulatory signal of human immunodeficiency virus type 1 group O is a pseudoknot. J Mol Biol. 2003;331:571–83. doi: 10.1016/S0022-2836(03)00784-8.
- 80. Dulude D, Baril M, Brakier-Gingras L. Characterization of the frameshift stimulatory signal controlling a programmed -1 ribosomal frameshift in the human immunodeficiency virus type 1. Nucleic Acids Res. 2002;30:5094–102. doi: 10.1093/nar/gkf657.
- 81. Gaudin C, Mazauric MH, Traikia M, Guittet E, Yoshizawa S, Fourmy D. Structure of the RNA signal essential for translational frameshifting in HIV-1. J Mol Biol. 2005;349:1024–35. doi: 10.1016/j.jmb.2005.04.045.
- 82. Staple DW, Butcher SE. Solution structure and thermodynamic investigation of the HIV-1 frameshift inducing element. J Mol Biol. 2005;349:1011–23. doi: 10.1016/j.jmb.2005.03.038.