Proceedings of the National Academy of Sciences of the United States of America. 2010 Apr 19;107(18):8177–8182. doi: 10.1073/pnas.0911888107

Semiautomated model building for RNA crystallography using a directed rotameric approach

Kevin S. Keating, Anna Marie Pyle
PMCID: PMC2889552  PMID: 20404211

Abstract

Structured RNA molecules play essential roles in a variety of cellular processes; however, crystallographic studies of such RNA molecules present a large number of challenges. One notable complication arises from the low resolutions typical of RNA crystallography, which result in electron density maps that are imprecise and difficult to interpret. This problem is exacerbated by the lack of computational tools for RNA modeling, as many of the techniques commonly used in protein crystallography have no equivalents for RNA structure. This leads to difficulty and errors in the model building process, particularly in modeling of the RNA backbone, which is highly error prone due to the large number of variable torsion angles per nucleotide. To address this, we have developed a method for accurately building the RNA backbone into maps of intermediate or low resolution. This method is semiautomated, as it requires a crystallographer to first locate phosphates and bases in the electron density map. After this initial trace of the molecule, however, an accurate backbone structure can be built without further user intervention. To accomplish this, backbone conformers are first predicted using RNA pseudotorsions and the base-phosphate perpendicular distance. Detailed backbone coordinates are then calculated to conform both to the predicted conformer and to the previously located phosphates and bases. This technique is shown to produce accurate backbone structure even when starting from imprecise phosphate and base coordinates. A program implementing this methodology is currently available, and a plugin for the Coot model building program is under development.

Keywords: backbone conformers, pseudotorsions, reduced representation, RNA conformation, RNA structure


In recent years, RNA crystal structures have contributed greatly to the understanding of numerous cellular processes (1–5); however, structural studies of RNA are frequently hampered by the difficulties posed by RNA crystallography. One notable complication arises from the fact that RNA crystals typically diffract to lower resolutions than do protein crystals (Fig. 1A). At the 2.5–3.5-Å resolutions that are common in RNA, bases and phosphates can be accurately located due to the large size and rigidity of the bases and the high electron density of the phosphates. At these same resolutions, however, the density for the remainder of the RNA backbone is normally unclear and imprecise and, as a result, the detailed backbone structure is prone to errors (6). However, the molecular details of the backbone are critical for a full understanding of RNA function, because the precise positioning of backbone atoms is integral to both chemical catalysis and molecular recognition (7–9). Additionally, although there exist many tools to aid crystallographers in the building process for protein structures (10–13), tools for RNA structural analysis are only beginning to emerge (10, 14–17), and there are currently no tools or methodologies for accurately building RNA backbone structure.

Fig. 1.

Typical RNA electron density maps for structures solved at various resolutions. Structures shown in parts B–I were retrieved from the Nucleic Acid Database (26), and electron density maps were calculated using observed structure factors and calculated phases. (A) Pie chart showing the resolutions of all large RNA structures (structures that contain a chain of at least 25 nucleotides), as retrieved from the Nucleic Acid Database (26). Numbers in parentheses are the number of structures in the specified resolution range. Note that structures in the 2.5–3.5-Å resolution range account for nearly two-thirds of all large RNA structures, whereas protein structures are typically solved at far higher resolutions. Maps are shown at (B) 1.04, (C) 1.75, (D) 2.25, (E) 2.75, (F) 3.3, (G) 3.8, (H) 4.5, and (I) 6.21 Å resolutions.

To address this issue, we have developed a method that allows for building of the RNA backbone with greatly increased speed and accuracy (Fig. 2B). This method requires a crystallographer to locate only the phosphates and bases in an electron density map. Using this information, a detailed backbone structure can be automatically and accurately predicted and constructed. To accomplish this, the methodology incorporates two distinct approaches for analyzing the RNA backbone: the η and θ pseudotorsions (14, 18, 19) and the consensus backbone conformers (15). The pseudotorsional system simplifies the RNA backbone to a reduced representation where each nucleotide i is described by two virtual dihedral angles: η (C4′(i−1)–P(i)–C4′(i)–P(i+1)) and θ (P(i)–C4′(i)–P(i+1)–C4′(i+1)). RNA can then be represented in two dimensions using an η-θ plot, which is similar to a Ramachandran plot for protein structure. The η-θ plot then allows for rapid and accurate categorization of the RNA backbone (14, 16).
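Both pseudotorsions are ordinary four-atom dihedral angles, so any standard dihedral routine can compute them. The Python sketch below is illustrative only. The function names, and the convention that `p[i]` holds the 5′ phosphate of nucleotide `i` (with `None` for atoms absent at chain ends), are assumptions and are not taken from the authors' software.

```python
import math
from typing import Optional, Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(c / length for c in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    return math.degrees(math.atan2(_dot(_cross(axis, v), w), _dot(v, w)))


def eta_theta(c4: Sequence[Optional[Vec]], p: Sequence[Optional[Vec]], i: int):
    """Return (eta, theta) for nucleotide i; None where a required atom is missing.

    eta   = C4'(i-1) - P(i)   - C4'(i) - P(i+1)
    theta = P(i)     - C4'(i) - P(i+1) - C4'(i+1)
    """
    n = len(c4)
    prev_c4 = c4[i - 1] if i >= 1 else None
    next_p = p[i + 1] if i + 1 < n else None
    next_c4 = c4[i + 1] if i + 1 < n else None
    eta_atoms = (prev_c4, p[i], c4[i], next_p)
    theta_atoms = (p[i], c4[i], next_p, next_c4)
    eta = None if any(a is None for a in eta_atoms) else dihedral(*eta_atoms)
    theta = None if any(a is None for a in theta_atoms) else dihedral(*theta_atoms)
    return eta, theta
```

Angles come out in the range (−180°, 180°]. Plotting them on a 0–360° axis, as η-θ plots often do, is only a change of display convention.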

Fig. 2.

An overview of the model building process. (A) The η and θ pseudotorsions. The suite and nucleotide divisions of the backbone are also indicated. (B) The model building process. A crystallographer initially builds phosphates and bases into electron density. The detailed backbone structure can then be automatically predicted and constructed.

Unlike the pseudotorsional system, the consensus backbone conformers are based on the standard backbone torsions and they comprise 46 discrete configurations for the RNA backbone (15). These conformers are comparable to the side-chain rotamer libraries frequently used in protein structure building (20). The RNA conformers divide the backbone into suites, which span two sugars and the intervening phosphate, as opposed to the more traditional nucleotide division of the backbone, which spans two phosphates and the intervening sugar (Fig. 2A). Each suite conformer is then represented as a favorable combination of seven torsional angles: δ, ε, ζ, α, β, γ, and δ. Two δ torsions are present as the suite spans two sugars, and these δ torsions determine the pucker of their respective sugar: δ values near 84° correspond to a C3′ endo sugar pucker, whereas δ values near 147° correspond to a C2′ endo sugar pucker (6). Note that two adjacent suites overlap by one sugar (Fig. 2A) and thus the ending sugar pucker of one suite must be the same as the leading sugar pucker of the following suite. Each specific conformer is given a two-character name, such as 1a for helical RNA, where the first character represents the initial δ, ε, and ζ torsions and the second character represents the α, β, γ, and δ torsions (15).
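A few lines of code are enough to express the δ-to-pucker correspondence and the overlap rule between adjacent suites. This Python sketch assigns each δ to whichever reference value quoted above (84° or 147°) is nearer. That nearest-centre rule, the function names, and the representation of each suite by its pair of modal δ values are illustrative assumptions.

```python
from typing import Sequence, Tuple

# Approximate delta values quoted in the text (degrees).
C3_ENDO_DELTA = 84.0
C2_ENDO_DELTA = 147.0


def pucker_from_delta(delta: float) -> str:
    """Assign the pucker whose reference delta is nearer (simple nearest-centre rule)."""
    if abs(delta - C3_ENDO_DELTA) <= abs(delta - C2_ENDO_DELTA):
        return "C3'-endo"
    return "C2'-endo"


def split_conformer_name(name: str) -> Tuple[str, str]:
    """'1a' -> ('1', 'a'): the first character covers delta, epsilon, zeta;
    the second covers alpha, beta, gamma, delta."""
    if len(name) != 2:
        raise ValueError(f"expected a two-character conformer name, got {name!r}")
    return name[0], name[1]


def puckers_consistent(suite_deltas: Sequence[Tuple[float, float]]) -> bool:
    """suite_deltas[k] = (leading delta, ending delta) of suite k.

    Adjacent suites share one sugar, so the ending pucker of suite k must
    match the leading pucker of suite k + 1.
    """
    return all(
        pucker_from_delta(cur[1]) == pucker_from_delta(nxt[0])
        for cur, nxt in zip(suite_deltas, suite_deltas[1:])
    )
```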

In this work, we present a modified pseudotorsional system that is specifically adapted toward structure building rather than structural analysis. Using these modified pseudotorsions, we then present a methodology for predicting backbone conformers using only the phosphate and base coordinates. We demonstrate that this results in highly accurate conformer predictions, even when starting from low- or intermediate-resolution data. We then show that, by using these predictions, accurate backbone structure can be built, and we demonstrate the accuracy of this method by automatically rebuilding the backbone of two previously published high-resolution crystal structures. The rebuilt backbone closely matches the published structures and, in several cases, shows a better match to the original electron density than does the published structure.

Results

The process of RNA model building is divided into three steps: an initial backbone trace, conformer prediction, and coordinate calculation. The initial backbone trace requires the crystallographer to place phosphates and bases into electron density. The conformer prediction step then determines backbone structure by selecting an appropriate conformer from the consensus backbone conformer library (15). Three measurements, all derived from the phosphate and glycosidic bond locations, are used for this prediction: the pseudotorsions, the perpendicular distance from the phosphate to the glycosidic bond, and interatomic distances. The coordinate calculation step then computes atomic coordinates that match both the predicted conformer and the previously determined phosphate and glycosidic bond coordinates. It should be noted that the initial backbone trace is the only step that involves the electron density map, as conformer prediction and coordinate calculation are accomplished using only glycosidic bond and phosphate locations. Below, the conformer prediction and coordinate calculation steps are further explored.

C1′-Based Pseudotorsions.

As previously mentioned, the phosphates and bases are the only nucleic acid structural elements easily visible at resolutions typical of RNA crystallography. Thus, the η and θ pseudotorsions (16), which simplify the backbone to two virtual bonds, represent an appropriate level of detail for interpreting these electron density maps. However, these pseudotorsions rely on the C4′ atom, which is difficult to locate in such maps, as the density corresponding to the sugar is frequently imprecise and hard to interpret (Fig. 1). Therefore, for conformer prediction purposes, we calculate pseudotorsions using the C1′ atom in place of the C4′. The location of the C1′ can be directly and uniquely determined from the base coordinates, as it is in the plane of the base and covalently attached to the N9 atom (in purines) or the N1 atom (in pyrimidines). We refer to these C1′-based pseudotorsions as η (C1′(i−1)–P(i)–C1′(i)–P(i+1)) and θ (P(i)–C1′(i)–P(i+1)–C1′(i+1)).
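C1′ lies in the base plane and is bonded to N9 or N1, so a rough C1′ position can be derived from the traced base alone. The sketch below places C1′ on the exterior bisector of the ring angle at the glycosidic nitrogen. It uses an assumed N–C1′ bond length of about 1.47 Å. The real exocyclic bond angles at N9 and N1 are not exactly symmetric, so this placement is an approximation, and the helper names are hypothetical. The resulting C1′ coordinates can then replace C4′ in a standard four-atom dihedral calculation to give the C1′-based η and θ.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]

# Approximate N-C1' glycosidic bond length in angstroms (assumed ideal value).
GLYCOSIDIC_BOND = 1.47


def _unit(v: Vec) -> Tuple[float, float, float]:
    n = math.sqrt(sum(c * c for c in v))
    return (v[0] / n, v[1] / n, v[2] / n)


def place_c1prime(n_atom: Vec, ring_a: Vec, ring_b: Vec,
                  bond: float = GLYCOSIDIC_BOND) -> Tuple[float, float, float]:
    """Estimate C1' from the glycosidic nitrogen and its two ring neighbours.

    Purines:     n_atom = N9, ring neighbours C4 and C8.
    Pyrimidines: n_atom = N1, ring neighbours C2 and C6.
    C1' is placed on the exterior bisector of the ring angle at the nitrogen,
    which keeps it in the plane defined by those three atoms.
    """
    ua = _unit([a - b for a, b in zip(ring_a, n_atom)])
    ub = _unit([a - b for a, b in zip(ring_b, n_atom)])
    direction = _unit([-(x + y) for x, y in zip(ua, ub)])
    return tuple(c + bond * d for c, d in zip(n_atom, direction))
```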

Because the consensus backbone library divides RNA structure into suites instead of nucleotides (Fig. 2A), we also interpret the C1′-based pseudotorsions using the suite division. This results in a (θ,η) pseudotorsion pair for each suite. By plotting these values in two dimensions, a θ-η plot can be constructed, which is analogous to the η-θ plot used to interpret nucleotides of the C4′-based pseudotorsions (14). A θ-η plot was constructed using the RNA05 filtered dataset (15), which showed that each backbone conformer occupied a limited region of θ-η space, but that numerous conformers occupied similar, overlapping regions. To reduce the number of overlapping conformers, the dataset was divided by sugar pucker (based on whether each conformer began and ended with a C2′ or a C3′ endo pucker) and four separate plots were constructed, shown in Fig. 3 and Fig. S1. This division of the dataset is comparable to that done in studies of the C4′-based pseudotorsions, where accurate clustering in η-θ space also required dividing the dataset using sugar pucker (14, 16). In order to quantify the regions of θ-η space occupied by each conformer, 2D Gaussian functions (shown as ellipses in Fig. 3 and Fig. S1) were used to define the center and SD of each conformer. The average SD of these Gaussians was 20.2°. The tightness of these clusters demonstrates the high correlation between the θ-η pseudotorsions and the backbone conformers.
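Consider a suite spanning sugar i−1, phosphate i, and sugar i. Under the suite division, it appears to pair θ of nucleotide i−1 (P(i−1)–C1′(i−1)–P(i)–C1′(i)) with η of nucleotide i (C1′(i−1)–P(i)–C1′(i)–P(i+1)). This reading is inferred from the atoms that the text later says are needed for the terminal suites. Because pseudotorsions are periodic, fitting and evaluating a per-conformer Gaussian must use circular means and wrapped deviations. The sketch below assumes an axis-aligned Gaussian, with θ and η treated as independent. The ellipses in Fig. 3 may instead reflect correlated (rotated) Gaussians, which would need a full covariance matrix. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def wrap(d: float) -> float:
    """Wrap an angular difference (degrees) into [-180, 180)."""
    return (d + 180.0) % 360.0 - 180.0


def circular_mean(angles: Sequence[float]) -> float:
    """Mean direction of a set of angles (degrees)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def fit_gaussian(points: Sequence[Tuple[float, float]]):
    """Fit a centre and per-axis SD to the (theta, eta) points of one conformer."""
    centre = tuple(circular_mean([p[k] for p in points]) for k in range(2))
    sd = tuple(
        math.sqrt(sum(wrap(p[k] - centre[k]) ** 2 for p in points) / len(points))
        for k in range(2)
    )
    return centre, sd


def log_density(point, centre, sd) -> float:
    """Log of an axis-aligned 2D Gaussian evaluated on wrapped differences."""
    zt = wrap(point[0] - centre[0]) / sd[0]
    ze = wrap(point[1] - centre[1]) / sd[1]
    return -0.5 * (zt * zt + ze * ze) - math.log(2.0 * math.pi * sd[0] * sd[1])
```

Wrapping matters most for clusters that straddle ±180°. Without it, points at 170° and −170° would look 340° apart rather than 20°.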

Fig. 3.

A θ-η plot showing suites of the RNA05 filtered dataset. Each color and shape combination corresponds to a specific conformer as indicated in the key. Ellipses correspond to the Gaussian functions (at the 1σ level) used in conformer predictions. Only conformers with leading C2′ endo sugar pucker and ending C3′ endo sugar pucker are shown. Other pucker combinations are shown in Fig. S1.

Base-Phosphate Perpendicular Distance.

At higher resolutions, it is possible to determine the appropriate pucker for a given ribose sugar by examining the density corresponding to the O2′ atom; however, at all but the highest resolutions, this requires accurately phased data and careful manual inspection of the electron density map. Additionally, electron density at low and intermediate resolutions frequently lacks the detail required for such manual inspections. As such, to reliably determine pucker at these resolutions, one must use the base-phosphate perpendicular distance, as developed by the Richardson lab (10, 21). This distance is measured from the 3′ phosphate (i.e., the phosphate of the next nucleotide) to a line containing the glycosidic bond (Fig. 4A). Longer distances correspond to a C3′ endo sugar pucker, whereas shorter distances correspond to a C2′-endo pucker (Fig. 4B). This measurement is easy to automate, as it does not require any manual inspection of the electron density, and it is highly accurate: In the RNA05 filtered dataset (15), 99.7% of C3′-endo pucker nucleotides have base-phosphate perpendicular distances greater than 2.9 Å, and all C2′-endo pucker nucleotides have distances less than 2.9 Å.
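The base-phosphate perpendicular distance is a point-to-line distance. The line is the full extension of the C1′–N9/N1 bond, not just the bond segment. Below is a minimal Python sketch using the 2.9 Å cutoff from Fig. 4B. The function names, and the use of a strict greater-than comparison at the cutoff, are assumptions.

```python
import math
from typing import Sequence

Vec = Sequence[float]

PUCKER_CUTOFF = 2.9  # angstroms; the threshold shown in Fig. 4B


def perpendicular_distance(point: Vec, line_a: Vec, line_b: Vec) -> float:
    """Distance from point to the infinite line through line_a and line_b."""
    d = [b - a for a, b in zip(line_a, line_b)]
    v = [p - a for a, p in zip(line_a, point)]
    cross = (d[1] * v[2] - d[2] * v[1],
             d[2] * v[0] - d[0] * v[2],
             d[0] * v[1] - d[1] * v[0])
    # |d x v| / |d| is the height of the parallelogram spanned by d and v.
    return math.hypot(*cross) / math.hypot(*d)


def predict_pucker(next_p: Vec, c1: Vec, glyco_n: Vec) -> str:
    """Predict the sugar pucker of nucleotide i from its 3' phosphate (P of i+1)
    and its glycosidic bond (C1' and N9 or N1)."""
    dist = perpendicular_distance(next_p, c1, glyco_n)
    return "C3'-endo" if dist > PUCKER_CUTOFF else "C2'-endo"
```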

Fig. 4.

Sugar pucker prediction using the base-phosphate perpendicular distance (10, 21). (A) The base-phosphate perpendicular distance, shown with yellow dashes, measures the distance from the 3′ phosphate to a line containing the glycosidic bond. (B) A histogram of base-phosphate perpendicular distances divided by sugar pucker. As shown, distances greater than 2.9 Å strongly correlate to C3′ endo sugar pucker, whereas distances less than 2.9 Å strongly correlate to C2′ endo pucker. The solid lines show probability densities calculated using kernel smoothing (see SI Materials and Methods).

Interatomic Distances.

Three additional measurements per suite are used for conformer prediction: the intrasuite C1′–C1′ distance, and two intersuite phosphate–phosphate distances, one measuring from the phosphate of the previous suite to the current phosphate, and one measuring from the current phosphate to that of the next suite.
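Indexing each suite by its phosphate makes these three measurements easy to collect. In this hypothetical helper, `p[k]` is the 5′ phosphate of nucleotide `k`. Any distance involving a missing atom, such as an absent terminal phosphate, is returned as `None` so that later scoring can skip it.

```python
import math
from typing import Optional, Sequence

Vec = Sequence[float]


def _dist(a: Optional[Vec], b: Optional[Vec]) -> Optional[float]:
    """Euclidean distance, or None if either atom is missing."""
    if a is None or b is None:
        return None
    return math.dist(a, b)


def suite_distances(c1: Sequence[Optional[Vec]], p: Sequence[Optional[Vec]], i: int) -> dict:
    """Interatomic distances for suite i (sugar i-1, phosphate P(i), sugar i).

    p[k] is the 5' phosphate of nucleotide k; missing atoms are None.
    """
    if i < 1 or i >= len(c1):
        raise IndexError("suite index must satisfy 1 <= i < len(c1)")
    next_p = p[i + 1] if i + 1 < len(p) else None
    return {
        "c1_c1": _dist(c1[i - 1], c1[i]),    # intrasuite C1'-C1'
        "p_prev_p": _dist(p[i - 1], p[i]),   # previous suite's P to this P
        "p_p_next": _dist(p[i], next_p),     # this P to the next suite's P
    }
```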

Conformer Prediction.

By combining the C1′-based pseudotorsions, the base-phosphate perpendicular distance, and the C1′–C1′ and phosphate–phosphate interatomic distances, backbone conformer predictions can be made (SI Materials and Methods). For each suite, this technique calculates a score for each conformer and then ranks the conformers from most to least likely. The accuracy of this technique was assessed using jackknife validation. This showed that the first (most likely) conformer was correct 80% of the time for nonhelical suites and 84% of the time for helical suites, and that one of the first three conformers was correct 97% of the time for nonhelical suites and 98% of the time for helical suites (Fig. 5A). (Herein, “helical” is used to refer to suites of the 1a conformer, regardless of their molecular context.)
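The actual scoring function is given in SI Materials and Methods and is not reproduced here. The sketch below is a hypothetical stand-in. It treats each measurement as an independent Gaussian term, sums the log-likelihoods, and skips any term that cannot be computed. It also keeps only conformers whose leading and ending puckers match the predicted puckers, which is a simplification, since a mispredicted pucker would then exclude the correct conformer. The final function tallies top-k accuracy as in Fig. 5A. A jackknife validation would additionally refit each conformer's statistics without the held-out suite, and that refitting step is omitted here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class ConformerModel:
    """Hypothetical per-conformer statistics fitted from a training set."""
    name: str
    puckers: Tuple[str, str]      # (leading, ending) sugar pucker
    means: Dict[str, float]       # keys such as theta, eta, c1_c1, p_prev_p, p_p_next
    sds: Dict[str, float]


ANGULAR = {"theta", "eta"}


def _term(key: str, x: float, mu: float, sd: float) -> float:
    d = x - mu
    if key in ANGULAR:
        d = (d + 180.0) % 360.0 - 180.0  # wrap periodic differences
    return -0.5 * (d / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def score(features: Dict[str, Optional[float]], model: ConformerModel) -> float:
    """Sum of log-likelihood terms; measurements that are None are skipped."""
    return sum(
        _term(k, x, model.means[k], model.sds[k])
        for k, x in features.items()
        if x is not None and k in model.means
    )


def rank_conformers(features: Dict[str, Optional[float]], puckers: Tuple[str, str],
                    models: Sequence[ConformerModel]) -> List[str]:
    """Rank pucker-compatible conformers from most to least likely."""
    compatible = [m for m in models if m.puckers == puckers]
    ranked = sorted(compatible, key=lambda m: score(features, m), reverse=True)
    return [m.name for m in ranked]


def top_k_accuracy(rankings: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of suites whose true conformer appears among the first k predictions."""
    hits = sum(t in r[:k] for r, t in zip(rankings, truth))
    return hits / len(truth)
```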

Fig. 5.

Jackknife validation using the RNA05 filtered dataset (15) shows that conformer predictions are highly accurate. (A) Prediction accuracy for conformers ranked as most likely, second most likely, etc., by the conformer prediction process. SE is < 0.3% for all bars. (B) A ROC plot for predictions of helical and nonhelical suites. The dashed diagonal line shows the accuracy expected of a completely random predictor. (C) Prediction accuracy for nonhelical suites as a function of structure accuracy. For the red, blue, and green lines, the base angle, base position, or phosphate position were randomized within a specified radius before conformer predictions were made. For the yellow line, all three were randomized simultaneously before conformer predictions were made. SE is < 0.8% on all points.

To further evaluate the accuracy of these predictions, a receiver operating characteristic (ROC) graph was generated. A ROC graph is a plot of true positive rate against false positive rate and is used to depict the tradeoff between sensitivity and specificity (22). As shown in Fig. 5B, this predictor is both highly specific and highly sensitive. At a 5% false positive rate, the true positive rate was 94.7% for nonhelical predictions and 96.2% for helical predictions. From these ROC plots, we can also calculate the area under the curve (AUC), which gives an overall measure of the quality of the predictor (22). A perfect predictor will have an AUC of 1, whereas a predictor that does no better than random guessing will have an AUC of 0.5. For both helical and nonhelical predictions, the AUC was 0.99, further demonstrating the quality of these predictions.
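Given per-suite scores and true labels, the ROC curve and its AUC can be computed directly. In this sketch, each suite's score for a class (for instance, helical) is assumed to be the predictor's confidence that the suite belongs to that class. How the paper derived that score is described in SI Materials and Methods and is not reproduced here. Tied scores are added as a single step. The area is taken by the trapezoidal rule, which equals the probability that a randomly chosen positive outranks a randomly chosen negative (with ties counted as one half).

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, sweeping the threshold
    from the highest score downward. Tied scores are added as one step."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points: Sequence[Tuple[float, float]], max_fpr: float) -> float:
    """Best true positive rate reachable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)
```

With `tpr_at_fpr(points, 0.05)` one reads off the kind of "true positive rate at a 5% false positive rate" figure quoted in the text.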

Coordinate Calculation.

After conformer prediction, atomic coordinates must be calculated that match both the predicted conformer and the previously determined phosphate and glycosidic bond locations. This task is not at all straightforward, as the conformer library gives approximate, not exact, values for all backbone torsions (15). Protein side-chain rotamer libraries similarly give approximate values for all χ angles; however, side chains built using modal values for all torsions result in sufficiently accurate structures to start refinement (20). This use of modal torsion values is possible for proteins because the side chain is constrained on only one end (i.e., the Cα–Cβ bond) and adjacent side-chain structures are independent. In RNA, however, it is the main chain being constructed. The RNA main chain is constrained on both ends (i.e., the leading and ending sugars of a suite are covalently bound to the preceding C5′ and following O3′, respectively) and atoms in one suite have an effect on torsions in the adjacent suites. As a result, an RNA structure built using modal torsion values will be highly inaccurate and cannot be used as a starting point for refinement. Instead, adjustments in suite torsion angles must be done during the building process.

A procedure has therefore been developed to calculate coordinates for each nucleotide. This procedure minimizes the atomic coordinates against mean torsion values as well as ideal bond length and angle values (see SI Materials and Methods). To test the accuracy of this process, all nucleotides in the RNA05 filtered dataset (15) were rebuilt starting from their phosphate and base coordinates. (Note that for this test of the coordinate calculation process, conformer prediction was not done. Instead, the RNA05 assigned conformer was used.) The accuracy of these rebuilt structures was determined using the Suitename program (15). This program, developed by the Richardson lab, identifies suites of RNA as a specific conformer given backbone standard torsion values. Suitename categorized 97.6% of rebuilt suites as the appropriate conformer, indicating that they were rebuilt correctly. Many of the suites that were not identified as the intended conformer were helical or helical-like. (“Helical-like” refers to conformers that form satellite clusters or shoulders of the 1a peak in standard torsion space.) There are a number of helical-like conformers, and frequently only minor differences exist between these conformers and other helical or helical-like conformers (15). For example, the differences between a 1a (helical) suite and a 1L or &a (both helical-like) suite can be imperceptible, as shown in Fig. 6C. As such, the inaccuracies in many of the incorrectly built suites are very minor.
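The exact target function and its weights are given in SI Materials and Methods and are not reproduced here. As a hypothetical illustration of this kind of objective, the sketch below scores a set of coordinates by four kinds of squared deviation: from ideal bond lengths and angles, from the predicted conformer's mean torsions (with periodic wrapping), and from the traced P and C1′ positions. The weights are arbitrary placeholders. In practice, such a function would be passed to a general-purpose minimizer (for example, `scipy.optimize.minimize` over flattened coordinates), which is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _wrap(d: float) -> float:
    return (d + 180.0) % 360.0 - 180.0


def bond_angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n23 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n23)
    x = _dot(_cross(b1, b2), n23)
    return math.degrees(math.atan2(y, x))


def restraint_energy(coords: Dict[str, Vec],
                     bonds: List[Tuple[str, str, float]],
                     angles: List[Tuple[str, str, str, float]],
                     torsions: List[Tuple[str, str, str, str, float]],
                     anchors: Dict[str, Vec],
                     weights: Tuple[float, float, float, float] = (100.0, 0.1, 0.01, 10.0)) -> float:
    """Weighted sum of squared deviations (weights are arbitrary placeholders).

    bonds:    (atom1, atom2, ideal length in angstroms)
    angles:   (atom1, atom2, atom3, ideal angle in degrees)
    torsions: (atom1, atom2, atom3, atom4, conformer mean in degrees)
    anchors:  atoms fixed by the initial trace (P, C1') and their traced positions
    """
    w_bond, w_angle, w_tors, w_anchor = weights
    e = sum(w_bond * (math.dist(coords[a], coords[b]) - d0) ** 2 for a, b, d0 in bonds)
    e += sum(w_angle * (bond_angle(coords[a], coords[b], coords[c]) - t0) ** 2
             for a, b, c, t0 in angles)
    e += sum(w_tors * _wrap(dihedral(*(coords[k] for k in (a, b, c, d))) - t0) ** 2
             for a, b, c, d, t0 in torsions)
    e += sum(w_anchor * math.dist(coords[k], pos) ** 2 for k, pos in anchors.items())
    return e
```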

Fig. 6.

Portions of rebuilt test case structures. Original structures are shown as green sticks and the rebuilt structures are shown in ball-and-stick representation. Atoms built within 0.5 Å of the published coordinates are shown as white spheres, and atoms built within 1.0 Å are shown in yellow. Suite numbers and conformers are labeled. In cases where an incorrect conformer was predicted, the conformer of the published structure is shown in green and the predicted conformer is shown in yellow. (A) The S-motif from the sarcin/ricin domain. All conformers were predicted correctly and all atoms have been built within 0.8 Å of their published coordinates. (B) The J1/2 linker from the guanine riboswitch. All conformers were predicted correctly and all atoms have been built within 0.8 Å of their published coordinates. (C) A helical region of the guanine riboswitch. The conformers for suites 71 and 73 were incorrectly predicted; however, all atoms were built within 0.7 Å of their published location. This is due to the similarity between many of the helical and helical-like conformers.

Phosphate and Base Accuracy.

The above conformer predictions and coordinate calculation tests were carried out using published coordinates for the phosphates and glycosidic bonds. However, when initially building low- and intermediate-resolution structures, the phosphates and bases must be manually placed into an experimentally phased electron density map. Although phosphate atoms can usually be reliably built due to their high electron density, density around the sugars and bases may require more interpretation. Examination of the base density typically reveals the appropriate plane for the base, but precise placement of the base itself is not always possible. To assess the impact of this ambiguity on conformer prediction, we repeated prediction validation with the RNA05 filtered dataset (15) after randomly adjusting phosphate and base coordinates. The phosphate or the base atoms were moved by a random distance up to 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, or 3.0 Å, or the base was rotated within its plane by up to 3.75°, 7.5°, 11.25°, 15°, 22.5°, 30°, 37.5°, or 45°. Fig. 5C shows prediction accuracy after each of these structure randomizations. These results show that conformer predictions were still accurate even with inaccuracies in these atomic coordinates. The predictions were most sensitive to the phosphate location, but were relatively insensitive to changes in the base location, and were almost entirely unaffected by changes in the base rotation.
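The text does not say exactly how the random perturbations were drawn. The sketch below adopts one plausible scheme, which is an assumption. Translations use a random direction with a distance drawn uniformly between zero and the maximum. In-plane base rotations turn the base about the base-plane normal through a chosen pivot atom. All names are hypothetical, and a seeded `random.Random` keeps runs reproducible.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def random_unit_vector(rng: random.Random) -> Tuple[float, float, float]:
    """Uniformly distributed direction (normalized Gaussian sample)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:
            return (v[0] / n, v[1] / n, v[2] / n)


def jitter(point: Vec, max_dist: float, rng: random.Random) -> Tuple[float, ...]:
    """Move point by a distance drawn uniformly from [0, max_dist] in a random direction."""
    d = rng.uniform(0.0, max_dist)
    u = random_unit_vector(rng)
    return tuple(p + d * c for p, c in zip(point, u))


def rotate(point: Vec, pivot: Vec, axis: Vec, angle_deg: float) -> Tuple[float, ...]:
    """Rodrigues rotation of point about a unit axis passing through pivot."""
    t = math.radians(angle_deg)
    v = [p - q for p, q in zip(point, pivot)]
    k = axis
    kxv = (k[1] * v[2] - k[2] * v[1], k[2] * v[0] - k[0] * v[2], k[0] * v[1] - k[1] * v[0])
    kdv = sum(a * b for a, b in zip(k, v))
    return tuple(
        q + v[i] * math.cos(t) + kxv[i] * math.sin(t) + k[i] * kdv * (1.0 - math.cos(t))
        for i, q in enumerate(pivot)
    )


def rotate_base_in_plane(base: List[Vec], pivot: Vec, normal: Vec,
                         max_deg: float, rng: random.Random) -> List[Tuple[float, ...]]:
    """Rotate all base atoms by one random angle in [-max_deg, max_deg] about the
    unit base-plane normal through pivot, so they stay in the original plane."""
    angle = rng.uniform(-max_deg, max_deg)
    return [rotate(a, pivot, normal, angle) for a in base]
```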

Test Cases.

To test the effectiveness of the conformer prediction and coordinate calculation procedures, two previously published crystal structures were rebuilt: the sarcin/ricin domain (23) [Protein Data Bank (PDB ID): 1Q9A] and the guanine riboswitch (24) (PDB ID: 2EES). High-resolution structures were used to ensure the accuracy of the published coordinates: The sarcin/ricin domain was solved at 1.04 Å and the guanine riboswitch was solved at 1.75 Å.

These structures were rebuilt using only the published phosphate and glycosidic bond locations, and the rebuilt structures were then compared to the original coordinates (see Figs. 6 and 7, and Fig. S2). Before the test cases were run, conformers were assigned to the published structures using the Suitename program, and any suites not categorized by Suitename were visually inspected and assigned conformers where possible. For the sarcin/ricin structure, 25 of the 26 suites were assigned conformers using Suitename, and the remaining suite was assigned visually. For the guanine riboswitch structure, 57 of the 66 suites were assigned conformers using Suitename, and five of the remaining nine suites were assigned visually, resulting in 62 total suites with a conformer assignment.

Fig. 7.

For two suites of the guanine riboswitch structure, the rebuilt coordinates (shown in yellow) showed a stronger match to the electron density than did the original structure (shown in green). All electron density maps shown are 2Fo-Fc omit maps. Note that the rebuilt coordinates shown here have not been refined or minimized against the electron density. (A) The first half of suite 51 with map contoured at 2σ. The published O3′ and OP1 coordinates are both poor fits to the electron density. (B) The second half of suite 51 with map contoured at 1.1σ. The C5′ of the rebuilt structure shows a better match to the map than does the original structure. (C) The first half of suite 64 with map contoured at 1.75σ. The predicted nonbridging phosphoryl oxygens and O3′ show a considerably better fit to the map than the published structure. (D) The second half of suite 64 with map contoured at 1.5σ. The published and predicted structures are similar and both fit the electron density well despite the differences in the first half of the suite (shown in C).

Sarcin/Ricin Test Case.

As the first step of the test case, conformer prediction was run using the published phosphate and glycosidic bond coordinates. Twenty-two of the 26 suites were predicted as the appropriate conformers using only the first (most likely) conformer prediction. Three of the remaining four suites consisted of helical or helical-like RNA, and the correct conformers were predicted as the second, third, and fourth predictions for these three suites. The first prediction in all three cases was an alternative helical-like conformer, which caused only minor changes in the structure of the rebuilt suite (see above). The remaining suite where the correct conformer was not the first prediction was the 5′-most suite. The prediction inaccuracy here was due to the lack of a 5′ phosphate on the leading nucleotide of the structure. This atom is required to calculate θ and the leading phosphate-phosphate distance for the first suite. As such, the conformer prediction was made without this information, leading to the decreased prediction accuracy.

After conformer prediction was complete, coordinate calculation was run. This process was successful for 25 of the 26 suites, as all 25 were recognized by Suitename as their predicted conformer. The remaining suite consisted of a 1a (helical) suite that had been mispredicted as an &a suite (one of the helical-like conformers). After coordinate calculation, Suitename categorized the rebuilt &a suite as a 1a conformer, further demonstrating the similarity between these helical and helical-like conformers.

Guanine Riboswitch Test Case.

As with the previous test case, conformer prediction was run using the published phosphate and glycosidic bond coordinates, and 54 of the 62 suites were predicted as the appropriate conformers using only the first conformer prediction. All eight of the remaining suites consisted of helical or helical-like RNA, and the appropriate conformers were predicted as either the second or third prediction in seven of these eight suites. The only suite where the appropriate conformer was not predicted within the first three predictions was the 3′-most suite of the structure. This prediction inaccuracy was due to the lack of a 3′ phosphate on the final nucleotide, because this atom is required to calculate η, the ending phosphate–phosphate distance, and the ending base-phosphate perpendicular distance for the final suite.

After conformer prediction was complete, coordinate calculation was run. This process was successful for 62 of the 66 suites, because all 62 were recognized by Suitename as their predicted conformer. Three of the four remaining suites were 1a suites that had been mispredicted as &a and then built as 1a, as occurred in the sarcin/ricin test case above. The only other suite that was not successfully built was the 5′-most suite. This is likely the result of a conformer prediction error caused by the lack of 5′ phosphate, because incorrectly predicted suites can lead to coordinate calculation errors if the backbone structure cannot be accommodated using the torsions of the predicted conformer. The prediction accuracy of this suite could not be assessed above because the published coordinates could not be assigned to a conformer. Additionally, excluding the four suites of the published structure that could not be assigned to conformers, all atoms in the guanine riboswitch structure were built within 0.9 Å of their published locations, except for seven nonbridging phosphoryl oxygens, which were built within 1.6 Å. Except for the 5′-most suite, all errors within the rebuilt structure would easily be corrected during refinement.

As a final examination of the guanine riboswitch structure, the four suites of the original structure that could not be assigned to conformers were closely examined. For each of these suites, a 2Fo-Fc omit map was calculated. Visual examination of these maps showed that the rebuilt structures of suites 51 and 64 showed a considerably stronger match to the electron density than did the original published structure (Fig. 7). To ensure that these rebuilt suites were stable during crystallographic refinement, minimization was carried out using CNS 1.1 (25), the same program used by Gilbert et al. (24). After minimization, no significant backbone movement occurred in either suite, showing that the rebuilt backbone configurations were stable.

Discussion

Here, we have presented a methodology for accurately building the RNA backbone into an electron density map. Using this methodology, backbone structure was generated in two fully automated steps: conformer prediction, where the appropriate conformers were selected from the consensus backbone conformer library (15), and coordinate calculation, where atomic coordinates were computed to fit the predicted conformers. Both of these steps were shown to be highly accurate. Jackknife validation of the conformer predictions showed that one of the first three predictions was correct 98% (for helical suites) and 97% (for nonhelical suites) of the time, and that the first prediction was correct 84% (helical) and 80% (nonhelical) of the time (see Fig. 5A). Furthermore, the area under the ROC curve (Fig. 5B) was 0.99 for both helical and nonhelical predictions, demonstrating that the predictions were both highly specific and highly sensitive. The coordinate calculation step was similarly accurate, with 97.6% of rebuilt suites being recognized by the Suitename program (15) as the intended conformer.

This methodology can easily be incorporated into the crystallographic model building process: A crystallographer must initially trace phosphates and bases into the electron density map. Using this information, conformer prediction can be carried out and the structure can be automatically built using the most likely sequence of conformers. The crystallographer may then look through the structure and select alternate conformers for any suites which do not appear to fit the density or for which the molecular context of the suite provides clues to the appropriate conformer. [For example, many motifs show a clear conformer preference (15).] These alternate conformers can be selected from the second and third conformer predictions and would only be necessary in a small percentage of suites. The end result of this procedure would be an accurate, fully detailed atomic structure which could then be refined against the electron density.
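Adjacent suites share a sugar. Taking the top-ranked conformer for every suite independently can therefore produce a pucker clash at the shared sugar. A Viterbi-style dynamic program over suites is one way to obtain a consistent "most likely sequence", as sketched below. The text does not say that this is how the sequence is chosen. The `Candidate` type, the placeholder conformer names, and the scores are illustrative assumptions.

```python
import math
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str            # placeholder conformer label
    log_score: float     # e.g. a log-likelihood from conformer prediction
    start_pucker: str    # pucker of the suite's leading sugar
    end_pucker: str      # pucker of the suite's ending sugar


def best_sequence(suites: List[List[Candidate]]) -> List[str]:
    """Highest-scoring, pucker-consistent choice of one candidate per suite."""
    if not suites:
        return []
    best = [[c.log_score for c in suites[0]]]
    back: List[List[Optional[int]]] = [[None] * len(suites[0])]
    for i in range(1, len(suites)):
        row, brow = [], []
        for cand in suites[i]:
            prev_score, prev_idx = -math.inf, None
            for k, prev in enumerate(suites[i - 1]):
                # The shared sugar must have the same pucker in both suites.
                if prev.end_pucker == cand.start_pucker and best[i - 1][k] > prev_score:
                    prev_score, prev_idx = best[i - 1][k], k
            row.append(cand.log_score + prev_score)
            brow.append(prev_idx)
        best.append(row)
        back.append(brow)
    j = max(range(len(best[-1])), key=lambda k: best[-1][k])
    if best[-1][j] == -math.inf:
        raise ValueError("no pucker-consistent sequence of candidates")
    names = []
    for i in range(len(suites) - 1, -1, -1):
        names.append(suites[i][j].name)
        j = back[i][j]
    return names[::-1]
```

Consider two suites whose individually top-ranked candidates disagree on the shared sugar's pucker. The dynamic program then falls back to the best-scoring consistent combination, which may use a second- or third-ranked candidate. That mirrors the text's observation that alternate conformers usually come from the second and third predictions.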

Test Cases.

As a test of this building methodology, the sarcin/ricin domain (23) and guanine riboswitch (24) structures were automatically rebuilt starting from the published phosphate and glycosidic bond coordinates, and these rebuilt structures were then compared to the original models. This rebuilding resulted in highly accurate backbone structure (Fig. 6 and Fig. S2). Between the two molecules, 76 of the 88 suites with assigned conformers were correctly predicted and built using only the first conformer prediction. For 10 of the remaining 12 suites, the correct conformer was predicted as the second or third prediction. Furthermore, 11 of these 12 suites were helical or helical-like and were built as an alternate helical or helical-like conformer, which frequently produced an imperceptible change in the suite structure (Fig. 6C). Additionally, for two suites of the guanine riboswitch, the rebuilt structure showed a noticeably better match to the electron density than did the original coordinates (Fig. 7). This is particularly notable because the rebuilt structure was not refined or minimized against the electron density during the building process and was based exclusively on the phosphate and glycosidic bond coordinates. These examples demonstrate that this methodology can provide valuable assistance for structure building even at high resolution. These two suites also clearly emphasize the need for improved tools for building and refining RNA structure, because the overall quality of this structure is excellent, with 1.75-Å resolution and an Rfree value of 0.241. This need for improved building tools is further supported by a previous examination (21), which used MolProbity (10), RNABC (17), and manual examination to find and fix several structural issues in a related guanine riboswitch structure (1). In cases where the same problem existed in both guanine riboswitch structures, such as suite 64, the results presented here match the corrections described in this earlier study.

Map Quality.

Because this building procedure requires only phosphate and glycosidic bond coordinates, it should greatly aid structure building for low- and intermediate-resolution structures. At lower resolution, however, phosphates and bases (from which the glycosidic bond coordinates are determined) can be located only imprecisely, so we examined how this imprecision affects conformer prediction (Fig. 5C). Predictions were most sensitive to inaccuracy in the phosphate coordinates. However, phosphate atoms can typically be located accurately in maps with resolution as low as 4 Å (Fig. 1), because their high electron density produces distinct peaks in the map. Assuming an inaccuracy of up to 1 Å for phosphate coordinates at these resolutions, conformer prediction accuracy would decrease by only 5.9%.
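These sensitivity tests can be pictured as jittering each traced atom by a random displacement of bounded size and then rerunning the prediction. The text does not say how the displacements were sampled; that procedure is in the SI. The sketch below therefore assumes a displacement drawn uniformly from a sphere of radius r around the true position. `perturb` is a hypothetical helper, not part of the published program.

```python
import math
import random

Vec3 = tuple[float, float, float]


def perturb(coord: Vec3, max_shift: float, rng: random.Random) -> Vec3:
    """Shift coord by a random vector drawn uniformly from a ball of radius max_shift (Å)."""
    # Isotropic direction: a 3D standard-normal sample, normalized to unit length.
    while True:
        d = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in d))
        if norm > 1e-12:  # guard against the (vanishingly rare) zero vector
            break
    # Cube-root scaling makes the shift uniform in volume rather than
    # clustered near the original position.
    r = max_shift * rng.random() ** (1.0 / 3.0)
    return tuple(c + r * dc / norm for c, dc in zip(coord, d))


# Example: jitter a hypothetical phosphate position by up to 1 Å.
rng = random.Random(42)
phosphate = (10.0, 5.0, -3.0)
shifted = perturb(phosphate, 1.0, rng)
print(shifted, math.dist(phosphate, shifted))
```

Under an alternative assumption, every atom is moved by exactly r in a random direction. That variant drops the cube-root step and probes the worst case at each radius rather than the average.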

In low-resolution maps, bases are considerably more difficult to locate with this level of precision. However, conformer predictions are far less sensitive to inaccuracy in the base coordinates: inaccuracies of up to 2 Å lead to only a 12.9% decrease in prediction accuracy, and inaccuracies of up to 3 Å to only a 22.3% decrease. Imprecision in base orientation also has little effect. Rotating bases by up to 45° within the base plane decreased prediction accuracy by only 8.3%. Rotations about χ have no impact at all, because only the location of the glycosidic bond, not the base plane itself, plays a direct role in the prediction process.
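It is easy to check numerically why rotations about χ cannot affect the prediction. χ is the torsion about the C1′–N9 (purine) or C1′–N1 (pyrimidine) bond. Rotating the base about that bond axis therefore moves the other base atoms but leaves both glycosidic-bond atoms fixed. The sketch below applies Rodrigues' rotation formula about the bond axis. The helper names and fragment coordinates are hypothetical, and the coordinates do not represent realistic purine geometry.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(p: Vec3, origin: Vec3, through: Vec3, angle_deg: float) -> Vec3:
    """Rotate p by angle_deg about the line origin -> through (Rodrigues' formula)."""
    axis = _sub(through, origin)
    length = math.sqrt(_dot(axis, axis))
    if length == 0.0:
        raise ValueError("axis endpoints coincide")
    k = tuple(c / length for c in axis)  # unit axis vector
    v = _sub(p, origin)
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kxv, kdv = _cross(k, v), _dot(k, v)
    rotated = tuple(v[i] * cos_t + kxv[i] * sin_t + k[i] * kdv * (1.0 - cos_t)
                    for i in range(3))
    return tuple(origin[i] + rotated[i] for i in range(3))


def rotate_base_about_chi(base_atoms: dict[str, Vec3], c1_prime: Vec3,
                          glycosidic_n: str, angle_deg: float) -> dict[str, Vec3]:
    """Rotate every base atom about the C1'-N glycosidic bond axis (a change in chi)."""
    n = base_atoms[glycosidic_n]
    return {name: rotate_about_axis(xyz, c1_prime, n, angle_deg)
            for name, xyz in base_atoms.items()}


# Hypothetical purine fragment placed with C1' at the origin and N9 on the x-axis.
c1_prime = (0.0, 0.0, 0.0)
base = {"N9": (1.47, 0.0, 0.0), "C8": (2.2, 1.1, 0.0), "C4": (2.2, -1.1, 0.0)}
rotated = rotate_base_about_chi(base, c1_prime, "N9", 60.0)
print(rotated["N9"])  # unchanged up to rounding: the glycosidic bond does not move
print(rotated["C8"])  # moved: the base plane has turned about the bond
```

By contrast, rotating the base within its own plane about a pivot off the glycosidic axis does move the glycosidic nitrogen. This may be why the 45° in-plane test had a small but nonzero effect.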

Given these results, this technique can be applied accurately to low- and intermediate-resolution data. Note that the quality of electron density can vary dramatically even among maps of similar resolution, owing to factors such as crystal mosaicity and the quality of experimental phases. Nevertheless, this methodology should make it possible to build backbone structure accurately into most maps at 4-Å resolution or better. Even in maps of 4- to 6-Å resolution, it should assist in constructing the backbone. As expected at such resolutions, however, accuracy will be noticeably lower.

Conclusions.

The methodology presented here provides an automated and accurate technique for building the RNA backbone and promises to change the way RNA structure is built. It greatly simplifies the building process at all resolutions and allows the backbone to be constructed accurately even at low and intermediate resolutions.

Materials and Methods

Further descriptions of the algorithms are given in SI Materials and Methods. All prediction and building procedures were implemented in Perl, and the program and source code are available at http://www.pylelab.org/software. The program takes as input a PDB file containing phosphate and base coordinates. It outputs the predicted conformers and a PDB file containing the built structure. Work on a Coot plugin implementing this methodology is currently underway. This plugin will assist the crystallographer in placing the base and phosphate atoms and will then predict and build the appropriate conformers. It will also allow the crystallographer to select alternate conformers for any built suite and will incorporate the “wannabe” rotamers described in ref. 15.
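The exact input conventions of the Perl program are not given here, but a trace of this kind reduces to a few atoms per nucleotide. As an illustration, the Python sketch below extracts the phosphorus, C1′, and glycosidic nitrogen coordinates from standard fixed-column PDB ATOM/HETATM records. It assumes that placed bases include their C1′ atoms and that residues use one-letter names (A, C, G, U). It ignores alternate locations, and the function name `read_trace_atoms` is hypothetical.

```python
Vec3 = tuple[float, float, float]

# Glycosidic nitrogen for each standard RNA nucleotide: N9 (purines), N1 (pyrimidines).
GLYCOSIDIC_N = {"A": "N9", "G": "N9", "C": "N1", "U": "N1"}


def read_trace_atoms(pdb_lines) -> dict[tuple[str, int, str], dict[str, Vec3]]:
    """Collect P, C1', and glycosidic-N coordinates per residue from PDB ATOM/HETATM records.

    Keys are (chain ID, residue number, insertion code), in file order. Assumes the
    standard fixed-column PDB layout and one-letter RNA residue names.
    """
    residues: dict[tuple[str, int, str], dict[str, Vec3]] = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        glyco_n = GLYCOSIDIC_N.get(line[17:20].strip())
        if glyco_n is None:
            continue  # not a standard RNA nucleotide (ions, waters, ligands)
        # Older files write C1* instead of C1'; normalize to the prime form.
        name = line[12:16].strip().replace("*", "'")
        if name not in ("P", "C1'", glyco_n):
            continue
        key = (line[21], int(line[22:26]), line[26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, {})[name] = xyz
    return residues
```

The function accepts any iterable of lines, so an open file handle can be passed to it directly. The resulting per-residue P, C1′, and N positions are the quantities from which the glycosidic bond locations used in the prediction step can be read off.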

Supplementary Material

Supporting Information

Acknowledgments.

We thank Leven Wadley for his assistance and advice in early development of the prediction methodology. We also thank Jane Richardson and Laura Murray for discussion and advice on the base-phosphate perpendicular distance, as well as Jane Richardson and Bohdan Schneider for suggestions on the use of the C1′ atom in place of the C4′. We thank Julian Tirado-Rives and Sara Nichols for assistance in development of the coordinate calculation procedure, as well as Paul Emsley for assistance with Coot programming, and Tom Terwilliger and Günter Wagner for helpful discussion and advice. We thank Laura Murray, Jane Richardson, Francis Reyes, and Rob Batey for critical reading and helpful suggestions on this manuscript. This work was supported by the Howard Hughes Medical Institute and National Institutes of Health (NIH) Grant GM50313 to A.M.P., and K.S.K. was supported by NIH training Grant T15 LM07056.

Footnotes

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

This article contains supporting information online at www.pnas.org/cgi/content/full/0911888107/DCSupplemental.

References

  • 1. Batey RT, Gilbert SD, Montange RK. Structure of a natural guanine-responsive riboswitch complexed with the metabolite hypoxanthine. Nature. 2004;432:411–415. doi: 10.1038/nature03037.
  • 2. Adams PL, Stahley MR, Kosek AB, Wang J, Strobel SA. Crystal structure of a self-splicing group I intron with both exons. Nature. 2004;430:45–50. doi: 10.1038/nature02642.
  • 3. Ban N, Nissen P, Hansen J, Moore PB, Steitz TA. The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science. 2000;289:905–920. doi: 10.1126/science.289.5481.905.
  • 4. Toor N, Keating KS, Taylor SD, Pyle AM. Crystal structure of a self-spliced group II intron. Science. 2008;320:77–82. doi: 10.1126/science.1153803.
  • 5. Selmer M, et al. Structure of the 70S ribosome complexed with mRNA and tRNA. Science. 2006;313:1935–1942. doi: 10.1126/science.1131127.
  • 6. Murray LJW, Arendall WB, Richardson DC, Richardson JS. RNA backbone is rotameric. Proc Natl Acad Sci USA. 2003;100:13904–13909. doi: 10.1073/pnas.1835769100.
  • 7. Steitz TA, Steitz JA. A general two-metal-ion mechanism for catalytic RNA. Proc Natl Acad Sci USA. 1993;90:6498–6502. doi: 10.1073/pnas.90.14.6498.
  • 8. Draper DE. Themes in RNA-protein recognition. J Mol Biol. 1999;293:255–270. doi: 10.1006/jmbi.1999.2991.
  • 9. Butcher SE. Structure and function of the small ribozymes. Curr Opin Struct Biol. 2001;11:315–320. doi: 10.1016/s0959-440x(00)00207-4.
  • 10. Davis IW, et al. MolProbity: All-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res. 2007;35:W375–W383. doi: 10.1093/nar/gkm216.
  • 11. Cowtan K. The Buccaneer software for automated model building. 1. Tracing protein chains. Acta Crystallogr D. 2006;62:1002–1011. doi: 10.1107/S0907444906022116.
  • 12. Terwilliger TC. Automated main-chain model building by template matching and iterative fragment extension. Acta Crystallogr D. 2002;59:38–44. doi: 10.1107/S0907444902018036.
  • 13. Langer G, Cohen SX, Lamzin VS, Perrakis A. Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7. Nat Protoc. 2008;3:1171–1179. doi: 10.1038/nprot.2008.91.
  • 14. Duarte CM, Pyle AM. Stepping through an RNA structure: A novel approach to conformational analysis. J Mol Biol. 1998;284:1465–1478. doi: 10.1006/jmbi.1998.2233.
  • 15. Richardson JS, et al. RNA backbone: Consensus all-angle conformers and modular string nomenclature (an RNA Ontology Consortium contribution). RNA. 2008;14:465–481. doi: 10.1261/rna.657708.
  • 16. Wadley LM, Keating KS, Duarte CM, Pyle AM. Evaluating and learning from RNA pseudotorsional space: Quantitative validation of a reduced representation for RNA structure. J Mol Biol. 2007;372:942–957. doi: 10.1016/j.jmb.2007.06.058.
  • 17. Wang X, et al. RNABC: Forward kinematics to reduce all-atom steric clashes in RNA backbone. J Math Biol. 2008;56:253–278. doi: 10.1007/s00285-007-0082-x.
  • 18. Malathi R, Yathindra N. A novel virtual bond scheme to probe ordered and random coil conformations of nucleic acids: Configurational statistics of polynucleotide chains. Curr Sci India. 1980;49:803–808.
  • 19. Olson WK. Configurational statistics of polynucleotide chains. An updated virtual bond model to treat effects of base stacking. Macromolecules. 1980;13:721–728.
  • 20. Lovell SC, Word JM, Richardson JS, Richardson DC. The penultimate rotamer library. Proteins. 2000;40:389–408.
  • 21. Murray LW. RNA backbone rotamers and chiropraxis. PhD Dissertation. Durham, NC: Duke University; 2007.
  • 22. Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett. 2006;27:861–874.
  • 23. Correll CC, Beneken J, Plantinga MJ, Lubbers M, Chan Y. The common and the distinctive features of the bulged-G motif based on a 1.04 Å resolution RNA structure. Nucleic Acids Res. 2003;31:6806–6818. doi: 10.1093/nar/gkg908.
  • 24. Gilbert SD, Love CE, Edwards AL, Batey RT. Mutational analysis of the purine riboswitch aptamer domain. Biochemistry. 2007;46:13297–13309. doi: 10.1021/bi700410g.
  • 25. Brünger AT, et al. Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr D. 1998;54:905–921. doi: 10.1107/s0907444998003254.
  • 26. Berman H, et al. The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys J. 1992;63:751–759. doi: 10.1016/S0006-3495(92)81649-1.
