Nucleic Acids Research. 2003 Jan 1;31(1):502–504. doi: 10.1093/nar/gkg012

RNABase: an annotated database of RNA structures

Venkatesh L Murthy 1, George D Rose 1,*
PMCID: PMC165459  PMID: 12520063

Abstract

RNABase is a unified database of all three-dimensional structures containing RNA deposited in either the Protein Data Bank (PDB) or the Nucleic Acid Database (NDB). For each structure, RNABase contains a brief summary as well as annotation of conformational parameters, identification of possible model errors, Ramachandran-style conformational maps and classification of ribonucleotides into conformers. These same analyses can also be performed on structures submitted by users. To facilitate access, structures are automatically placed into a variety of functional and structural categories, such as ribozymes and pseudoknots. RNABase can be freely accessed on the web at http://www.rnabase.org. We are committed to maintaining this database indefinitely.

INTRODUCTION

The pace of RNA structure determination has been accelerating rapidly (Fig. 1). Currently, there are over 500 publicly available structures containing one or more ribonucleotides, including 45 structures of ribozymes and ribozyme fragments as well as 85 structures of partial or complete tRNAs (alone and in complex with other molecules). Inconveniently, neither the Protein Data Bank (PDB) (1) nor the Nucleic Acid Database (NDB) (2) alone contains a complete set of these structures. Furthermore, neither database provides much structural validation or analysis of the structures.

Figure 1. The cumulative numbers of publicly available RNA-containing structures determined by X-ray crystallography (red), NMR spectroscopy (purple) and all techniques combined (blue) have increased steadily since the first RNA-containing structure was released in 1978. There has been a substantial acceleration in RNA structure determination since the mid-1990s.

To address these issues, we have constructed RNABase, a web-driven relational database. For each structure, RNABase contains a brief summary, annotation of conformational parameters, identification of possible model errors, Ramachandran-style conformational maps and classification of ribonucleotides into conformers.

RNABase ENTRIES AND ANALYSES

Lists of structures in RNABase can be sorted by experimental technique, by structural and functional category (e.g. ribozymes, aptamers, tetraloops), or by conformational outlier rate. Classification by technique and into structural and functional categories is based on keywords in author-supplied data such as the title and keyword records. In addition to these general lists, RNABase provides both simple and advanced search tools.
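Keyword-based categorization of this kind can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the keyword table `CATEGORY_KEYWORDS` and the function `classify` are hypothetical names, and the actual keyword lists used by RNABase are not given in this paper.

```python
# Hypothetical category -> keyword table; the real RNABase lists are not published here.
CATEGORY_KEYWORDS = {
    "ribozyme": ("ribozyme", "hammerhead"),
    "tRNA": ("trna", "transfer rna"),
    "pseudoknot": ("pseudoknot",),
    "aptamer": ("aptamer",),
    "tetraloop": ("tetraloop",),
}


def classify(title, keywords):
    """Return the sorted categories whose keywords occur in the header text."""
    # Case-insensitive substring search over the title and keyword records.
    text = " ".join([title, *keywords]).lower()
    return sorted(
        category
        for category, needles in CATEGORY_KEYWORDS.items()
        if any(needle in text for needle in needles)
    )
```

A structure can fall into several categories at once, which is why the function returns a list rather than a single label.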

For each structure, the RNABase summary page contains information extracted from the structure's header records such as title, authors, experimental technique, keywords, etc. This page also provides links to the corresponding entries in the PDB (1), NDB (2), MMDB (3), related Medline records (4), as well as the Image Library of Biological Molecules (5).

In addition to powerful search and browsing facilities, structure analysis and validation are critical. With the development of complete Ramachandran-style maps for RNA (6), it is increasingly possible to classify RNA conformers and to identify likely conformational errors. For each RNA residue in every structure, a complete set of conformational parameters is calculated: the backbone dihedral angles (α, β, γ, δ, ε and ζ), the sidechain dihedral angle (χ), the ribose dihedral angles (θ0, θ1, θ2, θ3 and θ4), and the ribose puckering phase and amplitude. Using these parameters, complete sets of conformational plots are generated for every structure in RNABase. Furthermore, residues whose dihedral or puckering parameters fall outside the allowed regions are specifically flagged as probable model errors.
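As an illustration of how such parameters can be computed, the following Python sketch calculates a signed dihedral angle from four atomic positions and a pseudorotation phase and amplitude from five ring torsions. It is not the RNABase code. It assumes the widely used Altona and Sundaralingam pseudorotation formulation and assumes that the ring torsions θ0 to θ4 are supplied in the order of the conventional ν0 to ν4; neither assumption is confirmed by this paper. The function names `dihedral` and `pucker` are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def pucker(nu):
    """Pseudorotation phase (0..360) and amplitude, both in degrees.

    nu holds the five ring torsions nu[0]..nu[4] in degrees
    (assumed ordering: Altona-Sundaralingam nu0..nu4).
    """
    scale = 2.0 * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))
    y = (nu[4] + nu[1]) - (nu[3] + nu[0])
    x = nu[2] * scale
    # atan2 selects the correct quadrant when nu[2] is negative.
    phase = math.degrees(math.atan2(y, x)) % 360.0
    # For an ideal pseudorotating ring this equals nu[2] / cos(phase).
    amplitude = math.hypot(x, y) / scale
    return phase, amplitude
```

Each backbone or sidechain torsion would be obtained by passing the coordinates of its four defining atoms to `dihedral`, and the five ring torsions computed the same way would then be passed to `pucker`.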

These conformational parameters are also used to generate discrete conformational codes (www.rnabase.org/confsum). Each conformational code describes a subset of the multi-dimensional conformational space available to nucleotides. Correspondingly, residues with the same conformational code have similar conformations. Using these codes, frequently or rarely occurring conformations can easily be identified.
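The idea of a discrete conformational code can be illustrated with a deliberately simplified Python sketch. The binning used here (three 120° classes per angle, labelled `p`, `t` and `m`) is a hypothetical stand-in chosen only for illustration; the codes actually used by RNABase are described at www.rnabase.org/confsum and are not reproduced here. The names `angle_class`, `conformer_code` and `code_frequencies` are likewise assumptions.

```python
from collections import Counter


def angle_class(angle_deg):
    """Map an angle to a coarse class (hypothetical three-bin scheme)."""
    a = angle_deg % 360.0
    if a < 120.0:
        return "p"  # roughly gauche+
    if a < 240.0:
        return "t"  # roughly trans
    return "m"      # roughly gauche-


def conformer_code(angles):
    """Concatenate the per-angle classes into one code string for a residue."""
    return "".join(angle_class(a) for a in angles)


def code_frequencies(residues):
    """Count how often each code occurs; residues is an iterable of angle tuples."""
    return Counter(conformer_code(angles) for angles in residues)
```

Because residues that share a code fall in the same region of conformational space, sorting the resulting counts separates common conformers from rare ones.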

RNABase META-ANALYSIS

RNABase also contains a number of ‘meta-analyses’ of cumulative properties across the entire database (http://www.rnabase.org/metaanalysis/). Firstly, the total cumulative number of available RNA-containing structures is plotted (Fig. 1), showing that the rate of structure determination has been accelerating. It is interesting to note that the first several dozen structures were all solved by X-ray crystallography, but by the late 1990s the number of structures determined by nuclear magnetic resonance spectroscopy nearly equaled the number determined crystallographically. However, the last two years have seen the balance again shift toward the use of crystallography for RNA structure determination.

We note that the passage of time has not resulted in any noticeable improvement in the rate of conformational outliers on Ramachandran-type plots (Fig. 2). Neither structures determined by NMR nor those determined by crystallography have shown any consistent trend over the last two decades. Lastly, for structures determined by X-ray diffraction, the number of conformational outliers varies with resolution, as expected (Fig. 3). Structures determined by NMR have outlier rates comparable to structures determined to ∼4 Å resolution by X-ray diffraction.

Figure 2. The average number of Ramachandran conformational map outliers per residue has shown no consistent trends over time for all structures (blue) or for the subsets determined by X-ray crystallography (red) and NMR spectroscopy (purple).

Figure 3. The average number of Ramachandran conformational map outliers per residue in structures solved by X-ray crystallography decreases with improving resolution (red). The average rate of outliers per residue for NMR structures (purple) is slightly worse than the overall average and is comparable to the rate for structures determined to ∼4 Å resolution. The data for X-ray structures are calculated as averages over 0.5 Å windows.
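Averaging over resolution windows, as described for the X-ray data in Figure 3, could be sketched as follows. The caption does not say whether the 0.5 Å windows overlap; this sketch assumes fixed, non-overlapping windows, and the function name `windowed_means` is hypothetical.

```python
from collections import defaultdict


def windowed_means(points, width=0.5):
    """Average outlier rates within fixed-width resolution windows.

    points is an iterable of (resolution_in_angstrom, outliers_per_residue).
    Returns a dict mapping each window's lower bound to the mean rate.
    """
    sums = defaultdict(lambda: [0.0, 0])  # window index -> [total, count]
    for resolution, rate in points:
        index = int(resolution // width)
        sums[index][0] += rate
        sums[index][1] += 1
    return {
        index * width: total / count
        for index, (total, count) in sorted(sums.items())
    }
```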

RNABase ARCHITECTURE

RNABase is built on a free-software platform consisting of the Red Hat Linux operating system, the Apache web server and the PostgreSQL relational database system. In addition, a number of custom scripts have been developed in the Python scripting language to parse and analyze the structures. Lastly, PHP scripts are used to dynamically generate the interface seen at the RNABase site. The entire system was designed to be largely self-maintaining and to require minimal ongoing work from its administrators.

The data in RNABase are assembled by a Python script that parses all entries from the PDB and NDB. Because the NDB and PDB use the same nomenclature for both ribonucleotides and deoxyribonucleotides, residue names alone cannot distinguish RNA from DNA. RNABase therefore classifies a structure as containing RNA if it has one or more A, C, T, G or U residues with an O2′ atom (O2* in PDB notation).
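The O2′ test described above can be sketched in Python as a scan over the coordinate records of a PDB-format file. This is an illustration rather than the RNABase script itself: the function name `contains_rna` is hypothetical, the column positions follow the fixed-width PDB coordinate record layout, and accepting the `O2'` spelling alongside `O2*` is an added assumption.

```python
RNA_RESIDUE_NAMES = {"A", "C", "G", "T", "U"}
O2_PRIME_NAMES = {"O2*", "O2'"}  # "O2'" spelling accepted as an assumption


def contains_rna(pdb_lines):
    """Return True if any A, C, G, T or U residue carries an O2' atom."""
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()     # columns 13-16: atom name
        residue_name = line[17:20].strip()  # columns 18-20: residue name
        if residue_name in RNA_RESIDUE_NAMES and atom_name in O2_PRIME_NAMES:
            return True
    return False
```

For a file on disk, the function can be called as `contains_rna(open(path))`, since it only needs an iterable of lines and stops at the first matching atom.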

DISCUSSION

RNABase is an integrated, annotated database of three-dimensional RNA structures. It has been designed to facilitate access to structures of RNA molecules by structural, functional or experimental features. Furthermore, by providing interactive structure analysis, RNABase may help the quality of RNA structures improve more consistently in the future. We anticipate that RNABase will continue to expand and evolve. These developments will be reported on the RNABase News page (http://www.rnabase.org/news/).


ACKNOWLEDGEMENTS

Supported by grants from the NIH (GM29458) and the Mathers Foundation.

REFERENCES

1. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
2. Berman,H.M., Olson,W.K., Beveridge,D.L., Westbrook,J., Gelbin,A., Demeny,T., Shieh,S.H., Srinivasan,A.R. and Schneider,B. (1992) The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J., 63, 751–759.
3. Wang,Y., Addess,K.J., Geer,L., Madej,T., Marchler-Bauer,A., Zimmerman,D. and Bryant,S.H. (2000) MMDB: 3D structure data in Entrez. Nucleic Acids Res., 28, 243–245.
4. Wheeler,D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2001) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11–16.
5. Reichert,J., Jabs,A., Slickers,P. and Sühnel,J. (2000) The IMB Jena Image Library of Biological Macromolecules. Nucleic Acids Res., 28, 246–249.
6. Murthy,V.L., Srinivasan,R., Draper,D.E. and Rose,G.D. (1999) A complete conformational map for RNA. J. Mol. Biol., 291, 313–327.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
