Author manuscript; available in PMC: 2016 May 16.
Published in final edited form as: J Phys Chem B. 2010 Nov 4;114(48):15723–15741. doi: 10.1021/jp104361m

Quantum chemical studies of nucleic acids: Can we construct a bridge to the RNA structural biology and bioinformatics communities?

Jiří Šponer 1,*, Judit E Šponer 1, Anton I Petrov 2, Neocles B Leontis 3,*
PMCID: PMC4868365  NIHMSID: NIHMS250563  PMID: 21049899

Abstract

In this feature article, we provide a side-by-side introduction for two research fields: quantum chemical calculations of molecular interaction in nucleic acids and RNA structural bioinformatics. Our main aim is to demonstrate that these research areas, while largely separated in contemporary literature, have substantial potential to complement each other that could significantly contribute to our understanding of the exciting world of nucleic acids. We identify research questions amenable to the combined application of modern ab initio methods and bioinformatics analysis of experimental structures, while also assessing the limitations of these approaches. The ultimate aim is to attain valuable physico-chemical insights regarding the nature of the fundamental molecular interactions and how they shape RNA structures, dynamics, function and evolution.

Keywords: RNA, quantum chemistry, bioinformatics, molecular interactions, evolution


Introduction

Nucleic acids (DNA and RNA) are perhaps the most important biomolecules. In modern cellular organisms, genetic information is encoded in the sequences of exceedingly long molecules of DNA. RNA molecules are produced by copying selected segments of genomic DNA (“genes”) in a process called “transcription.” RNA, long considered to possess few specialized functions beyond transmission of genetic information from the nucleus to the cytoplasm, has emerged as a central player in all aspects of gene expression and its regulation. The discovery of RNA enzymes (ribozymes) less than 30 years ago set the stage for one of the most exciting revolutions in modern biology.1 Since the discovery of ribozymes, one major RNA breakthrough has followed another, revealing the biochemical versatility of RNA that enables it to play major roles in so many biological functions. Besides their familiar roles as messenger RNAs (mRNA) and transfer RNAs (tRNA), RNA molecules are integral components of the cellular machineries for mRNA splicing (the spliceosome),2 for mRNA-directed protein synthesis (the ribosome)3 and for sequence-directed protein targeting and transport to different membranes or compartments of the cell (the signal-recognition particle, “SRP”).4 The ribosome, one of the most complex biomolecular machines ever evolved, is essentially a ribozyme, although its function has been refined during evolution by recruitment of several dozen protein partners. RNA forms the catalytic core and the primary functional centers of the ribosome, which successively bind the correct amino-acid-bearing tRNAs to synthesize proteins by linking together the amino acids in the order specified by an mRNA template. The catalytic core of the spliceosome is also composed of RNA, and likely evolved from self-splicing, auto-catalytic RNA introns.
Splicing is a fundamental modification of RNA after transcription, in which large RNA segments, called introns,5 are removed and the remaining parts, called exons, are covalently joined (“spliced”) to produce the mature mRNA. Early in the evolution of life, splicing was probably largely autocatalytic.

The discovery of catalytic RNA provided a new paradigm for theories of the origin of life by resolving the “chicken-or-the-egg” conundrum of which came first, DNA or protein. In modern cells, DNA codes for proteins but proteins are needed to copy DNA in a process called “replication” that occurs when cells divide. Because RNA appears to be chemically capable of serving simultaneously as an information carrier as well as a self-replicator, it is likely that primitive cellular life was RNA-based.6 In modern cells, RNA has ceded information storage to DNA and most catalytic functions to proteins. However, RNA has retained some of the decisive roles it probably had in primitive, hypothetical, RNA-based life forms, while acquiring new ones in the course of evolution. Although less than 2% of the human genomic DNA directly encodes protein sequences, over 80% of the genome is actually transcribed into RNA at some point in the life cycle. In other words, the vast majority of the genome is transcribed into non-protein coding RNAs (ncRNA), including well-known functional RNAs such as ribosomal and transfer RNAs. However, the functions of most ncRNAs are unknown.7 For example, the recent discovery of small RNA molecules (micro-RNA) that regulate gene expression at multiple levels was a complete surprise.8 Thus the structural complexity and functional versatility of RNA molecules is much greater than that of DNA.

The last thirty years have witnessed parallel breakthrough developments in the application of physics to chemistry and biochemistry. It has long been anticipated that quantum mechanics (QM) would provide the ultimate description and understanding of molecular systems by describing electronic structures in terms of fundamental physical principles. However, meaningful practical applications of quantum mechanics (“quantum chemistry”) had to await the development of sufficiently powerful computer hardware and software, something achieved only in the last two decades. Subsequently, QM methods have emerged as powerful tools in many areas of modern chemical research. It is now, in principle, possible to carry out QM calculations on modest laptop computers that were unthinkable 20 years ago or that required the most powerful supercomputers 15 years ago. These developments have now made it possible to address challenging chemical problems by close collaboration between theoretical, computational and experimental approaches, as exemplified, e.g., by the studies of Hobza and Schlag on the benzene dimer,9 the basic results of which remain unchallenged to this day. The QM field continues on a track of rapid and sustained methodological development, illustrated by recent innovations in density functional theory.10 By contrast, the biomolecular modeling field, based on the description of molecular systems using classic potential functions, has not enjoyed this level of sustained progress. Despite intense efforts, for example, to develop polarization force fields,11 variants of second-generation pair-additive force fields, first developed in the 1990’s,12 remain dominant in modeling studies of nucleic acids. In spite of the enormous complexity of macromolecular biological systems, which challenges the application of theoretical approaches, QM has provided interesting and relevant results that could not be obtained by any other techniques. 
From early on, nucleic acids, with their well-defined molecular interactions, especially base stacking and base pairing, have been favorable targets for QM computations.13 For example, QM calculations have clarified the physico-chemical origins of base stacking, as will be detailed below.14 QM studies revealed new phenomena, such as the intrinsic propensity of amino groups of nucleic acid bases to undergo partial sp3 pyramidalization.15 Modern QM calculations, carried out with expansion to complete basis sets of atomic orbitals and inclusion of higher-order electron correlation effects provide accurate energies of molecular interactions.16 QM calculations have addressed additional aspects of nucleic acid structure and function that are beyond the scope of this review, including metal-nucleic acid interactions, proton transfer processes, electronically excited states, effects of radiation and reactive free radicals, and chemical aspects of theories of the origin of life. QM calculations play decisive roles in parametrization of biomolecular force fields.17

Nevertheless, one must concede that the direct impact of QM studies on structural biology, biochemistry and bioinformatics has remained limited. For example, basic research on the effect of base stacking on the local conformational variability of B-DNA and the classification of RNA base pairing were accomplished without considering QM data.18,19 The literatures of nucleic acid quantum chemistry and nucleic acid structural biology and bioinformatics remain largely segregated, reflecting the lack of significant interaction between the respective communities. What are the reasons for this state of affairs? We question the facile suggestion that QM research is less relevant to structural biology than it is to other fields of molecular sciences. As QM approaches are based on fundamental physical principles, they are the most sophisticated tools available to directly study the specific local interactions that occur widely in macromolecular structures. In the nucleic acid context, the local interactions occur between bases (base-stacking and base-pairing), between bases and backbone moieties (e.g. base-phosphate), or involve interactions with solvent molecules, including ions. The usefulness of QM to study similar non-covalent interactions has been widely accepted in many other areas of chemistry.20 Thus, the striking lack of interest in high-quality QM results relevant to structural biology and bioinformatics is puzzling. Obscure, outdated and even incorrect models too often continue to circulate. A common biochemistry textbook21 continues to publish outdated stacking energies, obtained in the 1970’s with the semiempirical approaches affordable at the time, that are wildly in error by modern computational standards.

What can be done to increase communication between practitioners of quantum chemistry and biochemistry, structural biology and bioinformatics? In this feature article we compare the methodological approaches of QM and structural bioinformatics as they pertain to molecular interactions in RNA and provide suggestions as to how these two fields could profit from greater interaction and cooperation. The lack of communication between the QM and structural biology communities may have its origin in one salient feature of QM calculations. To be tractable, QM calculations must be carried out on sufficiently small model systems, of the order of dozens to no more than about 100+ atoms. These model systems are studied in complete isolation, that is, largely in the absence of solvent. While QM calculations provide very accurate and physically complete descriptions of the molecular interactions in these model systems, and exactly the same interactions are indeed present in nucleic acids, their influence is always realized within a context of a multitude of other effects that produces a delicate balance between all molecular interactions. This balance is so exceptionally complex that it is very difficult to correlate the calculated data derived for isolated model systems with relevant experimentally measurable quantities of interacting systems. For example, with proper attention to the choice of geometry, QM calculations can provide accurate descriptions of base stacking energies. 22 These calculations, however, do not correlate well with experimentally derived thermodynamic parameters for nucleic acids, obtained in water solution at moderate ionic strength (e.g. 1.0 M NaCl) and physiological temperatures. 23 This does not imply that the QM stacking data are irrelevant: QM calculations do provide valid and correct descriptions of one of the dominant forces in nucleic acids, information that cannot be collected by any other technique. 
At the same time, the experimental thermodynamic data, because of the complexity of the interactions, do not, in fact, provide unambiguous measures of the strength of the direct base – base interactions. The experimental measurements reflect the overall free energies associated with a given nucleic acid structure and sequence and are not dissectible into the contributions of individual interactions that would be equivalent to the QM data. Thus, QM calculations and thermodynamics experiments reflect different aspects of the base stacking phenomenon, and while both descriptions are valid, neither is complete by itself. To achieve the best possible insight into base stacking, we need to integrate both sources of information.

In summary, we need to keep in mind that biomolecular systems are exceptionally complex, and only a fraction of problems in biology can be addressed in a comprehensive manner by computations. It is crucial not to overrate the capabilities of QM tools in biology. Nevertheless, considering the importance and richness of biology, it pays to apply computations to accessible problems, where they can provide valuable, and often unexpected, insights. Over-interpretation of computational results sometimes occurs in another area of computational chemistry, nucleic acid molecular modeling based on molecular mechanics (MM) force fields. Some studies push molecular dynamics (MD) simulations beyond the limits of the force fields, a practice that reflects a lack of attention to the limitations imposed by the underlying approximations of current techniques.24 This is usually not so great a problem in the QM literature, as most QM practitioners are attentive to the limitations of accuracy. The greater difficulty of extrapolating results obtained from QM studies of small model systems to intact biomolecular systems, however, can easily lead to oversimplification, even when the model studies per se are correctly executed.

The parallel application of the QM and MM methodologies to the same nucleic acid systems has great potential, but is rarely attempted. The interpretation of many QM studies would profit from complementary, explicit solvent classical MD simulations. Likewise, simulation studies can gain credibility when the limitations of force field approximations are complemented by insights from QM calculations. Wisely combining the two computational techniques gives us more space for maneuvering when facing the daunting challenges posed by biochemical and biological systems. The direct integration of QM and MM descriptions has produced genuine hybrid QM/MM methods, suitable for studies of enzymatic reactions.25

Another source of misunderstanding between the QM and structural biology communities concerns problems of scientific communication, which in principle can be solved. QM studies are written in a style and terminology that limits accessibility to non-expert readers. Consequently, potentially valuable results and insights do not reach the audience which would most benefit. At the same time, most structural biologists and bioinformaticists largely ignore the QM literature, assuming that it does not contain relevant information. This is partially understandable, as only few non-specialists have time to follow the computational literature in sufficient depth, given the huge volume of new biological literature they need to follow in their own fields. Also, it is difficult for non-experts to sift out the most relevant studies from the large number of papers in the computational literature. Nonetheless, peremptory dismissals of theory tip too far in the other direction, and the result is that a significant number of carefully done theoretical and computational studies have been ignored, which could aid in the interpretation and understanding of much experimental data.

Purposes of QM calculations

QM calculations quantify crucial properties of molecular systems. For some of these properties, QM calculations represent the only available tool of contemporary science. QM methods can calculate some properties with high accuracy, while providing qualitative insights for others. The most important of these is the intrinsic electronic energy, which is a function of the molecular geometry, defined as the exact Cartesian coordinates of all atoms (i.e. nuclei) of the molecule. By calculating the energy on a grid of varying geometric coordinates, potential energy surfaces can be constructed point by point (Figure 1).12, 14b, 16 The calculated intrinsic energy for a given, fixed geometry corresponds to a hypothetical measurement of the energy at zero Kelvin temperature. This differs from the averaged energies obtainable by any real experiment, which is necessarily performed at non-zero temperature on an ensemble of populated structures. Thus, QM calculations have one considerable advantage over experiment: they can investigate the properties of any geometry of choice, including geometries that would not be populated in experiments on model complexes, but which occur when the model system is embedded in real nucleic acid structures. In principle, even the nature of fleeting transition states occurring during conformational changes or chemical reactions can be investigated.

Figure 1.


Modern QM theory applied to base stacking.16f The dependence of base stacking energy on twist angle between the nucleobases in A/A, U/U, C/C and G/G base-base stacks, calculated for undisplaced face-to-back nucleobase dimers. The geometries of A/A stacks are shown to illustrate the twist angle. The solid lines represent force field calculations using the Cornell et al. (AMBER)12 MM force field with point charges derived to fit the electrostatic potential of the monomers at the MP2 level of theory. MP2 charges are more appropriate for direct comparison with the QM data than the more polar HF charge distributions recommended for condensed phase simulations.14b The QM data (the specific symbols) are calculated with the MP2 method expanded to the complete basis set (CBS) of atomic orbitals and corrected for higher-level electron correlation effects. Although the agreement between the force field and QM is not perfect, there are no major deviations. The Figure illustrates one of the key applications of QM methods, point by point comparison of the rigorous data with approximate descriptions.14b In this particular case, the lack of substantial deviations between QM and force field over the whole potential energy surface indirectly clarifies the nature of base stacking.16

Typically, QM is applied to calculate interaction energies of model molecular complexes representing interactions that occur in intact macromolecular nucleic acids. Examples include base pairs, base stacks, base-backbone, and base-solvent interactions. The interaction energy is defined as the difference between the energy of the interaction complex in a given geometry, specified in Cartesian coordinates, and the energies of the corresponding monomers, when they are separated to infinity so that they do not interact. QM energies calculated on completely isolated systems (in vacuo, i.e. in the gas phase), are called “intrinsic energies”. For gas phase calculations, modern QM methods can achieve high chemical accuracy, that is, deviations of ~0.5-1.0 kcal/mol from true values for interactions between two bases (Figure 2).16, 26 This is a qualified estimate with respect to hypothetical (i.e., unknown) values that would be obtained by fully converged calculations. For comparison, the gas phase interaction energies of AU and GC Watson-Crick base pairs are ~ −15 and −30 kcal/mol, respectively, while optimal configurations of base-on-base stacks possess interaction energies ~ −10 kcal/mol.16 The capability of QM calculations to include solvent effects on conformational preferences and interaction energies is limited by the lack of accurate methods to model the solution phase.27 Nonetheless, continuum solvent techniques have recently become increasingly popular in the quantum chemistry of nucleic acids. What these methods have in common is that they treat the solvent as a dielectric continuum, which creates an interaction potential around the solute molecule. There are many variants of this technique, which differ in the formalism used to express this interaction potential.
Among them, the COSMO28 and MST29 models are rather suitable to characterize the strength of intermolecular interactions in nucleic acids, albeit the results should not be over-interpreted (see also below).16e,30 In addition, recently several new continuum solvent techniques have become available, like the IEF-MST31 and SMx32 methods, which, based on the results of blind tests on nucleobase derivatives, seem to provide very promising computational platforms for future studies. A similar performance can be expected also from the COSMO-RS method,33 which combines the original COSMO formalism28 with a statistical treatment of surface interactions.
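
The definition of the interaction energy given above (energy of the complex minus the energies of the non-interacting monomers) can be illustrated with a minimal sketch. The function name and the numerical energies below are hypothetical and serve only to show the arithmetic; the conversion factor from hartree to kcal/mol is the standard one. The sketch does not include any correction for basis set artifacts.

```python
# Minimal sketch: interaction energy from total electronic energies.
# All energies below are hypothetical illustration values, not computed data.

HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard unit conversion factor


def interaction_energy(e_complex, e_monomers):
    """Return the interaction energy in kcal/mol.

    e_complex  : total energy of the complex at a fixed geometry (hartree)
    e_monomers : total energies of the isolated monomers (hartree)
    """
    delta_hartree = e_complex - sum(e_monomers)
    return delta_hartree * HARTREE_TO_KCAL_PER_MOL


if __name__ == "__main__":
    # Hypothetical total energies of a base pair and its two bases.
    e_pair = -861.5000
    e_base_a = -467.2000
    e_base_b = -394.2761
    print(round(interaction_energy(e_pair, [e_base_a, e_base_b]), 1))
```

A negative value indicates a stabilizing interaction, as in the base pairing and stacking energies quoted in the text.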

Figure 2.


Convergence of stacking energies for antiparallel undisplaced face to back arrangements of A/A, U/U, C/C and G/G stacks. The data show MP2/6-31G*(0.25) method (reference method in 1990’s), MP2/aug-cc-pVDZ (ADZ) calculations, MP2/CBS calculations using aug-cc-pVDZ → aug-cc-pVTZ (D→T) and aug-cc-pVTZ → aug-cc-pVQZ (T→Q) extrapolations and the final MP2/CBS T→Q calculations corrected for the CCSD(T) contribution with small basis set. AXZ(X=D,T,Q) = aug-cc-pVXZ.16
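
The extrapolation and correction steps named in the caption can be sketched as follows. The caption does not state which extrapolation formula was used; the sketch assumes the common two-point scheme in which the correlation energy converges as X^-3 with the cardinal number X of the aug-cc-pVXZ basis set (X = 2, 3, 4 for D, T, Q). The function names and the way the terms are combined are illustrative assumptions, not the authors' protocol.

```python
# Minimal sketch of a two-point CBS extrapolation plus a CCSD(T) correction.
# Assumption: correlation energy behaves as E(X) = E_CBS + A * X**-3.


def cbs_two_point(e_small, e_large, x_small, x_large):
    """Extrapolate a correlation energy from two basis set cardinal numbers."""
    w_small = x_small ** 3
    w_large = x_large ** 3
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)


def composite_energy(e_hf_large, e_mp2_corr_cbs, e_ccsdt_small, e_mp2_small):
    """Combine HF, extrapolated MP2 correlation and a small-basis correction.

    The higher-order correction is the CCSD(T) minus MP2 difference,
    both evaluated in the same small basis set.
    """
    delta_ccsdt = e_ccsdt_small - e_mp2_small
    return e_hf_large + e_mp2_corr_cbs + delta_ccsdt


if __name__ == "__main__":
    # Hypothetical MP2 correlation contributions for X = 3 (T) and X = 4 (Q).
    corr_cbs = cbs_two_point(-0.0150, -0.0156, 3, 4)
    print(composite_energy(0.0040, corr_cbs, -0.0120, -0.0128))
```

The same functions apply whether the inputs are total energies or interaction energies, since every step is linear in the energies.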

Although QM can in principle provide very accurate values of the intrinsic energies of interacting systems, not all results obtained by calculation are biologically relevant. The quality of calculations can suffer either because of inappropriate choice of QM method or unsuitable geometry. With presently available QM methods, the latter problem is more significant. Use of X-ray structures (which are inevitably averaged and influenced by data/refinement errors) can lead to major and uncontrollable defects in accurate interaction energy calculations, for a variety of reasons.16f, 22, 26

While QM calculations of base stacking, base pairing and related interactions can be carried out routinely, it is more difficult to perform realistic calculations to obtain the energies of models of the flexible sugar-phosphate backbone, which can assume a large range of conformations.34 An important source of error in calculations of the backbone is the artifact known as “basis set superposition error” (BSSE), which is a spurious unphysical stabilization of molecular contacts in variational QM computations using finite basis sets of atomic orbitals.14c, 26 BSSE can be eliminated in a straightforward fashion from calculations of inter-molecular complexes but not from those of intra-molecular interactions. Large problems also arise when trying to carry out computations relevant to macromolecular nucleic acids, due to the uncompensated charges of the phosphates. Thus, it is best to avoid including more than one phosphate group in model systems. Optimizations of flexible backbone tend to produce geometries that do not occur in polymeric nucleic acids and instead form intra-molecular contacts that prevent biochemically relevant energy analysis. In recent studies investigating sugar-phosphate-sugar DNA model systems, we found it necessary to freeze all dihedral angles, so as to keep the system under control and match experimental or target values.35 Our attempts to adequately include stacked bases into the backbone calculations have not succeeded (unpublished data). We find that in the absence of constraints, the system tends to deviate from biochemically relevant geometries. QM calculations of the nucleic acid backbone are still rather rare.36 Further information about various aspects of QM calculations can be found in the literature.26
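
The text does not name the procedure used to remove BSSE from inter-molecular complexes. A widely used choice is the counterpoise scheme, in which each monomer is recomputed in the full basis set of the complex; the sketch below assumes that scheme. The function name, argument names and the numerical values are hypothetical.

```python
# Minimal sketch of a counterpoise-style BSSE correction for a dimer A...B.
# Assumption: all monomer energies refer to the geometry the monomer
# adopts inside the complex.


def counterpoise(e_ab, e_a_dimer_basis, e_b_dimer_basis,
                 e_a_own_basis, e_b_own_basis):
    """Return (corrected interaction energy, BSSE estimate).

    e_ab             : energy of the complex in its full basis set
    e_*_dimer_basis  : monomer energy computed in the basis set of the complex
    e_*_own_basis    : monomer energy computed in its own basis set only
    """
    uncorrected = e_ab - e_a_own_basis - e_b_own_basis
    corrected = e_ab - e_a_dimer_basis - e_b_dimer_basis
    # The difference is the spurious stabilization removed by the correction.
    bsse = corrected - uncorrected
    return corrected, bsse


if __name__ == "__main__":
    # Hypothetical energies in hartree.
    corrected, bsse = counterpoise(-10.020, -5.003, -5.002, -5.000, -5.000)
    print(corrected, bsse)
```

No analogous partitioning into monomers exists for contacts within a single molecule, which is why the correction does not carry over to intra-molecular interactions such as those in backbone models.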

Structure/energy QM calculations can be supplemented by electron density and energy decomposition analyses. There are methods to analyze the electronic density, which can be directly derived from the wavefunction. These methods divide the electronic density into components, which can be assigned to classical chemical bonds. Among them, Natural Bond Orbital (NBO) analysis37 and Bader’s Atoms in Molecules (AIM) model38 are perhaps the most accurate and widespread. For example, a combination of these two techniques can be successfully applied to evaluate the contribution of individual H-bonds in complex interaction networks. Such calculations rule out the presence of a weak C-H…O H-bond in canonical AU and AT base pairs.39 A common problem of these methods is that knowledge of the fine aspects of the electronic structure cannot easily be translated into direct information about energetics. This limits the practical applicability of such analyses to structural biology problems or force field derivation.

Another extension of the basic QM description is energy decomposition. Various energy decomposition schemes have been elaborated to disclose the physico-chemical nature of the stabilizing forces acting in intermolecular complexes. Some of them, e.g. analysis of the frontier molecular orbitals, have been applied to analyze base pairing in DNA.40 The SAPT (symmetry-adapted perturbation theory) method has been employed to evaluate the balance of stabilizing forces in RNA base pairs, tertiary interactions41 and base stacking.42 Decomposition was utilized in the parameterization of the specialized polarizable SIBFA force field,43 which is useful for model calculations of interactions between metal cations and nucleic acid components.44 The usual limitation of the decompositions is that they are not fully unambiguous. They also decompose the interaction energy into a set of large, exponentially growing (in absolute value) terms. This makes the decompositions very sensitive to small variations of interatomic distances and impractical for force field derivation and biochemical/bioinformatics analyses. Note that while energy calculations correspond to observable (i.e. “real”, physically existing) quantities, electron topology analyses and energy decompositions require arbitrary decisions. The nature of an H-bond as derived by decomposition in the gas phase 0 K minimum geometry is not fully representative of a biomolecular H-bond that extensively fluctuates and competes with other interactions. Among decomposition schemes, SAPT and its faster DFT-SAPT variant are considered the most physically based.16f, 45

In summary, the goal of QM computations is to provide quantitative understanding of the energetics and physico-chemical nature of the interactions that structure RNA molecules. This information, when wisely used, can improve our ability to predict RNA structure from sequence and gain insight into its function. Thus, computations represent a potentially useful complement to RNA structural bioinformatics, where such physico-chemical insights are lacking.

What is the scope of RNA structural bioinformatics?

A major goal of RNA bioinformatics is to identify all genes of ncRNA (see above) in genomes. Entire sequenced genomes are accumulating rapidly in sequence databases. Transcriptome projects have demonstrated that much of the genome is transcribed (i.e. copied into RNA) and that most of the RNA produced is ncRNA.7a, b Evidence is rapidly accumulating that much of this RNA production plays critical roles in gene regulation, development, adaptation to environmental changes, and evolutionary plasticity.7c, 46 Still, the structures and functions of most ncRNAs remain unknown. Thus, another task is to predict the secondary (2D) and tertiary (3D) structures of ncRNAs identified in the genomic sequences or discovered by transcriptome projects. In addition, one seeks to identify possible protein and RNA interaction partners for ncRNAs. Finally, one would like to predict the possible functions of ncRNAs or to better understand their mechanisms of action, including their dynamics.

Hierarchical structure of RNA

Like DNA, RNA is an unbranched, linear polymer composed of four nucleotide units, A, C, G, and U. In RNA, the base uracil (U) replaces thymine (T), found in DNA. Each nucleotide consists of a planar aromatic base attached to a five-member sugar moiety (ribose in RNA or 2’-deoxyribose in DNA) and a phosphate group. Nucleic acid chains (RNA or DNA) result from phospho-diester linkages between successive sugar residues, with phosphate groups linking the 3’-carbon of each sugar to the 5’-carbon of the next sugar, leaving a free (unlinked) 5’- position at the beginning of each chain (the “5’-end”) and a free 3’- position at the other end (the “3’-end”). Thus, nucleic acid chains are asymmetric; i.e., 5’-ACGU-3’ is a different molecule from 3’-ACGU-5’. The 5’-end is considered the beginning because that is where chain synthesis begins in living organisms.

The ribose 2’-OH group (absent in DNA) induces profound differences between DNA and RNA. It makes RNA chemically less stable than DNA (by assisting in auto-chain cleavage), so DNA is better suited for stably coding large genomes. Because the 2’-OH group is a versatile H-bond donor and acceptor, its presence enhances the ability of RNA to create complex architectures not available to DNA.19a, 47 As detailed below, the 2’-OH makes possible a whole range of non-Watson-Crick base pairs not found in DNA and facilitates compact packing of RNA helices. Evolution has exploited the versatile self-interaction properties of RNA to generate an incredible diversity of RNA structures capable of a large range of specific RNA-RNA, RNA-protein, RNA-DNA and RNA-small molecule or ion interactions of great biological importance. The chemical difference between U and T is rather subtle.48 U is essentially a T lacking the methyl group at position C5 (Carbon-5) of the base. The methyl group contributes to more efficient base stacking and thus subtly improves helix stability.22, 23 For a recent gas phase analysis of the effect of the methyl group on pairing and stacking, see also ref. 49. The most important role of the 5-methyl group of T is likely related to DNA repair. Cytosines occasionally deaminate to uracils. The presence of a U (T lacking the 5-methyl group) in DNA marks sites at which a C→U conversion has occurred, requiring repair.21, 50

DNA largely occurs as a dimeric, double helical complex comprising two long complementary strands, which associate anti-parallel to each other by forming exclusively AT and GC Watson-Crick (WC) base pairs, i.e., canonical base pairs. The base pairs stack to form the regular B-form right-handed double helix. In contrast, RNA molecules are single-stranded. Nevertheless, they can also form anti-parallel double helices, usually short (see below), by folding back upon themselves to align WC complementary stretches of sequence. Besides the canonical AU and GC WC base pairs, A-RNA double helices contain a significant fraction of GU “wobble” base pairs (see Figure S1 in the Supporting Information). Canonical RNA double helices alternate with regions of nucleotides that do not form canonical base pairs, i.e., that are nominally unpaired. The secondary (2D) structure of an RNA is a summary of the adjacent canonical base pairs formed when an RNA molecule folds. Drawings representing the 2D structures of RNA molecules often show only the nested canonical base pairs. All the remaining nucleotides are shown as unpaired “loops” (see below) in the 2D plots (Figure 3).19a Many of the nominally unpaired nucleotides, however, form non-canonical (non-Watson-Crick, non-WC) base pairs. All these terms are explained in detail below.

Figure 3.


Domain I of 16S rRNA from Escherichia coli with internal loops (colored in green), hairpin loops (blue) and multi-helix junction loops (yellow) highlighted both on the stereo view of the 3D structure (upper left) and on the 2D diagram (center). The 2D diagram shows canonical base pairs (marked by short lines) and GU wobbles (dots). The other nucleotides are formally unpaired and form loops (see the text for explanation). In reality, the loops are precisely structured RNA elements. Locations of several RNA 3D motifs are marked on the 2D diagram as a1, a2, b and c. Two instances of a recurrent sarcin-ricin motif (a1 and a2) found in this region are superimposed and their interactions are annotated according to the Leontis-Westhof19a classification in the green inset. The blue inset (b) shows a T-loop hairpin with an annotation, and the yellow inset (c) depicts a multi-helix junction loop (the backbone of each of the three strands is traced by a red ribbon).

The tertiary (3D) structure refers to the non-WC and long-range interactions that stabilize the exact RNA three-dimensional structure. Predicting RNA 2D and 3D structures starting from sequence is a challenging and multi-step process.51 The WC base pairs determine the basic folding and contribute most of the thermodynamic stability to the folded 3D structure. Thus, structure prediction usually starts with prediction of 2D structure.52 In other words, the 2D structure is “separable”, so that to a good first approximation, most RNAs fold so as to minimize the free energy of the 2D structure. Only approximately 60% of bases in structured RNAs form canonical base pairs. However, the tertiary interactions can also contribute decisively to the overall free energy of RNA molecules, especially in those cases where part or all of a molecule can form two or more distinct 3D structures having comparable free energies. In fact, the ability to form more than one structure is essential to the function of some RNAs. Environmental factors, interactions with other molecules, or subtle effects of the kinetics of folding may affect which structure is finally realized under specific conditions. For RNAs with length up to ~700 nucleotides, contemporary methods for predicting the 2D structure by computational folding of a single sequence achieve ~70% accuracy.52 This is calculated as the percentage of correctly predicted WC base pairs minus predicted base pairs that do not occur. This accuracy is considerably improved when additional experimental data are available, for example chemical or enzymatic probing data of the folded RNA molecule.53 Probing data can identify nucleotides that are more likely to belong to 2D structure loops than to WC paired helices. Folding programs allow one to include these data as constraints.53,54

Predictions of 2D structure can also be improved by knowledge of additional homologous sequences that are sufficiently, but not excessively, diverged.55 The success of comparative sequence analysis (CSA) methods is based on the idea that random mutations that occur during evolutionary processes are not equally likely to be passed on to progeny. Natural selection rapidly eliminates mutations that disrupt the 3D structure in ways that block the proper function of RNA molecules. Moreover, natural selection favors compensating mutations that restore function to molecules whose function is compromised by the initial mutation. Thus, the 2D and 3D structures of homologous RNA molecules tend to diverge much more slowly than their sequences. By identifying compensating mutations that preserve WC complementarity at equivalent sequence positions in homologous RNAs, one obtains reliable evidence for conserved base pairs that belong to the common 2D structure. If nucleotides “i” and “j” in the RNA sequence form a canonical base pair, then conservation of the 2D structure requires that a mutation at position “i” be accompanied by a compensatory mutation at position “j” to maintain the WC base pair and the functional structure.
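The covariation test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular CSA program: the function name `supports_conserved_pair`, the 0-based column indices and the toy alignment are hypothetical, and only the four WC pairs are considered (GU wobble pairs are ignored).

```python
# Canonical Watson-Crick pairs, written as (base at column i, base at column j).
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def supports_conserved_pair(alignment, i, j):
    """Return True if columns i and j (0-based) remain WC-complementary in
    every aligned sequence AND at least two different WC pairs are observed,
    i.e., a mutation at i is accompanied by a compensating mutation at j."""
    observed = [(seq[i], seq[j]) for seq in alignment]
    all_complementary = all(pair in WC_PAIRS for pair in observed)
    return all_complementary and len(set(observed)) > 1


# Hypothetical toy alignment of three homologous sequences (not real data).
toy_alignment = ["GAAAAC", "AAAAAU", "CAAAAG"]
```

With this toy alignment, columns 0 and 5 show GC, AU and CG and therefore support a conserved pair, whereas columns 1 and 4 (always A and A) do not. A strictly conserved pair with no sequence variation also returns False, because it provides no covariation evidence.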

An accurate 2D structure provides the necessary basis for prediction of the 3D structure, but is not sufficient by itself. Despite considerable progress in structure prediction methods, the only reliable way to obtain atomic-resolution 3D structures of new RNA molecules remains X-ray crystallography. Sequence alignment and CSA can also play a role in modeling RNA 3D structure56 and are especially efficient if one or more exemplar X-ray structures are available. When an X-ray structure of a given RNA of one organism is available, it is possible to deduce molecular interactions of equivalent RNAs of other species by aligning their sequences to the known structure. Sequence alignment means arranging the sequences of two or multiple RNAs to identify regions that mutually correspond because of structural or evolutionary relationships between them.57 Unless the sequences to be aligned are nearly identical, structural alignment of RNA molecules requires simultaneously determining the 2D structure. When correctly constructed and properly annotated, sequence alignments allow one to infer for each RNA sequence the base pairs and other interactions that form at positions equivalent to the “parent” X-ray structure. Accurate alignments allow one to identify evolutionarily conserved motifs, i.e., sequence patterns that form characteristic RNA 3D “building blocks”. The quality of alignments can thus be improved using sequence signatures known to form specific 3D molecular building blocks and interactions. We suggest that advanced QM and MM computations can substantially enrich RNA structural bioinformatics by providing additional insights into the physical chemistry of the molecular interactions that determine the sequence signatures. Guided by phylogenetic analysis and 3D bioinformatics, computations can be used to explore and analyze the effects of base substitutions not yet observed in the available experimental structures.47a, 58

Watson-Crick Base pairs

The most frequent base pairs in RNA molecules are those that compose canonical A-form double helices, the Watson-Crick AU and GC (canonical) base pairs. They have the special property of being exactly superposable on each other, so we say that GC and AU canonical base pairs are isosteric. In fact, GC and AU pairs are self-isosteric, in the sense that AU superposes on UA and GC superposes on CG. Thus, all four WC pairs, GC, CG, AU, and UA are mutually isosteric. The structural consequence of this isostericity is that the canonical A-RNA double helix has a regular, periodic and largely sequence-independent 3D shape. The biological consequence is that mutations that substitute, for example, a UA base pair by a GC, CG or AU, do not change the 3D structure of the helix to which the mutated bases belong. If nucleotides “i” and “j” in the RNA chain form a conserved XiYj canonical base pair in the X-ray structure, sequence alignments generally reveal co-variation (alternation) of CG, GC, AU and UA at corresponding positions of homologous RNA molecules, even those from distantly related organisms.

The free energy of an RNA helix is an important biochemical parameter. It depends on the length and sequence of the helix, and can be quantified by the free energy released upon RNA chain folding. Because of the asymmetry of RNA chains, a CG pair stacked on a GC pair (i.e. 5’-GC-3’ paired with 3’-CG-5’) is different from a GC pair stacked below a CG pair (i.e. 5’-CG-3’ paired with 3’-GC-5’). As in DNA, there are ten unique dinucleotide sequences (base pair steps) formed by canonical base pairs in RNA (and 21 including GU “wobble” pairs). Measured thermodynamic (TD) parameters for these ten canonical steps are called nearest-neighbor parameters and constitute the core for predicting secondary structure from base sequence.23a The relative contributions of base pairing and base stacking to the thermodynamics of RNA are not known, but it is assumed that the two interactions are roughly of equal importance.59 While TD parameters for canonical base pair steps are well established,52 extension of the TD predictions to non-WC duplexes and motifs, which would dramatically improve 2D predictions, is limited by lack of experimental data.52 Carefully designed computations of molecular interactions could contribute to finding or at least rationalizing the TD rules for these RNA elements, which are difficult to access by experiment. This is one of the main areas where computational chemists should direct their efforts.58c, 60,61
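The counts of ten and 21 unique base pair steps quoted above follow from strand symmetry and can be checked by direct enumeration. The short Python sketch below is only a counting exercise; the function name `count_unique_steps` and the tuple representation of a base pair (base on strand 1, base on strand 2) are assumptions made for illustration, and no thermodynamic parameters are involved.

```python
def count_unique_steps(pairs):
    """Count distinct base pair steps formed by stacking two pairs from `pairs`.

    Each pair is (base on strand 1, base on strand 2). A step (p1, p2) read
    5'->3' along strand 1 is the same physical object as the step
    (reversed p2, reversed p1) read 5'->3' along strand 2, so the two
    descriptions are merged into one canonical representative.
    """
    unique = set()
    for p1 in pairs:
        for p2 in pairs:
            same_step_other_strand = (p2[::-1], p1[::-1])
            unique.add(min((p1, p2), same_step_other_strand))
    return len(unique)


CANONICAL = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
WITH_WOBBLE = CANONICAL + [("G", "U"), ("U", "G")]

print(count_unique_steps(CANONICAL))    # 10
print(count_unique_steps(WITH_WOBBLE))  # 21
```

Of the 16 ordered combinations of canonical pairs, four are identical to their own strand-reversed description, which gives (16 - 4)/2 + 4 = 10; the same argument with six oriented pairs gives (36 - 6)/2 + 6 = 21.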

“Wobble” base pairs

The next most frequent base pair is the GU “wobble” base pair.47a For optimal H-bonding between the WC edges of G and U, a lateral shift of the U towards the major (deep) groove is required (see Figure S1 in the Supporting Information). This perturbation is relatively minor and does not greatly distort the A-form double helix, i.e., the GU wobble pair is nearly isosteric with the AU and GC WC base pairs. Thus, GU wobble pairs occur frequently within or at the ends of WC helices and are thermodynamically quite stable, on a par with WC AU pairs. The lateral shift nevertheless creates a pocket in the minor groove which can be occupied by a water molecule, the O2′ hydroxyl of another nucleotide or a phosphate group, and is often used for RNA tertiary interactions. Importantly, the GU wobble is not self-isosteric, i.e., GU is not superposable with UG. Consequently, GU is rarely observed to co-vary with UG in RNA 3D structures or correctly constructed sequence alignments.47a

Ribosome decoding – shape vs. energy

One significant problem that evolution has had to solve is how to discriminate between GU wobble and canonical base pairs formed between the first and second positions of the codons of mRNAs and the anticodons of tRNAs.3e Because of its genuine thermodynamic stability, the GU wobble pairs can participate in stable codon-anticodon interactions between mRNA and tRNA. It is not necessary to prevent a GU pair from forming at the third codon-anticodon position because for most amino acids, the genetic code is degenerate at this position, in the sense that more than one codon base (up to four) will be decoded as the same amino acid. However, acceptance of GU wobble in the first two positions would mean acceptance of “near-cognate” tRNAs and subsequently insertion of incorrect amino acids into the growing protein chain. Thus the ribosome decoding center located on the small ribosomal subunit utilizes a sophisticated network of dynamical molecular interactions to discriminate between the shape of GU wobble pairs (formed by near-cognate tRNA binding) and the near-isosteric shapes of the canonical base pairs (formed by cognate tRNA).3e An important lesson for anyone who makes calculations is that the most critical stage of ribosomal decoding relies not on differences in the energies of wobble versus canonical base pairing, but on precise monitoring of the exact shapes of the base pairs formed between mRNA and tRNA. The basic principle of decoding cannot be deduced from any studies of stability of codon-anticodon base pairing. Such supremacy of shape over energy is common in biology. Thus, during DNA replication, DNA polymerase also monitors the shape of bases and base pairs to ensure that the correct DNA base is inserted to form a canonical base pair with the base in the template strand. 
This fact has been demonstrated by efficient replication of isosteric non-polar nucleobase analogs that cannot form H-bonded base pairs, but which mimic the shape of the natural base pairs.62 However, there is evidence that some other classes of DNA polymerases involved in DNA repair directly recognize DNA base pairing and its stability.63 This is therefore another lesson illustrating the enormous complexity and variability of biomolecular recognition processes. We cannot expect to find a simple set of universally valid rules. For every rule one tries to formulate, evolution finds other ways to achieve optimal function. Thus, the energy of molecular interactions, while often “silent” in biochemical processes, remains an integral part of the overall picture. In specific cases the intrinsic energetics can play decisive roles. Exactly how evolution uses energy to achieve biomolecular recognition and functional dynamics must be determined on a case-by-case basis. Clearly, the role of energetics in biomolecular interactions cannot be ignored without serious misunderstandings. This underlines the potential usefulness of appropriate computational efforts in getting in-depth insights into the balance of forces in the individual systems and recognition patterns.

RNA 3D motifs

Canonical RNA helices tend to be short, generally less than about twelve consecutive WC base pairs. Longer stretches of canonical RNA base pairs are probably too monotonous and too stable to be useful for evolution of complex and often dynamical RNA molecules and RNA-based biomolecular machines. Computations on canonical base pairs and A-RNA helices give only limited information about functional RNAs. As noted above, RNA secondary structure consists of short canonical helices punctuated by nominally single-stranded segments forming what appear as “loops” in planar 2D representations. At the level of the secondary structure, loops consist of one or more strand segments and can accordingly be classified in three basic types: 1) Hairpin loops consist of a single continuous strand segment folded on itself and terminate a helix; 2) Internal loops comprise two strand segments and occur between two helices; 3) Multi-helix junction loops consist of three or more strand segments and occur where three or more helices meet. Figure 3 shows a part of the 2D structure of 16S (small subunit)3 ribosomal RNA and is annotated to illustrate examples of each kind of loop. The term “loop” causes confusion for those unfamiliar with RNA 3D structure as it evokes the idea of unstructured, floppy chain segments. However, most “loops” in RNA molecules that function by virtue of their 3D structure are, in fact, precisely structured, including the most common apical hairpin loops.64 The nucleotides of structured loops form multiple interactions with each other, and frequently with other parts of the same RNA or with other molecules. Such structures are called “RNA 3D motifs”.65 Thus, RNA “loops” are generally the most interesting and functionally important parts of RNA molecules and are frequently recurrent, highly specific molecular building blocks.
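The three loop types defined above differ only in the number of strand segments, so they can be recognized directly from a 2D structure. The Python sketch below assumes the 2D structure is given in dot-bracket notation (matched parentheses for nested canonical pairs, dots for nominally unpaired nucleotides); this notation, the function names and the example strings are assumptions introduced here and do not come from the text. Bulges are counted as internal loops in which one of the two strand segments has zero length.

```python
def parse_pairs(dot_bracket):
    """Return a list mapping each position to its pairing partner (-1 if unpaired)."""
    partner = [-1] * len(dot_bracket)
    stack = []
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            opening = stack.pop()
            partner[pos], partner[opening] = opening, pos
    return partner


def classify_loops(dot_bracket):
    """List (loop type, i, j) for every loop closed by a base pair (i, j)."""
    partner = parse_pairs(dot_bracket)
    loops = []
    for i, j in enumerate(partner):
        if j <= i:  # unpaired position, or the closing half of a pair
            continue
        # Collect the helices that branch off directly inside pair (i, j).
        branches = []
        k = i + 1
        while k < j:
            if partner[k] > k:
                branches.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not branches:
            loops.append(("hairpin", i, j))          # one strand segment
        elif len(branches) == 1:
            if branches[0] != (i + 1, j - 1):        # skip plain helix stacking
                loops.append(("internal", i, j))     # two strand segments
        else:
            loops.append(("junction", i, j))         # three or more segments
    return loops
```

For example, `classify_loops("((..((...)).))")` reports one internal loop and one hairpin loop, and a structure with two hairpins branching from a common stem reports a multi-helix junction. Such a classification captures only the 2D topology; as the text stresses, it says nothing about the precise 3D structure of the loops.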

What kinds of interactions structure RNA 3D motifs?

RNA motifs largely lack WC base pairs. They are usually rich in non-WC base pairs, as well as base stacking and a variety of base-backbone interactions. Some internal loops are fully paired duplexes, but as they consist of non-WC base pairs, their backbone structures deviate substantially from A-form helices. Many hairpin (or terminal) loops are highly structured and have few nucleotides that are not paired or stacked. Examples include the two most common hairpin loops, the “UNCG” and “GNRA” tetraloops. They usually have four nucleotides and conform to the indicated consensus sequences, where “N” indicates any nucleotide and “R” purine. These hairpin loops are 3D motifs in the sense that they have strictly defined molecular shapes stabilized by characteristic invariant signature molecular interactions. In each case, the first and fourth nucleotides of the loop form non-WC base pairs. Junction loops can have enormous structural complexity.66 The characteristic molecular interactions in all three classes of RNA loops represent genuine targets for systematic computational studies to clarify the role of molecular energetics in relating the observed sequence preferences to the conserved 3D structures.
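The consensus notation used for the two tetraloop families translates directly into a sequence pattern. The following Python sketch, with the hypothetical helper `classify_tetraloop`, matches a four-nucleotide loop sequence against the GNRA and UNCG consensus, where “N” is any nucleotide and “R” is a purine (A or G). A sequence match only indicates that the motif could form; it does not establish the 3D structure.

```python
import re

# Consensus sequences expressed as regular expressions over the RNA alphabet.
TETRALOOP_PATTERNS = {
    "GNRA": re.compile(r"G[ACGU][AG]A"),  # N = any nucleotide, R = purine
    "UNCG": re.compile(r"U[ACGU]CG"),
}


def classify_tetraloop(loop_sequence):
    """Return 'GNRA', 'UNCG', or None for a four-nucleotide loop sequence."""
    sequence = loop_sequence.upper()
    for name, pattern in TETRALOOP_PATTERNS.items():
        if pattern.fullmatch(sequence):
            return name
    return None
```

For instance, `classify_tetraloop("GCAA")` returns `"GNRA"` and `classify_tetraloop("UUCG")` returns `"UNCG"`, while `"AAAA"` matches neither consensus.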

Properties of RNA 3D Motifs

RNA 3D motifs are ordered arrays of non-WC base pairs under sequence constraints. This means that it is usually not possible to change just one base without having to change others to keep the functional RNA motif. The general properties of RNA 3D motifs include the following: 1) They are modular, in the sense that they can occur as discrete units. This makes it hard to experimentally dissect the effect of individual interactions, as disrupting one non-WC base pair can cause the entire motif to collapse. 2) They are autonomous, i.e., they can occur in different molecular contexts, folding into their characteristic geometry dictated by their specific sequence independently of the context. For example, sarcin/ricin motifs (“SR loops” – Figure 3) occur in internal loops or in multi-helix junctions in many different molecules. 3) They are recurrent, in the sense that they occur in different molecules or different places in the same molecule. The same motif can evolve convergently in different molecules, i.e., evolution independently finds the same 3D arrangement multiple times. 4) They are multipurpose. For example, the same motif can participate in protein binding in one context and various kinds of RNA-RNA interactions in others. Some RNA motifs, like UNCG hairpin tetraloops, appear to function largely by nucleating RNA folding because of their local stabilities. Other RNA motifs, like GNRA hairpin loops, appear to function primarily by mediating RNA tertiary interactions. Thus, almost every GNRA loop in the ribosome forms a tertiary interaction while almost no UNCG loop does so. V-shaped Kink-turn internal loops play primary roles in protein-assisted RNA folding67 and can also act as anisotropic flexible elbows.68

RNA 3D motifs can be in principle predicted from sequence. By detecting their characteristic signature sequences in ncRNA sequences, their occurrence in the folded RNA may be inferred. In favorable cases, we can infer their likely role in the functional structure. However, the occurrence of a sequence potentially forming an established RNA building block does not always guarantee its actual formation, as the surrounding context can also play a role. Some well studied 3D motifs exhibit most or all of the above listed properties. They include internal loops such as loop E from 5S rRNA and the Sarcin-Ricin loop,69 various kink-turns,67 C-loops, the prominent hairpin loops UNCG and GNRA, and the anti-codon loop and T-loop both from tRNA. Our unpublished data suggest that the current 3D database has about 100 distinct internal loops, some of which occur so far in only one instance. Thus we have currently ~100 distinct modular RNA building blocks that are used to construct RNAs. Thermodynamic parameters have been determined for only a small number of 3D RNA motifs for use in energy-based RNA 2D prediction programs.52 Sequence- and knowledge-based approaches for predicting the 3D structures of small RNA 3D motifs are promising.51b, 70

Not all recurrent RNA building blocks are autonomous. For example, in isolation the 5’-UAA/5’-GAN internal loop forms a fully base paired noncanonical double helix, basically consistent with standard thermodynamics predictions. This topology is not used by evolution. Nevertheless, in specific tertiary contexts, this loop is completely remodeled into an RNA module serving in tertiary interactions.58b,71

The amazingly variable 3D RNA motifs represent some of the most attractive targets for all kinds of advanced QM and MM computations. They contain many interesting molecular interactions; they are small enough to be tractable and sufficiently structured to define the computational task. RNA motifs are enormously important and there is a desperate need for quantitative insights in light of the lack of data for use in thermodynamics-based prediction algorithms.

Non-Watson-Crick base pairs and RNA 3D motifs

RNA nucleobases form a bewildering variety of base pairs. Only in the early 2000s, after a critical mass of RNA 3D structures became available, did the general principles for cataloguing RNA base pairs emerge. The decisive step forward was to extend the RNA base pair definition to include base – sugar and sugar – sugar hydrogen bonding (Figure 4).19 In fact, some RNA base pairs contain no direct base – base H-bonds and still are biochemically highly relevant. The generalized principle of RNA base pairing (the “Leontis-Westhof” classification) states that each RNA nucleobase can pair with another base using one of three base edges: the Watson-Crick edge (W), the Hoogsteen edge (H), or the Sugar edge (S). The 2’-OH of the ribose is considered part of the Sugar edge, and generally contributes to base-pairing interactions involving this edge, a feature that makes RNA distinct from DNA. Thus, base-pairing can occur by bases interacting edge-to-edge in six ways: W edge to W edge (“WW”), W to H (“WH”), W to S (“WS”), H to H (“HH”), H to S (“HS”) or S to S (“SS”).19a, 64a Further, the edges can come together in cis or trans, depending on whether the glycosidic bonds attaching the sugars are on the same or the opposite side of the axis joining the base centers. This leads to twelve basic geometric families of RNA base pairs (Figure 5).19b The families are marked by using “c” or “t” to refer to cis or trans, and the capital letters W, H, and S to refer to the edges. Thus “tWH” stands for the trans Watson-Crick/Hoogsteen family. Note that within the individual families, only certain base combinations can form. For example, there is no cWW GG base pair. Also, it is not sufficiently specific to refer to a base pair only by its base combination. For example, “AG base pair” could refer to cWW, tHS, tWS, cWH, tWH, cWS, cHH, tHH, cHS, cSS and tSS AG arrangements. This illustrates the difficulty of predicting non-WC base pairs from sequence.
The individual families contain up to twelve or sixteen distinct base combinations, depending on whether they are self-symmetric or not. Some families typically occur as part of larger contexts, forming base triples or quadruples (Figure 4). Besides the standard classification, there are additional planar interactions involving bifurcated hydrogen bonds, inserted solvent molecules or single H-bonds that are not included in the classification.19a The canonical “WC” base pairs belong to the cWW family, and in addition require GC and AU nucleotide combinations. The GU wobble also belongs to the cWW family. The remaining cWW base pairs are referred to as non-WC base pairs; thus, non-canonical and non-WC are synonyms. In structured RNAs 30% or more of base pairs are non-WC base pairs, i.e., all pairs other than cWW AU or GC pairs.19b Furthermore, in contrast to canonical base pairs, some non-WC base pairs possess alternative substates. Let us consider the A-minor I triplet (Figure 4, left), which consists of cWW GC, tSS AG and cSS AC base pairs. The A-minor I tSS AG base pair is evidently not optimally intrinsically paired, as the adenine nucleoside is also involved in the cSS base pair. Still, this specific observed tSS AG geometry is dominant in experimental structures and has been identified by structural bioinformatics. It is certainly related to the enormous frequency of A-minor I interactions. Existence of substates means that a given base pair may adopt several competing micro-arrangements. Substates are even more important for larger contexts such as triples and quadruples. Consider again the A-minor I triple. Figure 4 shows its fully paired (direct) variant.
However, in some observed instances its cSS AC interaction is water-mediated, with a water molecule inserted between the two 2’O groups (Supplementary Figure S2).68a In MD simulations, the triple often fluctuates between direct and water-mediated geometries, which creates an energetically flat (anharmonic) triplet system.68a Such flexibility of RNA base pairing is functionally important; for example, dynamical water insertion in the A-minor I interaction contributes to the elbow-like flexibility of kink-turn motifs.68a Starting from the A-minor I interaction, the adenine nucleotide can slide along the CG base pair in both directions and create a number of additional alternative substates known as A-minor II, A-minor III and A-minor 0 interactions (Supplementary Figure S2). Similar flexibility is also known for the phosphate-in-pocket interactions, where phosphate groups are inserted into the minor groove of a double helix, as shown by the quadruple in Figure 4, middle.47a In such cases, different parts of the RNA molecule can slide relative to each other over several Angstroms. Sometimes the ribosomal structures even show nucleosides that are properly arranged to make some interaction but are too far from each other. These are known as potential interactions.47a, 47f Potential interactions can be converted into real interactions upon conformational changes. This indicates that substates provided by RNA base pairing are likely of utmost importance for the functional dynamics of large RNAs and ribonucleoprotein systems.
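The twelve geometric families of the Leontis-Westhof classification follow combinatorially from three edges and two glycosidic bond orientations, as the brief Python enumeration below illustrates. The variable names are chosen here for illustration only; the family labels follow the “c”/“t” plus edge-letter convention described in the text.

```python
from itertools import combinations_with_replacement, product

EDGES = "WHS"        # Watson-Crick, Hoogsteen and Sugar edges
ORIENTATIONS = "ct"  # cis and trans

# Six unordered edge-to-edge combinations: WW, WH, WS, HH, HS, SS.
edge_pairs = ["".join(pair) for pair in combinations_with_replacement(EDGES, 2)]

# Each combination occurs in cis and in trans, giving twelve families.
families = [orientation + edges
            for edges, orientation in product(edge_pairs, ORIENTATIONS)]

print(len(families))  # 12
print(families)
```

The enumeration yields the family labels only. Which base combinations can actually form within each family, and which substates they adopt, is not captured by this count.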

Figure 4.

Examples of RNA base pairing involving backbone atoms. Left: The A-minor type I GCA triple interaction is the most frequent tertiary interaction in structured RNAs. Middle: The packing interaction GCUG quartet is another powerful and recurrent tertiary interaction. Right: Example of a base-phosphate “base pair” interaction. Note the dominant role of the backbone functional groups in the interactions. All interactions are highly sequence-specific.

Figure 5.

Base pair classification and occurrence. Upper-left. AU cis and trans Watson-Crick/Watson-Crick base pairs superimposed. The arrows indicate the direction of the glycosidic bond. The adenine belongs to both base pairs. The triangle abstraction of a nucleobase is overlaid onto the adenine to demonstrate the three nucleotide edges available for base pairing. Right. Twelve geometric families. Each base pair family is defined by the interacting edges of the bases and the relative orientation of the glycosidic bonds (columns 2-4). Abbreviations and symbols for representing base pair families in text and secondary structures are shown in columns 5 and 6. Column 7 shows an abstract representation of each family using triangles to represent the bases, where the hypotenuse represents the Hoogsteen edge. The shaded cells denote base pairs in the cis orientation. Lower-left. Base pair frequencies (percentage occurrence of each of the twelve base pair families) in ribosomal RNA structures (adapted from Ref. 19b).

Some base pairs occur frequently in RNAs, while others are infrequent (Figure 5). The frequencies of occurrence (how often evolution has used a given pair) of each base combination for each base pair family have been determined using a non-redundant set of atomic-resolution X-ray structures from the PDB and also by using rRNA sequence alignments.19b The frequencies of occurrence of each base combination within each geometric base pair family result from their relative stabilities and shapes. In other words, each base pair possesses at least three characteristic features: i) an intrinsic capability for base pairing, ii) a shape which determines its structural compatibility with the overall RNA structure, and iii) specific capabilities to contribute to functionally interesting RNA architectures. Each of these three factors contributes to the frequency with which the given base pair is selected by evolution. Generally, the geometric family is strongly conserved by evolution if the motif is conserved. Changing the geometric family of a single base pair can completely change the 3D motif structure. The factors that determine which geometric base pair family forms in a given context, or which base combinations occur most frequently for a given geometric family, are not fully known and constitute another area where computations are needed.

Robustness of base pairing families

The base pairing classification was proposed almost ten years ago, before sufficient numbers of atomic resolution structures had been determined to provide examples of all base pair combinations in each family. The classification suggested additional (at that time unobserved) base combinations for certain families.19a Recently, the compilation was updated using RNA 3D structures available in 2009.19b This analysis provided experimental confirmation for almost all base pairs predicted in 2002. Even more significantly, no additional base pairs (absent in the 2002 compilation) have been found.19b

Isostericity principle

Above, we illustrated the importance of molecular shape in structural biology. For RNA base pairing, this can be formulated as the RNA base pair isostericity principle. During evolution, natural selection typically eliminates those base mutations that disrupt the 3D structure of an RNA molecule, preventing it from achieving its function. For bases involved in edge-to-edge pairing, substitutions resulting in base combinations that cannot form the right base pair family or that produce a non-isosteric base pair are likely to disrupt structure and function. Thus, only isosteric or near-isosteric base substitutions are found at corresponding positions of homologous RNA molecules or recurrent RNA 3D motifs.19b

Why is the shape of the base pairs so important? It is because it determines the position and direction of the attached RNA backbone, and thus the RNA topology. The twelve geometric families are mutually non-isosteric and even within a given family, not all base pairs are mutually isosteric. Thus, a given family can be split into several isosteric or near-isosteric subfamilies. For example, in the cWW family, GC and AU are isosteric with each other, while GC and GU are near isosteric. However, GA and GC or even GU and UG are non-isosteric (Figure 6).72 Analysis of the available RNA 3D structures and sequences shows that the RNA isostericity principle is one of the most powerful constraints on RNA sequence evolution. Normally, evolution very strictly conserves base pair shapes, and only those substitutions that are isosteric or near isosteric are realized. This demonstrates the dominance of the 3D structure over the primary sequence and the dominance of shape over the intrinsic energetics of molecular interactions.

Figure 6.

Isosteric relationships between base pairs. Two base pairs are isosteric when they meet three criteria: (1) The C1′–C1′ distances are the same; (2) the paired bases are related by the same rotations in 3D space; and (3) H-bonds are formed between equivalent base positions. The cWW GC, CG, and AU base pairs (upper and lower left and upper center) meet all three criteria and are isosteric to each other, as shown. The cWW AG pair (lower center) and GU pair (upper right) belong to the same geometric family and so the paired bases are related by the same 3D rotation. However, the cWW AG pair has a significantly longer C1′–C1′ distance and so is not isosteric to the other pairs, even though it meets the other two criteria. The C1′–C1′ distance in the cWW GU (wobble) pair is about the same as in canonical pairs, but the U is shifted toward the major groove, so H-bonding does not occur between the same positions as in the other cWW pairs. This change is more subtle and so GU is considered near isosteric to the canonical cWW pairs AU, UA, GC, and CG, consistent with its ability to substitute in Watson–Crick helices for these pairs. The last example, cWH AG (lower right), has about the same C1′–C1′ distance as the canonical cWW pairs, but belongs to a different geometric family. The bases are related by a very different 3D rotation so the base pair is non-isosteric to all cWW base pairs. (From Ref. 72).

Of course, energetics also plays a role when there is a large difference in stability. Thus, while “wobble” cWW GU and AC base pairs are isosteric to each other, AC is considerably less stable, owing in part to the need to protonate the A(N1) position to allow H-bonding with C(O2), i.e., the AC base pair is protonated. Thus AC is much less frequent than GU. Further, GU and AC cWW base pairs have entirely different electrostatic potentials, which may affect the stability of their involvement in triples and quadruples. Nevertheless, AC quite often co-varies with GU to a certain extent in sequences. However, there are tertiary interactions that specifically require the wobble shape of cWW GU while covariation with AC is forbidden. A textbook example is the GU/CG tertiary quartet known as the packing interaction (Figure 4), which is destabilized by the electrostatic potential of AC cWW.47a

The isostericity concept extends traditional views of sequence conservation. Many RNA 3D motifs that are not conserved at the level of sequence are entirely conserved when considering base pair isostericity. It should be added, however, that occasionally we observe more complex scenarios, for example, replacement of one RNA 3D motif by another that uses entirely different base interactions to achieve the same function (motif swap).47a Still, important physico-chemical properties such as overall topology or flexibility can be conserved. These cases are not predictable even with the aid of structural bioinformatics and show the almost endless complexity of RNA molecules.47a, 68b, 73 The fundamental importance of base pairing was highlighted by a recent comparative analysis of the available atomic-resolution 3D structures of the ribosomal RNAs of Escherichia coli (E.c.) and Thermus thermophilus (T.t.), two distantly related bacteria (Figure 7).19b All base pairs were identified using the FR3D program64a and the corresponding structures were aligned base pair by base pair. The analysis shows that over 90% of the base pairs (non-WC as well as canonical base pairs) belong to the conserved (core) structures common to these two highly diverged bacterial species, and are thus likely also common to most other bacterial rRNA molecules. Moreover, the aligned base pairs were almost 100% conserved as to geometric base pair family (type) between the two structures. For example, if two bases of the conserved core of the E.c. structure formed a tWH base pair, then the corresponding bases in the T.t. structure also formed a tWH base pair. Amazingly, in 98% of cases, the corresponding base pairs in the two structures were found to be isosteric or near isosteric. This reveals an enormous degree of conservation, which is not visible when considering only the sequence information.

Figure 7.

Comparison of corresponding base pairs in the 3D structural alignment of E.c. and T.t. 5S, 16S and 23S rRNAs based on the IsoDiscrepancy Index (IDI), a qualitative measure of isostericity.19b There are a total of 2027 base pairs that belong to the conserved core of the bacterial ribosome and, therefore, are seen in both ribosomes at equivalent positions. The majority of the base pairs in the corresponding positions of the 3D alignment are identical (shown in green, IDI close to 0). The next largest group, isosteric base pairs, has IDI ≤ 2.0 (shown in blue). Near-isosteric base pairs (yellow) are characterized by 2.0 < IDI ≤ 3.3. Only 2% of all base pairs are non-isosteric (IDI > 3.3). The figure illustrates how strictly evolution preserves the isosteric base pairs even in such distantly related bacteria as E.c. and T.t.
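The IDI thresholds quoted in the caption can be turned into a small classifier. The following is only a minimal sketch: the function names, the `same_bases` flag (used here to mark positions where both structures carry the same base combination) and the example values are our assumptions for illustration and are not taken from ref. 19b.

```python
def classify_idi(idi, same_bases=False):
    """Return the Figure 7 category for one aligned base-pair position."""
    if idi < 0:
        raise ValueError("IDI must be non-negative")
    if same_bases:
        return "identical"        # green: same base combination, IDI close to 0
    if idi <= 2.0:
        return "isosteric"        # blue
    if idi <= 3.3:
        return "near-isosteric"   # yellow
    return "non-isosteric"        # IDI > 3.3


def summarize(positions):
    """Fraction of aligned positions that fall into each category."""
    counts = {}
    for idi, same in positions:
        label = classify_idi(idi, same)
        counts[label] = counts.get(label, 0) + 1
    total = len(positions)
    return {label: n / total for label, n in counts.items()}


# Hypothetical (IDI, same_bases) values, for illustration only.
example = [(0.1, True), (1.4, False), (2.7, False), (4.0, False)]
fractions = summarize(example)
```

Applied to a real alignment, `summarize` would give the category fractions that the caption reports for the conserved core.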

DNA vs. RNA difference from perspective of computational chemists

Figure 6 illustrates the well-known fact that the hydrogen-bond donor and acceptor properties on the exposed faces of the canonical base pairs (GC, CG, UA and AU) are sequence-specific. This in turn influences the interaction with other molecules, so isosteric pairs can have different hydrogen-bond patterns with the interacting molecules. In B-DNA, many proteins directly read the functional groups in both the major and minor grooves, and thus distinguish the B-DNA sequence. Further, the B-DNA double helix possesses fine sequence-dependent conformational variability (irregularity) that is of utmost importance for the majority of DNA-related molecular recognition processes. Numerous computational studies have been devoted to the sequence dependence of B-DNA and to molecular recognition of base functional groups. The RNA molecular recognition processes are strikingly different. The core of RNA interactions involves all kinds of non-canonical structural features, from single bulges and non-WC base pairs up to complex motifs and architectures (Figure 3). They provide incomparably more variability than canonical duplexes. This is also the reason why the term “mismatch” base pair (a non-canonical base pair in B-DNA, usually perturbing the molecule) is rather inappropriate when discussing RNA structures. These are functional base pairs, not mismatches. Thus, evolution does not need to experiment extensively with fine local conformational variations of the canonical A-RNA helix or to utilize specifically the base functional groups in the canonical helix grooves. Purely canonical A-RNA helices are in any case short, only 2–10 base pairs in the ribosome (cf. Figure 3) and even shorter in messenger RNAs. This contrasts with the extremely long B-DNA canonical duplexes. (For the sake of completeness, longer canonical A-RNA duplexes of 20+ base pairs are important in RNA interference.) The functional groups in the deep A-form major groove are quite inaccessible. We by no means suggest that the base functional groups are entirely unimportant, but the common picture is that RNAs, at a given position, typically freely alternate all four possible canonical base pairs (CG, GC, AU and UA). GC base pairs are more common in RNA, probably because they are more stable, and thermophilic organisms show an increased content of GC base pairs. The 2’-OH groups, which often interact with base exocyclic groups, can act both as donors and acceptors. A local variation of a canonical helix that would be considered excitingly large by a DNA researcher would not be noticed by an RNA researcher. Therefore, the two favorite problems of DNA computational studies are less relevant for RNA. We do not suggest that A-RNA lacks sequence-dependent local variations; recent simulations revealed a large effect of sequence on inclination, roll and major groove width of A-RNA.74 However, RNA is usually not about canonical helices. The contemporary nucleic acids computational literature remains visibly concentrated on DNA, in striking contrast to recent trends in biology and biochemistry. On the other hand, RNA molecules offer a much wider (basically unlimited) range of problems that can be amenable to computations. And there is yet another advantage. Although RNA molecules are definitely more complex and at first sight more difficult to describe than DNA, we usually do not need to study fine details such as variations of a few degrees in the helical twist. This increases the chances that the studied effects can be properly reflected despite the limitations of the computational methods.

QM calculations on RNA base pairs

We have carried out basic QM computations on all six “sugar edge” base pair families.30b, 75 The calculations demonstrate that classifications excluding the sugar moieties would not respect the basic physical chemistry of the interactions. Upon inclusion of the riboses, the QM calculations are quite consistent with the classification, i.e., the RNA base pairing principles emerge directly from the intrinsic interactions. For many of the sugar-edge base pairs, the calculated gas phase geometries closely resemble exemplar (centroid) base pairs extracted from the RNA structure database.19b The exemplar base pair structure is calculated as the most representative instance in the database of experimental structures. The sugar edge base pairs generally profit from relatively large dispersion attraction. This indicates that they are more hydrophobic compared to canonical base pairs, which may support formation of tertiary interactions. To carry out structural optimizations of some isolated base pairs, it was necessary to impose specific geometrical constraints to keep them close to the experimental geometries. In some cases the computations predict additional H-bonds not seen in the X-ray structures. Some of these are artifacts due to the absence of the RNA context, in which such H-bonds violate optimal RNA topologies. However, the calculations may also detect the capability of the base pairs to occasionally form interactions that are missed by bioinformatics. We are currently scrutinizing this issue in more detail for those cases where the QM predicted minimal energy structures differ visibly from the exemplar structures. In most cases, we indeed found one or more instances in the 3D database very similar to the calculated structure, even when it differed from the exemplar (centroid). For most base pairs involving one or more sugar edges, the exemplar tends to be rather planar, but the individual instances in the database exhibit a large range of inter-base angles. 
The calculated optima for these base pairs tend to be non-planar, but within the observed range (Figure 8). Work is in progress to extend the RNA base pair computations by considering base pairs in specific structural contexts (with the additional aid of computer simulations). Such targeted studies can bring the QM and bioinformatics data into a much closer relation.76
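To compare a calculated optimum with the range of inter-base angles observed in the database, one needs a consistent definition of that angle. The sketch below is one simple possibility and is not the procedure used in the cited work: each base plane is defined by three ring atoms (an assumption; a least-squares plane through all ring atoms would be more robust), and the angle between the two plane normals is folded into the 0–90° range. All function names and coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three non-collinear points."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("points are collinear")
    return tuple(c / length for c in n)


def interbase_angle(base1_atoms, base2_atoms):
    """Angle in degrees (0-90) between the planes of two bases."""
    n1 = plane_normal(*base1_atoms)
    n2 = plane_normal(*base2_atoms)
    # abs() removes the arbitrary sign of the normals; min() guards acos
    cos_angle = min(1.0, abs(sum(x * y for x, y in zip(n1, n2))))
    return math.degrees(math.acos(cos_angle))


# Hypothetical coordinates: second plane tilted by 30 degrees about the x axis
tilt = math.radians(30.0)
flat = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tilted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, math.cos(tilt), math.sin(tilt))]
angle = interbase_angle(flat, tilted)
```

Evaluating this angle for the exemplar, for every database instance and for the QM optimum would show whether the optimum lies within the observed range.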

Figure 8.

The tWS UA base pair is shown as represented by an exemplar structure (centroid, yellow), the two most diverse observed instances of the tWS UA base pair (green), and a calculated QM model (red). While not as planar as the exemplar, the QM model is within the range of observed variation.

For some base pairs, QM predicts the capability to form substates with amino-acceptor interactions.15a, 77 It is presently difficult to assess their relevance, since the resolution of X-ray structures is low, and the possibility of such interactions is considered neither in X-ray crystallography nor in 3D bioinformatics. In the first crystallographic study reporting an interaction with a nonplanar amino group (a 1.9 Å resolution B-DNA – DAPI complex), the interaction was identified after unsuccessful refinement attempts to eliminate a presumably repulsive amidinium-amino interaction.78 The interaction was then explained using QM computations. Had the crystallographer been less patient, the interaction would have been completely misunderstood. Thus, if experimental structures are analyzed with preconceptions, many rare but functionally important cases deviating from usual expectations are missed.

Extension of base pairing classification to base – phosphate interactions: combining structural bioinformatics directly with QM computations

For some geometric base pair families, certain base combinations occur much more frequently than others.19b In some cases it is evident that interactions not included in the standard classification play a role. Thus, we have complemented the base pair classification by introducing a classification of base-phosphate (BPh) interactions,79 which has substantially expanded the number and kind of interactions that can be annotated in RNA 3D structures. This was the first joint study simultaneously applying the tools of bioinformatics and QM calculation to a major class of interactions in RNA. Approximately 12% of the nucleotides in ribosomal RNA form direct internucleotide H-bonds between nucleobase donor atoms and phosphate oxygen acceptor atoms. The bioinformatics analysis provided the initial classification of the BPh interactions, and this was subsequently refined by QM calculations, which uncovered the physicochemical differences between the various binding types. The BPh interactions obey the isostericity principle and impose significant evolutionary constraints on the RNA sequences. Further, we found a correlation between the calculated intrinsic stabilities of the BPh interactions and their occurrence frequencies (Figure 9).79 Those BPh interactions that are calculated as intrinsically more stable tend to occur more frequently in biological RNA structures. This is a clear indication that natural selection at the level of RNA sequence is, to a certain extent, also sensitive to molecular interaction energies.
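The kind of comparison behind such a statement can be sketched as a simple correlation between interaction energies and occurrence counts. The example below is illustrative only: the numbers are hypothetical placeholders, not data from ref. 79, and the Pearson coefficient is just one possible choice of statistic (a rank correlation would be equally reasonable). Because more stable interactions have more negative energies, a tendency of stable interactions to occur more often shows up as a negative coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical interaction energies (kcal/mol) and occurrence counts
# for four interaction types; NOT values from the cited study.
energies = [-4.0, -6.5, -9.0, -11.5]
counts = [12, 30, 55, 90]
r = pearson(energies, counts)
```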

Figure 9.

Left, proposed nomenclature for BPh interactions and superpositions of idealized BPh interactions observed in RNA 3D crystal structures for each base. H-bonds are indicated with dashed lines. Cf. Figure 4 for a representative example. BPh categories are numbered 0–9, starting at the H6 (pyrimidine) or H8 (purine) base positions. BPh interactions that involve equivalent functional groups on different bases are grouped together, i.e. 0BPh (A, C, G, U), 5BPh (G, U), 6BPh (A, C), 7BPh (A, C) and 9BPh (C, U). Right, comparison of calculated BPh interaction energies (red) and BPh occurrence frequencies (blue) from a reduced-redundancy set of crystal structures for cytosine. Adapted from Ref. 79

We have also carried out a preliminary analysis correlating QM-calculated interaction energies of RNA base pairs with their occurrence frequencies (unpublished data). In contrast to BPh interactions, we do not see clear correlations. This, however, is not so surprising, as the analysis averages over all base pairs in all their contexts, which basically means comparing apples and oranges. The role of energy may only become apparent upon considering specific interaction contexts. We are now analyzing RNA interactions in their different specific contexts, which is ultimately how computations can best aid bioinformatics. Structural bioinformatics relies heavily on known 3D structures to construct databases of possible RNA motifs and interactions. However, 3D structures do not provide direct information about energies. Thus, structural biology and bioinformatics are biased towards purely structural data. By applying modern computational approaches, the energy dimension can be added to the 3D structures.79

Base stacking in RNA

RNA nucleobases are planar, one-atom-thick entities and can interact by stacking on each other like two plates. Base stacking interactions are just as important as base pairs for stabilizing RNA 3D structures. Base stacking can occur between individual bases, as in tertiary interactions involving looped-out bases, but more often occurs between two base pairs, as in helices and many local motifs comprising non-WC base pairs. Stacking arrangements involving base triples and quartets are also common. Coaxial stacking of helices is one of the fundamental driving forces of RNA folding. Each nucleobase has two distinct planar faces, defined by its 5’-3’ position in RNA. Thus, two distinct bases can stack on each other in four unique ways with regard to which base faces are in contact. However, stacking is actually a continuum of possible geometries, as the bases can slide in two dimensions and rotate relative to one another while remaining stacked. Thus, it has so far proven difficult to distinguish clear-cut sub-classes of the stacking relations, although it is apparent that some base combinations prefer some stacking arrangements over others. Classification of base stacking interactions will require a combination of structural and energy data with substantial input from computations.
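The count of four face combinations follows directly from each base having two faces. As a minimal illustration, the sketch below enumerates them; the face names "3" and "5" and the labels of the form "s35" are assumptions chosen for illustration, in the style of labels used by some annotation programs.

```python
from itertools import product

# Each base exposes two faces; here they are named after the 3' and 5' directions.
FACES = ("3", "5")


def stacking_face_classes():
    """All ordered combinations of the contacting faces of two distinct bases."""
    return ["s" + face_a + face_b for face_a, face_b in product(FACES, repeat=2)]


classes = stacking_face_classes()
```

Such labels record only which faces are in contact; the continuous sliding and rotation of the bases discussed above is not captured by them.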

Two contributions dominate the direct stacking interactions (Figure 1): the van der Waals interaction, which is a combination of short-range repulsion effects and dispersion attraction, i.e., roughly the Lennard-Jones term of MM force fields, and the electrostatic interaction, which is roughly the coulombic term of MM force fields.14b The former term maximizes the overlap of the bases while minimizing steric clashes, vertical compressions and gaps between bases.80 The latter term is responsible for the orientational component of stacking. There are no substantial specific “π-π” or “aromatic” effects associated with base stacking attributable to the delocalized π-electron cloud, and the functions currently employed in MM force fields account well for base stacking.14b However, the electrostatic contribution to stacking free energies is dramatically counterbalanced by solvent screening effects,27b which can even, in an appropriate structural context, stabilize stacking of consecutive positively charged base pairs.81 This illustrates the fundamental problem in the interpretation of stacking calculations for biologically relevant contexts: the degree to which solvent screening modulates the stability of stacking interactions is highly context-dependent.16f
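The two contributions can be made concrete with a generic pairwise-additive MM expression. The sketch below is not the parametrization of any specific force field: the 12-6 Lennard-Jones form written with well depth and minimum-energy distance, the combination rules (geometric mean of well depths, sum of half-radii), the conversion factor 332.06 kcal·Å/(mol·e²) and all atom parameters are assumptions for illustration.

```python
import math

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2); conversion factor assumed here


def atom_pair_terms(r, q_i, q_j, eps_ij, rmin_ij):
    """Return (Lennard-Jones, Coulomb) energies in kcal/mol for one atom pair."""
    ratio6 = (rmin_ij / r) ** 6
    lj = eps_ij * (ratio6 * ratio6 - 2.0 * ratio6)   # r^-12 repulsion, r^-6 dispersion
    coulomb = COULOMB_K * q_i * q_j / r              # gas-phase electrostatics
    return lj, coulomb


def stacking_terms(base_a, base_b):
    """Sum both terms over all atom pairs between two rigid bases.

    Each atom is a dict with keys 'xyz', 'q', 'eps' and 'rmin_half'.
    """
    vdw = 0.0
    elec = 0.0
    for a in base_a:
        for b in base_b:
            r = math.dist(a["xyz"], b["xyz"])
            eps_ij = math.sqrt(a["eps"] * b["eps"])    # assumed combination rule
            rmin_ij = a["rmin_half"] + b["rmin_half"]  # assumed combination rule
            lj, coulomb = atom_pair_terms(r, a["q"], b["q"], eps_ij, rmin_ij)
            vdw += lj
            elec += coulomb
    return vdw, elec


# Hypothetical one-atom "bases" at a typical stacking distance of 3.4 Angstrom
atom_a = {"xyz": (0.0, 0.0, 0.0), "q": 0.5, "eps": 0.1, "rmin_half": 1.7}
atom_b = {"xyz": (0.0, 0.0, 3.4), "q": -0.5, "eps": 0.1, "rmin_half": 1.7}
vdw, elec = stacking_terms([atom_a], [atom_b])
```

Keeping the two sums separate makes it easy to see how the total changes when the electrostatic term is attenuated, which is the effect that solvent screening has on stacking.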

How to compare computed and experimental data? Base stacking as the case example

As noted above, clarification of the nature of intrinsic base stacking and its subsequent quantification (cf. Figures 1 and 2) is widely accepted as a substantial success of QM methodology. Therefore, let us provide some additional data which illustrate what can be clarified by QM computations and what cannot. The best currently available QM methods (MP2 calculations extrapolated to the complete basis set of atomic orbitals, supplemented by higher-order electron correlation calculations with a smaller basis set) can derive a highly accurate interaction energy for any single xyz geometry of a pair of nucleic acid bases (see above and Figures 1 and 2). Selection of a relevant xyz geometry is nowadays a larger source of uncertainty than the interaction energy evaluation per se. Let us consider the calculation of the interaction energy between two bases. The monomer geometries need to be sufficiently relaxed using a good-quality method. Substantially unrelaxed geometries may compromise the electronic structure (incorrect dipoles, for example), which may bias the intermolecular terms. There are two limiting cases of such stacking calculations, both valid. i) Calculations carried out with rigid monomers (relaxed in isolation) neglect the intramolecular relaxation processes due to intermolecular interactions. ii) Computations carried out with complete relaxation of the whole system fully include mutual structural adaptation of the interacting monomers, as it would occur in the gas phase. The two stacked bases are visibly deformed towards each other. Neither approach is perfect. Deformations of monomers seen upon full gas phase optimization are exaggerated compared to stacking in biomolecular systems, condensed phase or solid state experiments, where the bases are surrounded by other interacting partners from all sides. The approximation of entirely rigid monomers is also not fully realistic. Rigid-monomer calculations, however, allow sampling of the conformational space. These two approaches can be combined by freezing the intermolecular geometry while at least partially relaxing the monomers.58c H-bonded base pairs always need to be relaxed to allow full optimization of the H-bonds, which includes stretching of the X-H covalent bonds of the donors.

Table 1 compiles computed and experimental data for stacking between two consecutive base pairs for all ten independent steps in B-DNA.14e, 16e, 23b (The available theoretical B-DNA data are more complete than the A-RNA data, although the main conclusions would be the same.) The “2006 QM pair” column presents the current reference values obtained using idealized geometries of base pair steps with a helical twist of 36°, optimized propeller twist and optimal vertical separation.16e Optimized base pair geometries were used as the starting structures to construct the base pair step geometries with rigid monomer structures. For the 5’-AA-3’ step, due to its prominent propeller twisting tendency, we show geometries with 0 and optimized (−20°) propeller twist. The energies were calculated as a sum of four pairwise base–base stacking contributions. Energies of H-bonded base pairs are not included, in order to obtain pure stacking energies. The second column, “2006 QM + mb”, shows stacking energies upon addition of the many-body term, i.e., the nonadditivity of base stacking. The B-DNA stacking energies are −13.1 to −18.4 kcal/mol for the pair-additive calculation and −11.2 to −17.3 kcal/mol after adding the many-body term. The most interesting step is the 5’-GG-3’ one, which has rather unfavorable intrastrand electrostatics due to the two intrastrand GG and CC homo-stacks of highly polar G (dipole moment of ~6.6 D) and C (dipole moment of ~6.4 D). T and A have dipoles of ~4.3 and ~2.6 D.14d The GG step is the only one with a significant mb term of +2.2 kcal/mol, meaning that the stacking is anticooperative. For more details see the literature.16e
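The bookkeeping behind the pair-additive and many-body columns can be sketched as follows. This is a minimal illustration under stated assumptions: the bases of the first pair are labeled A1 and B1, those of the second pair A2 and B2, the many-body term is taken as the total four-body interaction energy minus the sum of all six pair terms, and all numbers are hypothetical placeholders rather than values from ref. 16e.

```python
# The four base-base terms that connect the two consecutive base pairs
CROSS_TERMS = ("A1-A2", "B1-B2", "A1-B2", "B1-A2")


def pairwise_stacking(pair_energies):
    """Pair-additive stacking energy: sum of the four inter-base-pair terms.

    The H-bonded pairs A1-B1 and A2-B2 are deliberately excluded.
    """
    return sum(pair_energies[name] for name in CROSS_TERMS)


def many_body_term(total_energy, pair_energies):
    """Nonadditivity: total interaction energy minus the sum of all six pair terms."""
    return total_energy - sum(pair_energies.values())


# Hypothetical energies in kcal/mol, for illustration only
pair_energies = {
    "A1-B1": -28.0, "A2-B2": -28.0,   # H-bonded base pairs
    "A1-A2": -2.0, "B1-B2": -3.0,     # intrastrand stacks
    "A1-B2": -4.0, "B1-A2": -3.5,     # interstrand stacks
}
total_energy = -66.5                  # hypothetical four-body interaction energy

stack_pair = pairwise_stacking(pair_energies)
mb = many_body_term(total_energy, pair_energies)
stack_with_mb = stack_pair + mb
```

A positive many-body term, as in this made-up example, weakens the stacking relative to the pair-additive sum, which is the anticooperative behavior described for the GG step.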

Table 1.

Energies of base stacking in B-DNA base pair steps (i.e., between two consecutive base pairs in B-DNA geometry).a

5’-XY-3’    2006 QM pair  2006 QM + mb  1997 QM pair  AMBER/MP2  Exp ΔG  Exp ΔH  2006 QM water  Ornstein
GG=CC       −13.7         −11.2         −11.5         −13.8      −1.8    −8.0    −9.4           −8.3
GC          −16.6         −15.8         −14.1         −15.6      −2.3    −9.8    −10.3          −9.7
CG          −18.4         −17.3         −13.8         −16.3      −2.1    −10.6   −9.2           −14.6
AA=TTb      −13.1         −13.1         −12.0         −14.7      -       -       −9.9           −5.4
AT          −13.3         −13.3         −11.6         −15.6      −0.7    −7.2    −11.9          −3.8
TA          −13.0         −12.8         −11.2         −14.2      −0.6    −7.2    −9.2           −6.7
AG=CT       −14.3         −12.5         −12.2         −14.9      −1.2    −8.4    −9.8           −9.8
GA=TC       −13.6         −12.9         −12.1         −13.7      −1.5    −7.8    −10.2          −6.8
AC=GT       −14.2         −13.4         −12.3         −14.6      −1.4    −8.2    −10.2          −6.6
CA=TG       −16.0         −15.1         −12.5         −15.7      −1.4    −8.4    −9.2           −10.5
AA(prop)c   −14.7         −14.7         -             −15.8      −1.4    −8.5    −11.5          -

a 2006 QM pair – 2006 QM reference data calculated as sum of four base - base terms.16e Idealized geometries with helical twist of 36°. Propeller twist and vertical separation between base pairs are optimized; 2006 QM+mb – the preceding data with added many-body term; 1997 QM – 1997 reference QM calculations,14e AMBER/MP2 – Cornell et al force field;12 the atomic charges derived with inclusion of electron correlation effects (MP2 method) – see the text; Exp ΔG – reference experimental values of B-DNA base pair step free energies,23b Exp ΔH – the corresponding enthalpies,23b 2006 QM water – the 2006 QM reference values corrected for water solvent screening effects using continuum solvent model, obtained by combining the “2006 QM+ mb” data with Table 7 B3LYP data from ref. 16e; Ornstein - forty-years old semiempirical QM calculations.13a

b propeller twist 0.

c propeller twist was optimized and has a value of −20°.

The third column (1997 QM pair) shows the 1997 benchmark calculations, within the pair approximation, for similar but not identical geometries.14e There is a meaningful agreement between the first and third column data, showing that the 1997 calculations were already qualitatively correct. The fourth column shows AMBER Cornell et al. force field calculations, which are in very reasonable agreement with the reference data. This illustrates the success of this particular force field in the description of base stacking, which stems from using atomic charges derived to reproduce the electrostatic potentials around the monomers. The calculations in Table 1 were done with charges derived using a QM method that includes electron correlation, which allows consistent comparison between the QM and force field data. The actual simulation force field is derived using the same basic scheme but with the uncorrelated HF approximation, which over-polarizes the monomer charge distributions. HF-derived charges may be more suitable for condensed phase simulations with non-polarizable force fields, since real molecules are polarized by water. The utilization of QM methods to verify and parametrize force fields is straightforward, and by improving force fields the QM methods indirectly influence our knowledge of nucleic acids. Nevertheless, force fields remain inevitably approximate. They neglect polarization and charge transfer effects. They cannot describe non-classical H-bonds and amino-acceptor interactions that utilize the intrinsic flexibility of amino groups.14c, d, 15, 77, 78 Force fields are inherently less accurate in the description of backbone topologies, which must be carefully tuned by non-physical dihedral “cosine” force field terms. The Lennard-Jones empirical potential, with its excessively steep r^−12 repulsion term (r, interatomic distance), is inexact in the description of close interatomic contacts, since a correct description of the short-range repulsion would require an exponential term.80b

Columns 5 and 6 give the reference experimental nearest-neighbor ΔG and ΔH parameters for B-DNA base pair steps. Thermodynamics (TD) measurements and the derived TD stability parameters have an enormous impact on 2D structure predictions and many other applications. Nevertheless, the forces determining the TD data are not fully understood. The Table shows that it is not easy to directly compare the TD reference data with the QM reference data. As already pointed out, QM and TD data represent valid descriptions of the interactions under different conditions. QM calculations show the net stacking energy for a given xyz geometry. The experiment shows the overall thermodynamic stability associated with a given stacked base pair step in the context of B-DNA, which includes not only stacking, but also base pairing, all populated geometries, the presence of the backbone, all solvent effects, ion binding, etc. Still, the experimental data reflect some of the gas phase trends. The stability increases with the number of GC base pairs (0, 1 or 2), which reflects the intrinsic stabilities of GC and AT base pairs. For the GC, CG and GG base pair steps, the enthalpies reflect the gas phase stability order, mainly the relatively low stability of the GG step. This agreement can nevertheless be incidental. The measured TD parameters may often be radically affected by incidental contributions which differ from case to case, rather than being determined by the most fundamental forces such as base stacking.60b,61 Let us assume that we deal with two configurations. One of them is optimally hydrated by an integer number of waters while the other is not. The second configuration may be penalized due to a suboptimal distribution of hydration sites. As another example, let us consider the guanine to inosine (I) substitution in the GC WC base pair. The GC and IC base pairs are isosteric and, except for the missing NH2 group, the electronic structures of I and G (such as the dipole moment magnitude and orientation) are rather similar.14d So, we do not expect a radical effect of such a substitution on base stacking and pairing, except for preventing the minor groove clash in B-DNA for the respective 5’-PyrPur-3’ step. Yet, there is a striking ~1.6 kcal/mol free energy difference between equivalent G→I substitutions in canonical B-DNA and A-RNA.82 TD studies sometimes attempt to rationalize the measured trends by intrinsic stacking and base pairing interactions, but do not always consider the corresponding modern physical-chemistry data. Some of the assumptions are then not in agreement with the modern physical chemistry of molecular interactions. We suggest that if TD experiments are discussed in terms of intrinsic molecular interactions such as stacking and base pairing, it should be done using modern physical-chemistry computations.83 If a meaningful correlation between TD data and the intrinsic forces does not exist, then this should be understood as a result of the overall complex balance of molecular forces.

Inclusion of solvent effects could bring the calculations closer to experiment. A relatively straightforward approach is to use continuum solvent approximations (see above). Thus, the seventh column of Table 1 gives the 2006 QM reference calculations extended by continuum solvent (water) calculations, by combining (using B3LYP data) results of Tables 4 and 7 of ref 16e. Such calculations include the effect of solvent screening of the electrostatic interactions; however, they still do not allow direct comparison with the experiments, as many other contributions remain excluded. The calculations still use just a single static geometry, do not include solute entropy terms, do not include the rest of the NA molecule (i.e., the two stacked base pairs are fully immersed in water), do not include specific water binding, etc. In fact, comparing the QM data for GC, CG and GG steps with the TD data, we see that the above-noted correlation for ΔH is lost. So linking such computations (for systems as complex as nucleic acids) to existing experiments is not trivial. One important feature revealed by such calculations is nevertheless clear. The solvent screening effectively suppresses (or counterbalances) the electrostatic energy contribution to stacking, which dominates the orientation dependence of base stacking in the gas phase.27b This agrees with empirical experience. For decades, structural biologists have rationalized the stability of stacking as a dispersion-controlled and hydrophobic interaction based solely on the degree of mutual overlap of the stacked bases, not considering the mutual orientations of nucleobases, which vary widely. This simple approach, which is equivalent to switching off the electrostatic term in computations, is quite insightful in linking structural data with biochemical data. Apparently, the orientation of stacking geometries in nucleic acids is not determined by the electrostatic part of stacking, definitely not to the extent seen in gas phase computations. From this point of view, assuming that modern QM calculations revealed the role of dispersion forces in nucleic acids is not fully accurate. The role of dispersion has been well known in experimental science and has been quite accurately included even in the oldest empirical force fields. The right statement is that correct evaluation of the (roughly known) dispersion forces in QM computations has been achieved upon inclusion of a large portion of intermolecular electron correlation effects, as the last step to reach chemical accuracy and to provide the ultimate and unambiguous picture of the interaction. The fundamental issue of the degree of expression/attenuation of electrostatic effects in nucleic acids awaits an in-depth physical-chemistry analysis, as it likely varies from context to context while evolution utilizes this variability.14d
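The statements about the GC, CG and GG steps can be checked directly against the Table 1 entries. The short sketch below copies the relevant values (kcal/mol) from the “2006 QM pair”, “Exp ΔH” and “2006 QM water” columns; the dictionary keys and the function name are our own and serve only this illustration.

```python
# Values copied from Table 1 for the three steps discussed in the text
TABLE1 = {
    "GG": {"qm_pair": -13.7, "exp_dH": -8.0, "qm_water": -9.4},
    "GC": {"qm_pair": -16.6, "exp_dH": -9.8, "qm_water": -10.3},
    "CG": {"qm_pair": -18.4, "exp_dH": -10.6, "qm_water": -9.2},
}


def stability_order(column):
    """Steps sorted from most to least stable (most negative value first)."""
    return sorted(TABLE1, key=lambda step: TABLE1[step][column])


gas_order = stability_order("qm_pair")      # gas phase, pair-additive QM
exp_order = stability_order("exp_dH")       # experimental enthalpies
water_order = stability_order("qm_water")   # QM with continuum solvent
```

The gas phase and experimental orders coincide (CG, GC, GG), whereas the continuum-solvent order differs (GC, GG, CG), in line with the loss of correlation noted above.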

To complete the comparison, the last column of the Table shows B-DNA stacking energy data derived in 1978 by Ornstein et al.13a The stacking energies range from −4 to −15 kcal/mol, while the stability order has no correspondence to modern QM calculations. Similarly, the 1962 data by deVoe and Tinoco (from −2 kcal/mol for the AT and CG steps to −16 kcal/mol for the GC step)84 and the 1976 data by Kudriatskaya and Danilov (from −7 kcal/mol for the AT step to −24 kcal/mol for the GC step)85 do not resemble modern calculations. This reflects the insurmountable limitations of these pioneering calculations in the early days of quantum chemistry, before the advent of modern computers. It is, however, hardly justified to use these older calculations in any discussions of molecular interactions, as still sometimes happens. Actually, the first calculations capable of giving a meaningful stacking estimate are the 1988 ab initio data by Aida,86 which are in the range of ~−7 to −12 kcal/mol and basically reflect the correct order of stacking stabilities (see Figure 3 in ref. 14e). Thus, meaningful ab initio QM data have been available in the literature for more than 20 years, although the first 1988 attempt could capture only a fraction of the dispersion energy. Within 20 years, the ab initio calculations matured to chemical accuracy and completeness in stacking calculations.16f

Table 2 compares base stacking in 10 unique B-DNA and A-RNA steps, compiled from the recent study by Svozil et al.22 In contrast to Table 1, the geometries are now derived from explicit solvent MD simulation trajectories. The approximate nature of the force field means that the populated structures are not perfect (helical twist is underestimated, etc.) and introduce errors into the calculations. Nevertheless, the simulation makes it possible to monitor the genuine thermal fluctuations of stacking, instead of using just a single static geometry. Thus, for each step sequence, 10–50 individual geometries of A-RNA and 50 of B-DNA are evaluated. The first column gives the average value of the stacking energy, the next column the standard deviation, and the subsequent two columns give the maximum and minimum values of the calculated stacking energies. The geometries are based on 400-ps averaged portions of trajectories, which is a substantial smoothing of the thermal fluctuations. Individual snapshots would be even more diverse. The force field geometries of the bases are replaced by QM-optimized geometries and the stacking energy is calculated using the fast DFT-D method (see ref. 16f for explanation). It would not be tractable to use the best calculations for almost 1000 base pair step stacking evaluations. However, to make Table 2 comparable with Table 1, the DFT-D energies are adjusted by the highest-accuracy calculations done for a single A-RNA and B-DNA geometry of each sequence. This correction is in the range of −1.14 to −2.06 kcal/mol. Thus, the data in the present Table 2 are compiled from Tables 1, 2, 3, and 4 of the original work.22
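The per-step statistics described above can be sketched in a few lines. Several points are assumptions made only for this illustration: the correction is treated as a constant additive offset per step, SD is taken as the sample standard deviation, and, following the sign convention visible in Table 2, MAX denotes the most stabilizing (most negative) energy and MIN the least stabilizing one. The input energies are hypothetical.

```python
import statistics


def summarize_stacking(dftd_energies, correction):
    """AVG, SD, MAX and MIN of corrected stacking energies (kcal/mol).

    dftd_energies: one DFT-D stacking energy per averaged geometry.
    correction: offset from a single higher-accuracy reference calculation.
    """
    corrected = [energy + correction for energy in dftd_energies]
    return {
        "AVG": statistics.mean(corrected),
        "SD": statistics.stdev(corrected),   # sample standard deviation (assumed)
        "MAX": min(corrected),               # strongest stacking, most negative
        "MIN": max(corrected),               # weakest stacking, least negative
    }


# Hypothetical energies for three averaged geometries of one step
summary = summarize_stacking([-12.0, -11.0, -13.0], correction=-1.5)
```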

Table 2.

Comparison of base stacking in B-DNA and A-RNA base pair steps based on evaluation of series of 400-ps averaged geometries along explicit solvent simulation trajectories.22 AVG, SD, MAX and MIN stand for the averaged value of base stacking, standard deviation, the maximum value and the minimum value. The energies are derived using DFT-D approach and are further corrected using the highest-quality calculations carried out for one single geometry of each step – see the text.

            B-DNA                         A-RNA
5’-XY-3’    AVG     SD    MAX     MIN     AVG     SD    MAX     MIN
GG          −9.72   0.42  −11.68  −8.89   −8.57   0.40  −9.26   −7.31
GC          −15.88  0.55  −16.94  −14.63  −15.72  0.50  −16.63  −14.72
CG          −15.75  0.40  −16.50  −14.53  −15.21  0.56  −16.06  −14.45
AA          −12.87  0.50  −14.09  −11.73  −10.12  0.23  −10.53  −9.75
AT(U)       −13.63  0.23  −14.08  −12.97  −10.31  0.19  −10.77  −9.77
T(U)A       −13.44  0.77  −14.41  −11.58  −13.42  0.33  −13.89  −12.61
AG=CT(U)    −12.55  0.26  −13.27  −12.08  −11.38  0.15  −11.84  −11.08
GA=T(U)C    −12.28  0.30  −12.99  −11.62  −11.65  0.41  −12.70  −10.95
AC=GT(U)    −13.70  0.53  −14.65  −12.26  −11.43  0.22  −11.77  −10.88
CA=T(U)G    −13.00  0.38  −14.14  −11.68  −12.26  0.28  −12.76  −11.55

The stacking energy varies significantly along the trajectories. Note that all the single geometries are meaningful and representative. Thus, the inevitable conclusion is that base stacking cannot be completely represented by single geometries, irrespective of how carefully these geometries are designed and selected. The A-RNA and B-DNA intrinsic stacking energies are similar, and most of the DNA/RNA energy differences in Table 2 can be rationalized by the presence of T in DNA and U in RNA.22 For the sake of completeness, let us reiterate that the use of experimental geometries is also not problem-free.16f The experiments provide static and averaged structures, while even modest data and refinement coordinate errors of X-ray structures may substantially bias energy calculations. In summary, although quantum chemistry nowadays provides a very accurate structure-energy relation (energy as a function of geometry) for base stacking, finding fully transparent links to various experimental data is not straightforward.

RNA sugar – phosphate backbone

The sugar-phosphate backbone is chemically monotonous (sequence-independent) and contains consecutive single bonds with a substantial freedom for correlated torsional rotations. It has often been assumed that the backbone plays a rather passive role in structuring (as opposed to stabilizing) nucleic acids. According to this “base-centered” view of nucleic acid structure, interactions directly involving nucleobases are decisive in organizing the 3D structure.18 However, others have suggested that backbone conformational preferences are also crucial. We take the view that both noncovalent molecular interactions and backbone internal conformational preferences are important.

Backbone torsional angles are highly correlated, reflecting the intrinsic conformational preferences and topological requirements of nucleic acids. However, the role of backbone conformational preferences in determining 3D structures is less understood than the role of nucleobases, both theoretically and experimentally. Characterization of the sugar-phosphate backbone is a formidable task for QM investigations (see above). MM force fields also have limited accuracy, in part because the use of conformation-independent atomic charges is inadequate to properly describe the energetics of the flexible phosphodiester chains. While it is possible to determine the positions of the nucleobases and the centers of phosphate groups quite reliably by X-ray diffraction, even at moderate resolution (~2.5 Å), it is much more difficult at the same resolution to determine the precise backbone conformation, especially for the sugar atoms. A classification of RNA backbone conformations has been proposed,34a but some of the less populated backbone families may be artifacts of the limited resolution. Moreover, some individual geometries do not fit any of the suggested families. Future QM calculations will bring new data to bear on the problem of classifying sugar-phosphate conformational preferences and their energetics. The calculations will be complicated by the ribose 2’-OH group.

RNA as a big jigsaw toy, marvelous LEGO or a Russian doll

To choose model systems for computations, it is instructive to think about large RNAs such as the ribosomal RNAs as toys composed of intricate, interlocking parts, like jigsaw puzzles, LEGO sets or Russian dolls. The isostericity principle with precisely shaped non-WC interactions resembles a complex jigsaw puzzle. Natural selection favors “pieces” that preserve the local RNA shape, but also requires adequate interaction energy, although the most stable interaction is not necessarily the best. Many of the interactions possess substates, which are important for functional dynamics. QM calculations can help to elucidate the principles governing the individual interactions that put each jigsaw piece in the right place.

The ribosome also works like a sophisticated LEGO toy. It utilizes recurrent modular building blocks and is highly dynamic. Dynamics is critical for function, and is not evident from individual structures alone. Much current work using a variety of experimental and computational tools aims to characterize the functional dynamics of the ribosome and other RNA-based molecular machines.3f, 87 Large RNA-based nanomachines, such as the ribosome, work in the regime of high viscosity and very low inertia, so that the essential principles of their function are quite different from those of macroscopic machines. Molecular machines are subject to persistent large thermal fluctuations. They use chemical energy (in the form of GTP in the case of the ribosome) to rectify random thermal fluctuations into directional motions. These functional motions are largely driven by stochastic processes, where fluctuations are of utmost importance. The structural data represent static pictures of the molecular machine, averaged over a large number of particles and over the time scale of the experiments, and with limited resolution. Therefore, theory could bring important insights to bear on the relation between thermal fluctuations and flexibility. This is primarily a task for MD techniques utilizing classical force fields. Nevertheless, force fields are very approximate and never perfect. QM calculations will play a large role in future refinements of the MD force fields and in assessing their limitations for specific types of interactions and molecular architectures. The combination of QM calculations and RNA structural bioinformatics could provide important feedback to modify the force fields in a targeted manner, in order to improve the description of specific types of RNA interactions, submotifs and motifs, even when full-scale force field re-parameterization is not achievable.

Last but not least, RNA architectures are hierarchical, resembling Russian dolls. Typically, a given RNA structural interaction pattern or motif (with its associated sequence signature) includes sub-patterns or sub-motifs, while it also participates in larger motifs and contexts.73 This complicates the definition of model systems for computations. However, systematic computations could bring important insights into the basic physical chemistry principles of the RNA structural hierarchy.

From intrinsic interactions to covariation of RNA sequences

Above we have discussed the relationship between physico-chemical insights provided by modern theoretical computations and RNA structural bioinformatics, pointing out with examples the many reasons these two research areas can benefit from close cooperation. We conclude with one final instructive example, which shows that when we know what to track down, we can find surprising relations ranging from subtle gas phase effects through structural and thermodynamic data up to evolutionary covariation patterns. The GA cWW base pair is stabilized by two primary H-bonds, while the guanine N2 amino group and adenine C2 are juxtaposed (Figure 10).15a, 58c The latter interaction is repulsive in the planar conformation. Thus, the base pair undergoes large propeller twisting (counter-rotation of the bases) around its major groove edge. This positions the minor groove guanine amino group away from the adenine plane (Figure 10). The unpaired amino group also utilizes its genuine capability to assume a pyramidal geometry. It exposes the G(N2) lone pair to interact with the A(H2), while the amino group hydrogens can form out-of-plane H-bonds with adjacent base pairs. The characteristic gas phase geometry can be found in many RNA and DNA X-ray structures.88 However, these data were not properly interpreted until recently. The excessive propeller twisting was attributed ad hoc to base stacking, whereas in reality base stacking opposes it, as it prefers parallel bases. The structural studies overlooked the stabilizing cross-strand out-of-plane H-bonds between the guanine amino group and O2 of the pyrimidine of the adjacent canonical base pair. Complementary insights emerged from NMR and thermodynamic studies.60a, 60c, 89 The cWW GA base pair typically occurs in the following sequence context in duplexes: 5’-RG-3’/5’-AY-3’ (R is A or G and Y is C or U/T). To allow the out-of-plane H-bond to form, Y must be located 3’ to the adenine with which the G is paired. Reversal of the adjacent canonical base pair (5’-YG-3’/5’-AR-3’) abolishes this interaction. Indeed, in the latter sequence context the GA base pair adopts the tSH (“sheared”) geometry, with an H-bond between G(N2) and A(N7). Thus, the GA base pair has a context-dependent geometry. Finally, bioinformatics guided by QM calculations revealed that observation of a cWW GA base pair with the out-of-plane interaction in an RNA X-ray structure implies a lack of GA to AG covariation in homologous sequences, despite the isostericity of the cWW GA and AG base pairs.58c
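The sequence-context rule described above can be written as a small lookup. This is only an illustrative sketch of the stated tendency (5’-RG-3’/5’-AY-3’ favoring cWW, 5’-YG-3’/5’-AR-3’ favoring tSH); the function name `expected_ga_geometry` and its input convention are hypothetical, and real structures depend on more than the nearest neighbor.

```python
PURINES = {"A", "G"}            # R
PYRIMIDINES = {"C", "U", "T"}   # Y


def expected_ga_geometry(g_strand_step, a_strand_step):
    """Return the GA geometry suggested by the flanking base pair.

    g_strand_step: two-letter 5'->3' step ending with the paired G, e.g. "GG".
    a_strand_step: two-letter 5'->3' step starting with the paired A, e.g. "AC".
    Returns "cWW", "tSH", or None when the context is not covered by the rule.
    This sketch does not check that the flanking bases form a canonical pair.
    """
    g_step = g_strand_step.upper()
    a_step = a_strand_step.upper()
    if len(g_step) != 2 or len(a_step) != 2:
        raise ValueError("both steps must be two nucleotides long")
    if g_step[1] != "G" or a_step[0] != "A":
        raise ValueError("expected steps of the form 5'-NG-3' and 5'-AN-3'")

    five_prime_of_g = g_step[0]
    three_prime_of_a = a_step[1]
    if five_prime_of_g in PURINES and three_prime_of_a in PYRIMIDINES:
        return "cWW"   # 5'-RG-3'/5'-AY-3': out-of-plane H-bond to Y(O2) possible
    if five_prime_of_g in PYRIMIDINES and three_prime_of_a in PURINES:
        return "tSH"   # 5'-YG-3'/5'-AR-3': sheared geometry
    return None


if __name__ == "__main__":
    print(expected_ga_geometry("GG", "AC"))
    print(expected_ga_geometry("CG", "AG"))
```

The first call corresponds to the 5’-GG-3’/5’-AC-3’ step shown in Figure 10; the second is an example of the reversed context.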

Figure 10.

Amino groups of isolated bases are intrinsically nonplanar due to partial sp3 hybridization of the amino group nitrogen atoms,15a which is not included in MM force fields. Left, scheme of the pyramidalization, which means that the sum of the amino group valence angles is less than 360°. The amino groups are planarized by primary in-plane H-bonds (canonical base pairs), but the amino group flexibility can stabilize specific interactions with an out-of-plane (with respect to the nucleobase) distribution of donors and acceptors. Such local environments are rather common in folded RNAs. Middle, cWW GA base pair. Right, typical stacking in the 5’-GG-3’/5’-AC-3’ base pair step seen in RNA X-ray structures, with the cWW GA base pair stacked on top of a canonical GC one.58c The base positions are taken from the experiment, while the positions of hydrogens are predicted via QM. The profound nonplanarity of the GA base pair is its intrinsic gas phase feature, which remains fully expressed in the experimental structures. The nonplanar guanine amino group is involved in an out-of-plane H-bond, which is a signature interaction of the 5’-GG-3’/5’-AC-3’ internal loop and provides a constraint on the RNA sequence.
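The pyramidalization criterion in the caption (sum of the three amino group valence angles below 360°) is easy to evaluate from Cartesian coordinates. The sketch below is an illustration only: the function names are hypothetical and the coordinates in the usage example are idealized model values, not data from the article.

```python
import math


def angle_deg(a, center, b):
    """Valence angle a-center-b in degrees from 3D coordinates."""
    v1 = [a[i] - center[i] for i in range(3)]
    v2 = [b[i] - center[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(y * y for y in v2))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def amino_angle_sum(n, c, h1, h2):
    """Sum of the three valence angles around the amino nitrogen.

    n: amino nitrogen; c: ring carbon bonded to it; h1, h2: amino hydrogens.
    A value of 360 degrees means a planar group; smaller values indicate
    pyramidalization.
    """
    return (angle_deg(c, n, h1) + angle_deg(c, n, h2) + angle_deg(h1, n, h2))


if __name__ == "__main__":
    n = (0.0, 0.0, 0.0)
    s = math.sqrt(3.0) / 2.0
    # Idealized planar arrangement: three substituents 120 degrees apart.
    planar = amino_angle_sum(n, (1.0, 0.0, 0.0), (-0.5, s, 0.0), (-0.5, -s, 0.0))
    # Same arrangement with all substituents displaced below the plane.
    pyramidal = amino_angle_sum(n, (1.0, 0.0, -0.3), (-0.5, s, -0.3), (-0.5, -s, -0.3))
    print(f"planar: {planar:.1f}, pyramidal: {pyramidal:.1f}")
```

Applied to QM-optimized coordinates of a base or base pair, the same sum gives a quick numerical measure of how far an amino group departs from planarity.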

Conclusions

This article jointly presents, for the first time, QM physical chemistry and structural bioinformatics perspectives on the forces and rules that shape RNA structures. We suggest that convergence is possible between these heretofore quite separate research areas. Their synergy has substantial potential to deepen our understanding of RNA structure, dynamics and evolution. We have tried to highlight the unique contributions that each field provides and the value of respecting each field’s unique perspective and limitations to obtain reliable and useful results. In conclusion, we propose that carrying out high-quality computations on recurrent structural motifs identified in experimental structures is especially useful in analyzing specific structural motifs and interactions. In this manner, the computations provide in-depth insights that compensate for some of the intrinsic weaknesses of structural bioinformatics, which include a bias towards the available structural data and an understandable tendency to derive conclusions from the most representative data. Important but specific (less frequent) strategies of molecular adaptation that evolution has discovered can be easily overlooked. In general, the research is initiated by bioinformatic analysis of structural data, because these data represent the primary source of our information and also because 3D structures are decisive for RNA evolution and function (as exemplified by the isostericity principle). Bioinformatics will provide the initial set of targets for the computations, literally guiding the computations through the overwhelming complexity of RNAs. Computations can then provide, in turn, much-needed insights that can lead to further hypotheses that can be tested by bioinformatics or experiments. Nonetheless, the reverse scenario, in which insight from physical chemistry is used to uncover novel structural and sequence patterns, can also be fruitful. This interdisciplinary research should preferably be done with close interaction between RNA bioinformatics and computational researchers, with substantial involvement from both sides. This stems on the one hand from the enormous complexity of RNA structural biology and evolution, and on the other hand from the fact that it is not as easy to carry out competent computations as many researchers who are not computational specialists assume. When such cooperation is achieved, we will be rewarded by unique physico-chemical insights, which will advance our understanding of RNA structural biology.

Acknowledgments

This work was supported by the Grant Agency of the Academy of Sciences of the Czech Republic (CR) grant IAA400040802, Grant Agency of the CR grants 203/09/1476 and P208/10/2302, Ministry of Education of the CR LC06030, Academy of Sciences of the CR AV0Z50040507 and AV0Z50040702, a grant from the National Institutes of Health (2 R15GM055898-05) and a grant from the National Science Foundation (Research Coordination Network Grant No. 0443508).

Biographies

Neocles B. Leontis is Professor of Chemistry at Bowling Green State University (Ohio), where he has taught since 1987. He earned his B.S. in Chemistry at Ohio State University and his Ph.D. in Biophysical Chemistry from Yale University, working with Peter Moore. He carried out post-doctoral research with David Engelke at the University of Michigan. His research interests are in RNA structural bioinformatics, RNA modeling, and RNA nano-scale self-assembly. He served as convener of the RNA Ontology Consortium 2005-2009. Currently, he is serving as a Program Director in Molecular and Cellular Biology at the National Science Foundation.

Jiri Sponer (1964) is presently Head of the Department of Structure and Dynamics of Nucleic Acids, Institute of Biophysics, Academy of Sciences of the Czech Republic (ASCR), Brno, Czech Republic, Professor of Biomolecular Chemistry at Palacký University, Olomouc, Czech Republic and Masaryk University, Brno, and senior researcher at the Institute of Organic Chemistry and Biochemistry, ASCR, Prague, Czech Republic. He earned his M.Sc. and Ph.D. at Masaryk University, Brno. Since 1992, he has been primarily associated with ASCR, initially working mainly with Pavel Hobza. His primary research interests are applications of modern quantum-chemical and molecular simulation methods in studies of the structure, dynamics, function and evolution of nucleic acids.

Judit E. Sponer is a senior researcher at the Department of Structure and Dynamics of Nucleic Acids, Institute of Biophysics, Academy of Sciences of the Czech Republic (ASCR), Brno. She obtained her M.Sc. at the Eötvös University and her Ph.D. at the Technical University in Budapest. Her current research interest is the quantum chemical modeling of nucleic acids and their components.

Anton I. Petrov took his B.Sc. degree in Biology from St. Petersburg State University, Russia, in 2007. He is currently pursuing doctoral studies in Bioinformatics at Bowling Green State University, USA.

Footnotes

Supporting Information Available: Figures illustrating the canonical GC and wobble GU base pairs as well as the possible substates of A-minor interactions. This material is available free of charge via the Internet at http://pubs.acs.org.

REFERENCES

1. (a) Kruger K, Grabowski PJ, Zaug AJ, Sands J, Gottschling DE, Cech TR. Cell. 1982;31:147–157. doi: 10.1016/0092-8674(82)90414-7; (b) Guerrier-Takada C, Gardiner K, Marsh T, Pace N, Altman S. Cell. 1983;35:849–857. doi: 10.1016/0092-8674(83)90117-4.
2. Wahl MC, Will CL, Luhrmann R. Cell. 2009;136:701–718. doi: 10.1016/j.cell.2009.02.009.
3. (a) Frank J, Zhu J, Penczek P, Li YH, Srivastava S, Verschoor A, Radermacher M, Grassucci R, Lata RK, Agrawal RK. Nature. 1995;376:441–444. doi: 10.1038/376441a0; (b) Yusupov MM, Yusupova GZ, Baucom A, Lieberman K, Earnest TN, Cate JHD, Noller HF. Science. 2001;292:883–896. doi: 10.1126/science.1060089; (c) Wimberly BT, Brodersen DE, Clemons WM, Morgan-Warren RJ, Carter AP, Vonrhein C, Hartsch T, Ramakrishnan V. Nature. 2000;407:327–339. doi: 10.1038/35030006; (d) Ban N, Nissen P, Hansen J, Moore PB, Steitz TA. Science. 2000;289:905–920. doi: 10.1126/science.289.5481.905; (e) Ramakrishnan V, Moore PB. Curr. Opin. Struc. Biol. 2001;11:144–154. doi: 10.1016/s0959-440x(00)00184-6; (f) Mitra K, Frank J. Annu. Rev. Bioph. Biom. 2006;35:299–317. doi: 10.1146/annurev.biophys.35.040405.101950; (g) Korostelev A, Ermolenko DN, Noller HF. Curr. Opin. Chem. Biol. 2008;12:674–683. doi: 10.1016/j.cbpa.2008.08.037.
4. (a) Halic M, Becker T, Pool MR, Spahn CMT, Grassucci RA, Frank J, Beckmann R. Nature. 2004;427:808–814. doi: 10.1038/nature02342; (b) Egea PF, Stroud RM, Walter P. Curr. Opin. Struc. Biol. 2005;15:213–220. doi: 10.1016/j.sbi.2005.03.007.
5. Catania F, Gao X, Scofield DG. J. Hered. 2009;100:591–596. doi: 10.1093/jhered/esp062.
6. Gesteland RF, Cech TR, Atkins JF. The RNA World. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press; 2006.
7. (a) Birney E, Stamatoyannopoulos JA, Dutta A, et al. Nature. 2007;447:799–816. doi: 10.1038/nature05874; (b) Carninci P. Trends Genet. 2006;22:501–510. doi: 10.1016/j.tig.2006.07.003; (c) Pheasant M, Mattick JS. Genome Res. 2007;17:1245–1253. doi: 10.1101/gr.6406307.
8. Fire A, Xu SQ, Montgomery MK, Kostas SA, Driver SE, Mello CC. Nature. 1998;391:806–811. doi: 10.1038/35888.
9. (a) Hobza P, Selzle HL, Schlag EW. J. Phys. Chem. 1996;100:18790–18794; (b) Hobza P, Selzle HL, Schlag EW. Chem. Rev. 1994;94:1767–1785.
10. Zhao Y, Truhlar DG. Accounts Chem. Res. 2008;41:157–167. doi: 10.1021/ar700111a.
11. (a) Cieplak P, Dupradeau FY, Duan Y, Wang JM. J. Phys.-Condens. Mat. 2009;21:333102. doi: 10.1088/0953-8984/21/33/333102; (b) Ponder JW, Wu CJ, Ren PY, Pande VS, Chodera JD, Schnieders MJ, Haque I, Mobley DL, Lambrecht DS, DiStasio RA, Head-Gordon M, Clark GNI, Johnson ME, Head-Gordon T. J. Phys. Chem. B. 2010;114:2549–2564. doi: 10.1021/jp910674d.
12. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA. J. Am. Chem. Soc. 1995;117:5179–5197.
13. (a) Ornstein RL, Rein R, Breen DL, Macelroy RD. Biopolymers. 1978;17:2341–2360. doi: 10.1002/bip.1978.360171005; (b) Langlet J, Claverie P, Caron F, Boeuve JC. Int. J. Quant. Chem. 1981;20:299–338; (c) Forner W, Otto P, Ladik J. Chem. Phys. 1984;86:49–56; (d) Gresh N, Pullman B. Int. J. Quant. Chem. 1985;(Suppl. 12):49–56; (e) Aida M, Nagata C. Int. J. Quant. Chem. 1986;29:1253–1261; (f) Hobza P, Sandorfy C. J. Am. Chem. Soc. 1987;109:1302–1307; (g) Anwander EHS, Probst MM, Rode BM. Biopolymers. 1990;29:757–769. doi: 10.1002/bip.360290410; (h) Colson AO, Besler B, Close DM, Sevilla MD. J. Phys. Chem. 1992;96:661–668.
14. (a) Hobza P, Sponer J, Polasek M. J. Am. Chem. Soc. 1995;117:792–798; (b) Sponer J, Leszczynski J, Hobza P. J. Phys. Chem. 1996;100:5590–5596; (c) Hobza P, Sponer J. Chem. Rev. 1999;99:3247–3276. doi: 10.1021/cr9800255; (d) Sponer J, Leszczynski J, Hobza P. Biopolymers. 2001;61:3–31. doi: 10.1002/1097-0282(2001)61:1<3::AID-BIP10048>3.0.CO;2-4; (e) Sponer J, Gabb HA, Leszczynski J, Hobza P. Biophys. J. 1997;73:76–87. doi: 10.1016/S0006-3495(97)78049-4.
15. (a) Sponer J, Hobza P. J. Phys. Chem. 1994;98:3161–3164; (b) Sponer J, Hobza P. J. Am. Chem. Soc. 1994;116:709–714.
16. (a) Hobza P, Sponer J. J. Am. Chem. Soc. 2002;124:11802–11808. doi: 10.1021/ja026759n; (b) Leininger ML, Nielsen IMB, Colvin ME, Janssen CL. J. Phys. Chem. A. 2002;106:3850–3854; (c) Jurecka P, Hobza P. J. Am. Chem. Soc. 2003;125:15608–15613. doi: 10.1021/ja036611j; (d) Sponer J, Jurecka P, Hobza P. J. Am. Chem. Soc. 2004;126:10142–10151. doi: 10.1021/ja048436s; (e) Sponer J, Jurecka P, Marchan I, Luque FJ, Orozco M, Hobza P. Chem.-Eur. J. 2006;12:2854–2865. doi: 10.1002/chem.200501239; (f) Sponer J, Riley KE, Hobza P. Phys. Chem. Chem. Phys. 2008;10:2595–2610. doi: 10.1039/b719370j.
17. (a) Perez A, Marchan I, Svozil D, Sponer J, Cheatham TE, Laughton CA, Orozco M. Biophys. J. 2007;92:3817–3829. doi: 10.1529/biophysj.106.097782; (b) Cheatham TE, Cieplak P, Kollman PA. J. Biomol. Struct. Dyn. 1999;16:845–862. doi: 10.1080/07391102.1999.10508297; (c) Mackerell AD. J. Comput. Chem. 2004;25:1584–1604. doi: 10.1002/jcc.20082.
18. (a) Prive GG, Yanagi K, Dickerson RE. J. Mol. Biol. 1991;217:177–199. doi: 10.1016/0022-2836(91)90619-h; (b) Suzuki M, Amano N, Kakinuma J, Tateno M. J. Mol. Biol. 1997;274:421–435. doi: 10.1006/jmbi.1997.1406.
19. (a) Leontis NB, Stombaugh J, Westhof E. Nucleic Acids Res. 2002;30:3497–3531. doi: 10.1093/nar/gkf481; (b) Stombaugh J, Zirbel CL, Westhof E, Leontis NB. Nucleic Acids Res. 2009;37:2294–2312. doi: 10.1093/nar/gkp011.
20. de Vries MS, Hobza P. Annu. Rev. Phys. Chem. 2007;58:585–612. doi: 10.1146/annurev.physchem.57.032905.104722.
21. Voet D, Voet JG, Pratt CW. Fundamentals of Biochemistry. 3rd ed. Wiley & Sons, Inc; 2008.
22. Svozil D, Hobza P, Sponer J. J. Phys. Chem. B. 2010;114:1191–1203. doi: 10.1021/jp910788e.
23. (a) Mathews DH, Sabina J, Zuker M, Turner DH. J. Mol. Biol. 1999;288:911–940. doi: 10.1006/jmbi.1999.2700; (b) SantaLucia J. Proc. Natl. Acad. Sci. U. S. A. 1998;95:1460–1465. doi: 10.1073/pnas.95.4.1460.
24. Ditzler MA, Otyepka M, Sponer J, Walter NG. Accounts Chem. Res. 2010;43:40–47. doi: 10.1021/ar900093g.
25. Banas P, Jurecka P, Walter NG, Sponer J, Otyepka M. Methods. 2009;49:202–216. doi: 10.1016/j.ymeth.2009.04.007.
26. Sponer J, Lankas F. Computational Studies of RNA and DNA. Dordrecht: Springer; 2006.
27. (a) Tomasi J, Mennucci B, Cammi R. Chem. Rev. 2005;105:2999–3093. doi: 10.1021/cr9904009; (b) Florian J, Sponer J, Warshel A. J. Phys. Chem. B. 1999;103:884–892.
28. Klamt A, Schuurmann G. J. Chem. Soc.-Perkin Trans. 1993;2:799–805.
29. (a) Bachs M, Luque FJ, Orozco M. J. Comput. Chem. 1994;15:446–454; (b) Luque FJ, Curutchet C, Munoz-Muriedas J, Bidon-Chanal A, Soteras I, Morreale A, Gelpi JL, Orozco M. Phys. Chem. Chem. Phys. 2003;5:3827–3836.
30. (a) Sponer JE, Reblova K, Mokdad A, Sychrovsky V, Leszczynski J, Sponer J. J. Phys. Chem. B. 2007;111:9153–9164. doi: 10.1021/jp0704261; (b) Mladek A, Sharma P, Mitra A, Bhattacharyya D, Sponer J, Sponer JE. J. Phys. Chem. B. 2009;113:1743–1755. doi: 10.1021/jp808357m.
31. (a) Soteras I, Forti F, Orozco M, Luque FJ. J. Phys. Chem. B. 2009;113:9330–9334. doi: 10.1021/jp903514u; (b) Soteras I, Orozco M, Luque FJ. J. Comput.-Aided Mol. Des. 2010;24:281–291. doi: 10.1007/s10822-010-9331-y; (c) Curutchet C, Orozco M, Luque FJ. J. Comput. Chem. 2001;22:1180–1193; (d) Soteras I, Curutchet C, Bidon-Chanal A, Orozco M, Luque FJ. J. Mol. Struct.-Theochem. 2005;727:29–40.
32. (a) Ribeiro RF, Marenich AV, Cramer CJ, Truhlar DG. J. Comput.-Aided Mol. Des. 2010;24:317–333. doi: 10.1007/s10822-010-9333-9; (b) Marenich AV, Olson RM, Kelly CP, Cramer CJ, Truhlar DG. J. Chem. Theory Comput. 2007;3:2011–2033. doi: 10.1021/ct7001418; (c) Marenich AV, Cramer CJ, Truhlar DG. J. Phys. Chem. B. 2009;113:6378–6396. doi: 10.1021/jp810292n; (d) Marenich AV, Cramer CJ, Truhlar DG. J. Chem. Theory Comput. 2009;5:2447–2464. doi: 10.1021/ct900312z.
33. (a) Klamt A, Diedenhofen M. J. Comput.-Aided Mol. Des. 2010;24:357–360. doi: 10.1007/s10822-010-9354-4; (b) Klamt A, Jonas V, Burger T, Lohrenz JCW. J. Phys. Chem. A. 1998;102:5074–5085.
34. (a) Richardson JS, Schneider B, Murray LW, Kapral GJ, Immormino RM, Headd JJ, Richardson DC, Ham D, Hershkovits E, Williams LD, Keating KS, Pyle AM, Micallef D, Westbrook J, Berman HM. RNA. 2008;14:465–481. doi: 10.1261/rna.657708; (b) Svozil D, Kalina J, Omelka M, Schneider B. Nucleic Acids Res. 2008;36:3690–3706. doi: 10.1093/nar/gkn260.
35. Svozil D, Sponer JE, Marchan I, Perez A, Cheatham TE, Forti F, Luque FJ, Orozco M, Sponer J. J. Phys. Chem. B. 2008;112:8188–8197. doi: 10.1021/jp801245h.
36. (a) Millen AL, Manderville RA, Wetmore SD. J. Phys. Chem. B. 2010;114:4373–4382. doi: 10.1021/jp911993f; (b) Palamarchuk GV, Shishkin OV, Gorb L, Leszczynski J. J. Biomol. Struct. Dyn. 2009;26:653–661. doi: 10.1080/07391102.2009.10507279; (c) MacKerell AD. J. Phys. Chem. B. 2009;113:3235–3244. doi: 10.1021/jp8102782; (d) Foloppe N, Hartmann B, Nilsson L, MacKerell AD. Biophys. J. 2002;82:1554–1569. doi: 10.1016/S0006-3495(02)75507-0; (e) Foloppe N, Nilsson L, MacKerell AD. Biopolymers. 2001;61:61–76. doi: 10.1002/1097-0282(2001)61:1<61::AID-BIP10047>3.0.CO;2-1; (f) Hocquet A, Leulliot N, Ghomi M. J. Phys. Chem. B. 2000;104:4560–4568; (g) Florian J, Strajbl M, Warshel A. J. Am. Chem. Soc. 1998;120:7959–7966; (h) Sychrovsky V, Vokacova Z, Sponer J, Spackova N, Schneider B. J. Phys. Chem. B. 2006;110:22894–22902. doi: 10.1021/jp065000l.
37. Reed AE, Curtiss LA, Weinhold F. Chem. Rev. 1988;88:899–926.
38. Bader RFW. Atoms in Molecules. A Quantum Theory. Oxford: Clarendon Press; 1990.
39. Hobza P, Sponer J, Cubero E, Orozco M, Luque FJ. J. Phys. Chem. B. 2000;104:6286–6292.
40. Guerra CF, Bickelhaupt FM. Angew. Chem. Int. Ed. 2002;41:2092–2095.
41. Sponer J, Zgarbova M, Jurecka P, Riley KE, Sponer JE, Hobza P. J. Chem. Theory Comput. 2009;5:1166–1179. doi: 10.1021/ct800547k.
42. Hesselmann A, Jansen G, Schutz M. J. Am. Chem. Soc. 2006;128:11730–11731. doi: 10.1021/ja0633363.
43. Gresh N. J. Comput. Chem. 1995;16:856–882.
44. (a) Gresh N, Sponer JE, Spackova N, Leszczynski J, Sponer J. J. Phys. Chem. B. 2003;107:8669–8681; (b) Gresh N, Sponer J. J. Phys. Chem. B. 1999;103:11415–11427.
45. (a) Jeziorski B, Moszynski R, Szalewicz K. Chem. Rev. 1994;94:1887–1930; (b) Hesselmann A, Jansen G, Schutz M. J. Chem. Phys. 2005;122:014103. doi: 10.1063/1.1824898.
46. (a) Mattick JS. J. Exp. Biol. 2007;210:1526–1547. doi: 10.1242/jeb.005017; (b) Beniaminov A, Westhof E, Krol A. RNA. 2008;14:1270–1275. doi: 10.1261/rna.1054608.
47. (a) Mokdad A, Krasovska MV, Sponer J, Leontis NB. Nucleic Acids Res. 2006;34:1326–1341. doi: 10.1093/nar/gkl025; (b) Leontis NB, Westhof E. RNA. 2001;7:499–512. doi: 10.1017/s1355838201002515; (c) Nissen P, Ippolito JA, Ban N, Moore PB, Steitz TA. Proc. Natl. Acad. Sci. U. S. A. 2001;98:4899–4903. doi: 10.1073/pnas.081082398; (d) Tamura M, Holbrook SR. J. Mol. Biol. 2002;320:455–474. doi: 10.1016/s0022-2836(02)00515-6; (e) Gagnon MG, Steinberg SV. RNA. 2002;8:873–877. doi: 10.1017/s135583820202602x; (f) Noller HF. Science. 2005;309:1508–1514. doi: 10.1126/science.1111771.
48. (a) Swart M, Guerra CF, Bickelhaupt FM. J. Am. Chem. Soc. 2004;126:16718–16719. doi: 10.1021/ja045276b; (b) Perez A, Sponer J, Jurecka P, Hobza P, Luque FJ, Orozco M. Chem. Eur. J. 2005;11:5062–5066. doi: 10.1002/chem.200500255.
49. Acosta-Silva C, Branchadell V, Bertran J, Oliva A. J. Phys. Chem. B. 2010;114:10217–10227. doi: 10.1021/jp103850h.
50. (a) Vertessy BG, Toth J. Accounts Chem. Res. 2009;42:97–106. doi: 10.1021/ar800114w; (b) Krokan HE, Drablos F, Slupphaug G. Oncogene. 2002;21:8935–8948. doi: 10.1038/sj.onc.1205996.
51. (a) Parisien M, Major F. Nature. 2008;452:51–55. doi: 10.1038/nature06684; (b) Das R, Karanicolas J, Baker D. Nat. Methods. 2010;7:291–294. doi: 10.1038/nmeth.1433.
52. Mathews DH, Turner DH. Curr. Opin. Struc. Biol. 2006;16:270–278. doi: 10.1016/j.sbi.2006.05.010.
53. Deigan KE, Li TW, Mathews DH, Weeks KM. Proc. Natl. Acad. Sci. U. S. A. 2009;106:97–102. doi: 10.1073/pnas.0806929106.
54. Mathews D, Disney M, Childs J, Schroeder S, Zuker M, Turner D. Proc. Natl. Acad. Sci. U. S. A. 2004;101:7287–7292. doi: 10.1073/pnas.0401799101.
55. (a) Eddy SR, Durbin R. Nucleic Acids Res. 1994;22:2079–2088. doi: 10.1093/nar/22.11.2079; (b) Gardner PP, Giegerich R. BMC Bioinformatics. 2004;5:140. doi: 10.1186/1471-2105-5-140.
56. (a) Michel F, Westhof E. J. Mol. Biol. 1990;216:585–610. doi: 10.1016/0022-2836(90)90386-Z; (b) Massire C, Jaeger L, Westhof E. J. Mol. Biol. 1998;279:773–793. doi: 10.1006/jmbi.1998.1797; (c) Michel F, Costa M, Massire C, Westhof E. Methods Enzymol. 2000;317:491–510. doi: 10.1016/s0076-6879(00)17031-4.
57. Brown JW, Birmingham A, Griffiths PE, Jossinet F, Kachouri-Lafond R, Knight R, Lang BF, Leontis N, Steger G, Stombaugh J, Westhof E. RNA. 2009;15:1623–1631. doi: 10.1261/rna.1601409.
58. (a) Reblova K, Spackova N, Stefl R, Csaszar K, Koca J, Leontis NB, Sponer J. Biophys. J. 2003;84:3564–3582. doi: 10.1016/S0006-3495(03)75089-9; (b) Reblova K, Strelcova Z, Kulhanek P, Besscova I, Mathews DH, Van Nostrand K, Yildirim I, Turner DH, Sponer J. J. Chem. Theory Comput. 2010;6:910–929; (c) Sponer J, Mokdad A, Sponer JE, Spackova N, Leszczynski J, Leontis NB. J. Mol. Biol. 2003;330:967–978. doi: 10.1016/s0022-2836(03)00667-3.
  • 59.Freier SM, Sugimoto N, Sinclair A, Alkema D, Neilson T, Kierzek R, Caruthers MH, Turner DH. Biochemistry. 1986;25:3214–3219. doi: 10.1021/bi00359a020. [DOI] [PubMed] [Google Scholar]
  • 60.(a) Yildirim I, Turner DH. Biochemistry. 2005;44:13225–13234. doi: 10.1021/bi051236o. [DOI] [PMC free article] [PubMed] [Google Scholar]; (b) Kopitz H, Zivkovic A, Engels JW, Gohlke H. ChemBioChem. 2008;9:2619–2622. doi: 10.1002/cbic.200800461. [DOI] [PubMed] [Google Scholar]; (c) Yildirim I, Stern HA, Sponer J, Spackova N, Turner DH. J. Chem. Theory Comput. 2009;5:2088–2100. doi: 10.1021/ct800540c. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Koller AN, Bozilovic J, Engels JW, Gohlke H. Nucleic Acids Res. 2010;38:3133–3146. doi: 10.1093/nar/gkp1237. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Kool ET. Annu. Rev. Biochem. 2002;71:191–219. doi: 10.1146/annurev.biochem.71.110601.135453. [DOI] [PubMed] [Google Scholar]
  • 63.Krueger AT, Kool ET. Curr. Opin. Chem. Biol. 2007;11:588–594. doi: 10.1016/j.cbpa.2007.09.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.(a) Sarver M, Zirbel CL, Stombaugh J, Mokdad A, Leontis NB. J. Math. Biol. 2008;56:215–252. doi: 10.1007/s00285-007-0110-x. [DOI] [PMC free article] [PubMed] [Google Scholar]; (b) Nozinovic S, Furtig B, Jonker HRA, Richter C, Schwalbe H. Nucleic Acids Res. 2010;38:683–694. doi: 10.1093/nar/gkp956. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.(a) Leontis NB, Lescoute A, Westhof E. Curr. Opin. Struc. Biol. 2006;16:279–287. doi: 10.1016/j.sbi.2006.05.009. [DOI] [PMC free article] [PubMed] [Google Scholar]; (b) Lescoute A, Leontis NB, Massire C, Westhof E. Nucleic Acids Res. 2005;33:2395–2409. doi: 10.1093/nar/gki535. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.(a) Lescoute A, Westhof E. RNA. 2006;12:83–93. doi: 10.1261/rna.2208106. [DOI] [PMC free article] [PubMed] [Google Scholar]; (b) Laing C, Schlick T. J. Mol. Biol. 2009;390:547–559. doi: 10.1016/j.jmb.2009.04.084. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Klein DJ, Schmeing TM, Moore PB, Steitz TA. EMBO J. 2001;20:4214–4221. doi: 10.1093/emboj/20.15.4214. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.(a) Razga F, Koca J, Sponer J, Leontis NB. Biophys. J. 2005;88:3466–3485. doi: 10.1529/biophysj.104.054916. [DOI] [PMC free article] [PubMed] [Google Scholar]; (b) Reblova K, Razga F, Li W, Gao HX, Frank J, Sponer J. Nucleic Acids Res. 2010;38:1325–1340. doi: 10.1093/nar/gkp1057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.(a) Correll CC, Freeborn B, Moore PB, Steitz TA. Cell. 1997;91:705–712. doi: 10.1016/s0092-8674(00)80457-2. [DOI] [PubMed] [Google Scholar]; (b) Correll CC, Munishkin A, Chan YL, Ren Z, Wool IG, Steitz TA. Proc. Natl. Acad. Sci. U. S. A. 1998;95:13436–13441. doi: 10.1073/pnas.95.23.13436. [DOI] [PMC free article] [PubMed] [Google Scholar]; (c) Correll CC, Wool IG, Munishkin A. J. Mol. Biol. 1999;292:275–287. doi: 10.1006/jmbi.1999.3072. [DOI] [PubMed] [Google Scholar]
  • 70.Parisien M, Cruz JA, Westhof E, Major F. RNA. 2009;15:1875–1885. doi: 10.1261/rna.1700409. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.(a) Shankar N, Kennedy SD, Chen G, Krugh TR, Turner DH. Biochemistry. 2006;45:11776–11789. doi: 10.1021/bi0605787. [DOI] [PMC free article] [PubMed] [Google Scholar]; (b) Lee JC, Gutell RR, Russell R. J. Mol. Biol. 2006;360:978–988. doi: 10.1016/j.jmb.2006.05.066. [DOI] [PubMed] [Google Scholar]
  • 72.Nasalean L, Stombaugh J, Zirbel C, Leontis N, Walter NG, Woodson S, Batey RT. Non-Protein Coding RNAs. 2009;13:1–26. [Google Scholar]
  • 73.Jaeger L, Verzemnieks EJ, Geary C. Nucleic Acids Res. 2009;37:215–230. doi: 10.1093/nar/gkn911. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Besseova I, Otyepka M, Reblova K, Sponer J. Phys. Chem. Chem. Phys. 2009;11:10701–10711. doi: 10.1039/b911169g. [DOI] [PubMed] [Google Scholar]
  • 75.(a) Sponer JE, Spackova N, Kulhanek P, Leszczynski J, Sponer J. J. Phys. Chem. A. 2005;109:2292–2301. doi: 10.1021/jp050132k. [DOI] [PubMed] [Google Scholar]; (b) Sponer JE, Spackova N, Leszczynski J, Sponer J. J. Phys. Chem. B. 2005;109:11399–11410. doi: 10.1021/jp051126r. [DOI] [PubMed] [Google Scholar]; (c) Sponer JE, Leszczynski J, Sychrovsky V, Sponer J. J. Phys. Chem. B. 2005;109:18680–18689. doi: 10.1021/jp053379q. [DOI] [PubMed] [Google Scholar]; (d) Sharma P, Sponer JE, Sponer J, Sharma S, Bhattacharyya D, Mitra A. J. Phys. Chem. B. 2010;114:3307–3320. doi: 10.1021/jp910226e. [DOI] [PubMed] [Google Scholar]
  • 76.(a) Sharma P, Mitra A, Sharma S, Singh H, Bhattacharyya D. J. Biomol. Struct. Dyn. 2008;25:709–732. doi: 10.1080/07391102.2008.10507216. [DOI] [PubMed] [Google Scholar]; (b) Oliva R, Tramontano A, Cavallo L. RNA. 2007;13:1427–1436. doi: 10.1261/rna.574407. [DOI] [PMC free article] [PubMed] [Google Scholar]; (c) Oliva R, Cavallo L, Tramontano A. Nucleic Acids Res. 2006;34:865–879. doi: 10.1093/nar/gkj491. [DOI] [PMC free article] [PubMed] [Google Scholar]; (d) Sharma P, Chawla M, Sharma S, Mitra A. RNA. 2010;16:942–957. doi: 10.1261/rna.1919010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77.Luisi B, Orozco M, Sponer J, Luque FJ, Shakked Z. J. Mol. Biol. 1998;279:1123–1136. doi: 10.1006/jmbi.1998.1833. [DOI] [PubMed] [Google Scholar]
  • 78.Vlieghe D, Sponer J, Van Meervelt L. Biochemistry. 1999;38:16443–16451. doi: 10.1021/bi9907882. [DOI] [PubMed] [Google Scholar]
  • 79.Zirbel CL, Sponer JE, Sponer J, Stombaugh J, Leontis NB. Nucleic Acids Res. 2009;37:4898–4918. doi: 10.1093/nar/gkp468. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.(a) Sponer J, Kypr J. J. Biomol. Struct. Dyn. 1993;11:277–292. doi: 10.1080/07391102.1993.10508726. [DOI] [PubMed] [Google Scholar]; (b) Morgado CA, Jurecka P, Svozil D, Hobza P, Sponer J. J. Chem. Theory Comput. 2009;5:1524–1544. doi: 10.1021/ct9000125. [DOI] [PubMed] [Google Scholar]
  • 81.Spackova N, Berger I, Egli M, Sponer J. J. Am. Chem. Soc. 1998;120:6147–6151. [Google Scholar]
  • 82.(a) Siegfried NA, Metzger SL, Bevilacqua PC. Biochemistry. 2007;46:172–181. doi: 10.1021/bi061375l. [DOI] [PubMed] [Google Scholar]; (b) Siegfried NA, Kierzek R, Bevilacqua PC. J. Am. Chem. Soc. 2010;132:5342–5344. doi: 10.1021/ja9107726. [DOI] [PubMed] [Google Scholar]
  • 83.Meneni SR, Shell SM, Gao L, Jurecka P, Lee W, Sponer J, Zou Y, Chiarelli MP, Cho BP. Biochemistry. 2007;46:11263–11278. doi: 10.1021/bi700858s. [DOI] [PubMed] [Google Scholar]
  • 84.Devoe H, Tinoco I. J. Mol. Biol. 1962;4:500–517. doi: 10.1016/s0022-2836(62)80105-3. [DOI] [PubMed] [Google Scholar]
  • 85.Kudritskaya ZG, Danilov VI. J. Theor. Biol. 1976;59:303–318. doi: 10.1016/0022-5193(76)90172-7. [DOI] [PubMed] [Google Scholar]
  • 86.Aida M. J. Theor. Biol. 1988;130:327–335. doi: 10.1016/s0022-5193(88)80032-8. [DOI] [PubMed] [Google Scholar]
  • 87.(a) Ninio J. Biochimie. 2006;88:963–992. doi: 10.1016/j.biochi.2006.06.002. [DOI] [PubMed] [Google Scholar]; (b) Blanchard SC, Kim HD, Gonzalez RL, Puglisi JD, Chu S. Proc. Natl. Acad. Sci. U. S. A. 2004;101:12893–12898. doi: 10.1073/pnas.0403884101. [DOI] [PMC free article] [PubMed] [Google Scholar]; (c) Frank J, Spahn CMT. Rep. Prog. Phys. 2006;69:1383–1417. [Google Scholar]; (d) Spirin AS. FEBS Lett. 2002;514:2–10. doi: 10.1016/s0014-5793(02)02309-8. [DOI] [PubMed] [Google Scholar]
  • 88.(a) Prive GG, Heinemann U, Chandrasegaran S, Kan LS, Kopka ML, Dickerson RE. Science. 1987;238:498–504. doi: 10.1126/science.3310237. [DOI] [PubMed] [Google Scholar]; (b) Ennifar E, Yusupov M, Walter P, Marquet R, Ehresmann B, Ehresmann C, Dumas P. Structure. 1999;7:1439–1449. doi: 10.1016/s0969-2126(00)80033-7. [DOI] [PubMed] [Google Scholar]
  • 89.(a) Wu M, Turner DH. Biochemistry. 1996;35:9677–9689. doi: 10.1021/bi960133q. [DOI] [PubMed] [Google Scholar]; (b) Santalucia J, Turner DH. Biochemistry. 1993;32:12612–12623. doi: 10.1021/bi00210a009. [DOI] [PubMed] [Google Scholar]
