Skip to main content
Nucleic Acids Research logoLink to Nucleic Acids Research
. 2020 May 19;48(11):5839–5848. doi: 10.1093/nar/gkaa399

Integrating NMR and simulations reveals motions in the UUCG tetraloop

Sandro Bottaro 1,2,, Parker J Nichols 3, Beat Vögeli 3, Michele Parrinello 1, Kresten Lindorff-Larsen 2,
PMCID: PMC7293013  PMID: 32427326

Abstract

We provide an atomic-level description of the structure and dynamics of the UUCG RNA stem–loop by combining molecular dynamics simulations with experimental data. The integration of simulations with exact nuclear Overhauser enhancements data allowed us to characterize two distinct states of this molecule. The most stable conformation corresponds to the consensus three-dimensional structure. The second state is characterized by the absence of the peculiar non-Watson–Crick interactions in the loop region. By using machine learning techniques we identify a set of experimental measurements that are most sensitive to the presence of non-native states. We find that although our MD ensemble, as well as the consensus UUCG tetraloop structures, are in good agreement with experiments, there are remaining discrepancies. Together, our results show that (i) the MD simulation overstabilize a non-native loop conformation, (ii) eNOE data support its presence with a population of ≈10% and (iii) the structural interpretation of experimental data for dynamic RNAs is highly complex, even for a simple model system such as the UUCG tetraloop.

INTRODUCTION

RNA loops are structural elements that cap A-form double helices, and as such are fundamental structural units in RNA molecules. The great majority of known RNA loops contain four nucleotides (1), and these so-called tetraloops are one of the most common and well-studied three-dimensional RNA motifs. The great majority of known RNA tetraloops have the sequence GNRA or UNCG, where N is any nucleotide and R is guanine or adenine (2). Their small size, together with their biological relevance, has made these systems primary targets for nuclear magnetic resonance (NMR) spectroscopy, X-ray-crystallography, and atomistic molecular dynamics (MD) simulation studies (3–5).

The UUCG tetraloop has been long known to be highly stable, and both crystallographic and NMR studies suggest that this tetraloop adopts a well-defined three dimensional structure including a characteristic trans-Sugar-Watson (tSW) interaction between U6 and G9 (6,7) (Figure 1). Experimentally, the UUCG tetraloop is used to stabilize the secondary structure of larger RNA molecules without interacting with other RNAs or proteins (8).

Figure 1.

Figure 1.

Consensus secondary structure (left) and three dimensional structure (right) of the UUCG tetraloop (7). The stem is formed by 5 consecutive Watson–Crick base-pairs capped by the loop U6–U7–C8–G9. One of the most distinctive feature of this structure is the trans-sugar–Watson interaction between U6 and G9 (bottom). Extended secondary structure annotation follows the Leontis–Westhof nomenclature (9).

Despite its stability, the UUCG tetraloop is not rigid, and the sequence does not systematically determine a unique structure (2,10). Three recent studies by independent groups indicate the presence of alternative loop conformations (11–13), and earlier NMR studies (7,14) also suggested the presence of loop dynamics, without providing a detailed structural interpretation of the data. More generally, the atomic-detailed characterization of RNA structure and dynamics requires specialized techniques and substantial experimental effort, including NMR measurements of nuclear Overhauser effects (NOE), scalar couplings, chemical shifts, residual dipolar couplings, cross-correlated relaxation rates as well as a wide range of relaxation–dispersion type NMR experiments (15,16).

While NOEs are typically used to determine RNA and protein structures, they also contain dynamic information. Because ensemble-averaged NOEs are highly sensitive to the underlying distance fluctuations, they may contain contributions even from minor populations. Normally, such information is difficult to extract because standard NOE measurements are relatively inaccurate. It has, however, been demonstrated that a substantial part of the information content inherent to these probes can be obtained from exact NOE measurements (eNOEs) (13,17). As opposed to conventional NOEs, eNOEs can be converted into tight upper and lower distance limit restraints (18–20).

Previous computational studies of the UUCG tetraloop focused either on the dynamics around the near-native state (21) or on the difficulty in separating force-field inaccuracies from insufficient sampling (22,23). In a previous study, we reported converged free-energy landscape for RNA 8-mer and 6-mer loops, and we have shown that native-like states are not the global free-energy minimum using the current AMBER RNA force-field (24). This problem has been addressed in a new parameterization of the AMBER force-field, that improves the description of the UUCG 14-mer and other RNA systems (25). Nevertheless, it remains difficult to assess the accuracy of these simulations, because experiments alone do not provide an atomic-detailed description of structure and dynamics that serve as a benchmark.

Here, we use extensive atomistic MD simulations to map the conformational landscape of the UUCG tetraloop using enhanced sampling techniques and a recent force-field parameterization. To improve the description of this system further, we perform an a posteriori refinement of the MD simulation using experimental data via a Bayesian/maximum entropy procedure (26,27). We validate the eNOE-refined ensemble against independent NMR measurements and find an agreement that is on average comparable with NMR structures of the UUCG tetraloop deposited in the Protein Data Bank (PDB).

Our experimentally-refined ensemble reveals the presence of two conformational states. The dominant, major state (here called state A) is the consensus UUCG structure shown in Figure 1. The second, previously unreported lowly-populated state (state B) is characterized by the absence of the signature U6–G9 non-Watson–Crick base pair, with the C8 and G9 bases exposed into solution. We employ a random forest classifier to identify the structural properties that distinguish state A from state B. Furthermore, we use the same method in the space of experimental data to identify specific measurements that are most sensitive to the presence of state B.

The paper is organized as follows: in the Results section we first compare the predictions obtained from MD simulation against different experimental datasets. We then discuss the effect of the refinement procedure, showing how it improves the agreement with experiments and how it affects the population of different conformations. We proceed by identifying the relevant degrees of freedom and contacts that characterize the two states. Finally, we identify experiments sensitive to the presence of state B. We accompany this paper with the code to reproduce step-by-step the complete analysis, including all figures and supplementary results presented in the manuscript.

MATERIALS AND METHODS

MD simulations

We simulate the RNA 14-mer with sequence GGCACUUCGGUGCC starting from a completely extended conformation. Studying the folding free-energy landscape of this system is computationally expensive: for this reason previous attempts required μs-long simulations in combination with tempering protocols (25,28,29).

Here, we combine two enhanced sampling techniques: solute tempering in the REST2 formulation (30) and well-tempered metadynamics (31). We used a nucleic-acid specific metric, called eRMSD, (32) as a collective variable for enhanced sampling. The MD simulation setup and convergence analysis are presented in supporting information 1 (SI1).

Calculating experimental quantities from structure (forward models)

We calculated NOEs from the ensembles as Inline graphic. The index j runs over the n frames/models with associated weight Inline graphicj. This is an approximation because the NOE depends also on the timescales of motion and may contain contributions from angular fluctuations (18). In our analyses we circumvent these problems by extracting effective average distances from the NOE buildup curves and calculate these as weighted distance averages, and note that more advanced analyses would require reweighting of long-timescale correlation functions. We calculated 3J scalar couplings using Karplus equations as defined in the software package baRNAba (33). Cross-correlated relaxation rates are predicted with an in-house script available on Github (see Data Availability). Pales (34) is used to compute residual dipolar couplings (RDC), while solvent paramagnetic resonance enhancements (sPRE) are calculated as described in (35). More details and practical examples are available in SI2.

Integrating simulations and experiments

We combine the MD simulation with experimental data using a maximum entropy/Bayesian procedure (26,36,37). In our previous work, we have described this reweighting procedure as Bayesian/MaxEnt (BME) (27,38). In BME we use the experimental data to modify a posteriori the simulation so that the new conformational ensemble has the following properties: (i) the calculated averages are close to the experimental values taking uncertainty into account and (ii) it maximizes the relative Shannon entropy with respect to the original simulation ensemble. The modification comes in the form of a new set of weights Inline graphic, one for each simulation frame.

It can be shown that this problem can be cast as a minimization problem, in which one seeks the minimum of the function Γ with respect to the set of Lagrange multipliers Inline graphic, with m being the number of experimental constraints.

graphic file with name M10.gif (1)

Here, σi are the uncertainties on the experimental measurements Inline graphic and include experimental errors and inaccuracies introduced by the calculation of the experimental quantity from the atomic positions (Inline graphic). The partition function Z is defined as

graphic file with name M13.gif (2)

The sum over the index j runs over the n frames in the simulation, and Inline graphic are the original weights. Inline graphic0 = 1/n when using plain MD simulations or enhanced sampling techniques that sample directly from the target distribution (e.g. parallel tempering). In this paper we use WT-METAD, and the original weights Inline graphic0 are estimated using the final bias potential (39). The minimization of equation (1) yields a set of Lagrange multipliers Inline graphic that are used to calculate the optimal weights

graphic file with name M18.gif (3)

We choose the free hyper-parameter of the algorithm (θ) by performing a 5-fold cross-validation procedure as described in SI3.

Random forest classifier

The random forest analysis is set up according to the following procedure:

  • Bootstrap n = 50 000 samples from the MD simulation trajectories. The weight of each sample, Inline graphic, is calculated as described above.

  • Samples are assigned to state A if the eRMSD from the first model of the NMR structure 2KOC is <0.7, and state B otherwise.

  • Calculate a set of m features (torsion angles, distances between ring centres, experimental data) for each sample (33).

  • Construct a random forest classifier using n samples, m features and 2 classes (state A and state B). In our work, we use the implementation of the random forest algorithm (40) available in sklearn 0.22 using a maximum tree depth of 2. Inline graphic of the samples are used for training, while the remaining Inline graphic are used to evaluate the accuracy of the classifier (>Inline graphic in all cases).

  • Rank the m features by their importance. In the analysis of the experimental data in Figure 5, only features with importance greater than 0.2 are shown.

Figure 5.

Figure 5.

Comparison between 2KOC and MD+set A ensemble on experimental data that are most sensitive to the presence of state B. Scatter plots show the Z-score calculated on the MD+set A ensemble (x-axis) versus the same quantity calculated on the PDB ensemble 2KOC (y-axis). The four panels show data belonging to the four datasets: set A in panel (A), set B in panel (B), set C in panel (C) and set D in panel (D). Points discussed in the main text and shown in Figure 6 are labeled in red.

The implementation of the procedure is described in SI 4.

RESULTS

First, we compare the computational prediction with available NMR spectroscopy data. More precisely, we consider the following experimental datasets:

  • Dataset A. Exact eNOEs (13), consisting in 62 bidirectional exact NOE, 177 unidirectional eNOE and 77 generic normalized eNOE (gn-eNOE). This dataset alone was used to determine the structure of the UUCG tetraloop with PDB accession codes 6BY4 and 6BY5. In addition to the original dataset, we added one new eNOE and six new gn-eNOEs.

  • Dataset B. 96 3J scalar couplings, 32 RDCs, 251 NOE distances, and 84 cross-correlated relaxation rates. These data were used to calculate the consensus UUCG tetraloop structure (PDB 2KOC (7)).

  • Dataset C. 39 (RDC1) plus 14 (RDC2) residual dipolar couplings. These RDCs have been used in conjunction with MD simulations to obtain a dynamic ensemble of the UUCG tetraloop. (11).

  • Dataset D. 91 solvent paramagnetic resonance enhancement (sPRE) measurements (12).

The exact set of experimental data we used is described in SI2 and available in tabular format in SI 5. The orange bars in the four panels of Figure 2 show the agreement between simulation and the different experimental datasets. The agreement between experiment and simulations is expressed using the reduced χ2 statistics, defined as the average square difference between the experimental measurement (Fexp) and the back-calculated ensemble average Inline graphicnormalized by the experimental error σ:

graphic file with name M24.gif (4)

Hence, the lower the χ2, the better the agreement. As a rule of thumb, χ2 < 1 can be considered small, as the average difference between experiment and prediction is within experimental error.

Figure 2.

Figure 2.

Comparison between experiment and simulations. (A) χ2 between experimental dataset A against the MD ensemble (MD) and against the refined ensembles (MD+set A, MD+set B, MD+set C, MD+set D). As a reference, values calculated from all NMR models from PDB structures 2KOC and 6BY5 are shown as dashed lines. The agreement between the same ensembles and datasets B, C, D, are shown in panels (B), (C) and (D), respectively. Error bars show the standard error estimated using four blocks.

As a reference, we report in Figure 2 the agreement calculated on the PDB ensembles 6BY5 (13) and 2KOC (7). For set A, the agreement of the MD with experiment is considerably poorer than the one calculated on 6BY5. We recall that this latter ensemble was determined by fitting dataset A, we thus expect χ2 to be small in this case. On datasets B, D the MD better agrees with experiments compared to 2KOC, 6BY5, while differences on set C are smaller. The same conclusions apply when performing the comparison for each type of data and when considering other statistics (SI 6). Note that χ2 for sPRE, RDC, and scalar couplings is very large. This discrepancy may arise both from the imperfect ensembles, from the underestimation of the experimental error, as well as from the limitation of the function used to calculate the experimental quantity from the atomic positions (i.e. the forward model). As an example, the parameters in the Karplus equation for HCOP couplings critically depend on a single experimental data point measured in 1969 (41).

Bayesian/Maximum entropy refinement of the MD ensemble

As described above, our MD simulation provides a conformational ensemble consisting of a rich and diverse set of conformations, that, however, as a whole does not match all experimental data perfectly.

In order to improve the description provided by the MD simulation, we calculate a refined conformational ensemble by a posteriori including experimental information into simulations. In brief, the refinement is obtained by assigning a new weight to each MD snapshot, in such a way that the averages calculated with these new weights match a set of input (or ‘training’) experimental data within a given error. Among all the possible solutions to this underdetermined problem, we use the one that maximizes the Shannon cross-entropy (36,42,43).

Since we have four independent experimental datasets, we can refine the simulation by using one of them, using the other three as test set. By construction, the refinement procedure improves the agreement on the training data, but it does not guarantee improved agreement on the test sets. Figure 2 shows the effect of the refinement for different combinations of training/test set. For example, the blue bar in panel A shows the agreement between experiment and the MD simulations refined against set A (MD+set A) evaluated on set A itself (training). The blue bars in panels B, C, D show the χ2 of MD+set A on datasets B, C, D (test). The MD+set B is trained on set B and tested on datasets A, C, D and so on.

We observe that including into simulations dataset A has a detrimental effect on the agreement with set D, leaving the agreement with dataset B and C unchanged. The MD+set B ensemble shows improved agreement with dataset D, but performs worse than the original MD on set A and C. We observe a similar behavior for MD+set D. Among the four refined ensembles the MD+set C is the one that behaves better, as it shows a smaller or equal χ2 (relative to MD) on the three tests sets A,B,D. Taken together, our results show that the MD and the refined ensembles fit available experimental data to a degree that is comparable to the one calculated from PDB structures 2KOC and 6BY5 (Figure 2). However, there exists substantial discrepancies with experimental data, and none of the considered ensembles clearly outperforms the others.

Conformational ensemble of the UUCG Tetraloop

In this section we analyse in detail the refined MD ensemble, and discuss the differences with respect to the original simulation and previously determined structures. We consider the histogram of the distance (structural dissimilarity) from the consensus structure (PDB: 2KOC). Distances are measured using the eRMSD, a nucleic-acid specific metric that takes into account both position and orientations between nucleobases (32), although a qualitatively similar picture is obtained using the standard RMSD metric.

The distribution obtained from the original MD ensemble is shown in orange in Figure 3A, and repeated in the four panels. The two peaks correspond to structures where the stem is fully formed, but with different loop conformations. Structures in the left peak (eRMSD < 0.7, state A) display the signature interactions present in the consensus structure, while in the second peak (state B) the loop is disordered. The population of state A in our MD is Inline graphic, larger than the one calculated from the 180μs simulated tempering simulation by Tan et al. (25) (Inline graphic).

Figure 3.

Figure 3.

Histograms of the eRMSD from native. The original MD simulation (orange) is compared with the four refined ensembles: MD+set A in panel (A), MD+set B in panel (B), MD+set C in panel (C) and MD+set D in (D). Shades show the standard error estimated using four blocks. The vertical dashed line in panel (A) shows the separation between state A (eRMSD < 0.7) and state B (eRMSD ≥ 0.7).

The inclusion of experimental data in simulations affects the histogram in different ways. When including eNOE data from dataset A (Figure 3A, blue), we observe an increase in the population of state A to Inline graphic. The refinement obtained with datasets B and C (Figure 3 panels b,c) has a small effect on the population of state A, i.e. Inline graphic and Inline graphic, respectively. In the MD+set D ensemble, the population of state A is decreased (Inline graphic). Interestingly, a new peak appears at eRMSD >1.5 in the MD+set B and MD+set C ensembles. This peak corresponds to fully extended conformations, and it is present when including RDC measurements. If we trust our simulation, the forward model including the complex issue of estimating the alignment tensor, and the refinement procedure, we can conclude that RDC are sensitive to a small population of either dimers or unfolded structures. Unfortunately, we do not have the possibility to test this hypothesis, which is left for future investigations.

Here, instead, we focus on the results obtained from MD+set A. First, because this ensemble has the highest population of state A, and it is thus less ‘surprising’. Second, because we better understand and control the experimental data, and lastly because we can take advantage of our previous experience in using NOE for RNA ensemble refinement (13,27).

Structural differences between state A and B

Having discovered this new B-state, we proceed to analyse its structural features. While state A is known and structurally well-defined (Figures 1 and 4A), it is not trivial from a simple visual inspection to identify which are the main features of state B (Figure 4B). Here, we address this question by using a random forest classifier. In practice, we first extract samples from the MD+set A ensemble. Second, we label each conformation depending on the distance from native (state A if eRMSD<0.7, and state B otherwise). For each conformation we calculate structural properties (e.g. torsion angles, distances) that are used to train a random forest classifier. In this context we are not interested in the decision tree per se, but rather in its ability to rank the importance of the input features in the classification problem. Thus, we can find the most relevant degrees of freedom that discriminate the two states. The result of such analysis on all dihedral angles α, β, γ, δ, ε, ζ, χ in the 14-mer reveals that the two highest-ranked angles are ζ in C8 and G9. Figure 4D,E show the histograms for these two angles: the angle in C8 is a good classifier, as all samples from state A (empty curve) are in gauche + conformation, while all samples from state B (filled curve) are in gauche −. Information on state A/B differences are also contained in G9 ζ, although the separation of the two states is not as striking as in C8.

Figure 4.

Figure 4.

(A) Representative (random) conformations sampled from state A. (B) Representative (random) conformations sampled from state B. The color code is identical to Figure 1: U6 in ochre, U7 in purple, C8 in green and G9 in red. (C) Free energy surface projected onto the the U6-C8/U6-G9 distance between ring centres. The units of the colorbar are in kBT. (D) Histogram of ζ dihedral angle in C8. The open filled area indicates conformations belonging to state A, and the filled area indicates conformations belonging to state B. (E) Histogram of ζ dihedral angle in G9.

The importance of C8 and G9 is further confirmed when using the distance between the center of the six-membered rings in the nucleobases as input features. In this case, the distances between U6–C8 and U6–G9 are the two most important degrees of freedom that distinguish state A from state B. The two-dimensional log-histogram (i.e. the free energy surface) of these distances is shown in Figure 4C. In the consensus structure U6 and G9 interact through a trans sugar-Watson base-pair and U6 and C8 are stacked. On top of the free-energy surface, in Figure 4C we plot the two tetraloop structures 2KOC and 6BY5. As expected, both structures lie in the left-bottom region. Note also that the original experimental study described the presence of two sub-states in 6BY5, that are separated along the y projection in Figure 4C. State B is characterized instead by the absence of the stacking interaction (large U6–C8 distance), and of the non-Watson-Crick base-pair (large U6–G9 distance). PDB structures of both states A and B are available in supplementary material as well as in the Github repository (see data availability).

Experimental measurements sensitive to the presence of state B

In the previous section we have described the structural differences between state A and state B, and we now seek for additional experimental validation. In particular, we would like to answer the following question: does the MD+set A ensemble provide a better loop description compared to the consensus NMR structure 2KOC?

The four experimental sets contain >1000 datapoints in total, and it is therefore not trivial to identify specific measurements that probe directly the presence of state B. In machine learning this is called a feature importance problem, that we solve using a random forest classifier, as we did in the previous section. Here, however, the features are not structural properties, but back-calculated experimental data. We construct a random forest classifier for each experimental dataset, and we obtain four sub-datasets with the features (measurements) that are most important for classifying (i.e. separating) state A and B. Consistently with Figure 4, we find that these measurements involve nucleotides in the loop region.

For each measurement in the sub-datasets, we calculate the difference between the ensemble average and experiments divided by the error σ (i.e. the Z-score).

graphic file with name M31.gif (5)

Figure 5 shows the Z-score calculated on the MD+set A ensemble (x-axis) against the same quantity calculated on the 2KOC structures. Points above the diagonal (such as 1 in panel a, 3 in panel b, and 7 in panel d) indicate measurements for which the MD+set A ensemble better agrees with experimental data (|Z-score|MD + setA < |Z-score|2KOC), while points below the diagonal (e.g. 2, 4, 5, 6, 8) indicate the opposite.

Overall, there are more points below the diagonal (56) than above (40), suggesting that the original 2KOC ensemble provide a better description of the loop region. For example, the eNOE C8 H5’ to G9 H8 in dataset A is significantly closer to the experiment in the MD+set A ensemble (point 1, Figure 6). This is also true for the cross-correlated relaxation rate U7 C3’–C8 P (point 3). In other cases the 2KOC value is within experimental error, but not in our MD+set A ensemble (points 2,5 Figure 6). Note that major discrepancies are present in both ensembles, such as point 4 (RDC in C8 C1’–H1’), point 6 (RDC G10 C8–H8), and point 8 (sPRE G9 H1). Again, we stress that these discrepancies can be ascribed to errors in the ensembles, but also to inaccuracies in the empirical model employed to calculate experimental data from structures, or to errors in the data.

Figure 6.

Figure 6.

Comparison between calculated and experimental data for selected measurements discussed in the text and labelled as in Figure 5.

DISCUSSION

Based on our extensive MD simulations and integrating them with exact NOE data, we report the free energy landscape of a prototype stem-loop RNA 14-mer known as the UUCG tetraloop. By combining a recently refined force field for RNA with enhanced sampling MD we were able to fold the tetraloop to its native conformation(s) as judged by the agreement with several sets of experiments. The main finding of the present study is the presence of a low populated, non-native conformation (state B). The low-populated state differs from the consensus structure (state A) only in the loop region, and it is characterized by the absence of the tSW base-pair between U6 and G9, with C8 and G9 partially exposed into solution (Figure 4). This result has been obtained by using atomistic MD simulations and eNOE, without the need of additional data.

The free-energy surfaces and estimated population provided here are based on the available experimental data, on the employed model, and the extent of our sampling. Therefore, they are subject to inaccuracies. However, both simulations and eNOE data are consistent with the presence of the B state as described in this paper. This interpretation is qualitatively consistent with several NMR studies, that also suggested the presence of dynamics in G9 (7,12,14). Conversely, on-resonance 13C R1ρ relaxation dispersion experiments on a UUCG tetraloop with a different stem sequence showed no significant exchange contributions, indicating the absence of motions with substantial chemical shift variation in the μ-ms timescale (44). Note also that G9-exposed structures were reported in previous MD simulations (23,28,45), suggesting our finding to be robust with respect to the choice of the force-field and water model.

In this work, we have used eNOEs to reweight a posteriori the ensemble generated via enhanced sampling MD simulations. This refinement procedure is a computationally cheap post-processing (26,27,46) that allows one to try different combinations of training and test sets, as we did in this work (43). Note that refinement is in principle less powerful compared to on-the-fly methods that samples directly from the target probability distribution (47,48).

In our study, we refine the simulation by matching RDC data (set B, C) or solvent PRE (set D) as well. Only when we use set C for training we obtain an improved or equal agreement on the test sets relative to the original MD simulation (Figure 2). Additionally, different data affect the MD conformational ensemble in different ways (Figure 3). Several reasons can contribute to this behavior. First, we do not expect all experimental data to be perfectly compatible one with the other, because measurements were conducted in similar, but not identical conditions. Second, the forward models might not be accurate for arbitrary molecular conformations. For example, if the forward model accurately predicts the RDC given the native structure, but fails on unfolded/misfolded conformations, we obtain artefacts that cannot be easily accounted for in our refinement procedure. Note that this problem is typically less relevant when using experimental RDC, sPRE or chemical shift data for scoring structures (12,44,49).

Based on the above observations, and considering our previous experience with eNOE data, we here analyse in detail the results obtained using MD refined using set A (Figure 3A). The structural features that are most important to discriminate between state A and state B are identified using a random forest classifier. The problem of concisely interpret differences in biomolecular conformations has been recently pursued using a variety of machine learning methods, including linear discriminant analysis (50), decision trees (51), and others (52). In this work we extend this idea, and use back-calculated experimental data as input features for the random forest classifier. In this way, we identify individual (available) measurements that are most sensitive to the presence of state B. Note that this approach can also be used in a generative fashion to design experiments that probe the existence of specific conformational states.

We closely inspect the selected set of measurements that are sensitive to state B (Figure 5). In the majority of the cases, we find the presence of the additional state to provide a worse agreement with experiment compared to the consensus NMR structure (PDB code 2KOC) (see e.g. Figure 6, points 2, 4, 5). In other cases (Figure 6, points 1, 3, 7, instead, the MD+set A performs better than 2KOC. Several other data significantly deviate from experiments in both ensembles (Figure 6, points 6,8). This suggest the possibility that conformations that are different from state A are indeed present, but do not correspond to the state B as described in Figure 4B.

Finally, we note that the approach taken here is general and it is applicable to other RNA or protein systems (53,54). Previous characterization of slow, larger motions in RNA molecules have mostly relied on relaxation-dispersion, chemical exchange saturation transfer or related NMR experiments that probe chemical shift differences between different conformational states. We hope that the integration of MD simulations and eNOE measurements provides further opportunities for characterizing the free energy landscapes of RNA molecules.

DATA AVAILABILITY

Jupyter notebooks to reproduce the analysis and all figures are included as supporting information. The MD trajectory, together with the experimental data are hosted on github (https://github.com/KULL-Centre/papers/tree/master/2020/UUCG-dynamics-Bottaro-et-al). The Plumed input file to reproduce the simulation is hosted on the Plumed nest (55) under the accession code 19.070.

Supplementary Material

gkaa399_Supplemental_Files

ACKNOWLEDGEMENTS

The authors thank Giovanni Bussi for helpful comments. We also thank Tobias Madl, Harald Schwalbe and coworkers for useful discussions and for sharing their data.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Velux Foundations (to S.B. and K.L.-L.); Lundbeck Foundation BRAINSTRUC initiative (to K.L.-L.); University of Colorado (to B.V.); NSF [1917254] for Infrastructure Innovation for Biological Research (to B.V.). Funding for open access charge: Internal funding.

Conflict of interest statement. None declared.

REFERENCES

  • 1. Wolters J. The nature of preferred hairpin structures in 16S-like rRNA variable regions. Nucleic Acids Res. 1992; 20:1843–1850. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Bottaro S., Lindorff-Larsen K.. Mapping the universe of RNA tetraloop folds. Biophys. J. 2017; 113:257–267. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Cheong C., Varani G., Tinoco I. Jr. Solution structure of an unusually stable RNA hairpin, 5GGAC (UUCG) GUCC. Nature. 1990; 346:680. [DOI] [PubMed] [Google Scholar]
  • 4. Woese C., Winker S., Gutell R.. Architecture of ribosomal RNA: constraints on the sequence of ‘tetra-loops’. Proc. Natl. Acad. Sci. U.S.A. 1990; 87:8467–8471. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Ferner J., Villa A., Duchardt E., Widjajakusuma E., Wöhnert J., Stock G., Schwalbe H.. NMR and MD studies of the temperature-dependent dynamics of RNA YNMG-tetraloops. Nucleic Acids Res. 2008; 36:1928–1940. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Ennifar E., Nikulin A., Tishchenko S., Serganov A., Nevskaya N., Garber M., Ehresmann B., Ehresmann C., Nikonov S., Dumas P.. The crystal structure of UUCG tetraloop1. J. Mol. Biol. 2000; 304:35–42. [DOI] [PubMed] [Google Scholar]
  • 7. Nozinovic S., Fürtig B., Jonker H.R., Richter C., Schwalbe H.. High-resolution NMR structure of an RNA model system: the 14-mer cUUCGg tetraloop hairpin RNA. Nucleic Acids Res. 2010; 38:683–694. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Hall K.B. Mighty tiny. RNA. 2015; 21:630–631. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Leontis N.B., Westhof E.. Geometric nomenclature and classification of RNA base pairs. RNA. 2001; 7:499–512. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. d’Ascenzo L., Leonarski F., Vicens Q., Auffinger P.. Revisiting GNRA and UNCG folds: U-turns versus Z-turns in RNA hairpin loops. RNA. 2017; 23:259–269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Borkar A.N., Vallurupalli P., Camilloni C., Kay L.E., Vendruscolo M.. Simultaneous NMR characterisation of multiple minima in the free energy landscape of an RNA UUCG tetraloop. Phys. Chem. Chem. Phys. 2017; 19:2797–2804. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Hartlmüller C., Günther J.C., Wolter A.C., Wöhnert J., Sattler M., Madl T.. RNA structure refinement using NMR solvent accessibility data. Sci. Rep. 2017; 7:5393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Nichols P.J., Henen M.A., Born A., Strotz D., Güntert P., Vögeli B.. High-resolution small RNA structures from exact nuclear Overhauser enhancement measurements without additional restraints. Comm. Biol. 2018; 1:61. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Duchardt E., Schwalbe H.. Residue specific ribose and nucleobase dynamics of the cUUCGg RNA tetraloop motif by MNMR 13 C relaxation. J. Biomol. NMR. 2005; 32:295–308. [DOI] [PubMed] [Google Scholar]
  • 15. Salmon L., Yang S., Al-Hashimi H.M.. Advances in the determination of nucleic acid conformational ensembles. Annu. Rev. Phys. Chem. 2014; 65:293–316. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Marušič M., Schlagnitweit J., Petzold K.. RNA dynamics by NMR. ChemBioChem. 2019; 20:2685–2710. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Boelens R., Koning T., Van der Marel G., Van Boom J., Kaptein R.. Iterative procedure for structure determination from proton-proton NOEs using a full relaxation matrix approach. Application to a DNA octamer. J. Magn. Reson. 1989; 82:290–308. [Google Scholar]
  • 18. Vögeli B. The nuclear Overhauser effect from a quantitative perspective. Prog. Nucl. Mag. Res. Sp. 2014; 78:1–46. [DOI] [PubMed] [Google Scholar]
  • 19. Nichols P., Born A., Henen M., Strotz D., Orts J., Olsson S., Güntert P., Chi C., Vögeli B.. The exact nuclear overhauser enhancement: recent advances. Molecules. 2017; 22:1176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Nichols P.J., Born A., Henen M.A., Strotz D., Celestine C.N., Güntert P., Vögeli B.. Extending the Applicability of Exact Nuclear Overhauser Enhancements to Large Proteins and RNA. ChemBioChem. 2018; 19:1695–1701. [DOI] [PubMed] [Google Scholar]
  • 21. Giambaşu G.M., York D.M., Case D.A.. Structural fidelity and NMR relaxation analysis in a prototype RNA hairpin. RNA. 2015; 21:963–974. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Banás P., Hollas D., Zgarbová M., Jurecka P., Orozco M., Cheatham T.E. III, Sponer J., Otyepka M.. Performance of molecular mechanics force fields for RNA simulations: stability of UUCG and GNRA hairpins. J. Chem. Theory Comput. 2010; 6:3836–3849. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Bergonzo C., Henriksen N.M., Roe D.R., Cheatham T.E.. Highly sampled tetranucleotide and tetraloop motifs enable evaluation of common RNA force fields. RNA. 2015; 21:1578–1590. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Bottaro S., Banas P., Sponer J., Bussi G.. Free energy landscape of GAGA and UUCG RNA tetraloops. J. Phys. Chem. Lett. 2016; 7:4032–4038. [DOI] [PubMed] [Google Scholar]
  • 25. Tan D., Piana S., Dirks R.M., Shaw D.E.. RNA force field with accuracy comparable to state-of-the-art protein force fields. Proc. Natl. Acad. Sci. U.S.A. 2018; 115:E1346–E1355. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Hummer G., Köfinger J.. Bayesian ensemble refinement by replica simulations and reweighting. J. Chem. Phys. 2015; 143:243150. [DOI] [PubMed] [Google Scholar]
  • 27. Bottaro S., Bengtsen T., Lindorff-Larsen K.. Integrating molecular simulation and experimental data: a Bayesian/maximum entropy reweighting approach. Struct. Bioinform. 2020; 2112:219–240. [DOI] [PubMed] [Google Scholar]
  • 28. Kuhrova P., Banas P., Best R.B., Sponer J., Otyepka M.. Computer folding of RNA tetraloops? Are we there yet?. J. Chem. Theory Comput. 2013; 9:2115–2125. [DOI] [PubMed] [Google Scholar]
  • 29. Chen A.A., García A.E.. High-resolution reversible folding of hyperstable RNA tetraloops using molecular dynamics simulations. Proc. Natl. Acad. Sci. U.S.A. 2013; 110:16820–16825. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Wang L., Friesner R.A., Berne B.. Replica exchange with solute scaling: a more efficient version of replica exchange with solute tempering (REST2). J. Phys. Chem. B. 2011; 115:9431–9438. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Barducci A., Bussi G., Parrinello M.. Well-tempered metadynamics: a smoothly converging and tunable free-energy method. Phys. Rev. Lett. 2008; 100:020603. [DOI] [PubMed] [Google Scholar]
  • 32. Bottaro S., Di Palma F., Bussi G.. The role of nucleobase interactions in RNA structure and dynamics. Nucleic Acids Res. 2014; 42:13306–13314. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Bottaro S., Bussi G., Pinamonti G., Reisser S., Boomsma W., Lindorff-Larsen K.. Barnaba: software for analysis of nucleic acid structures and trajectories. RNA. 2019; 25:219–231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Zweckstetter M. NMR: prediction of molecular alignment from structure using the PALES software. Nat. Protoc. 2008; 3:679. [DOI] [PubMed] [Google Scholar]
  • 35. Gong Z., Schwieters C.D., Tang C.. Theory and practice of using solvent paramagnetic relaxation enhancement to characterize protein conformational dynamics. Methods. 2018; 148:48–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Boomsma W., Ferkinghoff-Borg J., Lindorff-Larsen K.. Combining experiments and simulations using the maximum entropy principle. PLoS Comput. Biol. 2014; 10:e1003406. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Beauchamp K.A., Pande V.S., Das R.. Bayesian energy landscape tilting: towards concordant models of molecular ensembles. Biophys. J. 2014; 106:1381–1390. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Bottaro S., Bussi G., Kennedy S.D., Turner D.H., Lindorff-Larsen K.. Conformational ensembles of RNA oligonucleotides from integrating NMR and molecular simulations. Sci. Adv. 2018; 4:eaar8521. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Branduardi D., Bussi G., Parrinello M.. Metadynamics with adaptive Gaussians. J. Chem. Theory Comput. 2012; 8:2247–2254. [DOI] [PubMed] [Google Scholar]
  • 40. Breiman L. Random forests. Mach. Learn. 2001; 45:5–32. [Google Scholar]
  • 41. Bentrude W.G., Hargis J.H.. Conformations of 6-membered-ring phosphorus heterocycles: the 5-t-butyl-2-oxo-1, 3, 2-dioxaphosphorinans. J. Chem. Soc. D. 1969; 19:1113b–1114. [Google Scholar]
  • 42. Pitera J.W., Chodera J.D.. On the use of experimental observations to bias simulated ensembles. J. Chem. Theory Comput. 2012; 8:3445–3451. [DOI] [PubMed] [Google Scholar]
  • 43. Orioli S., Larsen A.H., Bottaro S., Lindorff-Larsen K.. How to learn from inconsistencies: integrating molecular simulations with experimental data. Comput. Approach. Understand. Dyn. Syst.: Protein Fold. Assembly. 2020; 170:123–176. [DOI] [PubMed] [Google Scholar]
  • 44. Salmon L., Giambasu G.M., Nikolova E.N., Petzold K., Bhattacharya A., Case D.A., Al-Hashimi H.M.. Modulating RNA alignment using directional dynamic kinks: application in determining an atomic-resolution ensemble for a hairpin using NMR residual dipolar couplings. J. Am. Chem. Soc. 2015; 137:12954–12965. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Cesari A., Bottaro S., Lindorff-Larsen K., Banáš P., Sponer J., Bussi G.. Fitting corrections to an RNA force field using experimental data. J. Chem. Theory Comput. 2019; 15:3425–3431. [DOI] [PubMed] [Google Scholar]
  • 46. Graf J., Nguyen P.H., Stock G., Schwalbe H.. Structure and dynamics of the homologous series of alanine peptides: a joint molecular dynamics/NMR study. J. Am. Chem. Soc. 2007; 129:1179–1189. [DOI] [PubMed] [Google Scholar]
  • 47. Bonomi M., Heller G.T., Camilloni C., Vendruscolo M.. Principles of protein structural ensemble determination. Curr. Opin. Struct. Biol. 2017; 42:106–116. [DOI] [PubMed] [Google Scholar]
  • 48. Reißer S., Zucchelli S., Gustincich S., Bussi G.. Conformational ensembles of an RNA hairpin using molecular dynamics and sparse NMR data. Nucleic Acids Res. 2019; 48:1164–1174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Sripakdeevong P., Cevec M., Chang A.T., Erat M.C., Ziegeler M., Zhao Q., Fox G.E., Gao X., Kennedy S.D., Kierzek R. et al.. Structure determination of noncanonical RNA motifs guided by 1 H NMR chemical shifts. Nat. Met. 2014; 11:413. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Piccini G., Mendels D., Parrinello M.. Metadynamics with discriminants: a tool for understanding chemistry. J. Chem. Theory Comput. 2018; 14:5040–5044. [DOI] [PubMed] [Google Scholar]
  • 51. Brandt S., Sittel F., Ernst M., Stock G.. Machine learning of biomolecular reaction coordinates. J. Phys. Chem. Lett. 2018; 9:2144–2150. [DOI] [PubMed] [Google Scholar]
  • 52. Fleetwood O., Kasimova M.A., Westerlund A.M., Delemotte L.. Extracting molecular insights from conformational ensembles using machine learning. Biophys. J. 2020; 118:765–780. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Escobedo A., Topal B., Kunze M.B.A., Aranda J., Chiesa G., Mungianu D., Bernardo-Seisedos G., Eftekharzadeh B., Gairi M., Pieratelli R. et al.. Side chain to main chain hydrogen bonds stabilize polyglutamine helices in transcription factors. Nat. Commun. 2019; 10:2034. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Crehuet R., Buigues P.J., Salvatella X., Lindorff-Larsen K.. Bayesian-maximum-entropy reweighting of IDP ensembles based on NMR chemical shifts. Entropy. 2019; 21:898. [Google Scholar]
  • 55. Bonomi M., Bussi G., Camilloni C., Tribello G., Bonas P., Barducci A., Bernetti M., Bolhuis P.G., Bottaro S., Branduardi D. et al.. Promoting transparency and reproducibility in enhanced molecular simulations. Nat. Met. 2019; 16:670–673. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

gkaa399_Supplemental_Files

Data Availability Statement

Jupyter notebooks to reproduce the analysis and all figures are included as supporting information. The MD trajectory, together with the experimental data are hosted on github (https://github.com/KULL-Centre/papers/tree/master/2020/UUCG-dynamics-Bottaro-et-al). The Plumed input file to reproduce the simulation is hosted on the Plumed nest (55) under the accession code 19.070.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

RESOURCES