Abstract
Inspired by the recent success of scientific-discovery games for predicting protein tertiary and RNA secondary structures, we have developed open software for coarse-grained RNA folding simulations guided by human intuition. To determine the extent to which interactive simulations can accurately predict 3D RNA structures of increasing complexity and length (four RNAs with 22–47 nucleotides), an interactive experiment was conducted with 141 participants who had very little knowledge of nucleic acid systems and computer simulations, and had received only a brief description of the important forces stabilizing RNA structures. Their structures and full trajectories have been analyzed statistically and compared to standard replica exchange molecular dynamics simulations. Our analyses show that participants easily acquire the chemical intelligence needed to fold simple and nontrivial topologies, with little computer time, and this result opens the door to the use of human-guided simulations in RNA folding. Our experiment shows that interactive simulations have better chances of success when the user widely explores the conformational space. Interestingly, providing on-the-fly feedback of the root mean square deviation with respect to the experimental structure did not improve the quality of the proposed models.
Introduction
It is recognized that the function of many RNA molecules depends crucially on their 3D structures. According to Leontis classification (RNA Basepair Catalog of the Nucleic Acids Database) (1, 2), these structures exhibit a wide diversity of architectures, often including noncanonical pairs as well as triplets and quartets with 145 different basepairs. Compared to proteins, the number of experimentally resolved RNA structures is still very limited. In silico predictions can therefore help fill the gap between sequences and structures. In recent years, three series of RNA structure prediction competitions (RNA-puzzles (3, 4, 5)) have highlighted how computer predictions are best when homology reconstruction is a viable route, the experimental information is available on local basepairing from chemical probing, and the structure itself is mainly driven by Watson-Crick basepairings. Predictions of structures stabilized by non-Watson-Crick basepairs are still challenging, even when the sequence information is complemented by chemical probing data (4).
The best prediction methods currently available are those based on fragment reconstructions (6) and those including predictions of secondary structures first, followed by 3D motif assembly (7). Methods based on secondary structure predictions start by considering canonical basepairs, because they are the most abundant, and stacks of canonical basepairs making up A-RNA 3D stems. Canonical basepairs are also the best characterized for ΔΔG free energy predictions, and can therefore be used for accurate thermodynamic predictions of duplex formations (8). However, in a significant percentage of experimental RNA structures, noncanonical basepairs, triplets, and quartets, as well as pseudoknots, substantially increase the complexity of RNA 3D structures (in 28S rRNA, 15% of in-stem pairs are noncanonical, and ∼20% are long-range pairs or triplets). As a result, the combinatorial complexity of RNA increases sharply with sequence length: O(N³) for secondary structures without pseudoknots (9, 10), and between O(N⁴) and O(N⁶) for secondary structures with pseudoknots (11, 12). Altogether, RNA secondary structure prediction, including pseudoknots, has been shown to be NP-complete (13).
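To make the cubic scaling concrete, the following is a minimal sketch of a Nussinov-style dynamic program that maximizes the number of nested basepairs. It is an illustration only and is not claimed to be the algorithm of the cited methods; the pairing rule (A-U, G-C, and G-U) and the minimum loop length of three nucleotides are assumptions made for this example.

```python
def can_pair(a: str, b: str) -> bool:
    """Assumed pairing rule for this sketch: A-U, G-C, and G-U wobble."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested (pseudoknot-free) basepairs.

    The three nested loops over span, i, and k give the O(N^3) cost.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):           # span = j - i
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]                # case 1: j stays unpaired
            for k in range(i, j - min_loop):      # case 2: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

Allowing pseudoknots removes the nesting constraint that this recursion relies on, which is why the cost rises for restricted pseudoknot classes and the general problem becomes intractable.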
A complementary strategy to bioinformatics approaches entails the building of physical models by simulating the molecule’s folding according to a force field. Physical models have the advantage that the basepairing space is naturally restricted by physically accessible conformations, allowing for an arbitrarily large set of possible basepairs and generating all topologies with the same computational complexity. The limitation of physical models resides in the incomplete sampling of the conformational space, even with the most advanced enhanced simulation techniques. To investigate large structural rearrangements, like those involved in folding, a simplification of the system through coarse-graining is needed (14, 15, 16, 17, 18). Despite the fact that coarse-grained force fields are still in their infancy, simulations can complement bioinformatic predictions, by giving access to the dynamical and thermodynamical behavior of the molecule, and by identifying possible alternative conformations, metastable states, and kinetic traps (19, 20, 21).
Although more work is certainly necessary to achieve reliable RNA force fields, we present here an application of coarse-grained modeling, coupled to interactive molecular dynamics (MD) simulations, as a proof of principle of what can be accomplished when a user is given the opportunity to steer the system using a force field that describes most of the underlying physical interactions. For biomolecular systems, for which it is difficult to identify a limited set of descriptors able to capture the specificity of a given state (justifying why dihedral angle principal component analysis is often used to describe the energy landscape (22)), interactive simulations offer the possibility of exploiting the human ability to recognize patterns.
Inspired by the excellent results of Foldit (23) for predicting protein 3D structures, and EteRNA (24) for predicting RNA secondary structures, which pioneered the coupling between the powers of computer predictions and the intuition of the human intellect, we have developed open software combining interactive nonequilibrium MD simulations with the HiRE-RNA force field for folding, unfolding, or deforming structural models (25). Interactive simulations were performed with the in-house software UnityMol (26), which allows for the visualization of an MD trajectory in real time, and allows the user to change the temperature and apply forces to selected particles through a variety of hardware devices, including the ubiquitous computer mouse.
As a first test of the effectiveness of our approach, we set up an interactive experiment where 141 participants were asked to make RNA folding predictions using interactive simulations for four molecules of increasing length (22–47 nucleotides) and 3D complexity. The experiment was carried out in two successive rounds, with slight variations as detailed below. In this article, we present the basic ideas of the HiRE-RNA model and of interactive simulations, the setup of the experiment, and the prediction results. We also compare these results with fully automatic computer simulations. The software and benchmarked molecules used in the experiment are freely available on the HiRE-RNA contest page (https://hirerna.galaxy.ibpc.fr/).
Materials and Methods
We carry out interactive simulations by coupling UnityMol (http://www.baaden.ibpc.fr/umol/), a molecular visualization software for chemistry and biology, with the simulator MD engine that implements the HiRE-RNA force field (27, 28, 29, 30).
The HiRE-RNA coarse-grained RNA model
This description of the HiRE-RNA model is consistent with the explanations received by all participants before carrying out the experiment. The full presentation of the model can be found in Pasquali and Derreumaux (17) and Cragnolini et al. (25).
HiRE-RNA is an implicit solvent, implicit ion model, where each nucleotide is represented by six or seven beads (see Fig. 1) corresponding to the backbone heavy atoms P, O5′, C5′, and C4′, C1′ of the sugar, and to the center of mass of each aromatic ring of the bases (G1, G2, A1, A2, C1, U1). The force field is composed of local interactions accounting for the local stereochemistry, an excluded volume interaction giving a physical size to the beads, and nonlocal interactions for basepairing, basestacking, and electrostatics. Local interactions are composed of a harmonic potential for bond lengths and angle amplitudes, and a sinusoidal potential for dihedral angles. A fast-decreasing exponential function describes the excluded volume potential. Each phosphate bead carries one negative charge, and phosphate beads repel one another.
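The local and excluded-volume terms described above can be illustrated with generic functional forms. The sketch below is not the HiRE-RNA parameterization: the exact expressions, the function names, and all constants are hypothetical and were chosen only to match the qualitative description (harmonic, sinusoidal, fast-decreasing exponential).

```python
import math


def harmonic(x: float, x0: float, k: float) -> float:
    """Harmonic term for a bond length or an angle; zero at equilibrium x0."""
    return k * (x - x0) ** 2


def dihedral(phi: float, phi0: float, k: float) -> float:
    """Assumed sinusoidal dihedral term with its minimum at phi0 (radians)."""
    return k * (1.0 - math.cos(phi - phi0))


def excluded_volume(r: float, r_contact: float, eps: float, decay: float) -> float:
    """Assumed exponential repulsion that decays quickly beyond r_contact."""
    return eps * math.exp(-(r - r_contact) / decay)


# Hypothetical values, for illustration only (arbitrary units).
e_bond = harmonic(1.6, x0=1.5, k=100.0)
e_dihedral = dihedral(math.pi / 3, phi0=math.pi / 3, k=2.0)
e_repulsion = excluded_volume(4.0, r_contact=3.0, eps=1.0, decay=0.2)
```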
Figure 1.
Shown here are the four molecules given to participants for the folding challenge, represented in three forms: atomistically (left); in the coarse-grained representation used in the actual experiment with UnityMol (center); and by the lowest RMSD prediction made by the participants in the first round (right). PDB: 1F9L and 2G1W correspond well to the native structure with a low RMSD and the correct basepairing organization. PDB: 1N8X exhibits some nonnative basepairs. PDB: 2K96 has the correct overall shape, forming a triple helix pseudoknot, but deviates significantly in the local organization. The complete list of basepairs for the four molecules (native and predictions) is given in the Supporting Material. To see this figure in color, go online.
Both basepairing and stacking crucially depend on the relative positions and orientation of the bases. To recover the anisotropy of a base from the model’s isotropic particles, base planes are identified by the particles C1′-B1-B2 (for purines) and C4′-C1′-B1 (for pyrimidines). Both stacking and basepairing can occur between any two bases of the system. The stacking potential is minimized when the distance between bases is close to an equilibrium distance, and when the planes are parallel and vertically aligned (see Fig. 2). Basepairing occurs when two bases are side-by-side on the same plane and depends on the relative distance and orientation. To account for the multiple pairing possibilities of each base, equilibrium values depend on the bases’ species and on their orientation. In this model we account for 22 different possible pairs, including the two canonical pairs A-U and G-C, eight pairs occurring between Watson-Crick sides of any two bases (all possibilities with the exception of G-G), and 12 other pairs representing interactions that involve the Hoogsteen and Sugar edges of the base. The energy of each basepair is proportional to the number of hydrogen bonds forming the pair, which is three for G-C, and two or one for the other pairs according to the table in Cragnolini et al. (18).
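The geometric idea of recovering a base plane from three particles can be sketched as follows. The three descriptors returned here (distance, parallelism of the planes, and vertical alignment) are illustrative quantities chosen for this example; they are not the functional form of the HiRE-RNA stacking potential, and the function names are hypothetical.

```python
import numpy as np


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three particles."""
    normal = np.cross(np.subtract(p2, p1), np.subtract(p3, p1))
    return normal / np.linalg.norm(normal)


def stacking_descriptors(base_a, base_b):
    """Return (distance, parallelism, vertical alignment) for two bases.

    Each base is given as three particle positions, e.g. C1'-B1-B2.
    Parallelism is 1 for parallel planes; vertical alignment is 1 when
    base_b sits directly above base_a along the normal of base_a.
    """
    a = np.asarray(base_a, dtype=float)
    b = np.asarray(base_b, dtype=float)
    n_a = plane_normal(*a)
    n_b = plane_normal(*b)
    offset = b.mean(axis=0) - a.mean(axis=0)
    distance = float(np.linalg.norm(offset))
    parallelism = float(abs(n_a @ n_b))
    vertical = float(abs(n_a @ offset) / distance)
    return distance, parallelism, vertical
```

A stacked pair would score close to 1 on both angular descriptors, whereas two coplanar, side-by-side bases (the basepairing geometry) would have parallelism close to 1 but vertical alignment close to 0.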
Figure 2.
Given here are the internal-energy-versus-RMSD distributions for all interactive simulations, as well as for one REMD simulation. The population for full trajectories is shown in a continuous shade, whereas values from individual submitted structures are superposed as gray circles. Internal energy at finite temperature is normalized with respect to the absolute value of the energy of the minimized native structure |E0|. RMSD distributions for both full trajectories and submitted structures are presented on the horizontal histogram, whereas energy distributions are presented on vertical histograms. The pink wedge in each PMF indicates the position of the native structure (RMSD = 0, E/|E0| = −1). To see this figure in color, go online.
The HiRE-RNA force field, like any coarse-grained force field for RNAs, is still evolving: it lacks an explicit description of ions, needs further parameterization for a refined quantification of thermodynamical and dynamical quantities, and needs to be benchmarked on larger and more complex systems than the benchmark molecules used so far. However, for the experiment in this work, the goal was to have a plausible physical coarse-grained model, for which HiRE-RNA seemed adequate. Given the modular setup of interactive simulations, the molecule’s representation and force field can easily be changed.
Visualization and user interaction through the UnityMol application
UnityMol is a molecular visualization software based on the Unity3D game engine (26). It features molecule representations commonly found in this domain and serves as an experimental platform for producing specialized methods (e.g., custom polysaccharide rendering (31)).
As coarse-grained models are not easily rendered on standard software, UnityMol was modified to generate appropriate and visually appealing representations. For HiRE-RNA, bases can be rendered through ellipsoids whose orientations correspond to the planes of bases, as explained in the previous section. This makes it easier to visually detect stacking and possible basepairing. When connected to an interactive HiRE-RNA simulation, plots of selected energy terms over time yield a quantitative insight into the molecule’s stability (Fig. S5). Using a computer mouse, direct action on the simulation is possible. Force vectors are computed based on the selected atom and the cursor displacement. These forces are transmitted to the simulation engine and added to those of the force field. This scheme offers a direct, almost instantaneous, visual feedback. More details about the functionalities of UnityMol and the web application are given in the Supporting Material.
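The steering scheme can be sketched as follows. The linear (spring-like) mapping from cursor displacement to force, the force cap, and all names and default values are assumptions made for this illustration; the text above only states that the force is computed from the selected atom and the cursor displacement and then added to the force-field forces.

```python
import numpy as np


def steering_force(cursor_displacement, stiffness=1.0, max_force=10.0):
    """Map a 3D cursor displacement to a user force (assumed linear mapping).

    The force is capped at max_force so that a large mouse movement cannot
    destabilize the simulation (an assumed safeguard, not from the source).
    """
    force = stiffness * np.asarray(cursor_displacement, dtype=float)
    norm = np.linalg.norm(force)
    if norm > max_force:
        force = force * (max_force / norm)
    return force


def total_forces(force_field_forces, selected_index, user_force):
    """Add the user force to the selected particle only; input is not modified."""
    total = np.array(force_field_forces, dtype=float)  # np.array makes a copy
    total[selected_index] += user_force
    return total
```

In this sketch the MD engine would call `total_forces` once per time step, so that the user force acts alongside the physical interactions rather than replacing them.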
Setting up an RNA folding challenge as an interactive experiment
Participants in this study were two classes of third-year college students majoring in biology. During the courses held in 2015 and 2016, interactive nucleic acid simulations were integrated as a mandatory lab exercise in the bioinformatics curriculum at Paris Diderot University. The course was an introduction to numerical tools for the study of biomolecules. During the semester, students received a 2-h lecture on the analysis of biomolecular structures including a brief overview of structure prediction methods, as well as a 1-h lecture on modeling biomolecules and basic principles of MD. All participants were therefore novice users of molecular simulation techniques. Because users were only familiar with the DNA double helix, and were unaware of the folding capabilities of single-stranded nucleic acids, an overview of nucleic acid structures was given as an introduction to the lab.
Users learned how to use UnityMol and perform interactive RNA simulations through two exercises of unwinding a double helix and reforming it. They made observations on the different energy terms, with the local harmonic potential governing the response as the molecule is being pulled by an external force while basepairing and basestacking drove and stabilized folding. Users were then given 3 h to work on the HiRE-RNA folding challenge, where they had to fold four molecules of increasing complexity. The starting point of each exercise was a completely stretched-out conformation. Users could launch an interactive MD simulation with Langevin dynamics for friction. The launching applet allowed users to choose the temperature, which could then be changed by pausing the simulation and relaunching it with a different T value.
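For readers unfamiliar with Langevin dynamics, one integration step can be sketched as below. This is a simple first-order discretization written for illustration, not the integrator of the simulation engine used in the experiment; the function name, the unit system (kcal/mol for energies), and all parameter values are assumptions.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def langevin_step(x, v, force_fn, mass, gamma, temperature, dt, rng):
    """Advance positions x and velocities v by one time step dt.

    The friction term -gamma * v removes kinetic energy, and the random
    kicks restore it so that the system samples the target temperature.
    """
    force = force_fn(x)
    noise = rng.standard_normal(x.shape)
    kick = np.sqrt(2.0 * gamma * KB * temperature * dt / mass) * noise
    v_new = v + dt * (force / mass - gamma * v) + kick
    x_new = x + dt * v_new
    return x_new, v_new
```

Lowering `temperature` shrinks the random kicks, which is why relaunching a simulation at a lower T reduces structural fluctuations.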
Users were given instructions to select up to five conformations that could correspond to the native structure of the molecule. Their selection was submitted to a server and entered in the competition. In 2015, the root mean square deviation (RMSD) of the generated structures, with respect to the experimental analogs, was given to the users at the end of the competition, whereas in 2016 the server indicated the score of the structure and RMSD, immediately upon submission, giving users a real-time assessment of the validity of their structures.
We will refer to the 2015 simulations as the “nonfeedback experiment” and to the 2016 simulations as the “feedback experiment”. Because the strategies adopted for the nonfeedback experiment and the feedback experiment were different, we have analyzed both rounds separately. For our subsequent analysis, all submitted structures were recovered from the server, as well as the full trajectories that were physically recovered from each machine.
Four RNA molecules of increasing complexity. The four molecules consist of a simple hairpin, a hairpin with an asymmetric bulge, an H-pseudoknot, and a triple helix pseudoknot (Fig. 1).
PDB: 1F9L is a hairpin of 22 nucleotides including six canonical basepairs, one Hoogsteen G-U pair, and two A-G pairs (32). According to Leontis classification all pairs are cis Watson-Crick/Watson-Crick (see the Supporting Material for a detailed list). PDB: 1N8X is a 36-nucleotide hairpin, which has one asymmetric bulge (33). The native conformation is stabilized by 14 cis Watson-Crick/Watson-Crick basepairs, including one A-G pair adjacent to the bulge and one G-U pair in proximity of the hairpin loop. PDB: 2G1W is a 22-nucleotide simple pseudoknot, composed of seven canonical G-C basepairs (34). PDB: 2K96 is a 47-nucleotide triple helix pseudoknot (35). It consists of a Watson-Crick double helix and an A-rich dangling strand inserting into the WC helix groove and forming several stacked triplets. Overall, 21 basepairs, canonical and noncanonical, stabilize the native structure, including six triplets (five A-U-A and one C-G-A).
These four sequences, starting from fully elongated states, were previously folded (17, 18) by long nonbiased simulated tempering and replica exchange MD (REMD) simulations with the HiRE-RNA force field. The simulations located the global free energy minimum at 3.2, 3.8, 4.3, and 4.3 Å from the experimental state, for PDB: 1F9L, 1N8X, 2G1W, and 2K96, respectively.
Analysis of the participants’ performance. We carried out two separate analyses to study the usefulness of interactive simulations in addressing the question of RNA folding. The first focuses only on the structures submitted by the users to the online server and assesses whether a naive user can correctly produce folded structures and recognize them as such. The second analysis focuses on the full trajectories generated by each participant and investigates how the molecule’s conformational space is explored.
The two quantities used to compare submitted structures and trajectories to the native conformation are the RMSD (computed on all beads of the coarse-grained representation), and basepairing. For RMSD we have used a cutoff of 6 Å to detect structures corresponding to the native state. This value comes from our experience from previous simulations with HiRE-RNA at physiological temperatures, where the RMSD can fluctuate by ∼6 Å while preserving all correct basepairs and overall fold. This criterion gives only a rough estimate of the correspondence between two structures, as even lower RMSD values do not necessarily imply correct basestacking or basepairing. Pairs of bases with at least 10% of the maximal interaction energy between the two bases were characterized as basepairs.
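The RMSD criterion can be sketched as follows. The optimal superposition before computing the RMSD (Kabsch algorithm) is an assumption of this example, since the text does not detail the fitting procedure, and the function names are hypothetical. The 6 Å cutoff is the one stated above.

```python
import numpy as np


def kabsch_rmsd(coords_a, coords_b):
    """RMSD between two bead sets after optimal rigid-body superposition."""
    p = np.asarray(coords_a, dtype=float)
    q = np.asarray(coords_b, dtype=float)
    p = p - p.mean(axis=0)                      # remove translation
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(p.T @ q)           # covariance matrix SVD
    if np.linalg.det(u @ vt) < 0:               # forbid improper rotations
        s[-1] = -s[-1]
    msd = (np.sum(p ** 2) + np.sum(q ** 2) - 2.0 * s.sum()) / len(p)
    return float(np.sqrt(max(msd, 0.0)))


def success_percentage(rmsds, cutoff=6.0):
    """Percentage of structures with RMSD strictly below the cutoff (in Å)."""
    rmsds = np.asarray(rmsds, dtype=float)
    return 100.0 * float(np.mean(rmsds < cutoff))
```

As noted in the text, a structure passing this cutoff is only a candidate for the native state; basepairing must be checked separately.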
For all trajectories, we analyzed structures from frames taken every 4 ps. We monitored the total internal energy given by the HiRE-RNA potential, which we then normalized with respect to the energy of the native structure, the overall number of basepairs, as well as the number of native basepairs. To better detect basepairs, we smoothed fluctuations using a moving window over several subsequent frames as described in Stadlbauer et al. (19). To give a more accurate, yet concise, description of the molecule’s architecture, we also looked at its topology starting from the list of detected basepairs, as defined in Gan et al. (36) and Fera et al. (37). Further details of the analysis procedure are given in the Supporting Material.
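The topological descriptor based on the graph Laplacian can be illustrated with the sketch below. Note the simplification: the cited graph representations use secondary-structure elements as graph vertices, whereas this example builds a nucleotide-level graph (backbone links plus detected basepairs) purely for illustration. The function name is hypothetical.

```python
import numpy as np


def laplacian_second_eigenvalue(n_nucleotides, basepairs):
    """Second-smallest eigenvalue of the Laplacian of a nucleotide graph.

    Vertices are nucleotides; edges are backbone links (i, i+1) and the
    detected basepairs (0-based index pairs). Simplified graph, see lead-in.
    """
    adjacency = np.zeros((n_nucleotides, n_nucleotides))
    for i in range(n_nucleotides - 1):
        adjacency[i, i + 1] = adjacency[i + 1, i] = 1.0
    for i, j in basepairs:
        adjacency[i, j] = adjacency[j, i] = 1.0
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues = np.linalg.eigvalsh(laplacian)  # sorted in ascending order
    return float(eigenvalues[1])


# Normalized descriptor: ratio to the value of a reference (native) pairing.
native = laplacian_second_eigenvalue(4, [(0, 3)])
unfolded = laplacian_second_eigenvalue(4, [])
ratio = unfolded / native
```

The second eigenvalue grows as the graph becomes more connected, so a ratio close to 1 indicates a basepairing network with compactness similar to that of the reference.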
Results and Discussion
Participants predict a significant proportion of native folds
Overall, participants submitted between 80 and 200 structures, depending on the molecule and the year. Not all participants used all five attempts at their disposal. A summary of the results of submitted structures is reported in Table 1. For reference, we also provide predictions by the commonly used McSym (7) and Vfold (38) programs. For each molecule, we report the basepairing of the lowest RMSD structure proposed by the students next to the details of the basepairing of the experimental structure in the Supporting Material.
Table 1.
Statistics for Submitted Structures
| Molecule (PDB:) | Nonfeedback: Number of Structures | Nonfeedback: ρ % | Nonfeedback: Lowest RMSD (Å) | Nonfeedback: RMSD Peaks (Å) | Feedback: Number of Structures | Feedback: ρ % | Feedback: Lowest RMSD (Å) | Feedback: RMSD Peaks (Å) | McSym: RMSD (Å) | Vfold: RMSD (Å) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1F9L | 88 | 50 | 2.0 | 4; 7 | 203 | 25 | 2.1 | 6; 12 | 4.0 | 3.75 |
| 1N8X | 90 | 10 | 4.5 | 5; 10 | 207 | 5 | 2.6 | 8; 12 | 5.5 | 3.7 |
| 2G1W | 96 | 13 | 3.8 | 6; 12 | 168 | 3 | 5.7 | 8; 12 | 12.9 | na |
| 2K96 | 74 | 13ᵃ | 11.4 | 11; 15 | 119 | 8ᵃ | 11.2 | 11; 15 | 33 | na |
For each round, we report the following: the total number of structures submitted by the participants, the percentage ρ of structures with RMSD below 6 Å, the lowest RMSD among all submissions, and the approximate values of the first two peaks of the distribution in RMSD of all structures (the full distribution is visible in gray in Fig. 2, horizontal histograms).
ᵃ For PDB: 2K96, we give a looser definition of the percentage of success and we consider the number of structures exhibiting the native topology.
As a reference, we also report RMSD values of structures folded with the two bioinformatics programs McSym and Vfold (accessible on-line). Value averages are computed over the 10 best structures according to the programs. For the two pseudoknots, McFold/McSym was not able to predict the correct topology despite having allowed the search for H-shaped pseudoknots. Instead, it proposed hairpins. Vfold found the correct secondary structure, but yielded errors when attempting to build 3D structures based on the pseudoknotted secondary structure, not finding suitable motifs.
The ratio of predicted structures of RMSD lower than 6 Å (ρ) varies significantly with the molecule. The best predictions, as expected, were obtained for the simple hairpin (PDB: 1F9L), where almost one-half of the submissions corresponded to the native state in the nonfeedback experiment and one-quarter in the feedback experiment, and for which the lowest RMSD structures in both experiments have basepairing identical to native.
Molecule 2 (PDB: 1N8X) was harder to predict than molecule 1 because of the asymmetric bulge in its middle region. Most submitted structures include seven correctly paired bases (lower stem). Some structures predicted the correct basepairs but resulted in distorted overall shapes, bringing the RMSD to ∼10 Å, and were therefore not included in ρ. Other high RMSD structures also exhibit a high number of nonnative basepairs, and were folded into alternative low-energy structures that differ from the experimental configuration. The lowest RMSD structures have 9/14 native basepairs for the nonfeedback experiment, and 10/14 for the feedback experiment.
Molecule 3 (PDB: 2G1W) has a markedly doubly peaked distribution. The lowest peak corresponds to the formation of the two stems in the pseudoknot configuration, whereas the higher peak corresponds to only one of the stems being formed. This is in agreement with results from REMD simulations. Most structures predicted the formation of one of the stems and also included some nonnative pairs, achieving alternative compact structures. These are mainly mismatched (noncanonical) hairpins. The lowest RMSD structures have 7/7 native basepairs for the nonfeedback experiment and 4/7 for the feedback experiment.
Given the size and complexity of molecule 4, we did not expect users to be able to fully predict its structure. We were, however, interested in testing how far they could come in proposing a plausible structure with the correct topology. In both years, 10 structures were submitted with RMSD between 11 and 12 Å that corresponded to the topology of the pseudoknot. Other structures included the WC helix, but did not fold into a pseudoknot, leaving a dangling end. The distribution in RMSD exhibits a small peak at 12 Å and is otherwise rather flat, showing that users did not find an alternative structure; instead, all other proposed structures widely sampled the more or less unfolded states. The lowest RMSD structures are also the ones with the most native basepairs, with 9/21 native basepairs for the nonfeedback experiment and 12/21 for the feedback experiment.
When we analyzed the submitted structures based on their internal energy, we systematically found some structures with the lowest RMSD and highest number of native basepairs among the 15 lowest energy submissions (Table 2). This is an encouraging result, as in a blind prediction one would usually focus on the lowest energy structures.
Table 2.
Lowest Energy Structures and Native Basepair Percentage for Submitted Structures
| Molecule (PDB:) | Nonfeedback: Best RMSD (Å) | Nonfeedback: Native BP | Feedback: Best RMSD (Å) | Feedback: Native BP |
|---|---|---|---|---|
| 1F9L | 10/15 ≤ 5 | 6/15 ≥ 0.75 | 12/15 ≤ 5 | 7/15 ≥ 0.75 |
| 1N8X | 7/15 ≤ 6 | 5/15 ≥ 0.75 | 4/15 ≤ 6 | 3/15 ≥ 0.75 |
| 2G1W | 3/15 ≤ 6 | 7/15 ≥ 0.75 | 2/15 ≤ 8 | 8/15 ≥ 0.75 |
| 2K96 | 6/15 ≤ 11 | 5/15 ≥ 0.40 | 5/15 ≤ 12 | 2/15 ≥ 0.40 |
For each molecule, we analyze the 15 lowest-energy structures submitted by the users and we report on the number of structures with the lowest RMSD values, and the number of structures with a high percentage of native basepairs. The choice of 15 lowest-energy structures is arbitrary, but is in the range of what is typically analyzed by prediction methods.
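The counting behind Table 2 can be sketched as follows: rank the submissions by internal energy, keep the lowest ones, and count how many fall below an RMSD threshold. The function name and the example numbers are hypothetical; only the procedure follows the description above.

```python
def count_good_among_lowest_energy(energies, rmsds, n_lowest=15, rmsd_cutoff=6.0):
    """Count structures with RMSD <= rmsd_cutoff among the n_lowest energies."""
    ranked = sorted(range(len(energies)), key=lambda i: energies[i])
    selected = ranked[:n_lowest]
    return sum(1 for i in selected if rmsds[i] <= rmsd_cutoff)


# Toy data (not from the experiment): four submissions, keep the three lowest.
energies = [-5.0, -1.0, -3.0, -4.0]
rmsds = [2.0, 1.0, 9.0, 5.5]
n_good = count_good_among_lowest_energy(energies, rmsds, n_lowest=3)
```

The same ranking can be reused for the native-basepair column by replacing the RMSD test with a threshold on the fraction of native basepairs.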
The combined results for both experiments show that simple molecules could be folded quickly and easily by a large percentage of users, whereas molecules with more intricate structures are clearly harder to predict. Still, a significant portion of users were able to generate the native conformation and recognize it as such in ∼30 min of interactive simulation. In addition, it was possible to generate alternative conformations and test them for stability. Even for very complex architectures, such as the triple helix, some users were able to predict the correct topology of the molecule. This is particularly remarkable, as none of them had any prior experience with RNA structures, other than double helices and hairpins.
The users’ strategy in the feedback experiment was different from that in the nonfeedback experiment. Indeed, the number of submissions in the feedback experiment was roughly twice that of the nonfeedback experiment. Because the number of users in the two years was comparable, users in the feedback experiment submitted more structures each than their nonfeedback experiment colleagues. In the feedback experiment, a significant percentage of submitted structures have a high RMSD, suggesting that users submitted one or two randomly chosen structures just to test how far they were from the correct solution and used this information for completing the challenge. Interestingly, this real-time feedback does not seem to give any particular advantage in the prediction of the folded structure, as can be observed by comparing all statistical quantities in Table 1 between the nonfeedback experiment and the feedback experiment. If anything, one can argue that results in the nonfeedback experiment are better than those of the feedback experiment. This is reminiscent of the observation made with Foldit that players could move from one basin to another through their ability to ignore a quantitative score (23).
Humans explore phase space more broadly than automated approaches
Having retrieved single trajectories from each user’s machine, we have analyzed the full exploration of the conformational space of each simulation with the goal of understanding the contribution of interactive simulations over regular, enhanced sampling, simulations. Results were assessed after merging all trajectories together, keeping the distinction between the nonfeedback experiment and the feedback experiment, and comparing them to the results from REMD simulation, performed on a computer cluster using 32 replicas, spanning from 250 to 500 K. For REMD simulation we have analyzed the structures of one low temperature replica, corresponding to a temperature below melting where the native state is present, if not dominant. An example of a participant’s single trajectory is presented in Figs. S5 and S6.
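For context, the two ingredients of a REMD setup can be sketched as below: a temperature ladder and the standard Metropolis criterion for exchanging two replicas. The geometric spacing of the 32 temperatures between 250 and 500 K is an assumption of this example, since the spacing is not specified above; the energy unit (kcal/mol) and the function names are likewise assumptions.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is an assumption


def temperature_ladder(t_min=250.0, t_max=500.0, n_replicas=32):
    """Geometrically spaced temperatures (assumed spacing) from t_min to t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def swap_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for exchanging replicas i and j.

    P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) with beta = 1 / (kB * T).
    """
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return math.exp(min(0.0, delta))


ladder = temperature_ladder()
```

An exchange is always accepted when the colder replica currently holds the higher energy, which is how low-energy conformations found at high temperature migrate down the ladder.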
Sampling is focused on low energy conformational space
Fig. 2 illustrates the distributions of internal energy versus RMSD. Interactive simulations focus sampling on low energy conformational space. Most of the structures in the full trajectories are well above the native internal energy, but structures picked by participants have lower energies, as shown by the height of the gray peaks in population density (both for RMSD and energy) compared to the blue peaks extracted from full trajectories. For low RMSD structures, these energies are close to the native energy. Selected structures’ energies are generally low because users spontaneously proceeded in a sort of simulated tempering by restarting the simulation at different temperatures. When they thought a structure was close to the native one, they stopped the simulation and relaunched it with a lower temperature, as low as 10 K, to reduce fluctuations and perform small adjustments to the structure. They then raised it back to room temperature on the refined structure to test for its stability.
In the nonfeedback experiment, users sampled extensively different basins, including the exact native state for all molecules except the triple helix (PDB: 2K96). In the feedback experiment, users sampled more uniformly the conformational space in RMSD and internal energy. This can be observed by the presence of several well-separated population peaks in the plots of the nonfeedback experiment, whereas a more uniform diagonal shade is observed for the feedback experiment. It appears that the instantaneous assessment provided in the feedback experiment led to a gradual decrease in RMSD, but prevented users from exploring disconnected basins. This can explain why, in the feedback experiment, users were less successful at folding than in the nonfeedback experiment.
The details of the results vary from molecule to molecule. For PDB: 1F9L, in the nonfeedback experiment users sampled extensively at least three different basins, as it appears from the three distinct peaks in population density, whereas they sampled more connected basins in the feedback experiment, remaining further away from the native state. For PDB: 1N8X, the full trajectories of the nonfeedback experiment remained globally at a higher internal energy than those of the feedback experiment. However, in the nonfeedback experiment users were able to reach lower energy states with a better correspondence to the native structure and select them as candidates for native. The same is true for PDB: 2G1W. Interestingly for this molecule, in the feedback experiment users did sample a basin at 6 Å RMSD, corresponding to the native state, but they did not select these structures as possible native candidates. In the nonfeedback experiment, this region was less explored, but recognized as native by a dozen users. As a general trend, in the nonfeedback experiment users explored a wider energy range. They seem to have sampled lower energy states than in the feedback experiment and chose these states for their submission.
For comparison, trajectories from REMD simulations spent most of their time exploring the unfolded states and, despite the presence of low temperatures, did not minimize the energy as effectively as interactive simulations. Still, a peak corresponding to the native structure is clearly visible, even though it represents only a small fraction of the overall population and its internal energy is similar to other states.
Basepairing and topology measure native fold propensity
To assess whether an RNA structure is correctly folded, it is important to consider also the basepairing network, and not simply the RMSD. For the nonfeedback experiment, which, based on the previous analysis and discussion, we consider the most interesting, we have analyzed the details of basepairing. Results are reported in Fig. 3. For each molecule, we analyzed the overall number of basepairs, the number of native basepairs, and two topological parameters, allowing comparison of the general features of the basepairing network to that of the native structure.
Figure 3.
Given here is the basepairs-versus-RMSD analysis for the nonfeedback experiment: the number of detected basepairs (left), the percentage of native basepairs (center), and the molecule’s topology (right) as defined by the second eigenvalue of the Laplacian matrix. Eigenvalues are normalized with respect to the second eigenvalue of the Laplacian matrix λ0 for the native structure. In the central and right columns, the pink wedge corresponds to the position of the native structure (RMSD = 0; % native pairs = 100; λ/λ0 = 1). To see this figure in color, go online.
For all molecules, we can observe that trajectories focused on configurations with a relatively high number of basepairs. This is particularly evident for PDB: 1F9L and 1N8X, where we can observe a peak of the distribution of basepairs at values close to the native number of pairs (first column, vertical histogram in blue). The number of native basepairs, however, is low. Only a negligible percentage of all trajectories explore conformations with exactly the same basepairs as the native structure (second column). Interestingly, these structures were chosen for submission and indeed correspond to the best predictions also in terms of RMSD. A possible explanation for the choice of the users comes from the observation of the stability of the molecule, which is not captured by the instantaneous structure they submitted. Indeed, native states are generally more stable than other states, as observed by our previous computer simulation studies for the same molecules. Additionally, users had the tendency to submit structures that remained stable in the simulation.
For PDB: 1N8X and 2G1W, we can observe that one native stem is clearly explored in the trajectories. This corresponds to the lower stem for PDB: 1N8X, formed by seven pairs, and one or the other of the stems of PDB: 2G1W, which can both be composed of four pairs. Trajectories of PDB: 2K96 explore configurations with a wide range of basepairs, with the number of native basepairs not exceeding 40%. There is not a clear peak of the distribution, but all options seemed to be explored rather uniformly.
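The two basepair counts discussed above (total number of detected pairs and percentage of native pairs) can be illustrated with a minimal Python sketch. The pair lists below are hypothetical, the function names are ours, and treating a basepair as an unordered pair of residue indices is an assumption; the sketch does not reproduce the basepair detection criteria used in the study.

```python
def normalize(pairs):
    """Represent each basepair as an order-independent set of two residue indices."""
    return {frozenset(p) for p in pairs}


def basepair_stats(model_pairs, native_pairs):
    """Return (number of detected pairs, percentage of native pairs recovered)."""
    model = normalize(model_pairs)
    native = normalize(native_pairs)
    if not native:
        raise ValueError("native structure has no basepairs")
    recovered = len(model & native)
    return len(model), 100.0 * recovered / len(native)


# Hypothetical 5-pair native stem, and a model sharing only some of its pairs.
native = [(1, 20), (2, 19), (3, 18), (4, 17), (5, 16)]
model = [(20, 1), (2, 19), (3, 17), (6, 15)]
n_pairs, pct_native = basepair_stats(model, native)
```

A model can therefore show a number of pairs close to the native count while recovering only a small percentage of the native pairs, which is the situation described for most trajectories.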
The comparison of topological values (column 3) gives a measure of the extent to which the basepairing organization of generated structures (trajectories or submitted) corresponds to the native secondary structure. For PDB: 1F9L and 1N8X the topology explored in the trajectories, and even more so that of selected structures, corresponds well to the native topology, suggesting that most users focused on the hairpin as their prediction for the molecule’s architecture. Indeed most trajectories focus on a topological parameter (λ, see Supporting Material) equal or close to native, and selected structures are very strongly peaked at the correct eigenvalue. If we consider dual graph topological parameters (37), for PDB: 1N8X 17% of all trajectories and 33% of selected structures share the native values of the number of vertices and second eigenvalue of the Laplacian matrix, indicating that the overall basepair organization and stem-loop organization of the explored configurations correspond to native. For PDB: 2G1W, analysis of topological parameters shows that full trajectories focused on configurations of topologies different from the pseudoknot (indeed most users tried to form hairpins). However, submitted structures were also chosen from conformations of the correct topology, as it is shown by a peak of the distribution for λ/λ0 ∼ 1. For PDB: 2K96 the best predictions have the native value λ ∼ λ0, supporting the observation that even though the details of the structures are not predicted correctly, the overall organization of submitted structures corresponds to native.
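The topological parameter used above is the second eigenvalue λ of the graph Laplacian, reported relative to its native value λ0. A minimal Python sketch of this calculation follows. The example graphs are hypothetical, the function names are ours, and the Jacobi eigenvalue routine is only a dependency-free stand-in for a linear algebra library; the graph construction actually used (see Supporting Material and (37)) is not reproduced here.

```python
import math


def laplacian(n_vertices, edges):
    """Graph Laplacian L = D - A of an undirected graph given as (i, j) edges."""
    lap = [[0.0] * n_vertices for _ in range(n_vertices)]
    for i, j in edges:
        lap[i][i] += 1.0
        lap[j][j] += 1.0
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
    return lap


def symmetric_eigenvalues(matrix, tol=1e-24, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, ascending."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the squared off-diagonal norm is negligible.
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle (|theta| <= pi/4) that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                # Apply A <- J^T A J: first the columns p and q, then the rows.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def second_eigenvalue(n_vertices, edges):
    """Second-smallest Laplacian eigenvalue (zero for a disconnected graph)."""
    return symmetric_eigenvalues(laplacian(n_vertices, edges))[1]


# Hypothetical graphs: a 3-vertex chain as "native", a 3-vertex ring as "model".
lambda_0 = second_eigenvalue(3, [(0, 1), (1, 2)])
lambda_model = second_eigenvalue(3, [(0, 1), (1, 2), (2, 0)])
ratio = lambda_model / lambda_0  # compared against 1, the native value
```

A ratio close to 1 indicates that the connectivity of the model graph resembles that of the native graph, whereas a different ratio, as in this example, signals a different overall organization.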
Interactive RNA folding opens new opportunities
The fact that participants were quite successful in folding the four molecules and in exploring phase space broadly opens the prospect for applications. In research, such interactive simulations on unknown targets may provide a complementary means to generate a pool of plausible structures. In combination with experimental data, this can be a powerful tool to refine structural models. Teaching is another promising application area, as we noticed that many of the complex concepts associated with RNA conformational flexibility were easily grasped by the participants. The interactive approach is also a wonderful tool for outreach activities.
A key question for research applications is whether one is able to select the correctly folded structures from the pool of all submissions. This dataset suggests that structural clustering of the solutions, combined with a low energy filter, should lead to a good selection of candidate structures. In that context, it should also be recalled that in some sense our experimental setup was not ideal, because the participants were only allowed quite basic tools, without 3D visualization of the structures or 3D input devices that would facilitate manipulation in space. Furthermore, the time available for the experiment was limited. In particular, for the more complex PDB: 2K96 molecule, this limitation had an impact on what could be achieved. Another promising avenue for future extension would be to implement collaborative strategies whereby users would be able to work not only individually but also collectively. This route has been taken successfully by Foldit through a scripting tool that allows players to share their strategies (39). Furthermore, one could imagine several participants working on distinct parts of one molecule at the same time, or cross-checking each other’s solutions.
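The selection strategy suggested above, structural clustering combined with a low energy filter, can be sketched in Python as follows. This is only an illustration under stated assumptions: the greedy clustering scheme, the cutoff values, the function name, and the energies and pairwise RMSD matrix are all hypothetical, and this is not the analysis performed in this work.

```python
def select_candidates(energies, rmsd, energy_cutoff, rmsd_cutoff):
    """Filter submissions by energy, then cluster them greedily by pairwise RMSD.

    energies: one energy per submitted structure.
    rmsd: symmetric matrix (list of lists) of pairwise RMSD values.
    Returns a list of (representative index, member indices), largest cluster first.
    """
    # Low energy filter; survivors are visited from lowest to highest energy.
    kept = sorted(
        (i for i, e in enumerate(energies) if e <= energy_cutoff),
        key=lambda i: energies[i],
    )
    clusters = []
    for i in kept:
        for rep, members in clusters:
            if rmsd[i][rep] <= rmsd_cutoff:
                members.append(i)
                break
        else:
            # No existing representative is close enough: start a new cluster.
            clusters.append((i, [i]))
    # Stable sort: equally sized clusters keep the lower-energy representative first.
    clusters.sort(key=lambda c: -len(c[1]))
    return clusters


# Hypothetical data: five submissions, energies in arbitrary units, RMSD in Å.
energies = [-50.0, -48.0, -20.0, -47.0, -45.0]
rmsd = [
    [0.0, 1.5, 9.0, 2.0, 8.0],
    [1.5, 0.0, 9.5, 2.5, 7.5],
    [9.0, 9.5, 0.0, 9.2, 6.0],
    [2.0, 2.5, 9.2, 0.0, 8.3],
    [8.0, 7.5, 6.0, 8.3, 0.0],
]
candidates = select_candidates(energies, rmsd, energy_cutoff=-40.0, rmsd_cutoff=3.0)
```

Because each cluster is seeded by its lowest-energy member, the representative of a populated cluster is a natural candidate structure, while singleton clusters preserve the plurality of alternative folds.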
These preliminary results on molecular explorations by interactive simulations are encouraging, especially with a view to use by the research community, where the average user would already have some prior knowledge of the possible motifs of nucleic acid systems and possibly of modeling. One of the main directions of our development of HiRE-RNA and UnityMol is to include different sources of experimental information. In this simulation software, it is possible to include local restraints such as basepairs, including information from secondary structure predictions, crystallographic data of subparts of the molecule, and preliminary NMR data. In a new version, soon to be released, it will be possible to include the on-the-fly calculation of theoretical small-angle x-ray scattering curves and compare them to a target curve as the simulation proceeds. The introduction of indirect experimental data to guide simulations has to be done with the awareness that a successful strategy is to explore very different regions of the conformational space, and not to focus on a restricted region of one single parameter. Indeed, the knowledge of a score with respect to a target structure, such as the RMSD in the feedback experiment, appears to limit the conformational space that is explored by the users, who then tend to remain in a single region of good scores instead of exploring more widely.
Conclusions
The strategy presented here fundamentally builds upon the interactive MD family of approaches (40), yet provides many significant improvements, in particular the introduction of crowdsourcing tools for harvesting user contributions. To our knowledge, this is the first large-scale participative experiment at a coarse-grained level of representation, whereas alternative approaches such as Foldit focus on all-atom models. We previously demonstrated that the coarse-grained level provides particular opportunities for interactive simulations (40). In our approach, the physically sound simulation of the conformational dynamics is at the center of the experiment and is guided by the user; in other folding challenges, the user's 3D puzzle is the focal point, with limited contribution from modeling, which relies instead on instantaneous minimization. For our purpose, we extend the existing interactive MD protocol with the possibility to steer simulation parameters, such as temperature, or to exchange experimental data used as additional constraints on the simulation. We provide several adapted real-time analyses, such as live plots of relevant quantities to monitor the simulation, on-the-fly topology and secondary structure graphs, as well as the generation of experimentally relevant information, such as a small-angle x-ray scattering intensity profile. Overall, we propose an open design that others can build on for similar experiments, providing, among others, a convenient web application to harvest and manage participants’ contributions.
The main result of our experiment is that, through interactive simulations and a simplified representation of the molecule, naive users were able to successfully predict native RNA folds. The use of interactive nonequilibrium MD simulations, with the possibility of monitoring certain features such as internal energies in real time, not only allows the participants to explore the conformational space more widely than standard REMD computer simulations do, but also leads to the identification of nativelike structures and a more thorough exploration of the corresponding basins. The plurality of proposed structures is an advantage in folding predictions. Given the variability of experimental conditions that cannot be accounted for in simulations, the ability to quickly produce plausible alternative structures is indeed a valuable feature in the context of real scientific research, in which the target structure is unknown and where possible conformations have to be selected based on their agreement with indirect experimental information. Submission of several different structures is also a winning strategy in RNA and protein folding competitions. A collective assessment of all energy and basepairing plots from the nonfeedback experiment leads to the observation that there is no straightforward correlation between energy and RMSD, nor between the number of basepairs and RMSD, yet the participants could reach a high success rate. Comparing the results from the nonfeedback and feedback experiments, we find no increase in the success rate. This is interesting but also somewhat expected, because a single parameter is not sufficient to detect the native state. Participants were able to guide their molecules to native basins and to select nativelike conformations by acquiring chemical and physical intelligence that standard computer simulations, based on the equations of motion and energy calculations, do not possess. This observation makes a strong argument for the pursuit of hybrid methods, where the power of computers is combined with the creativity of humans.
With the amazing variety of RNA structures, there are many more RNA folding challenges than those presented here, biologically more interesting and more intriguing for predictions. However, our goal here was to demonstrate that naive users, with little background in nucleic acid structures and modeling, could, in just one day, go from learning to use the software to proposing solutions in good agreement with experimental structures. Our ultimate goal is to provide this open software to experts in RNA structure and function, who are aware of the complexity of the structural details of single-stranded nucleic acids, have a good knowledge of the NDB, and can test their intuitions very rapidly without having to enumerate an excessively large number of conformations.
Author Contributions
L.M. analyzed data and contributed to writing the paper. S.D. contributed the interactive simulation and visualization tools, developed the website, managed data collection, and contributed to writing the paper. C.G. helped in setting up the lab. P.D. contributed to developing the coarse-grained model, wrote the MD simulation code, and contributed to writing the paper. A.T. contributed to designing the in-class experiment. M.B. designed the simulation tools, and contributed to designing the research setup and to writing the paper. S.P. designed the coarse-grained model, designed the research setup, and wrote the paper.
Acknowledgments
We thank all third-year biology majors of 2015 and 2016 at Paris Diderot University for enthusiastically taking part in this experiment. We also thank the first-year students of the “Frontières du Vivant” curriculum at Paris Descartes University, as well as the students of the MOOC “Bases Moléculaires de la Vie”, for testing our interactive simulation software and folding challenge, which helped us set up the experiment properly.
L.M. was supported by the “Initiative d’Excellence” program from the French government, project “DYNAMO”, under ANR-11-LABX-0011-01. S.D. was supported by the French National Agency for Research, under proposals “ExaViz” (ANR-11-MONU-003) and “GRAL” (ANR-12-BS07-0017).
Editor: Tamar Schlick.
Footnotes
Liuba Mazzanti and Sébastien Doutreligne contributed equally to this work.
Supporting Materials and Methods, seven figures, and four tables are available at http://www.biophysj.org/biophysj/supplemental/S0006-3495(17)30616-1.
References
- 1. Berman H.M., Westbrook J., Zardecki C. The nucleic acid database. Acta Crystallogr. D Biol. Crystallogr. 2002;58:889–898. doi: 10.1107/s0907444902003487.
- 2. Leontis N.B., Stombaugh J., Westhof E. The non-Watson-Crick base pairs and their associated isostericity matrices. Nucleic Acids Res. 2002;30:3497–3531. doi: 10.1093/nar/gkf481.
- 3. Cruz J.A., Blanchet M.F., Westhof E. RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction. RNA. 2012;18:610–625. doi: 10.1261/rna.031054.111.
- 4. Miao Z., Adamiak R.W., Westhof E. RNA-Puzzles Round II: assessment of RNA structure prediction programs applied to three large RNA structures. RNA. 2015;21:1066–1084. doi: 10.1261/rna.049502.114.
- 5. Miao Z., Adamiak R.W., Westhof E. RNA-Puzzles Round III: 3D RNA structure prediction of five riboswitches and one ribozyme. RNA. 2017;23:655–672. doi: 10.1261/rna.060368.116.
- 6. Cheng C.Y., Chou F.-C., Das R. Modeling complex RNA tertiary folds with ROSETTA. Methods Enzymol. 2015;553:35–64. doi: 10.1016/bs.mie.2014.10.051.
- 7. Parisien M., Major F. The MC-fold and MC-sym pipeline infers RNA structure from sequence data. Nature. 2008;452:51–55. doi: 10.1038/nature06684.
- 8. Turner D.H., Mathews D.H. NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 2010;38:D280–D282. doi: 10.1093/nar/gkp892.
- 9. Zuker M., Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 1981;9:133–148. doi: 10.1093/nar/9.1.133.
- 10. Nussinov R., Jacobson A.B. Fast algorithm for predicting the secondary structure of single-stranded RNA. Proc. Natl. Acad. Sci. USA. 1980;77:6309–6313. doi: 10.1073/pnas.77.11.6309.
- 11. Ruan J., Stormo G.D., Zhang W. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. Bioinformatics. 2004;20:58–66. doi: 10.1093/bioinformatics/btg373.
- 12. Rivas E., Eddy S.R. A dynamic programming algorithm for RNA structure prediction including pseudoknots. J. Mol. Biol. 1999;285:2053–2068. doi: 10.1006/jmbi.1998.2436.
- 13. Lyngsø R.B., Pedersen C.N. RNA pseudoknot prediction in energy-based models. J. Comput. Biol. 2000;7:409–427. doi: 10.1089/106652700750050862.
- 14. Xia Z., Bell D.R., Ren P. RNA 3D structure prediction by using a coarse-grained model and experimental data. J. Phys. Chem. B. 2013;117:3135–3144. doi: 10.1021/jp400751w.
- 15. Ding F., Sharma S., Dokholyan N.V. Ab initio RNA folding by discrete molecular dynamics: from structure prediction to folding mechanisms. RNA. 2008;14:1164–1173. doi: 10.1261/rna.894608.
- 16. Šulc P., Romano F., Louis A.A. A nucleotide-level coarse-grained model of RNA. J. Chem. Phys. 2014;140:235102. doi: 10.1063/1.4881424.
- 17. Pasquali S., Derreumaux P. HiRE-RNA: a high resolution coarse-grained energy model for RNA. J. Phys. Chem. B. 2010;114:11957–11966. doi: 10.1021/jp102497y.
- 18. Cragnolini T., Derreumaux P., Pasquali S. Ab initio RNA folding. J. Phys. Condens. Matter. 2015;27:233102. doi: 10.1088/0953-8984/27/23/233102.
- 19. Stadlbauer P., Mazzanti L., Šponer J. Coarse-grained simulations complemented by atomistic molecular dynamics provide new insights into folding and unfolding of human telomeric g-quadruplexes. J. Chem. Theory Comput. 2016;12:6077–6097. doi: 10.1021/acs.jctc.6b00667.
- 20. Cho S.S., Pincus D.L., Thirumalai D. Assembly mechanisms of RNA pseudoknots are determined by the stabilities of constituent secondary structures. Proc. Natl. Acad. Sci. USA. 2009;106:17349–17354. doi: 10.1073/pnas.0906625106.
- 21. Šulc P., Ouldridge T.E., Louis A.A. Modelling toehold-mediated RNA strand displacement. Biophys. J. 2015;108:1238–1247. doi: 10.1016/j.bpj.2015.01.023.
- 22. Zhang T., Zhang J., Mu Y. Molecular mechanism of the inhibition of EGCG on the Alzheimer Aβ(1–42) dimer. J. Phys. Chem. B. 2013;117:3993–4002. doi: 10.1021/jp312573y.
- 23. Cooper S., Khatib F., Players F. Predicting protein structures with a multiplayer online game. Nature. 2010;466:756–760. doi: 10.1038/nature09304.
- 24. Lee J., Kladwang W., Das R., EteRNA Participants. RNA design rules from a massive open laboratory. Proc. Natl. Acad. Sci. USA. 2014;111:2122–2127. doi: 10.1073/pnas.1313039111.
- 25. Cragnolini T., Laurin Y., Pasquali S. Coarse-grained HiRE-RNA model for ab initio RNA folding beyond simple molecules, including noncanonical and multiple base pairings. J. Chem. Theory Comput. 2015;11:3510–3522. doi: 10.1021/acs.jctc.5b00200.
- 26. Lv Z., Tek A., Baaden M. Game on, science—how video game technology may help biologists tackle visualization challenges. PLoS One. 2013;8:e57990. doi: 10.1371/journal.pone.0057990.
- 27. Doutreligne S., Gageat C., Baaden M. 2015 IEEE 1st International Workshop on Virtual and Augmented Reality for Molecular Science, Arles, France. Institute of Electrical and Electronics Engineers; Piscataway, NJ: 2015. Unitymol: interactive and ludic visual manipulation of coarse-grained RNA and other biomolecules; pp. 1–6.
- 28. Sterpone F., Melchionna S., Derreumaux P. The OPEP protein model: from single molecules, amyloid formation, crowding and hydrodynamics to DNA/RNA systems. Chem. Soc. Rev. 2014;43:4871–4893. doi: 10.1039/c4cs00048j.
- 29. Chebaro Y., Pasquali S., Derreumaux P. The coarse-grained OPEP force field for non-amyloid and amyloid proteins. J. Phys. Chem. B. 2012;116:8741–8752. doi: 10.1021/jp301665f.
- 30. Nguyen P.H., Okamoto Y., Derreumaux P. Communication: simulated tempering with fast on-the-fly weight determination. J. Chem. Phys. 2013;138:061102. doi: 10.1063/1.4792046.
- 31. Pérez S., Tubiana T., Baaden M. Three-dimensional representations of complex carbohydrates and polysaccharides—sweetunityMol: a video game-based computer graphic software. Glycobiology. 2015;25:483–491. doi: 10.1093/glycob/cwu133.
- 32. Rüdisser S., Tinoco I., Jr. Solution structure of Cobalt(III)hexammine complexed to the GAAA tetraloop, and metal-ion binding to G.A mismatches. J. Mol. Biol. 2000;295:1211–1223. doi: 10.1006/jmbi.1999.3421.
- 33. Lawrence D.C., Stover C.C., Summers M.F. Structure of the intact stem and bulge of HIV-1 Psi-RNA stem-loop SL1. J. Mol. Biol. 2003;326:529–542. doi: 10.1016/s0022-2836(02)01305-0.
- 34. Nonin-Lecomte S., Felden B., Dardel F. NMR structure of the Aquifex aeolicus tmRNA pseudoknot PK1: new insights into the recoding event of the ribosomal trans-translation. Nucleic Acids Res. 2006;34:1847–1853. doi: 10.1093/nar/gkl111.
- 35. Kim N.-K., Zhang Q., Feigon J. Solution structure and dynamics of the wild-type pseudoknot of human telomerase RNA. J. Mol. Biol. 2008;384:1249–1261. doi: 10.1016/j.jmb.2008.10.005.
- 36. Gan H.H., Pasquali S., Schlick T. Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Nucleic Acids Res. 2003;31:2926–2943. doi: 10.1093/nar/gkg365.
- 37. Fera D., Kim N., Schlick T. RAG: RNA-As-Graphs web resource. BMC Bioinformatics. 2004;5:88. doi: 10.1186/1471-2105-5-88.
- 38. Xu X., Zhao P., Chen S.-J. Vfold: a web server for RNA structure and folding thermodynamics prediction. PLoS One. 2014;9:e107504. doi: 10.1371/journal.pone.0107504.
- 39. Khatib F., Cooper S., Players F. Algorithm discovery by protein folding game players. Proc. Natl. Acad. Sci. USA. 2011;108:18949–18953. doi: 10.1073/pnas.1115898108.
- 40. Delalande O., Férey N., Baaden M. Complex molecular assemblies at hand via interactive simulations. J. Comput. Chem. 2009;30:2375–2387. doi: 10.1002/jcc.21235.