Abstract
The ever-increasing discoveries of noncoding RNA functions create a strong demand for RNA structure determination from the sequence. In recent years, computational studies of RNA structures, at both the two-dimensional and the three-dimensional levels, have led to several highly promising new developments. In this chapter, we describe a recently developed RNA structure prediction method based on the virtual bond-based coarse-grained folding model (Vfold). The main emphasis in the Vfold method is placed on the loop entropy calculations, the treatment of noncanonical (mismatch) interactions, and the 3D structure assembly from a motif-based template library. As case studies, we use the glycine riboswitch and the G310-U376 domain of MLV RNA to illustrate the Vfold-based prediction of RNA 3D structures from the sequences.
Keywords: Partition function, Loop entropy, Mismatched stacks, 2D structure motif, Structure assembly
1 Introduction
To perform crucial cellular functions, RNA molecules fold up to form compact three-dimensional (3D) structures [1–5]. RNA structure determination by experiments alone cannot keep pace with the ever-increasing number of RNA sequences and new functions. The gap between the number of known RNA 3D structures and the number of biologically significant RNA sequences underscores more than ever the need for accurate computational models for RNA structure prediction.
An RNA structure can be described at the two-dimensional (2D) and three-dimensional (3D) levels. A 2D structure is defined as the set of all the base pairs in the structure, including long-range base pairs in tertiary folds. Computational prediction of RNA 2D structures falls into two categories [6–10]: sequence comparison (alignment) analysis and free energy-based modeling. In general, sequence comparison-based methods can give more reliable predictions than free energy-based methods, but they depend on the availability of homologous sequences and often cannot directly provide information about alternative structures. For free energy-based modeling, a key problem is to determine the helix stabilities and loop free energies. The free energy parameters for a helix stem can be calculated from the Turner experimental data [11], but the loop free energy requires a model.
The recently developed Vfold model is a statistical mechanics-based RNA folding model. The model relies on a coarse-grained (virtual bond) representation of RNA structures [12–14]. Compared with other free energy-based RNA 2D structure prediction models, such as Mfold [15] and RNAstructure [16], the Vfold model computes loop entropy parameters from explicit conformational sampling. Furthermore, by enumerating all the possible (sequence-dependent) intra-loop mismatches, the Vfold model partially accounts for the sequence dependence of the loop free energy. Through applications to a broad range of experimental and biological problems, Vfold-based predictions have been shown to provide novel insights into RNA mechanisms, such as the pseudoknot-involved conformational switch between bistable secondary structures [17], microRNA–gene target interactions [18], and RNA–RNA kissing dimerization in viral replication [19, 20].
Knowing RNA 2D structures alone is often not sufficient to understand RNA function. We also need RNA 3D structure information in order to understand the interactions between RNA and other molecules and RNA functions [21–24]. One way to predict RNA 3D structure is to combine a coarse-grained RNA structure model with the knowledge-based force field and fold the RNA through discrete molecular dynamics (DMD) simulations [25–28]. Due to the limitation of conformational sampling, this method would be most suitable for short RNAs or large RNAs with auxiliary constraints from experimental data. Based on the assumption that 3D structure is more conserved and can be recognized by the alignment of sequences and structure motifs, (3D structure) template-based modeling has become a promising method in RNA 3D structure predictions [29–31]. The template-based methods build RNA 3D structures using known structures ranging from fragments of 1–3 nucleotides to larger structural motifs. One of the common limitations for the template (structure assembly) approaches is the completeness of the fragment library. The lack of reliable structural motifs for many loops and junctions greatly hampers the success of accurate 3D structure prediction.
For a given 2D structure, the Vfold-based 3D structure prediction method searches for the appropriate template for each loop/junction in the structure, and assembles the 3D template structures into a scaffold for further structure refinement. In comparison with other template-based (structure assembly) methods such as FARNA/FARFAR [29] and MC-Sym [31], which sample structures from small fragments of known RNA structures, the Vfold-based method uses motif-based instead of fragment-based templates.
2 Algorithms
2.1 RNA Motif-Based Loop Entropy
Using two virtual bonds per nucleotide to represent the backbone conformation, the Vfold model samples fluctuations of loop/junction conformations in 3D space through conformational enumeration (see Fig. 1). By calculating the probability of loop formation, it gives the conformational entropy parameters for the formation of the different types of loops, such as hairpin, bulge, internal, and pseudoknot loops. The model has the advantage of accounting for chain connectivity, excluded volume, and the completeness of the conformational ensemble.
Enumerate all the possible virtual bond backbone conformations for a given chain length (see Note 1) and count the total number Ωcoil of the conformations.
From the conformational ensemble above, identify the loop conformations according to the loop closure condition. For example, for hairpin loops, the two ends of a loop conformation should be fitted to an A-form base pair. Count the total number Ωloop of loop conformations.
Calculate the loop entropy ΔSloop = kB ln (Ωloop / Ωcoil). Here, kB is the Boltzmann constant.
Vfold computations lead to pre-tabulated entropy parameters for hairpin loops [12], internal/bulge loops [12], H-type pseudoknots with/without inter-helix junction [32, 33] and hairpin-hairpin kissing motifs [19].
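The counting procedure above can be illustrated with a toy calculation. The sketch below is not the Vfold enumeration: it assumes a self-avoiding walk on a 2D square lattice in place of the virtual-bond chain, and a hypothetical closure condition (the two chain ends sit on adjacent lattice sites) in place of fitting the loop ends to an A-form base pair. It only shows how Ωcoil, Ωloop, and ΔSloop = kB ln (Ωloop / Ωcoil) relate to one another.

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def count_conformations(n_bonds):
    """Return (omega_loop, omega_coil) for self-avoiding walks of n_bonds
    steps on a 2D square lattice (a toy stand-in for the virtual-bond chain)."""
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
    counts = {"loop": 0, "coil": 0}

    def extend(pos, visited, bonds_left):
        if bonds_left == 0:
            counts["coil"] += 1
            # Toy closure condition (assumption): last bead adjacent to the first.
            if abs(pos[0]) + abs(pos[1]) == 1:
                counts["loop"] += 1
            return
        for dx, dy in moves:
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt not in visited:  # excluded volume: no site is visited twice
                visited.add(nxt)
                extend(nxt, visited, bonds_left - 1)
                visited.remove(nxt)

    extend((0, 0), {(0, 0)}, n_bonds)
    return counts["loop"], counts["coil"]


def loop_entropy(omega_loop, omega_coil):
    """Loop entropy dS_loop = kB * ln(omega_loop / omega_coil)."""
    return KB * math.log(omega_loop / omega_coil)


# On this lattice the ends can be adjacent only for an odd number of bonds.
omega_loop, omega_coil = count_conformations(5)
print(omega_loop, omega_coil, loop_entropy(omega_loop, omega_coil))
```

Because loop conformations are a subset of the coil ensemble, the resulting entropy is never positive, which reflects the entropic cost of closing a loop.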
2.2 RNA Motif-Based Template Library
The (3D structure) template library was built from 2,621 PDB structures (see Note 2), including RNA-involved complexes. It contains 3D templates for hairpin loops, internal/bulge loops, H-type pseudoknots, and multibranched junctions.
For a given RNA 3D structure, extract the A-form helices. From the information of helices and base pairs, the corresponding 2D structure is determined.
Identify all the non-helix 2D structure motifs for the given 3D structure.
Remove the redundant templates: templates of the same motif, same size, and identical sequence with a root mean square deviation (RMSD) ≤1.5 Å are considered redundant.
Collect all the nonredundant motif structures to construct a template library. Table 1 shows the statistics for the current template library.
Table 1. Statistics of the current template library.
Motif name | Number of templates |
---|---|
Hairpin loops | 2,366 |
Internal/bulge loops | 3,260 |
3-way junctions | 820 |
4-way junctions | 506 |
5-way junctions | 222 |
6-way junctions | 49 |
7-way junctions | 61 |
H-type pseudoknots | 56 |
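The redundancy filter used when building the library can be sketched as a greedy pass over the extracted motifs. This is a minimal illustration, not the authors' code: the dictionary keys (`motif`, `sequence`, `coords`) are hypothetical, and the `rmsd` helper assumes the two coordinate sets have already been optimally superimposed, whereas a real pipeline would superimpose them first.

```python
import math
from collections import defaultdict

RMSD_CUTOFF = 1.5  # Angstrom, redundancy threshold stated in the text


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.
    Assumption: the structures are already optimally superimposed."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    sq = sum((a - b) ** 2
             for p, q in zip(coords_a, coords_b)
             for a, b in zip(p, q))
    return math.sqrt(sq / len(coords_a))


def remove_redundant(templates):
    """Keep a template only if it differs by more than RMSD_CUTOFF from every
    template already kept for the same motif type and identical sequence."""
    kept_by_key = defaultdict(list)
    for t in templates:
        key = (t["motif"], t["sequence"])  # identical sequence implies same size
        if all(rmsd(t["coords"], k["coords"]) > RMSD_CUTOFF
               for k in kept_by_key[key]):
            kept_by_key[key].append(t)
    return [t for group in kept_by_key.values() for t in group]


# Toy library with made-up coordinates (two points per template).
library = [
    {"motif": "hairpin", "sequence": "GAAA", "coords": [(0, 0, 0.0), (1, 0, 0.0)]},
    {"motif": "hairpin", "sequence": "GAAA", "coords": [(0, 0, 0.5), (1, 0, 0.5)]},
    {"motif": "hairpin", "sequence": "GAAA", "coords": [(0, 0, 3.0), (1, 0, 3.0)]},
    {"motif": "hairpin", "sequence": "GCAA", "coords": [(0, 0, 0.0), (1, 0, 0.0)]},
]
nonredundant = remove_redundant(library)
print(len(nonredundant))
```

The greedy pass keeps the first member of each cluster it encounters, so the result can depend on the input order; this is a simplification chosen for brevity.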
3 Methods
To predict RNA 3D structures, Vfold first predicts the 2D structures from the sequence. Using the 2D structures as constraints, the model then predicts the corresponding 3D structures.
3.1 RNA 2D Structure Prediction from the Sequence
The key to free energy-based RNA 2D structure prediction is the set of enthalpy and entropy parameters used to evaluate the stability of the sampled structures. The enthalpy and entropy for the canonical and mismatched base stacks are calculated from Turner’s experimental data. The loop entropies are from the Vfold pre-tabulated parameters.
Enumerate all the possible base pair arrangements (2D structures), including H-type pseudoknots with/without inter-helix loops and non-pseudoknotted secondary structures, for a given RNA sequence (see Note 3).
For each helix, calculate its free energy ΔGhelix (see Note 4) as a sum of the free energies ΔGstack of its constituent base stacks:
$$\Delta G_{\mathrm{helix}} = \sum_{\mathrm{stacks}} \Delta G_{\mathrm{stack}}$$
based on the nearest-neighbor model, where ΔGstack is determined from Turner’s experimental parameters [11] (see Note 5).
Enumerate all the possible intra-loop mismatches and compute the loop free energy for each given set of intra-loop mismatches (see Note 6) (see Fig. 2 for a hairpin loop for illustration). The loop free energy ΔGloop is calculated (see Note 4) from the loop partition function Qloop, the Boltzmann sum over all the possible arrangements of intra-loop mismatched base stacks:
$$Q_{\mathrm{loop}} = \sum_{\mathrm{mm}} e^{-(\Delta G_{\mathrm{mm}} - T\Delta S_{\mathrm{loop}})/k_B T}, \qquad \Delta G_{\mathrm{loop}} = -k_B T \ln Q_{\mathrm{loop}}$$
Here ΔGmm is the total free energy of the mismatched base stacks, ΔSloop is the loop entropy for the given intra-loop mismatch constraints, kB is the Boltzmann constant, and T is the temperature.
Assign the free energy for each sampled 2D structure: ΔGs = ΔGhelix + ΔGloop.
Calculate the total partition function as the sum over all the possible (2D) structures s:
$$Q_{\mathrm{tot}} = \sum_{s} e^{-\Delta G_s / k_B T}$$
Following a procedure similar to that for the total partition function, compute the conditional partition function Qij for all the 2D structures with nucleotides i and j base paired.
Calculate the probability of forming the (i, j) base pair: pij = Qij / Qtot.
From the base pairing probabilities for all the possible (i, j) pairs, extract the predicted most probable structure (see Note 7) as well as the alternative structures.
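The free energy and probability steps above can be sketched in a few lines, assuming the structures and loop arrangements have already been enumerated elsewhere. This is an illustrative sketch rather than the Vfold implementation: the function names, the input formats, the temperature of 310.15 K, and all numerical values are assumptions. The loop free energy is taken as ΔGloop = −kB T ln Qloop with Qloop the Boltzmann sum of exp[−(ΔGmm − TΔSloop)/kB T] over mismatch arrangements, and pij = Qij / Qtot with Boltzmann weights exp(−ΔGs/kB T).

```python
import math

KB = 0.0019872  # Boltzmann constant in kcal/(mol*K)
T = 310.15      # temperature in K (37 degrees C), an assumed default


def loop_free_energy(arrangements, temperature=T):
    """arrangements: iterable of (dG_mm, dS_loop) pairs, one per intra-loop
    mismatch arrangement, in kcal/mol and kcal/(mol*K) respectively."""
    kt = KB * temperature
    q_loop = sum(math.exp(-(dg_mm - temperature * ds_loop) / kt)
                 for dg_mm, ds_loop in arrangements)
    return -kt * math.log(q_loop)


def pair_probabilities(structures, temperature=T):
    """structures: list of (base_pairs, dG_s), where base_pairs is a set of
    (i, j) tuples and dG_s is the structure free energy in kcal/mol.
    Returns {(i, j): p_ij} with p_ij = Q_ij / Q_tot."""
    kt = KB * temperature
    q_tot = 0.0
    q_ij = {}
    for pairs, dg in structures:
        weight = math.exp(-dg / kt)  # Boltzmann weight of this structure
        q_tot += weight
        for pair in pairs:
            q_ij[pair] = q_ij.get(pair, 0.0) + weight
    return {pair: q / q_tot for pair, q in q_ij.items()}


# Toy ensemble with hypothetical free energies (kcal/mol).
ensemble = [
    (set(), 0.0),                   # fully unfolded chain
    ({(1, 10), (2, 9)}, -1.5),      # a short hairpin
    ({(2, 9), (3, 8)}, -1.0),       # an alternative hairpin
]
for pair, p in sorted(pair_probabilities(ensemble).items()):
    print(pair, round(p, 3))
```

A pair shared by several structures, such as (2, 9) in the toy ensemble, accumulates weight from each of them, which is why base pairing probabilities can reveal alternative structures that a single minimum free energy structure would hide.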
We use the glycine riboswitch (PDB: 3owi) as an example to show how Vfold predicts the 2D structure. Given the 84-nt RNA sequence of the glycine riboswitch (see Note 8), Vfold calculates the base pairing probabilities pij for all the possible base pairs. The most probable 2D structure (see Note 7) can then be extracted from pij, as shown in Fig. 3. It should be noted that, depending on the sequence, the Vfold model predicts all the stable structures, including the most probable (most stable) structure as well as the alternative (metastable) structures. Therefore, it is recommended to also examine the possible alternative structures suggested by the base pairing probabilities.
3.2 RNA 3D Structure Prediction for a Given 2D Structure
The Vfold model predicts the 3D structure from a 2D structure by assembling motif-specific structural templates. Currently, due to the limited structural template database, Vfold can only predict the 3D structures with hairpin loops, internal/bulge loops, multi-branched junctions and pseudoknots.
Identify the structure motifs (such as hairpin loop, internal loop, pseudoknot loop, and three-way junction) from the given 2D structure.
Build the virtual bond 3D structure for helices according to the A-form helix template.
For each non-helix motif, search for the best templates from the template library. The search criteria are based on the size (first) and sequence (second) matches (see Note 9).
From the (all-atom) templates found in the previous step (see Notes 10 and 11), build the virtual bond 3D structure of each motif.
Assemble the virtual bond 3D structure of the motifs to construct the 3D scaffold of the whole RNA.
Add bases to the virtual bond backbone according to the templates for base configurations (see Note 12).
Refine the 3D structure by AMBER energy minimization (see Note 13).
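The template search in the steps above (size first, sequence second) can be sketched as follows. This is a simplified illustration under stated assumptions: the library layout and the `name` field are hypothetical, each motif is reduced to a single loop sequence, and a plain count of mismatched positions is used as a stand-in for the sequence distance of Note 9, whose exact form is not reproduced here.

```python
def mismatch_count(template_seq, target_seq):
    """Number of positions at which two equal-length sequences differ
    (a stand-in for the Vfold sequence distance)."""
    return sum(a != b for a, b in zip(template_seq, target_seq))


def select_templates(library, motif, target_seq):
    """Return all best-matching templates for one motif.
    First criterion: same motif type and same size.
    Second criterion: smallest sequence distance to the target."""
    same_size = [t for t in library
                 if t["motif"] == motif and len(t["sequence"]) == len(target_seq)]
    if not same_size:
        return []  # no template available: no 3D structure for this motif
    best = min(mismatch_count(t["sequence"], target_seq) for t in same_size)
    return [t for t in same_size
            if mismatch_count(t["sequence"], target_seq) == best]


# Toy library with made-up entries.
library = [
    {"name": "hp1", "motif": "hairpin", "sequence": "GAAA"},
    {"name": "hp2", "motif": "hairpin", "sequence": "GCAA"},
    {"name": "hp3", "motif": "hairpin", "sequence": "UUCG"},
    {"name": "hp4", "motif": "hairpin", "sequence": "GAAAA"},
]
print([t["name"] for t in select_templates(library, "hairpin", "GAGA")])
print([t["name"] for t in select_templates(library, "hairpin", "GUAA")])
```

Returning every tied template rather than a single one mirrors Note 11, where several optimal templates for the same motif lead to several predicted 3D structures.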
We use the glycine riboswitch (PDB: 3owi) to illustrate the 3D structure prediction. For the purpose of 3D structure prediction, we list only the canonical base pairs in the 2D structure and treat noncanonical base pairs as part of the loop/junction structure (see Fig. 3). The predicted 2D structure shows a three-way junction, two internal loops, two hairpin loops, and five helices. If the motif structure in the glycine riboswitch is included in the template library, as shown in Fig. 4a, the RMSD between the predicted structure and the experimentally determined structure is 6.3 Å. This RMSD is smaller than the 7.24 Å of the previous prediction, shown in Fig. 4b, which excluded the native templates from the template library.
It should be noted that even with the native templates included in the template library, the predicted structure still shows a nonzero RMSD from the PDB structure. The small difference between the A-form helix and the real (slightly distorted) RNA helix (see Note 14) could result in a notable structural difference in the global fold.
As another example, we predict the 3D structure for the G310-U376 domain of MLV RNA (see Note 15). Vfold correctly predicts the 2D structure (see Note 16) (see Fig. 5). If the native motif structure (PDB: 1s9s) is not included in the template library, we find a larger RMSD between the predicted and the PDB structure.
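As a quick consistency check on the inputs for this example, the base pairs of Note 16 can be mapped onto the sequence of Note 15. The sketch below uses the sequence with the line-break spaces and hyphen of the printed version removed, converts the pair list to dot-bracket notation, and reports any pair that is not A–U, G–C, or G–U. The helper names are illustrative and are not part of Vfold.

```python
# Sequence of the G310-U376 domain of MLV RNA (Note 15), 67 nt.
MLV_SEQ = ("GCGGACCCGUGGUGGAACUGUG" "AAGU"
           "UCGGAACACCCGGCCGCAACCCUGGGA" "GAGAUCCCAGGGUU")

# Base pairs used for the 3D prediction (Note 16), 1-based positions.
MLV_PAIRS = [
    (1, 43), (2, 42), (3, 41), (4, 40), (7, 39), (8, 38), (9, 37),
    (11, 36), (12, 35), (13, 34), (14, 33), (15, 28), (16, 27), (17, 26),
    (18, 25), (19, 24), (44, 67), (45, 66), (46, 65), (47, 64), (48, 63),
    (49, 62), (50, 61), (51, 60), (52, 59), (53, 58),
]

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def to_dot_bracket(length, pairs):
    """Dot-bracket string for a pseudoknot-free (nested) set of base pairs."""
    chars = ["."] * length
    for i, j in pairs:
        chars[i - 1] = "("
        chars[j - 1] = ")"
    return "".join(chars)


def noncanonical_pairs(seq, pairs):
    """Pairs whose two nucleotides do not form A-U, G-C, or G-U."""
    return [(i, j) for i, j in pairs if (seq[i - 1], seq[j - 1]) not in CANONICAL]


print(MLV_SEQ)
print(to_dot_bracket(len(MLV_SEQ), MLV_PAIRS))
print(noncanonical_pairs(MLV_SEQ, MLV_PAIRS))
```

An empty list from `noncanonical_pairs` indicates that every listed pair is canonical, consistent with the convention that helices contain canonical base pairs only (Note 14).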
Because the template-based 3D structure prediction algorithm relies on the knowledge of the known structures, we can realistically expect continuous improvements in the quality of the structure prediction as more and more structures are solved.
Acknowledgments
This research was supported by NIH grant GM063732 and NSF grant MCB0920411.
Footnotes
1. A survey of the known structures suggests that the virtual bonds (P-C4′ and C4′-P) have a bond length of ~3.9 Å and bond angles (βc, βp) in the range of 90–120°.
2. The list of the 2,621 PDB structures used for constructing the template library includes all the PDB entries released before January of 2014. It includes RNA-involved complexes except RNA/DNA hybrids.
3. The computational time scales with the chain length N as O(N^6) and the memory scales as O(N^2).
4. The thermodynamic parameters for the different base stacks are from the experimental data (Turner parameters). We use Vfold-derived loop entropies to evaluate the loop free energy.
5. The nearest-neighbor model of RNA structure assumes that the stability of a base pair depends only on its adjacent bases, which could be either base-paired (a stacking contribution) or unpaired (a mismatch contribution).
6. A mismatched base stack is formed by a canonical base pair (A–U, G–C, or G–U) and a noncanonical base pair. Consecutive noncanonical base pairs are considered to be unstable and are not accounted for in the intra-loop mismatch arrangements.
7. The predicted most probable 2D structure is formed by the base pairs with the largest base pairing probabilities.
8. The sequence of the glycine riboswitch is: 5′CUCUGGAGAGAACCGUUUAAUCGGUCGCCGAAGGAGCAAGCUCUGCGCAUAUGCAGAGUGAAACUCUCAGGCAAAAGGACAGAG3′.
9. To find the best template for a loop, Vfold screens the template library according to the loop size (first criterion) and the sequence (second criterion) matches. If necessary, this may involve sequence replacement in order to match the sequences in the template library. Vfold defines a sequence distance in terms of per-nucleotide distances hi to find the optimal templates. Here, hi is the Hamming distance between nucleotide i in the selected template and the corresponding nucleotide in the target sequence through the following substitution cycles: A → G → C → U, C → U → A → G, G → A → U → C, U → C → G → A.
10. If no template is available for a motif, no 3D structure will be predicted.
11. There may be more than one optimal template available for the same motif. This can lead to more than one predicted 3D structure.
12. For the nucleotides in helices, the base atoms are added following the A-form helix.
13. The all-atom energy minimization (such as with AMBER) causes only a small change in the RMSD of the structure.
14. Helices contain canonical base pairs only: A–U, G–C, and G–U base pairs. The RMSD between a helix of known RNA structures and the standard A-form helix is <1.2 Å.
15. The sequence of the G310-U376 domain of MLV RNA is: 5′GCGGACCCGUGGUGGAACUGUGAAGUUCGGAACACCCGGCCGCAACCCUGGGAGAGAUCCCAGGGUU3′.
16. The base pairs used to predict the 3D structures of MLV RNA: (1 43), (2 42), (3 41), (4 40), (7 39), (8 38), (9 37), (11 36), (12 35), (13 34), (14 33), (15 28), (16 27), (17 26), (18 25), (19 24), (44 67), (45 66), (46 65), (47 64), (48 63), (49 62), (50 61), (51 60), (52 59), (53 58).
References
- 1. Doudna JA, Cech TR. The chemical repertoire of natural ribozymes. Nature. 2002;418:222–228. doi: 10.1038/418222a.
- 2. Bachellerie JP, Cavaille J, Huttenhofer A. The expanding snoRNA world. Biochimie. 2002;84:774–790. doi: 10.1016/s0300-9084(02)01402-5.
- 3. Gong C, Maquat LE. lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature. 2011;470:284–288. doi: 10.1038/nature09701.
- 4. Bartel DP. MicroRNAs: target recognition and regulatory functions. Cell. 2009;136:215–233. doi: 10.1016/j.cell.2009.01.002.
- 5. Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E. The role of site accessibility in microRNA target recognition. Nat Genet. 2007;39:1278–1284. doi: 10.1038/ng2135.
- 6. Gardner PP, Giegerich R. A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics. 2004;5:140. doi: 10.1186/1471-2105-5-140.
- 7. Mathews DH, Moss WN, Turner DH. Folding and finding RNA secondary structure. Cold Spring Harb Perspect Biol. 2010;2:a003665. doi: 10.1101/cshperspect.a003665.
- 8. Washietl S. Sequence and structure analysis of noncoding RNAs. Methods Mol Biol. 2010;609:285–306. doi: 10.1007/978-1-60327-241-4_17.
- 9. Machado-Lima A, del Portillo HA, Durham AM. Computational methods in non-coding RNA research. J Math Biol. 2008;56:15–49. doi: 10.1007/s00285-007-0122-6.
- 10. Mathews DH, Turner DH. Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol. 2006;16:270–278. doi: 10.1016/j.sbi.2006.05.010.
- 11. Turner DH, Mathews DH. NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 2010;38:D280–D282. doi: 10.1093/nar/gkp892.
- 12. Cao S, Chen S-J. Predicting RNA folding thermodynamics with a reduced chain representation model. RNA. 2005;11:1884–1897. doi: 10.1261/rna.2109105.
- 13. Chen S-J. RNA folding: conformational statistics, folding kinetics, and ion electrostatics. Annu Rev Biophys. 2008;37:197–214. doi: 10.1146/annurev.biophys.37.032807.125957.
- 14. Xu X, Zhao P, Chen S-J. Vfold: a web server for RNA structure and folding thermodynamics prediction. PLoS ONE. 2014;9(9):e107504. doi: 10.1371/journal.pone.0107504.
- 15. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31:3406–3415. doi: 10.1093/nar/gkg595.
- 16. Bellaousov S, Reuter JS, Seetin MG, Mathews DH. RNAstructure: web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res. 2013;41:W471–W474. doi: 10.1093/nar/gkt290.
- 17. Xu X, Chen S-J. Kinetic mechanism of conformational switch between bistable RNA hairpins. J Am Chem Soc. 2012;134:12499–12507. doi: 10.1021/ja3013819.
- 18. Cao S, Chen S-J. Predicting kissing interactions in microRNA-target complex and assessment of microRNA activity. Nucleic Acids Res. 2012;40:4681–4690. doi: 10.1093/nar/gks052.
- 19. Cao S, Chen S-J. Structure and stability of RNA/RNA kissing complex: with application of HIV dimerization initiation signal. RNA. 2011;17:2130–2143. doi: 10.1261/rna.026658.111.
- 20. Cao S, Xu X, Chen S-J. Predicting structure and stability for RNA complexes with intermolecular loop-loop base pairing. RNA. 2014;20:835–845. doi: 10.1261/rna.043976.113.
- 21. Shapiro BA, Yingling YG, Kasprzak W, Bindewald E. Bridging the gap in RNA structure prediction. Curr Opin Struct Biol. 2007;17:157–165. doi: 10.1016/j.sbi.2007.03.001.
- 22. Rother K, Rother M, Boniecki M, Puton T, Bujnicki JM. RNA and protein 3D structure modeling: similarities and differences. J Mol Model. 2011;17:2325–2336. doi: 10.1007/s00894-010-0951-x.
- 23. Laing C, Schlick T. Computational approaches to RNA structure prediction, analysis, and design. Curr Opin Struct Biol. 2011;21:306–318. doi: 10.1016/j.sbi.2011.03.015.
- 24. Sim AY, Minary P, Levitt M. Modeling nucleic acids. Curr Opin Struct Biol. 2012;22:1–6. doi: 10.1016/j.sbi.2012.03.012.
- 25. Tan RK, Petrov AS, Harvey SC. YUP: a molecular simulation program for coarse-grained and multi-scaled models. J Chem Theory Comput. 2006;2:529–540. doi: 10.1021/ct050323r.
- 26. Jonikas MA, Radmer RJ, Laederach A, Das R, Pearlman S, Herschlag D, Altman RB. Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters. RNA. 2009;15:189–199. doi: 10.1261/rna.1270809.
- 27. Sharma S, Ding F, Dokholyan NV. iFoldRNA: three-dimensional RNA structure prediction and folding. Bioinformatics. 2008;24:1951–1952. doi: 10.1093/bioinformatics/btn328.
- 28. Xia Z, Bell DR, Shi Y, Ren P. RNA 3D structure prediction by using a coarse-grained model and experimental data. J Phys Chem B. 2013;117:3135–3144. doi: 10.1021/jp400751w.
- 29. Das R, Karanicolas J, Baker D. Atomic accuracy in predicting and designing noncanonical RNA structure. Nat Methods. 2010;7:291–294. doi: 10.1038/nmeth.1433.
- 30. Cao S, Chen S-J. Physics-based de novo prediction of RNA 3D structures. J Phys Chem B. 2011;115:4216–4226. doi: 10.1021/jp112059y.
- 31. Parisien M, Major F. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature. 2008;452:51–55. doi: 10.1038/nature06684.
- 32. Cao S, Chen S-J. Predicting RNA pseudoknot folding thermodynamics. Nucleic Acids Res. 2006;34:2634–2652. doi: 10.1093/nar/gkl346.
- 33. Cao S, Chen S-J. Predicting structures and stabilities for H-type pseudoknots with inter-helix loop. RNA. 2009;15:696–706. doi: 10.1261/rna.1429009.