Vfold-Pipeline: a web server for RNA 3D structure prediction from sequences

Jun Li; Sicheng Zhang; Dong Zhang; Shi-Jie Chen

doi:10.1093/bioinformatics/btac426

Bioinformatics. 2022 Jun 27;38(16):4042–4043. doi: 10.1093/bioinformatics/btac426

Editor: Yann Ponty

PMCID: PMC9364377 PMID: 35758624

Abstract

Summary

RNA 3D structures are critical for understanding their functions and for RNA-targeted drug design. However, experimental determination of RNA 3D structures is laborious and technically challenging, leading to the huge gap between the number of sequences and the availability of RNA structures. Therefore, the computer-aided structure prediction of RNA 3D structures from sequences becomes a highly desirable solution to this problem. Here, we present a pipeline server for RNA 3D structure prediction from sequences that integrates the Vfold2D, Vfold3D and VfoldLA programs. The Vfold2D program can incorporate the SHAPE experimental data in 2D structure prediction. The pipeline can also automatically extract 2D structural constraints from the Rfam database. Furthermore, with a significantly expanded 3D template database for various motifs, this Vfold-Pipeline server can efficiently return accurate 3D structure predictions or reliable initial 3D structures for further refinement.

Availability and implementation

http://rna.physics.missouri.edu/vfoldPipeline/index.html. The data underlying this article have been provided in the article and in its online supplementary material.

Supplementary information

Supplementary data are available at Bioinformatics online.

1 Introduction

The various functions of non-coding RNAs are dictated by their 3D structures. In addition to experimental methods, computational prediction is becoming an increasingly important approach to determining RNA 3D structures. Many RNA 2D (Hofacker, 2003; Reuter and Mathews, 2010; Zuker, 2003) and 3D (Cragnolini et al., 2015; Das and Baker, 2007; Jonikas et al., 2009; Miao and Westhof, 2017; Parisien and Major, 2008; Popenda et al., 2012; Sharma et al., 2008; Zhang et al., 2021) structure prediction programs have been developed. Previously, we developed the Vfold2D model (Cao and Chen, 2005; Xu et al., 2014) for RNA 2D structure prediction from sequence data, and the Vfold3D and VfoldLA models for RNA 3D structure prediction from given secondary structures. Here, we introduce a server for fully automated 3D structure prediction from sequence data using a pipeline connecting the Vfold2D (Cao and Chen, 2005; Xu et al., 2014), Vfold3D (Cao and Chen, 2011; Xu et al., 2014) and VfoldLA (Xu and Chen, 2018; Xu et al., 2019) programs. The current Vfold2D program used in the pipeline can utilize SHAPE experimental information as constraints for 2D structure prediction and can automatically fetch consensus 2D constraints from the Rfam database (Kalvari et al., 2021). The Rfam database is a collection of RNA families from which 2D structural information can be obtained by sequence search. The current Vfold3D program uses a new, significantly enlarged motif template database. Such a pipeline server facilitates RNA 3D structure prediction from sequences.

2 Implementation

The workflow of the Vfold-Pipeline web server is illustrated in Figure 1. The input to the server is an RNA sequence; optionally, users can provide a 2D structure or SHAPE experimental information for 2D structure prediction. The outputs of the server are the predicted 2D and 3D structures. In the back end, data flow from the input to the Vfold2D program (Cao and Chen, 2005; Xu et al., 2014) and then to the Vfold3D (Cao and Chen, 2011; Xu et al., 2014) and VfoldLA (Xu and Chen, 2018; Xu et al., 2019) programs. The calculation process is briefly described below.

Fig. 1. The workflow of the Vfold-Pipeline server. If the 2D structure of the input RNA sequence is not provided by users, the Vfold2D program (Cao and Chen, 2005; Xu et al., 2014) will be launched for 2D structure prediction. Based on the input or predicted 2D structures, the Vfold3D program (Cao and Chen, 2011; Xu et al., 2014) will be started to predict 3D structures by assembling the 3D templates of the motifs in the 2D structures. If the Vfold3D program fails to predict 3D structures, the VfoldLA program (Xu and Chen, 2018; Xu et al., 2019) will be used for 3D structure assembly based on loop templates.
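The control flow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the server's actual code: run_pipeline and the three callables (predict_2d, assemble_motifs, assemble_loops) are hypothetical stand-ins for the roles played by Vfold2D, Vfold3D and VfoldLA, whose real interfaces are not described in this article.

```python
from typing import Callable, List, Optional


def run_pipeline(
    sequence: str,
    predict_2d: Callable[[str], List[str]],
    assemble_motifs: Callable[[str, str], List[str]],
    assemble_loops: Callable[[str, str], List[str]],
    structure_2d: Optional[str] = None,
) -> dict:
    """Mirror the branching of Figure 1 using placeholder callables."""
    # Step 1: use the supplied 2D structure, or predict one when it is missing.
    if structure_2d is not None:
        structures_2d = [structure_2d]
    else:
        structures_2d = predict_2d(sequence)

    results = []
    for s2d in structures_2d:
        # Step 2: try motif-template assembly first (the Vfold3D role).
        models = assemble_motifs(sequence, s2d)
        method = "motif"
        # Step 3: fall back to loop-template assembly (the VfoldLA role)
        # when motif-template assembly returns no models.
        if not models:
            models = assemble_loops(sequence, s2d)
            method = "loop"
        results.append({"structure_2d": s2d, "method": method, "models": models})
    return {"sequence": sequence, "results": results}
```

Passing the three stages as callables keeps the sketch independent of how each program is actually invoked; only the skip-if-provided and fall-back-on-failure branches are taken from the workflow described above.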

User input: The RNA sequence is the only required input to the server. If the 2D structure is also provided by users, the 2D structure prediction step is skipped; otherwise, 2D structure prediction is performed in the next step using the Vfold2D program. In addition, SHAPE experimental information can be uploaded to the server and treated as constraints in 2D structure prediction. Three input examples and their results are provided on the main web page.
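As an illustration of the kind of consistency check such input calls for, the sketch below validates a sequence and an optional 2D structure before any prediction is attempted. It assumes the 2D structure is written in dot-bracket notation and covers non-pseudoknotted structures only; the article does not specify the server's input format, and validate_input and parse_dot_bracket are hypothetical helper names.

```python
from typing import List, Optional, Tuple


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return the 0-based base pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def validate_input(
    sequence: str, structure_2d: Optional[str] = None
) -> Tuple[str, Optional[List[Tuple[int, int]]]]:
    """Normalize the sequence and check it against an optional 2D structure."""
    # Assumption: DNA-style input (T) is silently converted to RNA (U).
    seq = sequence.strip().upper().replace("T", "U")
    if not seq or set(seq) - set("ACGU"):
        raise ValueError("sequence must contain only A, C, G and U")
    if structure_2d is None:
        return seq, None  # the 2D structure will have to be predicted
    if len(structure_2d) != len(seq):
        raise ValueError("sequence and 2D structure differ in length")
    return seq, parse_dot_bracket(structure_2d)
```

A return value whose second element is None signals that the 2D prediction step is still needed, which matches the branching described in the paragraph above.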

2D structure prediction: If the 2D structure is not provided by users, the Vfold2D program (Cao and Chen, 2005; Xu et al., 2014) is launched to predict 2D structures. Vfold2D is a free-energy-based model which considers the free energies of various loop motifs and mismatched base pairs. Before prediction, the server first automatically performs a sequence search in the Rfam database. If any 2D constraints are found, they will be used in the following non-pseudoknotted (non-PK) 2D structure prediction. SHAPE experimental information can also be used as constraints in 2D structure prediction. Users can specify the temperature used for free-energy calculation and the number of predicted 2D structures. Moreover, H-type PK 2D structures can be predicted if the option is chosen by users. For RNAs shorter than 160 nucleotides, the 2D structure prediction can usually be done in 30 s for non-PK structures and in 50–1000 s for PK structures. The details about the running time can be found in Supplementary Section SII and Figure S10.
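To see why the temperature setting matters in a free-energy-based model, the sketch below converts a set of folding free energies into Boltzmann populations. It shows the general statistical-mechanics relation only and is not Vfold2D's algorithm: boltzmann_populations is a hypothetical helper, the energies in the usage note are made-up numbers, and the free energies are held fixed across temperatures, which is a simplification.

```python
import math
from typing import Dict

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def boltzmann_populations(
    free_energies: Dict[str, float], temperature_c: float = 37.0
) -> Dict[str, float]:
    """Map each structure's free energy (kcal/mol) to its equilibrium population."""
    rt = GAS_CONSTANT * (temperature_c + 273.15)
    # Shift by the lowest energy so the exponentials stay numerically stable.
    lowest = min(free_energies.values())
    weights = {s: math.exp(-(g - lowest) / rt) for s, g in free_energies.items()}
    partition = sum(weights.values())
    return {s: w / partition for s, w in weights.items()}
```

For example, boltzmann_populations({"(((...)))": -5.0, "((.....))": -4.0}, 37.0) assigns the larger population to the lower-energy structure, and repeating the call at a higher temperature narrows the gap between the two.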

3D structure prediction: Based on the predicted or user-supplied 2D structures, 3D structure prediction is initiated by the Vfold3D program (Cao and Chen, 2011; Xu et al., 2014), which uses the motif template assembly method. The motif templates are searched and ranked according to motif type, size (sequence length) and sequence similarity with the target motifs. Compared with the previous Vfold3D program, the current one uses a much larger motif template database extracted from the RNA structures deposited in the PDB. The motif template database is augmented by opening/closing various terminal base pairs in helices, by closing/opening base pairs in loops and by assigning different starting 5ʹ ends when reading the (closed) sequence of a loop. The details of the motif augmentation can be found in Supplementary Section SI, Figures S1–S4 and Table S1. The current motif template database includes 6972 hairpin loops, 117 647 bulge/internal loops, 101 986 three-way, 177 090 four-way, 244 805 five-way, 143 096 six-way, 562 760 seven-way junction loops, 2485 pseudoknots, 12 404 hairpin–hairpin kissing loops, 1015 5ʹ-end tail loops and 1161 3ʹ-end tail loops. For motifs that are not defined or have no templates in our database, Vfold3D fails to produce structures. In that case, the VfoldLA program (Xu and Chen, 2018; Xu et al., 2019) is launched to predict the 3D structures by assembling loop templates and helices. The loop templates are likewise searched and ranked according to loop type, loop size and loop sequence similarity with the target loops. Up to 10 predicted all-atom structures are obtained from the Vfold3D or VfoldLA programs. All-atom energy minimization is performed in the final step to fix broken bonds and remove steric clashes. Users can also exclude specific RNAs from the motif/loop template database for Vfold3D and VfoldLA computations. For RNAs shorter than 160 nucleotides, the 3D structure prediction is usually completed in 500 s for non-PK structures and in 1000 s for PK structures, including the all-atom energy minimization implemented in the software QRNAS (Stasiewicz et al., 2019). The details of the running time can be found in Supplementary Section SII and Figure S10.
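The template search described above (filter by motif type, then rank by size and sequence similarity, with optional exclusion of user-specified RNAs) can be sketched as follows. This is an assumed, simplified scoring scheme, not the actual Vfold3D or VfoldLA ranking function: Template, sequence_identity and rank_templates are hypothetical names, the per-position identity score is a stand-in for whatever similarity measure the programs use, and top_n is an illustrative parameter.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Template:
    motif_type: str  # e.g. "hairpin" or "internal"
    sequence: str    # motif sequence
    source: str      # hypothetical identifier of the RNA the template came from


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical positions; 0.0 when the lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def rank_templates(
    target_type: str,
    target_seq: str,
    templates: List[Template],
    exclude: FrozenSet[str] = frozenset(),
    top_n: int = 10,
) -> List[Template]:
    """Keep templates of the right type, then sort by size match and identity."""
    candidates = [
        t for t in templates
        if t.motif_type == target_type and t.source not in exclude
    ]
    # Smaller length difference ranks first; ties are broken by higher identity.
    candidates.sort(
        key=lambda t: (
            abs(len(t.sequence) - len(target_seq)),
            -sequence_identity(t.sequence, target_seq),
        )
    )
    return candidates[:top_n]
```

Supplying a set of source identifiers through exclude removes those RNAs from consideration, which is one simple way to run a prediction without templates taken from the target itself.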

When the above processes finish, a result summary web page is generated, and the link is sent to users via email if an address was provided. The predicted 2D and 3D structures can be retrieved from the web page; see Supplementary Figures S11 and S12 for the input and output snapshots, respectively, of the Vfold-Pipeline server. We have also compared the performance with the well-known RNAComposer server (Popenda et al., 2012). The results show that our server can give similar or more accurate predictions. The detailed results can be found in Supplementary Section SII, Figures S5–S8 and Table S2. Our existing pipeline, however, has several notable weaknesses. It cannot handle multi-stranded RNAs, can fail to give predictions for large, complex structures and takes a longer computation time than RNAComposer. In the future, we plan to refine the Vfold-Pipeline server by incorporating simulations into the pipeline to predict larger, more complex structures. The Vfold-Pipeline server and the source code are freely accessible at http://rna.physics.missouri.edu/vfoldPipeline/index.html.

Supplementary Material

btac426_Supplementary_Data


Acknowledgement

We thank Dr Yuanzhe Zhou for helping to optimize the source code of the Vfold2D program.

Funding

This work was supported by the National Institutes of Health [R35-GM134919 to S.-J.C.].

Conflict of Interest: none declared.

Contributor Information

Jun Li, Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA.

Sicheng Zhang, Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA.

Dong Zhang, College of Life Sciences and Institute of Quantitative Biology, Zhejiang University, Hangzhou 310058, China.

Shi-Jie Chen, Department of Physics, Department of Biochemistry, and Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA.

References

Cao S., Chen S.J. (2005) Predicting RNA folding thermodynamics with a reduced chain representation model. RNA, 11, 1884–1897.
Cao S., Chen S.J. (2011) Physics-based de novo prediction of RNA 3D structures. J. Phys. Chem. B, 115, 4216–4226.
Cragnolini T. et al. (2015) Coarse-grained HiRE-RNA model for ab initio RNA folding beyond simple molecules, including noncanonical and multiple base pairings. J. Chem. Theory Comput., 11, 3510–3522.
Das R., Baker D. (2007) Automated de novo prediction of native-like RNA tertiary structures. Proc. Natl. Acad. Sci. USA, 104, 14664–14669.
Hofacker I.L. (2003) Vienna RNA secondary structure server. Nucleic Acids Res., 31, 3429–3431.
Jonikas M.A. et al. (2009) Coarse-grained modeling of large RNA molecules with knowledge-based potentials and structural filters. RNA, 15, 189–199.
Kalvari I. et al. (2021) Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res., 49, D192–D200.
Miao Z., Westhof E. (2017) RNA structure: advances and assessment of 3D structure prediction. Annu. Rev. Biophys., 46, 483–503.
Parisien M., Major F. (2008) The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature, 452, 51–55.
Popenda M. et al. (2012) Automated 3D structure composition for large RNAs. Nucleic Acids Res., 40, e112.
Reuter J.S., Mathews D.H. (2010) RNAstructure: software for RNA secondary structure prediction and analysis. BMC Bioinformatics, 11, 129.
Sharma S. et al. (2008) iFoldRNA: three-dimensional RNA structure prediction and folding. Bioinformatics, 24, 1951–1952.
Stasiewicz J. et al. (2019) QRNAS: software tool for refinement of nucleic acid structures. BMC Struct. Biol., 19, 5.
Xu X., Chen S.J. (2018) Hierarchical assembly of RNA three-dimensional structures based on loop templates. J. Phys. Chem. B, 122, 5327–5335.
Xu X. et al. (2014) Vfold: a web server for RNA structure and folding thermodynamics prediction. PLoS One, 9, e107504.
Xu X. et al. (2019) VfoldLA: a web server for loop assembly-based prediction of putative 3D RNA structures. J. Struct. Biol., 207, 235–240.
Zhang D. et al. (2021) IsRNA1: de novo prediction and blind screening of RNA 3D structures. J. Chem. Theory Comput., 17, 1842–1857.
Zuker M. (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res., 31, 3406–3415.

