Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2013 May 10;10(7):1185–1196. doi: 10.4161/rna.24971

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

Copyright © 2013 Landes Bioscience

This is an open-access article licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License. The article may be redistributed, reproduced, and reused for non-commercial purposes, provided the original source is properly cited.

PMC Copyright notice

graphic file with name rna-10-1185-g1.jpg — Figure 1.Unified description of different methods for single-sequence RNA secondary structure prediction. The menu of elements that define a method are: architecture, scoring scheme, parameterization and inference method. The architecture consists of the list of features which, in turn, determine the number of parameters of the model. The different architectures one can devise for a nested RNA secondary structure all fall into the category of a Context-Free Grammar (CFG). Any architecture can be implemented using either thermodynamic, weights or probabilistic parameters. Both weight and probabilistic schemes can be trained on data (statistical). There are statistical weight schemes such as CLLMs. Statistical probabilistic schemes for RNA folding are usually stochastic CFGs (SCFGs). Notice that SCFGs are a subset of CFGs. SCFGs describe models with a probabilistic scheme, while the concept CFG applies to all scoring schemes. The assignment of values for the parameters (parameterization) depends on the scoring scheme used. Thermodynamic models take values as kcal/mol free-energy estimations from experimental data. Conditional Log-Linear models use methods that require numerical optimization (CML and also online training). Probabilistic models are usually trained by maximum likelihood methods, which simply require obtaining frequencies of occurrences in the training set [and the addition of at least Laplace (+1) priors]. Once an architecture, scoring scheme and parameterization are in place (that is, a “model”), one can use different algorithms to infer plausible secondary structures. Unlike training, which is specific for the different scoring schemes, the folding algorithms (usually dynamic programming algorithms) are essentially identical for all parameterizations (although oftentimes they have different names). A side note; the term “probabilistic” often leads to confusion. In the end, all scoring schemes (probabilistic or not) can give us insight into the probabilistic distribution of structures (π_s) for a given sequence (s) (the so-called Boltzmann ensemble in a thermodynamic scheme). For instance, one can calculate the distribution’s partition function (via the McCaskill or inside algorithms) or rigorously sample structures from that distribution. However, what is normally referred to as a “probabilistic” model is one in which the parameters of the model are themselves probabilities. Probabilistic models are “generative” models, which means that in addition to the Boltzmann ensemble per sequence, they also provide insight into the joint distribution for the ensemble of sequences and structures. With a probabilistic method, one can quite naturally generate sequences together with their structures according to the model.