2013 May 10;10(7):1185–1196. doi: 10.4161/rna.24971

Table 1. Comparison of different methods for RNA secondary structure prediction.

| Method (architecture) | # free tied parameters (6 bps) | # free tied parameters (16 bps) | Scoring scheme | Parameterization | Training datasets | Folding method | Best F (%), TestSetA | Best F (%), TestSetB |
|---|---|---|---|---|---|---|---|---|
| g6 | 11 | 21 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 49.1 | 47.5 |
| basic grammar | 532 | 572 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 56.9 | 56.5 |
| CONTRAfold v2.02 | ~300 | - | weights | CML | S-Processed-TRA | c-MEA | 57.2 | 57.9 |
| CONTRAfoldG | 1,278 | 5,448 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 58.3 | 58.6 |
| UNAFold-3.8 | ~3,500 | - | thermodynamic | fit to exp. data | - | CYK | 51.0 | 51.3 |
| Simfold BL* | ~as above | - | weights | CML | S-Processed-TRA | CYK | 56.5 | 55.3 |
| RNAstructure v5.2 | ~12,700 | - | thermodynamic | fit to exp. data | - | GCE | 53.5 | 53.8 |
| ViennaRNA v1.8.4 | ~as above | - | thermodynamic | fit to exp. data | - | GCE | 53.7 | 54.3 |
| ViennaRNAG | 14,307 | 90,497 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 60.2 | 59.4 |
| ViennaRNAGz_bulge2_ld_mdangle | 14,557 | 91,997 | probabilistic | ML | TrainSetA+2*TrainSetB | c-MEA | 60.5 | 59.5 |
| ContextFold v1.00 | 205,000 | - | weights | online CML | S-Full | CYK | 64.4 | 49.0 |

Models. Models with a “⋄” are versioned stand-alone packages. Models with a “” are CFGs (with alternative scoring schemes) introduced in reference 39. In particular, ViennaRNAG is a CFG that, when parameterized with thermodynamic scores, reproduces the ViennaRNA v1.8.4 method, and CONTRAfoldG is another CFG that, when parameterized with particular scores, reproduces CONTRAfold v2.02. Here, we present the results of probabilistic parameterizations of those grammars.

Parameters. Methods are ordered by increasing number of parameters. We report the effective number of free parameters after tying. (The parameter counts for some of the native thermodynamic methods are only approximate and correspond to two different versions of the nearest-neighbor model.)

Test sets. TestSetA is a well-curated collection of sequences from about 10 bona fide RNA structures. TestSetB is a collection of about 22 different RNA structures obtained from Rfam v10.0. TestSetA and TestSetB are structurally dissimilar; both are defined in reference 39.

Performance accuracy. We use F, the harmonic mean of sensitivity (Sen) and positive predictive value (PPV), F = 2·Sen·PPV/(Sen + PPV), so that an F of 100% means perfect prediction. Accuracy is calculated over the entire test set of sequences rather than by averaging the accuracy of each individual sequence. These “total” measures tend to be smaller than per-sequence averages because they correct for the (usually abundant) short sequences in the test sets, for which prediction is much easier than for longer sequences. For methods that use a maximum expected accuracy (MEA) algorithm with a tunable parameter (c-MEA, ref. 31, and GCE, ref. 36), the table reports the “best F” along the ROC curve of sensitivity versus positive predictive value (see ref. 39 for more details).

Training sets. Provenance of the training sets is as follows: TrainSetA+2*TrainSetB, ref. 39; S-Processed-TRA, ref. 33; S-Full, ref. 34.
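The “free tied parameters” counts can be made concrete for the smallest model, g6. The sketch below is hypothetical and rests on assumptions the table does not state: that g6 is the grammar S → L S | L, L → a F a' | a, F → a F a' | L S; that “6 bps” means the six canonical pairs (A-U, U-A, G-C, C-G, G-U, U-G) and “16 bps” all 4×4 nucleotide pairs; and that the base-pair emission distributions of L and F are tied into one. Each categorical distribution over k outcomes contributes k - 1 free parameters, because its probabilities must sum to one.

```python
"""Count free parameters of a probabilistic G6 grammar (hypothetical sketch).

Assumed rules (not taken from the table):
  S -> L S | L
  L -> a F a' | a
  F -> a F a' | L S
Assumed tying: L and F share a single base-pair emission distribution.
"""


def free_params(num_outcomes: int) -> int:
    # A categorical distribution over k outcomes sums to 1,
    # so only k - 1 of its probabilities can vary freely.
    return num_outcomes - 1


def g6_free_params(num_basepairs: int) -> int:
    # Transition distributions: S, L and F each choose between two rules.
    transitions = sum(free_params(n_rules) for n_rules in (2, 2, 2))
    # Unpaired emission: one distribution over the four nucleotides.
    unpaired = free_params(4)
    # Paired emission: one tied distribution over the allowed base pairs.
    paired = free_params(num_basepairs)
    return transitions + unpaired + paired


if __name__ == "__main__":
    print(g6_free_params(6))   # canonical pairs only -> 11
    print(g6_free_params(16))  # all nucleotide pairs -> 21
```

Under these assumptions the counts come out to 11 and 21, matching the g6 row; that agreement is suggestive but does not confirm how the parameters were actually tied in reference 39.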
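The difference between the “total” F used for this table and a per-sequence average can be illustrated with a small hypothetical example. The sketch assumes that each sequence's prediction is summarized by base-pair counts of true positives (TP), false positives (FP) and false negatives (FN); the counts are invented for illustration and are not taken from TestSetA or TestSetB. Sensitivity is TP/(TP + FN), positive predictive value is TP/(TP + FP), and F is their harmonic mean.

```python
from dataclasses import dataclass


@dataclass
class PairCounts:
    tp: int  # reference base pairs that were predicted
    fp: int  # predicted base pairs absent from the reference
    fn: int  # reference base pairs that were missed


def f_measure(c: PairCounts) -> float:
    # Convention (an assumption): undefined ratios are treated as 0.
    sen = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    ppv = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    return 2 * sen * ppv / (sen + ppv) if sen + ppv else 0.0


def total_f(counts: list[PairCounts]) -> float:
    # "Total" F: pool the counts over all sequences, then compute F once.
    pooled = PairCounts(
        tp=sum(c.tp for c in counts),
        fp=sum(c.fp for c in counts),
        fn=sum(c.fn for c in counts),
    )
    return f_measure(pooled)


def averaged_f(counts: list[PairCounts]) -> float:
    # Alternative: compute F per sequence, then take the mean.
    return sum(f_measure(c) for c in counts) / len(counts)


# Hypothetical test set: one short, easy sequence and one long, hard one.
EXAMPLE = [PairCounts(tp=5, fp=0, fn=0), PairCounts(tp=20, fp=30, fn=30)]

if __name__ == "__main__":
    print(f"averaged F = {averaged_f(EXAMPLE):.2f}")  # 0.70
    print(f"total F    = {total_f(EXAMPLE):.2f}")     # 0.45
```

The short sequence weighs as much as the long one in the per-sequence average but contributes only 5 of the 55 reference pairs to the pooled counts, which is why the total F is lower here.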
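For the MEA-based methods, the “best F” is the highest F among the (sensitivity, PPV) points obtained by sweeping the tunable parameter. A minimal sketch of that selection follows; the parameter is called `gamma` here, and both that name and the sweep values are hypothetical, not results from the table.

```python
def f_from_rates(sen: float, ppv: float) -> float:
    """Harmonic mean of sensitivity and positive predictive value."""
    return 2 * sen * ppv / (sen + ppv) if sen + ppv > 0 else 0.0


def best_f(curve: list[tuple[float, float, float]]) -> tuple[float, float]:
    """Return (gamma, F) for the curve point with the highest F.

    `curve` holds one (gamma, sensitivity, ppv) tuple per value of the
    tunable MEA parameter.
    """
    gamma, sen, ppv = max(curve, key=lambda point: f_from_rates(point[1], point[2]))
    return gamma, f_from_rates(sen, ppv)


# Hypothetical sweep in which increasing gamma trades PPV for sensitivity.
HYPOTHETICAL_CURVE = [
    (0.5, 0.40, 0.70),
    (2.0, 0.55, 0.60),
    (8.0, 0.65, 0.45),
]

if __name__ == "__main__":
    g, f = best_f(HYPOTHETICAL_CURVE)
    print(f"best gamma = {g}, best F = {100 * f:.1f}%")  # best gamma = 2.0, best F = 57.4%
```

Reporting the maximum over the curve compares each method at its most favorable sensitivity/PPV trade-off rather than at one fixed parameter value.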