2013 May 10;10(7):1185–1196. doi: 10.4161/rna.24971

Table 1. Comparison of different methods for RNA secondary structure prediction.

Method	Architecture: # free tied parameters (6 bps)	Architecture: # free tied parameters (16 bps)	Scoring scheme	Parameterization	Training datasets	Folding method	Best F (%), TestSetA	Best F (%), TestSetB
g6	11	21	probabilistic	ML	TrainSetA+2*TrainSetB	c-MEA	49.1	47.5
^●basic grammar	532	572	probabilistic	ML	TrainSetA+2*TrainSetB	c-MEA	56.9	56.5
^⋄CONTRAfold v2.02	~300	-	weights	CML	S-Processed-TRA	c-MEA	57.2	57.9
^●CONTRAfoldG	1,278	5,448	probabilistic	ML	TrainSetA+2*TrainSetB	c-MEA	58.3	58.6
^⋄UNAFold-3.8	~3,500	-	thermodynamic	fit to exp. data	-	CYK	51.0	51.3
^⋄Simfold BL*	~as above	-	weights	CML	S-Processed-TRA	CYK	56.5	55.3
^⋄RNAstructure v5.2	~12,700	-	thermodynamic	fit to exp. data	-	GCE	53.5	53.8
^⋄ViennaRNA v1.8.4	~as above	-	thermodynamic	fit to exp. data	-	GCE	53.7	54.3
^●ViennaRNAG	14,307	90,497	probabilistic	ML	TrainSetA+2*TrainSetB	c-MEA	60.2	59.4
^●ViennaRNAGz_bulge2_ld_mdangle	14,557	91,997	probabilistic	ML	TrainSetA+2*TrainSetB	c-MEA	60.5	59.5
^⋄ContextFold v1.00	205,000	-	weights	online CML	S-Full	CYK	64.4	49.0

Models. Models marked “^⋄” are versioned stand-alone packages. Models marked “^●” are CFGs (with alternative scoring schemes) introduced in reference 39. In particular, ViennaRNAG is a CFG that, when parameterized with thermodynamic scores, reproduces the ViennaRNA v1.8.4 method, and CONTRAfoldG is another CFG that, when parameterized with particular scores, reproduces CONTRAfold v2.02. Here, we present the results of probabilistic parameterizations for those grammars.

Parameters. Methods are ordered by increasing number of parameters. We report the effective number of free parameters after tying. (The number of parameters for some of the native thermodynamic methods is only approximate and corresponds to two different versions of the nearest-neighbor model.)

Test sets. TestSetA is a well-curated collection of sequences from about 10 bona fide RNA structures. TestSetB is a collection of about 22 different RNA structures obtained from Rfam v10.0. TestSetA and TestSetB are structurally dissimilar; both are defined in reference 39.

Performance accuracy. We use F (the harmonic mean of sensitivity and positive predictive value), such that an F of 100% means perfect prediction. Performance accuracy is calculated over the entire test set of sequences (instead of averaging the accuracy of each individual sequence). This “total” measure tends to be smaller than the one obtained by averaging over sequences, because it corrects for the (usually abundant) short sequences in the test sets, for which prediction is much easier than for longer sequences. For methods that use an MEA algorithm with a tunable parameter (both c-MEA³¹ and GCE³⁶), this table reports the “best F” in the ROC curve between sensitivity and positive predictive value (see ref. 39 for more details).

Training sets. Provenance of training sets is as follows: TrainSetA+2*TrainSetB,³⁹ S-Processed-TRA,³³ S-Full.³⁴
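
The difference between the “total” F and a per-sequence average, as described under Performance accuracy, can be illustrated with a minimal Python sketch. The function names (`f_measure`, `total_f`, `mean_f`, `best_f`), the count tuples, and the example numbers are hypothetical and chosen only for illustration. How a predicted base pair is judged correct is not specified in this table (see ref. 39), so the sketch simply takes the counts as given.

```python
from typing import Dict, List, Tuple

# Per-sequence counts (hypothetical layout, not taken from the paper):
# (correctly predicted pairs, predicted pairs, reference pairs)
Counts = Tuple[int, int, int]


def f_measure(tp: int, n_pred: int, n_ref: int) -> float:
    """Harmonic mean of sensitivity (tp/n_ref) and PPV (tp/n_pred), in percent."""
    if tp == 0 or n_pred == 0 or n_ref == 0:
        return 0.0
    sen = tp / n_ref
    ppv = tp / n_pred
    return 100.0 * 2.0 * sen * ppv / (sen + ppv)


def total_f(counts: List[Counts]) -> float:
    """'Total' F: pool the counts over the whole test set, then compute F once."""
    tp = sum(c[0] for c in counts)
    n_pred = sum(c[1] for c in counts)
    n_ref = sum(c[2] for c in counts)
    return f_measure(tp, n_pred, n_ref)


def mean_f(counts: List[Counts]) -> float:
    """Per-sequence average: compute F for each sequence, then average."""
    return sum(f_measure(*c) for c in counts) / len(counts)


def best_f(curve: Dict[float, List[Counts]]) -> float:
    """Best total F over the values of a tunable folding parameter.

    `curve` maps each parameter value to the counts obtained with it.
    """
    return max(total_f(counts) for counts in curve.values())


# Two short sequences predicted perfectly, one long sequence predicted poorly.
example = [(10, 10, 10), (10, 10, 10), (50, 100, 100)]
print(f"total F = {total_f(example):.1f}")  # 58.3
print(f"mean F  = {mean_f(example):.1f}")   # 83.3
```

In this toy example the two short, easy sequences dominate the per-sequence average, whereas the pooled measure is weighted by the number of base pairs and is therefore driven by the long sequence.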