2017 Jul 6;7(9):2931–2943. doi: 10.1534/g3.117.044024

Table 1. Performance of different predictive models on the training/testing dataset of 737 sequences (T737) during 10-fold cross-validation, and evaluation of the same models on an independent validation dataset (V185).

			PCC on Training/Testing Sets (T737) and Independent Validation Sets (V185) Using 10nCV
Predictive Model No.	siRNA Feature Name	No. of Features	T737	V185
1	Mononucleotide composition	4	0.53	0.54
2	Dinucleotide composition	16	0.68	0.64
3	Trinucleotide composition	64	0.70	0.66
4	Tetranucleotide composition	256	0.69	0.65
5	Pentanucleotide composition	1024	0.68	0.63
6	Binary	76	0.55	0.56
7	1+2	20	0.67	0.63
8	1+2+3	84	0.70	0.63
9	1+2+3+4	340	0.71	0.65
10	1+2+3+4+5	1364	0.71	0.65
11	1+2+3+4+6 (ASPsiPred^SVM)	416	0.71	0.65
12	1+2+3+4+5+6	1440	0.71	0.65
13	Thermodynamic feature	21	0.41	0.30
14	Secondary structure	19	0.24	0.07
15	13+14	40	0.35	0.23
16	12+13	437	0.71	0.65
17	12+14	435	0.71	0.65
18	12+13+14	456	0.71	0.65
19	ASPsiPred^matrix	Matrix based	Developed on rules-based studies	0.63

PCC, Pearson correlation coefficient; 10nCV, 10-fold cross-validation; T737, training/testing dataset for 10-fold cross-validation; V185, independent validation dataset. PCC is computed between the actual and predicted Eff^mut. The training/testing dataset was used to train the different predictive models, while the independent validation dataset was not used anywhere during training/testing of the algorithm.
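The feature counts in the table follow from the size of the nucleotide alphabet: a k-nucleotide composition has 4^k entries (4, 16, 64, 256, 1024), and a hybrid model concatenates the individual vectors, so model 9 (1+2+3+4) has 4 + 16 + 64 + 256 = 340 features. The sketch below illustrates that arithmetic and the PCC used as the evaluation metric. It is not the authors' implementation: the function names `kmer_composition` and `pearson`, the ACGU alphabet, the normalization of counts to fractions, and the example sequence are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGU"  # assumption: sequences are written in the RNA alphabet


def kmer_composition(seq, k):
    """Fraction of each possible k-mer in seq, in a fixed lexicographic order.

    The vector always has 4**k entries, whatever the sequence length.
    Normalizing to fractions is an assumption; percentages would work equally well.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    if windows <= 0:
        return [0.0] * len(kmers)
    for i in range(windows):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows containing non-ACGU characters
            counts[sub] += 1
    return [counts[m] / windows for m in kmers]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical 21-nt sequence, used only to show the vector sizes.
sirna = "GUCAUCACACUGAAUACCAAU"

# Hybrid feature vector in the style of model 9 (1+2+3+4).
features = []
for k in (1, 2, 3, 4):
    features.extend(kmer_composition(sirna, k))

print(len(features))
```

In a 10-fold cross-validation setting, `pearson` would be applied to the actual efficacies and the model-predicted efficacies collected across the held-out folds; the same call on the V185 predictions would give the independent validation PCC.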