Table 1. Classifier-dependent 10-fold cross-validation error rates.
SVM | ||||||
C | 0.5 | 2 | 10 | 30 | 40 | 50 |
Error | 0.114 | 0.107 | 0.086 | 0.087 | 0.084 | 0.087 |
RF | ||||||
No. of trees | 80 | 120 | 160 | 200 | 240 | 280 |
Error | 0.142 | 0.141 | 0.146 | 0.145 | 0.144 | 0.143 |
LR | ||||||
Reg. Param. | 0.001 (L1) | 0.01 (L1) | 0.1 (L1) | 0.001 (L2) | 0.01 (L2) | 0.1 (L2) |
Error | 0.138 | 0.133 | 0.130 | 0.127 | 0.121 | 0.119 |
Best-performing 10-fold cross-validation models (over all models with feature counts between 1 and 200) for each classification algorithm and for different parameter values. For the SVM, the gamma scaling parameter is optimised at 0.007. For the random forest model, the leaf size is optimised at 1 for any number of trees. Note that for the logistic regression model, L2 regularisation outperforms L1 regularisation at every tabulated parameter value, but does not achieve the accuracy of the SVM, even for poor choices of C.
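
The comparisons stated in the caption can be checked directly against the tabulated error rates. The following is a minimal sketch that only transcribes the numbers from Table 1; the variable and function names (`svm`, `rf`, `lr`, `best`) are illustrative assumptions and do not come from the original analysis, and no models are trained here.

```python
# Error rates transcribed from Table 1 (10-fold cross-validation).
# Keys are the tabulated parameter values; names are illustrative only.
svm = {0.5: 0.114, 2: 0.107, 10: 0.086, 30: 0.087, 40: 0.084, 50: 0.087}
rf = {80: 0.142, 120: 0.141, 160: 0.146, 200: 0.145, 240: 0.144, 280: 0.143}
lr = {
    ("L1", 0.001): 0.138, ("L1", 0.01): 0.133, ("L1", 0.1): 0.130,
    ("L2", 0.001): 0.127, ("L2", 0.01): 0.121, ("L2", 0.1): 0.119,
}


def best(errors):
    """Return the (setting, error) pair with the lowest error rate."""
    return min(errors.items(), key=lambda item: item[1])


best_svm = best(svm)
best_rf = best(rf)
best_lr = best(lr)

# L2 regularisation gives a lower error than L1 at every tabulated value.
l2_beats_l1 = all(lr[("L2", v)] < lr[("L1", v)] for v in (0.001, 0.01, 0.1))

# Even the worst tabulated SVM setting beats the best logistic regression.
worst_svm_error = max(svm.values())
svm_always_beats_lr = worst_svm_error < best_lr[1]

print("Best SVM:", best_svm)
print("Best RF:", best_rf)
print("Best LR:", best_lr)
print("L2 beats L1 everywhere:", l2_beats_l1)
print("Worst SVM beats best LR:", svm_always_beats_lr)
```

On these figures, the weakest SVM setting (C = 0.5, error 0.114) still has a lower error than the strongest logistic regression setting (L2, 0.1, error 0.119), which is the comparison the caption draws.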