Table 3. Prediction accuracies from ten repetitions of 10-fold cross-validation using different gene subsets.
Data Used | Learning Models | | | |
 | SVM | RF | KNN | DLDA |
All Genes (10592 genes) | 45.9%±5.7 | 52.9%±11.3 | 61.1%±8.1 | 52.6%±8.2 |
Fisher Analysis using a Training Set (Top 100 genes) | 49.3%±11.3 | 53.4%±7.6 | 53.0%±6.6 | 54.2%±5.4 |
Student's T-Test Analysis using a Training Set (Top 100 genes) | 45.7%±10.5 | 51.6%±5.7 | 47.9%±13.9 | 56.6%±6.1 |
SAM Analysis with an FDR = 8% (34 genes) | 70.2%±5.7 | 74.9%±8.1 | 73.7%±4.5 | 80.9%±2.2 |
Fisher Analysis (Top 100 genes) | 67.8%±11.2 | 69.8%±4.6 | 54.9%±6.3 | 70.6%±4.0 |
Student's T-Test Analysis (Top 100 genes) | 77.8%±7.9 | 71.0%±4.9 | 74.0%±4.9 | 78.8%±2.8 |
Combined SAM, Fisher, & Student's T-Test (9 genes) | 73.0%±7.1 | 68.5%±5.3 | 66.5%±4.0 | 69.2%±3.2 |
Golub Cancer Data (7129 genes) | 98.6%±0.0 | 96.9%±2.2 | 92.0%±2.9 | 89.0%±4.1 |
SVM, support vector machine; RF, random forest; KNN, K-nearest neighbour; DLDA, diagonal linear discriminant analysis; SAM, significance analysis of microarrays; FDR, false discovery rate. Values represent mean±standard deviation corresponding to the 95% confidence interval.
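As a rough illustration of how a mean±standard deviation figure can be obtained from ten repetitions of 10-fold cross-validation, the sketch below runs the procedure on synthetic data with a simple 1-nearest-neighbour classifier. Everything in it is an assumption made for illustration: the function names, the synthetic data, the classifier, and the choice to take the standard deviation across the ten repetitions (the table does not say whether the spread was computed across repetitions or across folds). It is not the authors' pipeline.

```python
import random
import statistics


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_neighbour_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def repeated_kfold_accuracy(X, y, k=10, repeats=10, seed=0):
    """Return (mean, sd) of the per-repetition accuracies, in percent.

    Assumption: the spread is taken across repetitions, not across folds.
    """
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        correct = 0
        for fold in kfold_indices(len(X), k, rng):
            held_out = set(fold)
            train_X = [X[i] for i in range(len(X)) if i not in held_out]
            train_y = [y[i] for i in range(len(X)) if i not in held_out]
            correct += sum(
                nearest_neighbour_predict(train_X, train_y, X[i]) == y[i]
                for i in fold
            )
        per_repeat.append(100.0 * correct / len(X))
    return statistics.mean(per_repeat), statistics.stdev(per_repeat)


# Synthetic two-class data (hypothetical, not from the study):
# 30 samples per class, 5 features, class means separated by 2.0 per feature.
data_rng = random.Random(42)
X = [[data_rng.gauss(label * 2.0, 1.0) for _ in range(5)]
     for label in (0, 1) for _ in range(30)]
y = [label for label in (0, 1) for _ in range(30)]

mean_acc, sd_acc = repeated_kfold_accuracy(X, y)
print(f"{mean_acc:.1f}%±{sd_acc:.1f}")
```

Each repetition reshuffles the samples before splitting, so the ten accuracy estimates differ only through the fold assignment; their spread reflects the sensitivity of the estimate to how the data are partitioned.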