2007 Dec 19;2(12):e1344. doi: 10.1371/journal.pone.0001344

Table 3. Prediction accuracies using ten times 10-fold cross validation with different gene subsets.

| Data used | SVM | RF | KNN | DLDA |
|---|---|---|---|---|
| All Genes (10592 genes) | 45.9%±5.7 | 52.9%±11.3 | 61.1%±8.1 | 52.6%±8.2 |
| Fisher Analysis using a Training Set (Top 100 genes) | 49.3%±11.3 | 53.4%±7.6 | 53.0%±6.6 | 54.2%±5.4 |
| Student's T-Test Analysis using a Training Set (Top 100 genes) | 45.7%±10.5 | 51.6%±5.7 | 47.9%±13.9 | 56.6%±6.1 |
| SAM Analysis with an FDR = 8% (34 genes) | 70.2%±5.7 | 74.9%±8.1 | 73.7%±4.5 | 80.9%±2.2 |
| Fisher Analysis (Top 100 genes) | 67.8%±11.2 | 69.8%±4.6 | 54.9%±6.3 | 70.6%±4.0 |
| Student's T-Test Analysis (Top 100 genes) | 77.8%±7.9 | 71.0%±4.9 | 74.0%±4.9 | 78.8%±2.8 |
| Combined SAM, Fisher, & Student's T-Test (9 genes) | 73.0%±7.1 | 68.5%±5.3 | 66.5%±4.0 | 69.2%±3.2 |
| Golub Cancer Data (7129 genes) | 98.6%±0.0 | 96.9%±2.2 | 92.0%±2.9 | 89.0%±4.1 |

Columns SVM, RF, KNN and DLDA are the learning models.

SVM, support vector machine; RF, random forest; KNN, K-nearest neighbour; DLDA, diagonal linear discriminant analysis; SAM, significance analysis of microarrays; FDR, false discovery rate. Values represent mean±standard deviation corresponding to the 95% confidence interval.
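
As a rough illustration of how a "ten times 10-fold cross validation" figure of the form mean±standard deviation can be produced, the sketch below repeats a shuffled 10-fold split ten times and summarises the per-run accuracies. It is not the authors' pipeline: the 1-nearest-neighbour classifier, the synthetic data, the function names and the choice to take the standard deviation across the ten runs are all assumptions made for illustration.

```python
import random
import statistics


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def one_nn_predict(train_X, train_y, x):
    """Label of the closest training sample (squared Euclidean distance)."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]


def repeated_cv_accuracy(X, y, k=10, repeats=10, seed=0):
    """Return (mean, sd) of accuracy over `repeats` runs of k-fold CV.

    Assumption: the sd is taken across the per-run accuracies.
    """
    rng = random.Random(seed)
    run_accuracies = []
    for _ in range(repeats):
        correct = 0
        for fold in kfold_indices(len(X), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            train_X = [X[i] for i in train]
            train_y = [y[i] for i in train]
            # Each sample is predicted only while it is held out.
            correct += sum(one_nn_predict(train_X, train_y, X[i]) == y[i]
                           for i in fold)
        run_accuracies.append(correct / len(X))
    return statistics.mean(run_accuracies), statistics.stdev(run_accuracies)


# Hypothetical toy data: two well-separated classes, 20 samples each, 5 features.
data_rng = random.Random(1)
X = [[data_rng.gauss(c * 3.0, 1.0) for _ in range(5)]
     for c in (0, 1) for _ in range(20)]
y = [c for c in (0, 1) for _ in range(20)]

mean_acc, sd_acc = repeated_cv_accuracy(X, y)
print(f"{100 * mean_acc:.1f}%±{100 * sd_acc:.1f}")
```

Any gene selection step would need to be run inside each training fold to match the rows labelled "using a Training Set"; this sketch performs no feature selection at all.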