BMC Bioinformatics. 2008 Jun 14;9:280. doi: 10.1186/1471-2105-9-280

Table 1.

Comparison of the results obtained with different classifiers in a variety of data-sets.

| Data-set | Genes | Samples | DP | kNN (k) | WV | LDA | SVM | ML-s | ML-d |
|---|---|---|---|---|---|---|---|---|---|
| BRCA1 | 3226 | 7 BRCA1-positive, 15 BRCA1-negative | 21/22 | 18/22 (1) | 18/22 | 18/22 | 18/22 | 19/22 | 16/22 |
| BRCA2 | 3226 | 8 BRCA2-positive, 14 BRCA2-negative | 21/22 | 21/22 (1) | 17/22 | 19/22 | 18/22 | 17/22 | 17/22 |
| PROS | 12600 | 52 tumor tissue, 50 normal tissue | 93/102 | 90/102 (5) | 61/102 | 92/102 | 93/102 | 64/102 | 50/102 |
| PROS-OUT | 12625 | 8 non-recurrence, 13 recurrence | 15/21 | 12/21 (1) | 12/21 | 13/21 | 14/21 | 13/21 | 13/21 |
| DLBCL-FL | 6817 | 52 DLBCL, 25 FL | 74/77 | 71/77 (7) | 63/77 | 74/77 | 74/77 | 65/77 | 58/77 |
| ALL-AML | 6817 | 27 AML, 11 ALL | 38/38 | 37/38 (3) | 38/38 | 38/38 | 38/38 | 30/38 | 27/38 |
| I-2000 | 2000 | 40 tumor colon tissue, 22 normal colon tissue | 61/62 | 59/62 (3) | 58/62 | 61/62 | 61/62 | 59/62 | 58/62 |

Columns indicate the algorithm used, rows the data-set. In each cell, the numerator is the number of left-out samples that were correctly classified by the corresponding algorithm, and the denominator is the total number of samples n. The kNN algorithm has a free parameter that needs to be determined: the number of neighbors k. To allow for a fair comparison, we have optimized this value for each data-set using cross-validation [12]. The resulting optimal value is given in parentheses. For the ML classifier, we consider two cases: one in which the two classes are assumed to have the same variance, and one in which the variances are assumed to be different. These are referred to as ML-s (same) and ML-d (different), respectively.
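
As an illustration of how a cell such as "18/22 (1)" can be produced, the following is a minimal sketch of leave-one-out evaluation for a kNN classifier, with k chosen from a small candidate set. It is not the authors' code. The Euclidean distance, the candidate values of k, the function names (`knn_predict`, `loo_correct`, `best_k`) and the toy data are all assumptions made for this example. For brevity, k is selected on the same leave-one-out score that is reported, which may differ from the cross-validation procedure of [12].

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples closest to `query`.

    Euclidean distance is an assumption; the table does not state the metric.
    """
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_correct(xs, ys, k):
    """Count left-out samples classified correctly (the numerator of a cell)."""
    correct = 0
    for i in range(len(xs)):
        # Train on every sample except the i-th, then classify the i-th.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
            correct += 1
    return correct


def best_k(xs, ys, candidates=(1, 3, 5, 7)):
    """Return the candidate k with the most correct left-out samples.

    Ties are resolved in favor of the smallest k listed first.
    """
    scores = {k: loo_correct(xs, ys, k) for k in candidates}
    return max(candidates, key=lambda k: scores[k]), scores


# Hypothetical toy data: two well-separated groups in two dimensions.
xs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
      (2.0, 2.1), (2.2, 1.9), (1.9, 2.3), (2.1, 2.0)]
ys = ["normal"] * 4 + ["tumor"] * 4

k, scores = best_k(xs, ys)
print(f"{scores[k]}/{len(xs)} ({k})")  # same layout as a kNN cell in the table
```

With real expression data each sample would be a vector with one entry per gene (for example, 3226 entries for BRCA1), but the counting logic stays the same.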