Table 1.
Data-set | Genes | Samples | DP | kNN (k) | WV | LDA | SVM | ML-s | ML-d |
BRCA1 | 3226 | 7 BRCA1-positive, 15 BRCA1-negative | 21/22 | 18/22 (1) | 18/22 | 18/22 | 18/22 | 19/22 | 16/22 |
BRCA2 | 3226 | 8 BRCA2-positive, 14 BRCA2-negative | 21/22 | 21/22 (1) | 17/22 | 19/22 | 18/22 | 17/22 | 17/22 |
PROS | 12600 | 52 tumor tissue, 50 normal tissue | 93/102 | 90/102 (5) | 61/102 | 92/102 | 93/102 | 64/102 | 50/102 |
PROS-OUT | 12625 | 8 non-recurrence, 13 recurrence | 15/21 | 12/21 (1) | 12/21 | 13/21 | 14/21 | 13/21 | 13/21 |
DLBCL-FL | 6817 | 52 DLBCL, 25 FL | 74/77 | 71/77 (7) | 63/77 | 74/77 | 74/77 | 65/77 | 58/77 |
ALL-AML | 6817 | 27 AML, 11 ALL | 38/38 | 37/38 (3) | 38/38 | 38/38 | 38/38 | 30/38 | 27/38 |
I-2000 | 2000 | 40 tumor colon tissue, 22 normal colon tissue | 61/62 | 59/62 (3) | 58/62 | 61/62 | 61/62 | 59/62 | 58/62 |
Columns correspond to the classification algorithms, rows to the data-sets. In each cell, the numerator is the number of left-out samples that have been correctly classified by the corresponding algorithm, and the denominator is the total number of samples n. The kNN algorithm has a free parameter that must be determined: the number of neighbors k. To allow for a fair comparison, we optimized this value for each data-set using cross-validation [12]. The resulting optimal value is given in parentheses. For the ML classifier, we consider two cases: one in which the two classes are assumed to have the same variance, and one in which the variances are assumed to be different. These are referred to as ML-s (same) and ML-d (different), respectively.
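As an illustration of how a kNN entry such as "18/22 (1)" can be produced, the following minimal sketch counts correctly classified left-out samples for several candidate values of k and keeps the best one. It is not the authors' implementation: the Euclidean distance, the candidate values of k, the tie-breaking rule (smallest k wins), the toy data, and the function names `knn_predict`, `loo_correct`, and `best_k` are all assumptions made for this example. The caption also does not state whether k was selected inside or outside the evaluation loop, so the sketch simply selects it on the same leave-one-out counts.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training samples."""
    # Euclidean distance is an assumption; the caption does not name a metric.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_correct(X, y, k):
    """Number of left-out samples classified correctly (the numerator in a cell)."""
    correct = 0
    for i in range(len(X)):
        # Hold out sample i and train on the remaining n - 1 samples.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct


def best_k(X, y, candidates=(1, 3, 5, 7)):
    """Return (k, correct) for the candidate k with the most correct samples."""
    scores = {k: loo_correct(X, y, k) for k in candidates}
    # max() keeps the first maximum, so ties go to the earliest candidate.
    k = max(candidates, key=lambda c: scores[c])
    return k, scores[k]


# Hypothetical toy data: two well-separated classes with two "genes" each.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
y = ["neg", "neg", "neg", "pos", "pos", "pos"]

k, correct = best_k(X, y, candidates=(1, 3, 5))
print(f"{correct}/{len(X)} ({k})")  # prints "6/6 (1)"
```

With only three samples per class, k = 5 forces the vote to be dominated by the opposite class once a sample is left out, which is why small classes such as those in BRCA1 or PROS-OUT can favor small values of k.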