2010 Jan 14;26(5):668–675. doi: 10.1093/bioinformatics/btq005

Table 1.

Biomedical datasets used for the comparison experiments

 #  T  #C      #A   #S      M  Reference
 1  D   2    6584   61  0.651  Alon et al. (1999)
 2  D   3  12 582   72  0.387  Armstrong et al. (2002)
 3  P   2    5372   86  0.795  Beer et al. (2002)
 4  D   5  12 600  203  0.657  Bhattacharjee et al. (2001)
 5  P   2    5372   69  0.746  Bhattacharjee et al. (2001)
 6  D   2    7129   72  0.650  Golub et al. (1999)
 7  D   2    7464   36  0.500  Hedenfalk et al. (2001)
 8  P   2    7129   60  0.661  Iizuka et al. (2003)
 9  D   4    2308   83  0.345  Khan et al. (2001)
10  D   4  12 625   50  0.296  Nutt et al. (2003)
11  D   5    7129   90  0.642  Pomeroy et al. (2002)
12  P   2    7129   60  0.645  Pomeroy et al. (2002)
13  D  26  16 063  280  0.574  Ramaswamy et al. (2001)
14  P   2    7399  240  0.145  Rosenwald et al. (2002)
15  D   9    7129   60  0.506  Staunton et al. (2001)
16  D   2    7129   77  0.746  Shipp et al. (2002)
17  D   2  10 510  102  0.150  Singh et al. (2002)
18  D  11  12 533  174  0.150  Su et al. (2001)
19  P   2  24 481   78  0.562  van't Veer et al. (2002)
20  D   2    7039   39  0.878  Welsh et al. (2001)
21  P   2  12 625  249  0.805  Yeoh et al. (2002)
22  D   2  11 003  322  0.784  Petricoin et al. (2002)
23  D   3  11 170  159  0.364  Pusztai et al. (2004)
24  D   2  36 778   52  0.556  Ranganathan (2005)
24 D 2 36 778 52 0.556 Ranganathan (2005)

In the type (T) column, P signifies prognostic and D signifies diagnostic. #C is the number of classes, #A the number of attributes in the dataset, #S the number of samples and M the fraction of the samples covered by the most frequent target value. The first 21 datasets contain genomic data, whereas the last three datasets contain proteomic data.
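
The M column can be read as the accuracy of a trivial classifier that always predicts the most frequent class. The following is a minimal sketch of that calculation, assuming the class labels are available as a plain list; the function name `majority_fraction` and the toy label vector are illustrative and are not taken from the paper or from any dataset in Table 1.

```python
from collections import Counter


def majority_fraction(labels):
    """Return the fraction of samples that carry the most frequent class label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    # most_common(1) returns a one-element list of (label, count) pairs.
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical two-class label vector (illustrative only, not real data).
toy_labels = ["tumor"] * 13 + ["normal"] * 7
m = majority_fraction(toy_labels)
print(f"M = {m:.3f}")
```

A value of M close to 1 indicates a strongly imbalanced dataset, so raw accuracy on such a dataset should be compared against M rather than against chance.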