Author manuscript; available in PMC: 2014 Oct 3.
Published in final edited form as: J Mach Learn Res. 2013 Feb;14:499–566.

Table 12.

Real data sets used in the experiments.

| Name | Domain | # of samples | # of variables | Response type | Data type | CV design | References |
|---|---|---|---|---|---|---|---|
| Infant_Mortality | Clinical | 5,337 | 86 | Death within the first year | Discrete | Holdout | Mani and Cooper (1999) |
| Ohsumed | Text | 5,000 | 14,373 | Relevant to neonatal diseases | Continuous | Holdout | Joachims (2002) |
| ACPJ_Etiology | Text | 15,779 | 28,228 | Relevant to etiology | Continuous | Holdout | Aphinyanaphongs et al. (2006) |
| Lymphoma | Gene Expression | 227 | 7,399 | 3-year survival: dead vs. alive | Continuous | 10-fold | Rosenwald et al. (2002) |
| Gisette | Digit recognition | 7,000 | 5,000 | 4 vs. 9 | Continuous | Holdout | NIPS 2003 Feature Selection Challenge; Guyon et al. (2006) |
| Dexter | Text | 600 | 19,999 | Relevant to corporate acquisitions | Continuous | 10-fold | NIPS 2003 Feature Selection Challenge; Guyon et al. (2006) |
| Sylva | Ecology | 14,394 | 216 | Ponderosa vs. rest | Continuous | Holdout | WCCI 2006 Perf. Prediction Challenge |
| Ovarian_Cancer | Proteomics | 216 | 2,190 | Cancer vs. normal | Continuous | 10-fold | Conrads et al. (2004) |
| Thrombin | Drug discovery | 2,543 | 139,351 | Binding to thrombin | Discrete | Holdout | KDD Cup 2001 |
| Breast_Cancer | Gene Expression | 286 | 17,816 | ER+ vs. ER− | Continuous | 10-fold | Wang et al. (2005) |
| Hiva | Drug discovery | 4,229 | 1,617 | Activity to HIV AIDS infection | Discrete | Holdout | WCCI 2006 Perf. Prediction Challenge |
| Nova | Text | 1,929 | 16,969 | Political topics vs. religious | Discrete | Holdout | WCCI 2006 Perf. Prediction Challenge |
| Bankruptcy | Financial | 7,063 | 147 | Personal bankruptcy | Continuous | Holdout | Foster and Stine (2004) |