Table 1.
Data set code | Endpoint code | Endpoint description | Microarray platform | Training set: number of samples^a | Training set: positives (P) | Training set: negatives (N) | Training set: P/N ratio | Validation set: number of samples^a | Validation set: positives (P) | Validation set: negatives (N) | Validation set: P/N ratio | Comments and references |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Hamner | A | Lung tumorigen vs. non-tumorigen (mouse) | Affymetrix Mouse 430 2.0 | 70 | 26 | 44 | 0.59 | 88 | 28 | 60 | 0.47 | The training set was first published in 2007 (ref. 50) and the validation set was generated for MAQC-II |
Iconix | B | Non-genotoxic liver carcinogens vs. non-carcinogens (rat) | Amersham Uniset Rat 1 Bioarray | 216 | 73 | 143 | 0.51 | 201 | 57 | 144 | 0.40 | The data set was first published in 2007 (ref. 51). Raw microarray intensity data, instead of ratio data, were provided for MAQC-II data analysis |
NIEHS | C | Liver toxicants vs. non-toxicants based on overall necrosis score (rat) | Affymetrix Rat 230 2.0 | 214 | 79 | 135 | 0.58 | 204 | 78 | 126 | 0.62 | Exploratory visualization of the data set was reported in 2008 (ref. 53). However, the phenotype classification problem was formulated specifically for MAQC-II. A large amount of additional microarray and phenotype data were provided to MAQC-II for cross-platform and cross-tissue comparisons |
Breast cancer (BR) | D | Pre-operative treatment response (pCR, pathologic complete response) | Affymetrix Human U133A | 130 | 33 | 97 | 0.34 | 100 | 15 | 85 | 0.18 | The training set was first published in 2006 (ref. 56) and the validation set was specifically generated for MAQC-II. In addition, two distinct endpoints (D and E) were analyzed in MAQC-II |
Breast cancer (BR) | E | Estrogen receptor status (erpos) | Affymetrix Human U133A | 130 | 80 | 50 | 1.6 | 100 | 61 | 39 | 1.56 | |
Multiple myeloma (MM) | F | Overall survival milestone outcome (OS, 730-d cutoff) | Affymetrix Human U133Plus 2.0 | 340 | 51 | 289 | 0.18 | 214 | 27 | 187 | 0.14 | The data set was first published in 2006 (ref. 57) and 2007 (ref. 58). However, patient survival data were updated and the raw microarray data (CEL files) were provided specifically for MAQC-II data analysis. In addition, endpoints H and I were designed and analyzed specifically in MAQC-II |
Multiple myeloma (MM) | G | Event-free survival milestone outcome (EFS, 730-d cutoff) | Affymetrix Human U133Plus 2.0 | 340 | 84 | 256 | 0.33 | 214 | 34 | 180 | 0.19 | |
Multiple myeloma (MM) | H | Clinical parameter S1 (CPS1). The actual class label is the sex of the patient. Used as a “positive” control endpoint | Affymetrix Human U133Plus 2.0 | 340 | 194 | 146 | 1.33 | 214 | 140 | 74 | 1.89 | |
Multiple myeloma (MM) | I | Clinical parameter R1 (CPR1). The actual class label is randomly assigned. Used as a “negative” control endpoint | Affymetrix Human U133Plus 2.0 | 340 | 200 | 140 | 1.43 | 214 | 122 | 92 | 1.33 | |
Neuroblastoma (NB) | J | Overall survival milestone outcome (OS, 900-d cutoff) | Different versions of Agilent human microarrays | 238 | 22 | 216 | 0.10 | 177 | 39 | 138 | 0.28 | The training data set was first published in 2006 (ref. 63). The validation set (two-color Agilent platform) was generated specifically for MAQC-II. In addition, one-color Agilent platform data were also generated for most samples used in the training and validation sets specifically for MAQC-II to compare the prediction performance of two-color versus one-color platforms. Patient survival data were also updated. In addition, endpoints L and M were designed and analyzed specifically in MAQC-II |
Neuroblastoma (NB) | K | Event-free survival milestone outcome (EFS, 900-d cutoff) | Different versions of Agilent human microarrays | 239 | 49 | 190 | 0.26 | 193 | 83 | 110 | 0.75 | |
Neuroblastoma (NB) | L | Newly established parameter S (NEP_S). The actual class label is the sex of the patient. Used as a “positive” control endpoint | Different versions of Agilent human microarrays | 246 | 145 | 101 | 1.44 | 231 | 133 | 98 | 1.36 | |
Neuroblastoma (NB) | M | Newly established parameter R (NEP_R). The actual class label is randomly assigned. Used as a “negative” control endpoint | Different versions of Agilent human microarrays | 246 | 145 | 101 | 1.44 | 253 | 143 | 110 | 1.30 | |
The first three data sets (Hamner, Iconix and NIEHS) are from preclinical toxicogenomics studies, whereas the other three data sets are from clinical studies. Endpoints H and L are positive controls (sex of patient) and endpoints I and M are negative controls (randomly assigned class labels). The nature of H, I, L and M was unknown to MAQC-II participants except for the project leader until all calculations were completed.
^a Numbers shown are the actual numbers of samples used for model development or validation.
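
The P/N ratio column in Table 1 is the number of positives divided by the number of negatives, and the number of samples is their sum. As a minimal sketch of that arithmetic, the Python snippet below re-derives the ratio for a few training-set rows transcribed from the table. The names `TRAINING_ROWS`, `pn_ratio` and `check_row` are hypothetical and used only for illustration, and the 0.01 tolerance is an assumption that allows for rounding in the reported values.

```python
# Illustrative only: a few training-set rows transcribed from Table 1.
# Tuple layout: (endpoint code, number of samples, positives, negatives,
#                reported P/N ratio)
TRAINING_ROWS = [
    ("A", 70, 26, 44, 0.59),
    ("D", 130, 33, 97, 0.34),
    ("E", 130, 80, 50, 1.6),
    ("F", 340, 51, 289, 0.18),
    ("J", 238, 22, 216, 0.10),
]


def pn_ratio(positives: int, negatives: int) -> float:
    """Return the ratio of positive to negative samples."""
    if negatives == 0:
        raise ValueError("P/N ratio is undefined when there are no negatives")
    return positives / negatives


def check_row(code, n_samples, positives, negatives, reported, tol=0.01):
    """Check that P + N equals the sample count and that P/N matches
    the reported ratio within `tol` (an assumed rounding allowance)."""
    counts_ok = positives + negatives == n_samples
    ratio_ok = abs(pn_ratio(positives, negatives) - reported) <= tol
    return counts_ok and ratio_ok


if __name__ == "__main__":
    for row in TRAINING_ROWS:
        code, n_samples, positives, negatives, reported = row
        computed = pn_ratio(positives, negatives)
        status = "ok" if check_row(*row) else "mismatch"
        print(f"Endpoint {code}: P/N = {computed:.2f} "
              f"(reported {reported}) [{status}]")
```

A ratio well below 1, as for endpoints F and J, indicates a training set dominated by negatives, which is worth keeping in mind when comparing classifier performance across endpoints.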