Table 1.
Dataset | Microarray technology | Survival data | Treatment | No. of patients | No. of probes | Source | Reference |
EXPO | Affymetrix HGU | NA | NA | 353 | 54 675 | GEO: GSE2109 | (23) |
VDX† | Affymetrix HGU | RFS, DMFS | Untreated | 344 | 22 283 | GEO: GSE2034/GSE5327 | (24,25) |
NKI† | Agilent | RFS, DMFS, OS | Untreated, chemo | 337 | 24 481 | Rosetta Inpharmatics | (13,14) |
UCSF† | In-house cDNA | DMFS, RFS, OS | Untreated, chemo, hormonal | 162 | 10 368 | Authors’ website | (26,27) |
STNO2† | In-house cDNA | RFS, OS | Untreated, chemo, hormonal | 122 | 7787 | SMD | (6) |
NCI† | In-house cDNA | RFS | Untreated, chemo, hormonal | 99 | 6878 | Authors’ website | (7) |
MSK | Affymetrix HGU | DMFS | Heterogeneous | 99 | 22 283 | GEO: GSE2603 | (28) |
UPP† | Affymetrix HGU | RFS | Untreated, hormonal | 251 (190)‡ | 44 928 | GEO: GSE3494 | (29) |
STK | Affymetrix HGU | RFS | Untreated, chemo, hormonal | 159 | 44 928 | GEO: GSE1456 | (30) |
UNT† | Affymetrix HGU | RFS, DMFS | Untreated | 137 (92)‡ | 44 928 | GEO: GSE2990 | (16,31) |
UNC4† | Agilent | RFS, OS | Heterogeneous | 337 | 17 779 | UNC DB | (32) |
DUKE | Affymetrix HGU95 | OS | Heterogeneous | 171 | 12 625 | GEO: GSE3143 | (33) |
CAL† | Affymetrix HGU | RFS, DMFS, OS | Chemo, hormonal | 118 | 22 283 | AE: E-TABM-158 | (34) |
TRANSBIG† | Affymetrix HGU | RFS, DMFS, OS | Untreated | 198 | 22 283 | GEO: GSE7390 | (35) |
DUKE2 | Affymetrix X3P | NA | Chemo | 160 | 61 359 | GEO: GSE6961 | (36) |
MAINZ† | Affymetrix HGU | DMFS | Untreated | 200 | 22 283 | GEO: GSE11121 | (37) |
LUND2 | Swegene | NA | Hormonal | 105 | 27 648 | GEO: GSE5325 | (38) |
LUND | Swegene | NA | Heterogeneous | 143 | 26 824 | GEO: GSE5325 | (39) |
FNCLCC | In-house cDNA | NA | Chemo | 150 | 9216 | GEO: GSE7017 | (40) |
MDA4 | Affymetrix HGU | NA | Chemo | 129 (65)‡ | 22 283 | MDACC DB | (10,42) |
EMC2† | Affymetrix HGU | DMFS | Chemo | 204 | 54 675 | GEO: GSE12276 | (43) |
MUG | Operon | NA | Chemo | 152 | 35 788 | GEO: GSE10510 | (44) |
NCCS | Affymetrix HGU | NA | NA | 183 | 22 283 | GEO: GSE5364 | (45) |
MCCC | Illumina | NA | NA | 75 | 48 701 | GEO: GSE19177 | (46) |
KOO† | Affymetrix HGU95 | NA | NA | 88 | 48 701 | Authors’ website | (47) |
EORTC10994 | Affymetrix HGU | NA | Chemo | 49 | 22 283 | GEO: GSE1561 | (41) |
HLP | Illumina | NA | Chemo | 53 | 48 701 | AE: E-TABM-543 | (48) |
DFHCC† | Affymetrix HGU | DMFS | Heterogeneous | 115 | 54 675 | GEO: GSE19615 | (49) |
DFHCC2 | Affymetrix HGU | NA | Chemo | 84 (75)‡ | 54 675 | GEO: GSE18864 | (51) |
DFHCC3 | Affymetrix HGU | NA | Chemo | 40 (26)‡ | 54 675 | GEO: GSE3744 | (52) |
DFHCC4† | Affymetrix HGU | NA | Untreated | 129 | 54 675 | GEO: GSE5460 | (53) |
MAQC2 | Affymetrix HGU | NA | Chemo | 230 | 22 283 | GEO: GSE20194 | (54) |
JBI | Affymetrix HGU | NA | NA | 92 | 54 675 | GEO: GSE20711 | (55) |
Datasets of tamoxifen-treated patients only | | | | | | | |
TAM | Affymetrix HGU | DMFS, RFS | Hormonal | 345 (242)‡§ | 44 928 | GEO: GSE6532/GSE9195 | (56) |
MDA5 | Affymetrix HGU | DMFS | Hormonal | 298 | 22 283 | GEO: GSE17705 | (57) |
VDX3 | Affymetrix HGU | DMFS | Hormonal | 136 | 22 283 | GEO: GSE12093 | (50) |
Microarray datasets of 5715 unique breast cancer patients used in this study were retrieved from authors’ websites, Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (AE; http://www.ebi.ac.uk/arrayexpress/), Stanford Microarray Database (SMD; http://smd.stanford.edu/), MD Anderson Cancer Center Microarray database (MDACC DB; http://bioinformatics.mdanderson.org/pubdata.html), University of North Carolina database (UNC DB; https://genome.unc.edu/), and Rosetta Inpharmatics (http://www.rosettabio.com/). Each dataset was assigned a short acronym, followed by an instance number when several datasets were published by the same institution or consortium. CAL = dataset of breast cancer patients from the University of California, San Francisco, and the California Pacific Medical Center (United States); DFHCC = Dana-Farber Harvard Cancer Center (United States); DUKE = Duke University Hospital (United States); EMC = Erasmus Medical Center (the Netherlands); EORTC10994 = trial number 10994 of the European Organization for Research and Treatment of Cancer Breast Cancer Group (Europe); EXPO = Expression Project for Oncology, a large microarray dataset published by the International Genomics Consortium (United States); FNCLCC = Fédération Nationale des Centres de Lutte contre le Cancer (France); HLP = University Hospital La Paz (Spain); JBI = Jules Bordet Institute (Belgium); KOO = Koo Foundation Sun Yat-Sen Cancer Centre (Taiwan); LUND = Lund University Hospital (Sweden); MAINZ = Mainz hospital (Germany); MAQC = MicroArray Quality Control consortium (United States); MCCC = Peter MacCallum Cancer Centre (Australia); MDA = MD Anderson Cancer Center (United States); MSK = Memorial Sloan-Kettering Cancer Center (United States); MUG = Medical University of Graz (Austria); NCCS = National Cancer Centre Singapore (Singapore); NCI = National Cancer Institute (United States); NKI = Nederlands Kanker Instituut (the Netherlands); STK = Stockholm, Karolinska University Hospital (Sweden); STNO = Stanford/Norway (United States and Norway); TRANSBIG = dataset collected by the TransBIG consortium (Europe); UCSF = University of California, San Francisco (United States); UNC = University of North Carolina (United States); UNT = cohort of untreated breast cancer patients from the Oxford Radcliffe (United Kingdom) and Karolinska (Sweden) hospitals; UPP = Uppsala Hospital (Sweden); VDX = Veridex (the Netherlands). These datasets were generated with diverse microarray technologies, developed by Agilent (http://www.genomics.agilent.com), Affymetrix (HGU GeneChips, which include the HG-U133A, HG-U133B, and HG-U133PLUS2 chips, and the X3P GeneChip; http://www.affymetrix.com), Illumina, Swegene, and Operon (http://www.operon.com), or developed in-house (cDNA platforms). For most datasets, survival data (distant metastasis–free survival [DMFS], relapse-free survival [RFS], and overall survival [OS]) and information on adjuvant treatment (untreated, chemo, hormonal, and heterogeneous, standing for no treatment, chemotherapy, hormonal therapy, and a heterogeneous combination of therapies, respectively) were available; missing information is marked as not available (NA). Additional clinical characteristics are provided in Table 2. All untreated patients underwent surgery, and most also received radiation therapy, although this information is not available for all datasets.
† Dataset containing untreated patients with node-negative breast tumors, as used in our survival analysis.
‡ Duplicated patients were removed from the UNT, UPP, MDA4, DFHCC2, DFHCC3, and TAM datasets for the estimation of concordance and prognostic value.
§ Five tumors were removed from the TAM dataset because of negative or missing estrogen receptor status.