2012 Jan 18;104(4):311–325. doi: 10.1093/jnci/djr545

Table 1.

Compendium of microarray datasets of unique breast cancer patients*

| Dataset | Microarray technology | Survival data | Treatment | No. of patients | No. of probes | Source | Reference |
|---|---|---|---|---|---|---|---|
| EXPO | Affymetrix HGU | NA | NA | 353 | 54 675 | GEO: GSE2109 | (23) |
| VDX | Affymetrix HGU | RFS, DMFS | Untreated | 344 | 22 283 | GEO: GSE2034/GSE5327 | (24,25) |
| NKI | Agilent | RFS, DMFS, OS | Untreated, chemo | 337 | 24 481 | Rosetta Inpharmatics | (13,14) |
| UCSF | In-house cDNA | DMFS, RFS, OS | Untreated, chemo, hormonal | 162 | 10 368 | Authors’ website | (26,27) |
| STNO2 | In-house cDNA | RFS, OS | Untreated, chemo, hormonal | 122 | 7787 | SMD | (6) |
| NCI | In-house cDNA | RFS | Untreated, chemo, hormonal | 99 | 6878 | Authors’ website | (7) |
| MSK | Affymetrix HGU | DMFS | Heterogeneous | 99 | 22 283 | GEO: GSE2603 | (28) |
| UPP | Affymetrix HGU | RFS | Untreated, hormonal | 251 (190) | 44 928 | GEO: GSE3494 | (29) |
| STK | Affymetrix HGU | RFS | Untreated, chemo, hormonal | 159 | 44 928 | GEO: GSE1456 | (30) |
| UNT | Affymetrix HGU | RFS, DMFS | Untreated | 137 (92) | 44 928 | GEO: GSE2990 | (16,31) |
| UNC4 | Agilent | RFS, OS | Heterogeneous | 337 | 17 779 | UNC DB | (32) |
| DUKE | Affymetrix HGU95 | OS | Heterogeneous | 171 | 12 625 | GEO: GSE3143 | (33) |
| CAL | Affymetrix HGU | RFS, DMFS, OS | Chemo, hormonal | 118 | 22 283 | AE: E-TABM-158 | (34) |
| TRANSBIG | Affymetrix HGU | RFS, DMFS, OS | Untreated | 198 | 22 283 | GEO: GSE7390 | (35) |
| DUKE2 | Affymetrix X3P | NA | Chemo | 160 | 61 359 | GEO: GSE6961 | (36) |
| MAINZ | Affymetrix HGU | DMFS | Untreated | 200 | 22 283 | GEO: GSE11121 | (37) |
| LUND2 | Swegene | NA | Hormonal | 105 | 27 648 | GEO: GSE5325 | (38) |
| LUND | Swegene | NA | Heterogeneous | 143 | 26 824 | GEO: GSE5325 | (39) |
| FNCLCC | In-house cDNA | NA | Chemo | 150 | 9216 | GEO: GSE7017 | (40) |
| MDA4 | Affymetrix HGU | NA | Chemo | 129 (65) | 22 283 | MDACC DB | (10,42) |
| EMC2 | Affymetrix HGU | DMFS | Chemo | 204 | 54 675 | GEO: GSE12276 | (43) |
| MUG | Operon | NA | Chemo | 152 | 35 788 | GEO: GSE10510 | (44) |
| NCCS | Affymetrix HGU | NA | NA | 183 | 22 283 | GEO: GSE5364 | (45) |
| MCCC | Illumina | NA | NA | 75 | 48 701 | GEO: GSE19177 | (46) |
| KOO | Affymetrix HGU95 | NA | NA | 88 | 48 701 | Authors’ website | (47) |
| EORTC10994 | Affymetrix HGU | NA | Chemo | 49 | 22 283 | GEO: GSE1561 | (41) |
| HLP | Illumina | NA | Chemo | 53 | 48 701 | AE: E-TABM-543 | (48) |
| DFHCC | Affymetrix HGU | DMFS | Heterogeneous | 115 | 54 675 | GEO: GSE19615 | (49) |
| DFHCC2 | Affymetrix HGU | NA | Chemo | 84 (75) | 54 675 | GEO: GSE18864 | (51) |
| DFHCC3 | Affymetrix HGU | NA | Chemo | 40 (26) | 54 675 | GEO: GSE3744 | (52) |
| DFHCC4 | Affymetrix HGU | NA | Untreated | 129 | 54 675 | GEO: GSE5460 | (53) |
| MAQC2 | Affymetrix HGU | NA | Chemo | 230 | 22 283 | GEO: GSE20194 | (54) |
| JBI | Affymetrix HGU | NA | NA | 92 | 54 675 | GEO: GSE20711 | (55) |
| Datasets of tamoxifen-treated patients only | | | | | | | |
| TAM | Affymetrix HGU | DMFS, RFS | Hormonal | 345 (242)§ | 44 928 | GEO: GSE6532/GSE9195 | (56) |
| MDA5 | Affymetrix HGU | DMFS | Hormonal | 298 | 22 283 | GEO: GSE17705 | (57) |
| VDX3 | Affymetrix HGU | DMFS | Hormonal | 136 | 22 283 | GEO: GSE12093 | (50) |
*

Microarray datasets of unique breast cancer patients (5715) used in this study were retrieved from authors’ websites, Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/), ArrayExpress (AE; http://www.ebi.ac.uk/arrayexpress/), Stanford Microarray Database (SMD; http://smd.stanford.edu/), MD Anderson Cancer Center Microarray database (MDACC DB; http://bioinformatics.mdanderson.org/pubdata.html), University of North Carolina database (UNC DB; https://genome.unc.edu/), and Rosetta Inpharmatics (http://www.rosettabio.com/). Each dataset was assigned a short acronym, plus an instance number if several datasets were published by the same institution or consortium. CAL = dataset of breast cancer patients from the University of California, San Francisco, and the California Pacific Medical Center (United States); DFHCC = Dana-Farber Harvard Cancer Center (United States); DUKE = Duke University Hospital (United States); EMC = Erasmus Medical Center (the Netherlands); EORTC10994 = Trial number 10994 from the European Organization for Research and Treatment of Cancer Breast Cancer (Europe); EXPO = expression project for oncology, large dataset of microarray data published by the International Genomics Consortium (United States); FNCLCC = Fédération Nationale des Centres de Lutte contre le Cancer (France); HLP = University Hospital La Paz (Spain); JBI = Jules Bordet Institute (Belgium); KOO = Koo Foundation Sun Yat-Sen Cancer Centre (Taiwan); LUND = Lund University Hospital (Sweden); MAINZ = Mainz hospital (Germany); MAQC = Microarray quality control consortium (United States); MCCC = Peter MacCallum Cancer Centre (Australia); MDA = MD Anderson Cancer Center (United States); MSK = Memorial Sloan-Kettering (United States); MUG = Medical University of Graz (Austria); NCCS = National Cancer Centre of Singapore (Singapore); NCI = National Cancer Institute (United States); NKI = Nederlands Kanker Instituut (the Netherlands); STK = Stockholm, Karolinska University Hospital (Sweden); STNO = Stanford/Norway (United States and Norway); TRANSBIG = dataset collected by the TransBIG consortium (Europe); UCSF = University of California, San Francisco (United States); UNC = University of North Carolina (United States); UNT = cohort of untreated breast cancer patients from the Oxford Radcliffe (United Kingdom) and Karolinska (Sweden) hospitals; UPP = Uppsala Hospital (Sweden); VDX = Veridex (the Netherlands). These datasets were generated with diverse microarray technologies developed either by Agilent (http://www.genomics.agilent.com), Affymetrix (HGU GeneChips, which include chips HG-U133A, HG-U133B, and HG-U133PLUS2, and X3P GeneChip; http://www.affymetrix.com), Swegene (http://www.genomics.agilent.com), Operon (http://www.operon.com), or developed in-house (cDNA platforms). For most datasets, survival data (distant metastasis–free survival [DMFS], relapse-free survival [RFS], and overall survival [OS]) and information regarding the adjuvant treatment (untreated, chemo, hormonal, and heterogeneous, standing for no treatment, chemotherapy, hormonal therapy, and heterogeneous combination of therapies, respectively) were available; otherwise, missing information is marked as not available (NA). Additional clinical characteristics are provided in Table 2. All untreated patients had surgery, and most of them had radiation therapy, although this information is not available for all datasets.

Dataset containing untreated patients with node-negative breast tumor, as used in our survival analysis.

Duplicated patients were removed from the UNT, UPP, MDA4, DFHCC2, DFHCC3, and TAM datasets for the estimation of concordance and prognostic value.

§

Five tumors were removed because of negative or missing estrogen receptor status.
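
For readers who want to tabulate the compendium programmatically, the cell conventions used in Table 1 (space-grouped thousands such as "54 675", and patient counts such as "251 (190)") could be parsed roughly as sketched below. This is an illustrative Python sketch and not part of the original study: the function names are hypothetical, and it assumes that the figure in parentheses is the number of patients remaining after duplicated patients were removed, which the footnotes suggest but do not state explicitly.

```python
import re
from typing import Tuple


def parse_grouped_int(cell: str) -> int:
    """Parse an integer whose thousands are grouped by whitespace, e.g. '54 675'."""
    return int(re.sub(r"\s+", "", cell))


def parse_patient_count(cell: str) -> Tuple[int, int]:
    """Split a 'No. of patients' cell into (total, after_duplicate_removal).

    ASSUMPTION: a parenthetical figure, as in '251 (190)', is the count left
    after duplicated patients were removed. Cells without one return the same
    number twice. Trailing footnote symbols (*, †, ‡, §) are ignored.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(?:\((\d+)\))?\s*[*†‡§]*\s*", cell)
    if match is None:
        raise ValueError(f"Unrecognized patient-count cell: {cell!r}")
    total = int(match.group(1))
    unique = int(match.group(2)) if match.group(2) else total
    return total, unique
```

For example, `parse_patient_count("345 (242)§")` separates the TAM row's two figures while discarding the footnote marker, and `parse_grouped_int("22 283")` yields a plain integer for the probe count.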