Neoplasia (New York, N.Y.). 2008 Jan;10(1):79–88. doi: 10.1593/neo.07859

A Transcriptional Fingerprint of Estrogen in Human Breast Cancer Predicts Patient Survival1,2

Jianjun Yu *,†,3, Jindan Yu †,3, Kevin E Cordero , Michael D Johnson #, Debashis Ghosh §, James M Rae ‡,, Arul M Chinnaiyan †,‖,¶,4, Marc E Lippman ‡,4
PMCID: PMC2213902  PMID: 18231641

Abstract

Estrogen signaling plays an essential role in breast cancer progression, and estrogen receptor (ER) status has long been a marker of hormone responsiveness. However, ER status alone is an incomplete predictor of response to endocrine therapy, as some ER+ tumors nevertheless have poor prognosis. Here we sought to use expression profiling of ER+ breast cancer cells to screen for a robust estrogen-regulated gene signature that may serve as a better indicator of cancer outcome. We identified 532 estrogen-induced genes and further developed a 73-gene signature that best separated a training set of 286 primary breast carcinomas into prognostic subtypes by stepwise cross-validation. Notably, this signature predicts clinical outcome in over 10 patient cohorts as well as their respective ER+ subcohorts. Further, this signature separates patients who have received endocrine therapy into two prognostic subgroups, suggesting its specificity as a measure of estrogen signaling, and thus hormone sensitivity. The 73-gene signature also provides additional predictive value for patient survival, independent of other clinical parameters, and outperforms other previously reported molecular outcome signatures. Taken together, these data demonstrate the power of using cell culture systems to screen for robust gene signatures of clinical relevance.

Introduction

Breast cancer is the most common type of cancer among women in the industrialized world, accounting for nearly one of every three cancers diagnosed. Estrogen is essential for the normal growth and differentiation of the mammary gland, and plays a critical role in the pathogenesis and progression of breast cancer [1]. Increased lifetime exposure to estrogen is a well-known risk factor for breast cancer [1], and drugs that block the effects of estrogen have been used to inhibit the growth of hormone-dependent breast cancers [2]. In the last few decades, systemic adjuvant therapy for patients with predicted poor prognosis has significantly increased breast cancer survival [3]. Current prognostic markers for breast cancer include tumor stage, size, histologic grade, and estrogen receptor (ER) status. However, approximately one of four patients diagnosed with breast cancer nevertheless dies from the disease [4], indicating the insufficiency of current prognostic biomarkers. In addition, a large number of patients with ER-positive tumors fail endocrine therapy, suggesting the need for more precise biomarkers of therapy response.

Taking advantage of global expression profiling, molecular predictors have been developed to classify and predict patient prognosis [5–10]. This prognostication of breast cancer outcome may be used for the selection of high-risk patients for adjuvant therapy. Transcriptional changes of these predictor genes are presumed to reflect the activity of essential signaling pathways in tumors and thus greatly increase the prediction power. For example, the expression of prostate-specific antigen indicates the activation of androgen receptor and serves as a much better diagnostic/prognostic biomarker of prostate cancer than androgen receptor itself. Similarly, for several decades, ER status has been used as a marker of hormone responsiveness to guide adjuvant therapy, with ER+ tumors having significantly better clinical outcome [11]. Some ER+ tumors, nevertheless, incur disease recurrence, indicating that ER status alone is an incomplete assessor and that additional biomarkers are required. A transcriptional fingerprint of estrogen may better reflect the activity of estrogen signaling, thus being a more definitive predictor of breast cancer recurrence and patients' response to hormonal therapy.

In this study, we attempted to delineate downstream effector genes of estrogen signaling. We hypothesized that these genes may indicate an activated state of ER, and thus predict cancer outcome and hormone responsiveness. By expression profiling of ER+ breast cancer cells treated with 17β-estradiol, we identified a set of estrogen-induced genes. Of these, we developed a 73-gene signature, which predicts patient survival in over 10 independent breast cancer cohorts. More importantly, our signature is associated with clinical outcome in patients who have received endocrine therapy. This signature also demonstrated superior performance over previously reported molecular signatures.

Materials and Methods

Cell Culture

Breast cancer cell lines (MCF-7, T47-D, and BT-474) were maintained as previously described [12]. For defined estrogen culture experiments, cells were rinsed in PBS, grown in steroid-depleted media (phenol red-free improved minimal essential media) supplemented with 10% charcoal-stripped calf bovine serum for 2 days, and treated with 10⁻⁹ M 17β-estradiol for 1, 2, 4, 8, 12, or 24 hours as described previously [13].

RNA Extraction and Microarray Experiments

RNA was isolated, labeled, and hybridized according to the Affymetrix protocol (Affymetrix GeneChip Expression Analysis Technical Manual, Rev. 3) by the University of Michigan Comprehensive Cancer Center Affymetrix and cDNA Microarray Core Facility as described previously [13]. All primary array data have been deposited in the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) with series number GSE3834.

Affymetrix Microarray Data Analysis

Data from microarray experiments were calculated, normalized, and log2-transformed using RMAExpress [14]. As described previously [13], the MCF-7 profiles were generated on the Affymetrix U133A platform, and the other profiles were generated on the Affymetrix U133 Plus 2.0 platform; we thus considered only the 22,283 probe sets common to both platforms for subsequent analysis. Expression values within each cell line were first z-transformed to zero mean and unit variance. Time-course experiments were analyzed using Extraction of Differential Gene Expression (EDGE) [15] to identify genes differentially expressed in estrogen-treated relative to estrogen-starved cells. Multiple hypothesis testing was adjusted by false discovery rate (FDR). A total of 1314 probe sets were identified as differentially expressed over time with FDR less than 0.01. These probe sets were then subjected to hierarchical clustering, which resulted in one estrogen-induced cluster containing 532 probe sets and one estrogen-inhibited cluster containing 782 probe sets. For subsequent analyses, only genes in the estrogen-induced cluster were used, as we are more interested in estrogen-activated events during tumor progression.
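
The per-cell-line scaling and the FDR adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the sample (n - 1) standard deviation is an assumption, and the Benjamini-Hochberg adjustment is a familiar stand-in for whatever FDR estimate the EDGE software reports, not a reproduction of it.

```python
import math

def z_transform(values):
    """Scale a list of expression values to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_significant(p_values, cutoff=0.01):
    """Indices of probe sets whose adjusted P value falls below the cutoff."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < cutoff]
```

In this sketch, `select_significant` with the default cutoff mirrors the FDR < 0.01 threshold used in the text.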

Analysis of Primary Breast Tumor Data Using the Estrogen-Regulated Gene Set

All primary breast tumor sets used in this study were collected by ONCOMINE [16] from previous publications or from the National Center for Biotechnology Information GEO database. Genes within each data set were normalized to zero mean and unit variance. The largest Affymetrix U133A breast cancer data set [10], containing 286 primary breast carcinomas, was used as the training set to conduct cross-validation and to develop an optimal gene set as previously described [17]. All other data sets were used as independent test sets for validation purposes. The basic cross-validation procedure is as follows:

1) Fit a Cox regression model and calculate the Cox score for each gene in the training set of Wang et al.
2) Choose a set J of possible values of Cox scores S from step 1), and let Pmin = 1 and emin = 1.
3) For each S in J, do the following:
4) Perform K-means clustering (K = 2) using only genes with absolute Cox scores greater than S.
5) Perform a log-rank test to test whether the two clusters have different survival rates. Name the P value of this test as P.
6) If P > Pmin, then return to step 3).
7) Perform 10-fold cross-validation by the nearest centroid classification based on the class memberships defined by the clusters obtained in step 4). Name the misclassification error as e.
8) If e < emin, then let Sopt = S, Pmin = P, and emin = e, and return to step 3). Otherwise, return to step 3) without changing the value of Sopt.

The optimal value of S is the value of Sopt when the cycle of this procedure terminates, and the optimal gene signature is designated as the genes with absolute Cox scores greater than Sopt. The two clusters from K-means clustering based on these optimal genes are designated accordingly as either high-risk or low-risk by Kaplan-Meier (KM) survival analysis. Individual samples in the test data sets are then predicted as high-risk or low-risk by the nearest centroid classification.
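
The control flow of the threshold search (steps 2 through 8) can be sketched as below. This is a skeleton rather than the authors' implementation: the clustering, log-rank, and cross-validation steps are passed in as hypothetical callables, the comparison in step 8 is read as e < emin, and the guard that skips thresholds leaving fewer than two genes is an added assumption.

```python
def select_threshold(cox_scores, thresholds, cluster_fn, logrank_fn, cv_error_fn):
    """Search candidate Cox-score cutoffs and return (s_opt, p_min, e_min).

    cox_scores  : dict mapping gene -> Cox score (step 1, computed elsewhere)
    thresholds  : iterable of candidate cutoffs S (the set J in step 2)
    cluster_fn  : genes -> list of cluster labels, one per sample (step 4)
    logrank_fn  : labels -> log-rank P value (step 5)
    cv_error_fn : (genes, labels) -> 10-fold misclassification rate (step 7)
    """
    s_opt, p_min, e_min = None, 1.0, 1.0
    for s in thresholds:
        # Keep only genes whose absolute Cox score exceeds the cutoff.
        genes = [g for g, score in cox_scores.items() if abs(score) > s]
        if len(genes) < 2:
            continue  # added guard: too few genes to cluster
        labels = cluster_fn(genes)
        p = logrank_fn(labels)
        if p > p_min:
            continue  # step 6: survival separation is worse than the best so far
        e = cv_error_fn(genes, labels)
        if e < e_min:  # step 8, as read here
            s_opt, p_min, e_min = s, p, e
    return s_opt, p_min, e_min
```

Passing the three analysis steps as functions keeps the loop independent of any particular clustering or survival library.
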
When both the training and the test data sets used the same Affymetrix platform, probe set IDs were used to cross-reference the two data sets. Otherwise, gene symbols were used to map genes from the training set to the test sets. When multiple reporter identifiers were found for one gene on a given platform, the expression values of those reporters were averaged per gene.
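
Collapsing several reporters to one value per gene, as described above, might look like the following sketch. The function name and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict

def average_by_gene(reporter_to_gene, expression):
    """Collapse reporter-level rows to gene-level rows by averaging.

    reporter_to_gene : dict mapping reporter ID -> gene symbol
    expression       : dict mapping reporter ID -> list of values, one per sample
    """
    grouped = defaultdict(list)
    for reporter, values in expression.items():
        gene = reporter_to_gene.get(reporter)
        if gene:  # reporters without a gene symbol are skipped
            grouped[gene].append(values)
    # Average column-wise, i.e. sample by sample, across a gene's reporters.
    return {
        gene: [sum(column) / len(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }
```
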

Survival Analysis

Kaplan-Meier survival plots were compared by log-rank test in R (the R Foundation, http://www.r-project.org) for individual data sets. The end point of interest for survival analysis is recurrence-free survival unless the data set only provides overall survival (OS) information. Multivariate Cox proportional-hazards regression analysis was conducted on the van de Vijver et al. data set in R. Concordance of sample prediction memberships by different signatures was tested in SPSS 11.5 for Windows (SPSS Inc., Chicago, IL).
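
The analyses in this study were run in R. Purely as an illustration of what the two-group log-rank comparison computes, a pure-Python sketch follows; the function name, the 0/1 group coding, and the chi-square approximation with one degree of freedom are assumptions of the sketch.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi_square, p_value).

    times  : follow-up time per sample
    events : 1 if the event (e.g. relapse) was observed, 0 if censored
    groups : 0 or 1 per sample
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed = expected = variance = 0.0
    for t in event_times:
        # Samples still under observation at time t.
        at_risk = [(g, ti, e) for ti, e, g in zip(times, events, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, ti, e in at_risk if ti == t and e)
        d1 = sum(1 for g, ti, e in at_risk if ti == t and e and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0  # no information to separate the groups
    chi_square = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi_square, math.erfc(math.sqrt(chi_square / 2))
```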

Results

Identification of Estrogen-Regulated Genes

To identify robust estrogen-regulated genes, we employed three ER+, estrogen-responsive breast cancer cell lines, i.e., MCF-7, T47-D, and BT-474. We stimulated these cells with 17β-estradiol to emulate the transcriptional events induced by estrogen signaling in vivo. To ensure that we capture the transcriptional changes due to direct regulation by estrogen, rather than downstream effects, we focused primarily on early time-points (0, 1, 2, 4, 8, 12, and 24 hours) after estrogen stimulation [15]. By a time-course analysis on expression profiling of these cell lines, we identified 532 estrogen-induced probe sets, representing 446 unique genes (FDR < 0.01, see the Materials and Methods section and Figure 1a).

Figure 1.

(a) Heat map representation of 532 in vitro estrogen-regulated genes across three ER+, estrogen-sensitive breast cancer cell lines (MCF-7, T47-D, and BT-474) after 17β-estradiol treatment. Each row represents a gene, and each column represents a sample treated with estrogen for different time periods (0, 1, 2, 4, 8, 12, or 24 hours with replicates). Red indicates high expression; blue, low expression. (b–d) Molecular concept map (MCM) analysis of the 532 estrogen-regulated genes (yellow node with black frame) showing enrichment networks of (b) previously reported estrogen-regulated molecular concepts both in vitro and in vivo, (c) gene ontology concepts, and (d) breast cancer progression and prognosis concepts. Each node represents a molecular concept or a set of biologically related genes. The node size is proportional to the number of genes in the concept. Each edge represents a statistically significant enrichment (see Table W1 for P values). Concepts of upregulated genes by estrogen treatment are indicated by light green nodes. Blue, holly green, and purple nodes represent genes upregulated in ER+ breast cancer, high-grade breast cancer, and breast cancer patients with poor outcome, respectively. Enriched gene ontology terms are represented by orange nodes.

Several lines of evidence support that the genes we selected represent a true downstream transcriptional network of estrogen signaling. Firstly, a subset of these genes, including PGR, PDZK1, CTSD, MYC, MYB, MYBL1, MYBL2, STK6, Ki-67, and GREB1, have been previously confirmed to be induced by estrogen [13,18,19]. Secondly, molecular concept map (MCM) analysis [20], which allows for the identification of molecular correlates of our gene set, revealed significant enrichment of "upregulated by estrogen treatment" signatures (P ≤ .001, odds ratios ≥ 4.35) previously identified by several independent groups [21–23] (Figure 1b and Table W1). To evaluate the biologic relevance of our gene set in vivo, we performed MCM analysis of cancer profiling concepts and found strong enrichment of "over-expressed in ER+ breast cancer" concepts derived from a number of human breast cancer profiling studies executed by independent investigators [5,8,10,24]. Therefore, our estrogen-regulated gene set is relevant to previously identified gene sets of estrogen regulation reported from both in vitro cell line experiments and in vivo tumor profiling. Interestingly, integrative analysis with a public genome-wide location data set of ER occupancy [25] showed that a highly significant portion (P < .00001) of our estrogen-induced genes are direct targets of ER, suggesting that our gene set may represent the direct transcriptional network evoked by activated ER.

To obtain an overall annotation of our estrogen-regulated genes, we performed MCM analysis on gene ontology concepts. Significantly enriched gene ontology categories include DNA replication, regulation of cell cycle, protein folding, tRNA processing, cytokinesis, and DNA repair (Figure 1c). This result is consistent with previously reported functions of estrogen-regulated genes [13,26].

Estrogen-Regulated Gene Signature Predicts Breast Cancer Outcome

Intriguingly, another distinct interaction network revealed by MCM analysis was enriched in "over-expressed in high-grade breast cancer" signatures from various data sets such as those of Miller et al. [5], Sotiriou et al. [24], and van 't Veer et al. [9] (Figure 1d). Notably, this enrichment network also includes several concepts of genes over-expressed in metastatic, fatal, or recurrent breast cancers, suggesting a link between our gene signature and breast cancer outcome. Thus, we next attempted to confirm this survival association using breast cancer expression profiling data sets. We performed K-means clustering (K = 2) with Pearson correlation distance of 286 node-negative primary breast carcinomas [10] (Figure W1 a). Kaplan-Meier survival analysis revealed that the resulting two clusters differed significantly in patient outcome (P = .002; Figure W1 a). The high-risk group with poorer outcome has higher expression of several known ER targets [13,19], such as MYBL1, MYBL2, MKI67, and MCM2. By contrast, good-outcome genes that are over-expressed in the low-risk group include PGR, CD44, ADD1, and PTGER3.

To develop an optimal outcome predictor using top survival-related genes, we ranked the 532 estrogen-regulated genes by their corresponding survival significance and performed stepwise cross-validation. Our results demonstrated a set of top-ranked 73 genes (Table W2) that yielded optimal survival association with the least cross-validation error (Figure 2a). This 73-gene signature successfully dichotomized the 286 training samples into high-risk and low-risk groups with significantly different outcome (P < .00001; Figure 2b). Importantly, by performing 1000 Monte Carlo simulations, we found that the probability for a randomly selected subset of 73 genes to cluster the same samples with equivalent or better significance was less than .001, reaffirming that the performance of our 73-gene signature could not be achieved by chance.
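
The random-gene-set control described above can be sketched as follows. This is a hedged illustration: the pool from which random genes are drawn is not stated in the text, so `gene_pool` is left as an input, and `p_value_fn` is a hypothetical callable that clusters the samples on a gene subset and returns the resulting log-rank P value.

```python
import random

def random_set_fraction(gene_pool, set_size, observed_p, p_value_fn,
                        n_sim=1000, seed=0):
    """Fraction of random gene sets doing at least as well as the signature.

    gene_pool  : list of candidate genes to sample from (assumed input)
    set_size   : number of genes per random set (73 in the text)
    observed_p : P value achieved by the real signature
    p_value_fn : gene subset -> P value for the resulting two clusters
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(n_sim):
        subset = rng.sample(gene_pool, set_size)
        if p_value_fn(subset) <= observed_p:
            hits += 1
    return hits / n_sim
```

A returned fraction of 0 over 1000 simulations corresponds to the "less than .001" statement in the text.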

Figure 2.

Estrogen-regulated genes stratified breast cancer samples into two groups with significantly different prognoses. (a) Representation of stepwise cross-validation on the training set of Wang et al. Left panel, the number of misclassified samples by cross-validation. Right panel, survival difference of the resulting two clusters when a particular set of genes was used. x axis, the number of top genes, ordered by their corresponding survival significance. Dashed line, threshold used to select the optimal gene signature. (b) K-means clustering representation of the 73 estrogen-regulated genes in the training cohort (left) and its Kaplan-Meier survival plot (right). The 73 genes were selected based on minimal misclassification error by 10-fold cross-validation in the space of the initially identified 532 genes.

To validate the prediction power of our 73-gene signature, we collected all public breast carcinoma data sets (n = 11) with available patient survival information from the ONCOMINE [16] database. The 73-gene signature was then applied to classify individual samples within each data set into either the high-risk or the low-risk group using the nearest centroid classification. Strikingly, in 10 of these 11 data sets, KM survival analysis revealed a remarkable outcome difference between the predicted high-risk and low-risk groups (Figure 3, a–j). In the only data set in which our outcome signature failed to predict outcome, it showed only a trend (log-rank P = .15; Figure 3k) toward association with distant metastasis within 5 years. To the best of our knowledge, this is the first study to report a breast cancer outcome predictor that is validated extensively in so many independent patient cohorts.
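
Nearest centroid classification, as used above to assign test samples to a risk group, can be illustrated with the sketch below. The squared Euclidean distance is an assumption, since the text does not state which distance was used for this step, and the function names are hypothetical.

```python
def centroid(rows):
    """Mean expression profile of a group; rows are samples, columns are genes."""
    return [sum(column) / len(column) for column in zip(*rows)]

def nearest_centroid(sample, centroids):
    """Return the label of the centroid closest to the sample.

    centroids : dict mapping label -> centroid profile
    Assumption: squared Euclidean distance over the signature genes.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(sample, centroids[label]))

# Toy two-gene example with made-up, already standardized values.
centroids = {
    "high-risk": centroid([[2.0, 2.0], [4.0, 4.0]]),
    "low-risk": centroid([[-1.0, -1.0], [-3.0, -3.0]]),
}
predicted = nearest_centroid([1.0, 1.0], centroids)
```

In practice the centroids would come from the two training clusters, restricted to whichever signature genes are present on the test platform.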

Figure 3.

The 73-gene outcome signature predicts clinical outcome of breast cancer. The 73-gene signature was applied to predict individual test samples as either “low-risk” or “high-risk” for the studies of (a) van 't Veer et al., (b) Pawitan et al., (c) van de Vijver et al., (d) Miller et al., (e) Sotiriou et al., (f) Bild et al., (g) Oh et al., (h) Sorlie et al., (i) Takahashi et al., (j) Ma et al., (k) Minn et al. Kaplan-Meier analysis was used to evaluate the significance of outcome difference between the two groups. P values were calculated by the log-rank test.

We observed that our gene signature correctly predicted most ER- breast tumors within individual data sets as high-risk. As a subset of ER+ tumors relapses despite standard antihormone therapy, these tumors may also have poor prognosis. It is therefore important to identify these patients for more effective adjuvant therapies. We thus examined the ability of our predictor to stratify the ER+ tumors into prognostic subgroups. We took the ER+ samples from each data set and carried out KM survival analysis for the high-risk and low-risk groups predicted by the 73-gene signature. Notably, KM survival analysis demonstrated a strong discriminative power of our 73-gene signature in distinguishing ER+ patients with different prognoses (Figure 4).

Figure 4.

The 73-gene outcome signature predicts clinical outcome of ER+ breast cancer. The ER+ breast cancer samples were extracted from the studies of (a) Wang et al., (b) van 't Veer et al., (c) van de Vijver et al., (d) Miller et al., (e) Sotiriou et al., (f) Bild et al., (g) Oh et al., (h) Sorlie et al., (i) Takahashi et al. The significance of the outcome difference between the low-risk and high-risk groups was estimated by KM survival analysis. P values were calculated by the log-rank test. The data set of Ma et al. is not included in this analysis as nearly all of its samples are ER+ and thus have been presented in Figure 3j.

Prognostication of breast cancer outcome may guide the selection of patients at high risk for systemic adjuvant therapy. However, there is no guarantee that these selected patients will actually benefit from the therapy. It is therefore clinically important to predict therapy responsiveness and to spare some patients from unnecessary adjuvant therapies that have side effects and may cause more harm than good. For example, endocrine therapy may be sufficient for some node-positive and ER-positive patients, and more aggressive adjuvant therapy may not additionally help these patients. Of the 11 data sets we analyzed above, four contained patient treatment information. We extracted hormone-treated samples from each data set and assessed whether our gene predictor was able to predict patient response to hormonal therapies. Again, we classified the hormone-treated samples into high-risk and low-risk groups. Importantly, in each cohort, we observed significantly different outcome for the two predicted groups, suggesting an ability of our signature in therapy prediction (Figures 3j and 5, a–c).

Figure 5.

The 73-gene outcome signature predicts clinical outcome in (a–c) tamoxifen-treated breast cancer subcohorts, (d–f) gliomas, and (g) lung adenocarcinoma. The low-risk and high-risk groups were predicted by the 73-gene signature with nearest centroid classification. KM analysis was used to evaluate the significance of outcome difference between the two groups. P values were calculated by the log-rank test.

To further confirm the association of our gene signature with estrogen sensitivity, we determined whether the 73-gene signature is able to classify ER+/ER- cell lines in vitro. We performed hierarchical clustering based on the expression pattern of the 73 genes in five ER- and three ER+ cell lines. Interestingly, we found that the 73 genes perfectly separated the eight cell lines into their respective ER+ and ER- clusters (Figure W1 b), demonstrating that our signature genes are specific to estrogen signaling. Furthermore, as we selected our estrogen-induced genes based on expression induction at relative early time points (not later than 24 hours) after 17β-estradiol treatment, we hypothesized that this subset of 73 genes is also enriched for direct targets of ER. Concordantly, comparative analysis with ER-occupied genes described in a previous study [25] identified a significant overlap (P = .0001), reconfirming the specificity of our signature to estrogen activity.
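
The text does not name the test behind the overlap P value. One common choice for asking whether two gene lists share more members than expected by chance is the hypergeometric upper tail, sketched below with a hypothetical function name and toy numbers; it is offered as an illustration, not as the method actually used.

```python
from math import comb

def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric overlap.

    population : number of genes considered in total
    successes  : genes in the reference list (e.g. ER-occupied genes)
    draws      : genes in the query list (e.g. a signature)
    observed   : number of genes shared by the two lists
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return tail / total

# Toy numbers only: 3 of 10 genes are "targets" and all 3 drawn genes hit them.
toy_p = hypergeom_upper_tail(10, 3, 3, 3)
```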

As estrogen may also play an important role in the development of glioma [27] and lung cancer, especially lung adenocarcinoma [28,29], we examined our outcome signature in three glioma data sets and one lung adenocarcinoma data set. Notably, our gene signature successfully predicted patient outcome, with P = .0006 for the Freije et al. glioma [30], P = .008 for the Phillips et al. glioma [31], P = .11 for the Nutt et al. glioma [32], and P = .006 for the Bhattacharjee et al. lung adenocarcinoma [33] data sets (Figure 5, d–g).

Our Outcome Signature Predicts Patient Survival Independent of Clinical Criteria and Outperforms Known Predictors

Global gene expression profiling of breast cancer has yielded a number of prognostic signatures in the last decade. To properly evaluate the predictive power of our signature, we compared it with established clinical parameters as well as previously reported gene predictors. We first compared our signature with an 822-gene estrogen-regulated signature (termed estrogen-SAM) developed by Oh et al. [18] based on the Significance Analysis of Microarrays (SAM) that classified the ER+ cases of the Rosetta data set (n = 225) into prognostic subtypes [8]. We selected the Rosetta data as the test data set because it has been routinely used as a validation data set for breast cancer outcome signatures. Multivariate Cox proportional-hazards regression analysis of these patients showed that both our signature and the estrogen-SAM signature were significant predictors of relapse-free survival (RFS), independent of standard clinical factors (RFS, P = .002 and P = .004, respectively; Table 1). Importantly, our outcome signature was a significant predictor of both RFS and OS (RFS, P = .002, hazard ratio [HR]: 2.24, 95% confidence interval [CI]: 1.35–3.70; and OS, P = .001, HR: 3.27, 95% CI: 1.62–6.62), whereas the estrogen-SAM signature was not significant for OS (P = .066). Thus, our outcome signature achieved better predictive power while using substantially fewer genes. In addition, our signature is composed solely of estrogen-regulated genes, thus representing the biologic significance of estrogen activity. By contrast, the estrogen-SAM signature genes were selected based on their differential expression between two tumor subtypes predefined by estrogen-regulated genes, and hence may or may not themselves be regulated by estrogen.

Table 1.

Multivariate Cox Proportional Hazards Analysis for ER+ Tumors in the Data Set of van de Vijver et al. (n = 225).

RFS, relapse-free survival; OS, overall survival; HR, hazard ratio.

Variable                                                RFS HR (95% CI)    RFS P   OS HR (95% CI)     OS P
Our estrogen-regulated signature                        2.24 (1.35–3.70)   .002    3.27 (1.62–6.62)   .001
The Oh et al. estrogen-SAM gene signature (IIE vs IE)   2.32 (1.31–4.11)   .004    2.24 (0.95–5.28)   .066
Age                                                     0.94 (0.89–0.98)   .004    0.94 (0.89–1.00)   .069
Size (diameter >2 cm vs <2 cm)                          1.49 (0.93–2.37)   .095    1.41 (0.76–2.61)   .280
Tumor grade (intermediate vs well differentiated)       1.40 (0.72–2.72)   .320    2.02 (0.65–6.28)   .230
Tumor grade (poorly vs well differentiated)             1.30 (0.64–2.63)   .460    2.86 (0.91–9.02)   .070
Node status (1–3 vs 0 positive nodes)                   1.82 (0.93–3.57)   .082    1.65 (0.66–4.18)   .290
Node status (>3 vs 0 positive nodes)                    2.87 (1.23–6.74)   .015    2.22 (0.69–7.11)   .180
Hormonal or chemotherapy vs no adjuvant therapy         0.33 (0.16–0.66)   .002    0.43 (0.17–1.13)   .086
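
As a rough consistency check on Table 1, the standard error of a log hazard ratio can be recovered from its reported 95% CI. The sketch assumes a Wald-type interval that is symmetric on the log scale, which is the usual Cox regression output but is not stated explicitly in the text; the helper name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Recover (standard error, Wald z, two-sided P) from an HR and its 95% CI."""
    # On the log scale the interval is log(HR) +/- z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal variate.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# RFS entry for the estrogen-regulated signature in Table 1.
se, z, p = wald_from_ci(2.24, 1.35, 3.70)
print(f"SE = {se:.3f}, z = {z:.2f}, P = {p:.4f}")
```

For this entry the recovered two-sided P lands close to the reported .002, which suggests the table values are internally consistent under that assumption.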

We next extended the comparison of our signature and the estrogen-SAM signature to include the Rosetta 70-gene signature, again using the Rosetta data set. As the Rosetta signature used a subset of 44 samples during its development, these samples were excluded from our analysis to avoid potential bias. Importantly, our signature and the Rosetta 70-gene signature were both significant predictors of relapse-free survival (P = .026 and P = .021, respectively; Table 2) in this data set. Surprisingly, our signature was the only significant predictor of OS (P = .008), independent of other clinical parameters and signatures. To further compare the performance of our signature to previously reported breast cancer gene signatures, we examined their respective predictive abilities on multiple data sets. As shown in Table W3, the Rosetta 70-gene signature, the Oncotype DX gene predictor, and our gene signature demonstrated superior performance over other signatures, and our gene signature showed the best overall performance.

Table 2.

Multivariate Cox Proportional Hazards Analysis for ER+ Tumors in the Data Set of van de Vijver et al. After Excluding Samples Used for the Training Model of van 't Veer et al. (n = 181).

RFS, relapse-free survival; OS, overall survival; HR, hazard ratio.

Variable                                                RFS HR (95% CI)    RFS P   OS HR (95% CI)     OS P
Our estrogen-regulated signature                        2.01 (1.09–3.72)   .026    3.63 (1.40–9.42)   .008
70-Gene signature (poor vs good)                        2.42 (1.14–5.14)   .021    2.37 (0.72–7.85)   .160
The Oh et al. estrogen-SAM gene signature (IIE vs IE)   1.83 (0.95–3.52)   .070    1.71 (0.62–4.75)   .300
Age                                                     0.97 (0.92–1.03)   .340    1.00 (0.92–1.07)   .900
Size (diameter >2 cm vs <2 cm)                          1.18 (0.68–2.04)   .560    1.29 (0.60–2.77)   .510
Tumor grade (intermediate vs well differentiated)       0.97 (0.47–2.00)   .930    1.33 (0.39–4.49)   .650
Tumor grade (poorly vs well differentiated)             0.67 (0.29–1.54)   .350    1.63 (0.46–5.75)   .450
Node status (1–3 vs 0 positive nodes)                   1.86 (0.90–3.88)   .096    1.76 (0.63–4.92)   .280
Node status (>3 vs 0 positive nodes)                    3.56 (1.39–9.14)   .008    2.84 (0.77–10.5)   .120
Hormonal or chemotherapy vs no adjuvant therapy         0.33 (0.16–0.68)   .003    0.38 (0.14–1.03)   .056

To investigate the molecular difference between our signature and other breast cancer gene predictors of similar size, we examined the number of overlapping genes. Interestingly, only two (PRC1 and CENPA), one (CD44), and three (BRRN1, CDCA8, and MYBL2) genes overlapped between our 73-gene signature and the Rosetta 70-gene signature [9], the Wang et al. [10] 76-gene signature, and the Miller et al. [5] 32-gene signature, respectively. This lack of overlap suggests that our signature is composed of genes distinct from previously reported gene predictors. Nevertheless, two-way contingency table analysis revealed strong associations between the predictions for individual samples made by our outcome signature and those made by the Rosetta 70-gene signature, the wound-response signature, and the intrinsic-subtype model [7] (Table W4). These findings are consistent with a previous report that distinct gene predictors, although having little overlap in gene identity, may have high rates of concordance in prediction results for individual samples [34]. Taken together, our distinct gene signature outperformed other known predictors while being concordant with them in outcome prediction of individual samples.
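
The concordance analysis above was run in SPSS, and the text does not say which statistic was reported. As one plausible illustration, a Pearson chi-square test (without continuity correction) on a 2 x 2 table of per-sample predictions is sketched below; the function name is hypothetical and the counts are invented for the example, not taken from Table W4.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for a 2 x 2 table; returns (chi_square, p_value).

    Layout (rows: signature A, columns: signature B):
        a = both high-risk        b = A high-risk, B low-risk
        c = A low-risk, B high    d = both low-risk
    """
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Invented counts: two predictors agree on 80 of 100 samples.
chi, p = chi_square_2x2(40, 10, 10, 40)
```

Two signatures can share almost no genes and still produce a table like this one, which is the point the paragraph above makes.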

Discussion

There is established precedent for clinical use of molecular markers to help decide customized therapy for individuals with breast cancer. For example, ER and PR, and ERBB2, have been used to assess potential response to hormonal therapy and trastuzumab (Herceptin), respectively. Molecular signatures have been developed from microarray profiling of cancer to provide prognostication of recurrence or distant metastases, thus serving as the basis for selecting high-risk patients for adjuvant therapy. However, it remains challenging to determine which patients should be selected for adjuvant systemic therapy. A single marker such as ER has been found insufficient to fully stratify patients into different diagnostic/prognostic subtypes. In this study, we aimed to identify a transcriptional fingerprint of estrogen, which reflects the downstream activity of the estrogen signaling pathway, and thus may be a more efficient predictor of breast cancer recurrence.

Unlike most previously reported breast cancer signatures that were developed using supervised analysis based on patient diagnosis/prognosis status [5–10], our signature was discovered by specifically selecting estrogen-regulated genes, thus representing the activities of estrogen signaling, a key biologic characteristic of breast cancer tumors. We profiled gene expression of three breast cancer cell lines during early time-points after estrogen treatment. We observed that more than 80% of our estrogen-regulated genes were already activated within 1 to 2 hours after estrogen treatment in the MCF-7 breast cancer cell line. Genome-wide location analysis confirmed that a significant portion of these genes are directly occupied by ER, suggesting an enrichment of direct ER target genes in our signature. In addition, our gene signature distinguishes ER+ and ER- patients, as well as separates patients who did well with hormonal therapy from those who did not, indicating its specificity in monitoring estrogen activity.

In developing the 73-gene outcome signature we focused on in vitro estrogen-regulated genes and further selected a subset that is associated with patient outcome in vivo in human breast tumors. These genes are unique as they are both related to estrogen signaling pathway and are associated with patient survival; that is, they represent a subset of downstream targets of estrogen signaling that are predictive of breast cancer outcome. Concordantly, comparative analyses with previously reported breast cancer predictors revealed rare overlap in gene identity, yet high concordance in outcome prediction of individual samples. Therefore, our gene signature reflects estrogen activity and is distinct from previously reported gene signatures that mainly capture expressional differences between cancer prognostic subtypes.

Our 73-gene signature predicts breast cancer outcome in 10 of 11 data sets we analyzed. Besides correctly assigning most ER- tumors in each data set into high-risk group, this signature is able to stratify the ER+ samples into prognostic subtypes, suggesting that it may better reflect tumor aggressiveness than ER status alone. Most importantly, our signature provides additional prognostic information beyond standard clinical factors and yields overall best performance against previously reported breast cancer outcome predictors.

Further validation and refinement of our signature using additional data sets with larger cohorts of breast cancer patients will help strengthen its clinical value. This study lays the groundwork for future characterization of individual signature genes, both to improve the understanding of breast cancer progression and to help select genes with critical roles in estrogen response for breast cancer therapy. Furthermore, as reverse transcription-polymerase chain reaction assays of paraffin-embedded tissues have recently been developed [6], it is technically feasible to develop a reverse transcription-polymerase chain reaction assay of our 73-gene signature for future validation and, potentially, for later clinical use. Our signature may be useful in selecting high-risk patients for adjuvant therapy as well as in sparing some hormone-sensitive patients from aggressive therapy.

Supplementary Material

Supplementary Figures and Tables
neo1001_0079SD1.pdf (482KB, pdf)

Abbreviations

ER, estrogen receptor
FDR, false discovery rate
KM, Kaplan-Meier
MCM, molecular concept map
OS, overall survival
SAM, significance analysis of microarrays

Footnotes

1

This research was supported in part by the National Institutes of Health (R01 CA97063 to A. M. C. and D. G., U54 DA021519-01A1 to A. M. C.), the Early Detection Research Network (UO1 CA111275 to A. M. C. and D. G.), the National Institute of General Medical Sciences (GM 72007 to D. G.), the Department of Defense (W81XWH-06-1-0224 to A. M. C. and PC060266 to J. Y.), and the Cancer Center Bioinformatics Core (support grant 5P30 CA46592 to A. M. C.). A. M. C. is supported by a Clinical Translational Research Award from the Burroughs Wellcome Foundation. K. E. C., M. E. L., and J. M. R. are supported by the Breast Cancer Research Foundation N003173.

2

This article refers to supplementary materials, which are designated as Tables W1, W2, W3, and W4 and Figure W1 and are available online at www.neoplasia.com.

References

1. Clemons M, Goss P. Estrogen and the risk of breast cancer. N Engl J Med. 2001;344:276–285. doi: 10.1056/NEJM200101253440407.
2. Jordan VC. Tamoxifen: a most unlikely pioneering medicine. Nat Rev Drug Discov. 2003;2:205–213. doi: 10.1038/nrd1031.
3. Early Breast Cancer Trialists' Collaborative Group (EBCTCG). Effects of chemotherapy and hormonal therapy for early breast cancer on recurrence and 15-year survival: an overview of the randomised trials. Lancet. 2005;365:1687–1717. doi: 10.1016/S0140-6736(05)66544-0.
4. Brenner H. Long-term survival rates of cancer patients achieved by the end of the 20th century: a period analysis. Lancet. 2002;360:1131–1135. doi: 10.1016/S0140-6736(02)11199-8.
5. Miller LD, Smeds J, George J, Vega VB, Vergara L, Ploner A, Pawitan Y, Hall P, Klaar S, Liu ET, et al. An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci USA. 2005;102:13550–13555. doi: 10.1073/pnas.0506230102.
6. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351:2817–2826. doi: 10.1056/NEJMoa041588.
7. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, et al. Molecular portraits of human breast tumours. Nature. 2000;406:747–752. doi: 10.1038/35021093.
8. van de Vijver MJ, He YD, van 't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002;347:1999–2009. doi: 10.1056/NEJMoa021967.
9. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530–536. doi: 10.1038/415530a.
10. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365:671–679. doi: 10.1016/S0140-6736(05)17947-1.
11. Pujol P, Daures JP, Thezenas S, Guilleux F, Rouanet P, Grenier J. Changing estrogen and progesterone receptor patterns in breast carcinoma during the menstrual cycle and menopause. Cancer. 1998;83:698–705.
12. Rae JM, Johnson MD, Scheys JO, Cordero KE, Larios JM, Lippman ME. GREB 1 is a critical regulator of hormone dependent breast cancer growth. Breast Cancer Res Treat. 2005;92:141–149. doi: 10.1007/s10549-005-1483-4.
13. Creighton CJ, Cordero KE, Larios JM, Miller RS, Johnson MD, Chinnaiyan AM, Lippman ME, Rae JM. Genes regulated by estrogen in breast tumor cells in vitro are similarly regulated in vivo in tumor xenografts and human breast tumors. Genome Biol. 2006;7:R28. doi: 10.1186/gb-2006-7-4-r28.
14. Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. doi: 10.1093/bioinformatics/19.2.185.
15. Leek JT, Monsen E, Dabney AR, Storey JD. EDGE: extraction and analysis of differential gene expression. Bioinformatics. 2006;22:507–508. doi: 10.1093/bioinformatics/btk005.
16. Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM. ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia. 2004;6:1–6. doi: 10.1016/s1476-5586(04)80047-2.
17. Bair E, Tibshirani R. Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol. 2004;2:E108. doi: 10.1371/journal.pbio.0020108.
18. Oh DS, Troester MA, Usary J, Hu Z, He X, Fan C, Wu J, Carey LA, Perou CM. Estrogen-regulated genes predict survival in hormone receptor-positive breast cancers. J Clin Oncol. 2006;24:1656–1664. doi: 10.1200/JCO.2005.03.2755.
19. Frasor J, Danes JM, Komm B, Chang KC, Lyttle CR, Katzenellenbogen BS. Profiling of estrogen up- and down-regulated gene expression in human breast cancer cells: insights into gene networks and pathways underlying estrogenic control of proliferation and cell phenotype. Endocrinology. 2003;144:4562–4574. doi: 10.1210/en.2003-0567.
20. Tomlins SA, Mehra R, Rhodes DR, Cao X, Wang L, Dhanasekaran SM, Kalyana-Sundaram S, Wei JT, Rubin MA, Pienta KJ, et al. Integrative molecular concept modeling of prostate cancer progression. Nat Genet. 2007;39:41–51. doi: 10.1038/ng1935.
21. Buterin T, Koch C, Naegeli H. Convergent transcriptional profiles induced by endogenous estrogen and distinct xenoestrogens in breast cancer cells. Carcinogenesis. 2006;27:1567–1578. doi: 10.1093/carcin/bgi339.
22. Frasor J, Stossi F, Danes JM, Komm B, Lyttle CR, Katzenellenbogen BS. Selective estrogen receptor modulators: discrimination of agonistic versus antagonistic activities by gene expression profiling in breast cancer cells. Cancer Res. 2004;64:1522–1533. doi: 10.1158/0008-5472.can-03-3326.
23. Stossi F, Barnett DH, Frasor J, Komm B, Lyttle CR, Katzenellenbogen BS. Transcriptional profiling of estrogen-regulated gene expression via estrogen receptor (ER) alpha or ERbeta in human osteosarcoma cells: distinct and common target genes for these receptors. Endocrinology. 2004;145:3473–3486. doi: 10.1210/en.2003-1682.
24. Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006;98:262–272. doi: 10.1093/jnci/djj052.
25. Carroll JS, Meyer CA, Song J, Li W, Geistlinger TR, Eeckhoute J, Brodsky AS, Keeton EK, Fertuck KC, Hall GF, et al. Genome-wide analysis of estrogen receptor binding sites. Nat Genet. 2006;38:1289–1297. doi: 10.1038/ng1901.
26. Lin CY, Strom A, Vega VB, Kong SL, Yeo AL, Thomsen JS, Chan WC, Doray B, Bangarusamy DK, Ramasamy A, et al. Discovery of estrogen receptor alpha target genes and response elements in breast tumor cells. Genome Biol. 2004;5:R66. doi: 10.1186/gb-2004-5-9-r66.
27. Sribnick EA, Ray SK, Banik NL. Estrogen prevents glutamate-induced apoptosis in C6 glioma cells by a receptor-mediated mechanism. Neuroscience. 2006;137:197–209. doi: 10.1016/j.neuroscience.2005.08.074.
28. Marquez-Garban DC, Chen HW, Fishbein MC, Goodglick L, Pietras RJ. Estrogen receptor signaling pathways in human non-small cell lung cancer. Steroids. 2007;72:135–143. doi: 10.1016/j.steroids.2006.11.019.
29. Stabile LP, Siegfried JM. Estrogen receptor pathways in lung cancer. Curr Oncol Rep. 2004;6:259–267. doi: 10.1007/s11912-004-0033-2.
30. Freije WA, Castro-Vargas FE, Fang Z, Horvath S, Cloughesy T, Liau LM, Mischel PS, Nelson SF. Gene expression profiling of gliomas strongly predicts survival. Cancer Res. 2004;64:6503–6510. doi: 10.1158/0008-5472.CAN-04-0452.
31. Phillips HS, Kharbanda S, Chen R, Forrest WF, Soriano RH, Wu TD, Misra A, Nigro JM, Colman H, Soroceanu L, et al. Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell. 2006;9:157–173. doi: 10.1016/j.ccr.2006.02.019.
32. Nutt CL, Mani DR, Betensky RA, Tamayo P, Cairncross JG, Ladd C, Pohl U, Hartmann C, McLaughlin ME, Batchelor TT, et al. Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res. 2003;63:1602–1607.
33. Bhattacharjee A, Richards WG, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA. 2001;98:13790–13795. doi: 10.1073/pnas.191502998.
34. Fan C, Oh DS, Wessels L, Weigelt B, Nuyten DS, Nobel AB, van 't Veer LJ, Perou CM. Concordance among gene-expression-based predictors for breast cancer. N Engl J Med. 2006;355:560–569. doi: 10.1056/NEJMoa052933.
35. Pawitan Y, Bjohle J, Amler L, Borg AL, Egyhazi S, Hall P, Han X, Holmberg L, Huang F, Klaar S, et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7:R953–R964. doi: 10.1186/bcr1325.


Articles from Neoplasia (New York, N.Y.) are provided here courtesy of Neoplasia Press
