Author manuscript; available in PMC: 2011 Jan 1.
Published in final edited form as: Clin Cancer Res. 2009 Dec 22;16(1):249–259. doi: 10.1158/1078-0432.CCR-09-1602

Expression Signature Developed from a Complex Series of Mouse Models Accurately Predicts Human Breast Cancer Survival

Mei He 1, David P Mangiameli 1,2, Stefan Kachala 3, Kent Hunter 4, John Gillespie 5, Xiaopeng Bian 6, H-C Jennifer Shen 1, Steven K Libutti 1,7
PMCID: PMC2866744  NIHMSID: NIHMS150776  PMID: 20028755

Abstract

Purpose

The capability of microarray platforms to interrogate thousands of genes has led to the development of molecular diagnostic tools for cancer patients. While large-scale comparative studies of clinical samples are often limited by access to human tissues, expression profiling databases of various human cancer types are publicly available to researchers. Given that mouse models have been instrumental to our current understanding of cancer progression, we aimed to test the hypothesis that novel gene signatures with predictive value for clinical outcome can be derived by coupling genomic analyses in mouse models of cancer with publicly available human cancer datasets.

Experimental Design

We established a complex series of syngeneic metastatic animal models using a murine breast cancer cell line. Tumor RNA was hybridized on Affymetrix MouseGenome-430A2.0 GeneChips. With the use of Venn logic, gene signatures that represent metastatic competency were derived and tested against publicly available human breast and lung cancer datasets.

Results

Survival analyses showed that the spontaneous metastasis gene signature was significantly associated with metastasis-free and overall survival (p<0.0005). Subsequently, a six-gene model was derived and shown to predict survival in breast cancer patients with statistical significance. In addition, the model was able to stratify poor from good prognosis for lung cancer patients in the majority of the datasets analyzed.

Conclusions

Together, our data support the conclusion that novel gene signatures derived from mouse models of cancer can be used to predict human cancer outcome. Our approach sets a precedent: similar strategies may be used to derive novel gene signatures of clinical utility.

Keywords: Mouse model, Gene Expression, Breast Cancer, Lung Cancer, Survival, Signature

Introduction

Cancer is responsible for about one third of all mortalities in the United States, while metastatic disease is responsible for more than 90% of all cancer-related deaths (1). Consequently, metastatic competency represents one of the most heavily investigated topics of modern medicine, science and industry. Hanahan and Weinberg have organized the plethora of cellular abnormalities into six basic competency traits that must be acquired for a malignancy to thrive: “self-sufficiency in growth signals, insensitivity to anti-growth signals, evasion of apoptosis, limitless replicative potential, sustained angiogenesis, and tissue invasion and metastasis” (2). These competencies are thought to be the product of alterations attained by the tumor early in the clinical timeline. Coupled with the increasing heterogeneity of the tumor’s cell population, multiple phenotypes may arise with varying levels and tendencies of metastatic competency (3).

Animal models have been important to our current understanding of malignant and metastatic progression (4). The use of different models and techniques, such as in vivo passaging for phenotype purification, transgenic animals for specific molecular manipulation, and in vivo and ex vivo models for screening of cancer therapies, has led to invaluable functional insights. Importantly, these animal model systems have allowed us to develop useful dogmatic philosophies regarding the causes of malignant transformation and novel strategies to further investigate malignant behavior (5, 6).

Another valuable and recent scientific success has been the development and use of high-throughput assays, such as microarray expression analysis. Molecular profiling with this technology has led to the derivation of gene signatures for various cancer types (7–9) and has gained utility in the management of selected cancer patients. For instance, two genomic-based assays currently serve as surrogate indicators to determine which early stage breast cancer patients are likely to benefit from adjuvant chemotherapy (10).

While various efforts to bank clinical samples are underway, many researchers are still limited by the access to, and cost of obtaining, sufficient amounts of human tissue for experimental purposes. However, a wealth of expression profiling datasets on different human tumor histologies is publicly available. Thus, this study aimed to test the hypothesis that novel gene signatures with predictive value for clinical outcome can be derived by coupling genomic analyses in mouse models of cancer with publicly available clinical datasets. Specifically, we established a complex series of metastatic mouse models utilizing a murine breast cancer cell line. Utilizing microarray expression analysis, Venn logic, and clinical datasets, a six-gene signature was derived and demonstrated accuracy in predicting breast cancer patient survival. We believe that this six-gene model represents a general metastatic competency gene signature, as it can stratify prognostic outcomes in lung cancer patient cohorts as well. Together, we demonstrated that novel gene signatures for predicting cancer patient outcome can be developed by coupling properly designed mouse model systems with publicly available clinical datasets. In addition, our study is significant in that it minimizes the need for direct access to human tissue samples while maximizing the utility of publicly available clinical datasets for generating novel genomic assays for clinical purposes.

Materials and Methods

ANIMAL MODELS AND TISSUE PROCUREMENT

All animal studies were in accord with the National Institutes of Health, Animal Care and Use Committee (ACUC) Guidelines.

Embolic Liver and Lung Metastatic Model (LvMsv and LMtv)

Liver Metastases Splenic Vein Model (LvMsv)

Murine breast adenocarcinoma (4T1) cells (ATCC: The Global Bioresource Center) were harvested from cell culture flasks, washed three times in HBSS, and adjusted to a final concentration of 1×10⁷ cells/ml. Cell preparations were kept on ice until injection. BALB/c mice were anesthetized with isoflurane and prepared for surgery under sterile conditions. Animals were positioned in right lateral recumbency, shaved and wiped with 70% ethanol. A left subcostal incision, approximately 10 mm long, was made and the peritoneum was opened. The spleen was exposed and gently retracted; the gastrosplenic ligament and short gastric vessels were identified and divided, leading to complete mobility of the spleen on its hilar pedicle. The spleen was then extracorporealized and positioned on sterile saline-soaked gauze. Next, cell suspension (200 µl) was slowly injected into the upper splenic pole, using a 30-gauge needle (Becton Dickinson, Franklin Lakes, NJ). After injection, slight pressure was applied to the spleen to achieve hemostasis and minimize extra-splenic seeding. Five minutes were allowed to elapse to permit portal vein embolization. Splenectomy followed, by application of a medium Ligaclip (Ethicon Endo-Surgery Inc., Somerville, NJ) to the splenic vessels and sharp excision of the organ. The abdominal cavity was then closed en masse with 9-mm wound autoclips (Roboz Surgical, Rockville, MD). Animals were monitored and sacrificed when they became moribund. Livers were examined with 2× surgical loupes and hepatic metastases were immediately resected, snap frozen in liquid nitrogen, and ultimately stored at −80°C.

Lung Metastases Tail Vein Model (LMtv)

4T1 cells were prepared as for the LvMsv model, adjusted to 5×10⁶ cells/ml and kept on ice until injection. Tail veins of female BALB/c mice were cannulated with a 27-gauge needle and the animals were administered 50 µl of cell suspension. After 14 days, they were sacrificed and the tracheobronchopulmonary tree was resected and insufflated with PBS. The lung metastases were resected under surgical loupes, snap frozen in liquid nitrogen, and stored at −80°C.

Spontaneous Liver and Lung Metastases Model (LvMsp and LMsp)

Tumor cell suspension (100 µl, 1×10⁷ cells/ml) was prepared as for the LvMsv model and then injected into the left cephalad mammary gland of BALB/c mice. After 14 days, the resultant orthotopic tumors were excised under sterile conditions, and the tumor was immediately snap frozen in liquid nitrogen and stored at −80°C. The wound was closed with autoclips. After an additional 14 days, animals were sacrificed and the spontaneous liver (LvMsp) and lung (LMsp) metastases were procured, as described earlier.
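As a quick check on the inocula described above, the number of cells delivered per animal follows from concentration times injected volume. The short sketch below only restates the concentrations and volumes given in the protocol; the function name and dictionary layout are illustrative and not part of the original methods.

```python
# Cells delivered per animal = concentration (cells/ml) x injected volume (ml).
# Concentrations and volumes come from the protocol text; names are illustrative.

def cells_injected(conc_cells_per_ml: float, volume_ul: float) -> float:
    """Return the number of cells contained in the injected volume."""
    return conc_cells_per_ml * volume_ul / 1000.0

inocula = {
    "LvMsv (splenic injection)": cells_injected(1e7, 200),
    "LMtv (tail vein)": cells_injected(5e6, 50),
    "LvMsp/LMsp (mammary gland)": cells_injected(1e7, 100),
}

for model, n_cells in inocula.items():
    print(f"{model}: {n_cells:.2e} cells")
```

Under these figures the splenic, tail vein and mammary injections deliver roughly 2×10⁶, 2.5×10⁵ and 1×10⁶ cells, respectively.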

MICROARRAY AND STATISTICAL ANALYSIS

To minimize individual variation, tumor samples from three individual mice were used for each metastatic animal model. Twenty cryostat sections (10 µm) were cut from each sample under RNase-free conditions and stored at −80°C. Sections were stained with hematoxylin and eosin by a pathologist (J.G.) and only the tumor area was micro-dissected. Total RNA was immediately isolated using the PicoPure RNA Isolation Kit (Arcturus, Mountain View, CA). Total RNA (30 ng) from each sample was reverse transcribed and subjected to two consecutive rounds of linear amplification, first using the MessageAmp™ II aRNA Amplification Kit (Ambion, Austin, TX), followed by biotin labeling using the MessageAmp™ II-Biotin Enhanced Kit (Ambion, Austin, TX). RNA concentrations were measured with a NanoDrop ND-1000 (NanoDrop, Wilmington, DE). The quality of RNA preparations was assessed with the Bioanalyzer RNA 6000 NanoLabChip Kit (Agilent Technology, Santa Clara, CA). All samples included in this study had a 28S/18S ribosomal RNA ratio of more than 1.5, with an average of 2.0. Each biotinylated cRNA (20 µg) was fragmented and hybridized to an Affymetrix® Mouse Genome 430A2.0 Array GeneChip (Affymetrix, Santa Clara, CA). Arrays were scanned utilizing standard Affymetrix protocols. Image analysis and probe quantification were done with the Affymetrix GeneChip® Operating Software (GCOS), which produced raw probe intensity data.

Raw intensity profiles were analyzed using Partek Genomics Suite Software (Partek Inc., MO). Robust Multi-array Average (RMA) was applied for normalization. Significantly regulated genes were defined as those genes from one experimental group whose expression differed significantly from that of another group by multi-way ANOVA. The resulting ratios were transformed into log2 values and used as expression levels for genes in the metastatic gene signatures. Genes included in the lists were further selected with a false discovery rate (FDR) of less than 10% using Partek Genomics Suite Software. Each probe set was treated as a separate gene, and the triplicate measurements were averaged to give the value for the respective gene. Validation was via Cox’s proportional hazard regression using estimated Hazard Ratios (HRs) and clinicopathologic data. Kaplan-Meier survival analysis was applied to generate predictive values for gene signatures.
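The log2 transformation and the FDR cut-off can be illustrated with a minimal sketch. The text states only that an FDR below 10% was applied in Partek Genomics Suite; the Benjamini-Hochberg procedure used here is an assumption about how such a threshold is commonly computed, and the p-values are made-up illustrative numbers, not data from this study.

```python
import math

def log2_ratio(mean_a: float, mean_b: float) -> float:
    """log2 of the expression ratio between two groups (linear-scale means)."""
    return math.log2(mean_a / mean_b)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the source does not name the FDR procedure; BH is a
    common choice and is used here purely for illustration.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical ANOVA p-values for five probe sets.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
selected = [i for i, qi in enumerate(q) if qi < 0.10]  # FDR < 10%
print(selected)
```

A fourfold difference between groups, for example, gives `log2_ratio(8.0, 2.0) == 2.0`, and in the toy list the first four probe sets pass the 10% threshold.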

CLUSTERING

Hierarchical cluster analysis was carried out with Stanford University Cluster Software (11). Average linkage was used as the clustering method and the uncentered Pearson correlation as the similarity metric for clustering of both genes and arrays. The clusters were visualized using TreeView¹.
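For readers unfamiliar with the uncentered Pearson correlation, the following sketch shows how it differs from the ordinary (centered) form: the means are not subtracted, so the measure is the cosine of the angle between two expression vectors. The function names are illustrative; this is not code from the Cluster package.

```python
import math

def uncentered_correlation(x, y):
    """Uncentered Pearson correlation: dot product over the product of norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def uncentered_distance(x, y):
    """Distance used for clustering: 1 minus the uncentered correlation."""
    return 1.0 - uncentered_correlation(x, y)

# Two profiles with the same shape but a constant offset: the centered Pearson
# correlation would be 1, whereas the uncentered form treats them as different.
print(uncentered_correlation([1, 2, 3], [2, 4, 6]))
print(uncentered_correlation([1, 2, 3], [11, 12, 13]))
```

Because the offset is retained, two log-ratio profiles count as similar only when they deviate from zero in the same direction and proportion, which suits data expressed relative to a reference.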

APPLICATION OF GENE SIGNATURES TO PUBLIC DATASETS

To compare expression data from the mouse and human datasets, a correspondence has to be established between probes on the mouse arrays and probes on the human arrays. To map our mouse signature to public datasets of human arrays, we first matched mouse signature gene symbols to human gene symbols by using a mouse-human homology gene list provided by Microarray Data Base (mAdb, Center for Cancer Research, National Cancer Institute, National Institutes of Health). We then used the gene symbol identifier to match genes represented in different microarray datasets. For cDNA microarrays, genes with fluorescent hybridization signals at least 1.5-fold greater than the local background fluorescent signal in the reference channel (Cy3) were considered adequately measured and were selected for further analyses. For Affymetrix microarray data, signal intensity values were z-transformed into ratios, and genes with technically adequate measurements obtained from at least 90% of the samples in a given dataset were selected for analysis. The gene value was generated by averaging each probe set within a given experimental group. The patterns of expression in published datasets were subsequently analyzed according to our gene signature. Average linkage clustering was performed using Cluster Software. After application of each signature, the sample data from each public dataset were segregated into two classes based on the first bifurcation of the hierarchical dendrogram. This most proximal bifurcation represents the most fundamental surrogate of the fidelity of a sample's profile to the tested signature. Survival analysis was performed on each class that resulted from the grouping.
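The two-class assignment described above can be sketched as agglomerative clustering that stops when two clusters remain; those two clusters are the branches under the root of the dendrogram. This is a minimal sketch under stated assumptions: it uses pairwise average linkage (UPGMA) with an uncentered-correlation distance, which may differ in detail from the Cluster Software implementation, and the sample profiles are invented for illustration.

```python
import math
from itertools import combinations

def uncentered_distance(x, y):
    """1 minus the uncentered Pearson correlation of two profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def first_bifurcation(profiles):
    """Split samples into the two branches under the dendrogram root.

    profiles: one list of signature-gene values per sample (at least 2 samples).
    Returns two sorted lists of sample indices.
    """
    n = len(profiles)
    dist = [[uncentered_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > 2:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean of all pairwise distances between clusters.
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]

# Hypothetical z-scored values of three signature genes in five samples.
samples = [
    [1.2, 0.8, 1.0], [0.9, 1.1, 0.7], [1.0, 1.0, 1.3],   # signature "on"
    [-1.1, -0.9, -1.0], [-0.8, -1.2, -0.9],              # signature "off"
]
classes = first_bifurcation(samples)
print(sorted(classes))
```

Which of the two branches is labeled Class 1 or Class 2 is a separate step that depends on inspecting the expression pattern of the signature genes in each branch.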

PUBLISHED DATASETS

DATA FROM GENE EXPRESSION OMNIBUS (GEO) DATABASE²

I. BREAST CANCER DATASETS
van de Vijver Dataset

This was a validation study of a predictive expression signature, which involved 295 young patients with early stage breast cancer, of whom 151 were lymph node negative, 226 were estrogen receptor positive, and 110 had received adjuvant chemotherapy (12, 13).

GSE4922 dataset

This was a derivation study for the molecular profiling of histologic grade in breast cancer; the patients are referred to as the Uppsala Cohort. Of the 316 patients in the cohort, 249 were used to derive the molecular profile; of these, 211 were ER-positive, 81 were lymph node positive and 58 showed p53 mutation. Eighty-six patients who overlapped with the GSE2990 dataset were excluded, leaving 163 patients in this analysis. Data were originally published by Bergh J et al. (14) and reinvestigated by Ivshina AV et al. (13).

GSE2034 dataset

This was a derivation and validation analysis of a gene signature for the prediction of breast cancer patient outcomes. It consisted of 286 lymph node negative breast cancer patients who never received adjuvant chemotherapy, of whom 209 were estrogen receptor positive. Data were published by Wang Y et al. (15).

GSE1456 dataset

This study was a derivation and validation analysis of a predictive gene signature for the outcomes of women with breast cancer. It involved 159 patients with breast cancer, of which 82% were estrogen receptor positive, 62% were lymph node negative and 79% were treated with adjuvant chemotherapy. Data were published by Pawitan et al. (16).

GSE2990 dataset

This study was a derivation and validation analysis of a correlative gene signature aimed at histologic grade. It involved 189 women with breast cancer of which 160 were lymph node negative. Sixty four estrogen receptor positive samples were used to derive a signature that effectively differentiates outcomes and grade. Data were published by Sotiriou et al. (17).

GSE7390 dataset

This study was a multicenter validation trial, to evaluate the clinical utility of a gene signature for the management of early node negative breast cancer. Their analysis involved 198 patients of which we excluded 22 because of overlap with the GSE2990 dataset. Data published by Desmedt et al. (18).

II. LUNG CANCER DATASETS
GSE4573 dataset

This was a derivation and validation analysis of a gene signature for the prediction of lung cancer patient outcomes. It consisted of 130 patients with squamous cell carcinomas (SCC) from all stages. Data were published by Raponi et al. (19).

GSE11117 dataset

This was a derivation and validation analysis of a gene signature for the prediction of lung cancer patient outcomes. It involved 41 chemotherapy-naïve non-small cell lung cancer (NSCLC) patients. Data published by Baty F et al., in Gene Expression Omnibus Database, 2008.

THE FOLLOWING DATASETS PUBLISHED BY NATIONAL CANCER INSTITUTE DIRECTOR’S CHALLENGE CONSORTIUM FOR THE MOLECULAR CLASSIFICATION OF LUNG ADENOCARCINOMA AND SHEDDEN ET AL. (8)

Moffitt Cancer Center (HLM) dataset

This was a derivation and validation analysis of a gene signature for the prediction of lung cancer patient outcomes. It involved 79 patients with non-small cell lung cancer (NSCLC) of all stages.

University of Michigan Cancer Center (MICH) dataset

This was a derivation and validation analysis of a gene signature for the prediction of lung cancer patient outcomes. It involved 177 patients with NSCLC of all stages.

The Dana-Farber Cancer Institute (DFCI) dataset

This was a derivation and validation analysis of a gene signature for the prediction of lung cancer patient outcomes. It involved 82 patients with NSCLC of all stages.

Memorial Sloan-Kettering Cancer Center (MSKCC)

This was a derivation and validation analysis of a gene signature for the prediction of lung cancer patient outcomes. It involved 104 patients with NSCLC of all stages.

SURVIVAL ANALYSIS

Kaplan-Meier estimates and log-rank testing were used to construct and compare survival curves. Statistical significance was evaluated using Cox regression analysis of HRs. Overall survival was defined as the time interval between the first date of any form of treatment and the last follow-up date or date of death; patients alive at the date of last follow-up were censored at that date. Metastasis-free survival was defined as the interval from the first treatment day to the day of the diagnosis of distant metastases. All other patients were censored on their date of last follow-up, including those alive without disease, alive with locoregional recurrence, alive with a second primary cancer, and those who died from an alternate cause. Relapse-free survival was defined as the time interval between the date of breast surgery and the date of a diagnosed relapse or last follow-up. Women who developed contralateral breast cancer were censored. The data reported in this study were based on 10-year survival in the van de Vijver, GSE4922, GSE2990 and HLM datasets, 5-year survival in the GSE2034, GSE1456, GSE4573, MICH and DFCI datasets, 12-year survival in the GSE7390 dataset and 4-year survival in the GSE11117 and MSKCC datasets. Patients with missing survival data or those who were reported to have zero follow-up time were excluded from survival analyses. All reported p values are two-sided. Multivariate analysis by Cox proportional hazard regression and all survival statistics were done in Partek Genomics Suite.
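The Kaplan-Meier estimate with right-censoring, as defined above, can be written in a few lines. This is a minimal sketch rather than the Partek implementation; the follow-up times and event flags are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if the event (death, metastasis or relapse) was observed,
             0 if the patient was censored at that time
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        removed = 0    # events plus censored patients leaving the risk set
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up (years) for six patients; 0 marks a censored patient.
curve = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients contribute to the risk set until their last follow-up and then drop out without lowering the curve, which is how patients alive at last follow-up are handled in the definitions above.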

SELECTING THE SIX-GENE MODEL FROM THE SPONTANEOUS METASTASIS GENE SIGNATURE (SpMGS)

To further evaluate the prognostic value of each gene within the signatures, intercohort multivariate Cox proportional-hazards analysis of each signature gene was performed in three breast cancer datasets. Genes significantly correlated with patient outcomes (p<0.05) were determined for each dataset. Only genes that were present in at least one of the three datasets with a p-value less than 0.05 were selected. Across the three datasets, a total of 17 unique genes were derived from the original 79-gene SpMGS. Twelve of these had HR>1, of which 6 were predictive in all three datasets. This served as the logic and derivation of the new 6-gene model. Survival analysis was performed on the 3 original public breast cancer datasets (van de Vijver, GSE4922 and GSE2034) utilizing the 6-gene model. Additionally, the 6-gene model was tested against three additional independent public breast cancer datasets (GSE1456, GSE2990 and GSE7390)³.
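The filtering steps in this paragraph (significant in at least one dataset, then HR>1, then available in all three datasets) amount to simple set operations. The sketch below is one reading of that logic; the dataset names, gene labels, hazard ratios and p-values are all hypothetical placeholders, and the final criterion is interpreted as "measured in every dataset", which is an assumption.

```python
def select_model(results, alpha=0.05):
    """Apply the three filters to per-dataset Cox results.

    results: {dataset: {gene: (hazard_ratio, p_value)}}
    Returns (significant genes, risk genes with HR > 1, final model genes).
    """
    significant = {}
    for genes in results.values():
        for gene, (hr, p) in genes.items():
            if p < alpha:
                significant.setdefault(gene, []).append(hr)
    # Keep genes whose significant hazard ratios are all above 1.
    risk_genes = {g for g, hrs in significant.items() if all(hr > 1 for hr in hrs)}
    # Assumption: "all three datasets" means the gene is measured in each one.
    in_all = set.intersection(*(set(genes) for genes in results.values()))
    return sorted(significant), sorted(risk_genes), sorted(risk_genes & in_all)

# Hypothetical Cox results; G1..G4 are placeholder gene labels.
toy_results = {
    "dataset_A": {"G1": (2.1, 0.001), "G2": (1.6, 0.03), "G3": (0.7, 0.01), "G4": (1.2, 0.40)},
    "dataset_B": {"G1": (1.8, 0.004), "G2": (1.4, 0.20), "G3": (0.6, 0.02)},
    "dataset_C": {"G1": (1.9, 0.010), "G3": (0.8, 0.04), "G4": (1.3, 0.03)},
}
significant, risk, model = select_model(toy_results)
print(significant, risk, model)
```

In the study itself the corresponding counts were 17 significant genes, 12 with HR>1, and 6 in the final model.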

Results

METASTASES GENE SIGNATURES DERIVED FROM MOUSE MODELS

Utilizing a murine breast cancer cell line, a complex series of metastatic mouse models was established as shown in Figure 1A. Spontaneous metastases to the liver and lungs developed after resection of the primary breast tumors, while embolic metastases to the lungs and liver were derived from direct inoculation of tumor cells into the systemic or portal venous system, respectively. Gene expression profiling was performed on the six different tumor types collected (Figure 1A). Based on statistical analyses, we identified genes that were significantly and differentially expressed between the metastatic tumor types (spontaneous and embolic) and the primary tumor. As shown in Figure 1B, 194 unique genes (corresponding to 226 gene probe sets) were associated with spontaneous lung metastasis (LMsp); 1062 unique genes (corresponding to 1203 gene probe sets) with spontaneous liver metastasis (LvMsp); 242 unique genes (corresponding to 271 gene probe sets) with embolic lung metastasis (LMtv); 687 unique genes (corresponding to 788 gene probe sets) with embolic liver metastasis (LvMsv); and only 9 unique genes with local recurrence (LR). The embolic lesions allowed us to control for the ambient changes in gene expression associated with tumor growth in an alternate parenchyma, which were present regardless of the earlier steps needed to gain metastatic competency. Using Venn logic, we excluded the ambient changes and targeted the alternate expression patterns as a source of predictive power. Thus, we generated a spontaneous metastasis gene signature (SpMGS) containing 79 genes and an embolic metastasis gene signature (EMGS) containing 32 genes.

Figure 1.


A. Establishment of spontaneous (top panel) and embolic (bottom panel) metastatic animal models. Murine breast adenocarcinoma 4T1 cells were used to generate the models, and six different tumor types were procured as indicated. After microdissection, gene expressions were analyzed utilizing Affymetrix microarrays. B. Generation of embolic metastasis gene signature (EMGS) and spontaneous metastasis gene signature (SpMGS) using Venn diagrams. Statistical analyses identified genes that were significantly and differentially expressed between the metastatic tumor types and primary tumors. Further experimental details can be found in the Materials and Methods.

EXPRESSION OF GENE SIGNATURE FROM MOUSE MODEL IN HUMAN BREAST CANCER

To evaluate the prognostic value of the metastatic gene signatures, and to determine which of the 79 SpMGS genes were more predictive of metastasis-free survival, we utilized three publicly available datasets of human breast cancer expression data and corresponding clinical outcomes. These included the van de Vijver, GSE4922 and GSE2034 datasets.

To facilitate visualization and identify subgroups of patients that expressed the SpMGS, we organized the gene expression patterns and samples using hierarchical clustering. We segregated patients into two classes in which patients in Class 2 exhibited the metastatic signature, while those in Class 1 did not (Supplementary Figure 1). To correlate clinical outcome, we calculated the probability of “remaining free of distant metastases” and of “overall survival” for each signature expression class.

van de Vijver Dataset

Kaplan-Meier curves showed a significant association between the SpMGS and both metastasis-free and overall survival (p<0.0005). This analysis indicates that the risk of metastasis was significantly higher for patients in Class 2 than in Class 1. Class 1 had better metastasis-free and overall survival (85% and 94% at 5 years; 76% and 84% at 10 years, respectively) compared with Class 2 (64% and 77% at 5 years; 51% and 63% at 10 years, respectively) (Figure 2A). The univariate hazard ratio (HR) was 0.36 (p<0.00003) for metastasis and 0.33 (p = 0.00014) for death. Multivariable proportional-hazards analysis confirmed that the SpMGS classification was a significant independent factor in predicting disease outcome (p=0.003). The SpMGS was a sensitive predictor of distant metastases, with an HR of 0.46 (Table 1).

Figure 2.


Kaplan-Meier analysis of the probability that patients would remain free of metastases and of overall survival in the van de Vijver dataset (panel A), overall survival in the GSE4922 dataset (panel B), and relapse-free survival in the GSE2034 dataset (panel C). Patients who exhibited the metastatic signature (SpMGS) were assigned to Class 2 (grey), while those who did not were assigned to Class 1 (black). Hazard ratios and p-values are as indicated within each graph. The 5-year and 10-year survival rates are shown at the bottom for each dataset.

TABLE 1.

Multivariable Proportional-Hazards Analysis of the risk of distant metastasis as a first event in van de Vijver’s dataset based on SpMGS.

Variable | HR | p-value
SpMGS | 0.46 | 0.003
Primary Tumor Size (≤2cm vs. >2cm) | 0.62 | 0.03
Node (negative vs. positive) | 0.79 | 0.45
Age (<45 vs. ≥45 years) | 2.05 | 0.0009
Chemotherapy Exposure (no vs. yes) | 1.54 | 0.17
ER (negative vs. positive) | 1.1 | 0.69
Differentiation: intermediate vs. well | 2.15 | 0.03
Differentiation: poorly vs. well | 2.8 | 0.004

A univariate Cox proportional-hazards model was used to evaluate the association of our signature with clinical outcome in each category, stratified for multiple clinical parameters. As summarized in Table 2, the prognostic profile based on SpMGS was accurate in predicting the outcome of disease. Comparing patients in Class 1 with those in Class 2 revealed an HR for distant metastases of 0.43 for lymph-node negative patients and 0.28 for lymph-node positive patients (p<0.05 for both). Similarly, the prognostic profile was strongly associated with disease outcome in groups of patients with tumor diameter less than or equal to 20 mm [HR = 0.33, (p=0.002)] and tumor diameter greater than 20 mm [HR = 0.45, (p=0.02)], as well as in patients aged 45 years or younger [HR = 0.30, (p=0.00007)] and older than 45 years [HR = 0.46, (p=0.05)]. Furthermore, the SpMGS could be used to stratify tumors of well and intermediate differentiation into good and poor prognostic subcategories [HR 0.26 and 0.24, respectively (p<0.05)], but was less correlative with the stratification of poorly differentiated lesions (p=0.67). The association was significant for tumors that were estrogen receptor positive [HR 0.36, (p<0.05)], but not for those that were estrogen receptor negative. This analysis also showed that SpMGS was a strong predictor of improved outcomes both in patients who received chemotherapy and in those who did not [HR 0.25 and 0.43, respectively, (p<0.05)].

TABLE 2.

Univariate Cox Proportional-hazard Model: class 1 vs. class 2 Hazard Ratio for metastasis-free survival according to SpMGS. This analysis included data of the 295 breast cancer patients in van de Vijver dataset, with the prognostic role of the metastases signatures tested within each patient category.

Clinical category | HR | p-value | Total patients
Node Positive | 0.28 | 0.0009 | 144
Node Negative | 0.43 | 0.006 | 151
Tumor Size ≤2cm | 0.33 | 0.002 | 150
Tumor Size >2cm | 0.45 | 0.02 | 140
Age ≤45yrs | 0.3 | 0.00007 | 166
Age >45yrs | 0.46 | 0.05 | 129
Chemo: yes | 0.25 | 0.002 | 110
Chemo: no | 0.43 | 0.003 | 185
ER positive | 0.36 | 0.0003 | 226
ER negative | 0.75 | 0.63 | 69
Differentiation: Poor | 0.87 | 0.67 | 119
Differentiation: Intermediate | 0.24 | 0.0008 | 101
Differentiation: Well | 0.26 | 0.03 | 75

GSE4922 and GSE2034 Datasets

A similar analysis was performed on both GSE4922 and GSE2034 datasets to predict overall survival in GSE4922 dataset and relapse-free survival in GSE2034 dataset. The survival analysis showed that the risk of death or metastasis in both datasets was significantly higher among patients with an expression profile associated with SpMGS Class 2, [HR 0.55 (p=0.019) and 0.47 (p=0.0013), respectively] (Figure 2B and 2C).

It should also be noted that when a similar analysis was performed utilizing the 32-gene EMGS on the three datasets, the predictive outcomes were either statistically insignificant or not as powerful as those of the SpMGS (Supplementary Figure 2).

To determine whether the SpMGS is distinct from previously published work, we cross-referenced our SpMGS with other human breast cancer gene profiles. SpMGS has only one gene (PTDSS1) in common with the 70-gene signature by van’t Veer, et al. (20), one gene (FOS) in common with the 264-gene signature by Ivshina, et al. (13), and one gene (TOB2) in common with the 186-gene signature by Liu, et al. (21). Together these results indicated that our mouse-derived SpMGS was an independent new expression profile that had prognostic value when applied to human disease.

EVALUATION OF GENE SIGNATURE AND A SIX-GENE MODEL

To further evaluate the prognostic value of each gene within the signatures, we performed multivariate Cox proportional-hazards analysis of each signature gene in different datasets based on clinical information. In SpMGS, 17 of 79 genes were present in at least one of the three breast cancer datasets and were significantly associated with prognosis (p<0.05). More importantly, 12 of these 17 (70.6%) had a hazard ratio of greater than 1 (HR>1) (Table 3), indicating that up-regulation of those genes is associated with poor prognosis. In contrast, 16 of 32 genes from EMGS present in all three datasets had a significant association with prognosis (p<0.05), and only 4 of these (25%) had a hazard ratio of greater than 1.

TABLE 3.

Cox-regression analysis of the genes that had significant sensitivity in predicting favorable or poor prognosis (p<0.05) in three datasets (van de Vijver, GSE4922 and GSE2034 datasets). Boldface indicates genes with a hazard ratio greater than 1.

SpMGS symbol | HR (gene) | p-value (gene) | EMGS symbol | HR (gene) | p-value (gene)
ABCF1 | 2.60 | <0.001 | GNAI1 | 2.30 | <0.001
PREB | 2.05 | 0.007 | HEPH | 1.85 | 0.012
PAPOLA | 2.04 | 0.013 | C9orf58 | 1.43 | 0.031
PTDSS1 | 2.00 | <0.001 | TGFB1I1 | 1.35 | 0.009
DOCK7 | 1.87 | <0.001 | DPEP1 | 0.83 | 0.032
HSPA9A | 1.79 | 0.023 | FOLR2 | 0.82 | 0.030
CORO1C | 1.71 | 0.002 | DSP | 0.82 | 0.049
DPP3* | 1.63 | 0.005 | TMEM30B | 0.81 | 0.048
ANAPC5 | 1.29 | 0.009 | LUM | 0.78 | 0.042
FBXW11 | 1.26 | 0.042 | KLF15 | 0.77 | 0.018
UBE3A** | 1.24 | 0.046 | TSC22D3 | 0.75 | 0.004
ATP6V1C1 | 1.23 | 0.031 | ATP1B1 | 0.73 | 0.003
HSPC117 | 0.80 | 0.018 | ELN | 0.69 | 0.006
XBP1 | 0.68 | <0.001 | BHLHB5 | 0.67 | 0.015
FOS | 0.66 | 0.013 | CXCL12 | 0.64 | <0.001
TOB2 | 0.47 | 0.050 | SPARCL1*** | 0.57 | <0.001
HCRT | 0.43 | 0.046 | | |

* indicates that HR and p-value were 1.52 and 0.01, respectively, in another dataset.

** indicates that HR and p-value were 0.64 and 0.047, respectively, in another dataset.

*** indicates that HR and p-value were 0.7 and 0.003, and 0.7 and 0.006, respectively, in the other datasets.

The genes with high hazard ratios were considered high-yield components of the predictive model. As such, out of the 12 genes (HR>1) in the SpMGS subgroup, the six genes that were present in all three datasets were selected. This “six-gene model” consists of the following genes: Abcf1, Coro1c, Dpp3, Preb, Ptdss1 and Ube3a (Supplementary Table 1). We next tested the six-gene model for its predictive power as a stand-alone expression signature. Survival analysis on the original three public datasets indicated that the six-gene model is powerful in predicting patient outcome (Figure 3, top panel). As expected, similar to the 79-gene SpMGS, the six-gene model also predicted survival independent of known clinical variables based on multivariable proportional-hazards analysis utilizing the van de Vijver dataset (Supplementary Table 2). As it is likely that gene expression profiles will affect future clinical decision making, it has been emphasized that predictive models should be validated independently of their training datasets. Therefore, the six-gene model was tested against 3 additional independent publicly available breast cancer datasets (Figure 3, bottom panel). The data revealed a significant association between the six-gene model and relapse-free survival in the GSE1456 and GSE2990 datasets, and overall survival in the GSE7390 dataset (p=0.0009, p=0.03 and p=0.018, by log-rank test, respectively). Notably, in all datasets tested, poor prognosis correlated largely with up-regulation of the 6 genes based on cluster analysis.
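The log-rank comparison of two classes reported here can be sketched as follows. This is a minimal two-group implementation under stated assumptions (chi-square statistic with one degree of freedom, hypergeometric variance at each event time); it is not the Partek implementation, and the follow-up data are invented for illustration.

```python
import math

def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi_square, two_sided_p_value).

    events are 1 for an observed event and 0 for a censored patient.
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)]
    data += [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)                # at risk, both groups
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 0)    # at risk, group 1
        d = sum(1 for ti, e, _ in data if ti == t and e == 1)     # events, both groups
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical classes: one relapses early, the other late.
chi2, p_value = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                             [6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
print(round(chi2, 2), p_value < 0.01)
```

The sketch assumes that both groups are at risk at one or more event times; otherwise the variance is zero and the statistic is undefined.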

Figure 3.


Survival analysis based on 6-gene model in 6 independent human breast cancer datasets. Three “Original Datasets” (top panel) and three additional “Independent Validation-sets” (bottom panel) were utilized to validate the predictive power of the 6-gene model. Class 2 (blue) included patients who exhibited the 6-gene signature while Class 1 (red) included patients who did not. The hazard ratios (HR) and p-values are as shown within each graph.
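For readers unfamiliar with the log-rank test used to compare the two classes, the following is a minimal, standard-library sketch of the two-group version of the test. It is an illustration only and not the software used in the study. The function name and the toy follow-up times in the usage lines are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time for each patient
    events_* : 1 if the event (e.g. relapse or death) occurred, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        at_risk_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        n = at_risk_a + at_risk_b
        # Events occurring exactly at time t.
        d_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        d = sum(1 for s, e, _ in samples if s == t and e)
        observed_a += d_a
        expected_a += d * at_risk_a / n
        if n > 1:
            variance += (d * (at_risk_a / n) * (at_risk_b / n)
                         * (n - d) / (n - 1))

    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy data (hypothetical): one class relapses early, the other is censored.
poor_times, poor_events = [2, 3, 4, 5, 6, 7], [1, 1, 1, 1, 1, 1]
good_times, good_events = [8, 9, 10, 11, 12, 13], [0, 0, 0, 0, 0, 0]
stat, p_value = logrank_test(poor_times, poor_events, good_times, good_events)
print(stat, p_value)
```

In practice an established survival-analysis package would be preferable to a hand-written routine; the sketch is meant only to show what the reported p-values measure.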

EXPRESSION OF SIX-GENE MODEL IN HUMAN LUNG CANCER

Based on our experimental design, we hypothesized that the six-gene model represents a general metastatic competency signature. Thus, we investigated whether the six-gene model can predict prognosis in cancer types other than breast cancer. We therefore applied the six-gene model to six independent, publicly available human lung cancer datasets to predict overall survival. As summarized in Supplementary Table 3, the six-gene model was able to stratify poor from good prognosis with statistical significance in the GSE4573 and HLM (Moffitt Cancer Center) datasets (p=0.04 and p=0.03, respectively). Though the predictions in the other datasets [GSE11117, MICH (University of Michigan Cancer Center), FDCI (The Dana-Farber Cancer Institute) and MSKCC (Memorial Sloan-Kettering Cancer Center)] were not statistically significant, they trended toward significance (p=0.09, p=0.08, p=0.07 and p=0.09, respectively), and the groups were well separated on Kaplan-Meier curves (Supplementary Figure 3).

Discussion

The malignant process surmounts several fairly distinct hurdles in its ultimate heterotopic progression (22). This process, so contra-aligned with host survival, led us to focus on why such an endowment is granted to these abnormal cells. We surmised, as have others, that unique genetic aberrancies cause, encourage, or certainly allow this to occur (23). Specific tissue tropisms are additionally confounding because it is evident that tumor histologies differ in their resultant metastatic profiles (24). If early metastatic competence occurs in the setting of vast cellular heterogeneity, would our signature stay durably accurate within and across patients? In devising a model that accurately identifies the genetic perturbations responsible for metastases, we felt that looking at the differential expression between the primary and metastatic lesions was not enough. Breast cancer growing in lung tissue should have gene expression alterations regardless of how it arrived there. This ambient organ-imposed expression alteration confounds a straightforward approach towards detecting metastatic competency genes (MCG). We deduced, using Venn logic, that by subtracting the ambient gene profile from the primary and spontaneously metastatic tumor gene profiles, we could derive the constitutive MCG found in the spontaneously metastasizing cancer. Embolic lung and liver mouse models served to provide the respective ambient gene profiles (EMGS). Incorporating multiple tropisms (lung and liver) allowed us to have internally generated controls for assessing the quality of genetic interpretation. In addition, it allowed us to categorize gene sets into tropism-specific MCG if they were unique to specific organ tropisms, or general MCG if they were present in both tropisms. The SpMGS represents the theoretical general MCG.
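The Venn logic described above can be sketched with plain set operations. This is one plausible reading of the subtraction and categorization steps, not a reproduction of the exact comparisons used in the study. The function names and the gene identifiers (`g1`, `g2`, and so on) are hypothetical placeholders.

```python
def subtract_ambient(tumor_profile, ambient_profile):
    """Remove organ-imposed (ambient) genes from a tumor gene profile."""
    return set(tumor_profile) - set(ambient_profile)


def categorize_by_tropism(mcg_by_organ):
    """Split candidate MCG into general and tropism-specific sets.

    general  : genes present in every organ tropism
    specific : for each organ, genes found in that tropism only
    """
    organs = list(mcg_by_organ)
    if not organs:
        return set(), {}
    general = set.intersection(*(set(mcg_by_organ[o]) for o in organs))
    specific = {}
    for organ in organs:
        others = set().union(
            *(set(mcg_by_organ[o]) for o in organs if o != organ))
        specific[organ] = set(mcg_by_organ[organ]) - others
    return general, specific


# Hypothetical gene identifiers, for illustration only.
spontaneous = {"lung": {"g1", "g2", "g3", "g4"},
               "liver": {"g1", "g2", "g5", "g6"}}
ambient = {"lung": {"g4", "g9"},    # stands in for the embolic lung profile
           "liver": {"g6", "g9"}}   # stands in for the embolic liver profile

mcg = {organ: subtract_ambient(spontaneous[organ], ambient[organ])
       for organ in spontaneous}
general_mcg, tropism_specific_mcg = categorize_by_tropism(mcg)
print(sorted(general_mcg), tropism_specific_mcg)
```

In this toy example, the genes shared by both tropisms after ambient subtraction play the role of the general MCG, while genes left in only one organ are treated as tropism-specific.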

The SpMGS is composed of 79 unique genes, given our stringencies. When profiled against publicly available human breast cancer datasets, this signature significantly correlates with nodal status, tumor size, age, response to chemotherapy administration, estrogen receptor positivity and favorable histological differentiation. Additionally, the signature was significantly predictive of patient survival outcomes across three independent datasets. The hazard ratios shown in Table 3 further demonstrate that the SpMGS is composed of genes with higher predictive yield than the EMGS. The consistency of predictive power across unrelated patient cohorts underscores and validates the accuracy of the SpMGS, as well as the approach towards its derivation.

With the intention of further amplifying the clinical yield, we queried the SpMGS in association with the public datasets, and culled six genes that contributed the most to the signature’s predictive ability. This six-gene model was not only more portable than the SpMGS, but was also highly predictive of survival outcomes when tested against three additional independent datasets of human breast cancer. We further evaluated the applicability of this six-gene model to lung cancer patients, and demonstrated predictive value in the survival analysis. The prediction was not as powerful or durable as for breast cancer, but still showed a clear and significant disparity between patients who did well and those who had poorer outcomes. However, the applicability of this six-gene model to other tumor histologies remains to be tested.

Genomic assays have proven extremely important to the clinical management of early breast cancer patients. Two commercially available assays have allowed physicians to identify patients who are at low risk for recurrence, and who subsequently may forego morbid adjuvant chemotherapy (12, 20). Our six-gene model offers a similar utility, although it is more portable and perhaps applicable to a wider cancer patient population. Due to its portability, it could conceivably be transformed into a hospital-based assay, which would presumably cost less than the expensive extramural assays currently available. Despite the six-gene model’s promising performance shown in this study, comparative analysis with existing genomic assays and prospective clinical validation will be crucial to demonstrate its clinical potential.

In addition to the potential clinical benefit of the six-gene model, a significant finding of this study is that gene signatures derived from mouse models can be utilized to predict human cancer patient outcomes. With the increasing number of clinical datasets accessible for analysis, constraints (e.g., ethical, fiscal and logistic) associated with utilizing human samples can be mitigated with properly designed mouse model systems. Our analytical approach (Supplementary Figure 4, study flow chart), which involved the use of multiple animal models to derive a novel gene signature applicable to humans, sets a precedent that similar strategies may be used, albeit cautiously, for cross-species analysis.

Furthermore, specific knock-up or knock-down abrogative studies can be performed in our animal model systems to decipher the functional importance of the six genes, which are correlated with metastatic competency and subsequent human survival in breast cancer. These animal model systems may also serve to provide surrogate markers for screening of therapeutic targets. It should also be noted that the six genes in our signature are novel and independent of the genes utilized in the existing genomic assays for breast cancer patients. Since most of the genes in this signature have not previously been linked to metastasis, detailed molecular studies will help to determine functional roles of these genes in metastatic competency and their utilities as potential therapeutic targets.

In summary, a complex utilization of animal models, microarray and statistics has led us to a gene signature that accurately and consistently predicts human breast cancer patient survival. This six-gene signature may also possess clinical utility for predicting survival outcomes in other cancer types, such as lung cancer. There are certain constraints that restrict our ability to use human subjects for the most optimal methods warranted by a clinical or scientific question. The design of this study underscores the utility of animal models in the development of clinical assays when properly coupled with existing clinical datasets. It is imperative that we remain cognizant of all possible revenues derived from well-designed and carefully analyzed animal models.

Statement of Translational Relevance

In this study, we devised an analytical strategy that led to the development of a novel gene signature for predicting cancer patient outcome. Our study is significant in that it minimizes the need to have direct access to human tissue samples, while maximizing the utility of publicly available clinical datasets and mouse models of cancer for generating novel genomic assays for clinical purposes. Our innovative approach, which combined mouse models of cancer with publicly available clinical datasets, sets a precedent that a similar strategy can be applied to other cancer types. With further validation analyses and prospective studies, the novel gene signature described here could potentially be coupled with other genomic-based assays to assist physicians in the management of cancer patients.

Supplementary Material

1

Acknowledgement

The authors would like to thank Qingrong Chen and Yonghong Wang for their valuable input on the statistical analysis of this study. Their willingness to share their knowledge greatly helped the authors get up to speed in an area with which they previously had little familiarity. On a personal level, they have been wonderful colleagues to work with.

Footnotes

2

available at: http://www.ncbi.nlm.nih.gov/geo with accession code

References

  • 1. Sporn MB. The war on cancer. Lancet. 1996;347:1377–1381. doi: 10.1016/s0140-6736(96)91015-6.
  • 2. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100:57–70. doi: 10.1016/s0092-8674(00)81683-9.
  • 3. Fidler IJ. The pathogenesis of cancer metastasis: the 'seed and soil' hypothesis revisited. Nat Rev Cancer. 2003;3:453–458. doi: 10.1038/nrc1098.
  • 4. Langley RR, Fidler IJ. Tumor cell-organ microenvironment interactions in the pathogenesis of cancer metastasis. Endocr Rev. 2007;28:297–321. doi: 10.1210/er.2006-0027.
  • 5. Kang Y, Siegel PM, Shu W, et al. A multigenic program mediating breast cancer metastasis to bone. Cancer Cell. 2003;3:537–549. doi: 10.1016/s1535-6108(03)00132-6.
  • 6. Mangiameli DP, Blansfield JA, Kachala S, et al. Combination therapy targeting the tumor microenvironment is effective in a model of human ocular melanoma. J Transl Med. 2007;5:38. doi: 10.1186/1479-5876-5-38.
  • 7. Segal E, Friedman N, Kaminski N, Regev A, Koller D. From signatures to models: understanding cancer using microarrays. Nat Genet. 2005;37 Suppl:S38–S45. doi: 10.1038/ng1561.
  • 8. Shedden K, Taylor JM, Enkemann SA, et al. Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med. 2008;14:822–827. doi: 10.1038/nm.1790.
  • 9. Tomida S, Takeuchi T, Shimada Y, et al. Relapse-related molecular signature in lung adenocarcinomas identifies patients with dismal prognosis. J Clin Oncol. 2009;27:2793–2799. doi: 10.1200/JCO.2008.19.7053.
  • 10. Driouch K, Landemaine T, Sin S, Wang S, Lidereau R. Gene arrays for diagnosis, prognosis and treatment of breast cancer metastasis. Clin Exp Metastasis. 2007;24:575–585. doi: 10.1007/s10585-007-9110-x.
  • 11. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863–14868. doi: 10.1073/pnas.95.25.14863.
  • 12. van de Vijver MJ, He YD, van't Veer LJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002;347:1999–2009. doi: 10.1056/NEJMoa021967.
  • 13. Ivshina AV, George J, Senko O, et al. Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res. 2006;66:10292–10301. doi: 10.1158/0008-5472.CAN-05-4414.
  • 14. Bergh J, Norberg T, Sjogren S, Lindgren A, Holmberg L. Complete sequencing of the p53 gene provides prognostic information in breast cancer patients, particularly in relation to adjuvant systemic therapy and radiotherapy. Nat Med. 1995;1:1029–1034. doi: 10.1038/nm1095-1029.
  • 15. Wang Y, Klijn JG, Zhang Y, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365:671–679. doi: 10.1016/S0140-6736(05)17947-1.
  • 16. Pawitan Y, Bjohle J, Amler L, et al. Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005;7:R953–R964. doi: 10.1186/bcr1325.
  • 17. Sotiriou C, Wirapati P, Loi S, et al. Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006;98:262–272. doi: 10.1093/jnci/djj052.
  • 18. Desmedt C, Piette F, Loi S, et al. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007;13:3207–3214. doi: 10.1158/1078-0432.CCR-06-2765.
  • 19. Raponi M, Zhang Y, Yu J, et al. Gene expression signatures for predicting prognosis of squamous cell and adenocarcinomas of the lung. Cancer Res. 2006;66:7466–7472. doi: 10.1158/0008-5472.CAN-06-1191.
  • 20. van 't Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530–536. doi: 10.1038/415530a.
  • 21. Liu R, Wang X, Chen GY, et al. The prognostic role of a gene signature from tumorigenic breast-cancer cells. N Engl J Med. 2007;356:217–226. doi: 10.1056/NEJMoa063994.
  • 22. Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat Med. 2004;10:789–799. doi: 10.1038/nm1087.
  • 23. Fidler IJ. The organ microenvironment and cancer metastasis. Differentiation. 2002;70:498–505. doi: 10.1046/j.1432-0436.2002.700904.x.
  • 24. Kang Y. New tricks against an old foe: molecular dissection of metastasis tissue tropism in breast cancer. Breast Dis. 2006;26:129–138. doi: 10.3233/bd-2007-26111.
