Author manuscript; available in PMC: 2013 Mar 1.
Published in final edited form as: Clin Cancer Res. 2012 Jan 6;18(5):1323–1333. doi: 10.1158/1078-0432.CCR-11-2271

Combination of a novel gene expression signature with a clinical nomogram improves the prediction of survival in high-risk bladder cancer

Markus Riester 1,*, Jennifer M Taylor 2,*, Andrew Feifer 2, Theresa Koppie 3, Jonathan E Rosenberg 4, Robert J Downey 5, Bernard H Bochner 2, Franziska Michor 1
PMCID: PMC3569085  NIHMSID: NIHMS347821  PMID: 22228636

Abstract

Purpose

We aimed to validate and improve prognostic signatures for high-risk urothelial carcinoma of the bladder.

Experimental Design

We evaluated microarray data from 93 bladder cancer patients managed by radical cystectomy to determine gene expression patterns associated with clinical and prognostic variables. We compared our results with published bladder cancer microarray datasets comprising 578 additional patients, and with 49 published gene signatures from multiple cancer types. Hierarchical clustering was utilized to identify subtypes associated with differences in survival. We then investigated whether the addition of survival-associated gene expression information to a validated post-cystectomy nomogram utilizing clinical and pathologic variables improves prediction of recurrence.

Results

Multiple markers for muscle invasive disease with highly significant expression differences in multiple datasets were identified, such as FN1, NNMT, POSTN and SMAD6. We identified signatures associated with pathologic stage and the likelihood of developing metastasis and death from bladder cancer, as well as with two distinct clustering subtypes of bladder cancer. Our novel signature correlated with overall survival in multiple independent datasets, significantly improving the prediction concordance of standard staging in all datasets (mean ΔC-statistic: 0.14, 95% CI 0.01–0.27; P < 0.001). Tested in our patient cohort, it significantly enhanced the performance of a postoperative survival nomogram (ΔC-statistic: 0.08, 95% CI −0.04–0.20; P < 0.005).

Conclusions

Prognostic information obtained from gene expression data can aid in post-treatment prediction of bladder cancer recurrence. Our findings require further validation in external cohorts and prospectively in a clinical trial setting.

Keywords: Bladder cancer, gene expression analysis, molecular subtypes, survival analysis, bioinformatics

INTRODUCTION

Urothelial carcinoma of the urinary bladder is the fifth most common cancer in the Western world, and the ninth most frequent cancer worldwide, representing 3% of cancers diagnosed globally (1). Treatment of bladder cancer has not changed significantly in over 20 years, and outcomes for patients remain suboptimal. Approximately 20–30% of newly diagnosed patients present with muscle-invasive (MI) disease (stages T2–4) or metastatic disease, while up to a third of patients with initially non-muscle-invasive (non-MI) disease (stages Ta/T1/Tis) later progress to MI or metastatic disease (2). Clinical features including stage and grade are strongly associated with outcome and play an important role in determining treatment. For example, the 5-year tumor-specific mortality rates range from less than 5% for low-stage and low-grade disease up to approximately 50% for all MI lesions (3, 4). However, even though grade and stage are important predictors of outcome, there remains significant variability in the prognosis of patients with similar characteristics. This highlights the need to identify additional tumor characteristics that predict clinical behavior.

In recent years, there has been a growing interest in the use of gene expression signatures for risk stratification of cancer patients (5). Multiple groups have produced urothelial cancer gene signatures predictive of a range of tumor characteristics and outcomes, including stage (6–8), molecular subtype (9, 10), likelihood of recurrence and progression of non-MI (11–13) and MI disease (14), and survival (7). Here we aimed to (i) explore if novel cancer gene expression patterns can be identified that stratify patients undergoing radical cystectomy for urothelial cancer by risk of recurrence and death; (ii) test our novel signatures and all previously developed signatures in all available bladder cancer microarray datasets; and (iii) investigate the gain in predictive accuracy of a validated clinical nomogram by combination with gene signature information.

MATERIALS AND METHODS

MSKCC Samples

Characteristics

We utilized previously unpublished cancer gene expression data from 93 patients undergoing radical cystectomy (RC) at Memorial Sloan-Kettering Cancer Center (MSKCC) between 1993 and 2004. Specimens were collected with MSKCC Institutional Review Board approval, and a waiver of authorization to review associated clinical data was obtained from the Board. The clinicopathologic characteristics are summarized in Table 1 (see Table S1 for details). In those 15 patients with non-MI disease on final pathologic analysis after RC, four had MI disease histologically in tissue obtained at the time of prior transurethral resection (TUR), and the remaining patients had high-risk features (e.g. extensive volume of disease, recurrent or BCG-refractory disease). Lymph node dissection was performed in 77 patients; no patient had metastatic disease at the time of RC. Chemotherapy was administered to 3 patients as neoadjuvant, 16 patients as adjuvant, and 19 patients as salvage for recurrent disease. Case selection was restricted to those with frozen specimens with measurable volume of malignancy and adequate percentage of tumor.

Table 1.

Clinicopathologic characteristics of the tumor samples.

Patient age at RC, median (range), years 69.1 (32.1–91.1)
Gender, number (%)
 Female 25 (27%)
 Male 68 (73%)
Histologic type
 Transitional Cell Carcinoma (TCC) 88 (95%)
 TCC/Squamous 5 (5%)
RC stage, number (%)
 pTa 5 (5%)
 pT1 10 (11%)
 pT2 17 (18%)
 pT3 42 (45%)
 pT4 19 (21%)
Lymph node status
 pN0 49 (52%)
 pN+ 28 (30%)
 pNx^A 16 (18%)
Length of follow-up, median (range), mos 32 (1–175)
Last known status, number (%)
 NED 28 (30%)
 Death from other cause 27 (29%)
 Death from bladder cancer 38 (41%)
Recurrence, number (%)^B
 No recurrence 30 (32%)
 Urothelial 9 (10%)
 Pelvic 20 (21%)
 Distant site 34 (37%)

RC = radical cystectomy;

A: Lymph node dissection not done or no lymph nodes analyzed

B: numbers total more than 100%, as some patients developed recurrence in more than one site.

Clinical endpoints

Overall survival time was defined as the time from date of RC to date of death or last follow-up. Disease-specific survival was defined as the time from RC to death attributed to urothelial cancer, when death occurred with known and progressive metastatic disease. If death was recorded in the institutional database without knowledge of a recurrent cancer or with documentation of another malignancy or a non-malignancy cause, death was attributed to other causes. Recurrence was defined as pelvic if within the pelvis below the aortic bifurcation, distant if there was visceral metastasis or recurrence above the aortic bifurcation, and urothelial if within the remaining urinary tract (renal pelvis, ureter, urethra). In the analysis of the development of metastases, only pelvic and distant recurrences were included. The primary endpoint in this study was overall survival unless stated otherwise.

Sample preparation

All patient specimens collected at the time of surgery were processed expeditiously within the Department of Pathology and stored in an institutional biospecimen bank. Frozen bladder tissues were examined by a genitourinary pathologist to identify tumor content, which was microdissected. Ten 50-μm sections were cut, with confirmation of tumor content by pathologic review. RNA was isolated with Trizol (Invitrogen) and cleaned with RNeasy mini kits (Qiagen) according to manufacturer protocols. Expression profiling was performed in the MSKCC Genomics Core Facility. In brief, 5–10 μg of RNA was transcribed utilizing a T7-oligo dT primer and converted to cDNA (Invitrogen cDNA synthesis kit). Biotinylated aRNA was produced from the cDNA using an in vitro transcription kit (Enzo Diagnostics). Following quality assessment using an Agilent Bioanalyzer, the aRNA was fragmented and hybridized onto Affymetrix U133 Plus 2.0 arrays, containing 54,675 probe sets. Data were uploaded to NCBI GEO (15) under accession number GSE31684.

External datasets

We obtained six independent bladder cancer microarray datasets (6, 7, 9, 10, 13, 16) (Table 2) totaling 578 additional patients and encompassing a broad spectrum of stages, grades, and histologic types. Overall survival data was available for the Blaveri et al., Sanchez-Carbayo et al., Lindgren et al., and Kim et al. cohorts (6, 7, 9, 13) (see Supplementary Data for preprocessing details). Disease-specific survival time information was only documented for patients in the cohort from Kim et al. The primary endpoint was therefore overall survival in the meta-analysis.

Table 2.

Independent bladder cancer microarray datasets.

Dataset | Accession Number | Platform | # Probes | # Samples (# invasive) | Median Age (range) | % Males | % RC | Median Follow-Up Time (max) | % Death (DOD)
Blaveri (6) | GSE1827 | cDNA microarray | 10,368 | 80 (53) | 66 (28–113) | 70.0% | 62.5% | 13 (145) | 55.0% (n/a)
Sanchez-Carbayo (7) | - | Affymetrix U133A | 22,283 | 90 (65) | 69 (37–85) | 68.9% | n/a | 25 (76) | 40.0% (n/a)
Lindgren (9) | GSE19915 | cDNA microarray | 2,506 | 144 (47) | n/a | n/a | 32.6% | 46 (180) | 18.1% (n/a)
Dyrskjøt (10) | GSE89 | Affymetrix Hu6800 | 7,129 | 40 (9) | n/a | n/a | n/a | n/a | n/a
Kim (13) | GSE13507 | Illumina human-6 v2.0 | 43,148 | 165 (62) | 66 (24–88) | 81.8% | n/a | 37 (137) | 41.8% (19.4%)
Stransky (16) | E-TABM-147 | Affymetrix U95A/U95A v2 | 12,626 | 59 (33) | n/a | 83% | n/a | n/a | n/a

RC = radical cystectomy; DOD = death of disease; n/a not available

Statistical analysis

Preprocessing

Expression estimates were obtained by the GCRMA algorithm (17).

Differential gene expression

The gene expression profiles were evaluated for significant correlation with multiple clinical variables and outcomes, including stage, lymph node status, recurrence, and overall survival. For the identification of differentially expressed probe sets, we used the LIMMA method (18). If not stated otherwise, the cutoff of the adjusted p-values (the false discovery rate, FDR) (19) was set to 0.01 and the minimum fold change (FC) to 1.5. The probability of observing n or more differentially expressed genes was estimated by permutation tests. For the external datasets, the lists of differentially expressed genes were downloaded from the supplementary material of the respective papers (6, 7, 9, 10, 13, 16) when available, otherwise calculated with LIMMA. When comparing lists of differentially expressed genes with corresponding lists from external datasets, the significance of overlaps was calculated in Bioconductor (GSEABase), using 25,000 permutations.
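The thresholding step described above can be illustrated with a minimal Python sketch. It is not the LIMMA implementation: it assumes that the adjustment cited as (19) is the Benjamini-Hochberg procedure and that fold changes are supplied on a log2 scale, and the function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probe_sets(pvalues, log2_fold_changes,
                      fdr_cutoff=0.01, min_fold_change=1.5):
    """Indices of probe sets passing both the FDR and fold-change cutoffs."""
    adjusted = benjamini_hochberg(pvalues)
    min_log2_fc = math.log2(min_fold_change)
    return [
        i for i in range(len(pvalues))
        if adjusted[i] < fdr_cutoff
        and abs(log2_fold_changes[i]) >= min_log2_fc
    ]
```

The default arguments mirror the cutoffs stated in the text (FDR 0.01, fold change 1.5); the moderated test statistics themselves would still come from LIMMA.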

Machine learning

Classifiers were generated with Fisher’s linear discriminant and support vector machines (SVMs) and were leave-one-out cross-validated. The R packages MASS and e1071 were used for building the classifiers. We utilized a linear SVM kernel and default, untuned parameters. ROC curves were generated with the ROCR package (20).
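Leave-one-out cross-validation as used here can be sketched as follows. For brevity the sketch swaps in a nearest-centroid classifier as a stand-in; it is not Fisher's linear discriminant or an SVM, and all function names are hypothetical.

```python
def nearest_centroid_fit(samples, labels):
    """Compute the mean expression vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean)."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: squared_distance(centroids[y]))


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, train on the rest, and score the held-out call."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = nearest_centroid_fit(train_x, train_y)
        if nearest_centroid_predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)
```

The key property is that the held-out sample never contributes to the model that classifies it, which is what keeps the reported accuracy from being optimistic.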

Risk category creation

For each patient, a risk score was calculated by a Cox proportional hazards model that was fitted using the gene expression profiles of all remaining patient samples (leave-one-out cross-validation). The risk score was defined as the sum of the gene signature expression values, weighted by the Cox regression coefficients (21). Each individual was then classified into low-risk or high-risk categories, based on the median leave-one-out cross-validated risk score: patients were classified as low-risk when their risk score was smaller than the cohort median, otherwise as high-risk. This was done independently for all datasets for which survival information was available. The significance of survival curve differences of cross-validated models was estimated with permutation tests. The process from cross-validation to risk stratification was repeated 500 times with shuffled survival labels. This empirical chi-square distribution was then utilized to estimate p-values. For non-cross-validated models, the log-rank test was used. Multivariate survival prediction models were compared by the C-statistic, an estimator of the model concordance (22), and by likelihood-ratio tests. The concordance represents the probability that given two random non-censored individuals, the one with the higher risk score has a shorter survival time.
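The risk score, the median split, and the concordance can be made concrete with a short Python sketch. It assumes the Cox coefficients have already been estimated elsewhere (no model fitting is shown) and computes the concordance in the style of Harrell's C, counting a pair as usable when the patient with the shorter time had an observed event; this handling of censoring is an assumption, and the function names are hypothetical.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of signature expression values weighted by Cox coefficients."""
    return sum(b * x for b, x in zip(coefficients, expression))


def classify_by_median(scores):
    """Low-risk if the score is below the cohort median, otherwise high-risk."""
    cutoff = median(scores)
    return ["low" if s < cutoff else "high" for s in scores]


def concordance(times, events, scores):
    """Fraction of usable pairs in which the higher score has the shorter survival.

    events[i] is truthy when the death of patient i was observed (not censored).
    Ties in the score count as half-concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                usable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: concordance is undefined")
    return concordant / usable
```

A value of 0.5 corresponds to a random ordering and 1.0 to a perfect one, matching the interpretation of the C-statistic used throughout the paper.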

Clustering analysis

For unsupervised clustering, we used the Ward clustering method and the Pearson correlation distance as implemented in the pvclust R package (23). The significance of a cluster was reported as its bootstrap value, which is the proportion of 10,000 bootstrap samples showing this particular cluster topology. The clustering was applied to the six external datasets of bladder cancer samples.
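As an illustration of the distance underlying the clustering, the following sketch computes a Pearson correlation distance between expression profiles. It assumes the distance is defined as 1 − r, which is one common convention; the Ward linkage and the bootstrap resampling are not shown, and the function names are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_distance(x, y):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones."""
    return 1.0 - pearson_correlation(x, y)


def distance_matrix(profiles):
    """Symmetric matrix of pairwise correlation distances between samples."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = correlation_distance(profiles[i], profiles[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```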

Meta-analysis using published data sets

Published signatures

We compiled 49 published gene signatures (Table S2) associated with malignancy: 39 from Lauss et al. (24) as well as other bladder (11–14) and melanoma signatures (25, 26); the melanoma signatures were included to test whether signatures from other solid tumors would perform well in bladder cancer. The 49 gene signatures were tested in our dataset and the 4 external gene expression datasets with survival information. Associations of gene expression signatures with outcome and other covariates were calculated using globaltest (27); this test can be used to estimate whether the expression of a group of genes is significantly associated with a particular response variable, for example stage or survival. Gene signatures were further tested by leave-one-out cross-validation. To avoid problems with highly correlated covariates in multivariate Cox proportional hazards models, expression values were scaled by principal component analysis (PCA). The number of components was chosen so that at least 99% of the expression variance was included in the model, with a maximum of 20 components. The choice of this cutoff was examined by varying the maximum number of components from 3 to 30 (Table S2). The performance of these cross-validated models was reported as the C-statistic of a univariate Cox model with the risk score as covariate.
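The rule for choosing the number of principal components can be sketched as below. The sketch assumes the per-component variances (eigenvalues) have already been computed by a PCA routine; the function name and its arguments are hypothetical.

```python
def choose_num_components(explained_variances, min_fraction=0.99,
                          max_components=20):
    """Smallest number of components reaching min_fraction of total variance,
    capped at max_components."""
    total = sum(explained_variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    ordered = sorted(explained_variances, reverse=True)
    cumulative = 0.0
    for k, variance in enumerate(ordered, start=1):
        cumulative += variance
        # Stop as soon as the variance target or the cap is reached.
        if cumulative / total >= min_fraction or k == max_components:
            return k
    return len(ordered)
```

Varying `max_components` from 3 to 30 would correspond to the sensitivity check mentioned in the text.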

Feature selection

The signatures were optimized by stepwise selection, an iterative procedure which serially removes and adds probe sets from a pool of candidates. The procedure was terminated when adding, removing or replacing a probe set did not further significantly improve the mean leave-one-out cross-validated area under the receiver operating characteristic curve (AUC) (for the stage and subtype signatures) or C-statistic (for the survival signature) over all datasets (see Supplemental Data for details). Thus all datasets were used for training and validation. In total, over a million promising signatures were cross-validated in all datasets.
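A generic form of such a stepwise search is sketched below. It is a greedy illustration only, not the procedure detailed in the Supplemental Data: the `score` callable stands in for the mean cross-validated AUC or C-statistic, and `min_gain` is a hypothetical stand-in for the significance criterion used to decide whether an improvement counts.

```python
def stepwise_select(candidates, score, start=(), min_gain=1e-9):
    """Greedy add / remove / replace search over subsets of candidates.

    score: callable mapping a frozenset of probe sets to a number
           (higher is better).
    Returns the selected probe sets (sorted) and their score.
    """
    current = frozenset(start)
    best_score = score(current)
    while True:
        # Neighbours reachable by adding one probe set ...
        neighbours = [current | {g} for g in candidates if g not in current]
        for g in current:
            # ... by removing one ...
            neighbours.append(current - {g})
            # ... or by replacing one with an unused candidate.
            neighbours.extend(
                (current - {g}) | {h} for h in candidates if h not in current
            )
        if not neighbours:
            break
        top = max(neighbours, key=score)
        top_score = score(top)
        if top_score <= best_score + min_gain:
            break  # no move improves the score enough: stop
        current, best_score = top, top_score
    return sorted(current), best_score
```

Because the same datasets drive both the search and the evaluation, the resulting score is an optimistic estimate, which is consistent with the call for external validation made elsewhere in the paper.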

Addition of new survival signature to existing predictive model for recurrence-free survival

We calculated the recurrence-free survival probabilities in our patients using the postoperative nomogram developed by the International Bladder Cancer Nomogram Consortium (IBCNC) (28). This nomogram, developed from data of over 9,000 patients, includes age, sex, time from diagnosis to surgery, pathologic tumor stage and grade, tumor histologic subtype, and regional lymph node status. The contribution of our newly curated gene signature for overall survival to the predictive ability of the nomogram was then evaluated. Multivariate modeling with nomogram information applied the leave-one-out cross-validated gene signature risk score as a second covariate. Thus two Cox models were used: the first model estimated the risk as described for the Kaplan-Meier analysis using the gene signature alone; the second model combined the nomogram and the signature score. The endpoint of this second model was set to recurrence-free survival. Concordance was estimated with the C-statistics approach (22) for a univariate model with the nomogram alone and then for a multivariate model with the addition of the survival gene signature. Prediction models were compared by the likelihood ratio test.
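The likelihood ratio comparison of the nomogram-only model with the nomogram-plus-signature model can be sketched as follows. The sketch assumes the two models are nested and differ by exactly one covariate (one degree of freedom), and that their maximized (partial) log-likelihoods are available from the Cox fits; the function name and the example values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one covariate.

    Returns the test statistic and its p-value under a chi-square
    distribution with one degree of freedom.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("the full model cannot fit worse than the reduced one")
    # For one degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(loglik_reduced=-100.0, loglik_full=-98.0)
```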

In Figure 1, we provide an overview of the analyses performed in this study.

Figure 1. Overview of the study.

This study represents a massive meta-analysis of our cohort of high-risk bladder cancer patients (Table 1) and 6 independent studies (Table 2). All data was subject to comprehensive differential expression and survival analyses and to hierarchical clustering. Machine learning algorithms were used to find gene signatures predictive of stage, survival and molecular subtype. The survival gene signature was then included in a validated postoperative nomogram, developed by the International Bladder Cancer Nomogram Consortium (IBCNC) (28).

RESULTS

Differentially expressed genes are associated with clinical features and outcome

Comparison by pathologic stage

When comparing non-MI with MI samples in our data set, we found 636 significantly differentially expressed probe sets, which significantly overlapped with the corresponding lists of genes in the six independent studies (P < 0.0001, Tables S3a, S4–S5, Figure S1). FN1, a member of the integrin signaling pathway, and several other members or close interaction partners of this pathway were upregulated in MI as compared to non-MI tumors in most datasets we investigated; these genes include ACTN1, COL1A2, COL3A1, COL5A2, COL6A3, COL11A1, COL16A1, FBN1, FLNA, LUM, TGFBI, and TNC. Furthermore, we found that members of the transforming growth factor-beta (TGF-β) signaling pathway were present in most lists of differentially expressed genes; for example, SMAD3, SMAD6, and BMP7 were overexpressed in non-MI as compared to MI tumors, while INHBA, NNMT, and POSTN were overexpressed in MI samples. To further characterize the association of gene expression with pathologic stage, we created a classifier based on consistently differentially expressed genes among all datasets. The optimized classifier (Table S6) identified MI tumors with a mean accuracy of 89.0% (88.4% SVM, Figure S2) over all datasets.

Comparison by recurrence status

We then compared the gene expression patterns of patients who developed metastasis (pelvic or distant recurrence) and died of bladder cancer with those of patients who did not. When considering samples of all stages, we identified no significantly differentially expressed genes. However, when restricting this comparison to patients with pathologically organ-confined MI (pT2N0) tumors (17 patients, 5 of whom developed metastasis), we identified 53 differentially expressed genes with FDR < 10% in our dataset (P < 0.06, Table S3b). In the external datasets, the number of pT2N0 tumors with later metastasis was too small for the identification of significantly differentially expressed genes in this comparison. The genes that were differentially expressed in patients who developed metastasis and died included PERP (a TP53 apoptosis factor), ATXN10 (an activator of the Ras-MAPK pathway) [24], and PCM1 (associated with papillary thyroid cancer, chronic myeloid leukemia, and myeloproliferative disorders) [25].

Identification of a robust survival signature for MI tumors

Validation of published signatures

We then evaluated 49 published gene signatures (Table S2) alongside our classifiers described above for correlation with clinical features in our dataset and with the four independent gene expression datasets with available survival information (6, 7, 9, 13). Due to the smaller number of genes on the Lindgren et al. and Blaveri et al. cDNA microarrays, only fractions of these signatures’ genes could be mapped. We observed overfitting of published survival signatures so that most signatures achieved significance, in terms of survival information, only in the datasets used to identify them (Table 3). Applying previously published signatures to our cohort, we found no significant association between survival and any signatures. Nevertheless, when we analyzed all genes from all survival and progression signatures in a univariate fashion, several markers that achieved low p-values in multiple datasets emerged; for example BST2, HLA-G, ICAM1 and LIMCH1 from the Smith et al. signature (14); DGCR2, DSC3 and SCEL from Kim et al. (13); UBE2C, VCAM1 and WNT5B from Blaveri et al. (6); APOBEC3B from Lindgren et al. (9) and ATXN10, C17orf39 and INPP4B from Sanchez-Carbayo et al. (7) (Table S2). Especially striking are MAP kinases, with different members present in various signatures (7, 9, 11, 14, 29).

Table 3. Performance of published gene signatures.

The table shows performance measures of all published bladder cancer survival gene signatures and three progression signatures. All signatures were tested in all public datasets with survival information (muscle-invasive tumors only). The p-values represent the probabilities that there is no association between expression and survival (27). The C-statistic (C) of a leave-one-out cross-validated Cox model with the gene intensities as covariates is reported as a second performance measure. A C of 0.5 corresponds to a random model, one of 1.0 to a perfect model. All signatures were originally identified with the corresponding dataset, except for the Smith et al. signature (14), which was obtained by training with the Sanchez-Carbayo dataset (7) and with unpublished data. Our signature was obtained by using all datasets for training. The endpoint tested was overall survival in all datasets, which is not necessarily the endpoint for which the signature was designed. See Figure S6 for a detailed analysis of 41 other signatures.

Signature | Endpoint | Stage | Blaveri (6) P | Blaveri (6) C | Kim (13) P | Kim (13) C | Lindgren (9) P | Lindgren (9) C | Sanchez-Carbayo (7) P | Sanchez-Carbayo (7) C | Riester P | Riester C
Blaveri (6) | OS | MI | 0.01 | 0.66 | 0.66 | 0.57 | 0.15 | 0.60 | 0.66 | 0.61 | 0.73 | 0.56
Kim (13) | OS | MI | 0.41 | 0.60 | 0.00 | 0.75 | 0.76 | 0.53 | 0.40 | 0.54 | 0.84 | 0.50
Kim (13) | DSS | MI | 0.11 | 0.57 | 0.00 | 0.72 | 0.84 | 0.63 | 0.26 | 0.60 | 0.87 | 0.56
Kim (13) | Progression | non-MI | 0.70 | 0.50 | 0.09 | 0.56 | 0.53 | 0.53 | 0.04 | 0.59 | 0.27 | 0.56
Lindgren (9) | OS | all | 0.29 | 0.55 | 0.17 | 0.55 | 0.10 | 0.72 | 0.48 | 0.50 | 0.93 | 0.54
Sanchez-Carbayo (7) | OS | MI | 0.15 | 0.56 | 0.56 | 0.54 | 0.33 | 0.50 | 0.02 | 0.60 | 0.39 | 0.50
Sanchez-Carbayo (7) | Progression | MI | 0.21 | 0.50 | 0.02 | 0.58 | 0.42 | 0.73 | 0.15 | 0.51 | 0.95 | 0.54
Smith (14) | Progression | MI | 0.42 | 0.54 | 0.17 | 0.54 | 0.15 | 0.58 | 0.15 | 0.60 | 0.79 | 0.50
Riester | OS | MI | 0.04 | 0.76 | 0.04 | 0.75 | 0.002 | 0.87 | 0.005 | 0.74 | 0.06 | 0.71

P = p-value; C = C-statistic; OS = overall survival; DSS = disease-specific survival, MI=muscle-invasive

Validation of published signatures, stratified by molecular subtype

As reported in multiple other studies (6, 7, 9), bladder tumors cluster in two very distinct molecular subtypes. We reproduced this finding in all analyzed datasets (Figure S4) and further, found significant differences in survival between the two subtypes (Figure S5). The expression differences between the subtypes were remarkable and overlapped significantly across datasets (P < 0.00004, Table S5). This allowed a robust classification of tumors by subtype with a novel 19-gene signature (Table S7, Figure S6). The very heterogeneous landscape of MI bladder tumors motivated us to evaluate the performance of all signatures in the two subtypes separately. Two progression signatures, one developed for bladder (6) and one for breast cancer (30), were significantly associated with survival in our cohort when applied to one of the two subtypes (Figure S7). By analyzing primary and recurrent tumors from the Kim et al. cohort (13), we found evidence that tumors can progress from one molecular subtype to the other (Table S8).

Curation of a novel survival signature

We then curated a new 20-gene overall survival signature for MI tumors (Table 4) by stepwise optimization and tested its predictive accuracy in our dataset and the four external datasets with available survival information. A leave-one-out cross-validated multivariate Cox proportional hazards model based on this signature classified tumors into two equally sized risk groups with significantly different survival in all datasets (Figure 2a-e). As an example, with the model applied to the dataset by Kim et al. (13), there was a 5-fold increase in the risk of death in those classified as high-risk (HR 5.05, 95% CI 2.26 – 11.3, P < 0.008) compared to those classified as low-risk. See Figures S8–S9 for details of the signature. Neoadjuvant chemotherapy, administered to three patients in our cohort, had no impact on the prediction accuracies (Table S9). We next compared the prediction accuracies of the signature with pathologic stage and grade in all cohorts and found highly significant improvements of model concordance and likelihood in all datasets (mean ΔC-statistic: 0.14, 95% CI 0.01–0.27; likelihood-ratio test: P < 0.001; Figure 2f, Table S10).

Table 4.

Gene signature predictive of overall survival for patients with MI tumors (stage T2–T4).

Gene Symbol Probeset Gene Name
APOBEC3B 206632_s_at apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3B
ATF3 202672_s_at activating transcription factor 3
CCL5 204655_at chemokine (C-C motif) ligand 5
DGCR2 214198_s_at DiGeorge syndrome critical region gene 2
ENDOD1 212573_at endonuclease domain containing 1
FADD 202535_at Fas (TNFRSF6)-associated via death domain
JUNB 203022_at ribonuclease H2, subunit A
LMO7 242722_at LIM domain 7
MAP2K1 202670_at mitogen-activated protein kinase kinase 1
MAP3K1 225927_at mitogen-activated protein kinase kinase kinase 1
PDGFC 218718_at platelet derived growth factor C
PEA15 200787_s_at phosphoprotein enriched in astrocytes 15
PFN1 214617_at perforin 1 (pore forming protein)
PPP1R12A 201604_s_at protein phosphatase 1, regulatory (inhibitor) subunit 12A
PRDX1 208680_at peroxiredoxin 1
PRMT1 206445_s_at protein arginine methyltransferase 1
SLC1A5 208916_at solute carrier family 1 (neutral amino acid transporter), member 5
TNFAIP6 206025_s_at tumor necrosis factor, alpha-induced protein 6
TSG101 201758_at tumor susceptibility gene 101
TSPAN5 209890_at tetraspanin 5
Figure 2. Prediction accuracy of our novel gene signature.

(a–e) Kaplan-Meier plots for low- and high-risk groups in five datasets. Hazard ratios (HRs) indicate how well the signature separates tumors. A HR of 3.1, for example, corresponds to a 3.1-fold increase in risk for the high-risk group. Some genes were not present on all microarray platforms. The novel signature is shown in Table 4. (a) Riester (all 20 genes); (b) Blaveri (18 genes); (c) Kim (20 genes); (d) Lindgren (10 genes); (e) Sanchez-Carbayo (20 genes). In all plots, patients were stratified in two equally sized groups by the cohort medians of the calculated risk scores. We further tested whether the gene signature improves the predictive utility of stage and grade, two established predictors available for all 5 datasets. This comparison was done by analyzing the concordance (C-statistic, (22)) of the models. A concordance of 0.5 corresponds to a random model, one of 1.0 to a perfect model. (f) C-statistics for all 5 datasets for a multivariate model with stage, grade and the gene signature risk score; for a model with the gene signature risk score alone; and for a model with only stage and grade. Error bars indicate 95% confidence intervals.

Addition of survival signature to the nomogram increases its predictive accuracy

We then incorporated the new signature developed for MI tumors (Table 4) into the International Bladder Cancer Nomogram Consortium (IBCNC) nomogram (28), which was designed to predict 5-year recurrence-free survival after RC. Necessary data points for the nomogram were not available in the external datasets. When tested in our MI patient cohort, a multivariate Cox model predicting recurrence-free survival with both nomogram and gene signature score (C-statistic: 0.66, 95% CI 0.56 – 0.76) was a significant improvement (ΔC-statistic: 0.08, 95% CI −0.04–0.20, p-value < 0.005, likelihood-ratio test) over a model based on the nomogram score alone (C-statistic: 0.58, 95% CI 0.43–0.72). As the signature was designed for overall survival due to the lack of recurrence dates in the independent data, the prediction accuracies increased profoundly in a model predicting overall survival (ΔC-statistic: 0.15, 95% CI 0.07–0.23, Table S10).

DISCUSSION

This study demonstrates the utility of adding molecular information to an existing prognostic tool in bladder cancer patients who undergo RC. Gene expression data from a new cohort of bladder cancer patients was analyzed and compared to gene expression data from multiple published studies. A robust expression signature was identified that improved the predictive outcome of a well-accepted nomogram, and was independently predictive of survival in all cohorts where survival data was available. These findings are particularly striking given the relative homogeneity of the population analyzed, since our cohort, having all undergone the most aggressive surgical therapy, is highly selected for being at high risk of death from disease.

One important contribution of this study is the clinical validation of published gene signatures. To be clinically relevant, new survival markers must (i) stratify patients in groups with significantly different survival; (ii) deliver survival information that is not already included in established clinically used predictors such as grade and stage; and (iii) increase the accuracies of existing prediction models to an extent that warrants the cost and effort of obtaining biomarker status (31). Previously published gene signatures (Table S2), however, performed poorly in our patient cohort: no gene signature provided more survival information than would be expected by chance. This finding can partly be explained by cross-platform issues like incomplete mapping of genes to available probe sets or probe sets targeting different exons. Another reason is a likely overfitting of published signatures due to rather small training datasets for the very heterogeneous group of MI bladder cancers. To circumvent these limitations, we optimized the survival signatures for the subgroup of MI tumors, and showed that gene expression clearly stratifies patients into 2 groups, denoted as low-risk and high-risk. The new signature further segregated by risk 305 patients with MI tumors in the external datasets containing gene expression and survival information. Interestingly, 4 genes in our MI survival signature (APOBEC3B, DGCR2, PRMT1, and SLC1A5) are located on chromosomes 22 and 19, in neighboring chromosome bands of SNPs recently described as associated with an increased risk of developing bladder cancer (32). In addition to our signature, several other genes (Table S2) significantly correlated with survival in multiple independent cohorts. These genes are potential therapeutic targets and worthy of further investigation to understand their role in bladder cancer. For example, inhibitors of MAP2K1 (i.e. MEK1), which is part of our overall survival signature (Table 4), are currently being evaluated in several clinical trials and our results provide a rationale for this in bladder cancer. Our screen for potential biomarkers in multiple datasets thus also has important applications in gene selection for technologies such as NanoString®, which can target only a limited number of genes.

To place our signature in the context of currently known clinical data, we applied the survival signature to a robust nomogram that predicts outcome after RC (28). This postoperative nomogram was shown to have a significantly better predictive accuracy (concordance index 0.75) than standard TNM (concordance index 0.68, P < 0.001) or pathologic (concordance index 0.62, P < 0.001) subgroupings. When this model was applied to our patient cohort with the addition of our newly developed survival signature, we observed statistically significant improvements in the prediction of risk of death as compared to the use of the nomogram only. As the signature reflects unique molecular features associated with risk, its addition to the nomogram thus provides more insights into the biologic behavior of a patient’s malignancy, and leads to improved prognostication as compared to the use of the nomogram alone. A more accurate prediction of progression may help select the higher-risk patients for additional treatment, such as adjuvant systemic therapy after surgery. External validation of these findings is ongoing.

In addition, our analysis led to the identification of 53 genes significantly associated with the development of metastasis and death of patients with pathologically organ-confined MI disease (pT2N0). This list of genes establishes a molecular signature of less favorable outcome, which may help stratify patients in this identically staged subgroup of bladder tumors to more or less intensive multimodal therapy. Patients with this stage of disease are typically followed expectantly, but evaluation of gene expression data or testing for downstream products of expression may identify those patients at higher risk for progression who may benefit from systemic therapy after radical surgery. The investigation of this list in a prospective collection of similar cystectomy samples is essential. An interesting additional opportunity for evaluation would be among patients with MI disease at the time of TUR—by characterizing a greater risk of aggressive behavior, we may more carefully select patients for chemotherapy prior to radical surgery as well, to improve our survival outcomes using multimodal therapy.

We found that fibronectin 1 (FN1) was significantly overexpressed in MI tumors as compared to non-MI samples in all datasets we analyzed. The role of FN1 in cancer invasion and metastasis is still poorly understood, but its increased expression might be the result of an enhanced recruitment of cancer-associated fibroblasts (33, 34). The consistency across datasets suggests that the abundant differential expression of FN1 and other extracellular matrix (ECM) genes is unlikely to be due to stromal contamination. Our stage classifier (Table S6) could identify patients as having MI disease with a mean accuracy of 89%, which is within the range of previously published classifiers (6, 24). However, our stage classifier was validated in more datasets and consists of fewer genes than the published classifiers, which offers advantages for its potential clinical use. Furthermore, the accuracy rate of 89% does not imply a false classification rate of 11%, as some tumors might have been incorrectly classified histologically. Urothelial cancer of the bladder is exceptionally difficult to stage accurately prior to cystectomy, given technical limitations of endoscopic resection and imaging. Significant rates of upstaging from time of TUR to RC have been reported, ranging from 31–61% (35–37). In our patient cohort, over 80% of the tumors were determined to have higher pathologic stage at the time of cystectomy, when compared to the preoperative TUR specimens, with many tumors found to be locally advanced (>pT2) when clinically staged as organ-confined (≤pT2).
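
The point that an accuracy of 89% need not correspond to a false classification rate of 11% can be made concrete with a small worked calculation. The sketch below assumes a binary MI versus non-MI call, an imperfect histological reference that is correct with probability r, and classifier errors that are independent of reference errors; under these assumptions the observed agreement is a·r + (1 − a)(1 − r), where a is the classifier's accuracy against the true stage. The reference accuracies used are hypothetical, and the function name `implied_true_accuracy` is introduced here for illustration only.

```python
def implied_true_accuracy(observed_agreement, reference_accuracy):
    """Solve observed = a*r + (1 - a)*(1 - r) for the true accuracy a.

    Assumes a two-class problem and classifier errors that are independent
    of errors in the reference labels (an illustrative simplification).
    """
    if not 0.5 < reference_accuracy <= 1.0:
        raise ValueError("reference_accuracy must lie in (0.5, 1.0]")
    r = reference_accuracy
    return (observed_agreement - (1.0 - r)) / (2.0 * r - 1.0)


# Observed agreement with histology as reported above; reference
# accuracies are assumed values, not estimates from any dataset.
for r in (1.0, 0.95, 0.90):
    a = implied_true_accuracy(0.89, r)
    print(f"reference accuracy {r:.2f} -> implied classifier accuracy {a:.3f}")
```

Under these assumptions, any imperfection in the reference labels means the classifier's accuracy against the true stage is higher than its observed agreement with histology.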

The unsupervised clustering produced two strikingly different groups of patients, who were not clearly distinguishable by many clinical variables such as the presence of lymph node metastases or recurrence status. Gene expression showed large differences between the two groups (Table S3c) with highly significant overlaps in all datasets (Table S5), allowing a robust identification of cluster membership by a 19-gene signature (Table S7). Lindgren et al. (9) similarly identified two distinct molecular subtypes (denoted as MS1 and MS2) by analyzing 144 tumor samples obtained by transurethral bladder biopsy, representing a wide range of tumor stage and grade. Their investigation, which included whole genome and specific mutational analyses, identified a greater number of genomic alterations in the MS2 group, containing nearly all of the MI samples. The genomic instability in the MS2 group was also associated with expression of genes involved in cell-cycle pathways and cellular transformation (9), all indicative of more aggressive behavior and worse prognosis. Consistent with their results, we found a progression from MS1 to MS2 to be more likely than a progression from MS2 to MS1 when analyzing 23 recurrent tissue samples from Kim et al. (13). While only stage had a significant association with subtype, other covariates not available in this study, for example response to treatments, could be tested for association in prospective data. Not surprisingly, survival signatures often showed very different prediction accuracies in the two subtypes. This heterogeneity of bladder cancer tumors thus points to the necessity of large training data to establish robust signatures. Interestingly, of all 49 previously published signatures considered, a breast cancer progression signature (30) showed the highest association with overall survival in the MS2 group of our patient cohort (Figure S7a). This finding suggests that optimizing thoroughly validated signatures designed for other solid tumors might yield more robust signatures than developing new signatures from scratch, in particular when only small training datasets are available. The identification of different molecular signatures for subtype suggests that these subtypes, which are strongly associated with invasion status, may be a building block for future work to non-invasively determine the aggressiveness of tumors; for example, these changes in expression may be detectable in urine.

Our dataset represents a high-risk population of bladder cancer patients who underwent RC at a high-volume academic center. The sample size is large for a single-center report of microarray analysis and comprises, to our knowledge, the largest number of high-risk bladder cancer patients of all published microarray datasets. One limitation of our findings is, however, a possible element of overfitting, resulting from the use of the same data in testing and validation steps, although this concern is mitigated through the use of statistical methods such as cross-validation. This points to the imperative for validation in independent data of any gene signature associated with survival. Such a validation is necessary because signatures established from single, moderately sized datasets often perform poorly in external data due to the relatively high probability that genes correlate with survival by chance. However, extending the sample size with independent data, as performed here, can minimize this problem. For example, a gene found to be predictive in one dataset could arise due to chance, but if one gene is predictive with consistent expression in multiple datasets, it is more likely to be a robust survival marker.
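
The benefit of requiring consistent results in multiple datasets can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that a gene has no true association with survival, that datasets are independent, that each gene is tested with a two-sided test at level α without multiple-testing correction, and that a replication must show the same direction of effect (probability α/2 per additional dataset under the null). The gene count and α are hypothetical round numbers, and `expected_chance_hits` is a name introduced here for illustration.

```python
def expected_chance_hits(n_genes, alpha, n_datasets):
    """Expected number of null genes that look predictive in every dataset.

    The first dataset may show an effect in either direction (probability
    alpha); each further independent dataset must be significant in the
    same direction (probability alpha / 2 under the null).
    """
    same_direction = alpha / 2.0
    return n_genes * alpha * same_direction ** (n_datasets - 1)


# Hypothetical numbers: 10,000 genes tested at alpha = 0.05.
for k in (1, 2, 3):
    hits = expected_chance_hits(10_000, 0.05, k)
    print(f"{k} dataset(s): about {hits:g} genes expected by chance")
```

Under these simplifying assumptions, the expected number of chance findings falls by a factor of α/2 with each additional independent dataset, which is why consistent expression across cohorts is much stronger evidence than significance in a single cohort.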

A considerable strength of our investigation is its comprehensive effort to combine the majority of published data on gene expression in bladder cancer, all arrayed on different platforms, in a meaningful way. With this study, we have shown that meta-analyses across platforms are feasible. Especially in rare cancers or subtypes, the increase in sample size might often outweigh the disadvantages of meta-analyses, such as the heterogeneity of patient cohorts, the different treatments before and after surgery, and the decrease in potential markers when focusing on genes present on all platforms. Nonetheless, large independent patient cohorts are needed for further validation and optimization (e.g., for other endpoints such as recurrence-free survival or for particular microarray platforms) of the prognostic signatures before a clinical trial is warranted (38). The results of this study strongly motivate such sample collection for urothelial tumors.

CONCLUSIONS

Our gene expression analysis identifies novel signatures predictive of tumor stage, progression among patients with organ-confined disease, molecular subtypes of tumors, and overall survival. We further incorporated six published gene expression datasets for a large-scale comparison and validation of predictive signatures, and found a significant increase in predictive accuracy of a postoperative nomogram by addition of our gene signature. Multiple genes identified across several sets have emerged as highly promising candidates for further investigation. The identification of patients at higher risk for progression or death will provide additional rationale for multimodality management, using chemotherapy and surgery, in an effort to improve survival.

Supplementary Material

1
2
3
4
5
6

Translational Relevance.

Urothelial carcinoma of the urinary bladder is the fifth most common cancer in the Western world, representing 3% of the global cancer incidence. Bladder cancers exhibit a heterogeneous clinical behavior highlighted by frequent recurrences in patients with non-invasive tumors and the potential for the development of metastatic disease in those with invasive lesions. Due to this variation, better tools to predict prognosis and refine treatment are needed. Here we evaluated the utility and reproducibility of expression array differences in bladder tumors as potential prognostic indicators. Our analysis identified novel signatures associated with tumor stage, progression among patients with organ-confined disease, and molecular subtypes of tumors with relevance to disease-specific survival. Incorporation of our gene expression signatures furthermore significantly improved the prediction using a postoperative nomogram. Identifying patients at higher risk for progression or death provides additional rationale for selection of multimodality management, using chemotherapy and surgery, in an effort to improve survival.

Acknowledgments

The authors acknowledge support from the NCI initiative to found Physical Science Oncology Centers (U54CA143798) (MR, FM) and an NRSA training grant T32-CA82088 from the National Institutes of Health (JMT). The authors would like to thank Parantu Shah, Dhruv Sharma, Levi Waldron, Marta Sanchez-Carbayo, Mithat Gönen, Samir Amin, Yi Li, and the Michor lab.

References

  • 1. Jemal A, SR, Ward E, Hao Y, Xu J, Thun MJ. Cancer statistics, 2009. CA Cancer J Clin. 2009;59:225–49. doi: 10.3322/caac.20006.
  • 2. Kaufman DS, Shipley WU, Feldman AS. Bladder cancer. Lancet. 2009;374:239–49. doi: 10.1016/S0140-6736(09)60491-8.
  • 3. Donat SM. Evaluation and follow-up strategies for superficial bladder cancer. Urol Clin North Am. 2003;30:765–76. doi: 10.1016/s0094-0143(03)00060-0.
  • 4. Herr HW, Dotan Z, Donat SM, Bajorin DF. Defining optimal therapy for muscle invasive bladder cancer. J Urol. 2007;177:437–43. doi: 10.1016/j.juro.2006.09.027.
  • 5. McDermott U, Downing JR, Stratton MR. Genomics and the continuum of cancer care. N Engl J Med. 2011;364:340–50. doi: 10.1056/NEJMra0907178.
  • 6. Blaveri ESJ, Korkola JE, Brewer JL, Baehner F, Mehta K, Devries S, Koppie T, Pejavar S, Carroll P, Waldman FM. Bladder cancer outcome and subtype classification by gene expression. Clin Cancer Res. 2005;11:4044–55. doi: 10.1158/1078-0432.CCR-04-2409.
  • 7. Sanchez-Carbayo MSN, Lozano J, Saint F, Cordon-Cardo C. Defining molecular profiles of poor outcome in patients with invasive bladder cancer using oligonucleotide microarrays. J Clin Oncol. 2006;24:778–89. doi: 10.1200/JCO.2005.03.2375.
  • 8. Elsamman E, Fukumori T, Ewis AA, Ali N, Kajimoto K, Shinohara Y, et al. Differences in gene expression between noninvasive and invasive transitional cell carcinoma of the human bladder using complementary deoxyribonucleic acid microarray: preliminary results. Urol Oncol. 2006;24:109–15. doi: 10.1016/j.urolonc.2005.07.011.
  • 9. Lindgren DFA, Gudjonsson S, Sjödahl G, Hallden C, Chebil G, Veerla S, Ryden T, Månsson W, Liedberg F, Höglund M. Combined gene expression and genomic profiling define two intrinsic molecular subtypes of urothelial carcinoma and gene signatures for molecular grading and outcome. Cancer Res. 2010;70:3463–72. doi: 10.1158/0008-5472.CAN-09-4213.
  • 10. Dyrskjøt LTT, Kruhøffer M, Jensen JL, Marcussen N, Hamilton-Dutoit S, Wolf H, Orntoft TF. Identifying distinct classes of bladder carcinoma using microarrays. Nat Genet. 2003;33:90–6. doi: 10.1038/ng1061.
  • 11. Birkhahn MMA, Williams AJ, Lam G, Ye W, Datar RH, Balic M, Groshen S, Steven KE, Cote RJ. Predicting recurrence and progression of noninvasive papillary bladder cancer at initial presentation based on quantitative gene expression profiles. Eur Urol. 2010;57:12–20. doi: 10.1016/j.eururo.2009.09.013.
  • 12. Catto JWAM, Wild PJ, Linkens DA, Pilarsky C, Rehman I, Rosario DJ, Denzinger S, Burger M, Stoehr R, Knuechel R, Hartmann A, Hamdy FC. The application of artificial intelligence to microarray data: identification of a novel gene signature to identify bladder cancer progression. Eur Urol. 2010;57:398–406. doi: 10.1016/j.eururo.2009.10.029.
  • 13. Kim WJ, Kim EJ, Kim SK, Kim YJ, Ha YS, Jeong P, et al. Predictive value of progression-related gene classifier in primary non-muscle invasive bladder cancer. Mol Cancer. 2010;9:3. doi: 10.1186/1476-4598-9-3.
  • 14. Smith SC, Baras AS, Dancik G, Ru Y, Ding KF, Moskaluk CA, et al. A 20-gene model for molecular nodal staging of bladder cancer: development and prospective assessment. Lancet Oncol. 2011;12:137–43. doi: 10.1016/S1470-2045(10)70296-5.
  • 15. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002;30:207–10. doi: 10.1093/nar/30.1.207.
  • 16. Stransky NVC, Reyal F, Bernard-Pierrot I, de Medina SG, Segraves R, de Rycke Y, Elvin P, Cassidy A, Spraggon C, Graham A, Southgate J, Asselain B, Allory Y, Abbou CC, Albertson DG, Thiery JP, Chopin DK, Pinkel D, Radvanyi F. Regional copy number-independent deregulation of transcription in cancer. Nat Genet. 2006;38:1386–96. doi: 10.1038/ng1923.
  • 17. Wu ZIR, Gentleman R, Martinez-Murillo F, Spencer F. A model-based background adjustment for oligonucleotide expression arrays. Journal of the American Statistical Association. 2004;99:909–17.
  • 18. Smyth G. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:Article 3. doi: 10.2202/1544-6115.1027.
  • 19. Benjamini YHY. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. 1995;B:289–300.
  • 20. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics. 2005;21:3940–1. doi: 10.1093/bioinformatics/bti623.
  • 21. van Houwelingen HC, Bruinsma T, Hart AA, Van’t Veer LJ, Wessels LF. Cross-validated Cox regression on microarray gene expression data. Stat Med. 2006;25:3201–16. doi: 10.1002/sim.2353.
  • 22. Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30:1105–17. doi: 10.1002/sim.4154.
  • 23. Suzuki RSH. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics. 2006;22:1540–2. doi: 10.1093/bioinformatics/btl117.
  • 24. Lauss MRM, Höglund M. Prediction of Stage, Grade, and Survival in Bladder Cancer Using Genome Wide Expression Data: A Validation Study. Clin Cancer Res. 2010;16:4421–33. doi: 10.1158/1078-0432.CCR-10-0606.
  • 25. Mandruzzato SCA, Turcatel G, Francescato S, Montesco MC, Chiarion-Sileni V, Mocellin S, Rossi CR, Bicciato S, Wang E, Marincola FM, Zanovello P. A gene expression signature associated with survival in metastatic melanoma. J Transl Med. 2006;4:50. doi: 10.1186/1479-5876-4-50.
  • 26. Soikkeli JPP, Yin M, Nummela P, Jahkola T, Virolainen S, Krogerus L, Heikkilä P, von Smitten K, Saksela O, Hölttä E. Metastatic outgrowth encompasses COL-I, FN1, and POSTN up-regulation and assembly to fibrillar networks regulating cell adhesion, migration, and growth. Am J Pathol. 2010;177:387–403. doi: 10.2353/ajpath.2010.090748.
  • 27. Goeman JJvdGS, de Kort F, van Houwelingen HC. A global test for groups of genes: testing association with a clinical outcome. Bioinformatics. 2004;20:93–9. doi: 10.1093/bioinformatics/btg382.
  • 28. Bochner BH, Kattan MW, Vora KC. Postoperative nomogram predicting risk of recurrence after radical cystectomy for bladder cancer. J Clin Oncol. 2006;24:3967–72. doi: 10.1200/JCO.2005.05.3884.
  • 29. Mitra AP, Pagliarulo V, Yang D, Waldman FM, Datar RH, Skinner DG, et al. Generation of a concise gene panel for outcome prediction in urinary bladder cancer. J Clin Oncol. 2009;27:3929–37. doi: 10.1200/JCO.2008.18.5744.
  • 30. van ‘t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530–6. doi: 10.1038/415530a.
  • 31. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21:128–38. doi: 10.1097/EDE.0b013e3181c30fb2.
  • 32. Rothman N, Garcia-Closas M, Chatterjee N, Malats N, Wu X, Figueroa JD, et al. A multistage genome-wide association study of bladder cancer identifies multiple susceptibility loci. Nat Genet. 2010;42:978–84. doi: 10.1038/ng.687.
  • 33. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–74. doi: 10.1016/j.cell.2011.02.013.
  • 34. Kalluri R, Zeisberg M. Fibroblasts in cancer. Nat Rev Cancer. 2006;6:392–401. doi: 10.1038/nrc1877.
  • 35. Ficarra V, Dalpiaz O, Alrabi N, Novara G, Galfano A, Artibani W. Correlation between clinical and pathological staging in a series of radical cystectomies for bladder carcinoma. BJU Int. 2005;95:786–90. doi: 10.1111/j.1464-410X.2005.05401.x.
  • 36. Shariat SF, Palapattu GS, Karakiewicz PI, Rogers CG, Vazina A, Bastian PJ, et al. Discrepancy between clinical and pathologic stage: impact on prognosis after radical cystectomy. Eur Urol. 2007;51:137–49. doi: 10.1016/j.eururo.2006.05.021. discussion 49–51.
  • 37. Millikan R, Dinney C, Swanson D, Sweeney P, Ro JY, Smith TL, et al. Integrated therapy for locally advanced bladder cancer: final report of a randomized trial of cystectomy plus adjuvant M-VAC versus cystectomy with both preoperative and postoperative M-VAC. J Clin Oncol. 2001;19:4005–13. doi: 10.1200/JCO.2001.19.20.4005.
  • 38. Koscielny S. Why most gene expression signatures of tumors have not been useful in the clinic. Sci Transl Med. 2010;2:14ps2. doi: 10.1126/scitranslmed.3000313.
