BMC Cancer. 2016 Jun 3;16:351. doi: 10.1186/s12885-016-2361-7

Prediction of non-muscle invasive bladder cancer outcomes assessed by innovative multimarker prognostic models

E López de Maturana 1, A Picornell 1, A Masson-Lecomte 1, M Kogevinas 2,10, M Márquez 1, A Carrato 3, A Tardón 4,10, J Lloreta 5, M García-Closas 6, D Silverman 7, N Rothman 7, S Chanock 7, F X Real 8, M E Goddard 9, N Malats 1,; On behalf of the SBC/EPICURO Study Investigators
PMCID: PMC4893282  PMID: 27259534

Abstract

Background

We adapted Bayesian statistical learning strategies to the prognosis field to investigate whether genome-wide common SNPs improve the predictive ability of clinico-pathological prognosticators, and applied them to non-muscle invasive bladder cancer (NMIBC) patients.

Methods

Adapted Bayesian sequential threshold models in combination with LASSO were applied to account for the time-to-event and censored nature of the data. We studied 822 NMIBC patients followed up for >10 years. The study outcomes were time-to-first-recurrence and time-to-progression. The predictive ability of the models including up to 171,304 SNPs and/or 6 clinico-pathological prognosticators was evaluated using AUC-ROC and the determination coefficient.

Results

Clinico-pathological prognosticators explained a larger proportion of the time-to-first-recurrence (3.1 %) and time-to-progression (5.4 %) phenotypic variances than SNPs (1 and 0.1 %, respectively). Adding SNPs to the clinico-pathological-parameters model slightly improved the prediction of time-to-first-recurrence (up to 4 %). Combining clinico-pathological prognosticators and SNPs did not improve the prediction of time-to-progression. Heritability (ĥ2) of both outcomes was <1 % in NMIBC.

Conclusions

We adapted a Bayesian statistical learning method to deal with a large number of parameters in prognostic studies. Common SNPs showed a limited role in predicting NMIBC outcomes yielding a very low heritability for both outcomes. We report for the first time a heritability estimate for a disease outcome. Our method can be extended to other disease models.

Electronic supplementary material

The online version of this article (doi:10.1186/s12885-016-2361-7) contains supplementary material, which is available to authorized users.

Keywords: Multimarker models, Bayesian statistical learning method, Bayesian regression, Bayesian LASSO, AUC-ROC, Determination coefficient, heritability, Bladder cancer outcome, Prognosis, Recurrence, Progression, Genome-wide common SNP, Illumina Infinium HumanHap 1 M array, Predictive ability

Background

Urothelial bladder cancer (UBC) is among the most common malignant tumors of the urological system and one of the most prevalent cancers due to its chronic nature [1]. As a consequence, it poses an enormous burden on health care systems [2].

UBC also represents a paradigm of heterogeneous diseases with respect to its phenotype and prognosis. Approximately 75 % of newly diagnosed UBCs do not invade the muscle (non-muscle invasive bladder cancer, NMIBC) at the time of diagnosis. Most of these cancers remain stable over time after a transurethral resection (TUR); a high proportion relapse without invading the muscle (recurrence), while a lower proportion progress to muscle invasive bladder cancer (MIBC). Based on tumor characteristics, mainly stage and grade, NMIBC are subsequently classified as “low risk” (LR) or “high risk” (HiR) of progression [3].

Current prognostic tools for NMIBC are based on well-known clinico-pathological prognosticators such as pathological grade and stage, number and size of tumours, and presence of carcinoma in situ [3, 4]. However, these factors do not have enough discriminative ability to predict, at the patient level, the risk of recurrence and progression [5]. An accurate estimation of the outcome risk in the individual patient would help identify the most appropriate therapy to avoid tumor progression and, hopefully, reduce the number of follow-up cystoscopies in patients at low risk [6].

There is growing evidence for a role of germline genetic polymorphisms in cancer risk and prognosis, UBC being a paradigm [7, 8]. However, the individual effect of the genetic variants is expected to be small and they may not be medically actionable. Multimarker analyses have been shown to capture a much higher percentage of the genetic variance than individual markers that passed the significance threshold in GWAS [9–11].

Our objective was to investigate whether genome-wide common SNP profiles are able to predict the risk of recurrence and progression in NMIBC patients and to estimate how much they contribute to these predictions when combined with clinico-pathological prognosticators. To this end, we adapted Bayesian statistical learning strategies to be applied to the human prognosis field for the first time.

Methods

Study population

This study was performed in patients with primary UBC included in the Spanish Bladder cancer (SBC)/EPICURO Study. Cases were recruited in 18 hospitals and followed up >10 years after diagnosis. A total of 1,105 patients had their diagnosis confirmed through a pathological review conducted by a panel of experts. Trained monitors collected detailed data on clinico-pathological prognosticators from clinical charts and followed the patients up prospectively through the participating hospitals and direct telephone interviews.

In this study, we focused on patients with a primary diagnosis of NMIBC (N = 995). Two endpoints were of interest: 1) time-to-first-recurrence (TFR), defined as the reappearance of a NMIBC tumor following a previous negative follow-up cystoscopy, and 2) time-to-progression (TP), defined as the development of a muscle invasive tumor or a metastatic disease, or death because of UBC, after a previous diagnosis of NMIBC. Patients who did not present any event until the end of the study, those lost to follow-up and those who died from other causes were considered as censored either at the last medical visit or at death.

Patients who underwent a cystectomy were not considered in the analyses of TFR. A final number of 810 and 822 cases with NMIBC were available for the analyses of TFR and TP, respectively: 284 were HiR tumors (Ta high grade, T1 high grade, carcinoma in situ (CIS) and T1 low grade tumors) and 538 LR tumors (those presenting papillary UBC of low malignant potential or Ta low-grade papillary UBC according to the 2004 WHO classification).

Genotyping and quality control

Genotyping was performed as described in 12 and provided calls for 1,072,820 SNP genotypes. We excluded SNPs in sex chromosomes, those with a low genotyping rate (<95 %) and MAF < 0.02 in NMIBC.

Stringent LD pruning (r2 < 0.2) was applied to reduce the number of markers, prioritizing those with less missing data. In addition, SNPs found significant in a previous prognostic study were considered here [11]. The final numbers of assessed SNPs for TFR and TP were 171,295 and 171,304, respectively, providing a good coverage of the genome. Missing genotypes were imputed using the package randomForest in R [12].
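The per-SNP filters described above (genotyping rate and MAF) can be illustrated with a minimal sketch. It assumes genotypes coded as 0, 1 or 2 copies of one allele with `None` for a missing call; the function names are hypothetical, and the sex-chromosome exclusion, LD pruning and imputation steps are not shown. This is not the authors' pipeline, only an illustration of the two thresholds stated in the text.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of the counted allele; None = missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of patients with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among the called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_qc(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.95,  # genotyping-rate threshold from the text
    min_maf: float = 0.02,        # MAF threshold from the text
) -> bool:
    """Keep a SNP only if it meets both thresholds."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )


# Toy SNPs for 100 hypothetical patients.
snp_common = [0] * 90 + [1] * 8 + [2] * 2   # MAF = 12 / 200 = 0.06
snp_rare = [0] * 99 + [1]                   # MAF = 1 / 200 = 0.005
snp_missing = [None] * 10 + [1] * 90        # call rate = 0.90

kept = [passes_qc(s) for s in (snp_common, snp_rare, snp_missing)]
```

In this toy example only the first SNP is retained; the second fails the MAF filter and the third fails the genotyping-rate filter.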

Statistical model

We used a sequential threshold model [13] to analyze time-to-event data. This approach was previously applied in quantitative genetics [13–15], although to date it has not been applied in a human genomic study. This model assumes that for an observation of a patient to be present in a given time period, he/she must have survived through all previous time periods. Thus, the probability of not presenting the event of interest until interval k, conditional on the event that the k-th interval has been reached, is given by:

\[
\Pr\left(y_i = k \mid y_i \ge k-1, \gamma, \beta\right) = \Phi\left(\frac{\gamma_l - X'\beta}{\sigma_e}\right),
\]

where γ corresponds to the unordered cutoff points for each time interval, and X is the incidence matrix of the effects (β) affecting the liability to survive to the next interval given that the present interval has been reached. The residual variance (σ_e^2) was fixed to 1 to ensure identifiability of the parameters [16].
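As a numerical illustration of the probit expression above, the sketch below evaluates Φ((γ − x'β)/σe) for a few cutoffs. The cutoff values, the linear predictor and the function name are hypothetical and chosen only to show how the quantity behaves; σe defaults to 1, as fixed in the text.

```python
from statistics import NormalDist


def conditional_probability(
    cutoff: float, linear_predictor: float, sigma_e: float = 1.0
) -> float:
    """Standard normal CDF evaluated at (cutoff - linear_predictor) / sigma_e."""
    return NormalDist().cdf((cutoff - linear_predictor) / sigma_e)


# Hypothetical interval-specific cutoffs (gamma) and one patient's x'beta.
cutoffs = [-1.0, 0.0, 1.0]
x_beta = 0.5

probs = [conditional_probability(g, x_beta) for g in cutoffs]
```

For a fixed cutoff, a larger linear predictor lowers the probability, and for a fixed linear predictor a larger cutoff raises it, which is all the example is meant to show.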

Patients were classified as censored or uncensored in each time interval considered for each event as displayed in Fig. 1. We divided the follow-up time for TFR and TP in 9 and 4 intervals, respectively, according to the survival functions for each event (see Figs. 2a and 3a). The analysis of TP was further stratified according to the tumor risk group (LR and HiR, see Fig. 3b). For these subgroup analyses the number of intervals was lower.
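The interval-based classification described above can be sketched as a "one record per interval reached" expansion. The coding used here is an assumption and may differ from the scheme in Fig. 1: each interval passed without the event contributes a record with outcome 0, an event contributes a final record with outcome 1, and the interval in which a patient is censored contributes no record. The cut points and the function name are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def expand_patient(
    time: float, had_event: bool, cut_points: Sequence[float]
) -> List[Tuple[int, int]]:
    """Return (interval_index, outcome) records for one patient.

    outcome 0: the interval was passed without the event;
    outcome 1: the event occurred in that interval.
    """
    last = bisect_right(cut_points, time)    # index of the interval containing `time`
    records = [(j, 0) for j in range(last)]  # intervals fully passed event-free
    if had_event:
        records.append((last, 1))
    # Assumption: the interval in which censoring occurs adds no record.
    return records


# Hypothetical cut points (years) giving 4 intervals: [0,1), [1,2), [2,4), [4, inf).
CUTS = [1.0, 2.0, 4.0]

progressed = expand_patient(2.5, True, CUTS)    # event in the third interval
censored = expand_patient(2.5, False, CUTS)     # censored in the third interval
early_censored = expand_patient(0.3, False, CUTS)
```

Under this coding a patient censored within the first interval contributes no records at all, which is one reason the number of informative observations shrinks in later intervals.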

Fig. 1. Data censoring in each defined interval according to the presence/absence of event when a sequential threshold model is applied

Fig. 2. Survival function (solid line) and 95 % CI (dotted lines) of the time to first recurrence (TFR) for the whole series (a) and according to the group of risk (b: HiR in red and LR in blue). Vertical lines separate the 9 time intervals considered for this outcome

Fig. 3. Survival function (solid line) and 95 % CI (dotted lines) of the time to progression (TP) for the whole series (a) and according to the group of risk (b: HiR in red and LR in blue). Vertical lines separate the 4 time intervals considered for this outcome

Three models were used in the analyses of each outcome: (1) a model including the clinico-pathological prognosticators only, (2) a model including the SNP data only, and (3) a model including both clinico-pathological prognosticators and SNP data. For the first model, a Bayesian regression was used (see Additional file 1: Table S1). Further information on the model building is given in Additional file 2: Supplementary methods. For the second model, a Bayesian LASSO [17] was applied to analyze the predictive ability of common SNPs (see Additional file 2: Supplementary methods for further details). Finally, for the full model, a Bayesian regression coupled with LASSO [18, 19] was used. Priors and fully conditional distributions for both SNPs and clinico-pathological prognosticators are described in Additional file 2: Supplementary methods.

Evaluation of the predictive ability

The predictive ability of each model in the whole cohort was evaluated through a 10 fold cross-validation (CV) [20]. When patients were stratified as HiR/LR for the TP analyses, a 2-fold CV procedure was performed instead, due to the low number of events. We measured the predictive ability of each model using two statistics: 1) the area under the ROC curve (AUC), generated with the ROCR package for R (www.r-project.org), and 2) the determination coefficient on the liability scale (R2probit), which is the proportion of the total variance explained by the predictors in the testing set on the probit liability scale [21]:

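\[
R^2_{probit} = \frac{\operatorname{var}\left(X_{test}\hat{\beta}\right)}{\operatorname{var}\left(X_{test}\hat{\beta}\right) + \sigma_e^2}
\]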
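The two statistics can be sketched in a few lines. This is a pure-Python stand-in, not the ROCR-based code used in the study: the AUC is computed as the fraction of (event, non-event) pairs ranked correctly, with ties counted as one half, and R2probit uses the population variance of the test-set linear predictors with σe² = 1. The function names, the choice of population rather than sample variance, and the toy inputs are assumptions.

```python
from statistics import pvariance
from typing import Sequence


def r2_probit(linear_predictors: Sequence[float], sigma_e2: float = 1.0) -> float:
    """var(X_test beta_hat) / (var(X_test beta_hat) + sigma_e^2)."""
    explained = pvariance(linear_predictors)
    return explained / (explained + sigma_e2)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random event case scores higher than a random non-event case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy test set: higher score = higher predicted risk (hypothetical values).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]

toy_auc = auc(toy_scores, toy_labels)   # 3 of the 4 pairs are ranked correctly
toy_r2 = r2_probit([0.0, 2.0])          # variance 1, so 1 / (1 + 1)
```

Because R2probit depends only on the spread of the predicted liabilities while the AUC depends only on their ranking against the observed events, the two statistics need not move together.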

Results

Additional file 1: Table S2 provides the number of censored patients and events in each time interval according to the outcome of interest (TFR and TP).

Time to first recurrence

Thirty-three percent of the patients with a primary NMIBC suffered a recurrence of the primary tumor (first recurrence). Fifty percent of these patients presented the first recurrence during the first year and in most cases (94 %) the first recurrence was diagnosed during the first 4 years of follow-up. Fifty-two percent of the NMIBC patients were censored at the end of the follow-up.

Table 1 and Additional file 1: Table S3 show the averaged AUC and R2probit obtained after the 10 fold CV analyses with the three models. The model including clinico-pathological prognosticators had an averaged AUC of 0.62. The model including only SNPs classified slightly better than random (AUC = 0.55). The joint model did not perform better (AUC = 0.61).

Table 1. Averaged area under the ROC curve (AUC) and coefficient of determination (R2probit), as well as standard deviations (in parentheses), obtained from the testing sets in the 10 fold cross-validation analyses of time to first recurrence (TFR) and time to progression in the whole (TP), high risk (TPHiR) and low risk (TPLR) cohorts

| Model | Criterion | TFR (whole series) | TP (whole series) | TPHiR (HiR tumors) | TPLR (LR tumors) |
|---|---|---|---|---|---|
| CPP | AUC | 0.62 (0.05) | 0.76 (0.09) | 0.57 (0.04) | 0.45 (0.02) |
| CPP | R2probit | 0.031 (0.004) | 0.054 (0.013) | 0.151 (0.013) | 0.0358 (0.0094) |
| SNPs | AUC | 0.55 (0.02) | 0.58 (0.09) | 0.56 (0.01) | 0.55 (0.01) |
| SNPs | R2probit | 0.010 (0.001) | 0.001 (0.000) | 0.009 (0.002) | 0.0005 (0.0002) |
| CPP&SNPs | AUC | 0.61 (0.05) | 0.76 (0.10) | 0.57 (0.03) | 0.47 (0.02) |
| CPP&SNPs | R2probit | 0.041 (0.006) | 0.050 (0.013) | 0.155 (0.019) | 0.0267 (0.0099) |

CPP: clinico-pathological prognosticators

When the predictive ability was evaluated using R2probit, the model combining clinico-pathological prognosticators and SNPs performed best, capturing 4 % of the phenotypic variance on the liability scale. The predictive abilities of the clinico-pathological prognosticators and the SNP models were 3 and 1 %, respectively, the latter being the first heritability estimate (ĥ2) for TFR in NMIBC reported so far.

Time to first progression

Whole cohort

Nine percent of the patients with a primary NMIBC suffered a tumor progression during the follow-up. Fifty percent of these patients were diagnosed with progression during the first two years and most of them (89 %) during the first 5 years (see Additional file 1: Table S2). Seventy-five percent of the patients did not show any progression at the end of the follow-up period (>10 years). Table 1 and Additional file 1: Table S4 show the AUC and R2probit after the 10 fold CV analyses for TP. The model including clinico-pathological prognosticators had an averaged AUC of 0.76, a much higher value than the model with SNPs only (AUC = 0.58). Adding SNPs to the clinico-pathological prognosticators did not increase their classification performance (AUC = 0.76). Clinico-pathological prognosticators explained 5.4 % of the phenotypic variance on the liability scale. Surprisingly, SNPs explained only 0.1 % of the variance. Adding SNPs to the clinico-pathological prognosticators worsened the R2probit of the model (Table 1).

Patients at HiR

The majority (~70 %) of the patients who progressed did so during the first two years of follow-up, and 75 % of the patients in this group finished the follow-up without any progression (Additional file 1: Table S2). Table 1 and Additional file 1: Table S5 show the AUC and R2probit of the three models evaluated. The model including only clinico-pathological prognosticators classified the patients according to TP similarly to the model including only SNPs (AUC = 0.57 vs. 0.56, respectively). The single-source model with the best R2probit for progression at HiR was the one considering clinico-pathological prognosticators (R2probit = 0.151). Common SNPs alone explained <1 % of the phenotypic variance of the cohort at HiR. Adding them to the clinico-pathological prognosticators increased the predictive ability by 2.6 % in relative terms (R2probit = 0.155).

Patients at LR

Only 24 patients showed a progression during the follow-up (<5 %). Two thirds of those patients were diagnosed during the first 2 years of follow-up. Table 1 and Additional file 1: Table S6 present the AUC and R2probit of the three models corresponding to the 2-fold CV procedure. The model including clinico-pathological prognosticators poorly categorized LR-NMIBC patients according to their progression status (AUC = 0.45). By including age at diagnosis we obtained a better classification (AUC = 0.68). The SNP model classified the patients slightly better than random (AUC = 0.55). The best R2probit was found for the model including only clinico-pathological prognosticators (0.0358). Adding SNPs to the latter model worsened its R2probit (0.0267).

Discussion

Here we present a high-dimensional model that considers the time-to-event nature of the information and censored data, and that can accommodate a large number of variables in a relatively small number of individuals. To our knowledge, this is the first time that such a model has been applied in the clinical and genetic epidemiology fields. More specifically, we have applied it to study the predictive ability of prognostic models for NMIBC patients.

The major goal in managing NMIBC patients is to prevent tumor relapse, which includes both the high number of recurrences and the progression to MIBC. To this end, treatment needs to be tailored according to the aggressiveness of the disease. Therefore, accurate prognostic models are crucial. Currently, there are no validated prognostic molecular biomarkers to guide the clinical management of patients [22, 23], and therapeutic decisions are still based on risk tables that include only clinico-pathological prognosticators [3]. Here we have investigated the potential clinical utility of inherited genetic markers (SNP profiles), given their robustness, precise measurement and time-independent nature in comparison to serological and histological markers. To this end, we have assessed the ability of genome-wide common SNP profiles to improve TFR and TP risk stratification in NMIBC patients. We have also evaluated the performance of well-known clinico-pathological prognosticators and how much the whole-genome approach improved their ability to classify patients.

Regarding the classification performance of clinico-pathological prognosticators alone, our sequential threshold models for both TFR and TP yielded estimates similar to those we obtained previously with a Cox proportional hazards regression analysis [11]. Discrimination of patients according to the risk of TFR using clinico-pathological prognosticators was poorer than previously reported by Hernandez et al [24] (0.62 vs. 0.75), although better than that reported by Vedder et al [25] in a large cohort including ours. Nevertheless, it is worth noting that the definition of the outcome differs (recurrence vs. first recurrence) between our study and these studies [24, 25]. Regarding the TP outcome, our clinico-pathological prognosticators model classified the patients better than in Hernandez et al [24] (0.76 vs. 0.54) and than in a Danish cohort using both EORTC (0.76 vs. 0.72) and CUETO (0.76 vs. 0.74) scores [25]. However, it performed worse than in a Dutch cohort using the same classifiers: EORTC (0.76 vs. 0.81 and 0.77) and CUETO scores (0.76 vs. 0.82 and 0.81) [25].

The predictive ability of clinico-pathological prognosticators depends on the outcome. They clearly perform better in predicting TP than TFR, both in terms of classification (AUC, 0.76 vs. 0.62) and proportion of the explained variance (R2probit, 5.4 % vs. 3.1 %). Their lower performance when predicting TFR could be due to the influence of non-biological factors on first tumour recurrence, such as the potential incomplete resection of the tumor during the TURB and tumour cell reimplantation [23]; these factors are difficult to assess and are therefore not accounted for in the model. When the patients were stratified according to their risk status, clinico-pathological prognosticators explained a larger proportion of the phenotypic variance (~15 %) in the HiR group than in the LR NMIBC group, probably because these factors were specifically selected to identify patients with HiR tumors with a high potential of progression. However, the overall classification performance for HiR NMIBC patients was poorer (AUC = 0.57) than in the whole cohort. While the discriminatory ability of clinico-pathological parameters for both NMIBC outcomes is valuable, there is room for improvement. More accurate discriminatory models would better select patients for aggressive treatment and would avoid unnecessary treatments, leading to better patient management. This justifies the search for further prognostic factors, among them tumour molecular alterations and inherited variation markers [3, 26, 27].

Our results showed that common genome-wide SNPs similarly, though poorly, classified patients regarding both TFR and TP in the whole series and in the HiR and LR subcohorts, with AUCs ranging from 0.55 to 0.58. Adding SNPs to the models did not improve the classification performance of clinico-pathological prognosticators, although improvements of R2probit were achieved for TFR (3–4 %) and for TP in the HiR cohort (15.1–15.5 %). Surprisingly, adding SNPs to clinico-pathological prognosticators reduced the percentage of phenotypic variance (R2probit) explained by the model with clinico-pathological prognosticators only by 7 and 25 % when predicting TP in the whole and the LR-NMIBC cohorts, respectively. The small improvement or even deterioration in terms of R2probit could be explained by a correlation between the prediction of clinico-pathological prognosticators and that of SNPs. To confirm this, we calculated the R2probit of a model with Xβ̂ obtained from clinico-pathological prognosticators only as the dependent variable and the SNPs as independent variables (see Table 2 and Additional file 1: Table S6). The proportion of the clinico-pathological prognosticators prediction variances of TFR and TP explained by SNPs was larger than that of the TFR and TP phenotypic variances. The calculation of R2probit allowed us to report the first ĥ2 for TFR and TP in the whole series and in the HiR and LR subcohorts. The largest ĥ2 corresponded to TFR (1 %) and to TP of patients at HiR (1 %), although they may be underestimated because of the sample size and the limitation on the number of SNPs included in the model [28]. All the above explains the small or nil contribution of the SNPs to the predictive ability of clinico-pathological prognosticators for the phenotypes of interest.

The poor predictive ability of common SNPs in NMIBC prognosis is in line with a previous study reporting low GWAS risk predictive values for UBC [19], as well as with those obtained in studies predicting risk for other neoplasms, such as breast cancer [29, 30]. The different results obtained with AUC and R2probit can be explained by the different scales on which the predictions are expressed (observable for AUC and liability for R2probit), their non-monotonic relationship, and the lower number of events, especially when the individuals were stratified.
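The relative changes quoted in this discussion can be checked directly against the R2probit values in Table 1. The short sketch below does that arithmetic; the helper name is hypothetical and the inputs are the table's averaged values for the model with clinico-pathological prognosticators only and for the joint model.

```python
def relative_change(before: float, after: float) -> float:
    """Change from `before` to `after`, expressed as a fraction of `before`."""
    return (after - before) / before


# Averaged R2probit from Table 1: (prognosticators only, prognosticators and SNPs).
table1_r2 = {
    "TP": (0.054, 0.050),
    "TPHiR": (0.151, 0.155),
    "TPLR": (0.0358, 0.0267),
}

percent_change = {
    cohort: 100 * relative_change(cpp, joint)
    for cohort, (cpp, joint) in table1_r2.items()
}
# TP: about -7 %, TPHiR: about +2.6 %, TPLR: about -25 %
```

These are relative changes of an already small explained variance, so a 25 % drop in the LR cohort corresponds to less than one percentage point of liability variance.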

Table 2. Estimates of the determination coefficient (R2probit) measuring the proportion of variance of the liability to first recurrence (TFR) and progression (TP) risks in the whole, high risk (TPHiR) and low risk (TPLR) cohorts predicted by the clinico-pathological prognosticators that is explained by the common SNPs

| Criterion | TFR (whole series) | TP (whole series) | TPHiR (HiR tumors) | TPLR (LR tumors) |
|---|---|---|---|---|
| R2probit | 0.0260 | 0.0165 | 0.0025 | 0.0066 |

While this is one of the largest and best-characterized NMIBC cohorts worldwide, the restricted sample size in the subgroup analyses is one of the limitations we face here, because the small number of events limits the prediction accuracy of the genomic profile achieved with the SNPs. This is even clearer when patients were further stratified as LR-NMIBC. Although increasing the sample size of the study would be desirable, heterogeneity across studies regarding patient recruitment, pathological classifications applied, and treatment or patient management would increase random misclassification and, therefore, would dilute estimates. While we conducted a genome-wide exploration, the models did not include all genotyped SNPs (1 million) but a subset selected by strict LD pruning. When we applied a less restrictive LD threshold (r2 < 0.8) and considered a larger number of common SNPs, neither the classification performance nor the percentage of the phenotypic variance explained improved (results not shown). Including both rare and structural variants in the models may help to further characterize the outcomes and increase the precision of the predictive estimates. Other statistical modeling approaches could also improve the predictive power, for example non-additive models that include epistatic interactions between SNPs or models that add functional information. Exploring the integration of other -omics data such as microRNAs, as well as considering possible interactions between treatment and variants, could also help in this regard.

This study also presents several strengths, such as its population-based nature, detailed medical information, long follow-up, and centralized pathological review, which decreases the heterogeneity of the covariates stage and grade. The state-of-the-art methodology applied here allowed us to handle a high-dimensional problem and time-to-event data, as well as censoring. The application of this methodology allowed us to provide the first estimates of heritability for UBC outcomes.

Conclusions

Here we provide the scientific community, for the first time, with a methodology to estimate the heritability and the predictive ability of multidimensional data in the prognosis field. By applying it to the UBC setting, we observed that the role of common SNPs is very limited in the prediction of the risk of recurrence and progression in NMIBC. Future studies should explore whether the integration of other genetic variants, as well as their interactions with each other and with treatment, contributes to building a more accurate predictive model, allowing the final assessment of the translational potential of inherited genetic variants in the clinic.

Acknowledgements

We acknowledge the coordinators, field and administrative workers, technicians and study participants of the Spanish Bladder Cancer/EPICURO study.

Spanish Bladder Cancer (SBC)/EPICURO Study investigators: Institut Municipal d’Investigació Mèdica, Universitat Pompeu Fabra, Barcelona – Coordinating Center (M. Kogevinas, N. Malats, F.X. Real, M. Sala, G. Castaño, M. Torà, D. Puente, C. Villanueva, C. Murta-Nascimento, J. Fortuny, E. López, S. Hernández, R. Jaramillo, G. Vellalta, L. Palencia, F. Fermández, A. Amorós, A. Alfaro, G. Carretero); Hospital del Mar, Universitat Autònoma de Barcelona, Barcelona (J. Lloreta, S. Serrano, L. Ferrer, A. Gelabert, J. Carles, O. Bielsa, K. Villadiego), Hospital Germans Trias i Pujol, Badalona, Barcelona (L. Cecchini, J.M. Saladié, L. Ibarz); Hospital de Sant Boi, Sant Boi de Llobregat, Barcelona (M. Céspedes); Consorci Hospitalari Parc Taulí, Sabadell (C. Serra, D. García, J. Pujadas, R. Hernando, A. Cabezuelo, C. Abad, A. Prera, J. Prat); Centre Hospitalari i Cardiològic, Manresa, Barcelona (M. Domènech, J. Badal, J. Malet); Hospital Universitario de Canarias, La Laguna, Tenerife (R. García-Closas, J. Rodríguez de Vera, A.I. Martín); Hospital Universitario Nuestra Señora de la Candelaria, Tenerife (J. Taño, F. Cáceres); Hospital General Universitario de Elche, Universidad Miguel Hernández, Elche, Alicante (A. Carrato, F. García-López, M. Ull, A. Teruel, E. Andrada, A. Bustos, A. Castillejo, J.L. Soto); Universidad de Oviedo, Oviedo, Asturias (A. Tardón); Hospital San Agustín, Avilés, Asturias (J.L. Guate, J.M. Lanzas, J. Velasco); Hospital Central Covadonga, Oviedo, Asturias (J.M. Fernández, J.J. Rodríguez, A. Herrero), Hospital Central General, Oviedo, Asturias (R. Abascal, C. Manzano, T. Miralles); Hospital de Cabueñes, Gijón, Asturias (M. Rivas, M. Arguelles); Hospital de Jove, Gijón, Asturias (M. Díaz, J. Sánchez, O. González); Hospital de Cruz Roja, Gijón, Asturias (A. Mateos, V. Frade); Hospital Alvarez-Buylla (Mieres, Asturias): P. Muntañola, C. Pravia; Hospital Jarrio, Coaña, Asturias (A.M. Huescar, F. Huergo); Hospital Carmen y Severo Ochoa, Cangas, Asturias (J. Mosquera).

Funding

The work was partially supported by Red Temática de Investigación Cooperativa en Cáncer (#RD12/0036/0050), Fondo de Investigaciones Sanitarias (FIS), Instituto de Salud Carlos III, (Grant numbers #PI00–0745, #PI05–1436, and #PI06–1614), and Asociación Española Contra el Cáncer (AECC), Spain; the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, USA (Contract NCI NO2-CP-11015); and EU-FP7-HEALTH-F2–2008–201663-UROMOL and EU-7FP-HEALTH-TransBioBC #601933. ELM was funded by a Sara Borrell fellowship, Instituto de Salud Carlos III, Spain; and AML by a fellowship of the European Urological Scholarship Program for Research (EUSP Scholarship S-01–2013).

Availability of data and materials

Data are available for collaborative research. Please contact the corresponding author.

Authors’ contributions

Conceived and designed the experiments: ELdM, MEG, NM. Performed the experiments: NR, MK, SJC, AT, MGC, AC, DTS, FXR, NM. Analyzed the data: ELdM, ACP, AML. Contributed reagents/materials/analysis tools: MGC, AGN, FXR, NM, MM. Wrote the paper: ELdM, MEG, NM. Led the statistical analysis: ELdM. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Written informed consent was obtained from study participants in accordance with the Institutional Review Board of the U.S. National Cancer Institute and the Ethics Committees of each participating hospital (Appendix 1).

Abbreviations

AUC-ROC

area under the receiver operating characteristic curve

CIS

carcinoma in situ

GWAS

genome wide association study

HiR

high risk

LASSO

least absolute shrinkage and selection operator

LD

linkage disequilibrium

LR

low risk

MIBC

muscle invasive bladder cancer

NMIBC

non-muscle invasive bladder cancer

SBC

Spanish Bladder Cancer

SNP

single nucleotide polymorphism

TFR

time-to-first-recurrence

TP

time-to-progression

TURB

transurethral resection of the bladder

UBC

urothelial bladder cancer

WHO

World Health Organization

Appendix 1. Participating centers in the study

U.S. National Cancer Institute (NCI)

Institut Municipal d’Investigació Mèdica and Hospital del Mar

Centro Nacional de Investigaciones Oncológicas (CNIO)

Hospital Germans Trias i Pujol (Badalona, Barcelona)

Hospital de Sant Boi (Sant Boi, Barcelona)

Centre Hospitalari Parc Taulí (Sabadell, Barcelona)

Centre Hospitalari i Cardiològic (Manresa, Barcelona)

Hospital Universitario (La Laguna, Tenerife)

Hospital La Candelaria (Santa Cruz, Tenerife)

Hospital General Universitario de Elche

Universidad Miguel Hernández (Elche, Alicante)

Universidad de Oviedo (Oviedo, Asturias)

Hospital San Agustín (Avilés, Asturias)

Hospital Central Covadonga (Oviedo, Asturias)

Hospital Central General (Oviedo, Asturias)

Hospital de Cabueñes (Gijón, Asturias)

Hospital de Jove (Gijón, Asturias)

Hospital de Cruz Roja (Gijón, Asturias)

Hospital Alvarez-Buylla (Mieres, Asturias)

Hospital Jarrio (Coaña, Asturias)

Hospital Carmen y Severo Ochoa (Cangas, Asturias)

Additional files

Additional file 1: (113KB, doc)

Table S1. Clinico-pathological variables included in the predictive models for time to first recurrence (TFR) and time to progression (TP). Table S2. Summary of censored patients and events (%) for each event in each time interval defined for the statistical analyses. Table S3. Area under the ROC curve (AUC) and coefficient of determination (R 2probit) obtained for each testing set in the 10 fold-crossvalidation analyses of time to first recurrence. Table S4. Area under the ROC curve (AUC) and coefficient of determination (R 2probit) obtained for each testing set in the 10 fold-crossvalidation analyses of time to progression. Table S5. Area under the ROC curve (AUC) and coefficient of determination (R 2probit) obtained for each testing set in the 2 fold-crossvalidation analyses of time to progression in patients at high risk. Table S6. Area under the ROC curve (AUC) and coefficient of determination (R 2probit) obtained for each testing set in the 2 fold-crossvalidation analyses of time to progression in patients at low risk. Table S7. Coefficient of determination (R 2probit) obtained for each testing set in the 10 fold-crossvalidation analyses of time to first recurrence (TFR), time to progression (TP) in the whole cohort, and time to progression (TP) in the high and low risk cohorts (TPHiR and TPLR). (DOC 113 kb)

Additional file 2: (DOC 55 kb)

Supplemental Methods. Model including non-genetic variables.

References

  • 1. Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer. 2010;127:2893–2917. doi: 10.1002/ijc.25516.
  • 2. Sievert KD, Amend B, Nagele U, Schilling D, Bedke J, Horstmann M, Hennenlotter J, Kruck S, Stenzl A. Economic aspects of bladder cancer: What are the benefits and costs? World J Urol. 2009;27:295–300. doi: 10.1007/s00345-009-0395-z.
  • 3. Sylvester RJ, Van Der Meijden APM, Oosterlinck W, Witjes JA, Bouffioux C, Denis L, Newling DWW, Kurth K. Predicting recurrence and progression in individual patients with stage Ta T1 bladder cancer using EORTC risk tables: a combined analysis of 2596 patients from seven EORTC trials. Eur Urol. 2006;49:466–475. doi: 10.1016/j.eururo.2005.12.031.
  • 4. Fernandez-Gomez J, Madero R, Solsona E, Unda M, Martinez-Piñeiro L, Gonzalez M, Portillo J, Ojea A, Pertusa C, Rodriguez-Molina J, Camacho JE, Rabadan M, Astobieta A, Montesinos M, Isorna S, Muntañola P, Gimeno A, Blas M, Martinez-Piñeiro JA. Predicting nonmuscle invasive bladder cancer recurrence and progression in patients treated with bacillus Calmette-Guerin: the CUETO scoring model. J Urol. 2009;182:2195–2203. doi: 10.1016/j.juro.2009.07.016.
  • 5. Sylvester RJ. How well can you actually predict which non-muscle-invasive bladder cancer patients will progress? Eur Urol. 2011;60:431–433. doi: 10.1016/j.eururo.2011.06.001.
  • 6. Thomas F, Rosario DJ, Rubin N, Goepel JR, Abbod MF, Catto JWF. The long-term outcome of treated high-risk nonmuscle-invasive bladder cancer: time to change treatment paradigm? Cancer. 2012;118:5525–34. doi: 10.1002/cncr.27587.
  • 7. Grotenhuis AJ, Dudek AM, Verhaegh GW, Witjes JA, Aben KK, van der Marel SL, Vermeulen SH, Kiemeney LA. Prognostic relevance of urinary bladder cancer susceptibility loci. PLoS One. 2014;9:e89164. doi: 10.1371/journal.pone.0089164.
  • 8. Chen M, Hildebrandt MAT, Clague J, Kamat AM, Picornell A, Chang J, Zhang X, Izzo J, Yang H, Lin J, Gu J, Chanock S, Kogevinas M, Rothman N, Silverman DT, Garcia-Closas M, Barton Grossman H, Dinney CP, Malats N, Wu X. Genetic variations in the sonic hedgehog pathway affect clinical outcomes in non-muscle-invasive bladder cancer. Cancer Prev Res. 2010;3:1235–1245. doi: 10.1158/1940-6207.CAPR-10-0035.
  • 9. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.
  • 10. Makowsky R, Pajewski NM, Klimentidis YC, Vazquez AI, Duarte CW, Allison DB, de los Campos G. Beyond missing heritability: prediction of complex traits. PLoS Genet. 2011;7:e1002051. doi: 10.1371/journal.pgen.1002051.
  • 11. Picornell AC. Genomewide pronostic study in bladder cancer. 2013.
  • 12. Liaw A, Wiener M. Package “randomForest”. 2015.
  • 13. Albert JH, Chib S. Sequential ordinal modeling with applications to survival data. Biometrics. 2001;57:829–36. doi: 10.1111/j.0006-341X.2001.00829.x.
  • 14. Visscher PM, Goddard ME. Genetic parameters for milk yield, survival, workability, and type traits for Australian dairy cattle. J Dairy Sci. 1995;78:205–220. doi: 10.3168/jds.S0022-0302(95)76630-9.
  • 15. Gonzalez-Recio O, Alenda R. Genetic relationship of discrete-time survival with fertility and production in dairy cattle using bivariate models. Genet Sel Evol. 2007;39:391–404. doi: 10.1186/1297-9686-39-4-391.
  • 16. Gianola D, Sorensen D. Quantitative genetic models for describing simultaneous and recursive relationships between phenotypes. Genetics. 2004;167:1407–24. doi: 10.1534/genetics.103.025734.
  • 17. Park T, Casella G. The Bayesian lasso. J Am Stat Assoc. 2008;103:681–686. doi: 10.1198/016214508000000337.
  • 18. de los Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, Weigel K, Cotes JM. Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics. 2009;182:375–385. doi: 10.1534/genetics.109.101501.
  • 19. de Maturana EL, Chanock SJ, Picornell AC, Rothman N, Herranz J, Calle ML, García-Closas M, Marenne G, Brand A, Tardón A, Carrato A, Silverman DT, Kogevinas M, Gianola D, Real FX, Malats N. Whole genome prediction of bladder cancer risk with the Bayesian LASSO. Genet Epidemiol. 2014;38:467–76. doi: 10.1002/gepi.21809.
  • 20. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. 2009.
  • 21. Lee SH, Goddard ME, Wray NR, Visscher PM. A better coefficient of determination for genetic profile analysis. Genet Epidemiol. 2012;36:214–224. doi: 10.1002/gepi.21614.
  • 22. Di Martino E, Tomlinson DC, Knowles MA. A decade of FGF receptor research in bladder cancer: past, present, and future challenges. Adv Urol. 2012;2012:429213. doi: 10.1155/2012/429213.
  • 23. Karaoglu I, van der Heijden AG, Witjes JA. The role of urine markers, white light cystoscopy and fluorescence cystoscopy in recurrence, progression and follow-up of non-muscle invasive bladder cancer. World J Urol. 2014;32:651–659. doi: 10.1007/s00345-013-1035-1.
  • 24. Hernández V, De La Peña E, Martin MD, Blázquez C, Diaz FJ, Llorente C. External validation and applicability of the EORTC risk tables for non-muscle-invasive bladder cancer. World J Urol. 2011;29:409–14. doi: 10.1007/s00345-010-0635-2.
  • 25. Vedder MM, Márquez M, de Bekker-Grob EW, Calle ML, Dyrskjøt L, Kogevinas M, Segersten U, Malmström P-U, Algaba F, Beukers W, Ørntoft TF, Zwarthoff E, Real FX, Malats N, Steyerberg EW. Risk prediction scores for recurrence and progression of non-muscle invasive bladder cancer: an international validation in primary tumours. PLoS One. 2014;9:e96849. doi: 10.1371/journal.pone.0096849.
  • 26. Stenzl A, Cowan NC, De Santis M, Kuczyk MA, Merseburger AS, Ribal MJ, Sherif A, Witjes JA. Treatment of muscle-invasive and metastatic bladder cancer: update of the EAU guidelines. Eur Urol. 2011;59:1009–1018. doi: 10.1016/j.eururo.2011.03.023.
  • 27. Babjuk M, Oosterlinck W, Sylvester R, Kaasinen E, Böhle A, Palou-Redorta J, Rouprêt M. EAU guidelines on non-muscle-invasive urothelial carcinoma of the bladder, the 2011 update. Eur Urol. 2011;59:997–1008.
  • 28. de los Campos G, Sorensen D, Gianola D. Genomic heritability: what is it? PLoS Genet. 2015;11:e1005048. doi: 10.1371/journal.pgen.1005048.
  • 29. Van Zitteren M, Van Der Net JB, Kundu S, Freedman AN, Van Duijn CM, Janssens ACJW. Genome-based prediction of breast cancer risk in the general population: a modeling study based on meta-analyses of genetic associations. Cancer Epidemiol Biomarkers Prev. 2011;20:9–22. doi: 10.1158/1055-9965.EPI-10-0329.
  • 30. Wacholder S, Hartge P, Prentice R, Garcia-Closas M, Feigelson HS, Diver WR, Thun MJ, Cox DG, Hankinson SE, Kraft P, Rosner B, Berg CD, Brinton LA, Lissowska J, Sherman ME, Chlebowski R, Kooperberg C, Jackson RD, Buckman DW, Hui P, Pfeiffer R, Jacobs KB, Thomas GD, Hoover RN, Gail MH, Chanock SJ, Hunter DJ. Performance of common genetic variants in breast-cancer risk models. N Engl J Med. 2010;362:986–93.

Data Availability Statement

Data are available for collaborative research; please contact the corresponding author.


