Author manuscript; available in PMC: 2016 Aug 1.
Published in final edited form as: J Biomed Inform. 2015 Jun 3;56:220–228. doi: 10.1016/j.jbi.2015.05.019

Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer

Dokyoon Kim 1, Ruowang Li 1, Scott M Dudek 1, Marylyn D Ritchie 1,*
PMCID: PMC4550096  NIHMSID: NIHMS705443  PMID: 26048077

Abstract

Evaluation of survival models to predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty directly predicting survival due to the characteristics of censored observations and the fact that the predictive power depends on the threshold used to set two classes. In contrast, the traditional Cox regression approach has some drawbacks in the sense that it does not allow for the identification of interactions between genomic features, which could have key roles associated with cancer prognosis. In addition, data integration is regarded as one of the important issues in improving the predictive power of survival models since cancer could be caused by multiple alterations through meta-dimensional genomic data including genome, epigenome, transcriptome, and proteome. Here we have proposed a new integrative framework designed to perform these three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; (3) identifying interactions within/between meta-dimensional genomic features associated with survival. In order to predict censored survival time, martingale residuals were calculated as a new continuous outcome and a new fitness function used by the grammatical evolution neural network (GENN) based on mean absolute difference of martingale residuals was implemented. To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data including copy number, gene expression, DNA methylation, and protein expression data in breast cancer retrieved from The Cancer Genome Atlas (TCGA). On the basis of the results from breast cancer dataset, we were able to identify interactions not only within a single dimension of genomic data but also between meta-dimensional omics data that are associated with survival. 
Notably, the predictive power of our best meta-dimensional model was 73% which outperformed all of the other models conducted based on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease and the high levels of genomic diversity within/between breast tumors could affect the risk of therapeutic responses and disease progression. Thus, identifying interactions within/between meta-dimensional omics data associated with survival in breast cancer is expected to deliver direction for improved meta-dimensional prognostic biomarkers and therapeutic targets.

Keywords: Survival prediction, Data integration, Interaction between multi-omics data, TCGA, Breast cancer

Introduction

Translational bioinformatics aims to translate genomic and biomedical data efficiently into clinical knowledge for application [3, 4, 41]. It has played a particularly important role in cancer research because of tumor heterogeneity [4]. For example, the recent standard of care for breast cancer includes quantitating panels of gene expression such as Oncotype DX, developed by Genomic Health, and that for non-small cell lung cancer includes sequencing of genes such as EGFR, in order to provide therapeutic knowledge for new subtypes of cancer patients [4]. One of the most exciting problems in translational bioinformatics is to predict clinical outcomes using molecular datasets such as somatic mutation, copy number, or gene expression data for better diagnostics, prognostics, and therapeutics [3]. Among these problems, predicting prognosis and therapeutic response is increasingly difficult [31].

Evaluating survival models is one of the most important concerns in the development of cancer prognostic models, especially those based on genomic profiles. One common approach divides patients into two groups, such as a high-risk and a low-risk survival group, according to a survival-time threshold, and then applies a binary classification algorithm to predict the survival group of each patient in a test dataset [24, 26, 27, 52, 57]. This approach has the advantage of providing natural performance metrics from two-by-two contingency tables, along with positive and negative predictive values, which enable unambiguous assessment of survival prediction. However, it has a few limitations for predicting survival in cancer. First, it is not easy to take censored survival information into consideration when building a model. Second, the performance of binary classification depends on the survival-time threshold used to define the two survival groups [14]. Alternatively, many studies have used Cox proportional hazards models for cancer prognosis [10]. However, the final model from Cox regression approaches is an additive model, so it is difficult to capture non-linear interactions between genomic features, which might have important roles associated with survival [16]. Moreover, even though many studies have shown an association between gene expression data and patient survival using Cox regression approaches [2, 15, 53], gene expression as a single dimension of genomic data may not be enough to fully predict survival because cancer could be caused by multiple alterations through meta-dimensional genomic data including genome, epigenome, transcriptome, and proteome [17].

Large amounts of clinical and meta-dimensional omics data have been generated by large-scale initiatives such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). The explosion of these unprecedented datasets has provided many opportunities to examine the complex genetic architecture of several cancers and to improve the diagnosis, treatment, and ultimately prevention of cancer [21, 35, 45-47]. Despite these efforts, it remains crucial to develop novel data integration methods that better predict cancer clinical outcomes and provide a global view of the interactions within/between meta-dimensional genomic data [23, 24, 27, 28, 39, 44, 56].

Previously, we proposed several methodological frameworks that predict clinical outcomes by integrating multi-omics data [23, 24, 27, 28]. However, these binary classification approaches have difficulty directly predicting survival data because of the need to set a threshold and the characteristics of censored observations. In the present study, we propose a novel framework designed to perform three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; (3) identifying interactions within/between meta-dimensional genomic features associated with survival outcome. To demonstrate the utility of the proposed framework, we applied it to a simulation dataset and then to the breast cancer data from TCGA. Breast cancer is an extremely heterogeneous disease [22]. The high degree of diversity within/between breast tumors could affect the risk of therapeutic responses and disease progression [36]. In addition, most breast cancer studies based on molecular data have focused on one or two dimensions of genomic data, mostly copy number alteration or gene expression profiles [12, 42, 43]. Thus, identifying interactions within/between meta-dimensional omics data associated with survival outcome in breast cancer is expected to deliver direction for improved meta-dimensional prognostic biomarkers and therapeutic targets.

Materials and Methods

Data

Normalized and preprocessed multi-omics datasets in breast cancer were downloaded from the TCGA data matrix (http://tcga-data.nci.nih.gov/tcga/) and the cBio Cancer Genomics Portal (http://www.cbioportal.org/public-portal/) (Table 1). Four genomic data types were used in this study, one for each dimension of genomic data: copy number alteration (CNA) for the genome, methylation for the epigenome, gene expression for the transcriptome, and protein data for the proteome. Each genomic dataset was retrieved as gene-based features in order to better interpret the results. CNA data were obtained from the cBio Portal in order to retrieve the copy number regions significantly altered across a set of cancer patients according to the GISTIC method [7]. For CNA data, 473 genes with log2 copy number values were extracted from 62 significantly altered regions. DNA methylation data were also retrieved as gene-level features from the TCGA data matrix; when a gene was mapped to multiple methylation probes, the probe least correlated with gene expression was chosen, which reduced 485,577 methylation probes to 19,943 genes. Beta-values from the HumanMethylation450 BeadChip were used as the methylation measurements. Gene expression data from RNA-seq consisted of 20,502 unique gene symbols with upper quartile normalized RSEM count estimates [30]. Protein or phosphoprotein levels measured by reverse phase protein array (RPPA) were retrieved from the cBio Portal [50]. The protein data contain 131 proteins after removal of 11 proteins with missing data. The 476 patients who had all four types of omics data together with available survival and age information were used for this study.

Table 1.

TCGA breast cancer data types used for meta-dimensional analysis

Data type | Platform | # Features
CNA | Affymetrix SNP 6 | 473 genes
Methylation | Infinium HumanMethylation450 BeadChip | 19,943 genes
Gene expression | Illumina GA RNA-seq | 20,502 genes
Protein expression | Reverse phase protein array (RPPA) | 131 proteins

Analysis Tool for Heritable and Environmental Network Associations (ATHENA)

ATHENA was developed to uncover meta-dimensional models that explain the genetic etiology of complex diseases such as cancer. To this end, ATHENA provides three key functions: (1) performing feature selection from categorical or continuous independent variables; (2) modeling single-variable and/or interaction effects to predict categorical or continuous clinical outcomes; (3) annotating the candidate models for interpretation in translational bioinformatics [19, 24, 51]. ATHENA contains several subcomponents: preprocessing, modeling, and, at its core, an evolutionary-algorithm-based machine learning technique (Fig. 1). The current implementation of ATHENA contains two evolutionary-algorithm modeling methods, Grammatical Evolution Neural Networks (GENN) and Grammatical Evolution Symbolic Regression (GESR). We have extended ATHENA to perform integrative analysis of meta-dimensional omics data in order to identify models that underlie the multi-layered architecture of cancer. ATHENA can simultaneously analyze meta-dimensional genomic data such as CNA, methylation, gene expression, and protein expression data to build meta-dimensional models of complex disease. For the analyses in this study, we used GENN as the modeling component.

Figure 1.

Overview of ATHENA. ATHENA contains preprocessing and modeling components. In particular, the preprocessing components contain the procedures for converting martingale residuals as a new outcome and feature screening using Cox regression for predicting survival data. Meta-dimensional genomic data and clinical data can be the input for the meta-dimensional models associated with clinical outcomes of interest.

Grammatical Evolution Neural Networks (GENN)

Even though many computational methods, such as multifactor dimensionality reduction (MDR), have been proposed to discover interactions between genomic features [9, 38], many of them exhaustively search every possible combination of genomic features to build interaction models. When meta-dimensional genomic data are integrated, the volume of the search space grows exponentially with the number of integrated multi-omic features. Therefore, stochastic methods based on evolutionary algorithms, such as GENN, have been proposed and applied [33, 40, 51]. Grammatical evolution is an evolutionary search algorithm and a flexible type of genetic programming because it evolves functional solutions through adaptation of the grammar rules (Fig. 1) [33]. GENN uses artificial neural networks (ANNs) and simultaneously optimizes the input features, weights, and network structures. The GENN algorithm and the grammar rules were described in detail in a previous study [33]. Briefly, the GENN algorithm is as follows:

  • Step 1: The input data is divided into five equal parts for 5-fold cross validations with 4/5 for training data and 1/5 for testing data.

  • Step 2: Under the population size constraint, a random population of binary strings is generated and initialized as ANNs based on the grammar rules. For parallelization, the population is divided into several demes across a user-specified number of CPUs.

  • Step 3: All ANNs in the population are evaluated using training data, and the solutions with the highest prediction accuracy (fitness score) are selected for crossover and reproduction. After that, the new population is generated.

  • Step 4: Step 3 is repeated for a user-defined number of generations. In addition, the best solutions migrate between CPUs at specified intervals during evolution.

  • Step 5: The overall best solution from the final generation is tested on the test dataset and the fitness score for the test dataset is recorded.

  • Step 6: Steps 2-5 are repeated for each remaining cross-validation fold, each time with a different set of training and testing data.
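The steps above can be sketched in a few lines of Python. This is a minimal illustration only, not ATHENA's implementation: it evolves fixed-length bit strings against a toy fitness function instead of grammar-encoded neural networks, it runs a single deme without migration, and every name in it (`k_fold_indices`, `evolve`, `toy_fitness`) is hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Step 1: shuffle sample indices and split them into k nearly equal folds."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def evolve(fitness, genome_len=16, pop_size=40, generations=30,
           p_cross=0.9, p_mut=0.01, seed=0):
    """Steps 2-4 (single deme): generational search over random bit strings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]        # keep the fitter half as parents
        children = [ranked[0][:]]                # elitism: carry over the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if rng.random() < p_cross:           # one-point crossover
                cut = rng.randrange(1, genome_len)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # flip each bit with probability p_mut
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

def toy_fitness(genome):
    """Placeholder: fraction of bits set. A real run would decode the genome
    into a network and score it on the training folds."""
    return sum(genome) / len(genome)

folds = k_fold_indices(n_samples=20, k=5)
results = []
for test_fold in folds:                          # Steps 5-6: one run per fold
    train = [i for fold in folds if fold is not test_fold for i in fold]
    best = evolve(toy_fitness, seed=len(results))
    results.append((len(train), len(test_fold), toy_fitness(best)))
```

In an actual analysis the fitness function would be evaluated on the training folds during evolution and on the held-out fold for the final score; the toy fitness here ignores the data so that the control flow stays visible.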

Survival fitness function

One of the main problems in the present study is to predict patients' censored survival data from their meta-dimensional omics data. Because of the censored observations, a general measure of goodness-of-fit such as R2, which is based on the residual sum of squares for linear regression, is not suitable for directly predicting raw survival data based on genomic profiles. Therefore, a proper measure of goodness-of-fit is needed to predict censored survival time. The martingale residual for patient i with failure time ti is defined as the difference between the observed status (δi = 0 for a censored observation, δi = 1 for a death event) and the cumulative hazard, which reflects the expected number of death events for that patient [49]. Since the cumulative hazard from the Cox model has no upper limit, martingale residuals have a reversed exponential distribution between negative infinity and 1. Nevertheless, the sum of the martingale residuals over all patients is always zero. Patients who die sooner than expected have positive martingale residuals, indicating a bad prognosis, whereas patients who live longer than expected have negative martingale residuals, indicating a good prognosis. Each patient's martingale residual can be calculated from the reduced model, which contains no genomic effects from CNA, methylation, gene expression, or protein expression. Since martingale residuals reflect the portion of the outcome left unexplained by the adjusted clinical covariates, excluding the genomic effects, they can be used as a new continuous outcome [49]. Martingale residuals can be calculated from the fitted Cox model as

$$M_i = \delta_i - \Lambda(t_i) \qquad (1)$$

where Λ is the cumulative hazard function [49]. A further advantage of using martingale residuals as a new outcome is that they can be adjusted for potential confounders such as age or gender. As a proof of concept, we adjusted the model for age using the survival R package.
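As a rough illustration of the quantity defined here (observed status minus cumulative hazard), the following Python sketch computes martingale residuals for a model with no covariates. It is a simplification and an assumption on our part: the study itself fitted an age-adjusted Cox model with the survival R package, whereas this sketch uses the Nelson-Aalen estimate of the cumulative hazard, which corresponds to the case without any covariates. The function name and the toy data are hypothetical.

```python
def martingale_residuals(times, events):
    """Return M_i = delta_i - H(t_i), with H the Nelson-Aalen cumulative hazard.

    times  : follow-up time for each patient
    events : 1 if death was observed, 0 if the observation was censored
    Assumption: no covariates (the study adjusted for age in a Cox model).
    """
    # Hazard increment d_j / n_j at each distinct event time t_j
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    increments = []
    for t in event_times:
        deaths = sum(1 for ti, di in zip(times, events) if ti == t and di == 1)
        at_risk = sum(1 for ti in times if ti >= t)
        increments.append((t, deaths / at_risk))

    residuals = []
    for ti, di in zip(times, events):
        cum_hazard = sum(inc for t, inc in increments if t <= ti)
        residuals.append(di - cum_hazard)
    return residuals

# Toy data: six patients, two of them censored
times = [2.0, 3.0, 3.0, 5.0, 8.0, 9.0]
events = [1, 0, 1, 1, 0, 1]
res = martingale_residuals(times, events)
```

On this toy input the first patient, who dies earliest, receives the largest positive residual, the residuals sum to zero, and none exceeds 1, which matches the properties described in the text.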

After calculating martingale residuals, a new fitness function for GENN was needed because R2, the fitness function previously used in GENN for predicting continuous outcomes, was not suitable for predicting martingale residuals. Because martingale residuals have an exponential distribution between negative infinity and 1, the basic assumption for applying R2, namely normally distributed residuals, is not satisfied. Thus, a new fitness function for GENN was proposed that measures the mean absolute difference (MAD) between the observed martingale residuals (Mi) and the martingale residuals predicted by GENN from the genomic covariate vector x (Mi|x) [32]. The new fitness function used by GENN is shown below:

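$$\mathrm{MAD} = \frac{\sum_i \left| M_i - M_{i|x} \right|}{\sum_i \left| M_i \right|} \qquad (2)$$

$$\text{Fitness function} = 1 - \mathrm{MAD} \qquad (3)$$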

The output of the new fitness function ranges between 0 and 1. A fitness score of 1 indicates the best predictive model, whereas a fitness score of 0 represents the worst predictive model.
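The fitness score can be illustrated with a short Python sketch that takes the sum of absolute differences between observed and predicted martingale residuals, divides it by the sum of absolute observed residuals, and subtracts the result from 1. The function name `mad_fitness` and the example values are hypothetical, and the code is only a sketch of the formula, not the GENN implementation.

```python
def mad_fitness(observed, predicted):
    """Fitness = 1 - sum(|M_i - M_pred_i|) / sum(|M_i|)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    denom = sum(abs(m) for m in observed)
    if denom == 0:
        raise ValueError("observed residuals must not all be zero")
    mad = sum(abs(m - p) for m, p in zip(observed, predicted)) / denom
    return 1.0 - mad

observed = [0.8, -0.4, 0.6, 0.3, -0.7, -0.6]

perfect = mad_fitness(observed, observed)                    # exact predictions
baseline = mad_fitness(observed, [0.0] * len(observed))      # always predict 0
halfway = mad_fitness(observed, [m / 2 for m in observed])   # half of each residual
```

Exact predictions give a fitness of 1, and predicting zero for every patient gives a fitness of 0, so the score can be read as the improvement over a model that explains none of the residual.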

Experiment setup

An overview of the pipeline for this study is shown in Figure 2. First, we randomly divided the input dataset into two groups, with 4/5 of the data for the learning step and 1/5 for the validation step. The validation step is independent of the cross-validation (CV) procedure since the 5-fold CV is performed using only the learning dataset. Second, feature screening was conducted using Cox regression because GENN has been shown to perform better than other methods when noise is reduced [18]. Third, after reducing the search space, we ran GENN to learn predictive models using 5-fold CV. Fourth, all features that appeared in at least two of the top models from the 5-fold CV were selected for the next step. We then reran GENN with the selected features to build the final model using the entire training dataset. Finally, the GENN model generated from the training dataset was tested on the independent validation dataset to predict survival. The validation dataset was not used for feature screening or any part of the learning step in order to prevent over-fitting. The parameters for running GENN can be found in Table 2.
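Two of the bookkeeping steps in this pipeline, the 4/5 versus 1/5 split and the selection of features that appear in at least two of the top cross-validation models, can be sketched in Python as follows. The helper names are hypothetical, and the example models are the five best models reported for simulation 2 in Table 3; nothing here reproduces ATHENA's own code.

```python
import random
from collections import Counter

def split_learning_validation(sample_ids, n_parts=5, seed=0):
    """Hold out 1/n_parts of the samples for validation; keep the rest for learning."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_val = len(ids) // n_parts
    return ids[n_val:], ids[:n_val]      # (learning, validation)

def consistent_features(cv_models, min_models=2):
    """Keep features that appear in at least `min_models` of the top CV models."""
    counts = Counter(f for model in cv_models for f in set(model))
    return sorted(f for f, c in counts.items() if c >= min_models)

# 476 patients, as in the breast cancer dataset
learning, validation = split_learning_validation(range(476))

# Best model variables per fold for simulation 2 (Table 3)
cv_models = [
    ["Gene1", "Gene2"],
    ["Gene1", "Gene2"],
    ["Gene1", "Gene401"],
    ["Gene1", "Gene2"],
    ["Gene2", "Gene465", "Gene657"],
]
selected = consistent_features(cv_models)
```

With 476 patients this split holds out 95 samples for validation, and the consistency rule keeps Gene1 and Gene2 while discarding the features that appeared in only one fold.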

Figure 2.

Schematic overview of the pipeline for the analysis. (1) Splitting the input dataset into two groups as entire training dataset and validation dataset (2) Feature screening using Cox regression (3) Running GENN using 5 fold cross validation (4) Rerunning GENN based on selected features using entire training data to generate the final model (5) Predicting accuracy for the validation dataset using the final model. S represents samples and F means features.

Table 2.

GENN parameter settings

Parameter | Value
Number of demes (CPUs) | 20
Population size / deme | 5,000
Number of generations | 1,000
Number of migrations | 20
Probability of crossover | 0.9
Probability of mutation | 0.01
Fitness function | 1 – MAD

In order to demonstrate the validity of our approach, we simulated data consisting of gene expression and survival data using the survJamda R package [55]. Breast cancer data from TCGA were then analyzed to identify interactions between meta-dimensional genomic data associated with survival.

Results and Discussion

Simulation study

To demonstrate the validity of our approach, a simulation study was conducted. Four simulation datasets, each containing two functional genes (Gene1, Gene2) in 500 samples, were generated with different total numbers of genes and different initial beta values for the Cox model. The details of simulating datasets with survJamda have been described previously [55]. The simulation 1 and simulation 2 datasets, generated with an initial beta of 0.5, which corresponds to an intermediate main effect, consisted of 100 and 1,000 genes, respectively. The simulation 3 and simulation 4 datasets were generated with an initial beta of 3, corresponding to a strong main effect for the two functional genes, and contained 100 and 1,000 genes, respectively. After calculating martingale residuals as a new outcome, we ran GENN independently on the four simulation datasets with the parameter settings described in Table 2. Except for two models from the simulation 2 dataset, martingale residuals as a new continuous outcome performed well in terms of finding the two true functional genes, Gene1 and Gene2 (Table 3). In addition, the new fitness function for GENN appears suitable as a measure for selecting a good model containing the true factors associated with survival. The fitness score increased when the main effect was stronger (Table 3).
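To make the simulation design concrete, the following Python sketch generates a dataset of the same shape: 500 samples, a chosen number of genes, and survival times that depend on two functional genes through a coefficient beta. It is not the survJamda procedure, whose details are given in [55]; the standard normal expression values, the exponential survival and censoring times, and the parameters `baseline_hazard` and `censor_rate` are all assumptions made for illustration.

```python
import math
import random

def simulate_survival(n_samples=500, n_genes=100, beta=0.5,
                      baseline_hazard=0.1, censor_rate=0.05, seed=0):
    """Simulate expression for n_genes and censored survival times.

    Only the first two genes (Gene1, Gene2) affect the hazard:
        hazard_i = baseline_hazard * exp(beta * (gene1_i + gene2_i))
    Death and censoring times are drawn from exponential distributions
    (an assumption of this sketch, not taken from the study).
    """
    rng = random.Random(seed)
    expression, times, events = [], [], []
    for _ in range(n_samples):
        genes = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        hazard = baseline_hazard * math.exp(beta * (genes[0] + genes[1]))
        death_time = rng.expovariate(hazard)
        censor_time = rng.expovariate(censor_rate)
        times.append(min(death_time, censor_time))          # observed follow-up
        events.append(1 if death_time <= censor_time else 0)  # 1 = death observed
        expression.append(genes)
    return expression, times, events

# Shapes analogous to simulation 1 (beta = 0.5, 100 genes)
expression, times, events = simulate_survival(n_samples=500, n_genes=100, beta=0.5)
```

Raising `beta` to 3 or `n_genes` to 1,000 gives datasets shaped like the other three simulation settings, although the generated values will not match those produced by survJamda.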

Table 3.

GENN results from simulation dataset

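Simulation dataset | CV | Best model variables | Fitness function (1 – MAD)
Simulation 1 | cv1 | Gene1, Gene2 | 0.77
Simulation 1 | cv2 | Gene1, Gene2 | 0.79
Simulation 1 | cv3 | Gene1, Gene2 | 0.72
Simulation 1 | cv4 | Gene1, Gene2 | 0.81
Simulation 1 | cv5 | Gene1, Gene2 | 0.72

Simulation 2 | cv1 | Gene1, Gene2 | 0.81
Simulation 2 | cv2 | Gene1, Gene2 | 0.79
Simulation 2 | cv3 | Gene1, Gene401 | 0.76
Simulation 2 | cv4 | Gene1, Gene2 | 0.78
Simulation 2 | cv5 | Gene2, Gene465, Gene657 | 0.77

Simulation 3 | cv1 | Gene1, Gene2 | 0.88
Simulation 3 | cv2 | Gene1, Gene2 | 0.85
Simulation 3 | cv3 | Gene1, Gene2 | 0.86
Simulation 3 | cv4 | Gene1, Gene2 | 0.84
Simulation 3 | cv5 | Gene1, Gene2 | 0.82

Simulation 4 | cv1 | Gene1, Gene2 | 0.91
Simulation 4 | cv2 | Gene1, Gene2 | 0.88
Simulation 4 | cv3 | Gene1, Gene2 | 0.88
Simulation 4 | cv4 | Gene1, Gene2 | 0.90
Simulation 4 | cv5 | Gene1, Gene2 | 0.81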

GENN modeling for single dimensional genomic data in breast cancer

In order to reduce the search space and better identify interactions associated with survival, feature screening was conducted using Cox regression for each genomic data type separately. After Cox regression was run for each feature of each genomic data type, q-values from the false discovery rate (FDR) were used as the selection cutoff. Different FDR thresholds were used for the different genomic data types because no features from the CNA and protein data remained at the threshold used for the methylation and gene expression datasets, which was FDR < 0.05. Thus, we used FDR < 0.25 for the CNA and protein datasets. After screening, 44 methylation features and 49 gene expression features remained. Nineteen CNA features and 31 protein features were analyzed for identifying interactions within/between different dimensions of genomic data.
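The screening cutoff can be illustrated with a short Python sketch. The text does not state which FDR procedure produced the q-values, so the Benjamini-Hochberg adjustment used below is an assumption, and the gene names and p-values are invented for the example rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def screen(names, p_values, threshold):
    """Keep features whose adjusted p-value is below the FDR threshold."""
    q = benjamini_hochberg(p_values)
    return [n for n, qi in zip(names, q) if qi < threshold]

# Hypothetical per-feature Cox regression p-values
names = ["geneA", "geneB", "geneC", "geneD", "geneE"]
p_values = [0.001, 0.012, 0.04, 0.3, 0.6]

strict = screen(names, p_values, threshold=0.05)    # cutoff used for methylation/expression
lenient = screen(names, p_values, threshold=0.25)   # cutoff used for CNA/protein
```

In this toy example the stricter threshold keeps two features and the more lenient threshold keeps three, which mirrors why a looser cutoff was needed for data types where few features pass FDR < 0.05.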

After feature screening, GENN was run for each genomic dataset with the parameter settings described in Table 2 using 5-fold cross-validation. Based on the best models and the consistency of variables across the 5-fold CV, GENN was rerun with the selected features using the entire training dataset in order to generate the final predictive model (Fig. 2).

The final GENN model is an optimized neural network that combines genomic features and their interactions to predict survival outcome. The best GENN models from each dimension of genomic data are shown in Figure 3. These models showed different network structures, indicating non-linear interactions between genomic features within a single dimension of genomic data (Fig. 3). Each final GENN model was tested by predicting survival for the 95 patients in the independent validation dataset. The fitness scores of the best models from CNA, methylation, gene expression, and protein data were 0.63, 0.63, 0.69, and 0.64, respectively (Fig. 3 and Table 4). Among the four individual dimensions of genomic data, gene expression showed the best predictive power for survival in the validation dataset.

Figure 3.

Best GENN models from each genomic dataset. (a) CNA (b) Methylation (c) Gene expression (d) Protein dataset. PADD, PDIV, and PSUB are an addition, division, and subtraction activation node, respectively. Genomic features from each genomic data are shown in the gray boxes and constants in the white boxes are weights.

Table 4.

Performance comparison between the model from single dimensional genomic data and integration model. Performance was measured from the validation dataset.

Data type | 1 – MAD
CNA | 0.63
Methylation | 0.63
Gene expression | 0.69
Protein expression | 0.64
Integration | 0.73

Integration with meta-dimensional genomic data

To identify interactions between different dimensions of genomic data associated with survival in breast cancer, we combined the genome, epigenome, transcriptome, and proteome data. The final meta-dimensional model was generated by GENN from the variables in the best models of each single dimension of genomic data, so that variables from all four omic dimensions were included (Fig. 4). This model was also tested by predicting survival in the independent validation dataset. With a fitness score of 0.73, the meta-dimensional model showed better predictive power than any of the models from the individual genomic data types (Table 4). This suggests that interactions between different dimensions of genomic data might play important roles in survival. The features selected in the final meta-dimensional model are KLLN, LOC728024, and BAGE5 from CNA data, PNKP from methylation data, EXOC1 and KRT12 from gene expression data, and ERBB2 from protein data (Fig. 4).

Figure 4.

Meta-dimensional model with genomic features from CNA, methylation, gene expression, and protein expression data. Red, green, blue, and yellow boxes represent CNA, methylation, gene expression, and protein features, respectively. PADD, PDIV, and PSUB are an addition, division, and subtraction activation node, respectively.

Biological implication

Interestingly, the gene expression model outperformed the other GENN models built from a single dimension of genomic data for predicting survival. This result is consistent with previous findings that genomic features measured at the transcriptome dimension affect survival more directly than those measured at the genome or epigenome dimension [28, 56]. Notably, the top model selected from each individual dimension of genomic data, as well as the meta-dimensional model, showed complex non-linear interactions between genomic features associated with survival outcome (Figs. 3 and 4). This suggests that it is important to identify these interactions between genomic features, which might have essential roles in the prognosis of breast cancer but are not easily detected using the traditional Cox regression method. Among the features from the CNA model, KLLN is a well-known tumor suppressor gene that inhibits breast cancer growth and transcriptionally activates p53/p73-mediated apoptosis in breast cancer [54]. In the CNA results, we found many homozygous deletions of the KLLN gene among patients, which could lead to poor prognosis in breast cancer [34]. For the methylation model, polynucleotide kinase phosphatase (PNKP) is a gene involved in DNA repair. Dysregulation of the DNA repair system and of its signaling to cell cycle checkpoints is strongly associated with predisposition to several cancers and affects responses to DNA-damaging anticancer therapy [11]. Tubulin alpha 1c (TUBA1C) was associated with chemotherapy resistance in a breast cancer cell line [48]. EXOC1, from the gene expression model, encodes a component of the exocyst complex, a multi-protein complex that is vital for targeting exocytic vesicles to specific docking sites on the plasma membrane. Among the features selected in the final protein model, recent studies found that variants of ATM (ataxia-telangiectasia mutated) are breast cancer susceptibility alleles [1].
In addition, ERBB2 is a well-known oncogene in breast cancer that encodes a member of the epidermal growth factor (EGF) receptor family of receptor tyrosine kinases [37]. ERBB2 gene amplification or over-expression has been reported in several cancers, including ovarian and breast cancers [29]. The protein encoded by MAPK14 is a member of the MAP kinase family, which plays important roles in multiple biochemical signals. In addition, MAP kinases are involved in many cellular processes such as differentiation, proliferation, development, and transcription regulation, which are associated with numerous cancers [6].

In the final meta-dimensional model, three CNA features (KLLN, LOC728024, and BAGE5), two gene expression features (EXOC1 and KRT12), one methylation feature (PNKP), and one protein feature (ERBB2) were selected as associated with survival in breast cancer (Fig. 4). This suggests that interactions between meta-dimensional omics data could account for a considerable share of survival prediction in breast cancer that is not detected by single-dimensional genomic data alone (Table 4). In order to interpret the selected genomic features, an integrated network was generated from the cBio Portal in the context of biological interactions derived from public databases including the NCI-Nature Pathway Interaction Database, Reactome, HPRD, Pathway Commons, and the MSKCC Cancer Cell Map [7] (Fig. 5). Notably, ERBB2, EXOC1, and PNKP were connected via either pathway or biological interaction knowledge (Fig. 5). This suggests that there might be synergistic interaction mechanisms between methylation of PNKP, expression of EXOC1, and ERBB2 protein expression that are associated with survival in breast cancer. In particular, ERBB2 played a crucial role as a hub in the network and was connected with other frequently altered neighbor genes from the TCGA breast cancer dataset, such as the breast cancer oncogenes MYC and PTK2 [8, 13] (Fig. 5). Furthermore, ERBB2 is targeted by several cancer drugs, including Lapatinib, Pertuzumab, and Trastuzumab, according to the PiHelper database [7]. Although only three genomic features, ERBB2, EXOC1, and PNKP, were found in the integrated network, ERBB2 and KLLN could also interact synergistically, via encoding a member of the EGF receptor family and activating p53-mediated apoptosis, respectively, further leading to poor prognosis in breast cancer [20]. As abnormal p53 is one of the most common genetic abnormalities in breast cancer, p53 and EGFR could interact in the pathogenesis of breast cancer [20].

Figure 5.

Network view of selected features in the context of biological interactions. Selected genomic features with a black border as seed nodes were found with other frequently altered neighbor genes in the context of biological interactions derived from numerous public interaction databases. Each node in the network is color-coded with meta-dimensional genomic data derived from the TCGA breast cancer dataset via cBio Portal. Red color of each node represents the alteration frequency and dark red means higher frequency of alteration among patients. The interaction types defined from BioPax are shown as different colors.

Conclusions

In the present study, we addressed existing problems in the evaluation of survival models for predicting the prognosis of cancer patients. A binary classification approach has difficulty directly predicting survival data because of the characteristics of censored observations and because its predictive power depends on the threshold used to define the two classes. In contrast, the traditional Cox regression approach has the drawback that it does not allow for the identification of interactions between genomic features, which could have key roles in cancer prognosis [16]. In addition, data integration is regarded as one of the important issues in improving the predictive power of survival models. Here we proposed a new integrative framework designed to perform three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; (3) identifying interactions within/between meta-dimensional genomic features associated with survival.

In order to highlight the utility and validity of the proposed framework, a simulation study was conducted, followed by an analysis of breast cancer data from 476 TCGA patients. In the simulated datasets, martingale residuals performed appropriately as a new continuous outcome in terms of finding the known functional genes associated with survival using GENN (Table 3). This approach was also applied to renal cell carcinoma in our recent study and was shown to predict survival based on somatic mutation profiles [25]. Notably, the R2-based fitness function did not work well as a measure of predictive power for the martingale residuals (data not shown). This might be because extreme martingale residuals, which can arise because their range extends from negative infinity to 1, inflate the sum of squares once squared, to the point that R2 can sometimes take negative values [32]. Furthermore, because the martingale residuals have an exponential distribution between negative infinity and 1, the basic assumption for applying R2, namely normally distributed residuals, is not fulfilled. Therefore, the MAD-based fitness function was suitable for GENN as an appropriate goodness-of-fit criterion for martingale residuals. The proposed framework also has the advantage of allowing adjustment for clinical covariates such as sex, age, or stage as potential confounders. On the basis of the results from the breast cancer dataset, we were able to identify interactions associated with survival not only within a single dimension of genomic data but also between meta-dimensional omics data. Remarkably, the meta-dimensional model showed better predictive power than the models based on a single dimension of genomic data. These results suggest that each dimension of genomic data contributes complementary information to the prediction of survival, since cancer is the phenotypic end-point of multiple alterations in the genome, epigenome, transcriptome, and proteome dimensions.

One limitation of the current study is that we adjusted only for age as a clinical covariate. This is because other clinical variables, such as stage and grade, had many missing values; including them could have increased the power of the predictive models. Thus, as future work, we need to adjust for other potential confounding factors related to breast cancer in a reasonably sized sample with few missing values. In addition, the predictive power is still relatively weak. Better performance is not easily achieved because of the extreme heterogeneity of breast cancer. To address the heterogeneity problem, we focused mainly on identifying interactions between genomic features with main effects that come from different dimensions of genomic data. However, this approach may miss interactions between features from different dimensions that were not selected during feature selection because of a small or marginal main effect within a single dimension of genomic data. It will also be important to examine whether the same or similar genes from the single- and meta-dimensional models identified in the TCGA breast cancer data are predictive of survival in a completely independent dataset. Moreover, the proposed framework can be applied to predicting other censored clinical data such as recurrence or chemotherapy response. Recently, the Pan-Cancer project analyzed and compared the molecular alterations in multi-omics data across 12 different cancer types [5]. Identifying common and cancer-specific interactions associated with survival across different types of cancer would be a promising direction for future work.
Due to the heterogeneity in breast cancer, identifying interactions within/between meta-dimensional omics data associated with survival will be valuable for providing guidance for improved meta-dimensional prognostic biomarkers and tailoring therapeutic strategies. ATHENA can be downloaded from http://ritchielab.psu.edu/ritchielab/software/.
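
The concern about interactions missed by main-effect feature selection can be made concrete with a small synthetic example. In the sketch below, which is a toy construction and not the feature selection procedure used in this study, two features (hypothetically named x1 and x2) influence a continuous outcome only through their product, while a third feature x3 has a modest main effect. The absolute Pearson correlation is used as a stand-in univariate filter score, and the cutoff of 0.1 is arbitrary.

```python
import itertools
import numpy as np

# Balanced full-factorial design over three features coded as -1/+1,
# repeated 25 times (hypothetical toy data, 200 samples).
design = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
X = np.tile(design, (25, 1))
x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]

# Outcome: pure x1*x2 interaction plus a modest main effect of x3.
y = x1 * x2 + 0.5 * x3


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a univariate filter score."""
    return abs(np.corrcoef(x, y)[0, 1])


scores = {
    "x1": abs_corr(x1, y),
    "x2": abs_corr(x2, y),
    "x3": abs_corr(x3, y),
    "x1*x2": abs_corr(x1 * x2, y),
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")

# A main-effect filter with an arbitrary cutoff keeps only x3.
selected = [name for name in ("x1", "x2", "x3") if scores[name] > 0.1]
print("selected by main-effect filter:", selected)
```

Under this construction the interacting pair would never reach the modeling stage, even though its product is the strongest predictor of the outcome.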

Highlights

  • We proposed a new integrative framework designed to perform these three functions:
    • (1) Predicting censored survival data
    • (2) Integrating meta-dimensional omics data
    • (3) Identifying interactions within/between multi-omic genomic features
  • Martingale residuals performed properly as a new continuous outcome.

Acknowledgments

This work was funded by NIH grant R01 LM010040 and NHLBI grant U01 HL065962. In addition, we gratefully acknowledge the TCGA Consortium and all its members for the TCGA Project initiative, for providing samples, tissues, and data processing, and for making data and results available. The results published here are in whole or in part based upon data generated by The Cancer Genome Atlas pilot project established by the NCI and NHGRI. Information about TCGA and the investigators and institutions that constitute the TCGA research network can be found at “http://cancergenome.nih.gov”.

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflict of Interest

We confirm that there are no known conflicts of interest.

References

  • 1. Ahmed M, Rahman N. ATM and breast cancer susceptibility. Oncogene. 2006;25:5906–5911. doi: 10.1038/sj.onc.1209873.
  • 2. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Jr., Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, Staudt LM. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503–511. doi: 10.1038/35000501.
  • 3. Butte AJ. Translational bioinformatics: coming of age. J Am Med Inform Assoc. 2008;15:709–714. doi: 10.1197/jamia.M2824.
  • 4. Butte AJ, Ohno-Machado L. Making it personal: translational bioinformatics. J Am Med Inform Assoc. 2013;20:595–596. doi: 10.1136/amiajnl-2013-002028.
  • 5. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764.
  • 6. Carracedo A, Ma L, Teruya-Feldstein J, Rojo F, Salmena L, Alimonti A, Egia A, Sasaki AT, Thomas G, Kozma SC, Papa A, Nardella C, Cantley LC, Baselga J, Pandolfi PP. Inhibition of mTORC1 leads to MAPK pathway activation through a PI3K-dependent feedback loop in human cancer. J Clin Invest. 2008;118:3065–3074. doi: 10.1172/JCI34739.
  • 7. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E, Antipin Y, Reva B, Goldberg AP, Sander C, Schultz N. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2:401–404. doi: 10.1158/2159-8290.CD-12-0095.
  • 8. Chin SF, Wang Y, Thorne NP, Teschendorff AE, Pinder SE, Vias M, Naderi A, Roberts I, Barbosa-Morais NL, Garcia MJ, Iyer NG, Kranjac T, Robertson JF, Aparicio S, Tavare S, Ellis I, Brenton JD, Caldas C. Using array-comparative genomic hybridization to define molecular portraits of primary breast cancers. Oncogene. 2007;26:1959–1970. doi: 10.1038/sj.onc.1209985.
  • 9. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009;10:392–404. doi: 10.1038/nrg2579.
  • 10. Cox DR, Oakes D. Analysis of survival data. Chapman and Hall, London; New York: 1984.
  • 11. Curtin NJ. DNA repair dysregulation from cancer driver to therapeutic target. Nat Rev Cancer. 2012;12:801–817. doi: 10.1038/nrc3399.
  • 12. Ding L, Ellis MJ, Li S, Larson DE, Chen K, Wallis JW, Harris CC, McLellan MD, Fulton RS, Fulton LL, Abbott RM, Hoog J, Dooling DJ, Koboldt DC, Schmidt H, Kalicki J, Zhang Q, Chen L, Lin L, Wendl MC, McMichael JF, Magrini VJ, Cook L, McGrath SD, Vickery TL, Appelbaum E, Deschryver K, Davies S, Guintoli T, Lin L, Crowder R, Tao Y, Snider JE, Smith SM, Dukes AF, Sanderson GE, Pohl CS, Delehaunty KD, Fronick CC, Pape KA, Reed JS, Robinson JS, Hodges JS, Schierding W, Dees ND, Shen D, Locke DP, Wiechert ME, Eldred JM, Peck JB, Oberkfell BJ, Lolofie JT, Du F, Hawkins AE, O'Laughlin MD, Bernard KE, Cunningham M, Elliott G, Mason MD, Thompson DM, Jr., Ivanovich JL, Goodfellow PJ, Perou CM, Weinstock GM, Aft R, Watson M, Ley TJ, Wilson RK, Mardis ER. Genome remodelling in a basal-like breast cancer metastasis and xenograft. Nature. 2010;464:999–1005. doi: 10.1038/nature08989.
  • 13. Dubik D, Dembinski TC, Shiu RP. Stimulation of c-myc oncogene expression associated with estrogen-induced proliferation of human breast cancer cells. Cancer Res. 1987;47:6517–6521.
  • 14. Dupuy A, Simon RM. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J Natl Cancer Inst. 2007;99:147–157. doi: 10.1093/jnci/djk018.
  • 15. Gordon GJ, Jensen RV, Hsiao LL, Gullans SR, Blumenstock JE, Ramaswamy S, Richards WG, Sugarbaker DJ, Bueno R. Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 2002;62:4963–4967.
  • 16. Gui J, Moore JH, Kelsey KT, Marsit CJ, Karagas MR, Andrew AS. A novel survival multifactor dimensionality reduction method for detecting gene-gene interactions with application to bladder cancer prognosis. Hum Genet. 2011;129:101–110. doi: 10.1007/s00439-010-0905-5.
  • 17. Hanash S. Integrated global profiling of cancer. Nat Rev Cancer. 2004;4:638–644. doi: 10.1038/nrc1414.
  • 18. Holzinger ER, Dudek SM, Frase AT, Fridley BL, Chalise P, Ritchie MD. Comparison of methods for meta-dimensional data analysis using in silico and biological data set, EvoBIO 2012. LNCS. 2012;7246:134–143.
  • 19. Holzinger ER, Dudek SM, Frase AT, Pendergrass SA, Ritchie MD. ATHENA: the analysis tool for heritable and environmental network associations. Bioinformatics. 2013. doi: 10.1093/bioinformatics/btt572.
  • 20. Horak E, Smith K, Bromley L, LeJeune S, Greenall M, Lane D, Harris AL. Mutant p53, EGF receptor and c-erbB-2 expression in human breast cancer. Oncogene. 1991;6:2277–2284.
  • 21. International Cancer Genome Consortium. International network of cancer genome projects. Nature. 2010;464:993–998. doi: 10.1038/nature08987.
  • 22. Irshad S, Ellis P, Tutt A. Molecular heterogeneity of triple-negative breast cancer and its clinical implications. Curr Opin Oncol. 2011;23:566–577. doi: 10.1097/CCO.0b013e32834bf8ae.
  • 23. Kim D, Joung JG, Sohn KA, Shin H, Park YR, Ritchie MD, Kim JH. Knowledge Boosting: A graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction. J Am Med Inform Assoc. 2014. doi: 10.1136/amiajnl-2013-002481.
  • 24. Kim D, Li R, Dudek SM, Ritchie MD. ATHENA: Identifying interactions between different levels of genomic data associated with cancer clinical outcomes using grammatical evolution neural network. BioData Min. 2013;6:23. doi: 10.1186/1756-0381-6-23.
  • 25. Kim D, Li R, Dudek SM, Wallace JR, Ritchie MD. Binning somatic mutations based on biological knowledge for predicting survival: an application in renal cell carcinoma. Pac Symp Biocomput. 2015;20:96–107.
  • 26. Kim D, Shin H, Joung JG, Lee SY, Kim JH. Intra-relation reconstruction from inter-relation: miRNA to gene expression. BMC Syst Biol. 2013. doi: 10.1186/1752-0509-7-S3-S8.
  • 27. Kim D, Shin H, Sohn KA, Verma A, Ritchie MD, Kim JH. Incorporating inter-relationships between different levels of genomic data into cancer clinical outcome prediction. Methods. 2014;67:344–353. doi: 10.1016/j.ymeth.2014.02.003.
  • 28. Kim D, Shin H, Song YS, Kim JH. Synergistic effect of different levels of genomic data for cancer clinical outcome prediction. J Biomed Inform. 2012;45:1191–1198. doi: 10.1016/j.jbi.2012.07.008.
  • 29. Lassus H, Sihto H, Leminen A, Joensuu H, Isola J, Nupponen NN, Butzow R. Gene amplification, mutation, and protein expression of EGFR and mutations of ERBB2 in serous ovarian carcinoma. J Mol Med (Berl). 2006;84:671–681. doi: 10.1007/s00109-006-0054-4.
  • 30. Li B, Ruotti V, Stewart RM, Thomson JA, Dewey CN. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics. 2010;26:493–500. doi: 10.1093/bioinformatics/btp692.
  • 31. Lussier YA, Li H. Breakthroughs in genomics data integration for predicting clinical outcome. J Biomed Inform. 2012;45:1199–1201. doi: 10.1016/j.jbi.2012.10.003.
  • 32. Müller M. Goodness-of-fit criteria for survival data. Sonderforschungsbereich Paper. 2004;382.
  • 33. Motsinger-Reif AA, Dudek SM, Hahn LW, Ritchie MD. Comparison of approaches for machine-learning optimization of neural networks for detecting gene-gene interactions in genetic epidemiology. Genet Epidemiol. 2008;32:325–340. doi: 10.1002/gepi.20307.
  • 34. Nizialek EA, Peterson C, Mester JL, Downes-Kelly E, Eng C. Germline and somatic KLLN alterations in breast cancer dysregulate G2 arrest. Hum Mol Genet. 2013;22:2451–2461. doi: 10.1093/hmg/ddt097.
  • 35. Noushmehr H, Weisenberger DJ, Diefes K, Phillips HS, Pujara K, Berman BP, Pan F, Pelloski CE, Sulman EP, Bhat KP, Verhaak RG, Hoadley KA, Hayes DN, Perou CM, Schmidt HK, Ding L, Wilson RK, Van Den Berg D, Shen H, Bengtsson H, Neuvial P, Cope LM, Buckley J, Herman JG, Baylin SB, Laird PW, Aldape K. Identification of a CpG island methylator phenotype that defines a distinct subgroup of glioma. Cancer Cell. 2010;17:510–522. doi: 10.1016/j.ccr.2010.03.017.
  • 36. Polyak K. Heterogeneity in breast cancer. J Clin Invest. 2011;121:3786–3788. doi: 10.1172/JCI60534.
  • 37. Revillion F, Bonneterre J, Peyrat JP. ERBB2 oncogene in human breast cancer and its clinical significance. Eur J Cancer. 1998;34:791–808. doi: 10.1016/s0959-8049(97)10157-5.
  • 38. Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF, Moore JH. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet. 2001;69:138–147. doi: 10.1086/321276.
  • 39. Ritchie MD, Holzinger ER, Li R, Pendergrass SA, Kim D. Methods of integrating data to uncover genotype-phenotype interactions. Nat Rev Genet. 2015;16:85–97. doi: 10.1038/nrg3868.
  • 40. Ritchie MD, White BC, Parker JS, Hahn LW, Moore JH. Optimization of neural network architecture using genetic programming improves detection and modeling of gene-gene interactions in studies of human diseases. BMC Bioinformatics. 2003;4:28. doi: 10.1186/1471-2105-4-28.
  • 41. Shah NH, Tenenbaum JD. The coming age of data-driven medicine: translational bioinformatics' next frontier. J Am Med Inform Assoc. 2012;19:e2–4. doi: 10.1136/amiajnl-2012-000969.
  • 42. Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, Delaney A, Gelmon K, Guliany R, Senz J, Steidl C, Holt RA, Jones S, Sun M, Leung G, Moore R, Severson T, Taylor GA, Teschendorff AE, Tse K, Turashvili G, Varhol R, Warren RL, Watson P, Zhao Y, Caldas C, Huntsman D, Hirst M, Marra MA, Aparicio S. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature. 2009;461:809–813. doi: 10.1038/nature08489.
  • 43. Shah SP, Roth A, Goya R, Oloumi A, Ha G, Zhao Y, Turashvili G, Ding J, Tse K, Haffari G, Bashashati A, Prentice LM, Khattra J, Burleigh A, Yap D, Bernard V, McPherson A, Shumansky K, Crisan A, Giuliany R, Heravi-Moussavi A, Rosner J, Lai D, Birol I, Varhol R, Tam A, Dhalla N, Zeng T, Ma K, Chan SK, Griffith M, Moradian A, Cheng SW, Morin GB, Watson P, Gelmon K, Chia S, Chin SF, Curtis C, Rueda OM, Pharoah PD, Damaraju S, Mackey J, Hoon K, Harkins T, Tadigotla V, Sigaroudinia M, Gascard P, Tlsty T, Costello JF, Meyer IM, Eaves CJ, Wasserman WW, Jones S, Huntsman D, Hirst M, Caldas C, Marra MA, Aparicio S. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012;486:395–399. doi: 10.1038/nature10933.
  • 44. Sohn KA, Kim D, Lim J, Kim JH. Relative impact of multi-layered genomic data on gene expression phenotypes in serous ovarian tumors. BMC Syst Biol. 2013. doi: 10.1186/1752-0509-7-S6-S9.
  • 45. Srinivasan S, Patric IR, Somasundaram K. A ten-microRNA expression signature predicts survival in glioblastoma. PLoS One. 2011;6:e17438. doi: 10.1371/journal.pone.0017438.
  • 46. TCGA Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474:609–615. doi: 10.1038/nature10166.
  • 47. TCGA Network. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61–70. doi: 10.1038/nature11412.
  • 48. Tegze B, Szallasi Z, Haltrich I, Penzvalto Z, Toth Z, Liko I, Gyorffy B. Parallel evolution under chemotherapy pressure in 29 breast cancer cell lines results in dissimilar mechanisms of resistance. PLoS One. 2012;7:e30804. doi: 10.1371/journal.pone.0030804.
  • 49. Therneau TM, Grambsch PM, Fleming TR. Martingale-Based Residuals for Survival Models. Biometrika. 1990;77:147–160.
  • 50. Tibes R, Qiu Y, Lu Y, Hennessy B, Andreeff M, Mills GB, Kornblau SM. Reverse phase protein array: validation of a novel proteomic technology and utility for analysis of primary leukemia specimens and hematopoietic stem cells. Mol Cancer Ther. 2006;5:2512–2521. doi: 10.1158/1535-7163.MCT-06-0334.
  • 51. Turner SD, Dudek SM, Ritchie MD. ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci. BioData Min. 2010;3:5. doi: 10.1186/1756-0381-3-5.
  • 52. van 't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530–536. doi: 10.1038/415530a.
  • 53. Waldman SA, Hyslop T, Schulz S, Barkun A, Nielsen K, Haaf J, Bonaccorso C, Li Y, Weinberg DS. Association of GUCY2C expression in lymph nodes with time to recurrence and disease-free survival in pN0 colorectal cancer. JAMA. 2009;301:745–752. doi: 10.1001/jama.2009.141.
  • 54. Wang Y, He X, Yu Q, Eng C. Androgen receptor-induced tumor suppressor, KLLN, inhibits breast cancer growth and transcriptionally activates p53/p73-mediated apoptosis in breast carcinomas. Hum Mol Genet. 2013;22:2263–2272. doi: 10.1093/hmg/ddt077.
  • 55. Yasrebi H. SurvJamda: an R package to predict patients' survival and risk assessment using joint analysis of microarray gene expression data. Bioinformatics. 2011;27:1168–1169. doi: 10.1093/bioinformatics/btr103.
  • 56. Zhao Q, Shi X, Xie Y, Huang J, Shia B, Ma S. Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. Brief Bioinform. 2014. doi: 10.1093/bib/bbu003.
  • 57. Zhu ZH, Sun BY, Ma Y, Shao JY, Long H, Zhang X, Fu JH, Zhang LJ, Su XD, Wu QL, Ling P, Chen M, Xie ZM, Hu Y, Rong TH. Three immunomarker support vector machines-based prognostic classifiers for stage IB non-small-cell lung cancer. J Clin Oncol. 2009;27:1091–1099. doi: 10.1200/JCO.2008.16.6991.
