Abstract
Introduction
Although remarkable progress has been made in determining the prognosis of patients with colorectal cancer (CRC), current methods remain inadequate for identifying the subset of high-risk TNM stage II and stage III patients who are likely to develop tumor recurrence and may die of the disease. In this study, we aimed to develop a biomarker-based prognostic signature for the clinical outcome of patients with stage II and stage III CRC.
Materials and methods
We performed a systematic and comprehensive discovery step to identify recurrence-associated genes in CRC patients using the publicly available GSE41258 (n=253) and GSE17536 (n=107) datasets. We subsequently determined the prognostic relevance of candidate genes in stage II and III patients and developed a triple-biomarker model for predicting recurrence-free survival (RFS) in GSE17536, which was later validated in an independent cohort, the GSE33113 dataset (n=90).
Results
Based upon mRNA expression profiling studies, we identified 45 genes that were differentially expressed in recurrent vs non-recurrent CRC patients. Using Cox proportional hazard models, we then developed a triple-marker model (THBS2, SERPINE1, and FN1) to predict prognosis in GSE17536, which successfully identified patients with poor prognosis in stage II and stage III CRC, particularly high-risk stage II patients.
Discussion
Notably, we found that our triple-marker model once again predicted recurrence in stage II patients in GSE33113. Kaplan–Meier survival analysis demonstrated that patients with high scores have a poor outcome compared to those with low scores. Our triple-marker model is a reliable predictive tool for determining prognosis in CRC patients with stage II and stage III, and might be able to identify high-risk patients that are candidates for more targeted personalized clinical management and surveillance.
Keywords: colorectal cancer, triple-biomarker model, metastasis, retrospective study
Introduction
Colorectal cancer (CRC) is the third most common cancer worldwide. Although 60% of TNM stage II and stage III patients present with resectable disease at the time of diagnosis, ~50% of such patients who undergo curative surgery, or 20% of those who are treated post-surgically with adjuvant chemotherapy, eventually relapse and develop metastatic disease.1–3 This clinical challenge indicates that the current TNM staging system is inadequate for predicting the risk of tumor recurrence, leading to potential under- or over-treatment of a subset of patients with colorectal cancer.
Currently, 5-fluorouracil (5FU)-based adjuvant chemotherapy remains the standard treatment for stage III CRC patients and some high-risk stage II CRC patients, and it improves survival rates by ~20%.4,5 Among stage III patients, 30%–40% do not experience recurrence within 5 years even when left untreated, while about 40% of patients who receive adjuvant treatment still relapse and eventually die, suggesting that such subsets of patients need more intensive chemotherapy. Among stage II patients, on the other hand, only those who present with high-risk clinical features receive adjuvant chemotherapy. Unfortunately, about 20% of clinically “low-risk” patients experience tumor recurrence.6,7 Collectively, these findings highlight an urgent need for novel and robust prognostic biomarkers that can guide treatment decisions in patients with stage II and stage III CRC.
The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community. By integrating mRNA expression profiles with clinical outcomes, novel prognostic biomarkers for stage II and stage III CRC patients can be obtained. In this study, we performed a systematic and comprehensive identification of recurrence-specific genes that are differentially expressed between recurrent and non-recurrent tumors, and then determined their combined ability to predict recurrence-free survival by analyzing their expression in multiple independent cohorts of patients with CRC.
Materials and methods
Public datasets
We used Affymetrix microarray datasets that are publicly available in the GEO database together with the clinical information reported in the original studies.8 The GSE41258 dataset consists of colorectal cancer patients with liver metastasis or lung metastasis.9 The biological specimens we used in this study included liver metastasis (n=47), lung metastasis (n=20), and primary colon adenocarcinomas (n=186). The GSE17536 dataset included 177 patients with CRC collected at the Moffitt Cancer Center (Tampa, FL, USA), and it was originally used to define a molecular classification.10,11 Since our study focused on stage II and stage III patients, only such patients were selected (stage II n=52, stage III n=55). The GSE33113 dataset included a set of 90 American Joint Committee on Cancer (AJCC) stage II patients who underwent intentionally curative surgery at the Academic Medical Center (AMC) in Amsterdam, the Netherlands.12 All datasets contain complete clinical information for the differential gene expression and recurrence-free survival (RFS) analyses (Table 1).
Table 1.
GEO datasets | Clinicopathological characteristics |
---|---|
GSE41258 | This study consisted of patients who presented at Memorial Sloan-Kettering Cancer Center with a colonic neoplasm between 1992 and 2004. Biological specimens used in this study included primary colon adenocarcinomas, adenomas, metastasis, and corresponding normal mucosae. |
GSE17536 | A total of 55 colorectal cancer patients from Vanderbilt Medical Center (VMC) were used as the training dataset and 177 patients from the Moffitt Cancer Center were used as the independent dataset. |
GSE33113 | Primary tumor resections from 90 AJCC stage II CRC patients, that underwent intentionally curative surgery, and matching normal colon tissue from six of these patients were included in the study (1997–2006 [AMC-AJCCII-90]). Extensive medical records were kept from these patients and long-term clinical follow-up is available for the large majority. |
Abbreviations: AJCC, American Joint Committee on Cancer; CRC, colorectal cancer; AMC, Academic Medical Center.
Discovery of differentially expressed genes
Differentially expressed genes were identified in three comparisons: lung metastasis vs primary cancer tissues, liver metastasis vs primary cancer tissues, and recurrent vs non-recurrent tumors. All comparisons were performed with GEO2R, which compares original submitter-supplied processed data tables using the GEOquery and limma R packages from the Bioconductor project. Genes with an adjusted P<0.05 (Benjamini & Hochberg false discovery rate) were considered differentially expressed.
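The Benjamini & Hochberg adjustment behind the adjusted P<0.05 threshold can be sketched in a few lines. This is a minimal illustration in plain Python, not the GEO2R or limma implementation; the function name `benjamini_hochberg` and the probe names and raw P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the tests, sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so that adjusted values
    # never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five probes (illustration only).
raw = {"probe_a": 0.001, "probe_b": 0.008, "probe_c": 0.039,
       "probe_d": 0.041, "probe_e": 0.60}
names = list(raw)
adj = benjamini_hochberg([raw[n] for n in names])
significant = [n for n, q in zip(names, adj) if q < 0.05]
print(significant)
```

In this toy input, two probes with raw P-values just under 0.05 no longer pass once the correction is applied, which is the behaviour the adjusted threshold is meant to enforce.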
Pathway analysis
Enrichr pathway analysis was used for functional annotation of recurrence-associated genes.
Statistical analysis
All statistical analyses were performed using MedCalc version 12.3 or GraphPad Prism version 6.0. We constructed receiver operating characteristic (ROC) curves and calculated the area under the ROC curve (AUC) to evaluate the power of candidate genes to predict prognosis in CRC patients. For the RFS analysis, we defined RFS as the probability that patients remained free of tumor recurrence, with recurrence counted as the first event. Data were analyzed from the date of surgery to the time of the first event or the date on which data were censored, according to the Kaplan–Meier method, and the curves were compared using the log-rank test. To develop the triple-marker model and assess patient survival, we used Cox’s proportional hazard regression models and obtained a risk score derived from this prediction model. We categorized patients into high-score and low-score groups based on the median cutoff value. All P-values were 2-sided, and those <0.05 were considered statistically significant.
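As a rough illustration of the Kaplan–Meier method named above, the product-limit estimate can be written in plain Python. This is a sketch under stated assumptions rather than the MedCalc or GraphPad Prism procedure; the function name `kaplan_meier` and the follow-up times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- follow-up time per patient (e.g. months from surgery)
    events -- 1 if recurrence was observed, 0 if the patient was censored
    """
    # Distinct times at which at least one recurrence occurred.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        # Product-limit step: multiply by the fraction surviving time t.
        survival *= 1.0 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data for six patients (illustration only).
times = [5, 8, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the at-risk count up to their last follow-up but never trigger a step in the curve, which is why the estimate only changes at observed recurrence times.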
Results
Identification of candidate genes for CRC recurrence
The metastatic spread of tumor cells is one of the most common causes of recurrence in colorectal cancer patients. Elucidating the specific gene expression patterns of metastatic colonies may provide useful insights for the development of recurrence markers. The GSE41258 dataset includes gene expression microarray data from primary colon adenocarcinomas, liver metastasis, and lung metastasis tissues. To find metastatic-specific markers, we initially compared gene expression profiles between primary cancer tissues and tissues from liver or lung metastatic sites, as indicated in the flow chart of the study design (Figure 1). The lung metastasis vs primary comparison revealed 7,084 differentially expressed genes (adjusted P-value <0.05), while the liver metastasis vs primary comparison showed 10,502 differentially expressed genes (adjusted P-value <0.05). More importantly, we found 3,501 genes that overlapped between the two comparisons, suggesting that they play an important role in tumor metastasis and are potential targets for recurrence prediction.
To confirm our assumption that these metastatic-specific markers could serve as recurrence prediction biomarkers, we used a testing cohort (GSE17536) restricted to stage II and III patients. We compared gene expression profiles in tissues from patients with or without recurrence. The comparison showed 298 differentially expressed genes (adjusted P-value <0.05), and 45 of these genes overlapped with the above metastatic-specific markers (Figure 2A). Pathway enrichment analysis showed that these genes are mainly involved in the inflammatory response,13,14 focal adhesion, the epidermal growth factor (EGF)/epidermal growth factor receptor (EGFR) pathway, extracellular matrix (ECM), and membrane receptors, implying that these genes might be important for recurrent colorectal tumors to acquire metastatic capacity (Figure 2B). To narrow down this list further, we selected the ten most differentially expressed genes according to fold change (CYP1B1, ITGBL1, THBS2, VCAN, BGN, SERPINE1, ECM2, TWIST1, FN1, and CAV2), all of which were significantly up-regulated in recurrent tumors compared to non-recurrent tumors (Figure 2C), implicating their potential relevance in determining the clinical outcome of stage II and III CRC patients.
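The overlap-then-rank step described above amounts to a set intersection followed by a sort on fold change. The sketch below is illustrative only: the helper name `shortlist`, the placeholder gene names, and the log fold-change values are hypothetical and do not reproduce the study data.

```python
def shortlist(metastasis_genes, recurrence_log_fc, top_n):
    """Intersect two gene lists, then keep the top_n up-regulated genes.

    metastasis_genes  -- genes differentially expressed in metastasis vs primary
    recurrence_log_fc -- dict of gene -> log fold change (recurrent vs non-recurrent)
    """
    overlap = set(metastasis_genes) & set(recurrence_log_fc)
    # Keep only genes that are up-regulated in recurrent tumors.
    up = [g for g in overlap if recurrence_log_fc[g] > 0]
    # Largest fold change first; gene name breaks ties deterministically.
    ranked = sorted(up, key=lambda g: (-recurrence_log_fc[g], g))
    return ranked[:top_n]


# Hypothetical inputs (illustration only).
metastasis_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
recurrence_log_fc = {"GENE_B": 1.2, "GENE_C": 2.0, "GENE_D": -0.5, "GENE_E": 3.0}
top = shortlist(metastasis_genes, recurrence_log_fc, top_n=2)
print(top)
```

Here `GENE_E` has the largest fold change but is dropped because it is absent from the metastasis list, mirroring the requirement that candidates appear in both analyses.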
Development of a triple-biomarker model to predict RFS in stage II and III patients
We subsequently performed ROC analysis of the top 10 candidates to evaluate how accurately each discriminated recurrent from non-recurrent CRC in the GSE17536 dataset. As shown in Figure 3, each candidate showed good prediction power, with AUC values ranging from 0.694 to 0.788. Moreover, three genes, THBS2, SERPINE1, and FN1, demonstrated higher AUC values than the other genes. We therefore combined these three biomarkers to improve the prediction ability. Using Cox regression, we built a recurrence prediction model based on this triple-biomarker. As expected, our triple-marker model improved on the prediction ability of the individual genes (AUC=0.813; Figure 3). Importantly, the AUC values of the three single genes did not differ significantly from one another, but each differed significantly from that of the three-gene biomarker panel (P<0.05).
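For readers less familiar with the AUC, it equals the probability that a randomly chosen recurrent patient has a higher marker value than a randomly chosen non-recurrent patient. The sketch below computes it by direct pairwise comparison; the function name `roc_auc` and the scores and labels are hypothetical, and this is not the MedCalc routine used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores -- marker value or risk score per patient
    labels -- 1 for recurrence, 0 for no recurrence
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical marker values for five patients (illustration only).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives context for the 0.694 to 0.813 range reported here.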
Performance evaluation of the triple-biomarker model in the testing cohort
To test whether our triple-biomarker model could identify patients at high or low risk of poor outcomes, we calculated the risk score of each patient based on the Cox regression model. We divided patients into high-score and low-score groups based on the cutoff value (the median of all patients’ risk scores). Notably, the high-score group had a worse prognosis than the low-score group (HR=5.41, P=0.0004; Figure 4A). As mentioned previously, it is of clinical relevance to identify stage II patients at high risk. Accordingly, when we split stage II patients into low- and high-score groups, our triple-marker model clearly showed that stage II patients with higher risk scores had a poorer prognosis than those with lower risk scores (HR=3.53, P=0.0245; Figure 4B). Surprisingly, when we compared high-score stage II patients with stage III patients, the two groups yielded similar survival curves, suggesting that our triple-marker model is able to identify a high-risk stage II group with the same prognosis as the stage III group. Collectively, these results indicate that our newly developed triple-marker model could successfully segregate high- vs low-risk patients with stage II and stage III disease.
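The risk score and median split described above can be sketched as a weighted sum of the three marker expression values followed by a cutoff at the cohort median. The coefficients below are hypothetical placeholders, since the fitted Cox coefficients are not reported in the text, and the patient expression values, the function names, and the tie rule (scores equal to the median fall in the low-score group) are likewise assumptions.

```python
from statistics import median

# Hypothetical Cox coefficients: the fitted values are not given in the text.
COEFFICIENTS = {"THBS2": 0.5, "SERPINE1": 0.3, "FN1": 0.2}


def risk_score(expression):
    """Linear predictor: sum of coefficient * expression over the three markers."""
    return sum(COEFFICIENTS[g] * expression[g] for g in COEFFICIENTS)


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values for four patients (illustration only).
patients = {
    "p1": {"THBS2": 8, "SERPINE1": 7, "FN1": 10},
    "p2": {"THBS2": 6, "SERPINE1": 5, "FN1": 9},
    "p3": {"THBS2": 9, "SERPINE1": 8, "FN1": 11},
    "p4": {"THBS2": 5, "SERPINE1": 6, "FN1": 8},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
print(groups)
```

Because the cutoff is the median of the cohort being scored, the two groups are roughly equal in size by construction, and the cutoff value itself depends on which cohort is used.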
Independent validation of the triple-biomarker model to identify high-risk stage II patients
To further confirm the results obtained for the triple markers in the testing cohort, we validated our findings in an independent cohort of 90 stage II CRC patients (GSE33113). We again calculated the risk score of each patient based on the triple-marker regression model and divided all patients into low- and high-score groups according to the median cutoff value. Consistent with our previous results, our triple markers once again showed good predictive performance in stage II patients (Figure 5A). Furthermore, Kaplan–Meier survival analysis demonstrated that patients with high scores had a poorer outcome than those with low scores (HR=4.34, P=0.0046; Figure 5B), highlighting that our triple-marker model is a promising and reliable prognostic tool for identifying high-risk stage II patients, which has important implications for their clinical management.
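The comparison of survival curves between high- and low-score groups relies on the log-rank test. A minimal two-group version is sketched below in plain Python as an illustration, not as the software routine used in the study; the function name `logrank_test` and the follow-up data are hypothetical, and the P-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 df."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)] +
              [(t, e, 1) for t, e in zip(times_b, events_b)])
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Numbers at risk and numbers of events in each group at time t.
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)
        n_b = sum(1 for u, _, g in pooled if u >= t and g == 1)
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        d_b = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        # Observed minus expected events in group A under the null hypothesis.
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the test is undefined for these data")
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical data: early recurrences in one group, none in the other.
high_times, high_events = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
low_times, low_events = [10, 10, 10, 10, 10], [0, 0, 0, 0, 0]
chi2, p = logrank_test(high_times, high_events, low_times, low_events)
print(round(chi2, 2), p < 0.05)
```

The test compares observed and expected recurrences at every event time, so it uses the whole follow-up period rather than survival at a single time point.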
Discussion
In this study, we first performed a systematic discovery step, followed by development and validation of a novel triple-marker model (THBS2, SERPINE1, and FN1) aimed at predicting clinical outcomes for stage II and stage III CRC patients. Through successive discovery, testing, and validation steps, we provide data showing that our triple markers could successfully identify risk in CRC patients, particularly stage II patients, with good predictive performance.
Based on the GSE41258 dataset, we first identified metastatic-specific markers, finding 3,501 overlapping genes in both comparison groups (lung metastasis vs primary cancer and liver metastasis vs primary cancer), suggesting that these genes may serve as recurrence biomarkers. Using a testing cohort (GSE17536), we found 45 genes that overlapped with the metastatic-specific markers and were differentially expressed in tissues from patients with recurrence. More importantly, these 45 genes were involved in metastasis-related processes such as the inflammatory response, focal adhesion, the EGF/EGFR pathway, ECM, and membrane receptors. To narrow down the candidates, we selected the top 10 most differentially expressed genes (CYP1B1, ITGBL1, THBS2, VCAN, BGN, SERPINE1, ECM2, TWIST1, FN1, and CAV2) according to fold change. When we evaluated the power of each gene to discriminate recurrence from non-recurrence by ROC analysis, we found that THBS2, SERPINE1, and FN1 showed the highest AUC values. Therefore, we selected these genes to constitute a triple-marker model to predict RFS in stage II and stage III patients.
The biological functions of the genes selected for our triple-marker model have been investigated previously. Thrombospondin 2 (THBS2) is a multifunctional glycoprotein released from various types of cells.15 THBS2 contributes to carcinogenesis by exerting diverse biological effects on angiogenesis, cell motility, apoptosis, and cytoskeletal organization through binding to ECM proteins and cell surface receptors.16–19 Notably, THBS2 is known to activate transforming growth factor-β1 (TGF-β1) signaling, which promotes metastasis.20 A recent study showed that overexpression of THBS2 correlated with poor overall survival (OS) and RFS in CRC patients, which is consistent with our results.21 SERPINE1 expression has been shown to be associated with tumor cell migration and invasion through activation of the PI3K-Akt pathway.22–24 Furthermore, the pro-migratory effect of SERPINE1 has been associated with its interaction with LRP1, which in turn stimulates the Jak/Stat pathway.25 SERPINE1 may also contribute to tumor aggressiveness by promoting tumor angiogenesis.26,27 FN1 has long been considered an epithelial–mesenchymal transition (EMT) marker and is associated with angiogenesis and metastasis.28,29 Genes with lower AUC values may significantly complement the model; however, we preferred to select biomarkers that are up-regulated in recurrent tumors compared to non-recurrent tumors, such as THBS2, SERPINE1, and FN1. Taken together, our triple markers play important biological roles in tumor metastasis, which supports their clinical application in predicting recurrence in CRC patients.
We thereafter built a Cox regression model based on these triple markers. In GSE17536, high-score patients had a worse prognosis than patients in the low-score group. Considering the clinical importance of identifying high-risk stage II patients, we tested whether our triple markers could identify them. Accordingly, when we split stage II patients into low- and high-score groups based on our triple markers, the model clearly showed that stage II patients with higher risk scores had a poorer prognosis than those with lower risk scores. Notably, the high-risk stage II patients and stage III patients yielded similar survival curves. To further confirm the results obtained for the triple markers in the testing cohort, we validated our findings in an independent cohort of 90 stage II CRC patients (GSE33113). In agreement with our earlier results, patients with high scores had poorer outcomes than those with low scores, suggesting that our triple-marker model is a reliable prognostic tool for identifying high-risk stage II patients, which has important implications for their clinical management.
Limitations
In regard to potential limitations, our current study is retrospective in nature, and our results must be validated in future, prospective, multi-center clinical trials. In addition, some of the clinical parameters such as vascular invasion or number of analyzed lymph nodes were not recorded or evaluated in GEO datasets, which may be easier to address in a future well-defined patient cohort.
Conclusion
We provide compelling evidence that our newly developed triple-marker model can effectively stratify stage II and III CRC patients into high- and low-risk groups based upon clinical outcomes, thereby adding significant prognostic value to the clinicopathological risk factors currently used for this purpose. If validated in future studies, such a triple-marker model potentially offers tremendous clinical value in directing personalized treatment regimens and clinical management of patients with stage II and III CRC.
Footnotes
Disclosure
The authors report no conflicts of interest in this work.
References
- 1. Obrand DI, Gordon PH. Incidence and patterns of recurrence following curative resection for colorectal carcinoma. Dis Colon Rectum. 1997;40(1):15–24. doi: 10.1007/BF02055676.
- 2. O’Connell MJ, Campbell ME, Goldberg RM, et al. Survival following recurrence in stage II and III colon cancer: findings from the ACCENT data set. J Clin Oncol. 2008;26(14):2336–2341. doi: 10.1200/JCO.2007.15.8261.
- 3. André T, Boni C, Navarro M, et al. Improved overall survival with oxaliplatin, fluorouracil, and leucovorin as adjuvant treatment in stage II or III colon cancer in the MOSAIC trial. J Clin Oncol. 2009;27(19):3109–3116. doi: 10.1200/JCO.2008.20.6771.
- 4. Graham JS, Cassidy J. Adjuvant therapy in colon cancer. Expert Rev Anticancer Ther. 2012;12(1):99–109. doi: 10.1586/era.11.189.
- 5. Carethers JM. Systemic treatment of advanced colorectal cancer: tailoring therapy to the tumor. Therap Adv Gastroenterol. 2008;1(1):33–42. doi: 10.1177/1756283X08093607.
- 6. André T, Boni C, Mounedji-Boudiaf L, et al. Oxaliplatin, fluorouracil, and leucovorin as adjuvant treatment for colon cancer. N Engl J Med. 2004;350(23):2343–2351. doi: 10.1056/NEJMoa032709.
- 7. Kuebler JP, Wieand HS, O’Connell MJ, et al. Oxaliplatin combined with weekly bolus fluorouracil and leucovorin as surgical adjuvant chemotherapy for stage II and III colon cancer: results from NSABP C-07. J Clin Oncol. 2007;25(16):2198–2204. doi: 10.1200/JCO.2006.08.2974.
- 8. Barrett T, Troup DB, Wilhite SE, et al. NCBI GEO: mining tens of millions of expression profiles—database and tools update. Nucleic Acids Res. 2007;35(Database issue):D760–D765. doi: 10.1093/nar/gkl887.
- 9. Sheffer M, Bacolod MD, Zuk O, et al. Association of survival and disease progression with chromosomal instability: a genomic exploration of colorectal cancer. Proc Natl Acad Sci U S A. 2009;106(17):7131–7136. doi: 10.1073/pnas.0902232106.
- 10. Smith JJ, Deane NG, Wu F, et al. Experimentally derived metastasis gene expression profile predicts recurrence and death in patients with colon cancer. Gastroenterology. 2010;138(3):958–968. doi: 10.1053/j.gastro.2009.11.005.
- 11. Freeman TJ, Smith JJ, Chen X, et al. Smad4-mediated signaling inhibits intestinal neoplasia by inhibiting expression of β-catenin. Gastroenterology. 2012;142(3):e562–e571. doi: 10.1053/j.gastro.2011.11.026.
- 12. Kemper K, Versloot M, Cameron K, et al. Mutations in the Ras-Raf axis underlie the prognostic value of CD133 in colorectal cancer. Clin Cancer Res. 2012;18(11):3132–3141. doi: 10.1158/1078-0432.CCR-11-3066.
- 13. Chen EY, Tan CM, Kou Y, et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 2013;14:128. doi: 10.1186/1471-2105-14-128.
- 14. Kuleshov MV, Jones MR, Rouillard AD, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44(W1):W90–W97. doi: 10.1093/nar/gkw377.
- 15. Adams JC, Lawler J. The thrombospondins. Int J Biochem Cell Biol. 2004;36(6):961–968. doi: 10.1016/j.biocel.2004.01.004.
- 16. Iruela-Arispe ML, Luque A, Lee N. Thrombospondin modules and angiogenesis. Int J Biochem Cell Biol. 2004;36(6):1070–1078. doi: 10.1016/j.biocel.2004.01.025.
- 17. Risher WC, Eroglu C. Thrombospondins as key regulators of synaptogenesis in the central nervous system. Matrix Biol. 2012;31(3):170–177. doi: 10.1016/j.matbio.2012.01.004.
- 18. Bornstein P. Thrombospondins function as regulators of angiogenesis. J Cell Commun Signal. 2009;3(3–4):189–200. doi: 10.1007/s12079-009-0060-8.
- 19. Zubor P, Hatok J, Moricova P, et al. Gene expression abnormalities in histologically normal breast epithelium from patients with luminal type of breast cancer. Mol Biol Rep. 2015;42(5):977–988. doi: 10.1007/s11033-014-3834-x.
- 20. Cheon DJ, Tong Y, Sim MS, et al. A collagen-remodeling gene signature regulated by TGF-β signaling is associated with metastasis and poor survival in serous ovarian cancer. Clin Cancer Res. 2014;20(3):711–723. doi: 10.1158/1078-0432.CCR-13-1256.
- 21. Qian Z, Zhang G, Song G, et al. Integrated analysis of genes associated with poor prognosis of patients with colorectal cancer liver metastasis. Oncotarget. 2017;8(15):25500–25512. doi: 10.18632/oncotarget.16064.
- 22. Pavón MA, Arroyo-Solera I, Téllez-Gabriel M, et al. Enhanced cell migration and apoptosis resistance may underlie the association between high SERPINE1 expression and poor outcome in head and neck carcinoma patients. Oncotarget. 2015;6(30):29016–29033. doi: 10.18632/oncotarget.5032.
- 23. Balsara RD, Castellino FJ, Ploplis VA. A novel function of plasminogen activator inhibitor-1 in modulation of the AKT pathway in wild-type and plasminogen activator inhibitor-1-deficient endothelial cells. J Biol Chem. 2006;281(32):22527–22536. doi: 10.1074/jbc.M512819200.
- 24. Langlois B, Perrot G, Schneider C, et al. LRP-1 promotes cancer cell invasion by supporting ERK and inhibiting JNK signaling pathways. PLoS One. 2010;5(7):e11584. doi: 10.1371/journal.pone.0011584.
- 25. Degryse B, Neels JG, Czekay RP, Aertgeerts K, Kamikubo Y, Loskutoff DJ. The low density lipoprotein receptor-related protein is a motogenic receptor for plasminogen activator inhibitor-1. J Biol Chem. 2004;279(21):22595–22604. doi: 10.1074/jbc.M313004200.
- 26. Bajou K, Noël A, Gerard RD, et al. Absence of host plasminogen activator inhibitor 1 prevents cancer invasion and vascularization. Nat Med. 1998;4(8):923–928. doi: 10.1038/nm0898-923.
- 27. Bajou K, Peng H, Laug WE, et al. Plasminogen activator inhibitor-1 protects endothelial cells from FasL-mediated apoptosis. Cancer Cell. 2008;14(4):324–334. doi: 10.1016/j.ccr.2008.08.012.
- 28. Sponziello M, Rosignolo F, Celano M, et al. Fibronectin-1 expression is increased in aggressive thyroid cancer and favors the migration and invasion of cancer cells. Mol Cell Endocrinol. 2016;431:123–132. doi: 10.1016/j.mce.2016.05.007.
- 29. Soikkeli J, Podlasz P, Yin M, et al. Metastatic outgrowth encompasses COL-I, FN1, and POSTN up-regulation and assembly to fibrillar networks regulating cell adhesion, migration, and growth. Am J Pathol. 2010;177(1):387–403. doi: 10.2353/ajpath.2010.090748.