Abstract
Background:
The aim of this study was to predict the survival rate of Iranian gastric cancer patients using the Cox proportional hazard and artificial neural network models, and to compare the ability of these approaches to predict the survival of these patients.
Methods:
In this historical cohort study, data were gathered from 436 registered gastric cancer patients who underwent surgery between 2002 and 2007 at the Taleghani Hospital (a referral center for gastrointestinal cancers), Tehran, Iran. Survival time was predicted using Cox proportional hazard and artificial neural network techniques.
Results:
The estimated one-year, two-year, three-year, four-year and five-year survival rates of the patients were 77.9%, 53.1%, 40.8%, 32.0%, and 17.4%, respectively. The Cox regression analysis revealed that age at diagnosis, high-risk behaviors, extent of wall penetration, distant metastasis and tumor stage were significantly associated with the survival rate of the patients. The proportion of correct predictions was 83.1% for the neural network and 75.0% for the Cox regression model.
Conclusion:
The present study shows that the neural network model is a more powerful statistical tool for predicting the survival rate of gastric cancer patients than the Cox proportional hazard regression model. Therefore, this model is recommended for predicting the survival rate of these patients.
Keywords: Gastric cancer, Survival analysis, Cox regression, Artificial neural network
Introduction
Stomach or gastric cancer (GC) is the malignant growth of stomach cells (1), which can develop in any part of the stomach and may spread throughout the stomach and to other organs, particularly the esophagus, lungs and liver (2). After cancers of the lung, breast, and colorectum, GC is the fourth most commonly occurring cancer worldwide (3). GC has also been reported as the second most common cause of cancer-related death in the world (3, 4).
In the past two decades, because of improvements in hygiene in Iran, deaths from various diseases have declined, but cancer mortality has remained a major health problem for the Iranian population (5, 6).
In recent decades, data analysts have proposed and used a variety of survival methods to describe the relationship between an outcome variable and a set of covariates in different fields of medical science. In this context, conventional survival methods such as Cox proportional hazard modeling are still the most common approach for analyzing these kinds of data sets. However, this model rests on underlying assumptions, including proportionality of hazards and independence of the covariates affecting the hazard rate, that should be checked (7, 8). When these assumptions do not hold, other modeling approaches that do not rely on them, such as artificial neural networks (ANNs), can be used.
Indeed, ANN models attempt to mimic the capabilities of the human brain and are used for prediction and classification, areas where regression models and other related statistical techniques have traditionally been used (9).
In addition, ANN models have been used in medical fields for classification and for prediction of outcome (10).
In this manuscript, we applied an ANN approach to the survival data of 436 patients with stomach cancer. A further aim of this study was to compare the results obtained by the ANN and the traditional Cox regression analyses for predicting the survival rate of these patients.
Materials and Methods
Main outcome and related factors
We analyzed data from 436 patients with stomach cancer who underwent surgery between 2002 and 2007 at the Taleghani Hospital (a referral center for gastrointestinal cancers), Tehran, Iran. This study was conducted by the Research Center for Gastroenterology and Liver Disease of Shahid Beheshti University of Medical Sciences, Tehran, Iran. All patients were diagnosed by endoscopy and/or biopsy. In this historical cohort study, the required information for each patient, including gender, age at diagnosis, familial history, symptoms at diagnosis, high-risk behaviors (such as smoking, drug use and alcohol drinking), lymph node metastasis, extent of wall penetration, histology type, tumor stage and metastasis, was gathered from the patient's registered documents in the Cancer Archive of Taleghani Hospital. Patients whose documents were incomplete or who were treated outside the period from February 2002 to January 2007 were excluded. In addition, patients with a survival time of less than one month were excluded. We also recorded the survival time of each patient (in months) after subtotal or total gastrectomy. The survival status of patients was confirmed by contacting their families by telephone. The staging of the tumors was based on the sixth edition of the TNM system (11). The study protocol was approved by the Ethics Committee of the Research Centre for Gastroenterology and Liver Disease of Shaheed Beheshti Medical University.
Statistical analysis
In the first step of the data analysis, because the outcome is a survival time, preliminary analyses using the Kaplan-Meier method and the log-rank test were carried out. In the next step, to predict the survival rate of the patients, we used more complex statistical methods, namely the Cox proportional hazard (CPH) and ANN models.
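The Kaplan-Meier (product-limit) estimate used in this preliminary step can be illustrated with a short sketch. The function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the study; the code only shows how censored observations leave the risk set without lowering the estimated survival probability.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times  : follow-up times (e.g. months after gastrectomy)
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Toy data (hypothetical, not from the study): months and status.
toy_times = [6, 12, 12, 20, 30, 30, 45, 60]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for month, prob in toy_curve:
    print(month, round(prob, 3))
```

Reading the curve at 12, 24, 36, 48 and 60 months gives the kind of one- to five-year survival probabilities reported in the Results.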
In the modeling process, we randomly divided the data into two subsets: 300 patients for constructing the models (training subset) and the remaining 136 patients for validating the models (testing subset). After validation, we used the receiver operating characteristic (ROC) curve as well as the concordance index to compare the predictive power of the two models. The ANN was a three-layer back-propagation neural network with 10 nodes in the input layer, 7 nodes in the hidden layer and one node in the output layer. Since each patient’s status is a binary response variable (dead or censored), the sigmoid function was used as the activation function in the hidden and output layers. The network was trained with the back-propagation learning algorithm, using a learning rate of 0.05 and a momentum of 0.9. Training was stopped when the average error (mean square error) on the training set fell to 0.0001. The Cox regression model was fitted using the backward selection method (with a significance level of 0.10 for entry and 0.15 for removal).
For data analysis, we used SPSS version 16.0 and MATLAB version 7.1.
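The network described above (10 input nodes, 7 hidden nodes, one output node, sigmoid activations, learning rate 0.05, momentum 0.9, stopping at a mean square error of 0.0001) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in the study: the per-pattern (online) weight updates, the weight initialization range, the cap of 200 epochs and the synthetic data are all assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_out, n_in, rng):
    # One weight row per node; the last entry of each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(x, w_hid, w_out):
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
    y = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return h, y


def train(X, T, n_hidden=7, lr=0.05, momentum=0.9,
          target_mse=1e-4, max_epochs=200, seed=0):
    rng = random.Random(seed)
    n_in = len(X[0])
    w_hid = init_layer(n_hidden, n_in, rng)
    w_out = init_layer(1, n_hidden, rng)[0]
    v_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    v_out = [0.0] * (n_hidden + 1)
    mse = float("inf")
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, t in zip(X, T):
            h, y = forward(x, w_hid, w_out)
            sq_err += (t - y) ** 2
            # Error signals (deltas) for squared error with sigmoid units.
            d_out = (y - t) * y * (1.0 - y)
            d_hid = [d_out * w_out[j] * h[j] * (1.0 - h[j])
                     for j in range(n_hidden)]
            hb, xb = h + [1.0], list(x) + [1.0]
            # Momentum update: new step = momentum * old step - lr * gradient.
            for j in range(n_hidden + 1):
                v_out[j] = momentum * v_out[j] - lr * d_out * hb[j]
                w_out[j] += v_out[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    v_hid[j][i] = momentum * v_hid[j][i] - lr * d_hid[j] * xb[i]
                    w_hid[j][i] += v_hid[j][i]
        # Average squared error over the epoch, measured before each update.
        mse = sq_err / len(X)
        if mse <= target_mse:  # stopping rule described in the text
            break
    return w_hid, w_out, mse


# Synthetic, hypothetical data: 60 "patients" with 10 binary covariates.
data_rng = random.Random(1)
X = [[float(data_rng.randint(0, 1)) for _ in range(10)] for _ in range(60)]
T = [1.0 if x[0] + x[1] >= 1.0 else 0.0 for x in X]
w_hid, w_out, final_mse = train(X, T)
_, example_prob = forward(X[0], w_hid, w_out)
```

The output `example_prob` is a value between 0 and 1 that can be thresholded (for example at 0.5, an assumed cut-off) to classify a patient as dead or survived.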
Results
As described above, the study sample consisted of 436 GC patients, including 315 men (72.2%) and 121 women (27.8%). The mean±SD age of the patients was 58.43±13.02 yr. Table 1 shows the other characteristics of the patients under study.
Table 1.
General characteristics of the patients
Characteristic | Category | No. | Percent |
---|---|---|---|
Age at diagnosis (yr) | <=50 | 128 | 29.4 |
Age at diagnosis (yr) | >50 | 308 | 70.6 |
Familial history | Positive | 122 | 28.0 |
Familial history | Negative | 314 | 72.0 |
Symptoms at diagnosis | Positive | 323 | 74.1 |
Symptoms at diagnosis | Negative | 113 | 25.9 |
High-risk behaviors | Positive | 198 | 45.4 |
High-risk behaviors | Negative | 238 | 54.6 |
Lymph nodes metastasis* | N1 | 115 | 26.4 |
Lymph nodes metastasis* | N2 | 227 | 52.1 |
Lymph nodes metastasis* | N3 | 58 | 13.3 |
Extent of wall penetration** | T1 | 43 | 9.9 |
Extent of wall penetration** | T2 | 70 | 16.1 |
Extent of wall penetration** | T3 | 211 | 48.4 |
Extent of wall penetration** | T4 | 92 | 21.1 |
Distant metastasis | Presence | 71 | 16.3 |
Distant metastasis | Absence | 322 | 73.8 |
Histology type | Adenocarcinoma | 322 | 73.8 |
Histology type | Others | 114 | 26.2 |
Tumor stage | Early | 73 | 16.7 |
Tumor stage | Advanced | 319 | 73.2 |
*N1: Tumor cells spread to closest or small number of regional lymph nodes; N2: Tumor cells spread to an extent between N1 and N3; N3: Tumor cells spread to most distant or numerous regional lymph nodes.
**T1: Tumor invades lamina propria or submucosa; T2: Tumor invades the muscularis propria or the subserosa; T3: Tumor penetrates the serosa (visceral peritoneum) without invading adjacent structures; T4: Tumor invades adjacent structures.
In the first step of the modeling process, the data were divided into training and testing subsets. The log-rank (Mantel-Cox) test showed that the estimated survival curves of the training and testing subsets did not differ significantly (P= 0.674). Based on the Kaplan-Meier method, the mean and median survival times in the training subset were 32.86 and 28.3 mo, respectively. In addition, the 1–5 yr survival probabilities in the training subset were 0.76, 0.53, 0.35, 0.28, and 0.18, respectively. The corresponding probabilities in the testing subset were 0.78, 0.51, 0.41, 0.36 and 0.12. Table 2 shows more details.
Table 2:
1–5 year survival probability (±SE) of GC patients in training and testing sets and in total
Follow-up period | Testing Set: n=136 (Prob.±SE) | Training Set: n=300 (Prob.±SE) | Total Data: n=436 (Prob.±SE) |
---|---|---|---|
One-year | 0.78 ± 0.037 | 0.76 ± 0.029 | 0.78 ± 0.023 |
Two-year | 0.51 ± 0.055 | 0.53 ± 0.039 | 0.53 ± 0.032 |
Three-year | 0.41 ± 0.08 | 0.35 ± 0.048 | 0.41 ± 0.038 |
Four-year | 0.36 ± 0.078 | 0.28 ± 0.056 | 0.32 ± 0.044 |
Five-year | 0.12 ± 0.075 | 0.18 ± 0.064 | 0.17 ± 0.050 |
To assess the effect of different risk indicators on the survival of the GC patients, we fitted a Cox regression model to the data using the backward selection method. Table 3 shows the results. These findings indicated that age at diagnosis (P= 0.038), high-risk behaviors (P= 0.070), extent of wall penetration (P= 0.076), distant metastasis (P= 0.003), and tumor stage (P= 0.017) had a statistically significant relationship with the survival of the GC patients (the P-values for high-risk behaviors and extent of wall penetration exceed 0.05 but fall below the removal threshold of 0.15 used in the backward selection). This analysis revealed no significant relationship between patients’ survival and sex, familial history, symptoms at diagnosis, lymph node metastasis or histology type.
Table 3:
CPH modeling results for assessing the effect of prognostic factors on GC patients’ survival
Prognostic factor | Category | P-value | HR (95% CI) |
---|---|---|---|
Age at diagnosis (yrs) | | 0.038 | 1.471 (1.021 – 2.118) |
High-risk behaviors | Absent | Reference category | |
High-risk behaviors | Present | 0.070 | 1.364 (0.976 – 1.908) |
Extent of wall penetration | T1 | Reference category | |
Extent of wall penetration | T2 | 0.596 | 0.724 (0.220 – 2.387) |
Extent of wall penetration | T3 | 0.382 | 1.491 (0.560 – 4.543) |
Extent of wall penetration | T4 | 0.052 | 2.203 (0.992 – 4.893) |
Distant metastasis | Absent | Reference category | |
Distant metastasis | Present | 0.003 | 1.861 (1.234 – 2.807) |
Tumor stage | Early | Reference category | |
Tumor stage | Advanced | 0.017 | 1.883 (1.121 – 3.162) |
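The link between a hazard ratio, its 95% confidence interval and its P-value in Table 3 can be checked with a short calculation. The sketch below assumes that the intervals are Wald intervals, exp(β ± 1.96·SE), which is the usual default in Cox regression software but is not stated in the text; the function name `wald_from_hr` is hypothetical.

```python
import math


def wald_from_hr(hr, lower, upper, z_crit=1.959964):
    """Recover the coefficient, its standard error, the Wald z statistic
    and the two-sided P-value from a hazard ratio and its 95% CI."""
    beta = math.log(hr)
    # On the log scale the interval is symmetric: width = 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Distant metastasis (present vs. absent), values taken from Table 3.
beta, se, z, p = wald_from_hr(1.861, 1.234, 2.807)
print(round(beta, 3), round(se, 3), round(z, 2), round(p, 4))
```

Under this assumption, the recovered P-value for distant metastasis agrees with the reported value of 0.003 to rounding.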
In the next step, we used the data from the testing subset to compare the accuracy of the models in correctly classifying the patients (the proportion of patients correctly classified into the dead and survived groups). The results are shown in Table 4. In this table, the first row shows the sensitivity (i.e. classification accuracy for positive cases), the second row shows the specificity (i.e. classification accuracy for negative cases), and the third row shows the concordance index (i.e. the proportion of all utilizable patient pairs in which the predicted and observed outcomes are concordant; it takes values from 0 to 1 and indicates model fit). As can be seen, the ANN model led to more accurate predictions than the Cox model (overall true prediction of 83.1% vs. 75.0%). The area under the ROC curve (AUC), calculated from the testing data, was 0.92 for the ANN model and 0.88 for the CPH model. The difference between the two AUCs was 0.04 (SE 0.0178), giving a P-value of 0.0245.
Table 4.
Classification accuracy of ANN and CPH models in testing subset
Status | Observed (n) | True prediction by ANN n (%) | True prediction by CPH n (%) |
---|---|---|---|
Dead | 51 | 41 (80.4) | 32 (62.7) |
Survived | 85 | 72 (84.7) | 70 (82.4) |
Total | 136 | 113 (83.1) | 102 (75.0) |
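The percentages in Table 4 and the P-value reported for the AUC difference follow from simple arithmetic, sketched below. The counts, the AUC difference and its standard error are taken from the text; the use of a two-sided normal (z) test for the AUC difference is an assumption, since the test is not named in the paper, and the function names are hypothetical.

```python
import math


def rates(true_dead, n_dead, true_survived, n_survived):
    """Sensitivity, specificity and overall proportion correctly classified."""
    sensitivity = true_dead / n_dead
    specificity = true_survived / n_survived
    accuracy = (true_dead + true_survived) / (n_dead + n_survived)
    return sensitivity, specificity, accuracy


# Counts from Table 4 (testing subset: 51 dead, 85 survived).
ann = rates(41, 51, 72, 85)
cph = rates(32, 51, 70, 85)


def auc_difference_test(diff, se_diff):
    """Two-sided z-test for a difference between two AUCs."""
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


z, p = auc_difference_test(0.04, 0.0178)
print([round(100 * r, 1) for r in ann])
print([round(100 * r, 1) for r in cph])
print(round(z, 2), round(p, 4))
```

Under the z-test assumption, the computed P-value is close to the reported 0.0245.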
Discussion
Gastric cancer is highly lethal because most patients are diagnosed at an advanced stage (12–15).
Therefore, determining risk factors and risk indicators and estimating the survival rate of GC patients are of great importance. The main aims of the present study were to identify some of the most important factors related to GC and to compare the ability of a traditional survival modeling technique (the Cox proportional hazard model) and a neural network approach to predict the survival rate of GC patients.
In this study, the estimated one-year, two-year, three-year, four-year and five-year survival rates for GC patients were 77.9%, 53.1%, 40.8%, 32.0%, and 17.4%, respectively, for the total sample (436 patients). Other studies in Iran have shown five-year survival rates between 13% and 18% for GC patients (13–15). In addition, the reported five-year survival rate of stomach cancer is 29.6% in China, 4.4% in Thailand, 37% in the USA, 10–20% in European countries (4), 22% in Switzerland, 30% in France, and 40–60% in Japan (3, 4, 14).
In this study, the Cox regression analysis showed that patients’ survival time is significantly associated with age at diagnosis, high-risk behaviors, extent of wall penetration, distant metastasis, and tumor stage. Other surveys of Iranian patients revealed that tumor size (>35 mm), presence of lymph node and distant metastasis, older age at diagnosis and poorly differentiated grade were significant risk indicators for a lower survival rate of GC patients (16, 17). Moreover, published studies from other countries have identified other risk indicators, such as gender, religion, education, number of involved lymph nodes, histological type, and type of complementary treatment, as significant factors for the survival of GC patients (18–22).
In past decades, statisticians frequently used traditional methods for analyzing survival data sets. In recent years, however, more powerful statistical and mathematical software has made it possible to use more sophisticated techniques, such as artificial neural networks, to predict the survival rate of cancer patients more accurately. In this context, the use of ANNs for predicting different outcomes in gastric cancer has been reported in several articles from different countries. In 2000, a study used an ANN to discriminate benign from malignant gastric cells. The results showed that the ANN could correctly classify 96% of benign cells and 89% of malignant cells (23). In 2008, an ANN-based study of tumor staging in GC patients reported an accuracy of 81.8% for predicting the tumor stage in primary GC patients (24). In Iran, a study applied the Kaplan-Meier, CPH and hierarchical ANN methods to the assessment of GC survival. Overall, there was no significant difference among the three methods in the survival probabilities or in the trend of changes in survival probabilities (25). In another study, in Taiwan, ordinary logistic regression, an ANN and a decision tree method were used to predict post-operative complications in GC patients. The results indicated that the ANN was a better technique for predicting post-operative complications than the logistic regression and decision tree methods (26). A single layer perceptron ANN was applied to the data of 4302 GC patients from the National Cancer Center, Tokyo, Japan. The results showed an accuracy of 79% in predicting N (+) or N0 using this method (27). In the present study, we compared the results of the CPH and ANN methods in predicting the survival of GC patients.
The predictive results focus on two aspects: 1) predictive accuracy for the whole set of patients in the testing data set (i.e. the concordance index), and 2) predictive accuracy for the two outcome classes (dead and survived), which represent sensitivity and specificity, respectively. Based on Table 4, the ANN model has better sensitivity and specificity than the Cox model. In addition, the concordance index was 83.1% for the ANN vs. 75.0% for the Cox model, which indicates a better fit for the ANN model. In conclusion, the ANN model predicted the survival rate of GC patients more accurately than the CPH model. Therefore, the predicted survival rate obtained by this technique could be used to identify high-risk GC patients and to allocate the necessary treatments and health resources to them.
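For readers unfamiliar with the concordance index, the pairwise definition given in the Results (the proportion of usable patient pairs in which predicted and observed outcomes agree) can be sketched as follows. This is an illustration of Harrell's concordance index under stated assumptions, not the computation used in the study: the function name, the half-credit rule for tied risk scores and the toy data are all hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Proportion of usable pairs in which the patient who died earlier
    was assigned the higher predicted risk (ties in risk count as 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i died and did so strictly
            # before patient j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable


# Toy data (hypothetical): months, status (1 = dead) and predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)
print(c_index)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of patients by risk.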
Ethical Considerations
Ethical issues including plagiarism, informed consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc. have been completely observed by the authors.
Acknowledgments
Our gratitude and regards are due to the School of Medical Sciences of Tarbiat Modares University for the support generously extended to us during this research. We also wish to thank all colleagues at the Research Center for Gastroenterology and Liver Disease at Shahid Beheshti University of Medical Sciences. The authors declare that there is no conflict of interest.
References
- 1. Stomach (Gastric) Cancer. 2005–2008 Cancer FactsMD.com, available from: www.cancerfactsmd.com/stomach-gastric-cancer/
- 2. Adachi Y, Yasuda K, Inomata M, Sato K, Shiraishi N, Kitano S. Pathology and prognosis of gastric carcinoma: well versus poorly differentiated type. Cancer. 2000;89(7):1418–24.
- 3. Inoue M, Tsugane S. Epidemiology of gastric cancer in Japan. Postgrad Med J. 2005;81(957):419–24. doi: 10.1136/pgmj.2004.029330.
- 4. Crew KD, Neugut AI. Epidemiology of gastric cancer. World J Gastroenterol. 2006;12(3):354–62. doi: 10.3748/wjg.v12.i3.354.
- 5. Sajadi A, Nouraei M, Mohagheghi MA, Mousavi-Jarrahi A, Malekezadeh R, Parkin DM. Cancer Occurrence in Iran in 2002, an International Perspective. Asian Pac J Cancer Prev. 2005;6(3):359–63.
- 6. Mohagheghi MA. Annual Report of: Tehran Cancer Registry 1999. The Cancer Institute Publication 2004.
- 7. Elisa TL. In: Statistical Methods for Survival Data Analysis. John WW, editor. John Wiley & Sons Inc; 2003. pp. 8–15.
- 8. Chia CL, Street WN, Wolberg WH. Application of Artificial Neural Network-Based Survival Analysis on Two Breast Cancer Datasets. American Medical Informatics Association Annual Symposium (AMIA 2007); Chicago, IL. 2002. pp. 130–34.
- 9. Paliwal M, Kumar UA. Neural networks and statistical techniques: A review of applications. Expert Syst Appl. 2009;36:2–17.
- 10. Jones AS, Taktak AGF, Helliwell TR, Fenton JE, Birchall MA, Husband DJ, Fisher AC. An artificial neural network improves prediction of observed survival in patients with laryngeal squamous carcinoma. Eur Arch Otorhinolaryngol. 2006;263:541–47. doi: 10.1007/s00405-006-0021-2.
- 11. American Joint Committee on Cancer. AJCC Cancer Staging Manual. 6th ed. New York: Springer; 2002. Stomach; pp. 99–106.
- 12. Sajadi A, Raafat J, Mohagheghi M, Meemary F. Gastric Carcinoma, 5 years experience of a single institute. Asian Pac J Cancer Prev. 2005;6(2):195–96.
- 13. Zeraati H, Mahmoudi M, Kazemnejad A, Mohammad K. Postoperative survival in gastric cancer patients and its associated factors: A time dependent covariates model. Iranian J Publ Health. 2006;35(3):40–6.
- 14. Biglarian A, Hajizadeh E, Gohari MR, Khoda Bakhshi R. Survival analysis of patients with gastric adenocarcinomas and factors related. Kowsar Med J. 2008;12(4):345–55. (in Persian)
- 15. Hajiani E, Sarmast Soshtari MH, Masjedizadeh R, Hashemi J, Azmi M, Tahereh R. Clinical profile of gastric cancer in Khuzestan, southwest of Iran. World J Gastroenterol. 2006;12(30):4832–35. doi: 10.3748/wjg.v12.i30.4832.
- 16. Moghimi-Dehkordi B, Safaee A, Pourhoseingholi MA, Fatemi R, Tabeie Z, Zali MR. Statistical comparison of survival models for analysis of cancer data. Asian Pac J Cancer Prev. 2008;9(3):417–20.
- 17. Peyre CG, Hagen JA, DeMeester SR, Altorki NK, Ancona E, Griffin SM, et al. The number of lymph nodes removed predicts survival in esophageal cancer: an international study on the impact of extent of surgical resection. Ann Surg. 2008;248(4):549–56. doi: 10.1097/SLA.0b013e318188c474.
- 18. Ozgüç H, Sönmez Y, Yerci O. Metastatic/resected lymph nodes ratio-based classification in gastric cancer. Turk J Gastroenterol. 2008;19(1):2–7.
- 19. Liu C, Lu P, Lu Y, Xu H, Wang S, Chen J. Clinical implications of metastatic lymph node ratio in gastric cancer. BMC Cancer. 2007;7:200. doi: 10.1186/1471-2407-7-200.
- 20. Borie F, Rigau V, Fingerhut A, Millat B. French Association for Surgical Research. Prognostic factors for early gastric cancer in France: Cox regression analysis of 332 cases. World J Surg. 2004;28(7):686–91. doi: 10.1007/s00268-004-7127-8.
- 21. Yeole BB, Kumar AV. Population-based survival from cancers having a poor prognosis in Mumbai (Bombay), India. Asian Pac J Cancer Prev. 2004;5(2):175–82.
- 22. Ghiandoni G, Rocchi MB, Signoretti P, Belbusti F. Prognostic factors in gastric cancer evaluated by using Cox regression model. Minerva Chir. 1998;53(6):497–504.
- 23. Karakitsos P, Pouliakis A, Koutroumbas K, Stergiou EB, Tzivras M, Archimandritis A, Liossi AI. Neural network application in the discrimination of benign from malignant gastric cells. Anal Quant Cytol Histol. 2000;22(1):63–9.
- 24. Lai KC, Chiang HC, Chen WC, Tsai FJ, Jeng LB. Artificial neural network-based study can predict gastric cancer staging. Hepatogastroenterology. 2008;55(86–87):1859–63.
- 25. Amiri Z, Mohammad K, Mahmoudi M, Zeraati H, Fotouhi A. Assessment of gastric cancer survival: using an artificial hierarchical neural network. Pak J Biol Sci. 2008;11(8):1076–84. doi: 10.3923/pjbs.2008.1076.1084.
- 26. Chien CW, Lee YC, Ma T, Lee TS, Lin YC, Wang W, Lee WJ. The application of artificial neural networks and decision tree model in predicting post-operative complication for gastric cancer patients. Hepatogastroenterology. 2008;55(84):1140–45.
- 27. Bollschweiler EH, Mönig SP, Hensler K, Baldus SE, Maruyama K, Hölscher AH. Artificial neural network for prediction of lymph node metastases in gastric cancer: a phase II diagnostic study. Ann Surg Oncol. 2004;11(5):506–11. doi: 10.1245/ASO.2004.04.018.