Author manuscript; available in PMC: 2023 Jun 1.
Published in final edited form as: Ann Surg. 2022 Jul 15;277(6):971–978. doi: 10.1097/SLA.0000000000005538

Predicting long-term survival and time-to-recurrence after esophagectomy in patients with esophageal cancer - Development and validation of a multivariate prediction model

Rohan R Gujjuri 1,2, Jonathan M Clarke 3, Jessie A Elliott 4, Saqib A Rahman 5, John V Reynolds 4, George B Hanna 2, Sheraz R Markar 2,6,7; ENSURE Group Study 8
PMCID: PMC7614526  EMSID: EMS145872  PMID: 37193219

Introduction

Esophageal cancer remains a significant cause of disease burden and cancer mortality worldwide.1 Despite improvements in standard treatment modalities, esophagectomy continues to be associated with high postoperative morbidity and long-term deterioration in health-related quality of life.2,3 Furthermore, recurrence remains common and substantially influences long-term prognosis with patients experiencing an overall 5-year survival of 20-50% in most settings.3,4

In the postoperative setting, early identification of individuals at risk of an unfavorable prognosis following treatment is key to improving long-term survival, with an expanding range of therapeutic alternatives after primary treatment failure including immunotherapy, radiofrequency ablation and salvage locoregional surgery.5,6 Identification of a population at high-risk of recurrence can provide useful information to both patients and clinicians and facilitate optimized therapeutic decisions based on an individualized profile of prognostic factors.7,8 Treatment de-escalation can be more confidently recommended in low-risk individuals and accurate predictions can risk-stratify participants more appropriately for therapeutic intervention trials exploring adjuvant therapy, leading to improved trial recruitment and future improvements in the existing knowledge base. 9,10

Nonetheless, prediction of patients with poor prognosis remains challenging. A limited number of predictive tools exist currently, with the American Joint Committee on Cancer (AJCC) Tumor, Node, Metastases (TNM) staging criteria being widely utilized to stratify patients and their resulting survival based on anatomical stage.11 However, classifications are based mainly on historical data (1980s to 2000s), which fail to reflect recent advancements in treatment, and staging is limited to postoperative information on disease status in operated patients.12 Heterogeneity in survival predictions has also been observed among patients who are similarly staged and, despite the recent addition of prognostication by post-neoadjuvant staging, staging groups remain coarse and lack much-needed granularity.13,14 Additionally, clinically important characteristics, such as age and lymphovascular invasion, are omitted, leading to inaccurate predictions when applied at an individual patient level and limiting overall clinical utility. As a result, in the recent 8th edition of the TNM staging manual, the AJCC acknowledged the importance of pathological and patient-specific data alongside cancer staging to improve risk prediction in cancer.13

Nonetheless, a paucity of evidence exists in the literature concerning prognostication for esophageal cancer.15 Existing models have failed to gain widespread clinical use or endorsement as a ‘gold standard’, with the lack of actionable guidance limiting overall utility.16–20 Advances in machine learning techniques, with random survival forest (RSF) models in particular, have demonstrated promising results in multiple settings and have been used for esophageal cancer in developing the 7th and 8th editions of the TNM staging manuals,11,13,16 but whether they outperform traditional statistical methods in a clinical environment remains unresolved. Accordingly, this study aims to develop and validate multivariable prediction models that can provide clinicians with accurate predictions of long-term survival and time-to-recurrence for patients with esophageal cancer following surgical resection.

Methods

Study design

This study is based on data previously collected during the European iNvestigation of SUrveillance After Resection for Esophageal Cancer (ENSURE) study, described in detail by Elliott et al.21 To summarize briefly, a multicenter observational cohort study was conducted in high-volume centers across Europe to determine the impact of intensive surveillance on oncologic outcomes. All patients aged 18 years and above undergoing curative surgery for esophageal or junctional cancer from June 2009 to June 2015 were considered for inclusion. Patients undergoing salvage surgery after failure of primary endoscopic or oncologic treatment were also included, while those undergoing definitive oncological or endoscopic therapy as sole therapy for esophageal cancer were excluded. All patients underwent a standardized follow-up protocol, and data were anonymised and collected from prospectively maintained databases, including routinely collected patient, tumor and treatment factors.21

Ethical approval

This study is exempt from UK National Research Ethics Committee approval as it involved a secondary analysis of anonymised data that has been previously collected. The primary study was registered on ClinicalTrials.gov prior to inclusion of the first participant (NCT03461341), and was registered with the Research and Innovation Hub, St. James’s Hospital, Dublin, Ireland (approval number #4982) and approved by the St. James’s Hospital and Tallaght University Hospital Joint Research Ethics Committee (approval number #2018- 08-CA).

Variable definition

Only patients with complete survival data were included in this study (n=4719) (Study flow diagram in Supplementary Figure 1). Overall survival (OS) was defined as the duration of time from diagnosis until patient death. Patients were censored if they were lost to follow-up, withdrew from the study, or did not die by the end of the study period. Disease-free survival (DFS) was defined as the duration of time from surgery until patient death or recurrence of esophageal cancer. Patients were censored if they were lost to follow-up, withdrew from the study, or did not have an event by the end of the study period.
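As a minimal sketch of how these definitions translate into analysis-ready data, the Python snippet below derives a follow-up time and an event indicator for one patient. The function names, the date-based inputs and the days-to-months conversion are illustrative assumptions and do not describe the ENSURE study's actual data pipeline.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_MONTH = 30.4375  # assumption: average month length used for conversion


def first_event(*dates: Optional[date]) -> Optional[date]:
    """Earliest observed date among candidate events (e.g. recurrence, death)."""
    observed = [d for d in dates if d is not None]
    return min(observed) if observed else None


def survival_outcome(start: date, event_date: Optional[date],
                     last_contact: date, study_end: date) -> Tuple[float, int]:
    """Return (follow-up in months, event indicator) for one patient.

    The indicator is 1 when the event was observed by study_end and 0 when
    the patient is censored (lost to follow-up, withdrawn, or event-free).
    """
    if event_date is not None and event_date <= study_end:
        end, event = event_date, 1
    else:
        end, event = min(last_contact, study_end), 0
    return (end - start).days / DAYS_PER_MONTH, event


# Hypothetical patient: surgery in 2010, recurrence in 2012, death in 2013.
surgery = date(2010, 1, 1)
recurrence, death = date(2012, 1, 1), date(2013, 1, 1)
study_end = date(2015, 6, 30)

# DFS counts whichever of recurrence or death comes first.
dfs_time, dfs_event = survival_outcome(
    surgery, first_event(recurrence, death), death, study_end)
```

For OS the same function would be called with the date of death as the only candidate event and with the start date defined as in the text.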

Outcomes

Predictive models were developed using OS and DFS at 5-years to predict long-term survival and time-to-recurrence in patients after esophagectomy for cancer. Model performance was then evaluated for the following outcomes: (1) 1-year OS; (2) 3-year OS; (3) 5-year OS; (4) 1-year DFS; (5) 3-year DFS; and (6) 5-year DFS. The primary outcome was OS and DFS at 5-years, with the other periods being assessed as secondary outcomes. Subgroup analysis was conducted for patient age, pathological TNM staging, histology subtype, differentiation grade and patient groups stratified by response to neoadjuvant therapy to evaluate model use in these distinct populations.

Variable selection

Predictor variables were selected based on literature review and clinical importance.7,8,17 A full list of variables considered for inclusion can be found in Supplementary Table 1. Of the included variables, only six variables had more than 5% missing data for all patients: perineural invasion (27.5%), lymphatic invasion (26.8%), differentiation grade (18.8%), venous invasion (18.5%), clinical T stage (8.2%) and clinical N stage (8.0%) (Supplementary Table 2).

Model development

Random survival forest (RSF) and Cox proportional hazards (CPH) methods were used in this study. RSF is a tree-based algorithm that constructs many individual decision trees. As a decision tree branches out, each node (the point where branching occurs in the tree) is split using the variable that maximizes the survival difference between daughter nodes, until each tree is fully extended.22 Predictions for an individual are then derived as the average prediction generated from all trees in the forest.22 Model development was conducted using an iterative machine-learning approach that has been described previously.16 To describe briefly:

Step 1 - Imputation

The mechanism of missingness was determined to be Missing At Random following data visualization as previously described (Supplementary Figure 2).23 Missing data were then imputed using multiple imputation by chained equations (MICE).23 All variables involved in primary analysis, including the outcome variable, were incorporated into the predictive matrix in the imputation model.24 Inclusion of an event indicator and the Nelson–Aalen estimator (an estimator of cumulative hazard) in the predictive matrix instead of the outcome variable has been recommended previously to limit bias when utilizing MICE for survival data.25 As such, data were adapted accordingly, and imputed with the updated predictive matrix set to 10 imputations with 10 iterations for computational reasons as recommended.26
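To make the Nelson–Aalen estimator concrete, the following is a small, dependency-free Python sketch of the cumulative hazard H(t), computed as the sum over event times up to t of (events at that time) / (patients at risk). The function names and toy data are assumptions for illustration; the study itself was conducted in R.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Return {event_time: cumulative hazard} for right-censored data.

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their time
    at_risk = len(times)
    cum_hazard, out = 0.0, {}
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            cum_hazard += d / at_risk
            out[t] = cum_hazard
        at_risk -= exits[t]
    return out


def hazard_at(cum, t):
    """Step-function lookup: value of H at the last event time <= t."""
    value = 0.0
    for event_time in sorted(cum):
        if event_time > t:
            break
        value = cum[event_time]
    return value


# Toy data: five patients, two of them censored.
cum = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

In an imputation model of the kind described above, each patient's value of `hazard_at(cum, follow_up_time)` and their event indicator would enter the predictor matrix in place of the raw survival time.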

Step 2 – Predictor selection

The final variables used in model development were determined from a subset of pre-selected variables chosen based on a literature review and clinical importance.7,8,17 Permutation-based random survival forest variable importance (VIMP) with bootstrapped confidence intervals was used for OS and DFS, with any predictors demonstrating a VIMP>0 being included in the final model (Supplementary Tables 3 and 4). The models were trained using 16 variables: patient age, sex, American Society of Anesthesiologists (ASA) grade, tumor location, resection margin status, clinical N stage, pathological TNM stage, differentiation grade, histology subtype, lymphatic invasion, venous invasion, tumor regression grade, treatment protocol and Clavien-Dindo grade.
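The idea behind permutation-based importance can be sketched in a few lines of Python: shuffle one predictor, re-score the model, and record how much performance drops. This is a generic illustration under stated assumptions (a hypothetical `predict` callable, a negative mean squared error score, no out-of-bag sampling), and is not the RSF VIMP implementation used in the study.

```python
import random


def permutation_importance(predict, score, rows, outcomes, column,
                           n_repeats=20, seed=0):
    """Mean drop in score after shuffling one predictor column.

    predict: maps one row (list of predictor values) to a prediction.
    score:   maps (outcomes, predictions) to a number, higher is better.
    """
    rng = random.Random(seed)
    baseline = score(outcomes, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        drops.append(baseline - score(outcomes, [predict(r) for r in permuted]))
    return sum(drops) / n_repeats


def neg_mse(y_true, y_pred):
    """Negative mean squared error, so that higher values are better."""
    return -sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy data: the outcome depends only on column 0; column 1 is noise.
rows = [[float(i), float((7 * i) % 5)] for i in range(10)]
outcomes = [r[0] for r in rows]
toy_model = lambda r: r[0]  # hypothetical stand-in for a fitted model

vimp_signal = permutation_importance(toy_model, neg_mse, rows, outcomes, column=0)
vimp_noise = permutation_importance(toy_model, neg_mse, rows, outcomes, column=1)
```

A predictor the model relies on yields a positive importance, whereas one it ignores yields zero, which mirrors the VIMP>0 inclusion rule described above.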

Step 3 – Model training

The RSF models were derived from final variables for both overall survival and disease-free survival. A grid search, which represents the possible combinations of tunable hyperparameters within a specified range, was used to optimize the number of trees, number of variables in each decision tree and minimum node size.27 Analyses were conducted on each imputed dataset, with final predictions then combined using Rubin’s rules.28 CPH models were also produced for overall survival and disease-free survival using the same set of variables.
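Rubin's rules for pooling results across imputed datasets can be illustrated as follows. The sketch assumes each of the m imputed datasets yields one point estimate and one within-imputation variance; the numbers are invented for illustration and are not results from this study.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total variance), where
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates from three imputed datasets.
pooled, total_var = pool_rubin([0.70, 0.72, 0.74], [0.0004, 0.0004, 0.0004])
```

The pooled point estimate is simply the average across imputations; the total variance is inflated by the between-imputation spread to reflect uncertainty due to the missing data.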

Step 4 - Model performance

Model performance for CPH and RSF was assessed by discrimination and calibration.29 Discrimination refers to a model’s ability to order patients with different outcomes and was determined using the time-dependent area under the receiver operator curve (tAUC).30 This is interpreted in the same way as the standard AUC but is better suited to time-dependent analyses because it accounts for changes in an individual’s event status and marker value over time. A score of 0.5 denotes discrimination no better than chance, with a score of 1 denoting perfect separation. Furthermore, it is suggested to be more appropriate for time-based predicted risk than the commonly used concordance index, which was also provided for comparative purposes.31 Calibration refers to the agreement between predicted and observed risks and was assessed visually by plotting the predicted and observed survival probabilities for patient groups stratified by quintiles of outcome risk up to 5 years.32 We also calculated the integrated Brier score, with a score closer to 0 indicating better accuracy in predictions.29 Additionally, baseline characteristics were compared between patient groups when stratified by quintiles of predicted OS and DFS probabilities at 5 years from the CPH model.
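For readers less familiar with the concordance index, the sketch below implements Harrell's version for right-censored data in plain Python. A pair of patients is treated as comparable when the one with the shorter follow-up had an observed event; the pair is concordant when that patient was also assigned the higher predicted risk. The function name and toy data are illustrative assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up times
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk scores (higher means earlier expected event)
    """
    concordant = tied = comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    tied += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable


# Toy data: four patients, the third is censored.
times, events = [1, 2, 3, 4], [1, 1, 0, 1]
c_perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
c_reversed = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
c_uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

Perfectly ordered risks give 1.0, perfectly reversed risks give 0.0, and an uninformative model that assigns everyone the same risk gives 0.5.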

Step 5 - Internal validation

CPH and RSF model performance was reported following internal validation using the 0.632 estimator with 1000 bootstrap resamples.33 Here, in each bootstrap, cases were sampled with replacement until the original sample size of 4719 was reached, so that on average 63.2% of unique cases were selected, with some of these appearing more than once. These cases represent the training set in each bootstrap and any cases not selected comprise the testing set. The overall performance is then calculated as a weighted combination of apparent performance (performance in the training set) and test performance (performance in the test set): estimated performance = 0.368 × apparent performance + 0.632 × test performance.33 (Supplementary Figure 3)
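The bootstrap split and the 0.632 weighting can be sketched as below. The helper names are assumptions for illustration, and the apparent and test values in the example are invented rather than taken from this study.

```python
import random


def bootstrap_split(n, rng):
    """Draw n indices with replacement; unselected indices form the test set."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


def estimate_632(apparent, test):
    """0.632 estimator: weighted blend of apparent and test performance."""
    return 0.368 * apparent + 0.632 * test


rng = random.Random(0)
train, test = bootstrap_split(4719, rng)
unique_fraction = len(set(train)) / 4719  # close to 0.632 on average

# Hypothetical apparent and test tAUC values for one bootstrap resample.
blended = estimate_632(0.80, 0.76)
```

Repeating the split many times and averaging the blended estimates gives the bootstrapped performance; the weights arise because a bootstrap sample of size n contains, on average, about 63.2% of the distinct original cases.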

Data were analyzed and presented using R 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria) with packages including mice, rms, ranger and Amelia. Categorical variables were compared using the χ2 test. Non-normally distributed data were analyzed using the Mann-Whitney U test and Kruskal-Wallis test where appropriate.

Results

Study demographics

This study included 4719 patients who underwent an esophagectomy for cancer between June 2009 and June 2015. Characteristics of the included patient population are presented in Supplementary Table 5. The median age was 65 years (interquartile range 58-71 years), and 22.5% (n=1063) of patients were female. The predominant tumor histology was adenocarcinoma (73.1%), and most patients had a negative resection margin (86.4%). Overall, 2299 deaths were recorded (48.7%), and the median OS time was 54 months. Recurrence was found in 2653 patients (56.2%), with a median DFS time of 35 months. OS at 1, 3 and 5 years was 89.3%, 59.2% and 47.7% respectively (Kaplan-Meier curve in Supplementary Figure 4). DFS at 1, 3 and 5 years was 72.1%, 49.2% and 40.9% respectively (Kaplan-Meier curve in Supplementary Figure 5).

Model performance for Overall Survival

Model discrimination

The RSF VIMP identified advanced pathological tumor staging alongside lymphovascular invasion to be the most important factors in predicting poor overall survival. A complete list of predictive variables, with their respective importance, can be found in Supplementary Table 3. The RSF model demonstrated good discrimination with a bootstrapped tAUC of 77.1% (95% CI 76.1%-78.1%) at 5 years in internal validation. This was comparable to the CPH model (Supplementary Table 6), which produced a 5-year bootstrapped tAUC of 78.2% (95% CI 77.4%-79.1%) (Table 1).

Table 1. Time-dependent area under the receiver operator curve (tAUC) of prediction models for Overall- and Disease-free survival.
Outcomes CPH tAUC (95% CI) RSF tAUC (95% CI)
Overall Survival
1 year 74.2% (72.9%-75.6%) 73.6% (71.8%-75.1%)
3 years 77.9% (77.1%-78.6%) 77.2% (76.3%-78.0%)
5 years 78.2% (77.4%-79.1%) 77.1% (76.1%-78.1%)
Disease-free Survival
1 year 77.1% (76.2%-78.0%) 76.2% (75.3%-77.1%)
3 years 79.9% (79.1%-80.5%) 79.4% (78.9%-80.3%)
5 years 79.4% (78.5%-80.2%) 78.6% (77.5%-79.5%)

Abbreviations: CI – Confidence interval; CPH – Cox proportional hazards regression; RSF – Random survival forest; tAUC – Time-dependent area under the receiver operator curve

Assessment of the concordance index at 5 years found similar results, with the RSF model performing comparably to the CPH model (0.710 vs 0.720) (Table 2). Furthermore, performance of the RSF model at 1 and 3 years [bootstrapped tAUC of 73.6% (95% CI 71.8%-75.1%) and 77.2% (76.3%-78.0%)] was comparable to that of the CPH model [bootstrapped tAUC of 74.2% (95% CI 72.9%-75.6%) and 77.9% (77.1%-78.6%)] (Table 1). Subgroup analysis found good performance for both models when stratified by age, pathological tumor staging, differentiation grade and histology subtype (Supplementary Tables 7-10).

Table 2. Concordance-index and Integrated Brier score of prediction models for Overall- and Disease-free survival at 5 years.
Outcomes CiD (95% CI) iBrier (95% CI)
Overall Survival
RSF 0.710 (0.704 – 0.717) 0.186 (0.185 – 0.189)
CPH 0.720 (0.715 – 0.725) 0.181 (0.180 – 0.183)
Disease-free Survival
RSF 0.722 (0.719 – 0.725) 0.214 (0.212 – 0.217)
CPH 0.729 (0.725 – 0.732) 0.209 (0.208 – 0.210)

Abbreviations: iBrier – Integrated Brier score; CiD – Concordance-index; CI – Confidence interval; CPH – Cox proportional hazards regression; RSF – Random survival forest

Model calibration

The CPH model showed good agreement between the observed and predicted survival times for all patients when grouped in quintiles according to their survival predictions (Figure 1). However, the RSF model only showed good agreement for patients who had a predicted survival between 20% and 80% (Figure 2). Patients with an observed survival of <20% were more optimistically predicted (predicted>observed survival). Patients with a >80% observed survival were predicted more pessimistically (predicted<observed survival). The integrated Brier scores were 0.186 and 0.181 respectively in the RSF and CPH models at 5 years, demonstrating good accuracy in predictions (Table 2).
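The quintile-based calibration check reported above can be sketched in plain Python: patients are sorted by predicted survival, split into five equal groups, and the mean prediction in each group is compared with the Kaplan-Meier estimate of observed survival at the chosen horizon. The function names and toy data are assumptions for illustration only.

```python
def kaplan_meier_at(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon` for right-censored data."""
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
        at_risk -= leaving
    return survival


def calibration_by_quintile(predicted, times, events, horizon):
    """Return five (mean predicted survival, observed KM survival) pairs."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows = []
    for q in range(5):
        idx = order[q * n // 5:(q + 1) * n // 5]
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = kaplan_meier_at([times[i] for i in idx],
                                   [events[i] for i in idx], horizon)
        rows.append((mean_pred, observed))
    return rows


# Toy check of the Kaplan-Meier helper: events at t=1 and t=3, horizon 3.
km_example = kaplan_meier_at([1, 2, 3, 4, 5], [1, 0, 1, 0, 1], horizon=3)

# Toy cohort of ten patients, all event-free beyond the 5-year horizon.
predicted = [0.05 + 0.1 * k for k in range(10)]
rows = calibration_by_quintile(predicted, [6] * 10, [0] * 10, horizon=5)
```

A well-calibrated model produces pairs that lie close to the diagonal when plotted; predicted values above the observed ones indicate optimistic predictions, and values below indicate pessimistic ones.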

Figure 1. Calibration curves showing observed and predicted overall survival probabilities derived from Cox proportional hazards stratified by quintiles of mortality risk at 5 years.


Figure 2. Calibration curves showing observed and predicted overall survival probabilities derived from random survival forest stratified by quintiles of mortality risk at 5 years.


Model performance for Disease-free Survival

Model discrimination

The RSF VIMP identified advanced pathological tumor staging alongside lymphovascular invasion to be the most important factors in predicting poor disease-free survival. A complete list of predictive variables, with their respective importance, can be found in Supplementary Table 4. The RSF model demonstrated good discrimination with a bootstrapped tAUC of 78.6% (95% CI 77.5%-79.5%) at 5 years in internal validation. This was comparable to the CPH model (Supplementary Table 11), which produced a 5-year bootstrapped tAUC of 79.4% (95% CI 78.5%-80.2%) (Table 1). Assessment of the concordance index at 5 years also illustrated equivalent discrimination between the CPH and RSF models (0.729 vs 0.722) (Table 2).

This was repeated at 1 and 3 years, with the RSF model achieving a bootstrapped tAUC of 76.2% (95% CI 75.3%-77.1%) and 79.4% (78.9%-80.3%) respectively, similar to the CPH model [bootstrapped tAUC of 77.1% (95% CI 76.2%-78.0%) and 79.9% (79.1%-80.5%)] (Table 1). Furthermore, subgroup analysis found good discrimination in both models when stratified by age, pathological tumor staging, differentiation grade and histology subtype (Supplementary Tables 12-15).

Model calibration

Good agreement was found between the observed and predicted time-to-recurrence for all patients and their quintile groups in the CPH model (Figure 3). RSF demonstrated good agreement for patients with a predicted DFS probability between 20% and 60% (Figure 4). However, patients with a <20% DFS were given a more optimistic prediction (predicted>observed survival) while patients with a >60% DFS were predicted more pessimistically (predicted<observed survival). Integrated Brier scores of 0.214 and 0.209 were found respectively in the RSF and CPH models at 5 years, demonstrating good accuracy in predictions (Table 2).

Figure 3. Calibration curves showing observed and predicted disease-free survival probabilities derived from Cox proportional hazards stratified by quintiles of recurrence risk at 5 years.


Figure 4. Calibration curves showing observed and predicted disease-free survival probabilities derived from random survival forest stratified by quintiles of recurrence risk at 5 years.


Subgroup characteristics stratified by quintiles of survival probabilities

In subgroup analysis, when stratified by quintiles of predicted OS probabilities, clinical N staging, pathologic N staging, histology type, tumor site and treatment protocol were found to be different between the groups (p<0.05) (Supplementary Table 16). According to the CPH model, patients with a predicted survival of 0-20% had worse pathological staging and increased rates of adjuvant therapy compared to patients with a predicted survival of >20%. Patients with a predicted survival of 80-100% according to the CPH model had higher rates of neoadjuvant chemotherapy before surgery compared to their counterparts.

When stratified by quintiles of predicted DFS probabilities in the CPH model, clinical N staging, pathologic N staging, differentiation, ASA grade, histology type, tumor site, margin status, treatment protocol and Clavien-Dindo classifications were found to be different between the groups (p<0.05) (Supplementary Table 17). Patients with a predicted survival of 0-20% according to the CPH model had worse pathological staging with increased rates of adjuvant therapy compared to their counterparts. Patients with a predicted survival of 80-100% in the CPH model had higher rates of neoadjuvant chemotherapy before surgery compared to their counterparts.

Patient subgroups stratified by response following neoadjuvant therapy

When assessing downstaging based on nodal status, patients with cN+ disease who were identified as pN0 following neoadjuvant therapy experienced an improved 5-year survival compared to their pN+ counterparts (58.7% vs 43.3%, p<0.001). In this subgroup, the model showed fair discrimination [tAUC 69.9% (69.2%-70.3%)] and good calibration (Supplementary Figure 6).

When exploring the impact of neoadjuvant therapy, patients with complete response (Subgroup 1, n=566) (Supplementary Figure 7) experienced improved 5-year survival compared to their counterparts with residual but nodal negative disease (Subgroup 2, n=1159) (Supplementary Figure 8) and no residual but nodal positive disease (Subgroup 3, n=117) (Supplementary Figure 9). Patients with residual and nodal positive disease (Subgroup 4, n=1549) experienced the worst 5-year survival out of all 4 subgroups (Supplementary Figure 10). The models demonstrated fair discrimination and good calibration in Subgroups 1, 2 and 4. In patients with no residual but nodal positive disease (Subgroup 3), the models showed fair discrimination but poor calibration.

Discussion

Precisely predicting long-term survival and time-to-recurrence in patients following esophagectomy for cancer could significantly aid clinicians and improve personalized patient care. This study illustrated that a prediction model derived from an assortment of routinely collected data can accurately predict differing survival and recurrence times in patients. The CPH model demonstrated equivalent discrimination and superior calibration to the RSF model for overall- and disease-free survival at 5 years post-esophagectomy.

Predictor variables that have the most significant impact on survival and recurrence are consistent with clinical experience and published literature. These include advanced pathological stage, lymphovascular invasion, poor tumor differentiation, and positive margin status.7,8,17 Pathological tumor staging at the time of surgery was preferred in variable selection due to its relatively increased accuracy over clinical tumor stage (determined prior to surgery), which often over-stages patients.34 Nevertheless, clinical N staging was included in the model because it improved model performance. This may be because discrepancies between the clinical and pathological stages are not solely due to the former’s limited accuracy but also reflect the impact of downstaging in response to neoadjuvant treatment.34 Furthermore, tumor regression grade has been previously suggested to influence survival and was found to be a significant prognostic factor in this study, lending further credibility to the co-inclusion of clinical and pathological tumor staging.35 Interestingly, the use of neoadjuvant chemoradiotherapy was predictive of increased risk of recurrence and decreased survival in this study. However, patients requiring preoperative oncological treatment for downstaging are likely to have more advanced tumor staging than their counterparts.36 Furthermore, tumor regression and scarring following therapy may complicate intraoperative assessment, thus increasing the risk of positive margins and subsequent local recurrence.37

The models in this study perform favorably when compared to pre-existing literature, with four studies predicting OS,16,18–20 two predicting DFS,19,38 and only one predicting both.19 These models were derived from various predictor variables using CPH, RSF or logistic regression methods, with performance reported as concordance index/tAUCs between 0.63–0.83 for OS and 0.63–0.79 for DFS.16,18–20,38 However, several studies were limited to single centers and/or patients with esophageal adenocarcinoma only, restricting broader applicability.18,20,38 This study involved an international, multicenter cohort and included patients undergoing esophagectomy for adenocarcinoma or squamous cell carcinoma, allowing use in a more diverse, international population.

Machine learning (ML) via RSF provided no predictive uplift compared to CPH in this study. While comparable discriminative performance was observed in both models, CPH demonstrated superior calibration for OS and DFS. Moreover, CPH has a fast computation time with the added benefit of producing more interpretable 'hazard ratios' rather than relying on variable importance, simplifying overall use. However, data in this study were predominantly categorical, which could have impacted the production of decision trees, as RSF methods have been previously shown to display bias in favor of continuous variables and categorical variables with more levels.39 Ensemble ML techniques, such as gradient boosting, which often outperform single decision tree methods, could be utilized in future to improve overall performance further.40 Furthermore, alongside prognostication, ML has been advocated for use in a quantitative approach to medical imaging, termed ‘radiomics’. Recent developments in Deep Learning have demonstrated comparable accuracy to clinical experts in multiple settings, including detection of pneumonia from chest X-rays and skin cancer classification from dermatology imaging.41,42 Future studies combining both imaging and non-imaging data can potentially augment current diagnostic methods to complement improved patient prognostication.

Currently, information provided to patients after esophagectomy regarding long-term survival is limited and mainly centered around TNM staging. As a result, patients receive predictions that do not consider pertinent individual patient and treatment factors (e.g., 30 out of 100 patients with stage 2 esophageal cancer will survive their cancer for 5 years).43 While important, this knowledge ultimately remains coarse and vague, with limited clinical utility in improving informed decision making. Using clinical prediction models, such as the one described in this study, can significantly supplement the information derived from TNM staging and allows for the delivery of a more accurate prediction of an individual’s cancer prognosis. This increased precision leads to a host of benefits for patients and clinicians. Firstly, prognostication provides patients with an improved estimation of risk, enabling them to more effectively plan their personal future and contribute to the decision-making process.44 Furthermore, it can play a vital role in determining treatment efficacy in a research setting and boost the impact of clinical trials due to improved patient inclusion.10 Finally, in the clinical setting, it can augment clinician decision-making by improving treatment selection for each patient and potentially pave the way for precision medicine in esophageal cancer, particularly with the rise of novel therapies (e.g. gene therapy and immune checkpoint inhibitors).6,45

Recurrence post-esophagectomy poses a severe risk for patients: it strongly influences long-term survival and often occurs before patient rehabilitation is complete.17 Thus, identification of a patient group that is at high risk of recurrence is vital. The models produced in this study allow for the accurate prediction of time-to-recurrence in an individual and could play an important role in the planning of adjuvant therapy in high-risk patients.6,46 Early identification can be considered as the starting point for treatment escalation and augment efforts to prolong time-to-recurrence or, better yet, prevent recurrence in the first instance. In addition to adjuvant therapy, these models could help tailor postoperative follow-up in patients with esophageal cancer. Currently, a paucity of evidence exists supporting an optimal postoperative surveillance protocol following esophagectomy.47 The ongoing ENSURE study and SARONG trial aim to elicit the impact of different postoperative surveillance strategies on oncologic outcome.21 Results from these studies could identify a population at risk of recurrence that potentially benefits from intensive surveillance; the models from this study can provide clinical utility and aid clinicians in detecting patients conforming to that particular risk group. Nonetheless, the optimal modality for performing intensive surveillance remains to be defined. Currently, computed tomography imaging is commonly used for postoperative surveillance, but this is subject to change in future with the development and clinical integration of circulating tumor markers and tumor DNA, which have demonstrated potential utility for surveillance purposes in esophageal cancer.48

Despite the many advantages of prognostication and the ubiquity of prediction tools, mainstream clinical use remains low.16–20 This is likely a direct consequence of the lack of actionable information provided by these models. Predicted outcome probabilities are often produced without any clear instructions on what clinicians should do with that information.49 Furthermore, it is unclear if prognostic tools in treatment decisions truly benefit patient outcomes compared to current clinical standards.50 The gold standard for determining clinical utility remains a randomized clinical trial where prediction-based decisions are evaluated against standard decision making. However, implementing this would be ethically and practically challenging, with further evidence needed to drive such a trial. Sachs et al. proposed a framework to evaluate the clinical utility derived from prognostic models using observational data.50 Future studies based on this could increase current understanding and ultimately motivate a randomized clinical trial to truly evaluate the role of prediction-based decisions in improving patient outcomes.

The main strengths of this study derive from the large sample size generated from an international multicenter cohort. Nearly all patients had a complete follow-up, with robust survival and recurrence data collected due to comparable surveillance protocols in included centers. Rates of missingness were low across the entire dataset, and MICE was used for imputation, minimizing bias in model development. Furthermore, models in this study were derived from an extensive group of routine data, improving applicability, and model performance was assessed both in terms of discrimination and calibration.

Nevertheless, there are limitations to note in our work. Cautious interpretation is required when determining model use as only internal validation was performed. Use of an independent population that externally validates model performance is required in future before utilization in a clinical setting.16 Furthermore, data were collected at high-volume centers across Europe and would likely not represent esophageal cancer care across low- and middle-income countries. While the models in this study produced accurate predictions for patients in the postoperative environment, no attempt was made to generate a pre-treatment model. As such, clinical staging remains the optimal method, despite its many limitations.13,14 Additionally, patients undergoing endoscopic therapy or definitive oncological treatment as sole therapy were excluded and thus, model use would not be suitable in this subgroup.

Conclusion

In conclusion, this study has demonstrated the ability to accurately predict long-term survival and time-to-recurrence after surgery for esophageal cancer using routinely collected data. Identification of patient groups at risk of recurrence and poor long-term survival can improve patient outcomes by enhancing selection of treatment methods and surveillance strategies. The CPH models developed in this study illustrated very good discrimination and performance, with ML techniques from the RSF demonstrating similar performance. Future work evaluating prediction-based decisions against standard decision-making is required to improve understanding of the clinical utility derived from prognostic model use.

Supplementary Material

Supplemental Data File

Mini Abstract.

This study aimed to develop prediction models for long-term survival and time-to-recurrence following surgery for esophageal cancer using Cox proportional hazards (CPH) and random survival forest (RSF) and demonstrated good discrimination and calibration for both. Identification of patient groups at risk of recurrence and poor long-term survival can improve patient outcomes by optimizing treatment methods and surveillance strategies.

Structured Abstract.

Objective

To develop models to predict long-term survival and time-to-recurrence following surgery for esophageal cancer.

Summary Background Data

Long-term survival after esophagectomy remains poor, with recurrence common. Prediction tools can identify high-risk patients and optimize treatment decisions based on their prognostic factors.

Methods

Patients undergoing curative surgery from the European iNvestigation of SUrveillance After Resection for Esophageal Cancer study were included. Prediction models were developed for overall survival (OS) and disease-free survival (DFS) using Cox proportional hazards (CPH) and random survival forest (RSF). Model performance was evaluated using discrimination (time-dependent area under the curve (tAUC)) and calibration (visual comparison of predicted and observed survival probabilities).
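The calibration check described above (predicted versus observed survival probabilities) can be illustrated with a minimal sketch. It assumes patients are grouped into quintiles of predicted survival and that observed survival in each group is estimated with the Kaplan-Meier method at a fixed horizon such as 5 years. The function names `km_survival_at` and `calibration_table` are hypothetical and are not taken from the study, whose analysis code is not reproduced here.

```python
from typing import List, Sequence, Tuple


def km_survival_at(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of the probability of surviving beyond `horizon`.

    times  : follow-up time per patient
    events : 1 if the event (death or recurrence) was observed, 0 if censored
    """
    surv = 1.0
    # Only distinct event times up to the horizon change the estimate.
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - failures / at_risk
    return surv


def calibration_table(
    predicted: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    horizon: float,
    n_groups: int = 5,
) -> List[Tuple[float, float]]:
    """Return (mean predicted survival, observed KM survival) per risk group.

    Patients are sorted by predicted survival at `horizon` and split into
    `n_groups` groups of near-equal size (quintiles by default).
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows: List[Tuple[float, float]] = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:  # fewer patients than groups: skip empty groups
            continue
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        observed = km_survival_at(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        rows.append((mean_pred, observed))
    return rows


if __name__ == "__main__":
    # Toy data for illustration only; these are not study data.
    predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    times = [1, 1, 2, 2, 3, 6, 6, 6, 6, 6]
    events = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    for mean_pred, observed in calibration_table(predicted, times, events, horizon=5):
        print(f"predicted {mean_pred:.2f}  observed {observed:.2f}")
```

In a well-calibrated model the two values in each row lie close together; plotting them against the line of identity gives the kind of visual comparison referred to above.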

Results

This study included 4719 patients with an OS of 47.7% and DFS of 40.9% at 5 years. Sixteen variables were included. CPH and RSF demonstrated good discrimination with a tAUC of 78.2% (95% CI 77.4%-79.1%) and 77.1% (95% CI 76.1%-78.1%) for OS and a tAUC of 79.4% (95% CI 78.5%-80.2%) and 78.6% (95% CI 77.5%-79.5%) respectively for DFS at 5 years. CPH showed good agreement between predicted and observed probabilities in all quintiles. RSF showed good agreement for patients with survival probabilities between 20% and 80%.

Conclusions

This study demonstrated that a statistical model can accurately predict long-term survival and time-to-recurrence after esophagectomy. Identification of patient groups at risk of recurrence and poor long-term survival can improve patient outcomes by optimizing treatment methods and surveillance strategies. Future work evaluating prediction-based decisions against standard decision-making is required to understand the clinical utility derived from prognostic model use.

Acknowledgments

None

Funding source

RRG was supported by a Royal College of Surgeons Intercalated Bachelor of Science Degree in Surgery award. Research was supported by the Data Science in Cancer Research award, CRUK Imperial Centre.

Footnotes

CRediT author statement

Rohan R Gujjuri – Methodology, Software, Formal Analysis, Investigation, Data Curation, Writing – Original draft, Writing – Review & Editing, Visualization

Jonathan M Clarke – Conceptualization, Methodology, Investigation, Writing – Review & Editing, Visualization, Supervision

Jessie A Elliott – Investigation, Data Curation, Writing – Review & Editing

Saqib A Rahman – Software, Formal Analysis, Writing – Review & Editing

John V Reynolds – Investigation, Supervision, Writing – Review & Editing

George B Hanna – Supervision, Writing – Review & Editing

Sheraz R Markar – Conceptualization, Methodology, Resources, Writing – Review & Editing, Visualization, Supervision, Project Administration

Conflict of interest: None declared

References

  • 1. Kamangar F, Nasrollahzadeh D, Safiri S, Sepanlou SG, Fitzmaurice C, Ikuta KS, et al. The global, regional, and national burden of oesophageal cancer and its attributable risk factors in 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet Gastroenterol Hepatol. 2020 Jun;5(6):582–97. doi: 10.1016/S2468-1253(20)30007-8.
  • 2. Schandl A, Lagergren J, Johar A, Lagergren P. Health-related quality of life 10 years after oesophageal cancer surgery. Eur J Cancer. 2016;69:43–50. doi: 10.1016/j.ejca.2016.09.032.
  • 3. Shapiro J, van Lanschot JJB, Hulshof MCCM, van Hagen P, van Berge Henegouwen MI, Wijnhoven BPL, et al. Neoadjuvant chemoradiotherapy plus surgery versus surgery alone for oesophageal or junctional cancer (CROSS): long-term results of a randomised controlled trial. Lancet Oncol. 2015 Sep;16(9):1090–8. doi: 10.1016/S1470-2045(15)00040-6.
  • 4. Al-Batran S-E, Homann N, Pauligk C, Goetze TO, Meiler J, Kasper S, et al. Perioperative chemotherapy with fluorouracil plus leucovorin, oxaliplatin, and docetaxel versus fluorouracil or capecitabine plus cisplatin and epirubicin for locally advanced, resectable gastric or gastro-oesophageal junction adenocarcinoma (FLOT4): a ra. Lancet. 2019 May;393(10184):1948–57. doi: 10.1016/S0140-6736(18)32557-1.
  • 5. Whitaker K. Earlier diagnosis: the importance of cancer symptoms. Lancet Oncol. 2020 Jan;21(1):6–8. doi: 10.1016/S1470-2045(19)30658-8.
  • 6. Kelly RJ, Ajani JA, Kuzdzal J, Zander T, Van Cutsem E, Piessen G, et al. Adjuvant Nivolumab in Resected Esophageal or Gastroesophageal Junction Cancer. N Engl J Med. 2021 Apr 1;384(13):1191–203. doi: 10.1056/NEJMoa2032125.
  • 7. Stiles BM, Salzler GG, Nasar A, Paul S, Lee PC, Port JL, et al. Clinical predictors of early cancer-related mortality following neoadjuvant therapy and oesophagectomy. Eur J Cardio-thoracic Surg. 2015;48(3):455–60. doi: 10.1093/ejcts/ezu479.
  • 8. Davies AR, Pillai A, Sinha P, Sandhu H, Adeniran A, Mattsson F, et al. Factors associated with early recurrence and death after esophagectomy for cancer. J Surg Oncol. 2014 Apr;109(5):459–64. doi: 10.1002/jso.23511.
  • 9. Toh Y, Kitagawa Y, Kuwano H, Kusano M, Oyama T, Muto M, et al. A nation-wide survey of follow-up strategies for esophageal cancer patients after a curative esophagectomy or a complete response by definitive chemoradiotherapy in Japan. Esophagus. 2016 Apr 13;13(2):173–81.
  • 10. Ezzati A, Lipton RB. Machine Learning Predictive Models Can Improve Efficacy of Clinical Trials for Alzheimer’s Disease. J Alzheimer’s Dis. 2020 Mar 10;74(1):55–63. doi: 10.3233/JAD-190822.
  • 11. Rice TW, Rusch VW, Ishwaran H, Blackstone EH. Cancer of the esophagus and esophagogastric junction. Cancer. 2010 May 24;116(16):3763–73. doi: 10.1002/cncr.25146.
  • 12. Jang R, Darling G, Wong RKS. Multimodality Approaches for the Curative Treatment of Esophageal Cancer. J Natl Compr Cancer Netw. 2015 Feb;13(2):229–38. doi: 10.6004/jnccn.2015.0029.
  • 13. Rice TW, Patil DT, Blackstone EH. 8th edition AJCC/UICC staging of cancers of the esophagus and esophagogastric junction: application to clinical practice. Ann Cardiothorac Surg. 2017 Mar;6(2):119–30. doi: 10.21037/acs.2017.03.14.
  • 14. Rice TW, Apperson-Hansen C, DiPaola LM, Semple ME, Lerut TEMR, Orringer MB, et al. Worldwide Esophageal Cancer Collaboration: clinical staging data. Dis Esophagus. 2016 Oct;29(7):707–14. doi: 10.1111/dote.12493.
  • 15. van den Boorn HG, Engelhardt EG, van Kleef J, Sprangers MAG, van Oijen MGH, Abu-Hanna A, et al. Prediction models for patients with esophageal or gastric cancer: A systematic review and meta-analysis. Katoh M, editor. PLoS One. 2018 Feb 8;13(2):e0192310. doi: 10.1371/journal.pone.0192310.
  • 16. Rahman SA, Walker RC, Maynard N, Trudgill N, Crosby T, Cromwell DA, et al. The AUGIS Survival Predictor. Ann Surg. 2021 Feb 17; doi: 10.1097/SLA.0000000000004794. Publish Ah(0)
  • 17. Rahman SA, Walker RC, Lloyd MA, Grace BL, van Boxel GI, Kingma BF, et al. Machine learning to predict early recurrence after oesophageal cancer surgery. Br J Surg. 2020 Jun 15;107(8):1042–52. doi: 10.1002/bjs.11461.
  • 18. Gabriel E, Attwood K, Shah R, Hochwald S, Kukar M, Nurkin S. A Novel Calculator for Esophageal Adenocarcinoma Accurately Predicts Overall Survival Benefit from Neoadjuvant Chemoradiation. J Am Coll Surg. 2017;224(5):884–94. doi: 10.1016/j.jamcollsurg.2017.01.043.
  • 19. Xie S-H, Santoni G, Mälberg K, Lagergren P, Lagergren J. Prediction Model of Long-term Survival After Esophageal Cancer Surgery. Ann Surg. 2021 May;273(5):933–9. doi: 10.1097/SLA.0000000000003431.
  • 20. Shapiro J, van Klaveren D, Lagarde SM, Toxopeus ELA, van der Gaast A, Hulshof MCCM, et al. Prediction of survival in patients with oesophageal or junctional cancer receiving neoadjuvant chemoradiotherapy and surgery. Br J Surg. 2016 Jun 15;103(8):1039–47. doi: 10.1002/bjs.10142.
  • 21. Elliott JA, Markar SR, Klevebro F, Johar A, Goense L, Lagergren P, et al. An International Multicenter Study Exploring Whether Surveillance After Esophageal Cancer Surgery Impacts Oncological and Quality of Life Outcomes (ENSURE). Ann Surg. 2022 Jan 27; doi: 10.1097/SLA.0000000000005378. Publish Ah.
  • 22. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008 Sep 1;2(3):841–60.
  • 23. van Buuren S. Flexible Imputation of Missing Data, Second Edition. Second edition. Boca Raton, Florida: CRC Press, Chapman and Hall/CRC; 2018. [2019]
  • 24. Enders CK. Applied missing data analysis. 2010;377
  • 25. White IR, Royston P. Imputing missing covariate values for the Cox model. Stat Med. 2009 Jul 10;28(15):1982–98. doi: 10.1002/sim.3618.
  • 26. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45(3)
  • 27. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13:281–305.
  • 28. Marshall A, Altman DG, Holder RL, Royston P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med Res Methodol. 2009 Dec 28;9(1):57. doi: 10.1186/1471-2288-9-57.
  • 29. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, et al. Assessing the Performance of Prediction Models. Epidemiology. 2010 Jan;21(1):128–38. doi: 10.1097/EDE.0b013e3181c30fb2.
  • 30. Kamarudin AN, Cox T, Kolamunnage-Dona R. Time-dependent ROC curve analysis in medical research: current methods and applications. BMC Med Res Methodol. 2017 Dec 7;17(1):53. doi: 10.1186/s12874-017-0332-6.
  • 31. Blanche P, Kattan MW, Gerds TA. The c-index is not proper for the evaluation of ’t’-year predicted risks. Biostatistics. 2019 Apr 1;20(2):347–57. doi: 10.1093/biostatistics/kxy006.
  • 32. Collins GS, de Groot JA, Dutton S, Omar O, Shanyinde M, Tajar A, et al. External validation of multivariable prediction models: a systematic review of methodological conduct and reporting. BMC Med Res Methodol. 2014 Dec 19;14(1):40. doi: 10.1186/1471-2288-14-40.
  • 33. Steyerberg EW, Harrell FE, Borsboom GJJ, Eijkemans MJ, Vergouwe Y, Habbema JDF. Internal validation of predictive models. J Clin Epidemiol. 2001 Aug;54(8):774–81. doi: 10.1016/s0895-4356(01)00341-9.
  • 34. Kamarajah SK, Newton N, Navidi M, Wahed S, Immanuel A, Hayes N, et al. Long-term outcomes of clinical and pathological-staged T3 N3 esophageal cancer. Dis Esophagus. 2020 Aug 3;33(8) doi: 10.1093/dote/doz109.
  • 35. Noble F, Lloyd MA, Turkington R, Griffiths E, O’Donovan M, O’Neill JR, et al. Multicentre cohort study to define and validate pathological assessment of response to neoadjuvant therapy in oesophagogastric adenocarcinoma. Br J Surg. 2017 Nov 16;104(13):1816–28. doi: 10.1002/bjs.10627.
  • 36. Lagergren J, Smyth E, Cunningham D, Lagergren P. Oesophageal cancer. Lancet. 2017 Nov;390(10110):2383–96. doi: 10.1016/S0140-6736(17)31462-9.
  • 37. Schlick CJR, Khorfan R, Odell DD, Merkow RP, Bentrem DJ. Margin Positivity in Resectable Esophageal Cancer: Are there Modifiable Risk Factors? Ann Surg Oncol. 2020 May 13;27(5):1496–507. doi: 10.1245/s10434-019-08176-z.
  • 38. Lagarde SM, Reitsma JB, de Castro SMM, ten Kate FJW, Busch ORC, van Lanschot JJB. Prognostic nomogram for patients undergoing oesophagectomy for adenocarcinoma of the oesophagus or gastro-oesophageal junction. Br J Surg. 2007 Oct 15;94(11):1361–8. doi: 10.1002/bjs.5832.
  • 39. Strobl C, Boulesteix A-L, Zeileis A, Hothorn T. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics. 2007 Dec 25;8(1):25. doi: 10.1186/1471-2105-8-25.
  • 40. Jukic S, Saracevic M, Subasi A, Kevric J. Comparison of Ensemble Machine Learning Methods for Automated Classification of Focal and Non-Focal Epileptic EEG Signals. Mathematics. 2020 Sep 2;8(9):1481
  • 41. Rajpurkar P, Irvin J, Zhu K, Yang B, Mehta H, Duan T, et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. 2017;3-9
  • 42. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017 Feb 2;542(7639):115–8. doi: 10.1038/nature21056.
  • 43. Survival. Oesophageal cancer. Cancer Research UK; [cited 2021 May 17]. [Internet]. Available from: https://www.cancerresearchuk.org/about-cancer/oesophageal-cancer/survival.
  • 44. Stacey D, Samant R, Bennett C. Decision Making in Oncology: A Review of Patient Decision Aids to Support Patient Participation. CA Cancer J Clin. 2008 Aug 28;58(5):293–304. doi: 10.3322/CA.2008.0006.
  • 45. Baxter MA, Spender LC, Petty RD. Combining precision medicine and prophylaxis in oesophageal squamous cell carcinoma. Br J Cancer. 2020 Nov 24;123(11):1585–7. doi: 10.1038/s41416-020-01057-3.
  • 46. Rahman S, Thomas B, Maynard N, Park MH, Wahedally M, Trudgill N, et al. Impact of postoperative chemotherapy on survival for oesophagogastric adenocarcinoma after preoperative chemotherapy and surgery. Br J Surg. 2022 Feb 1;109(2):227–36. doi: 10.1093/bjs/znab427.
  • 47. Lordick F, Mariette C, Haustermans K, Obermannová R, Arnold D. Oesophageal cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2016 Sep;27:v50–7. doi: 10.1093/annonc/mdw329.
  • 48. Chidambaram S, Markar SR. Clinical utility and applicability of circulating tumor DNA testing in esophageal cancer: a systematic review and meta-analysis. Dis Esophagus. 2022 Feb 11;35(2) doi: 10.1093/dote/doab046.
  • 49. Reilly BM, Evans AT. Translating Clinical Research into Clinical Practice: Impact of Using Prediction Rules To Make Decisions. Ann Intern Med. 2006 Feb 7;144(3):201. doi: 10.7326/0003-4819-144-3-200602070-00009.
  • 50. Sachs MC, Sjölander A, Gabriel EE. Aim for Clinical Utility, Not Just Predictive Accuracy. Epidemiology. 2020 May;31(3):359–64. doi: 10.1097/EDE.0000000000001173.
