Abstract
Background:
Endoscopic mucosal resection (EMR) has emerged as an esophagus-preserving treatment for T1 esophageal adenocarcinoma (EAC); however, only patients with negligible risk of lymph node metastasis (LNM) are eligible. Reliable clinical diagnostic tools for LNM are lacking; as such, several risk assessment scores have been developed. The purpose of this study was to externally validate two previously published risk scores (Lee and Weksler) for clinical prediction of LNM in T1 EAC patients.
Methods:
Following the criteria used in the Lee and Weksler studies, esophagectomy patients with pathologic T1 EAC were identified. A sub-analysis was performed in patients with clinical T1 disease based on EMR. Predictive accuracy of the scores was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) and calibration plots. AUCs were compared using Venkatraman's test for paired ROC curves.
Results:
Of 233 patients who met study criteria for external validation, 3 T1a and 32 T1b patients had LNM. ROC analysis demonstrated comparably high predictive and discriminatory ability, with AUCs of 0.832 and 0.824 for the Lee and Weksler scores, respectively (p=0.750). Results were more variable in the EMR cohort. Based on the risk thresholds defined by each score, the false positive rates relative to pathologic LNM status were 73% and 56% for Lee and Weksler, respectively, with 3% false negatives for Weksler. In the EMR cohort, the false positive rates were 70% and 50% for Lee and Weksler, with no false negatives.
Conclusion:
Both scoring systems demonstrated good discriminatory ability and predictive accuracy for LNM, but the defined thresholds resulted in high false positive rates. A better scoring system based on clinical characteristics is needed to identify patients with local disease.
Keywords: esophageal cancer, staging, external validation, risk assessment
INTRODUCTION
Utilization of endoscopic mucosal resection (EMR) as an organ-preserving therapy for early-stage esophageal cancer has been increasing; however, appropriate patient selection remains challenging. The presence of lymph node metastasis (LNM) is the most important determinant of long-term prognosis in patients with early esophageal adenocarcinoma (EAC). (2,3) The risk of LNM is higher in T1b EAC, with reported rates of up to 27%; however, even patients with T1a disease may have a 7% risk. (4,5) As such, EMR should only be offered to patients with a low risk of LNM.
Currently available diagnostic tools for evaluating LNM in early EAC are not sufficiently reliable. Fluorine-18 fluorodeoxyglucose positron emission tomography (FDG-PET) imaging is limited by frequent tumor non-avidity in early-stage EAC and poor sensitivity for nodal metastasis. (6,7) Endoscopic ultrasound (EUS) has become the gold standard and has been shown to be effective in staging locoregional disease; however, a recent study by Luu et al found only 53% T stage concordance with surgical pathology and 14% unrecognized nodal disease in 139 patients with early disease. (8) Additionally, Barrett's esophagus, which is often present in EAC, may further complicate accurate assessment of nodal status because of increased mucosal nodularity. (9)
Difficulty in identifying patients with LNM has led to the development of clinical assessment tools that use clinicopathologic data to identify those at high risk. Early studies were limited by small size and inconsistent patient selection criteria (10,11); as such, two recent studies proposed risk assessment scores specifically for T1 EAC. (12, 13) The tools were developed using different data, with Lee et al (2013) using a multi-institutional cohort of 258 patients and Weksler et al (2017) using the National Cancer Database. (12, 13) The instruments contain similar input variables (T stage, grade, lymphovascular invasion and tumor length) but differ in ease of point allocation and score interpretation. Neither instrument has been externally validated to date and, importantly, both were created from pathologic data although they are meant to be used in the clinical setting. The objective of this study was to externally validate these two previously published scoring systems for clinical prediction of nodal status in patients with T1 EAC. Furthermore, we performed an additional validation using clinically generated data from patients who underwent diagnostic EMR to determine the clinical utility of these scores.
METHODS
Patients with T1 adenocarcinoma on surgical pathology who underwent esophagectomy without neoadjuvant treatment between 1995 and 2017 were identified from a prospectively maintained institutional database at Memorial Sloan Kettering Cancer Center (MSKCC). Furthermore, a cohort of patients with T1 disease as diagnosed by EMR was separately identified and analyzed to determine the diagnostic performance of these scores using clinical data. Clinicopathologic characteristics were used to validate the scores as defined by Lee et al and Weksler et al. (Table 1)
Table 1.
Comparison and Definitions of the Scoring Systems
| | Score A (Lee et al, 2013) (12) | Score B (Weksler et al, 2017) (13) |
|---|---|---|
| Inclusion Criteria | All patients that underwent esophagectomy for EAC with T1 on final pathology | |
| Location | 5 university-affiliated institutions | National Cancer Database |
| Methodology | Multivariable logistic regression to identify predictive variables for LNM | |
| Scoring System [Variable, points allocated] | | |
| Clinical Implications | Risk categories for LNM: | Proposed therapeutic algorithm: |
Pathologic characteristics for the validation cohort were identified from final surgical pathology reports. Tumors were staged according to the American Joint Committee on Cancer (AJCC) 7th edition. T1a disease was limited to the mucosa, while T1b disease invaded the submucosa without involvement of the muscularis propria. Tumor size was defined as the largest dimension in any direction. Lymphovascular invasion (LVI) was defined as the identification of tumor cells within surrounding lymphovascular structures. Although grade and tumor length may be determined from esophagogastroduodenoscopy (EGD) with biopsy, EGD biopsy specimens, unlike EMR specimens, do not allow evaluation of LVI and depth of invasion. As such, we used EGD and EMR to perform a clinical evaluation of the factors included in both scoring systems. For the EMR cohort, T stage, LVI and grade were obtained from the EMR specimen report and tumor size from the EGD report.
Characteristics were compared by LNM status using Fisher's exact test for categorical variables and the Wilcoxon rank-sum test for continuous variables. Predictive accuracy of the Lee and Weksler scores was evaluated on the MSKCC external validation cohort by calculating the AUC. The AUC can be interpreted as the probability that, given one randomly drawn patient with LNM and one without, the patient with LNM has the higher risk score. AUC ranges from 0 to 1, where 0.5 is equivalent to a coin flip and 1 represents perfect discrimination. The difference between AUCs was assessed using Venkatraman's test for two paired ROC curves. (14) The calibration plot displays the predicted probability from each risk score against the observed probability of LNM; for an ideal scoring system, all points fall on the 45-degree diagonal line. Ideally, validation would be performed on the full model coefficients. However, only Ferri et al presented the coefficients from the full multivariable model used to construct the nomogram. Therefore, in this paper we validated the integer scoring systems provided in each paper, assigning points as defined in both papers (Table 1). Validation was also performed on the subset with clinical T1 disease based on the EMR report.
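The concordance interpretation of the AUC and the grouping behind a calibration plot can be sketched in a few lines of Python. This is a minimal illustration, not the R code used in this study: the function names `concordance_auc` and `calibration_table` are hypothetical, tied scores (common with integer point systems) are assumed to count as one half, as in the Mann–Whitney formulation of the AUC, and each patient's score is assumed to have already been mapped to a predicted probability.

```python
from itertools import product

def concordance_auc(scores, labels):
    """AUC as the probability that a randomly chosen LNM-positive patient
    has a higher score than a randomly chosen LNM-negative patient.
    Ties count as 1/2 (assumption; integer scores produce many ties)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one node-positive and one node-negative patient")
    wins = 0.0
    for p, n in product(pos, neg):  # every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

def calibration_table(predicted, labels):
    """Group patients by predicted probability and return
    (predicted probability, number of patients, observed LNM proportion).
    A well-calibrated score has observed close to predicted in every group."""
    groups = {}
    for p, y in zip(predicted, labels):
        groups.setdefault(p, []).append(y)
    return [(p, len(ys), sum(ys) / len(ys)) for p, ys in sorted(groups.items())]
```

For example, with hypothetical scores `[0, 1, 2, 2, 3, 4]` and LNM labels `[0, 0, 0, 1, 1, 1]`, the 9 positive/negative pairs give 8 wins and 1 tie, so `concordance_auc` returns 8.5/9, about 0.944.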
Statistical analyses were performed using R version 3.2.4 (R Foundation for Statistical Computing, Vienna, Austria).
RESULTS
There were 233 patients who met study criteria: 80 (34%) with T1a and 153 (66%) with T1b disease. LNM was identified in 35 (15%), comprising 3 (4%) T1a and 32 (21%) T1b patients. A comparison of the MSKCC cohort with the other two study cohorts is presented in Table 2. The nodal positivity rate across the three cohorts ranged from 11% to 15%.
Table 2.
Comparison of Demographic Characteristics Between Study Groups
| | Memorial Sloan Kettering Cancer Center (n=233) | Lee (n=258) | Weksler (n=1283) |
|---|---|---|---|
| Years Included | 1995–2017 | 2000–2011 | 2010–2013 |
| Age, years | 65 (58–71)a | 65.3 (10.1)b | 65 (59–71)a |
| White Race, n (%) | 217 (93) | NA | 1233 (97.1) |
| Male sex, n (%) | 192 (82) | 226 (88) | 1095 (85.3) |
| T1b, n (%) | 153 (66) | 136 (53) | 711 (55) |
| NA | 3 (1.3) | | |
| Lymphovascular Invasion, n (%) | 55 (24) | 53 (21) | 203 (15.8) |
| Tumor Size, cm | 1.7 (0.8–2.5)a | 1.7 (1.5)b | 1.7 (1.5)b |
| LNM in T1b, n/N (%) | 32/153 (21) | 35/136 (26) | 128/711 (18) |
| Lymph Nodes Sampled | 21 (16–28)a | 29.1 (21.5)b | 14 (9–21)a |
| Incomplete Resection (any + margins), n (%) | 7 (3) | NA | 16 (1.2) |
| Underlying Barrett's, n (%) | 202 (87) | 204 (79) | NA |

a Median (interquartile range)
b Mean (standard deviation)
The Lee and Weksler scores had comparably high discriminatory ability with good predictive accuracy (Figures 1 and 2). The AUCs were 0.832 (95% CI 0.767–0.898) and 0.824 (95% CI 0.755–0.892) for the Lee and Weksler scores, respectively. No significant difference in AUC was observed between the two scores (p=0.750).
Figure 1.
Score A (Lee et al, 2013) (12) A) ROC curve B) Calibration plot
Figure 2.
Score B (Weksler et al, 2017) (13) A) ROC curve B) Calibration plot
There were 58 patients who met study criteria for inclusion in the sub-analysis of patients with T1 disease on EMR (Table 3). On EMR, 30 patients (52%) were diagnosed with T1a disease and 28 (48%) with T1b disease. Final pathology demonstrated 26 T1a, 28 T1b and 4 T2 tumors. Eight patients (13.8%) in this cohort had nodal metastasis. The AUC for the Lee score was 0.919 (95% CI 0.835–1.00), compared with 0.788 (95% CI 0.661–0.914) for the Weksler score. With only 8 events, there were too few bins to plot a calibration curve; raw plots of predicted values and nodal positivity are therefore shown in Figure 3. The plot for the Lee score (Figure 3A) shows that node-positive patients cluster toward higher predicted values (to the right). For the Weksler score (Figure 3B), predicted risk was underestimated for a couple of node-positive patients (triangles on the far left).
Table 3.
Patient characteristics in EMR cohort
| Characteristics | EMR (N=58) |
|---|---|
| Median Age (IQR) | 63 (56–69) |
| White race, n (%) | 55 (95) |
| Male sex, n (%) | 47 (81) |
| T1b, n (%) | 28 (48) |
| T2, n (%) | 4 (7) |
| T stage NA, n (%) | 2 (3) |
| Lymphovascular Invasion, n (%) | 21 (36) |
| Median Tumor Size, cm (IQR) | 1.5 (1–2) |
| Lymph node metastasis, overall, n (%) | 8 (14) |
| Underlying Barrett's, n (%) | 52 (90) |
Figure 3.
Plot of predicted values and nodal positivity in EMR subset A) Score A B) Score B
The accuracy of the Lee and Weksler scores was calculated based on the high-risk thresholds defined in the respective papers. Compared with pathologic staging, the false positive rates were 73% (95% CI 66–80%) and 56% (95% CI 48–63%) for the Lee and Weksler scores, respectively, with a 3% false negative rate for the Weksler score. (Table 4) In the EMR cohort, the false positive rates were 70% (95% CI 51–85%) and 50% (95% CI 31–69%) for Lee and Weksler, respectively, with a 0% false negative rate for both. (Table 5)
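The text does not state how the confidence intervals for the false positive rates were obtained. Assuming exact (Clopper–Pearson) binomial intervals, a stdlib-only Python sketch follows; the function names are hypothetical and the bisection solver is an illustrative choice, not the study's method. A false positive here is a node-negative patient classified as high risk, so the rate is (high risk, node negative) / (all node negative), using the counts in Tables 4 and 5.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect_increasing(f, tol=1e-10):
    """Root on [0, 1] of a function that increases monotonically in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for the proportion x/n (assumed CI method)."""
    # Lower bound: p at which P(X >= x) = alpha/2; P(X >= x) rises with p.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: p at which P(X <= x) = alpha/2; P(X <= x) falls with p.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

# (false positives, node-negative patients) transcribed from Tables 4 and 5
for label, fp, neg in [("Lee, pathologic", 126, 172), ("Weksler, pathologic", 96, 172),
                       ("Lee, EMR", 21, 30), ("Weksler, EMR", 15, 30)]:
    lo, hi = clopper_pearson(fp, neg)
    print(f"{label}: FPR {fp / neg:.0%} (95% CI {lo:.0%}-{hi:.0%})")
```

The printed intervals can be compared with those reported above; a different interval method (for example, Wilson or Wald) would give somewhat different bounds, particularly for the small EMR cohort.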
Table 4.
Measures of accuracy for Lee and Weksler scores in the pathologic cohort
| Score / risk group | Negative Lymph Nodes (n = 172) | Positive Lymph Nodes (n = 33) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| Lee | | | 100 | 26.7 | 20.8 | 100 |
| Low risk | 46 | 0 | | | | |
| High risk | 126 | 33 | | | | |
| Weksler | | | 97 | 44.2 | 25 | 98.7 |
| Low risk | 76 | 1 | | | | |
| High risk | 96 | 32 | | | | |
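The summary measures in Table 4 follow directly from the 2×2 counts. The Python sketch below recomputes them (the `TwoByTwo` class is a hypothetical helper, not taken from either scoring paper) and makes explicit that the false positive rate reported in the text equals 1 minus specificity, and the false negative rate equals 1 minus sensitivity.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Counts from dichotomizing a risk score at its high-risk threshold."""
    tp: int  # high risk, node positive
    fp: int  # high risk, node negative
    fn: int  # low risk, node positive
    tn: int  # low risk, node negative

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def ppv(self):
        return self.tp / (self.tp + self.fp)

    def npv(self):
        return self.tn / (self.tn + self.fn)

    def false_positive_rate(self):
        # share of node-negative patients classified as high risk
        return 1 - self.specificity()

    def false_negative_rate(self):
        # share of node-positive patients classified as low risk
        return 1 - self.sensitivity()

# Counts transcribed from Table 4 (pathologic cohort)
lee_path = TwoByTwo(tp=33, fp=126, fn=0, tn=46)
weksler_path = TwoByTwo(tp=32, fp=96, fn=1, tn=76)

for name, t in [("Lee", lee_path), ("Weksler", weksler_path)]:
    print(f"{name}: sens {t.sensitivity():.1%}, spec {t.specificity():.1%}, "
          f"PPV {t.ppv():.1%}, NPV {t.npv():.1%}, FPR {t.false_positive_rate():.1%}")
```

The same class applied to the Table 5 counts gives the EMR cohort measures.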
Table 5.
Measures of accuracy for Lee and Weksler scores in the EMR cohort
| Score / risk group | Negative Lymph Nodes (n = 30) | Positive Lymph Nodes (n = 8) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| Lee | | | 100 | 30 | 28 | 100 |
| Low risk | 9 | 0 | | | | |
| High risk | 21 | 8 | | | | |
| Weksler | | | 100 | 50 | 35 | 100 |
| Low risk | 15 | 0 | | | | |
| High risk | 15 | 8 | | | | |
DISCUSSION
EMR has been increasingly utilized as an organ-preserving treatment for patients with early-stage esophageal adenocarcinoma; however, its use depends on accurate determination of LNM, which is limited by currently available clinical tools. This study used a large single-institutional cohort to compare two previously developed scoring systems for assessing the risk of LNM in T1 EAC. Our external validation cohort was similar to those in the two studies under evaluation, and, consistent with previous reports, our cohort had a nodal positivity rate of 4% in T1a patients and 21% in T1b patients. The discriminatory performance of the two scores on external validation was high (AUC >0.8), with no significant difference between AUCs and excellent predictive accuracy on calibration plots. However, because both scores set their high-risk thresholds to minimize the rate of "missed" LNM, they produced a high rate of false positives. The two assessed studies are further limited by their use of pathologic data in creating a scoring system meant for clinical use.
In the current era, it has become increasingly important to identify appropriate candidates for organ-preserving therapy for early-stage esophageal cancer, as the implications of undertreatment are grave. In a theoretically low-risk cohort of patients with pathologic T1/T2 node-negative EAC treated with esophagectomy, we recently demonstrated 5-year recurrence rates of 8.2%, 11.5% and 22.2% for T1a, T1b and T2 disease, respectively. (16) LVI, which is closely associated with T stage, was an important predictor of recurrence, and recurrence was associated with poor prognosis. These findings highlight the importance of careful patient selection for EMR, given that even among the lowest-risk patients there is a significant risk of recurrence.
Unfortunately, current clinical staging is limited by the ability of available diagnostic tools to accurately identify T stage and LNM. While EUS has been reported to have an accuracy of up to 97% in identifying tumor depth, its role in the evaluation of early disease remains controversial, with reported accuracy ranging from 53% to 99%. (8, 15, 17) Although EUS performs variably in assessing pathologic T stage, limited data have suggested the utility of EMR as a staging tool. Pouw et al found that in 105 patients with unremarkable EUS findings, EMR identified submucosal invasion in 17 (24%). (18) In our series, EMR demonstrated 72% (42/58) T stage concordance with surgical pathology, compared with 35% (17/49) for EUS. These findings highlight the importance of EMR as a diagnostic tool for more accurate T staging. Similarly, assessment of LNM on imaging and EUS is also flawed. (7–9) Although the addition of fine needle aspiration to EUS (EUS-FNA) is associated with reasonable reliability (sensitivity 75% and specificity 95%), identification of high-risk nodes is difficult. (19) In a series of 25 patients evaluated with EUS, the accuracy of echo features (size, hypoechoic appearance, margins and shape) in distinguishing benign from malignant nodes was 80% when all four features were present; however, only 25% of malignant nodes had all four features. (20) Furthermore, EUS-FNA is complex to perform because it often requires traversing the tumor with the needle, which risks tumor seeding. By using pathologic staging to identify T1 node-negative patients, the authors of the two studies did not account for this known discordance in clinical staging. As such, we performed a sub-analysis using clinically obtained data to determine the performance of the two scores in the preoperative setting, as they would theoretically be used in routine clinical practice.
Our sub-analysis of patients who underwent EMR suggested greater variation between the scoring systems, with the Lee score appearing to have higher discriminatory ability for LNM (AUC 0.919 compared with 0.788 for the Weksler score). Our small sample size likely limited our ability to assess the significance of this difference. Furthermore, the clinical utility of these scores remains unknown based on this analysis, as neither score accounted for the known discordance between clinical and pathologic T staging in its development. A score intended for clinical use should be derived from clinical data.
While understaging and undertreatment may have devastating prognostic consequences, overstaging is also problematic in that it limits the use of local therapy in place of esophagectomy. Therefore, another important consideration in predicting LNM is the trade-off between false positives, which result in unnecessary esophagectomy, and false negatives, which represent missed LNM. The cut-offs specified by both scores resulted in high false positive rates (50% or higher) in both the validation and EMR cohorts. The lower false positive rate of the Weksler score in the validation cohort was offset by its false negative rate of 3%. In contrast, although the Lee score had a 73% false positive rate, there were no false negatives. Future scoring systems might include additional factors that allow false positives to be minimized without compromising the ability to capture all patients with LNM.
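The trade-off between false positives and missed LNM can be made concrete by sweeping the cut-off of an integer score. The Python sketch below uses hypothetical per-score counts (not data from this study or from either scoring paper) and hypothetical function names; raising the cut-off lowers the false positive rate at the cost of a higher false negative rate.

```python
def sweep_thresholds(counts):
    """counts maps an integer score to (node-negative n, node-positive n).
    For each cut-off c, patients scoring >= c are classified as high risk.
    Returns (cut-off, false positive rate, false negative rate) tuples."""
    total_neg = sum(neg for neg, _ in counts.values())
    total_pos = sum(pos for _, pos in counts.values())
    rows = []
    for c in sorted(counts):
        fp = sum(neg for s, (neg, _) in counts.items() if s >= c)  # node negative, high risk
        fn = sum(pos for s, (_, pos) in counts.items() if s < c)   # node positive, low risk
        rows.append((c, fp / total_neg, fn / total_pos))
    return rows

def highest_cutoff_within(rows, max_fnr):
    """Largest cut-off (hence lowest false positive rate) whose
    false negative rate does not exceed max_fnr."""
    return max(c for c, _, fnr in rows if fnr <= max_fnr)

# Hypothetical distribution of patients by score: {score: (node negative, node positive)}
counts = {0: (40, 0), 1: (50, 1), 2: (50, 8), 3: (30, 24)}
rows = sweep_thresholds(counts)
for c, fpr, fnr in rows:
    print(f"high risk if score >= {c}: FPR {fpr:.0%}, FNR {fnr:.0%}")
print("no missed LNM:", highest_cutoff_within(rows, 0.0))
print("up to 5% missed:", highest_cutoff_within(rows, 0.05))
```

In this made-up example, insisting on zero missed LNM forces the cut-off down to a score of 1 and leaves most node-negative patients classified as high risk, which mirrors the pattern observed with both published thresholds.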
Our study confirmed the predictive ability of two proposed risk assessment scores for LNM in T1 EAC and evaluated their utility in the clinical setting. There are several limitations to this study. First, although our external validation demonstrated no significant difference in discriminatory ability and similar predictive accuracy between the two scores, and both performed well based on AUC, the specified cut-offs favor sensitivity (100% for Lee and 97% for Weksler) over specificity (27% and 44%, respectively). Second, only patients with complete input variables were included in the study. Third, given the retrospective nature of this study, we included a long study period to obtain a sufficient sample size. However, because the input variables for both scoring systems are obtained from pathologic specimens rather than imaging, developments in diagnostic technology are unlikely to have affected the results of the external validation portion of this study. Finally, the results of the EMR cohort must be interpreted with caution given the small sample size and highly selected patients: patients who undergo esophagectomy after EMR are those with either positive margins after EMR or higher-risk features.
In conclusion, this study demonstrated that both scoring systems (Lee and Weksler) (12, 13) had good predictive accuracy on external validation and performed well in a sub-analysis using clinical data. Although the variables used in the two scoring systems are clearly important in determining LNM, further study is needed to identify whether these and other clinical variables can better predict LNM in the clinical setting and allow fewer patients to undergo unnecessary esophagectomy. Given the differences in LNM between patients with T1a and T1b disease, we recommend that patients with early esophageal cancer undergo diagnostic EMR to assist in clinical decision-making and risk assessment.
Footnotes
MEETING PRESENTATION: 16th World Congress of the International Society for Diseases of the Esophagus: Vienna, Austria; September 2018
REFERENCES
1. The Society of Thoracic Surgeons General Thoracic Surgery Database Task Force. The Society of Thoracic Surgeons composite score for evaluating esophagectomy for esophageal cancer. Ann Thorac Surg 2017; 103: 1661–1667.
2. Barbour AP, Jones M, Brown I, Gotley DC, et al. Risk stratification for early esophageal adenocarcinoma: analysis of lymphatic spread and prognostic factors. Ann Surg Oncol 2010; 17: 2494–2502.
3. Lorenz D, Origer J, Pauthner M, Graupe F, et al. Prognostic risk factors of early esophageal adenocarcinomas. Ann Surg 2014; 259: 469–476.
4. Pennathur A, Farkas A, Krasinskas AM, Ferson PF, et al. Esophagectomy for T1 esophageal cancer: Outcomes in 100 patients and implications for endoscopic therapy. Ann Thorac Surg 2009; 87: 1048–1055.
5. Dubecz A, Kern M, Solymosi N, Schweigert M, Stein HJ. Predictors of lymph node metastasis in surgically resected T1 esophageal cancer. Ann Thorac Surg 2015; 99: 1879–1886.
6. Van Westreenen HL, Westerterp M, Bossuyt PM, Pruim J, et al. Systematic review of the staging performance of 18F-fluorodeoxyglucose positron emission tomography in esophageal cancer. J Clin Oncol 2004; 22: 3805–3812.
7. Yang GY, Wagner TD. The role of positron emission tomography in esophageal cancer. Gastrointest Cancer Res 2008; 2: 3–9.
8. Luu C, Amaral M, Klapman J, Harris C, et al. Endoscopic ultrasound staging for early esophageal cancer: Are we denying patients neoadjuvant chemo-radiation? World J Gastroenterol 2017; 23: 8193–8199.
9. Rampado S, Bocus P, Battaglia G, Ruol A, et al. Endoscopic ultrasound: Accuracy in staging superficial carcinomas of the esophagus. Ann Thorac Surg 2008; 85: 251–256.
10. Stein HJ, Feith M, Bruecher BL, et al. Early esophageal cancer: Pattern of lymphatic spread and prognostic factors for long-term survival after surgical resection. Ann Surg 2005; 242: 566–573.
11. Ancona E, Rampado S, Cassaro M, et al. Prediction of lymph node status in superficial esophageal carcinoma. Ann Surg Oncol 2008; 15: 3278–3288.
12. Lee L, Ronellenfitsch U, Hofstetter WL, Darling G, et al. Predicting lymph node metastasis in early esophageal adenocarcinoma using a simple scoring system. J Am Coll Surg 2013; 217: 191–199.
13. Weksler B, Kennedy KF, Sullivan JL. Using the National Cancer Database to create a scoring system that identifies patients with early-stage esophageal cancer at risk for nodal metastasis. J Thorac Cardiovasc Surg 2017; 154: 1787–1793.
14. Venkatraman ES, Begg CB. A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika 1996; 83: 835–848.
15. Kim K, Park SJ, Kim BT, et al. Evaluation of lymph node metastases in squamous cell carcinoma of the esophagus with positron emission tomography. Ann Thorac Surg 2001; 71: 290–294.
16. Nobel T, Livschitz K, Xing XX, Hsu M, Tan KS, Barbetta A, Sihag S, Jones D, Molena D. Surveillance implications of recurrence incidence and patterns in early node-negative esophageal adenocarcinoma. Ann Thorac Surg 2019; accepted, under revision.
17. Puli SR, Reddy JB, Bechtold ML, Antillon D, et al. Staging accuracy of esophageal cancer by endoscopic ultrasound: A meta-analysis and systematic review. World J Gastroenterol 2008; 14: 1479–1490.
18. Pouw RE, Heldoorn N, Herrero A, ten Kate FJ, et al. Do we still need EUS in the workup of patients with early esophageal neoplasia? A retrospective analysis of 131 cases. Gastrointest Endosc 2011; 73: 662–668.
19. Peng HQ, Greenwald BD, Tavora FR, et al. Evaluation of performance of EUS-FNA in preoperative lymph node staging of cancers of esophagus, lung, and pancreas. Diagn Cytopathol 2008; 36: 290–296.
20. Bhutani MS, Hawes RH, Hoffman BJ. A comparison of the accuracy of echo features during endoscopic ultrasound (EUS) and EUS-guided fine-needle aspiration for diagnosis of malignant lymph node invasion. Gastrointest Endosc 1997; 45: 474–479.






