Abstract
Background Context:
We developed the New England Spinal Metastasis Score (NESMS) as a simple, informative, scoring scheme that could be applied to both operative and non-operative patients. The performance of the NESMS has not previously been compared with that of other legacy scoring systems using appropriately powered, prospectively collected, longitudinal data.
Purpose:
To compare the predictive capacity of the NESMS to the Tokuhashi, Tomita and Spinal Instability Neoplastic Score (SINS) in a prospective cohort, where all scores were assigned at the time of baseline enrollment.
Patient Sample:
We enrolled 202 patients with spinal metastases who met inclusion criteria between 2017–2019.
Outcome Measures:
One-year survival (primary); 3-month mortality and ambulatory function at 3- and 6-months were considered secondarily.
Methods:
All prognostic scores were assigned based on enrollment data, and the date of enrollment was set as time-zero. Patients were followed until death or survival at 365 days after enrollment. Survival was assessed using Kaplan-Meier curves and score performance was determined via logistic regression testing and observed to expected plots. The discriminative capacity (c-statistic) of the scoring measures was compared via the z-score.
Results:
When comparing the discriminative capacity of the predictive scores, the NESMS had the highest c-statistic (0.79), followed by the Tomita (0.69), the Tokuhashi (0.67) and the SINS (0.54). The discriminative capacity of the NESMS was significantly greater (p-value range: 0.02 to <0.001) than any of the other predictive tools. The NESMS was also able to inform independent ambulatory function at 3- and 6-months, a function that was only uniformly replicated by the Tokuhashi score.
Conclusions:
The results of this prospective validation study indicate that the NESMS was able to differentiate survival to a significantly higher degree than the Tokuhashi, Tomita and SINS. We believe that these findings endorse the utilization of the NESMS as a prognostic tool capable of informing care for patients with spinal metastases.
Keywords: NESMS, Tokuhashi, Tomita, SINS, spinal metastases, survival, prognostic score, decision-making
Introduction
Spinal metastases have become increasingly prevalent over the last two decades1 and are challenging to treat, given the often frail nature of patients, limited life expectancy and high morbidity, as well as the complication profile of the operative and non-operative interventions used to maintain quality of life.2–4 Patients and their clinical providers are frequently confronted with the need to estimate life-expectancy or discuss anticipated outcomes in the setting of their metastatic disease, while determining the optimal treatment course.2–6 Frequently, these conversations will be supported by estimates generated from a prognostic utility.2,4,6–10 Over the last three decades, the number of predictive scores for patients with spinal metastases has grown substantially.6,7 However, the original Tokuhashi8 and Tomita9 criteria, as well as the Spinal Instability Neoplastic Score (SINS)10,11 are among the most widely cited and used to inform clinical care and practice.
The ideal prognostic tool in this area should be easy to apply and understand, inform patient care or improve shared decision-making and be broadly applicable to individuals with spinal metastases, with reproducible results irrespective of treatment choice.2,6,7 In several prior studies the performance of the Tokuhashi and Tomita scores has been deemed suboptimal in one, or more, of these criteria.6,7,12–14 Furthermore, although increasing in popularity, the SINS was devised as a means of identifying patients with spinal instability who might require surgical stabilization and not as a prognostic tool intended to inform longevity or physical function after treatment.10 Nonetheless, some have suggested that higher levels of the SINS were correlated with increased mortality at 30-days.11 To the best of our knowledge, none of these predictive utilities have been prospectively validated in samples specifically designed to rigorously test their performance.2
Our group developed the New England Spinal Metastasis Score (NESMS) as a simple, informative, scoring scheme that could be applied to both operative and non-operative patients.15–17 The NESMS consists of a base score that uses a dichotomized version of the modified Bauer scheme to characterize underlying cancer characteristics, combined with an assessment of ambulatory function (independent vs. dependent/nonambulatory) and serum albumin with a threshold of 3.5 g/dL.2,18 The NESMS performed well in retrospective analyses conducted using operative16 and non-operative17 cohorts from other centers. More recently, the NESMS was also prospectively validated in an independent sample of patients.18 The performance of the NESMS relative to the other legacy scoring measures, such as the Tokuhashi and Tomita scores, has yet to be characterized.
In this context, we sought to compare the predictive capacity of the NESMS to the original Tokuhashi, Tomita and SINS in a prospective cohort study, where all scores were assigned at the time of baseline enrollment. In this investigation, we used survival as our primary outcome but also sought to assess the ability of these scoring utilities to prognosticate independent ambulatory function, one of the principal factors considered by clinicians and patients when developing treatment plans for spinal metastases.4 As far as we are aware, this is the first investigation to prospectively compare the performance of popular prognostic scores for patients with spinal metastases, as well as the first to assess their ability to predict ambulatory function following treatment.
Methods
This investigation was conducted prospectively at three partner institutions in Boston, MA (Brigham and Women’s Hospital, Massachusetts General Hospital and Dana Farber Cancer Institute). We enrolled patients 18 and older, with confirmed spinal metastases, who presented for initial treatment at one of the participating centers. Patients were not considered eligible if they had received prior surgical or radiation treatment to the region of their metastases. We required participant enrollment and collection of baseline data within 72 hours of treatment initiation. This research was approved by our institution’s investigational review board and registered with clinicaltrials.gov (NCT03224650). All participants consented to study enrollment. The study protocol, including the ways in which eligible patients were identified, consented and followed for data collection, has previously been published in full.2
Primary Outcome and Sample Size Estimation
As previously specified2, we used 1-year mortality (the outcome on which the NESMS was developed) as our primary outcome measure. Survival at 3-months and ambulatory function at 3- and 6-months were considered secondary outcomes. Sample size was estimated for the primary outcome, using previously published data employed in the creation of the NESMS and the study was powered to detect significant differences (alpha=0.05 and 90% power) in survival at a minimum of 3-months based on the clinical relevance of this time-point.2 We also structured enrollment to ensure a comparative balance between non-operative and operative cases with a ratio of 3:2.
The date of enrollment was set as time-zero in all cases. Patients were prospectively evaluated at pre-determined study time-points to one of two end-points: death, or survival at 365 days following enrollment. In order to account for attrition due to all-cause mortality and increase the power for our secondary outcomes, we determined the need to enroll 202 patients in total (~122 non-operative and 80 operative) for this pre-planned analysis.
Data Collection
Study-specific research staff completed initial intake assessment at the time of enrollment. This included sociodemographic, clinical, laboratory and treatment-based information including whether the initial strategy was operative or non-operative. The complete scope of information obtained at baseline has been published previously.2,18 Primary tumor and the presence of visceral, other bone and brain metastases were recorded based on clinical data available at baseline enrollment. There was no study-specific standardized surveillance modality or imaging. Baseline neurologic function was graded according to the ASIA scale. Ambulatory function was determined at baseline and then at 1-month, 3-months, 6-months and 12-months following enrollment. In line with the rubric used in the NESMS, ambulatory function was characterized as independent or ambulatory with assistance/nonambulatory, irrespective of whether the limitation derived from pain or compromised neurologic function.2 Interval changes in treatment strategy (e.g. revision surgery, new surgical procedure for patient initially managed non-operatively, repeat radiation, new chemotherapy/immunotherapy etc) were updated at these time-points as well.
Survival was determined through study-specific interviews with patients and/or family members at 1-, 3-, 6- and 12-months. In the event that the patient or family member could not be contacted, electronic medical records were also used to determine survival.2 Mortality was recorded in days from enrollment. No missing survival data was present for patients enrolled in this investigation.
Assignment of Prognostic Scores
As previously described2, the NESMS, Tokuhashi, Tomita and SINS scores were assigned based on baseline data collected at enrollment by independent study staff not involved in surgical or non-operative treatment. Scores were not recalculated during the course of the investigation. We used the original Tokuhashi score, as prior work6 has found this version to outperform the revised Tokuhashi in our primary outcome of 1-year survival. All prospective grading was independently reviewed and verified by study-specific clinicians not involved in patient care prior to the conduct of any analyses, and discrepancies were resolved via consensus. Treating clinicians were blinded to all study-specific prognostic scores.2 No missing data was present for any of the prognostic scores used in this investigation.
Statistical Analysis
As previously specified2, our primary analysis compared the discriminative capacity of the NESMS to the Tokuhashi, Tomita and SINS scores using the c-statistic. Each prognostic utility was evaluated in its own statistical model, with the score as the primary predictor and adjustment for covariates pre-specified in our conceptual model.2 These included age, biologic sex, race, comorbidities (stratified using Deyo-modified Charlson19 conditions as <3 vs. ≥3 based on sample mean) and treatment strategy (operative vs. non-operative with any patient receiving surgery over the course of the study considered in the operative group2). We further considered using baseline performance status and neurologic function as additional covariates but these were dropped when it was determined that there were statistically significant interactions between these variables and prognostic scores (performance status: p=0.02; neurologic function: p=0.02).
Normality within the distribution of prognostic scores was evaluated using mean, median, kurtosis and skewness.20 Baseline comparisons were made using the chi-square test for categorical variables and the Wilcoxon rank-sum test for non-parametric continuous variables. The performance of each scoring system with respect to overall survival was evaluated using Kaplan-Meier curves and observed to expected plots.21 Survival at 1-year following presentation was then assessed using logistic regression and final model calibration was determined using the Hosmer-Lemeshow goodness of fit test.21 Discriminative capacity (c-statistic) between prognostic scores was compared, for well-calibrated utilities only, using the z-score.21 Analyses for secondary outcomes were conducted via multivariable logistic regression. Testing for ambulatory function included mortality as a competing risk. We defined statistical significance for those variables with odds ratios (OR) and 95% confidence intervals (CI) exclusive of 1.0 and p<0.05. All statistical analyses were conducted using STATA v15.1 (STATA Corp., College Station, TX).
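To make the primary metric concrete, the following is a minimal sketch of how a c-statistic can be computed as the proportion of concordant decedent-survivor pairs. It is an illustration only: the function name, the toy risks and the toy outcomes are hypothetical, and it is not the STATA routine used in this study.

```python
from itertools import product


def c_statistic(predicted_risk, died):
    """Concordance (c-statistic) for a binary outcome.

    Among all (decedent, survivor) pairs, count the share in which the
    decedent was assigned the higher predicted risk; ties count as 0.5.
    """
    cases = [p for p, d in zip(predicted_risk, died) if d]
    controls = [p for p, d in zip(predicted_risk, died) if not d]
    if not cases or not controls:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: predicted 1-year mortality risk and observed death (1) or survival (0).
toy_risk = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_died = [1, 1, 0, 1, 0, 0]
toy_c = c_statistic(toy_risk, toy_died)  # 8 of 9 pairs are concordant

# A score that assigns everyone the same risk carries no information.
flat_c = c_statistic([0.5] * 6, toy_died)
```

A value of 0.5 corresponds to discrimination no better than chance, while 1.0 indicates that every decedent was ranked above every survivor.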
Results
Sample Characteristics
In the time period 2017–2019 we achieved our enrollment goal of 202 patients, with an approximate 3:2 ratio of non-operative (n=122) to operative cases (n=80). Sixty-four percent (202/314) of eligible patients consented to participate and were prospectively enrolled. The data used for this investigation were finalized on August 1, 2020. The cohort as a whole had a mean age of 61.5 (SD 12.0; median 61.5; interquartile range [IQR] 55, 69). Fifty-five percent of patients were male and 86% self-identified as White (Table 1). The three most common primary tumor types involved were lung (20%), breast (18%) and prostate (14%). The plurality of cases involved the thoracic region (38%), with 18% involving the lumbar spine and 31% demonstrating involvement of multiple regions. At presentation, 71% of the cohort was considered neurologically intact (ASIA E), while 3% were graded as ASIA A, 1% ASIA B and 4% ASIA C. Fifty-eight percent of the population were independently ambulatory at the time of presentation.
Table 1.
Characteristic | Value* |
---|---|
Age (mean, SD) | 61.5 (12.0) |
Biologic Sex | - |
Male Sex | 112 (55) |
Female Sex | 90 (45) |
White | 174 (86) |
Body Mass Index (mean, SD) | 27.3 (6.2) |
Number of Co-morbidities (mean, SD) | 2.4 (0.9) |
Primary Cancer | |
Breast | 35 (18) |
Lung | 39 (20) |
Prostate | 28 (14) |
Serum Albumin | - |
Albumin <3.5g/dL | 58 (29) |
Albumin ≥3.5g/dL | 144 (71) |
Ambulatory Status at Presentation | - |
Independent Ambulator | 118 (58) |
Ambulatory with Assistance/Non-ambulatory | 84 (42) |
Performance Status | - |
Poor | 22 (11) |
Moderate | 71 (35) |
Good | 109 (54) |
Neurologic Status at Presentation | - |
Neurologic Intact | 143 (71) |
Neurologic Deficits | 57 (28) |
Treatment Strategy | - |
Non-operative | 122 (60) |
Operative | 80 (40) |
Bone Metastases | 109 (54) |
Visceral Metastases | 106 (52) |
Type of Lesion | - |
Blastic | 41 (21) |
Mixed (lytic/blastic) | 45 (23) |
Lytic | 114 (57) |
New England Spinal Metastases Score | - |
0 | 30 (15) |
1 | 49 (24) |
2 | 78 (39) |
3 | 45 (22) |
Bauer Score | - |
0 | 24 (12) |
1 | 53 (26) |
2 | 74 (37) |
3 | 41 (20) |
4 | 10 (5) |
Tokuhashi Score (mean/SD) | 8.4 (2.9) |
Tomita Score (mean/SD) | 5.8 (2.6) |
Spinal Instability Neoplastic (SINS) Score (mean/SD) | 10.2 (3.1) |
*All values are presented as raw number and percentage (rounded to the nearest whole number) except where noted.
Among those who received surgery, the most common site of intervention (58/80; 73%) was the thoracic spine. The most common procedure was a posterior fusion (65/80; 81%), with 31% (25/80) of the surgical group also receiving a corpectomy.
We found a relatively normal distribution amongst all scoring systems at presentation, with NESMS: 0=15%, 1=24%, 2=39% and 3=22%; Tokuhashi: mean 8.4, median 9, kurtosis 2.5, skew −0.3; Tomita: mean 5.8, median 6, kurtosis 1.8, skew 0.3; and SINS: mean 10.2, median 10, kurtosis 3.2, skew −0.4. Mortality at 3-months following presentation was 23%, while one-year mortality was documented in 51%. Independent ambulatory function was maintained in 54% (85/156) of individuals still alive at 3-months and in 50% (64/128) of those who were alive at 6-months.
Prediction of Mortality at 1-Year Following Presentation
Unadjusted Kaplan-Meier evaluation indicated statistically significant differences in survival time by group for the NESMS (p<0.001, Figure 1), Tokuhashi (p=0.005, Figure 2a) and Tomita scores (p=0.005, Figure 2b). The SINS was not found to differentiate survival based on its scoring criteria (p=0.78, Figure 2c). Observed to expected plots demonstrated the desired proportionality across all scoring systems except for the SINS (Figures 3 and 4).
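For readers less familiar with the method, the Kaplan-Meier (product-limit) estimate underlying such curves can be sketched as follows. This is an illustration under stated assumptions: the function name and the six toy observations are hypothetical and are not study data, and the log-rank comparison between groups is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up in days from enrollment (time-zero)
    events : 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy cohort: four deaths and two patients alive at day 365.
toy_times = [30, 90, 90, 200, 365, 365]
toy_events = [1, 1, 1, 1, 0, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
```

When every patient is followed to death or to day 365, as in the design described here, the estimate at one year reduces to the simple proportion of patients still alive.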
Mortality at 1-year was 93% for patients with NESMS=0, 69% for NESMS=1, 45% for NESMS=2 and 13% for NESMS=3 (Table 2). The c-statistic was 0.79. In adjusted analysis that accounted for confounders, as compared to NESMS=3 as referent, all renditions of the score were significantly associated with an increased risk of mortality (NESMS=0: OR 168.27; 95% CI 27.97, 1012.28; NESMS=1: OR 22.54; 95% CI 7.01, 72.49; NESMS=2: OR 6.42; 95% CI 2.29, 17.95; Table 3). There was no evidence for statistically significant lack of fit (p=0.75).
Table 2.
Outcome | Proportion (%) | 95% CI | p-value |
---|---|---|---|
1-year Mortality | | | <0.001 |
NESMS 0 | 93 | 78, 99 | - |
NESMS 1 | 69 | 55, 82 | - |
NESMS 2 | 45 | 34, 57 | - |
NESMS 3 | 13 | 5, 27 | - |
3-month Mortality | | | <0.001 |
NESMS 0 | 60 | 41, 77 | - |
NESMS 1 | 41 | 27, 56 | - |
NESMS 2 | 8 | 3, 16 | - |
NESMS 3 | 4 | 1, 15 | - |
3-month Independent Ambulatory Function | | | <0.001 |
NESMS 0 | 7 | 1, 22 | - |
NESMS 1 | 14 | 6, 27 | - |
NESMS 2 | 59 | 47, 70 | - |
NESMS 3 | 67 | 51, 80 | - |
6-month Independent Ambulatory Function | | | <0.001 |
NESMS 0 | 10 | 2, 27 | - |
NESMS 1 | 12 | 5, 25 | - |
NESMS 2 | 40 | 29, 51 | - |
NESMS 3 | 53 | 38, 68 | - |
Accounting for mortality as competing risk
Table 3.
Characteristics | OR | 95% CI | p-value |
---|---|---|---|
New England Spinal Metastases Score | |||
0 | 168.27 | 27.97, 1012.28 | <0.001 |
1 | 22.54 | 7.01, 72.49 | <0.001 |
2 | 6.42 | 2.29, 17.95 | <0.001 |
3 | Ref | Ref | Ref |
Male Sex | 0.48 | 0.24, 0.97 | 0.04 |
White | 0.96 | 0.36, 2.54 | 0.94 |
Number of Co-morbidities | |||
Two or less | Ref | Ref | Ref |
Three or more | 0.62 | 0.29, 1.32 | 0.21 |
Treatment Strategy | |||
Non-operative | Ref | Ref | Ref |
Operative | 0.67 | 0.33, 1.36 | 0.27 |
Age | |||
50 or younger | Ref | Ref | Ref |
51–60 | 1.41 | 0.51, 3.91 | 0.51 |
61–70 | 0.87 | 0.30, 2.48 | 0.79 |
71 or older | 2.87 | 0.88, 9.40 | 0.08 |
OR – odds ratio; CI – confidence interval; Ref – referent.
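As a reading aid for Table 3, the sketch below checks the internal consistency of the reported estimates. It assumes Wald-type intervals, which are symmetric on the log-odds scale; the interval method is not stated in this section, so this is an assumption, and the helper name is hypothetical. Under that assumption each odds ratio should sit near the geometric mean of its confidence limits, and the implied coefficient and standard error can be recovered.

```python
import math


def log_scale_summary(odds_ratio, lower, upper, z_crit=1.96):
    """Recover log-odds quantities from a reported OR and 95% CI.

    Assumes a Wald interval: exp(beta - z*se), exp(beta + z*se).
    Returns (beta, standard error, geometric mean of the limits).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    geometric_mean = math.sqrt(lower * upper)
    return beta, se, geometric_mean


# Values copied from Table 3 (referent: NESMS 3).
table3 = {
    "NESMS 0": (168.27, 27.97, 1012.28),
    "NESMS 1": (22.54, 7.01, 72.49),
    "NESMS 2": (6.42, 2.29, 17.95),
}

summary = {name: log_scale_summary(*row) for name, row in table3.items()}
for name, (beta, se, gm) in summary.items():
    print(f"{name}: beta={beta:.2f}, se={se:.2f}, geometric mean of CI={gm:.2f}")
```

The wide interval for NESMS 0 corresponds to a large standard error on the log scale, as would be expected from the small number of survivors in that stratum.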
Mortality at 1-year was 65% for patients with Tokuhashi=0–8, 46% for Tokuhashi=9–11, and 19% for Tokuhashi=12–15. The c-statistic was 0.67. In adjusted analysis that accounted for confounders, as compared to Tokuhashi=12–15, all renditions of the score were significantly associated with an increased risk of mortality (Tokuhashi=0–8: OR 9.72; 95% CI 3.28, 28.80; Tokuhashi=9–11: OR 4.32; 95% CI 1.46, 12.80; Table 4). There was no evidence of statistically significant lack of fit (p=0.38).
Table 4.
Characteristics | OR | 95% CI | p-value |
---|---|---|---|
Tokuhashi Score | |||
0–8 | 9.72 | 3.28, 28.80 | <0.001 |
9–11 | 4.32 | 1.46, 12.80 | 0.008 |
12–15 | Ref | Ref | Ref |
Male Sex | 0.64 | 0.35, 1.18 | 0.15 |
White | 0.87 | 0.36, 2.12 | 0.76 |
Number of Co-morbidities | |||
Two or less | Ref | Ref | Ref |
Three or more | 0.90 | 0.47, 1.75 | 0.77 |
Treatment Strategy | |||
Non-operative | Ref | Ref | Ref |
Operative | 0.81 | 0.44, 1.49 | 0.49 |
Age | |||
50 or younger | Ref | Ref | Ref |
51–60 | 1.13 | 0.45, 2.87 | 0.79 |
61–70 | 0.96 | 0.38, 2.43 | 0.94 |
71 or older | 1.71 | 0.61, 4.78 | 0.31 |
OR – odds ratio; CI – confidence interval; Ref – referent
Mortality at 1-year was 78% for patients with Tomita=9–10, 57% for Tomita=6–8, 47% for Tomita=4–5 and 30% for Tomita=2–3. The c-statistic was 0.69. In adjusted analysis that accounted for confounders, as compared to Tomita=2–3, Tomita=9–10 (OR 8.99; 95% CI 3.38, 23.93) and Tomita=6–8 (OR 3.24; 95% CI 1.54, 6.82) were significantly associated with an increased risk of mortality (Table 5). Tomita=4–5 was not significantly different from the referent (OR 2.12; 95% CI 0.85, 5.30). There was no evidence for statistically significant lack of fit (p=0.79).
Table 5.
Characteristics | OR | 95% CI | p-value |
---|---|---|---|
Tomita Score | |||
2–3 | Ref | Ref | Ref |
4–5 | 2.12 | 0.85, 5.30 | 0.11 |
6–8 | 3.24 | 1.54, 6.82 | 0.002 |
9–10 | 8.99 | 3.38, 23.93 | <0.001 |
Male Sex | 0.76 | 0.41, 1.39 | 0.37 |
White | 0.72 | 0.30, 1.74 | 0.47 |
Number of Co-morbidities | |||
Two or less | Ref | Ref | Ref |
Three or more | 0.88 | 0.46, 1.71 | 0.71 |
Treatment Strategy | |||
Non-operative | Ref | Ref | Ref |
Operative | 0.83 | 0.44, 1.54 | 0.55 |
Age | |||
50 or younger | Ref | Ref | Ref |
51–60 | 1.22 | 0.48, 3.11 | 0.67 |
61–70 | 1.17 | 0.46, 2.97 | 0.75 |
71 or older | 2.11 | 0.74, 6.00 | 0.16 |
OR – odds ratio; CI – confidence interval; Ref – referent
Mortality at 1-year was 47% for patients with SINS=13–18, 54% for SINS=7–12 and 41% for SINS=0–6. The c-statistic was 0.54. In adjusted analysis that accounted for confounders, as compared to SINS=0–6, SINS=7–12 (OR 1.88; 95% CI 0.73, 4.86) and SINS=13–18 (OR 1.36; 95% CI 0.46, 4.01) were not significantly associated with an increased risk of mortality (Table 6). There was no evidence of statistically significant lack of fit (p=0.96).
Table 6.
Characteristics | OR | 95% CI | p-value |
---|---|---|---|
SINS Score | |||
0–6 | Ref | Ref | Ref |
7–12 | 1.88 | 0.73, 4.86 | 0.19 |
13–18 | 1.36 | 0.46, 4.01 | 0.58 |
Male Sex | 0.64 | 0.36, 1.14 | 0.13 |
White | 0.77 | 0.33, 1.77 | 0.54 |
Number of Co-morbidities | |||
Two or less | Ref | Ref | Ref |
Three or more | 1.00 | 0.54, 1.87 | 1.0 |
Treatment Strategy | |||
Non-operative | Ref | Ref | Ref |
Operative | 0.85 | 0.47, 1.54 | 0.59 |
Age | |||
50 or younger | Ref | Ref | Ref |
51–60 | 0.94 | 0.39, 2.27 | 0.90 |
61–70 | 0.91 | 0.38, 2.20 | 0.83 |
71 or older | 1.47 | 0.55, 3.94 | 0.44 |
OR – odds ratio; CI – confidence interval; Ref – referent
When comparing the discriminative capacity of the predictive scores, the NESMS had the highest c-statistic (0.79), followed by the Tomita score (0.69), the Tokuhashi score (0.67) and the SINS (0.54). The discriminative capacity of the NESMS was significantly greater than any of the other predictive tools [vs. Tomita (z=2.29; p=0.02); vs. Tokuhashi (z=2.72, p=0.007); vs. SINS (z=5.32, p<0.001)].
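The correspondence between the reported z-scores and p-values can be reproduced with a short calculation, shown here as a sketch that assumes a two-sided test against the standard normal distribution. The function name is illustrative.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# z-scores reported for the NESMS versus each comparator.
reported_z = {"Tomita": 2.29, "Tokuhashi": 2.72, "SINS": 5.32}
p_values = {name: two_sided_p(z) for name, z in reported_z.items()}

for name, p in p_values.items():
    print(f"NESMS vs. {name}: z={reported_z[name]:.2f}, p={p:.4g}")
```

Under this assumption the computed values round to the p-values given in the text: approximately 0.02 for Tomita, 0.007 for Tokuhashi and well below 0.001 for the SINS.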
Mortality at 3-months Following Presentation
Following adjusted analysis, as compared to NESMS=3 as referent, NESMS=0 (OR 37.68; 95% CI 7.30, 194.50) and NESMS=1 (OR 15.81; 95% CI 3.31, 75.64) were significantly associated with an increased risk of mortality. There was no significant association between NESMS 2 (OR 1.69; 95% CI 0.32, 8.89) and 3-month mortality.
When compared to Tokuhashi=12–15, only Tokuhashi=0–8 was significantly associated with an increased risk of 3-month mortality (OR 6.24; 95% CI 1.37, 28.48). Mortality was not significantly different for Tokuhashi=9–11 (OR 2.57; 95% CI 0.54, 12.32).
For the Tomita score, when compared to Tomita=2–3, Tomita=9–10 (OR 10.28; 95% CI 3.00, 35.27) and Tomita=6–8 (OR 6.17; 95% CI 1.95, 19.52) were significantly associated with an increased risk of mortality (Table 5). Tomita=4–5 was not significantly different from Tomita=2–3 (OR 3.56; 95% CI 0.90, 14.07).
The SINS performed poorly across all renditions of the score. As compared to SINS=0–6, neither SINS=7–12 (OR 1.74; 95% CI 0.47, 6.42) nor SINS=13–18 (OR 2.62; 95% CI 0.63, 10.91) was significantly associated with an increased risk of mortality.
Prediction of Independent Ambulatory Function at 3- and 6-months
Following adjusted analysis, as compared to NESMS=3, NESMS=0 (OR 0.03; 95% CI 0.01, 0.14) and NESMS=1 (OR 0.08; 95% CI 0.03, 0.22) were significantly associated with lower likelihood of independent ambulation at 3-months. Similar findings were present for ambulatory function at 6-months (NESMS=0: OR 0.07; 95% CI 0.02, 0.28; NESMS=1: OR 0.10; 95% CI 0.03, 0.29). There was no significant finding for NESMS 2 regarding ambulatory function at 3- (OR 0.67; 95% CI 0.30, 1.49) or 6-months (OR 0.61; 95% CI 0.27, 1.35).
All renditions of the Tokuhashi score were correlated with ambulatory function at 3- and 6-month timepoints. As compared to Tokuhashi=12–15, 3-month findings for the score were: Tokuhashi=0–8 OR 0.13 (95% CI 0.05, 0.36) and Tokuhashi=9–11 OR 0.25 (95% CI 0.09, 0.67). Six-month findings for the score were Tokuhashi=0–8 OR 0.12 (95% CI 0.05, 0.33) and Tokuhashi=9–11 OR 0.25 (95% CI 0.10, 0.65).
The Tomita score performed poorly as a predictor for 3-month ambulatory function with Tomita=4–5 not significantly different (OR 0.83; 95% CI 0.34, 2.04) and only a borderline effect for Tomita=6–8 (OR 0.48; 95% CI 0.24, 0.99) as compared to Tomita=2–3. Only Tomita=9–10 had a sizable effect on 3-month ambulation (OR 0.22; 95% CI 0.09, 0.58). Six-month results were relatively similar for Tomita=4–5 (OR 0.91; 95% CI 0.36, 2.33), with a more demonstrable effect for Tomita=6–8 (OR 0.36; 95% CI 0.17, 0.79) and Tomita=9–10 (OR 0.26; 95% CI 0.09, 0.70).
In adjusted analysis, all renditions of the SINS were significant predictors of ambulatory function at 3-months, but not at 6-months. As compared to SINS=0–6, 3-month estimates were: SINS=7–12 OR 0.22 (95% CI 0.07, 0.64) and SINS=13–18 OR 0.14 (95% CI 0.04, 0.47). At 6-months, estimates were: SINS=7–12 OR 0.44 (95% CI 0.17, 1.16) and SINS=13–18 OR 0.24 (95% CI 0.07, 0.77).
Discussion
Prognostic tools have become a standard component of the care of patients with spinal metastases since Tokuhashi et al first proposed a modern scoring rubric in 1990.6,7,8 While various predictive modalities are widely used and frequently cited in the literature, few have been shown to be consistently reliable or useful in forecasting survival.6,7,12,13 For example, some have maintained that the Tokuhashi system does not perform well in patients with lung cancer, myeloma, or limited life expectancy.6,7 Similarly, the Tomita score has been found to perform suboptimally in cases of breast, lung, renal cell and prostate cancer.6,7
Our results indicate that the NESMS outperformed these legacy scoring systems, as well as the SINS, in the primary outcome of 1-year survival with a significantly higher c-statistic, meaningful step-wise increases in the odds of mortality across each value of the score and good model calibration. Furthermore, the NESMS performed as well as, if not better than, the Tokuhashi score in the secondary outcomes of 3-month mortality and ambulatory function at 3- and 6-months. While the Tokuhashi system performed reasonably well in prognosticating independent ambulatory function, not all renditions of the score predicted 3-month mortality appropriately. At the same time, the discriminative capacities of the Tokuhashi and Tomita scores were relatively similar (Tomita score=0.69; Tokuhashi score=0.67). Nonetheless, the overall performance of the Tomita score was inferior to that of the Tokuhashi, with certain renditions of the Tomita score unable to differentiate 3-month mortality. The SINS was not significantly associated with mortality at any time-point. Neither the Tomita score nor the SINS was particularly informative for ambulatory function at 3- and 6-months.
Our work is advantaged by a prospective study design, with all scoring utilities assigned using baseline enrollment data.2 Moreover, the normal distributions encountered within each scoring system indicate that we captured sufficient variation of clinical and treatment-based characteristics within our sample to be considered representative of the broad spectrum of patients with spinal metastases. This speaks against concerns for restricted variation or clinical truncation that have plagued the retrospective study of scoring measures in the past.6,7,13,14 In addition, we are reassured that the point estimates and 95% CI reported here for the NESMS overlap with those published for the model in initial validation18 and the c-statistic (0.79) is relatively close to that of the initial study used to develop the utility (0.74)15.
Although our work is the first to prospectively compare these prognostic tools, our findings regarding the utilities are well aligned with prior reports. In a retrospective series of 86 patients with renal cancer, Massaad et al reported that the NESMS differentiated survival with the highest accuracy, while only fair performance was detected for the Tokuhashi system, among others.14 Similar conclusions were expressed by Ahmed et al, who maintained that both the Tokuhashi and Tomita scores lacked the characteristics necessary to be considered useful predictive tools.6 Sullivan et al reported that higher levels of the SINS were correlated with increased mortality at 30-days, although the effect size of this estimate was fairly small (HR 1.11).11
Some of these discrepancies may result from restricted clinical variation, combined with the practice patterns and confounding by indication inherent to the cohorts used in retrospective validation-type investigations. These issues were avoided in our current work via prospective enrollment and baseline assignment of predictive scores, as well as our inclusion of patients with wide variation in the manifestations of spinal metastatic disease, as shown by a normal distribution across all the prognostic utilities.2 Short of these rigorous process measures, putative validation studies may at best support use of the scoring utility in the clinical cohort evaluated. At worst, they only measure how well contributors are able to align their clinical judgment and practice-based outcomes with the authors who developed the prognostic tools in the first place.
Nonetheless, we do acknowledge potential limitations. Foremost, this investigation was powered to detect significant differences in survival and may be underpowered to identify clinically important distinctions in secondary measures or interactions between certain covariates. We recognize the potential for residual confounding as a result. Further, in the cohort under study, neurologic function and performance status demonstrated significant interactions with the prognostic scores. This likely stems from the fact that neurologic function and performance status are highly correlated with ambulatory capacity as characterized in the NESMS and there is also potential overlap with other parameters contained in the Tokuhashi score and SINS. It is possible that in yet larger samples these characteristics could be included as covariates; however, given the intersection with multiple scoring algorithms considered here, we do not see this as materially impacting our current findings. Comparisons in this study were limited to the NESMS, Tokuhashi, Tomita and SINS, based on the extensive prior research and widespread clinical utilization of these scoring tools.6,7,12–14 This investigation did not consider other systems such as the Rades, van der Linden and Katagiri scores.6,7 The Katagiri score6,21, in particular, contains novel aspects not replicated in the systems presently under consideration, including hormone dependency and molecular targets. Future study is envisioned that would compare the NESMS to these other emerging scoring algorithms. Last, this validation was conducted in hospitals that operate with overlapping faculty in a single city and two of the three institutions are part of a single healthcare system.
This may raise concerns for patient clustering at the clinical and sociodemographic level, as well as for a degree of expertise bias, as many of the providers administering clinical care to patients enrolled in this study were also involved in the development of, or are familiar with, the NESMS. While treating clinicians were blinded to patients’ study-specific prognostic scores, we cannot rule out the possibility that providers used outside means to inform their treatment decisions and shared decision-making, which may further confound our estimations. This informs our contention that the NESMS should not be used as a determinant of whether patients merit a particular treatment approach. There may very well still be those with low NESMS values who meaningfully benefit from interventions, even surgery. While we believe the current findings support utilization of the NESMS in clinical practice, we also recognize that further prospective validation studies in outside centers with different practice patterns are necessary. Such studies would improve interpretation of the current results and reduce the limitations and sources of confounding identified here.
In conclusion, the results of this prospective validation study indicate that the NESMS was able to differentiate survival to a significantly higher degree than the Tokuhashi, Tomita and SINS. The NESMS was also able to inform independent ambulatory function at 3- and 6-months, a function that was only uniformly replicated by the Tokuhashi score. We believe that these findings endorse the application of the NESMS to the care of patients with spinal metastases.
Acknowledgement:
Contributors to the POST Study group also include: Drs. Michael Groff, Yi Lu, John Chi, Hasan Zaidi, Mai Anh Huynh, Alexander Spektor, Ayal Aizer, Karen Marcus, and Larissa Lee. The authors thank Lauren B. Barton for her contributions to the data collection used in this investigation.
Funding Statement:
This research was supported in part by a grant from the Orthopaedic Research and Education Foundation (OREF). The OREF was not involved in the conduct of the study or the preparation of the manuscript. The findings and views expressed here are those of the authors and should not be viewed as reflective of the opinions of the OREF.
This research was funded in part by National Institutes of Health (NIH-NIAMS) grant K23-AR071464 to Dr. Schoenfeld. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the NIH or the Federal government.
Footnotes
Conflicts of Interest: None
Contributor Information
Nicole Agaronnik, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115.
Lananh Nguyen, Department of Orthopaedic Surgery, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115.
Daniel G. Tobert, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114.
Tracy A. Balboni, Department of Radiation Oncology, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115.
Joseph H. Schwab, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114.
John H. Shin, Department of Neurosurgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114.
Daniel M. Sciubba, Department of Neurosurgery, Johns Hopkins University, Baltimore, MD.
Mitchel B. Harris, Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114.
References
- 1. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, Parkin DM, Forman D, Bray F. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer 2015;136:E359–386.
- 2. Schoenfeld AJ, Blucher JA, Barton LB, Schwab JH, Balboni TA, Chi JH, Shin JH, Kang JD, Harris MB, Ferrone ML. Design of the Prospective Observational study of Spinal metastasis Treatment (POST). Spine J. 2020;20:572–579.
- 3. Depreitere B, Turner I, Vandoren C, Choi D. Cost-Utility Analysis of Surgery and Radiotherapy for Symptomatic Spinal Metastases in a Belgian Specialist Center. World Neurosurg. 2019;125:e537–e543.
- 4. Turner I, Kennedy J, Morris S, Crockard A, Choi D. Surgery and Radiotherapy for Symptomatic Spinal Metastases Is More Cost Effective Than Radiotherapy Alone: A Cost Utility Analysis in a U.K. Spinal Center. World Neurosurg 2018;109:e389–e397.
- 5. Fehlings MG, Nater A, Tetreault L, Kopjar B, Arnold P, Dekutoski M, et al. Survival and Clinical Outcomes in Surgically Treated Patients With Metastatic Epidural Spinal Cord Compression: Results of the Prospective Multicenter AO Spine Study. J Clin Oncol 2016;34:268–276.
- 6. Ahmed AK, Goodwin CR, Heravi A, Kim R, Abu-Bonsrah N, Sankey E, et al. Predicting survival for metastatic spine disease: a comparison of nine scoring systems. Spine J 2018;18:1804–14.
- 7. Cassidy JT, Baker JF, Lenehan B. The role of prognostic scoring systems in assessing surgical candidacy for patients with vertebral metastasis: a narrative review. Global Spine J 2018;8:638–51.
- 8. Tokuhashi Y, Matsuzaki H, Toriyama S, Kawano H, Ohsaka S. Scoring system for the preoperative evaluation of metastatic spine tumor prognosis. Spine 1990;15:1110–1113.
- 9. Tomita K, Kawahara N, Kobayashi T, Yoshida A, Murakami H, Akamaru T. Surgical strategy for spinal metastases. Spine 2001;26:298–306.
- 10. Fourney DR, Frangou EM, Ryken TC, Dipaola CP, Shaffrey CI, Berven SH, et al. Spinal instability neoplastic score: an analysis of reliability and validity from the spine oncology study group. J Clin Oncol 2011;29:3072–7.
- 11. Sullivan PZ, Albayar A, Ramayya AG, McShane B, Marcotte P, Malhotra NR, Ali ZS, Chen HI, Janjua MB, Saifi C, Schuster J, Grady MS, Jones J, Ozturk AK. Association of spinal instability due to metastatic disease with increased mortality and a proposed clinical pathway for treatment. J Neurosurg Spine. 2020 Feb 14:1–8. doi: 10.3171/2019.11.SPINE19775.
- 12. Amelot A, Cristini J, Salaud C, Moles A, Hamel O, Moreau P, Bord E, Buffenoir K. Overall Survival in Spine Myeloma Metastases: Difficulties in Predicting With Prognostic Scores. Spine (Phila Pa 1976). 2017;42:400–406.
- 13. Carrwik C, Olerud C, Robinson Y. Predictive Scores Underestimate Survival of Patients With Metastatic Spine Disease: A Retrospective Study of 315 Patients in Sweden. Spine (Phila Pa 1976). 2020;45:414–419.
- 14. Massaad E, Hadzipasic M, Alvarez-Breckenridge C, Kiapour A, Fatima N, Schwab JH, Saylor P, Oh K, Schoenfeld AJ, Shankar GM, Shin JH. Predicting tumor-specific survival in patients with spinal metastatic renal cell carcinoma: which scoring system is most accurate? J Neurosurg Spine. 2020 Jun 5:1–11.
- 15. Ghori AK, Leonard DA, Schoenfeld AJ, Saadat E, Scott N, Ferrone ML, et al. Modeling one-year survival after surgery on the metastatic spine. Spine J 2015;15:2345–2350.
- 16. Goodwin CR, Schoenfeld AJ, Abu-Bonsrah NA, Garzon-Muvdi T, Sankey EW, Harris MB, Sciubba DM. Reliability of a spinal metastasis prognostic score to model 1-year survival. Spine J. 2016;16:1102–8.
- 17. Shi DD, Chen YH, Lam TC, Leonard D, Balboni TA, Schoenfeld A, Skamene S, Cagney DN, Chi JH, Cho CH, Harris M, Ferrone ML, Hertan LM. Assessing the utility of a prognostication model to predict 1-year mortality in patients undergoing radiation therapy for spinal metastases. Spine J. 2018;18(6):935–940.
- 18. Schoenfeld AJ, Ferrone ML, Schwab JH, Blucher JA, Barton LB, Tobert DG, Chi JH, Shin JH, Kang JD, Harris MB. Prospective validation of a clinical prediction score for survival in patients with spinal metastases: The New England Spinal Metastasis Score. Spine J. 2020 Feb 19:S1529–9430(20)30053-X. doi: 10.1016/j.spinee.2020.02.009.
- 19. Deyo RA, Cherkin DC, Ciol MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol 1992;45:613–9.
- 20. Long JS, Freese J. Regression models for categorical dependent variables using STATA. 2nd Edition. College Station, TX: STATA Press; 2006.
- 21. Kobayashi K, Ando K, Nakashima H, Sato K, Kanemura T, Yoshihara H, Hirasawa A, Kato F, Ishiguro N, Imagama S. Prognostic factors in the new Katagiri scoring system after palliative surgery for spinal metastasis. Spine 2020;45:E813–E819.