Abstract
Background
Survival estimation for patients with symptomatic skeletal metastases ideally should be made before a type of local treatment has already been determined. Currently available survival prediction tools, however, were generated using data from patients treated either operatively or with local radiation alone, raising concerns about whether they would generalize well to all patients presenting for assessment. The Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA), trained with institution-based data of surgically treated patients, and the Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy model (METSSS), trained with registry-based data of patients treated with radiotherapy alone, are two of the most recently developed survival prediction models, but they have not been tested on patients whose local treatment strategy is not yet decided.
Questions/purposes
(1) Which of these two survival prediction models performed better in a mixed cohort made up both of patients who received local treatment with surgery followed by radiotherapy and who had radiation alone for symptomatic bone metastases? (2) Which model performed better among patients whose local treatment consisted of only palliative radiotherapy? (3) Are laboratory values used by SORG-MLA, which are not included in METSSS, independently associated with survival after controlling for predictions made by METSSS?
Methods
Between 2010 and 2018, we provided local treatment for 2113 adult patients with skeletal metastases in the extremities at an urban tertiary referral academic medical center using one of two strategies: (1) surgery followed by postoperative radiotherapy or (2) palliative radiotherapy alone. Every patient’s survivorship status was ascertained from either their medical records or the national death registry from the Taiwanese National Health Insurance Administration. After applying a priori designated exclusion criteria, 91% (1920) were analyzed here. Among them, 48% (920) of the patients were female, and the median (IQR) age was 62 years (53 to 70 years). Lung was the most common primary tumor site (41% [782]), and 59% (1128) of patients had other skeletal metastases in addition to the treated lesion(s). In general, the indications for surgery were the presence of a complete pathologic fracture or an impending pathologic fracture, defined as having a Mirels score of ≥ 9, in patients with an American Society of Anesthesiologists (ASA) classification of less than or equal to IV and who were considered fit for surgery. The indications for radiotherapy were relief of pain, local tumor control, prevention of skeletal-related events, or any combination of the above. In all, 84% (1610) of the patients received palliative radiotherapy alone as local treatment for the target lesion(s), and 16% (310) underwent surgery followed by postoperative radiotherapy. Neither METSSS nor SORG-MLA was used at the point of care to aid clinical decision-making during the treatment period. Survival was retrospectively estimated by these two models to test their potential for providing survival probabilities. We first compared SORG-MLA to METSSS in the entire population. Then, we repeated the comparison in patients who received local treatment with palliative radiation alone.
We assessed model performance by area under the receiver operating characteristic curve (AUROC), calibration analysis, Brier score, and decision curve analysis (DCA). The AUROC measures discrimination, which is the ability to distinguish patients with the event of interest (such as death at a particular time point) from those without. AUROC typically ranges from 0.5 to 1.0, with 0.5 indicating random guessing and 1.0 a perfect prediction, and in general, an AUROC of ≥ 0.7 indicates adequate discrimination for clinical use. Calibration refers to the agreement between the predicted outcomes (in this case, survival probabilities) and the actual outcomes, with a perfect calibration curve having an intercept of 0 and a slope of 1. A positive intercept indicates that the actual survival is generally underestimated by the prediction model, and a negative intercept suggests the opposite (overestimation). When comparing models, an intercept closer to 0 typically indicates better calibration. Calibration can also be summarized as log(O:E), the logarithm scale of the ratio of observed (O) to expected (E) survivors. A log(O:E) > 0 signals an underestimation (the observed survival is greater than the predicted survival); and a log(O:E) < 0 indicates the opposite (the observed survival is lower than the predicted survival). A model with a log(O:E) closer to 0 is generally considered better calibrated. The Brier score is the mean squared difference between the model predictions and the observed outcomes, and it ranges from 0 (best prediction) to 1 (worst prediction). The Brier score captures both discrimination and calibration, and it is considered a measure of overall model performance. In Brier score analysis, the “null model” assigns a predicted probability equal to the prevalence of the outcome and represents a model that adds no new information. A prediction model should achieve a Brier score at least lower than the null-model Brier score to be considered as useful. 
The DCA was developed as a method to determine whether using a model to inform treatment decisions would do more good than harm. It plots the net benefit of making decisions based on the model’s predictions across all possible risk thresholds (or cost-to-benefit ratios) in relation to the two default strategies of treating all or no patients. The care provider can decide on an acceptable risk threshold for the proposed treatment in an individual and assess the corresponding net benefit to determine whether consulting with the model is superior to adopting the default strategies. Finally, we examined whether laboratory data, which were not included in the METSSS model, would have been independently associated with survival after controlling for the METSSS model’s predictions by using the multivariable logistic and Cox proportional hazards regression analyses.
Results
Between the two models, only SORG-MLA achieved adequate discrimination (an AUROC of > 0.7) in the entire cohort (of patients treated operatively or with radiation alone) and in the subgroup of patients treated with palliative radiotherapy alone. SORG-MLA outperformed METSSS by a wide margin on discrimination, calibration, and Brier score analyses in not only the entire cohort but also the subgroup of patients whose local treatment consisted of radiotherapy alone. In both the entire cohort and the subgroup, DCA demonstrated that SORG-MLA provided more net benefit compared with the two default strategies (of treating all or no patients) and compared with METSSS when risk thresholds ranged from 0.2 to 0.9 at both 90 days and 1 year, indicating that using SORG-MLA as a decision-making aid was beneficial when a patient’s individualized risk threshold for opting for treatment was 0.2 to 0.9. Higher albumin, lower alkaline phosphatase, lower calcium, higher hemoglobin, lower international normalized ratio, higher lymphocytes, lower neutrophils, lower neutrophil-to-lymphocyte ratio, lower platelet-to-lymphocyte ratio, higher sodium, and lower white blood cells were independently associated with better 1-year and overall survival after adjusting for the predictions made by METSSS.
Conclusion
Based on these discoveries, clinicians might choose to consult SORG-MLA instead of METSSS for survival estimation in patients with long-bone metastases presenting for evaluation of local treatment. Basing a treatment decision on the predictions of SORG-MLA could be beneficial when a patient’s individualized risk threshold for opting to undergo a particular treatment strategy ranged from 0.2 to 0.9. Future studies might investigate relevant laboratory items when constructing or refining a survival estimation model because these data demonstrated prognostic value independent of the predictions of the METSSS model, and future studies might also seek to keep these models up to date using data from diverse, contemporary patients undergoing both modern operative and nonoperative treatments.
Level of Evidence
Level III, diagnostic study.
Introduction
Radiation and surgery are two mainstays in the management of symptomatic skeletal metastasis. Radiotherapy is noninvasive and often effectively decreases bone pain [25, 27, 28, 30]. It seldom causes serious systemic complications and rarely delays crucial systemic therapy for patients with advanced cancer. Consequently, radiotherapy is especially suited for patients without mechanical instability of the involved bone and for those who may not recover from a more extensive operation because of frailty and limited remaining survival. Surgery, on the other hand, is often performed to address mechanical instability and might provide faster symptom relief [2, 15, 20, 36], but it has potential downsides such as anesthetic risks, wound complications, infection, perioperative physiologic disturbance, longer recovery periods, need for prolonged rehabilitation, and disruption of systemic cancer treatment. More extensive operations (such as tumor resection and prosthetic reconstruction) typically carry greater risks than less invasive procedures (like closed nail fixation). The primary goal for treating skeletal metastases is often symptomatic relief and improvement of quality of life. A personalized treatment strategy needs to consider a patient’s expected survival, and currently this is almost always decided based on the treating team’s best estimate.
Physicians’ clinical estimation of survival, however, can be unreliable [3], and several survival prediction models have been developed to address this issue [9-11, 21, 33, 41, 42, 44, 50]. These models were often created using data from patients who had been treated with either surgery or palliative radiotherapy alone [1, 16, 26, 42]. Among them, the newly introduced Skeletal Oncology Research Group machine-learning algorithm (SORG-MLA) has demonstrated great performance in predicting the 90-day and 1-year survival of surgically treated patients from various cohorts [39, 44, 45]. Another survival prediction model, the Metastases location, Elderly, Tumor primary, Sex, Sickness/comorbidity, and Site of radiotherapy (METSSS) model, was created using data from the US National Cancer Database and showed promise in making 1-year survival predictions for patients treated with upfront radiation [53]. In clinical practice, when patients with symptomatic bone metastases are being evaluated for local treatment, survival estimation should ideally take place before physicians decide on a treatment strategy because it might affect the decision between surgical and nonsurgical treatment or influence the choice of surgical procedure or radiation regimen. However, SORG-MLA and METSSS have not been tested head-to-head in a cohort that includes both patients who received local treatment with surgery and those with radiotherapy alone, raising concerns about whether these models would perform well before a patient’s treatment modality has been selected. SORG-MLA and METSSS have also not been directly compared among patients who had palliative radiotherapy alone, a group that actually constitutes the majority of patients with skeletal metastases presenting for evaluation. SORG-MLA was developed using institutional data and incorporated more detailed information such as laboratory values that reflect a patient’s overall health and nutritional status [37].
The strength of METSSS, on the other hand, was that the model was created based on a large data set from a US national registry and would potentially provide a good model fit. As predictive analytics see wider use, researchers seeking to develop a new model or refine an existing one could benefit from knowing which properly selected, meaningful factors to include in their algorithms so that the model can achieve high performance. For example, if the laboratory values included in SORG-MLA are independently associated with patient survival after adjusting for the predictions made by METSSS, future studies might consider incorporating these laboratory factors during the model construction or refinement process.
We therefore asked the following questions: (1) Which of these two survival prediction models performed better in a mixed cohort made up both of patients who received local treatment with surgery followed by radiotherapy and who had radiation alone for symptomatic bone metastases? (2) Which model performed better among patients whose local treatment consisted of only palliative radiotherapy? (3) Are laboratory values used by SORG-MLA, which are not included in METSSS, independently associated with survival after controlling for predictions made by METSSS?
Patients and Methods
Study Design and Setting
This single-center, retrospective, comparative study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guideline [4, 14]. It was performed at an urban, tertiary referral university medical center in Taipei, Taiwan. The treatment strategy was typically discussed among the patient’s medical oncologist, radiation oncologist, and one of four fellowship-trained orthopaedic oncologists who operated on all surgically treated patients. Neither METSSS nor SORG-MLA was used at the point of care to aid clinical decision-making during the treatment period (2010 to 2018). Survival was retrospectively estimated by these two models to test their potential for providing survival probabilities.
Participants
Between 2010 and 2018, we provided local treatment for 2113 adult (age 20 years or older by the Taiwanese Civil Code) patients with skeletal metastases in the extremities using one of two strategies: (1) surgery followed by postoperative radiotherapy or (2) palliative radiotherapy alone. Typically, the diagnosis of skeletal metastasis was made in patients with a known primary cancer based on their symptoms and clinical images, such as radiographs, bone scan, CT scan, and MRI. A biopsy was not routinely performed on skeletal lesions unless there were reasons to suspect a second malignancy or primary bone sarcoma. Every patient’s survivorship status was ascertained either by their medical records or the national death registry from the Taiwanese National Health Insurance Administration. We excluded 3% (54) of the patients because their target lesion was a sarcoma bone metastasis, 2% (47) because their target lesions had been previously treated with surgery or radiation, and 4% (92) because their primary tumor site could not be ascertained due to multiple documented cancers and a lack of histopathologic confirmation of the metastatic tumor, leaving 1920 for analysis (Fig. 1). During this time period, our general indications for surgery followed by radiotherapy were patients with an American Society of Anesthesiologists (ASA) classification less than or equal to IV or patients considered fit for surgery based on a multidisciplinary assessment jointly made by a medical oncologist, anesthesiologist, and orthopaedic oncologist, and the presence of a complete pathologic fracture or an impending pathologic fracture deemed unlikely to heal with nonoperative treatment alone. An impending fracture was diagnosed if the lesion in question had a Mirels score of ≥ 9 [32] and caused pain or weakness in the involved limb. 
The Mirels score was assigned by the consultant orthopaedic oncologist, and we considered a score of 2 on “pain” as adequate to qualify a patient for surgery if the total Mirels score was ≥ 9. For example, a patient with an osteolytic lesion (3) occupying between one-third and two-thirds of the diameter (2) of the femoral diaphysis (2) would be a surgical candidate if the pain score was 2. In addition, surgery was often offered for actual pathologic femur fractures unless there were overt medical contraindications, such as ongoing shock, a comatose state, acute respiratory failure, decompensated hepatic failure, or severe heart dysfunction, because we felt that femoral fractures tend to profoundly impact the patient’s quality of life. Our general indications for upfront palliative radiotherapy were relief of pain, local tumor control, prevention of skeletal-related events, or any combination of the above, in the absence of the surgical indications mentioned above. Patients were excluded if (1) their first radiotherapy or the first surgical intervention for an extremity metastasis was performed at other institutions, (2) their target lesion was a bone sarcoma or soft tissue sarcoma bone metastasis, or (3) their primary tumor site could not be ascertained because of multiple documented cancers and lack of histopathologic verification of the metastatic tumor.
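The Mirels arithmetic in the example above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the component values are those quoted in the text, and the sketch does not capture the additional requirement of pain or weakness in the involved limb or the fitness-for-surgery assessment.

```python
# Hypothetical helpers for illustration; not part of either prediction model.
def mirels_total(site: int, pain: int, lesion_type: int, size: int) -> int:
    """Sum the four Mirels components, each scored from 1 to 3."""
    components = (site, pain, lesion_type, size)
    if any(score not in (1, 2, 3) for score in components):
        raise ValueError("each Mirels component must be 1, 2, or 3")
    return sum(components)


def meets_mirels_cutoff(total: int, cutoff: int = 9) -> bool:
    """Apply the cutoff used in this study (total Mirels score >= 9)."""
    return total >= cutoff


# Example from the text: osteolytic lesion (3), one-third to two-thirds of
# the diameter (2), femoral diaphysis (2), pain score of 2.
example_total = mirels_total(site=2, pain=2, lesion_type=3, size=2)
print(example_total, meets_mirels_cutoff(example_total))
```

With the component scores from the text, the total is 3 + 2 + 2 + 2 = 9, which meets the cutoff.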
Fig. 1.

This flowchart displays the details of patient enrollment in this study.
Descriptive Data
Among the 1920 included patients, 48% (920) were female, and the median (IQR) age was 62 years (53 to 70 years) (Table 1). Eighty-four percent (1610) of patients received local treatment with palliative radiotherapy only, and 16% (310) of patients had surgery followed by radiotherapy. Thirty-three percent (632) of the patients had additional Charlson comorbidities other than metastatic cancer, and 61% (892 of 1456) of the patients with a known pretreatment functional status had a good Eastern Cooperative Oncology Group performance status (ECOG PS) score of 0 or 1. Lung was the most common primary tumor site (41% [782]), followed by breast (14% [263]), liver (9% [169]), prostate (8% [160]), and kidney (3% [66]). Eighty-eight percent (1688) of the patients had lower extremity metastases and 21% (401) had upper extremity metastases, with 9% (179) presenting with both upper and lower extremity metastases at the same time. Fifty-nine percent (1128) of the patients had other skeletal metastases in addition to the treated lesion(s). Brain metastases were present in 16% (303) of the patients and visceral metastases in 18% (339). Molecular target therapy was administered to 23% (439) of the patients. The 90-day survival rate was 77% (1473), and the 1-year survival rate was 42% (811).
Table 1.
Baseline characteristics of the study population (n = 1920)
| Variable | Total | Missing proportion |
| Demographic characteristics | | |
| Age in years | 62 (53 to 70) | 0 |
| Sex, female | 48 (920) | 0 |
| Height in cm | 161 (155 to 167) | 0.6 (11) |
| Weight in kg | 59 (51 to 67) | 0.6 (11) |
| BMI in kg/m² | 23 (21 to 24) | 0.6 (11) |
| Clinical factors | | |
| Additional Charlson comorbidity other than the primary cancer | 33 (632) | 0 |
| ECOG PS score 0 or 1 | 61 (892 of 1456) | 24 (464) |
| Oncological factors | | |
| Tumor primary site | | 0 |
| Lung | 41 (782) | |
| Breast | 14 (263) | |
| Liver | 9 (169) | |
| Prostate | 8 (160) | |
| Kidney | 3 (66) | |
| Others | 25 (480) | |
| Metastatic sites | | 0 |
| Upper extremity | 21 (401) | |
| Lower extremity | 88 (1688) | |
| Multiple bone metastases | 59 (1128) | 0 |
| Visceral metastasis | 18 (339) | 0 |
| Brain metastasis | 16 (303) | 0 |
| Previous systemic therapy | 23 (439) | 0 |
| Surgical treatment | 16 (310) | 0 |
| Laboratory values | | |
| White blood cell in 10³/μL | 6.5 (4.7 to 9.0) | 8 (156) |
| Hemoglobin in g/dL | 11.3 (9.8 to 12.7) | 8 (156) |
| Platelet in 10³/μL | 235 (175 to 312) | 8 (156) |
| Absolute lymphocyte in 10³/μL | 0.93 (0.48 to 1.46) | 14 (261) |
| Absolute neutrophil in 10³/μL | 4.12 (2.38 to 6.22) | 14 (261) |
| Albumin in g/dL | 3.7 (3.2 to 4.2) | 29 (562) |
| Alkaline phosphatase in IU/L | 124 (78 to 231) | 25 (489) |
| Creatinine in mg/dL | 0.8 (0.6 to 1.0) | 7 (127) |
| Neutrophil-to-lymphocyte ratio | 4.3 (2.6 to 7.9) | 14 (261) |
| Platelet-to-lymphocyte ratio | 222.0 (140.9 to 347.0) | 14 (261) |
| International normalized ratio | 0.99 (0.95 to 1.06) | 29 (562) |
| Serum sodium concentration in mmol/L | 136 (133 to 139) | 15 (293) |
| Serum calcium concentration in mg/dL | 9.0 (8.4 to 9.4) | 20 (378) |
| Survival endpoints | | |
| 30 days | 93 (1778) | 0 |
| 90 days | 77 (1473) | 0 |
| 180 days | 60 (1161) | 0 |
| 1 year | 42 (811) | 0 |
| 2 years | 26 (500) | 0 |
Data presented as median (IQR) or % (n). ECOG PS = Eastern Cooperative Oncology Group performance status.
Factors Evaluated as Potentially Associated With Survival
All required input variables for the two survival prediction models (SORG-MLA and METSSS) were extracted from the medical records (Supplemental Table 1; http://links.lww.com/CORR/B317). Only laboratory data obtained within 2 months of a patient’s inclusion date were used. The developers of SORG-MLA were not involved in data extraction or analysis. We examined whether laboratory items included in SORG-MLA would have been independently associated with survival after adjusting for METSSS predictions because METSSS did not incorporate any laboratory data.
Primary and Secondary Study Endpoints
Our primary study goal was to identify the better survival prediction model in a cohort made up of patients who received local treatment with surgery and with radiotherapy for symptomatic bone metastases. To achieve this, we retrieved data on each patient’s actual survivorship, which was defined as the interval between surgery or the first course of radiotherapy for the index extremity metastasis and the time of death by any cause. There was no loss to follow-up in this study because we searched not only the institutional database but also the national death registry in the Taiwanese National Health Insurance database. To eliminate the influence of outliers, survival data were censored at 2 years. The SORG-MLA was originally designed only to estimate 90-day and 1-year survival. The rationale for this is that a 90-day survival estimate might guide the clinician in deciding whether or not to offer surgery, and a 1-year estimate might inform the choice of reconstruction (such as prosthetic reconstruction versus internal fixation in the case of a proximal femoral metastasis). The SORG-MLA was used to calculate patients’ 90-day and 1-year survival probabilities. METSSS was used to assign a total point value for each patient, and a corresponding 1-year survival probability can be derived from its built-in nomogram. We then compared the two models on discrimination, calibration, overall performance, and decision curve analysis (DCA). If a model outperformed the other on all four analyses, it would be considered the better survival prediction model. Otherwise, no definite conclusion could be drawn. Note that the original METSSS only offered 1-year survival probability. Consequently, we could only compare the two models’ 90-day predictions on discrimination and decision curves because these two analyses could be carried out without actual values of predicted probabilities.
Our secondary study goals were (1) to identify the better survival prediction model in a subset of patients whose local treatment for symptomatic bone metastases consisted of only radiation and (2) to determine whether laboratory factors included in the SORG-MLA model could have had additional prognostic value besides METSSS predictions. For the first objective, we used the same analyses as mentioned above for our primary goal. For the second objective, we used multivariable logistic regression and Cox regression analysis to calculate the OR and HR, respectively, for each laboratory item included in the SORG-MLA model after adjusting for METSSS predictions. Laboratory parameters that maintained statistical significance after adjusting for METSSS predictions were deemed as potentially having additional prognostic relevance to patient survival.
Ethical Approval
The medical records review was approved by the research ethics committee at the National Taiwan University Hospital (No. 202105108RINC).
Statistical Analysis
A statistician blinded to patient survival status evaluated the performance of survival prediction models with the so-called “ABCD” method [6, 23, 29, 38, 46, 47, 51]: area under the receiver operating characteristic curve (AUROC) for discrimination, Brier score for overall performance, calibration, and DCAs (Supplemental Table 2; http://links.lww.com/CORR/B317). The AUROC measures discrimination, which is the ability to distinguish patients with the event of interest (such as death at a particular time point) from those without. AUROC typically ranges from 0.5 to 1.0, with 0.5 indicating random guessing and 1.0 a perfect prediction. In general, a model with an AUROC of ≥ 0.7 is deemed as having acceptable discriminatory ability and potentially suitable for clinical use [29]. We used the DeLong test for comparison of AUROCs [8]. Calibration refers to the agreement between the predicted outcomes (predicted probabilities in our study) and the actual outcomes, with a perfect calibration curve having an intercept of 0 and a slope of 1. A positive intercept indicates that the actual probability is generally underestimated by the prediction model, and a negative intercept suggests overestimation. When comparing models, an intercept closer to 0 typically indicates better calibration. Calibration can also be summarized as log(O:E), the logarithm scale of the ratio of observed (O) to expected (E) survival. A log(O:E) > 0 signals an underestimation (the observed survival is greater than the predicted survival), and a log(O:E) < 0 indicates the opposite (the observed survival is lower than the predicted survival). A model with a log(O:E) closer to 0 is generally considered better calibrated. The Brier score is the mean squared difference between the model predictions and the observed outcomes and ranges from 0 (best prediction) to 1 (worst prediction). It is a metric that captures both discrimination and calibration and is often used as a measure of overall model performance. 
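To make the definitions of discrimination, log(O:E), and the Brier score above concrete, the following minimal Python sketch computes each of them on a small invented data set. The function names and the toy data are assumptions for illustration; this is not the study's R code. It does not implement the DeLong test or the calibration intercept and slope, which require a fitted logistic recalibration model.

```python
import math


def auroc(y_true, y_prob):
    """Probability that a random patient with outcome 1 receives a higher
    prediction than a random patient with outcome 0 (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predictions and observed outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_oe(y_true, y_prob):
    """Natural log of observed survivors over expected survivors."""
    return math.log(sum(y_true) / sum(y_prob))


# Invented toy data: 1 = alive at the time point, paired with the
# predicted survival probability for the same patient.
survived = [1, 1, 1, 0, 1, 0, 0, 1]
predicted = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.2, 0.3]

print(auroc(survived, predicted))
print(brier_score(survived, predicted))
print(log_oe(survived, predicted))
```

In this toy example, 5 patients survived whereas the predictions sum to 4.4 expected survivors, so log(O:E) is positive, which corresponds to underestimation of survival as described above.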
The “null model” in Brier score analysis assigns a predicted probability equal to the prevalence of the outcome and represents a model that adds no new information. A prediction model should achieve a Brier score at least lower than the null-model Brier score to qualify as useful, and a model with a Brier score closer to 0 is considered as having better overall performance. The DCA was developed as a method to determine whether using a model to inform treatment decisions would do more good than harm. It plots the net benefit of making decisions based on the model’s predictions across all possible risk thresholds (or cost-to-benefit ratios) in relation to the two default strategies of treating all or no patients. The care provider can decide on an acceptable risk threshold for the proposed treatment and assess the corresponding net benefit to determine whether consulting with the model is superior to adopting the default strategies. The acceptable risk threshold should ideally be individualized based on the clinical scenario. For example, when prescribing radiotherapy to an otherwise healthy, ambulatory patient with breast cancer for a moderately painful mixed osteolytic-osteoblastic bone metastasis occupying less than one-third of the humerus (a Mirels score of 6), the clinician may choose a lower risk threshold because the treatment’s cost-to-benefit ratio is low. In contrast, a higher risk threshold should be adopted when performing large-segment resection and prosthetic reconstruction for an extensive femoral metastasis in a frail patient with advanced cancer and several comorbidities. Readers may refer to a recent CORR® Synthesis article by Karhade and Schwab [23] if they wish to further familiarize themselves with the “ABCD” method for model performance assessment.
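The net benefit plotted in a decision curve can be sketched as follows, using the commonly cited formulation in which false positives are weighted by the odds of the risk threshold. The function names and toy data are assumptions for illustration, and how the event and the "treatment" are defined in a given analysis is left to the analyst; this is not the study's R code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted probability of the
    event is at or above the threshold."""
    n = len(y_true)
    pairs = list(zip(y_true, y_prob))
    true_pos = sum(1 for y, p in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in pairs if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Default strategy of treating every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Treating no one always has a net benefit of 0.
NET_BENEFIT_TREAT_NONE = 0.0

# Invented toy data: 1 = event occurred, with the predicted event probability.
events = [1, 1, 1, 0, 1, 0, 0, 1]
risk = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.2, 0.3]

for t in (0.2, 0.5, 0.9):
    print(t, net_benefit(events, risk, t), net_benefit_treat_all(events, t))
```

At a threshold of 0.5 in this toy data, the model yields a net benefit of 0.375 compared with 0.25 for treating everyone and 0 for treating no one, so consulting the model would be preferred at that threshold.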
Survival was plotted as Kaplan-Meier curves and compared using the log-rank test. We evaluated whether laboratory values included in the SORG-MLA model were independently associated with 1-year survival and overall survival by multivariable logistic regression and Cox proportional hazards regression, respectively, after controlling for the predictions made by the METSSS model. We chose the missForest method to impute missing values because it could simultaneously handle mixed-type data consisting of continuous and categorical missing variables without the assumption of data distribution and in data settings where complex interactions might exist [43]. We performed a sensitivity analysis on patients without missing data (n = 744) and a subgroup analysis on patients who received only palliative radiotherapy (n = 1610). We reported 95% CIs and two-tailed p values for statistical interpretation. R, version 4.0.4 (R Core Team), was used for all statistical analyses.
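For readers unfamiliar with the Kaplan-Meier estimator mentioned above, a minimal product-limit sketch in Python is shown here. The function name and toy follow-up times are assumptions for illustration; the study's analyses were run in R, and the log-rank test is not implemented in this sketch.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up in days; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Invented toy data; follow-up is administratively censored at 2 years
# (730 days), mirroring the censoring rule described in this study.
times = [30, 90, 90, 200, 365, 730, 730]
events = [1, 1, 0, 1, 1, 0, 0]

km_curve = kaplan_meier(times, events)
for day, prob in km_curve:
    print(day, round(prob, 3))
```

Patients censored at a given time are counted as at risk at that time and removed afterward, which is the usual convention for tied death and censoring times.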
Results
Which Survival Prediction Model Performed Better in Patients Treated With Surgery Plus Radiation or With Radiation Alone?
In a mixed cohort made up of patients who underwent local treatment with surgery followed by radiation or with radiotherapy alone for symptomatic skeletal metastases, only SORG-MLA achieved an AUROC of > 0.7 and qualified as having adequate discrimination (the ability to distinguish patients who survived from those who died) (Fig. 2). SORG-MLA outperformed METSSS by a wide margin on discrimination at 90 days (SORG-MLA versus METSSS AUROC 0.78 [95% CI 0.76 to 0.81] versus 0.58 [95% CI 0.55 to 0.61]; p < 0.001) and 1 year (SORG-MLA versus METSSS 0.77 [95% CI 0.75 to 0.79] versus 0.64 [95% CI 0.62 to 0.67]; p < 0.001) (Table 2). On calibration (agreement between the predicted survival and the actual survival) analysis, the 90-day SORG-MLA model had a calibration intercept of 0.31 (95% CI 0.19 to 0.42) and a log(O:E) of 0.08 (95% CI -0.01 to 0.17) (Table 2), suggesting that it was well calibrated but might slightly underestimate the survival probability at 90 days in this patient cohort. At 1 year, SORG-MLA again outperformed METSSS on calibration by a substantial margin, with an intercept of 0.15 (95% CI 0.05 to 0.24) and a log(O:E) of 0.09 (95% CI -0.01 to 0.19) compared with an intercept of 0.72 (95% CI 0.63 to 0.82) and a log(O:E) of 0.48 (95% CI 0.36 to 0.59) for the latter. These calibration metrics indicated that the 1-year predictions made by SORG-MLA were better aligned with the actual 1-year survival in the entire cohort of patients who underwent local treatment with surgery and with radiation alone. On Brier score analysis (an overall performance metric that assesses both discrimination and calibration by calculating the mean squared difference between the model predictions and the observed outcomes), the null model (a model that assigns to every patient a predicted survival probability equal to the actual prevalence of survivorship at a particular timepoint) had a Brier score of 0.18 at 90 days and 0.24 at 1 year (Table 2). 
At 90 days, the SORG-MLA had a Brier score of 0.14, which was lower than the null model’s (0.18). This suggested SORG-MLA’s 90-day predictions could provide additional new information compared with the null model. At 1 year, the Brier score of METSSS was barely lower than that of the null model (METSSS 0.23, null model 0.24), meaning that METSSS added little new information compared with the null model; the SORG-MLA’s Brier score, on the other hand, was modestly lower than the null model’s (SORG-MLA 0.20, null model 0.24), indicating that SORG-MLA’s 1-year predictions could add more new information compared with both the null model and METSSS (Table 2).
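The null-model Brier scores quoted above can be checked from the survival counts in Table 1. The sketch below is illustrative Python with a hypothetical function name. If every patient is assigned the observed survival proportion p, survivors contribute a squared error of (1 − p)² and nonsurvivors contribute p², which simplifies to p(1 − p).

```python
def null_model_brier(n_survivors: int, n_total: int) -> float:
    """Brier score when every patient is assigned the observed prevalence."""
    p = n_survivors / n_total
    # Survivors (proportion p) each contribute (1 - p)^2;
    # nonsurvivors (proportion 1 - p) each contribute p^2.
    return p * (1 - p) ** 2 + (1 - p) * p ** 2


# Survivor counts reported in Table 1 (n = 1920).
null_90_day = null_model_brier(1473, 1920)
null_1_year = null_model_brier(811, 1920)
print(round(null_90_day, 2), round(null_1_year, 2))
```

Rounded to two decimals, these evaluate to 0.18 at 90 days and 0.24 at 1 year, consistent with the null-model values reported in Table 2.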
Fig. 2.
AUROC of the METSSS model and SORG-MLA in predicting (A) 90-day and (B) 1-year postradiotherapy survival. A color image accompanies the online version of this article.
Table 2.
Performance metrics with 95% CIs of the two survival prediction models (n = 1920)
| Performance metrics^a | METSSS | p value | SORG-MLA | p value |
| AUROC | | | | |
| 90 days | 0.58 (0.55 to 0.61) | < 0.001 | 0.78 (0.76 to 0.81) | < 0.001 |
| 1 year | 0.64 (0.62 to 0.67) | < 0.001 | 0.77 (0.75 to 0.79) | < 0.001 |
| Calibration intercept | | | | |
| 90 days | | | 0.31 (0.19 to 0.42) | < 0.001 |
| 1 year | 0.72 (0.63 to 0.82) | < 0.001 | 0.15 (0.05 to 0.24) | 0.001 |
| Log(O:E) | | | | |
| 90 days | | | 0.08 (-0.01 to 0.17) | 0.07 |
| 1 year | 0.48 (0.36 to 0.59) | < 0.001 | 0.09 (-0.01 to 0.19) | 0.07 |
| Brier score^b | | | | |
| 90 days | | | 0.14 (0.18) | |
| 1 year | 0.23 (0.24) | | 0.20 (0.24) | |
ᵃThe AUROC measures discrimination, that is, the ability to distinguish patients with the event of interest (such as death at a particular time point) from those without. The AUROC typically ranges from 0.5 to 1.0, with 0.5 indicating random guessing and 1.0 a perfect prediction. In general, an AUROC ≥ 0.7 indicates that a model has adequate discrimination and can be considered for clinical use if it is well calibrated. Calibration refers to the agreement between the predicted outcomes (for example, predicted probabilities in our study) and the actual outcomes, with a perfect calibration curve having an intercept of 0 and a slope of 1. A positive intercept indicates that the actual probability is generally underestimated by the prediction model, and a negative intercept suggests overestimation. When comparing models, an intercept closer to 0 typically indicates better calibration. Calibration can also be summarized as log(O:E), the logarithm of the ratio of observed (O) to expected (E) survival. A log(O:E) > 0 signals an underestimation (the observed survival is greater than the predicted survival), and a log(O:E) < 0 indicates the opposite (the observed survival is lower than the predicted survival). A model with a log(O:E) closer to 0 is generally considered better calibrated. The Brier score is the mean squared difference between the model predictions and the observed outcomes and ranges from 0 (best prediction) to 1 (worst prediction). It is often used as a measure of overall model performance. The “null model” in Brier score analysis assigns a predicted probability equal to the prevalence of the outcome and represents a model that adds no new information. A prediction model should achieve a Brier score lower than the null-model Brier score to qualify as useful, and a model with a Brier score closer to 0 is considered to have better overall performance.
ᵇBrier scores of the null models are presented in parentheses.
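The discrimination and calibration-in-the-large metrics defined in the table footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and toy data are hypothetical, the AUROC is computed by direct pairwise comparison, and log(O:E) is taken as the logarithm of the observed survival proportion divided by the mean predicted survival probability, which may differ in detail from the computation used in the study. The calibration intercept, which requires fitting a logistic model, is not shown.

```python
import math


def auroc(predicted, observed):
    """Probability that a randomly chosen survivor receives a higher predicted
    survival than a randomly chosen non-survivor (ties count as one half)."""
    survivors = [p for p, y in zip(predicted, observed) if y == 1]
    deaths = [p for p, y in zip(predicted, observed) if y == 0]
    if not survivors or not deaths:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in survivors:
        for q in deaths:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(survivors) * len(deaths))


def log_observed_to_expected(predicted, observed):
    """log(O:E): a value > 0 means observed survival exceeds predicted survival."""
    observed_rate = sum(observed) / len(observed)
    expected_rate = sum(predicted) / len(predicted)
    return math.log(observed_rate / expected_rate)


# Hypothetical toy cohort (not study data): 1 = alive, 0 = dead at the timepoint.
observed = [1, 1, 0, 1, 0, 1, 1, 0]
predicted = [0.8, 0.6, 0.4, 0.7, 0.5, 0.5, 0.9, 0.2]

auc = auroc(predicted, observed)
log_oe = log_observed_to_expected(predicted, observed)
print(f"AUROC: {auc:.3f}, log(O:E): {log_oe:.3f}")
```

In this toy cohort the observed survival proportion (0.625) is higher than the mean predicted survival (0.575), so log(O:E) is positive, which corresponds to the underestimation of survival described in the footnote.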
DCA helps clinicians assess whether using a model to inform the treatment decision would bring more good than harm (expressed as net benefit) across a full range of possible risk levels (termed risk thresholds). At 90 days, SORG-MLA provided more net benefit than not only the two default strategies of treating all or no patients but also the METSSS across a wide range of possible risk thresholds from 0.2 to 0.9 (Fig. 3A). Furthermore, METSSS only started to bring net benefit compared with the “treat all” strategy after the risk threshold exceeded 0.7, which seemed to contradict the original intended use of METSSS as an aid to guide radiotherapy because modern radiotherapy does not pose a very high risk in most clinical scenarios. At 1 year, DCA again demonstrated that SORG-MLA brought more net benefit than the two default strategies (of treating all or no patients) and METSSS when risk thresholds ranged from 0.2 to 0.9 (Fig. 3B). These DCA findings suggest that SORG-MLA might be the more useful tool to aid decision-making in both low-risk and high-risk scenarios. A potential low-risk scenario, for example, would be giving radiotherapy to ambulatory patients who had a painful humeral metastasis but no visceral metastasis. A likely high-risk situation could be performing large-segment tumor resection and complex reconstruction for an extensive femoral metastasis in a patient with advanced cancer and several comorbidities.
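The net benefit plotted on a decision curve can be illustrated with a short Python sketch. It uses the standard decision-curve formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the risk threshold. The mapping used here is an assumption made for illustration: a patient is "treated" when the predicted survival probability is at least the threshold, and a true positive is a treated patient who is alive at the timepoint. Function names and data are hypothetical and are not drawn from the study.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of treating patients whose predicted survival >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(observed)
    true_pos = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(observed, threshold):
    """Default strategy: every patient is treated regardless of prediction."""
    return net_benefit([1.0] * len(observed), observed, threshold)


# Hypothetical toy cohort (not study data): 1 = alive, 0 = dead at the timepoint.
observed = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
predicted = [0.9, 0.8, 0.3, 0.7, 0.6, 0.6, 0.9, 0.2, 0.4, 0.1]

threshold = 0.5
nb_model = net_benefit(predicted, observed, threshold)
nb_all = net_benefit_treat_all(observed, threshold)
nb_none = 0.0  # treating no one yields neither true nor false positives

print(f"model: {nb_model:.2f}, treat all: {nb_all:.2f}, treat none: {nb_none:.2f}")
```

Repeating the calculation over a grid of thresholds and plotting the three strategies would give a decision curve of the kind shown in Fig. 3; a model is useful at a given threshold only where its curve lies above both default strategies.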
Fig. 3.

Decision curves of the METSSS model and SORG-MLA for (A) 90-day and (B) 1-year postradiotherapy survival. A color image accompanies the online version of this article.
A sensitivity analysis was performed on patients with no missing data (Supplemental Table 3; http://links.lww.com/CORR/B317) and demonstrated similar results for the above-mentioned performance metrics by SORG-MLA and METSSS. All taken together, SORG-MLA was the better prediction tool in patients who underwent local treatment with surgery (followed by radiotherapy) and with radiotherapy alone.
Which Model Performed Better Among Patients Who Received Only Palliative Radiotherapy?
In the 1610 patients who received local treatment with only radiotherapy, SORG-MLA demonstrated better discrimination at both 90 days (SORG-MLA versus METSSS AUROC 0.78 [95% CI 0.76 to 0.81] versus 0.58 [95% CI 0.55 to 0.61]; p < 0.001) and 1 year (SORG-MLA versus METSSS 0.77 [95% CI 0.75 to 0.80] versus 0.64 [95% CI 0.61 to 0.67]; p < 0.001) (Supplemental Table 4; http://links.lww.com/CORR/B317). On calibration analysis, the 90-day SORG-MLA model had a calibration intercept of 0.16 (95% CI -0.02 to 0.34) and a log(O:E) of 0.03 (95% CI -0.10 to 0.16), indicating generally good calibration with a slight tendency to underestimate survival in these patients. At 1 year, SORG-MLA was much better calibrated than METSSS, with an intercept of 0.03 (95% CI -0.13 to 0.19) and a log(O:E) of 0.02 (95% CI -0.14 to 0.18) compared with an intercept of 0.52 (95% CI 0.34 to 0.70) and a log(O:E) of 0.34 (95% CI 0.13 to 0.55) for the latter. At 90 days, SORG-MLA was deemed to provide more information than simply assuming that every patient had a survival rate equal to the prevalence of survival in this population because its Brier score was lower than that of the null model (SORG-MLA 0.15, null model 0.19). At 1 year, the Brier score was 0.23 for METSSS, 0.19 for SORG-MLA, and 0.24 for the null model, suggesting that SORG-MLA added new information compared with the null model, whereas METSSS added very little.
On DCA, the results of the two models were nearly identical to those found in the entire group made up of both patients who received local treatment with surgery as well as those who had palliative radiotherapy alone. SORG-MLA provided higher net benefit compared with the two default strategies (treating all patients and treating no patients) and METSSS when the acceptable risk thresholds for a patient to opt for local treatment ranged from 0.2 to 0.9 at both 90 days and 1 year (Supplemental Fig. 1; http://links.lww.com/CORR/B317). In contrast, compared with the “treating all patients with radiotherapy” strategy, consulting the 90-day predictions of METSSS provided only marginal net benefit when the risk threshold surpassed 0.7; and consulting the 1-year predictions of METSSS added modest benefit when the risk threshold was > 0.3. These risk levels (0.7 and 0.3) might be considered too high as the cutoff survival probabilities at 90 days and 1 year for undergoing modern palliative radiotherapy, which rarely caused serious adverse effects when delivered to target local skeletal metastases.
Are Laboratory Values in SORG-MLA Independently Associated With Survival?
A distinct difference between SORG-MLA and METSSS was that the latter did not incorporate laboratory data as prognostic factors. We found that all the laboratory items included in SORG-MLA, except for platelet count and serum creatinine, were associated with overall survival on Kaplan-Meier analysis (Supplemental Fig. 2; http://links.lww.com/CORR/B317). In patients with skeletal metastases in the extremities who underwent local treatment with surgery and with only radiotherapy, higher albumin (OR 4.2 [95% CI 3.4 to 5.1]; HR 0.44 [95% CI 0.41 to 0.48]), lower alkaline phosphatase (OR 0.79 [95% CI 0.74 to 0.84]; HR 1.05 [95% CI 1.04 to 1.06]), higher calcium (OR 1.1 [95% CI 1.0 to 1.2]; HR 0.94 [95% CI 0.89 to 0.99]), higher hemoglobin (OR 1.5 [95% CI 1.4 to 1.6]; HR 0.79 [95% CI 0.77 to 0.82]), lower international normalized ratio (OR 0.65 [95% CI 0.57 to 0.73]; HR 1.05 [95% CI 1.04 to 1.07]), higher lymphocytes (OR 1.9 [95% CI 1.6 to 2.2]; HR 0.68 [95% CI 0.62 to 0.73]), lower neutrophils (OR 0.87 [95% CI 0.84 to 0.90]; HR 1.07 [95% CI 1.05 to 1.08]), lower neutrophil-to-lymphocyte ratio (OR 0.88 [95% CI 0.86 to 0.90]; HR 1.03 [95% CI 1.03 to 1.03]), lower platelet-to-lymphocyte ratio (OR 0.83 [95% CI 0.79 to 0.88]; HR 1.04 [95% CI 1.03 to 1.04]), higher sodium (OR 1.15 [95% CI 1.12 to 1.18]; HR 0.93 [95% CI 0.93 to 0.94]), and lower white blood cells (OR 0.92 [95% CI 0.89 to 0.94]; HR 1.05 [95% CI 1.04 to 1.06]) were associated with better 1-year and overall survival after adjusting for the predictions made by METSSS (Table 3), suggesting that these laboratory items might have additional prognostic value independent of the predictions of METSSS.
Table 3.
Regression analysis of laboratory variables associated with 1-year survival and overall mortality
| Variable | 1-year survival (logistic regression) | | Overall mortality (Cox regression) | |
| | Adjusted OR (95% CI)ᵃ | p value | Adjusted HR (95% CI)ᵃ | p value |
| Categorical data | ||||
| Albumin (≥ median vs < median) | 3.9 (3.2 to 4.7) | < 0.001 | 0.46 (0.41 to 0.51) | < 0.001 |
| ALP (≥ median vs < median) | 0.5 (0.4 to 0.6) | < 0.001 | 1.6 (1.5 to 1.8) | < 0.001 |
| Calcium (≥ median vs < median) | 2.1 (1.7 to 2.5) | < 0.001 | 0.7 (0.6 to 0.8) | < 0.001 |
| Creatinine (≥ median vs < median) | 0.7 (0.4 to 0.9) | 0.005 | 0.9 (0.8 to 1.0) | 0.002 |
| Hemoglobin (≥ median vs < median) | 3.4 (2.8 to 4.1) | < 0.001 | 0.51 (0.46 to 0.57) | < 0.001 |
| INR (≥ median vs < median) | 0.32 (0.26 to 0.40) | < 0.001 | 1.9 (1.7 to 2.1) | < 0.001 |
| Lymphocyte (≥ median vs < median) | 2.1 (1.8 to 2.6) | < 0.001 | 0.62 (0.56 to 0.68) | < 0.001 |
| Neutrophil (≥ median vs < median) | 0.6 (0.5 to 0.7) | < 0.001 | 1.4 (1.3 to 1.5) | < 0.001 |
| NLR (≥ median vs < median) | 0.28 (0.23 to 0.34) | < 0.001 | 2.1 (1.9 to 2.3) | < 0.001 |
| Platelet (≥ median vs < median) | 1.0 (0.8 to 1.2) | 0.86 | 1.1 (1.0 to 1.2) | 0.28 |
| PLR (≥ median vs < median) | 0.5 (0.4 to 0.6) | < 0.001 | 1.6 (1.4 to 1.7) | < 0.001 |
| Sodium (≥ median vs < median) | 2.7 (2.2 to 3.3) | < 0.001 | 0.58 (0.52 to 0.64) | < 0.001 |
| WBC (≥ median vs < median) | 0.7 (0.5 to 0.8) | < 0.001 | 1.3 (1.1 to 1.4) | < 0.001 |
| Continuous data | ||||
| Albumin per g/dL increase | 4.2 (3.4 to 5.1) | < 0.001 | 0.44 (0.41 to 0.48) | < 0.001 |
| ALP per 100 IU/L increase | 0.79 (0.74 to 0.84) | < 0.001 | 1.05 (1.04 to 1.06) | < 0.001 |
| Calcium per mg/dL increase | 1.1 (1.0 to 1.2) | 0.02 | 0.94 (0.89 to 0.99) | 0.01 |
| Creatinine per mg/dL increase | 0.99 (0.86 to 1.13) | 0.91 | 1.0 (0.9 to 1.1) | 0.85 |
| Hemoglobin per g/dL increase | 1.5 (1.4 to 1.6) | < 0.001 | 0.79 (0.77 to 0.82) | < 0.001 |
| INR per 0.1-unit increase | 0.65 (0.57 to 0.73) | < 0.001 | 1.05 (1.04 to 1.07) | < 0.001 |
| Lymphocyte per 10³/μL increase | 1.9 (1.6 to 2.2) | < 0.001 | 0.68 (0.62 to 0.73) | < 0.001 |
| Neutrophil per 10³/μL increase | 0.87 (0.84 to 0.90) | < 0.001 | 1.07 (1.05 to 1.08) | < 0.001 |
| NLR per unit increase | 0.88 (0.86 to 0.90) | < 0.001 | 1.03 (1.03 to 1.03) | < 0.001 |
| Platelet per 10⁵/μL increase | 1.0 (0.9 to 1.1) | 0.48 | 1.04 (0.99 to 1.09) | 0.11 |
| PLR per 100-unit increase | 0.83 (0.79 to 0.88) | < 0.001 | 1.04 (1.03 to 1.04) | < 0.001 |
| Sodium per mg/dL increase | 1.15 (1.12 to 1.18) | < 0.001 | 0.93 (0.93 to 0.94) | < 0.001 |
| WBC per 10³/μL increase | 0.92 (0.89 to 0.94) | < 0.001 | 1.05 (1.04 to 1.06) | < 0.001 |
ALP = alkaline phosphatase; INR = international normalized ratio; NLR = neutrophil-to-lymphocyte ratio; PLR = platelet-to-lymphocyte ratio; WBC = white blood cell.
ᵃORs and HRs were adjusted for the survival predictions of the METSSS model. ORs and 95% CIs were rounded to one decimal place where doing so did not affect data interpretation.
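One way to read the adjusted estimates in Table 3 is through the model forms below. This is a sketch under assumptions rather than the authors' exact specification: it supposes that each laboratory variable \(x_{\mathrm{lab}}\) was entered one at a time alongside the METSSS prediction \(\hat{p}_{\mathrm{METSSS}}\), and the symbols \(\beta\), \(\gamma\), and \(h_0(t)\) are introduced here only for illustration.

```latex
\begin{align}
\operatorname{logit}\,\Pr(\text{alive at 1 year})
  &= \beta_0 + \beta_1\,\hat{p}_{\mathrm{METSSS}} + \beta_2\,x_{\mathrm{lab}},
  & \text{adjusted OR} &= e^{\beta_2} \\
h(t)
  &= h_0(t)\,\exp\!\left(\gamma_1\,\hat{p}_{\mathrm{METSSS}} + \gamma_2\,x_{\mathrm{lab}}\right),
  & \text{adjusted HR} &= e^{\gamma_2}
\end{align}
```

Under this reading, an adjusted OR of 4.2 for albumin per g/dL increase would correspond to \(\beta_2 = \ln 4.2 \approx 1.44\), and an OR above 1 paired with an HR below 1 indicates that higher values of the variable are associated with better survival beyond what the METSSS prediction already captures.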
Discussion
Survival prediction models for patients with bone metastases arose out of the need for accurate estimation of survival when physicians and patients try to formulate a more personalized treatment plan. These prognostic models typically have been developed from the data of patients preselected for a particular local treatment strategy, such as surgery or radiotherapy [10, 16, 18, 21, 40, 44, 45, 48], and their applicability in patients whose local treatment strategy is yet to be determined has not been substantiated. We tested two state-of-the-art prediction models, namely the SORG-MLA and the METSSS, on patients who received local treatment with surgery followed by radiotherapy and with radiotherapy alone for symptomatic long-bone metastases, and we found that SORG-MLA demonstrated better predictive performance in these patients as measured by discrimination, calibration, Brier score, and DCAs. In the subgroup analysis of patients whose local treatment consisted of radiation alone, SORG-MLA still outperformed METSSS even though the latter was specifically created to estimate survival for this population. Further analysis indicated that the laboratory items included in SORG-MLA were independently associated with survival after adjusting for the predictions made by METSSS, suggesting that researchers might seek to test the prognostic values of these laboratory data when developing or updating a prediction model. The SORG-MLA model can be found at https://sorg-apps.shinyapps.io/extremitymetssurvival/.
Limitations
Limitations are present in this study. First, a prediction model’s true clinical utility cannot be definitively determined until it has been implemented in the real world and patient outcomes have been reviewed. Neither SORG-MLA nor METSSS was used to aid treatment decisions at the point of care in this study. Although we demonstrated the feasibility of using SORG-MLA in patients undergoing evaluation for local treatment, future implementation-phase studies are needed before widespread use of SORG-MLA can be recommended. Second, the study was conducted in a single tertiary referral academic center where multidisciplinary care was readily accessible, and surgery was typically done by fellowship-trained orthopaedic oncologists. Typically, prediction models are best at predicting what they were trained on, and our results might not be applicable to patients treated in more rural, nonacademic institutions where it was not routine to involve multiple specialties in treating patients with skeletal metastases. Third, 61% (1176 of 1920) of patients had missing data, and this was inevitable because of the study’s retrospective nature and a lack of consensus among our radiation oncologists on whether and what laboratory tests were necessary before radiotherapy was performed. Nevertheless, the percentage of missing data was relatively low for each item (Supplemental Table 1; http://links.lww.com/CORR/B317): only albumin and alkaline phosphatase had > 20% missingness. Furthermore, the sensitivity analysis on patients without missing data showed almost identical model performance results, and the MissForest imputation method we applied for missing data was widely accepted [19]. Therefore, we believe that this is a minor limitation. Fourth, because this was a retrospective study, we could not standardize the timing and items of laboratory tests. 
Unlike surgical patients, not all patients treated nonoperatively had a complete laboratory workup shortly before radiotherapy, and we allowed laboratory values collected as far as 2 months prior to the index treatment to be used for survival estimation. Using data obtained farther away from the index treatment potentially could result in less accurate survival projections because a patient’s physical condition might change considerably within 2 months’ time, especially in the absence of effective systemic therapy. In practice, if clinicians have reasons to believe that a patient’s overall health has recently altered, it might be prudent to acquire more up-to-date laboratory data if they want to consult SORG-MLA for survival estimation. Fifth, although we included many known prognostic variables as confounding factors, there are still additional factors that may impact the endpoints. For example, response to molecular targeted therapy was associated with the prognosis of patients with lung cancer bone metastases [49]. Innovative immunotherapy has also been shown to cause overestimation of mortality with SORG-MLA [5]. These findings suggest that there is a need for temporal reappraisal and updates of these machine learning–generated algorithms as cancer treatment continues to evolve. Lastly, the clinical decision for patients with extremity metastases is complex, and patient survival is only one aspect of what should be a holistic assessment. The radiosensitivity and anatomic location of the target lesion, potential complications associated with the proposed type of treatment, the risk of local recurrence, and the patient’s desired quality of life all contribute to the formulation of a treatment plan. Clinicians should be aware of the limitations of current prognostic models and not make decisions solely based on a patient’s estimated survival.
Which Survival Prediction Model Performed Better in Patients Treated With Surgery Plus Radiation or With Radiation Alone?
Orthopaedic and radiation oncologists are often asked to help evaluate patients with symptomatic skeletal metastases and advise on the type of local treatment. An accurate estimation of a patient’s likely survival before a particular treatment strategy has been selected would improve the care of these patients and would help physicians make more informed and individualized decisions. Between the two modern prediction models we tested, only SORG-MLA achieved acceptable discrimination because its AUROC exceeded 0.7 [29]. SORG-MLA was also better calibrated to the actual survival, and on Brier score analysis, it provided more information than the null model, which assumed that every patient would have had a mortality risk equal to the prevalence of death at a particular timepoint (90 days or 1 year) in the study cohort. Clinicians, however, might still wonder whether using SORG-MLA was justified simply based on better performance metrics. DCA can evaluate whether using a prediction model brings more good than harm by assessing the net benefit across various plausible risk thresholds or patient preferences of survival probabilities for undergoing treatment, and we found that on DCA, SORG-MLA compared favorably to METSSS in patients undergoing local treatment with surgery followed by radiotherapy and with radiotherapy alone when their individualized risk thresholds for opting to undergo the proposed treatment fell between 0.2 and 0.9. For example, a patient with lung cancer who sustains a humeral pathologic fracture might decide to accept surgical fixation as long as their chance of survival at 3 months is > 50%. If SORG-MLA estimates a 3-month survival probability of 60%, then offering surgery would be a decision based on SORG-MLA’s prediction. On the other hand, if SORG-MLA yields a 3-month survival probability of 30%, not offering surgery would also be considered a decision based on SORG-MLA prediction. 
The treating clinician could then set the patient’s risk threshold (that is, the patient’s preferred survivability threshold for undergoing surgery) at 0.5 (50%) and infer on the DCA whether a decision based on SORG-MLA prediction provided net benefit compared to the default or any other strategies. As the complexity and magnitude of the proposed treatment modality increases, for example, performing tumor resection and megaprosthesis reconstruction, a patient might decide that the correspondingly increased risks of surgery are worth taking only if he or she has a higher chance of survival, such as 80% at 3 months. In this scenario, the clinician should use 0.8 (80%) as the risk threshold and determine whether any net benefit can be gained by consulting SORG-MLA. Furthermore, METSSS does not directly estimate 90-day survival, a period of clinical importance when physicians are weighing different treatment modalities. All in all, SORG-MLA appeared to be more useful for the prediction of survival in patients with long-bone metastases who were under consideration for local treatment. However, clinical decisions might benefit from having more prediction timepoints, such as 1 month and 6 months, because 1-month survival might factor heavily into the decision of operative versus nonoperative treatment, and 6-month survival could influence the type of reconstruction that surgeons choose. PATHFx, another machine-learning survival estimation tool developed to predict survival for patients with long-bone metastases, offers survival predictions at 6 different time points: 1, 3, 6, 12, 18, and 24 months [1, 10, 12, 31, 34]. PATHFx has been externally validated several times and was recently updated to include nonsurgically treated patients [1]. If physicians wish to obtain survival estimations at time points other than 90 days and 1 year, they could consult the online version of PATHFx 3.0 at https://www.pathfx.org/.
Which Model Performed Better Among Patients Who Received Only Palliative Radiotherapy?
Between the two models we tested, SORG-MLA turned out to be the better performer on discrimination, calibration, Brier score, and DCAs in patients whose local treatment consisted of radiotherapy alone. This was contrary to what we had expected because METSSS was developed based on a large registry of patients whose target skeletal metastases were treated with only radiation, and it had adequate discrimination and good calibration on internal validation [53]. DCA can help clinicians decide when to use a model. In our subgroup of patients who received local treatment with only radiotherapy, DCA suggested that for a patient whose risk threshold for the proposed radiotherapy regimen was < 0.2, physicians might adopt the “treat all” strategy because neither SORG-MLA nor METSSS was superior to this default management at 90 days and 1 year (Supplemental Fig. 1; http://links.lww.com/CORR/B317). However, using SORG-MLA’s 90-day and 1-year predictions to guide radiotherapy would provide more benefit than both the “treat all” strategy and the METSSS across a wide range of risk thresholds (or patients’ acceptable threshold of survival probabilities for undergoing treatment) between 0.2 and 0.9. METSSS, on the other hand, brought little benefit over the “treat all” strategy unless a patient’s risk threshold was in a narrower range of 0.7 to 0.9 at 90 days and 0.3 to 0.6 at 1 year. In light of these results, we would recommend that clinicians use SORG-MLA instead of METSSS if they wish to consult a survival estimation tool to guide treatment decisions even in patients who will undergo local treatment with only radiotherapy. Another survival prediction model initially trained on surgically treated patients but later updated to include those who were nonsurgically treated is PATHFx. However, for patients whose local treatment consisted of radiotherapy only, Anderson et al. [1] found on DCA that using the 1-, 3-, and 6-month predictions of PATHFx to guide radiotherapy decisions was not superior to simply treating all patients with radiotherapy unless the patient had an extremely high survivability threshold for treatment (exceeding 95%). SORG-MLA has not been directly compared with PATHFx in a cohort consisting of patients treated with radiotherapy alone for their target skeletal metastases, and further research is needed to determine whether SORG-MLA or PATHFx has a performance advantage in this particular population.
Are Laboratory Values in SORG-MLA Independently Associated With Survival?
Current prognostic models for patients with skeletal metastases were often created using data gathered from either institutional databases (such as SORG-MLA) or large registries (such as METSSS). The latter possess large, diverse data sets that hold the promise of creating more generalizable algorithms [7, 35]. Yet registry data could also suffer from a lack of detail and input inaccuracies [24]. One might anticipate that METSSS, a model specifically designed for patients undergoing palliative radiotherapy, would outperform SORG-MLA in our patients because most (84%) of them received radiotherapy only. In fact, we found that SORG-MLA had superior performance not only in the overall cohort but also in the subgroup of patients whose local treatment consisted of radiotherapy only. SORG-MLA included a comprehensive list of 10 laboratory items that reflect patients’ physical and physiologic status; and these items, except for platelet count and serum creatinine level, all had an association with survival in our patients on multivariable and Cox regression analyses after adjusting for the survival predictions made by METSSS. The inclusion of more detailed data can enhance the performance of a prediction model [13, 22]. Future research may seek to identify more survival predictors and incorporate them into newly developed algorithms or existing models requiring refinement or updates. Morphometric evaluation, such as the measurement of psoas muscle size on CT, has been shown to provide additional prognostic value in patients with bone metastases [17, 52]. Clinicians should be aware of the limitations of prediction models and also consider clinical parameters, such as nutritional status, when they assess patients.
Conclusion
We demonstrated the clinical utility of SORG-MLA in estimating survival of patients treated both operatively and nonoperatively for extremity metastases. SORG-MLA also outperformed METSSS in patients whose local treatment consisted of radiotherapy alone. In addition, we found that several laboratory items included in SORG-MLA were associated with survival after adjusting for the predictions of METSSS, suggesting that these values might provide additional prognostic value when clinicians assess a patient’s overall health and likely survival. We propose that clinicians might use SORG-MLA as the preferred tool for survival estimation in all patients with extremity bone metastases undergoing evaluation for local therapy. However, they should be cognizant of the weaknesses of current prediction models, including their potential applicability issue in different care settings, need for complete and recent laboratory data, and probable performance decline without temporal reappraisal, and consider other important factors in the decision-making process, such as possible complications associated with the mode of local treatment, risk of local tumor progression, and patients’ quality of life. Future studies seeking to develop or refine a prediction model might try to include more diverse and contemporary patients treated both operatively and nonoperatively for skeletal metastases so that the study cohorts are more reflective of the real-world practice of radiation and orthopaedic oncologists.
Acknowledgments
We thank all healthcare professionals from various departments of National Taiwan University Hospital for their contribution in providing multidisciplinary care for our patients. We also thank Professor Wen-Chung Lee (Department of Public Health, National Taiwan University) for providing consultation on the statistical methods employed in this study. We thank the staff of the Department of Medical Research for gathering and providing clinical data from the National Taiwan University Hospital-integrative Medical Database (NTUH-iMD).
Footnotes
The first three authors contributed equally to this manuscript.
Each author certifies that there are no funding or commercial associations (consultancies, stock ownership, equity interest, patent/licensing arrangements, etc.) that might pose a conflict of interest in connection with the submitted article related to the author or any immediate family members.
All ICMJE Conflict of Interest Forms for authors and Clinical Orthopaedics and Related Research® editors and board members are on file with the publication and can be viewed on request.
Ethical approval for this study was obtained from the research ethics committee at the National Taiwan University Hospital (No. 202105108RINC).
This work was performed at National Taiwan University Hospital, Taipei, Taiwan.
Contributor Information
Chia-Che Lee, Email: 017604@ntuh.gov.tw.
Chih-Wei Chen, Email: ohw0701@gmail.com.
Hung-Kuan Yen, Email: b04401122@ntu.edu.tw.
Yen-Po Lin, Email: LYP7972@gmail.com.
Cheng-Yo Lai, Email: G02724@hch.gov.tw.
Jaw-Lin Wang, Email: jlwang@ntu.edu.tw.
Olivier Q. Groot, Email: oqgroot@gmail.com.
Stein J. Janssen, Email: steinjanssen@gmail.com.
Joseph H. Schwab, Email: Joseph.Schwab@cshs.org.
Feng-Ming Hsu, Email: hsufengming@ntuh.gov.tw.
References
- 1.Anderson AB, Wedin R, Fabbri N, et al. External validation of PATHFx version 3.0 in patients treated surgically and nonsurgically for symptomatic skeletal metastases. Clin Orthop Relat Res. 2020;478:808-818. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Arpornsuksant P, Morris CD, Forsberg JA, Levin AS. What factors are associated with local metastatic lesion progression after intramedullary nail stabilization? Clin Orthop Relat Res. 2022;480:932-945. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Chow E, Harth T, Hruby G, et al. How accurate are physicians’ clinical predictions of survival and the available prognostic tools in estimating survival times in terminally ill cancer patients? A systematic review. Clin Oncol (R Coll Radiol). 2001;13:209-218. [DOI] [PubMed] [Google Scholar]
- 4.Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. [DOI] [PubMed] [Google Scholar]
- 5.de Groot TM, Ramsey D, Groot OQ, et al. Does the SORG machine-learning algorithm for extremity metastases generalize to a contemporary cohort of patients? Temporal validation from 2016 to 2020. Clin Orthop Relat Res. 2023;481:2419-2430. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Debray TP, Damen JA, Snell KI, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ. 2017;356:i6460. [DOI] [PubMed] [Google Scholar]
- 7.Debray TP, Riley RD, Rovers MM, et al. Cochrane IPDM-aMg. Individual participant data (IPD) meta-analyses of diagnostic and prognostic modeling studies: guidance on their use. PLoS Med. 2015;12:e1001886. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Demler OV, Pencina MJ, D’Agostino RB Sr. Misuse of DeLong test to compare AUCs for nested models. Stat Med. 2012;31:2577-2587.
- 9. Downie S, Lai FY, Joss J, Adamson D, Jariwala AC. The Metastatic Early Prognostic (MEP) score. Bone Joint J. 2020;102-B:72-81.
- 10. Forsberg JA, Eberhardt J, Boland PJ, Wedin R, Healey JH. Estimating survival in patients with operable skeletal metastases: an application of a Bayesian belief network. PLoS One. 2011;6:e19956.
- 11. Forsberg JA, Wedin R, Bauer HC, et al. External validation of the Bayesian Estimated Tools for Survival (BETS) models in patients with surgically treated skeletal metastases. BMC Cancer. 2012;12:493.
- 12. Forsberg JA, Wedin R, Boland PJ, Healey JH. Can we estimate short- and intermediate-term survival in patients undergoing surgery for metastatic bone disease? Clin Orthop Relat Res. 2017;475:1252-1261.
- 13. Gouda OE, El‐Hoshy SH. Diagnostic technique for analysing the internal faults within power transformers based on sweep frequency response using adjusted R‐square methodology. IET Sci Meas Technol. 2020;14:1057-1068.
- 14. Groot OQ, Ogink PT, Lans A, et al. Machine learning prediction models in orthopedic surgery: a systematic review in transparent reporting. J Orthop Res. 2022;40:475-483.
- 15. Houdek MT, Wellings EP, Moran SL, et al. Outcome of sacropelvic resection and reconstruction based on a novel classification system. J Bone Joint Surg Am. 2020;102:1956-1965.
- 16. Hsieh HC, Lai YH, Lee CC, et al. Can a Bayesian belief network for survival prediction in patients with extremity metastases (PATHFx) be externally validated in an Asian cohort of 356 surgically treated patients? Acta Orthop. 2022;93:721-731.
- 17. Hu MH, Yen HK, Chen IH, et al. Decreased psoas muscle area is a prognosticator for 90-day and 1-year survival in patients undergoing surgical treatment for spinal metastasis. Clin Nutr. 2022;41:620-629.
- 18. Huang Z, Hu C, Chi C, et al. An artificial intelligence model for predicting 1-year survival of bone metastases in non-small-cell lung cancer patients based on XGBoost algorithm. Biomed Res Int. 2020;2020:3462363.
- 19. Jakobsen JC, Gluud C, Wetterslev J, Winkel P. When and how should multiple imputation be used for handling missing data in randomised clinical trials - a practical guide with flowcharts. BMC Med Res Methodol. 2017;17:162.
- 20. Janssen SJ, Teunis T, Hornicek FJ, et al. Outcome after fixation of metastatic proximal femoral fractures: a systematic review of 40 studies. J Surg Oncol. 2016;114:507-519.
- 21. Janssen SJ, van der Heijden AS, van Dijke M, et al. 2015 Marshall Urist Young Investigator Award: prognostication in patients with long bone metastases: does a boosting algorithm improve survival estimates? Clin Orthop Relat Res. 2015;473:3112-3121.
- 22. Karch J. Improving on adjusted R-squared. Collabra Psychol. 2020;6.
- 23. Karhade AV, Schwab JH. CORR synthesis: when should we be skeptical of clinical prediction models? Clin Orthop Relat Res. 2020;478:2722-2728.
- 24. Lawrenz JM, Johnson SR, Hajdu KS, et al. Is the number of National Database Research studies in musculoskeletal sarcoma increasing, and are these studies reliable? Clin Orthop Relat Res. 2023;481:491-508.
- 25. Lee HL, Kuo CC, Tsai JT, et al. Magnetic resonance-guided focused ultrasound versus conventional radiation therapy for painful bone metastasis: a matched-pair study. J Bone Joint Surg Am. 2017;99:1572-1578.
- 26. Liu WC, Li ZQ, Luo ZW, et al. Machine learning for the prediction of bone metastasis in patients with newly diagnosed thyroid cancer. Cancer Med. 2021;10:2802-2811.
- 27. Lutz S, Balboni T, Jones J, et al. Palliative radiation therapy for bone metastases: update of an ASTRO evidence-based guideline. Pract Radiat Oncol. 2017;7:4-12.
- 28. Ma Y, He S, Liu T, et al. Quality of life of patients with spinal metastasis from cancer of unknown primary origin: a longitudinal study of surgical management combined with postoperative radiation therapy. J Bone Joint Surg Am. 2017;99:1629-1639.
- 29. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5:1315-1316.
- 30. Matsuyama Y, Nakamura T, Yoshida K, et al. Radiodynamic therapy with acridine orange local administration as a new treatment option for primary and secondary bone tumours. Bone Joint Res. 2022;11:715-722.
- 31. Meares C, Badran A, Dewar D. Prediction of survival after surgical management of femoral metastatic bone disease - a comparison of prognostic models. J Bone Oncol. 2019;15:100225.
- 32. Mirels H. Metastatic disease in long bones. A proposed scoring system for diagnosing impending pathologic fractures. Clin Orthop Relat Res. 1989;249:256-264.
- 33. Ogink PT, Groot OQ, Karhade AV, et al. Wide range of applications for machine-learning prediction models in orthopedic surgical outcome: a systematic review. Acta Orthop. 2021;92:526-531.
- 34. Overmann AL, Clark DM, Tsagkozis P, Wedin R, Forsberg JA. Validation of PATHFx 2.0: an open-source tool for estimating survival in patients undergoing pathologic fracture fixation. J Orthop Res. 2020;38:2149-2156.
- 35. Riley RD, Ensor J, Snell KIE, et al. External validation of clinical prediction models using big datasets from e-health records or IPD meta-analysis: opportunities and challenges. BMJ. 2016;353:i3140. Erratum in: BMJ. 2019;365:l4379.
- 36. Schneider KN, Broking JN, Gosheger G, et al. What is the implant survivorship and functional outcome after total humeral replacement in patients with primary bone tumors? Clin Orthop Relat Res. 2021;479:1754-1764.
- 37. Schoenfeld AJ, Ferrone ML, Passias PG, et al. Laboratory markers as useful prognostic measures for survival in patients with spinal metastases. Spine J. 2020;20:5-13.
- 38. Siegert S. Variance estimation for Brier Score decomposition. Q J R Meteorol Soc. 2014;140:1771-1777.
- 39. Skalitzky MK, Gulbrandsen TR, Groot OQ, et al. The preoperative machine learning algorithm for extremity metastatic disease can predict 90-day and 1-year survival: an external validation study. J Surg Oncol. 2022;125:282-289.
- 40. Song Q, Shang J, Zhang C, Zhang L, Wu X. Impact of the homogeneous and heterogeneous risk factors on the incidence and survival outcome of bone metastasis in NSCLC patients. J Cancer Res Clin Oncol. 2019;145:737-746.
- 41. Sorensen MS, Gerds TA, Hindso K, Petersen MM. Prediction of survival after surgery due to skeletal metastases in the extremities. Bone Joint J. 2016;98-B:271-277.
- 42. Sorensen MS, Gerds TA, Hindso K, Petersen MM. External validation and optimization of the SPRING model for prediction of survival after surgical treatment of bone metastases of the extremities. Clin Orthop Relat Res. 2018;476:1591-1599.
- 43. Stekhoven DJ, Buhlmann P. MissForest--non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012;28:112-118.
- 44. Thio Q, Karhade AV, Bindels BJJ, et al. Development and internal validation of machine learning algorithms for preoperative survival prediction of extremity metastatic disease. Clin Orthop Relat Res. 2020;478:322-333.
- 45. Tseng TE, Lee CC, Yen HK, et al. International validation of the SORG machine-learning algorithm for predicting the survival of patients with extremity metastases undergoing surgical treatment. Clin Orthop Relat Res. 2022;480:367-378.
- 46. Van Calster B, McLernon DJ, van Smeden M, et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17:230.
- 47. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26:565-574.
- 48. Wang M, Wu Q, Zhang J, et al. Prognostic impacts of extracranial metastasis on non-small cell lung cancer with brain metastasis: a retrospective study based on surveillance, epidemiology, and end results database. Cancer Med. 2021;10:471-482.
- 49. Willeumier JJ, van der Hoeven NM, Bollen L, et al. Epidermal growth factor receptor mutations should be considered as a prognostic factor for survival of patients with pathological fractures or painful bone metastases from non-small cell lung cancer. Bone Joint J. 2017;99-B:516-521.
- 50. Willeumier JJ, van der Linden YM, van der Wal C, et al. An easy-to-use prognostic model for survival estimation for patients with symptomatic long bone metastases. J Bone Joint Surg Am. 2018;100:196-204.
- 51. Yen HK, Chiang H. Letter to the editor: CORR synthesis: When should we be skeptical of clinical prediction models? Clin Orthop Relat Res. 2022;480:2271-2273.
- 52. Zakaria HM, Massie L, Basheer A, et al. Application of morphometrics as a predictor for survival in female patients with breast cancer spinal metastasis: a retrospective cohort study. Spine J. 2018;18:1798-1803.
- 53. Zaorsky NG, Liang M, Patel R, et al. Survival after palliative radiation therapy for cancer: the METSSS model. Radiother Oncol. 2021;158:104-111.

