Abstract
BACKGROUND:
Current prognostic models for brain metastases (BMs) have been constructed and validated almost entirely with data from patients receiving up-front radiotherapy, leaving uncertainty about surgical patients.
OBJECTIVE:
To build and validate a model predicting 6-month survival after BM resection using different machine learning algorithms.
METHODS:
An institutional database of 1062 patients who underwent resection for BM was split into an 80:20 training and testing set. Seven different machine learning algorithms were trained and assessed for performance; an established prognostic model for patients with BM undergoing radiotherapy, the diagnosis-specific graded prognostic assessment, was also evaluated. Model performance was assessed using area under the curve (AUC) and calibration.
RESULTS:
The logistic regression showed the best performance with an AUC of 0.71 in the hold-out test set, a calibration slope of 0.76, and a calibration intercept of 0.03. The diagnosis-specific graded prognostic assessment had an AUC of 0.66. Patients were stratified into regular-risk, high-risk, and very high-risk groups for death at 6 months; these strata strongly predicted both 6-month and longitudinal overall survival (P < .0005). The model was implemented in a web application accessible at https://brainmets.morethanml.com.
CONCLUSION:
We developed and internally validated a prediction model that accurately predicts 6-month survival after neurosurgical resection for BM and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.
KEY WORDS: Brain metastases, Machine learning, Neurosurgery, Survival prediction
ABBREVIATIONS:
- BM
brain metastases
- ds-GPA
diagnosis-specific graded prognostic assessment
- KPS
Karnofsky Performance Status
- ML
machine learning
- SRS
stereotactic radiosurgery
- WBRT
whole-brain radiotherapy.
Brain metastases (BMs) are increasingly common intracranial tumors.1 Without treatment, they are associated with a dismal prognosis of 1 to 2 months.2,3 Since the seminal randomized controlled trials by Patchell et al4 and Vecht et al,5 surgical resection has become a pillar of BM management. In these trials, patients with a single, resectable BM who underwent craniotomy and whole-brain radiotherapy (WBRT) had a median survival of 9 to 10 months. In comparison, similar patients who received only WBRT had a median survival of up to 6 months.4,5
In more recent years, survival of surgically treated patients with BM has increased further, partly because of novel drugs that control extracranial metastases in various cancers.6-9 In practice, this has allowed for good outcomes after BM resection in patients beyond the inclusion criteria of the clinical trials, such as those with multiple BMs10,11 or a locally recurrent BM.12-14
Despite these advances, there still remains a proportion of patients with resected BM whose survival does not exceed the threshold of 6 months, roughly the median survival with radiotherapy alone in Vecht's randomized trial.5 Careful consideration of patient and disease characteristics may help select patients who will benefit most from surgery; however, prognostic models for patients with surgical BM are currently lacking. Although the recursive partitioning analysis15 and the diagnosis-specific graded prognostic assessment (ds-GPA)16-20 are validated prognostic scores, these were largely developed and validated in patients receiving up-front radiation.
Machine learning (ML) algorithms have proven successful in predicting outcomes after neurosurgical procedures.21 This new methodology has not previously been applied to patients with BM undergoing resection. Therefore, we aimed to create and internally validate a prognostic model to predict 6-month survival after craniotomy for BM using different ML modalities.
METHODS
Data Collection and Variable Selection
Under institutional review board approval, the institutional BM database of a large tertiary neurosurgical center was used to train our model. This database includes all consecutive adult patients with BMs who underwent craniotomy at our department from January 2007 to January 2018. Patient consent was waived for this study by the institutional review board.
The following data were extracted for each patient: sex (male/female), age (continuous), number of BMs (continuous), location of BMs, tumor origin (lung, breast, melanoma, renal cell, colorectal, gynecological, or others), size of the largest BM (continuous), interval from BM diagnosis to surgery (continuous), pattern of systemic metastases (none, lymph nodes, lung, bone, liver, and/or others), radiation therapy before resection (yes/no; this indicates a recurrent metastasis rather than neoadjuvant radiation), immunotherapy before resection (yes/no), Karnofsky Performance Status (KPS; continuous), adjuvant stereotactic radiosurgery (SRS), and the presence of a targetable mutation (EGFR, ALK, BRAF, NRAS, and HER2; yes/no). KPS was extracted as recorded; it was not retrospectively determined from patients' notes. We avoided using postoperative variables, with the exception of adjuvant SRS, because it is usually planned preoperatively and plays an important role in postsurgical disease control. A sensitivity analysis without inclusion of adjuvant SRS was also performed. Continuous variables were not discretized using cutoff values. In case of missing data, sample median imputation was used for continuous variables. For categorical variables, missing observations were assigned the default 0 value following one-hot encoding.
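The imputation and encoding scheme described above can be sketched with pandas; the toy data frame and column names below are illustrative assumptions, not the study's actual schema:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names are illustrative, not the study's schema.
df = pd.DataFrame({
    "kps": [80, 90, np.nan, 70],
    "primary": ["lung", "breast", None, "melanoma"],
})

# Continuous variables: sample median imputation.
df["kps"] = df["kps"].fillna(df["kps"].median())

# Categorical variables: one-hot encoding; a missing value simply yields
# 0 in every indicator column (the "default 0" handling described above).
df = pd.get_dummies(df, columns=["primary"], dtype=int)
```

Note that `pd.get_dummies` skips missing values by default, which reproduces the "assign 0 in all indicator columns" behavior without a separate imputation step for categoricals.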
Variables were selected based on expected clinical significance and availability within our database. We did not perform any data-driven variable selection, eg, by using P-values on individual variables. Although P-values are useful for determining the impact of, eg, treatment strategies in clinical trials, they are not primarily meant for variable selection.22
Model Development
Patients were randomized into training and testing sets using an 80:20 split.
Different algorithms were trained to perform a binary classification task outputting the probability of survival at 6 months postoperatively. Six-month survival was used as the primary outcome because, in our view, this is a meaningful prognostic cutoff when considering the expected benefit of surgical treatment. An additional benefit is a relatively low loss to follow-up, increasing accuracy of prediction and ease of validation. The following algorithms were trained: gradient boosting classifier, K-nearest neighbors, logistic regression, naive Bayes classifier, random forest classifier, and support vector machine. These algorithms use the presented data to iteratively adjust a model toward the best fit for those data; a more detailed description of the individual algorithms can be found elsewhere.23 We furthermore investigated whether better performance was achieved by combining different methodologies into one ensemble model incorporating random forest, adaptive boosting, gradient boosting, and logistic regression algorithms. Ensembles, also known as stacked models, first train different individual algorithms on a data set and then use their predictions to train a second-level algorithm that produces the final prediction (Figure 1). Such a combined approach accounts for the fact that no single algorithm gives the optimal prediction for every case; in other words, it is a way to “hedge” predictions and prevent outliers or poor calibration. Each model was fine-tuned independently through hyperparameter tuning with 5-fold cross-validation.
FIGURE 1.
Stacked machine learning model. The output of different machine learning algorithms is collected and used as input for a logistic regression, which gives the final prediction.
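As a sketch of this stacking approach, scikit-learn's `StackingClassifier` combines first-level learners with a second-level logistic regression; the synthetic data and default hyperparameters here are illustrative assumptions, not the study's configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the study's real features are clinical variables.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # an 80:20 split, as in the study

# First-level learners feed their out-of-fold predictions (generated by
# internal 5-fold CV) into a second-level logistic regression.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
proba = stack.predict_proba(X_test)[:, 1]  # predicted probability of the event
```

The out-of-fold construction matters: fitting the meta-learner on in-sample base-model predictions would leak training information and overstate performance.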
Internal Validation and Model Performance Assessment
Model accuracy was quantified using the receiver operating characteristic curve and the area under this curve (AUC). The AUC measures discriminatory ability; an AUC of 1.0 indicates perfect prediction, whereas 0.5 equals random chance.24-26 Model calibration was assessed graphically using calibration plots and numerically with the calibration slope and intercept. Calibration describes the concordance between model-assigned probabilities and actual probabilities of 6-month survival in binned groups; it assesses whether the model overestimates (calibration slope <1, intercept <0) or underestimates (slope >1, intercept >0) survival. A perfect model would have a slope of 1 and an intercept of 0.25,26 Finally, the Brier score27 was calculated; this is the mean squared error between predicted and observed outcomes, with a perfect score of 0.26 This score takes both accuracy and calibration into account.
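These metrics can be computed with scikit-learn on synthetic data; the calibration slope and intercept below use one common approximation (refitting a logistic regression of the outcome on the logit of the predicted probabilities), which may differ in detail from the study's exact procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.default_rng(0)
# Illustrative predictions and outcomes; not the study's data.
p = rng.uniform(0.05, 0.95, size=400)  # model-assigned probabilities
y = rng.binomial(1, p)                 # outcomes drawn from those probabilities

auc = roc_auc_score(y, p)              # discrimination: 0.5 = chance, 1.0 = perfect
brier = brier_score_loss(y, p)         # mean squared error; 0 is perfect

# Calibration slope/intercept: regress the outcome on the logit of the
# predicted probability; a well-calibrated model gives slope ~1, intercept ~0.
logit = np.log(p / (1 - p)).reshape(-1, 1)
cal = LogisticRegression(C=1e9).fit(logit, y)  # large C ~ no regularization
slope, intercept = cal.coef_[0][0], cal.intercept_[0]
```

Because the outcomes here are drawn from the stated probabilities, the recovered slope and intercept land near their ideal values of 1 and 0.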
The predictive performance of the ds-GPA in this population was also evaluated. Predicted survival (more or less than 6 months) was calculated for each patient based on the ds-GPA scale appropriate for their histology. For BMs of rare origins, we used the general GPA score instead.16 All performance assessment was conducted on the test set. The best performing model was subsequently selected for further analysis.
Feature Importance and Risk Stratification
After model training, feature importance of different variables in the selected model was explored; this corresponds to each variable's relative importance in the prediction. The distribution of the probability of predicted 6-month mortality was explored using histograms. The range of probabilities in the training set was split into 3 risk categories. Patients in the test set were assigned one of these categories, and this stratification was used to fit a Cox proportional hazards model. Kaplan–Meier curves were constructed to visualize survival differences between groups. The log-rank test was used to determine statistical significance between the survival groups; P < .05 was considered statistically significant.
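The three-way risk stratification can be sketched with pandas; the predicted risks and survival times below are invented for illustration, while the 30% and 60% cutoffs mirror the study's strata:

```python
import pandas as pd

# Illustrative cohort: predicted 6-month mortality risk and observed
# survival time in months (synthetic values, not study data).
df = pd.DataFrame({
    "risk":            [0.05, 0.10, 0.25, 0.35, 0.45, 0.55, 0.65, 0.80, 0.90],
    "survival_months": [36,   24,   15,   10,   8,    5,    4,    3,    2],
})

# Three strata over the predicted-risk range, mirroring the study's cutoffs.
df["group"] = pd.cut(df["risk"], bins=[0, 0.30, 0.60, 1.0],
                     labels=["regular-risk", "high-risk", "very high-risk"])

# Observed 6-month survival per stratum; in the study, this stratification
# was then fed into a Cox model and Kaplan-Meier curves (via lifelines).
surv6 = df.groupby("group", observed=True)["survival_months"].apply(
    lambda s: (s >= 6).mean())
```

In the real analysis, the cut points are defined on the training set and only then applied to the test set, so the strata are fixed before any test-set survival is examined.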
Statistical Software
R version 3.5.0 (R Foundation for Statistical Computing) was used for initial data preprocessing and data visualization. Python version 3.6 (Python Software Foundation) was used for data preprocessing, model development, validation, and risk stratification. The scikit-learn28 and Keras29 packages for Python were used to create the models, and the Lifelines package30 was used to create the Kaplan–Meier curves.
RESULTS
Patient Characteristics
We included 1062 patients with resected BMs. The median follow-up was 42.1 months (95% CI: 36.7-47.2). In total, 213 were randomized into the test set while the remaining 849 were used to train the models. Baseline demographic characteristics were similar between training and test sets and are outlined in Table 1. The median survival was 14.0 months (IQR: 5.5-29.5); 73.9% of patients were alive and accounted for at 6 months after craniotomy.
TABLE 1.
Baseline Characteristics
Demographic characteristic | Total (n = 1062) | Training (n = 849) | Test (n = 213) |
---|---|---|---|
Age, years, median (IQR) | 61.0 (53.1-68.9) | 61.2 (52.9-69.1) | 60.6 (53.4-68.1) |
Sex (male), n (%) | 447 (42.1) | 354 (41.7) | 93 (43.7) |
No. of BMs, median (IQR) | 2.0 (1.0-3.0) | 2.0 (1.0-3.0) | 1.0 (1.0-3.0) |
BM size,a median (IQR) (cm) | 3.0 (2.3-3.9) | 3.0 (2.3-3.9) | 3.0 (2.3-4.0) |
Location of BMs, n (%) | |||
Supratentorial | 901 (84.8) | 724 (85.3) | 177 (83.1) |
Infratentorial | 419 (39.5) | 332 (39.1) | 87 (40.8) |
Dural involvement | 32 (3.0) | 25 (2.9) | 7 (3.3) |
Primary tumor origin | |||
Lung cancer | 476 (44.8) | 373 (43.9) | 103 (48.4) |
Melanoma | 144 (13.6) | 122 (14.4) | 22 (10.3) |
Breast cancer | 154 (14.5) | 125 (14.7) | 29 (13.6) |
Colorectal cancer | 38 (3.6) | 29 (3.4) | 9 (4.2) |
Gynecological cancer | 56 (5.3) | 45 (5.3) | 11 (5.2) |
Renal cell carcinoma | 51 (4.8) | 40 (4.7) | 11 (5.2) |
Others | 143 (13.5) | 115 (13.5) | 28 (13.1) |
Pattern of extracranial metastases, n (%) | |||
None | 450 (42.3) | 359 (42.3) | 91 (42.7) |
Lung | 227 (21.4) | 189 (22.3) | 38 (17.8) |
Bone | 151 (14.2) | 120 (14.1) | 31 (14.6) |
Liver | 61 (5.7) | 47 (5.5) | 14 (6.6) |
Others | 451 (42.4) | 360 (42.4) | 91 (42.7) |
Time from diagnosis of BMs to craniotomy, d, median (IQR) | 3.0 (1.0-9.0) | 3.0 (1.0-8.0) | 3.0 (1.0-9.0) |
Previous radiation therapy, n (%) | 131 (12.3) | 110 (13.0) | 21 (9.9) |
Adjuvant SRS, n (%) | 407 (38.3) | 319 (37.6) | 88 (41.3) |
Previous immunotherapy, n (%) | 34 (3.2) | 24 (2.8) | 10 (4.7) |
Presence of targetable mutations, n (%) | 148 (13.9) | 113 (13.3) | 35 (16.4) |
Karnofsky Performance Status, median (IQR) | 80 (80-100) | 80 (80-100) | 80 (80-100) |
BM, brain metastases; SRS, stereotactic radiosurgery.
aDiameter of the largest BM.
Model Performance
Table 2 summarizes the performance of all algorithms in the test set. The various trained models achieved AUCs ranging from 0.63 to 0.71. Calibration slopes ranged from 0.32 to 0.81 while intercepts ranged from 0.00 to 0.15. Brier scores ranged from 0.17 to 0.21. The logistic regression displayed the best overall performance for AUC (0.71; optimal cutoff at 0.23), calibration (slope 0.76, intercept 0.03), and Brier score (0.17). The stacked model performed approximately equally (AUC 0.70, calibration slope 0.81, intercept 0.05, and Brier score 0.18). Hyperparameters are described in Supplementary Table S1, http://links.lww.com/NEU/D182. When excluding adjuvant SRS from the logistic regression, AUC was 0.69, calibration slope 0.60, intercept 0.11, and Brier score 0.18. The ds-GPA had an AUC of 0.66, a calibration slope of 0.59 and intercept of 0.00, and a Brier score of 0.20.
TABLE 2.
Performance of the Various Models
Model | AUC | Calibration slope | Calibration intercept | Brier score |
---|---|---|---|---|
ds-GPA (original cohorts) | 0.661 | 0.59 | 0.00 | 0.199 |
ds-GPA (external validation cohorts) | 0.664 | 0.57 | 0.00 | 0.207 |
Logistic regression | 0.709 | 0.76 | 0.03 | 0.174 |
Random forest | 0.656 | 0.75 | 0.01 | 0.179 |
Gradient boosting | 0.683 | 0.61 | 0.08 | 0.179 |
Support vector classifier | 0.657 | 0.32 | 0.08 | 0.191 |
K-nearest neighbor | 0.627 | 0.40 | 0.13 | 0.193 |
Stacked model | 0.697 | 0.81 | 0.05 | 0.175 |
Sensitivity analysis with adjuvant SRS variable excluded (logistic regression) | 0.686 | 0.60 | 0.11 | 0.176 |
AUC, area under the curve; ds-GPA, diagnosis-specific graded prognostic assessment; SRS, stereotactic radiosurgery.
Bold entries denote the best-performing algorithm, which was ultimately implemented.
The receiver operating characteristic curves of the logistic regression and the ds-GPA are presented in Figure 2; calibration curves are presented in Supplementary Figure S2, http://links.lww.com/NEU/D182.
FIGURE 2.
Receiver operating characteristic curves. A, The logistic regression had an AUC of 0.71. The diagnosis-specific graded prognostic assessment had an AUC of 0.66 when using predicted survival data from either its B, training cohort or C, external validation cohort. AUC, area under the curve.
Mean Feature Importance and Risk Stratification
Figure 3 displays feature importance, which was derived from the coefficients of the logistic regression. KPS was the strongest contributor to this model, with higher KPS predicting longer survival. Other variables with a feature importance of >5% were adjuvant SRS, cerebellar/posterior fossa localization, age, and receipt of immunotherapy before craniotomy.
FIGURE 3.
Mean feature importance in the final model. Feature importance denotes the fraction (between 0 and 1) of the model prediction that was contributed by any individual variable. KPS, Karnofsky Performance Status; SRS, stereotactic radiosurgery; WBRT, whole-brain radiotherapy.
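One common way to derive such importances from a logistic regression, sketched below on synthetic data, is to standardize the features and normalize the absolute coefficients so they sum to 1; this is an illustrative reading of "feature importance," not necessarily the study's exact computation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in; the real features were the clinical variables above.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)  # put coefficients on a comparable scale

lr = LogisticRegression(max_iter=1000).fit(X, y)

# Each |coefficient| as a fraction of the total, so importances sum to 1
# (matching the "fraction between 0 and 1" described in the figure legend).
importance = np.abs(lr.coef_[0]) / np.abs(lr.coef_[0]).sum()
```

Standardizing first matters: raw logistic coefficients are per-unit effects, so without a common scale their magnitudes are not comparable across variables such as age and KPS.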
The distribution of predicted risks of 6-month mortality across the training and test sets is displayed in Figure 4A and 4B. In the training set, predicted risks ranged from 0.02 to 0.91. We constructed 3 groups based on predicted risk of mortality: regular-risk (<30%), high-risk (30%-60%), and very high-risk (>60%). This stratification was subsequently used to categorize patients in the test set (n = 213), where 138 (64.8%), 68 (31.9%), and 6 (2.8%) patients were determined regular-risk, high-risk, and very high-risk, respectively. The associated median survival times were 15.7 (IQR 7.3-35.9), 8.0 (IQR 3.7-18.1), and 3.0 (IQR 1.6-7.8) months, and risk group significantly predicted longitudinal survival (high-risk vs regular-risk: hazard ratio 1.74; very high-risk vs regular-risk: hazard ratio 4.12; overall log-rank P < .0005, Supplementary Figure S3, http://links.lww.com/NEU/D182).
FIGURE 4.
Distribution of the predicted probabilities of 6-month mortality in the A, training and B, test sets. The x-axis denotes probability as a proportion between 0 and 1; the y axis represents the absolute number of patients who are assigned these probabilities in both data sets.
Online Implementation
The trained model was incorporated into a web application that is accessible through https://brainmets.morethanml.com.
DISCUSSION
This study used different ML algorithms to predict 6-month survival after craniotomy for BMs. Ultimately, the logistic regression showed the best overall performance in terms of AUC and calibration and could further be used to stratify patients into risk groups for mortality.
Two recent reviews31,32 have summarized existing prognostic models for BMs. These models were built for the general or origin-specific BM populations (eg, breast cancer BMs only) and present AUCs ranging from 0.60 to 0.78, with origin-specific models usually having higher AUCs.32 Although there are many models for patients treated with radiotherapy, no model has yet been developed or validated in surgical cohorts. Of 5 nomograms built or validated in >1000 patients with BM, 2 were created from exclusively WBRT cohorts,15,33 and the remaining 3 (including the ds-GPA updates for lung, breast, and melanoma)16-19,34,35 were built on patients receiving SRS/WBRT, with only a small minority (13%-17%) receiving surgery.17-19,34 Neurosurgical patients differ from those undergoing up-front SRS/WBRT because they typically have fewer and/or larger BMs. Moreover, location of the BM may be more relevant for surgical patients; this was reflected by the fact that location-related variables had a relatively high feature importance in our model. It is known that surgery and radiotherapy combined extend survival longer than WBRT alone.4,5 Thus, predictions from radiotherapy-based models cannot necessarily be extrapolated to surgical patients. Recently, 1 study from Shanghai36 presented an internally validated nomogram to predict survival after BM resection built on 335 patients, reporting an AUC of 0.71. However, that nomogram was restricted to patients with BMs from non–small-cell lung cancer. To the best of our knowledge, ours is the first model to assess survival in the entire surgical BM population. To our knowledge, this study also comprises the first validation of the ds-GPA16-20 in a large, all-surgical cohort.
Our model identified a small group of very high-risk patients who did especially poorly after surgery (median survival 3.0 months). Risk stratification in these patients may help manage expectations and weigh treatment options if there is uncertainty about the optimal strategy. Conversely, the regular-risk group, comprising approximately 65% of patients, had a 72.9% 6-month survival and a median survival of 15.7 months. This group included atypical indications for resection, such as multiple or recurrent BMs. This supports the notion that even patients without a classical indication for resection can have good outcomes after neurosurgery. However, external validation is necessary to solidify conclusions based on this stratification.
In our series, relatively few patients (38.3%) received cavity SRS, which is atypical of current best practice. The main reason is that our cohort contained a relatively large proportion of patients with multiple BMs (median number of BMs, 2; IQR 1-3). Although SRS is now standard of care even for multiple BMs, such patients have historically received WBRT, including those in the present cohort, which dates back to 2007. We included cavity SRS as a model feature to account for this variation.
Recent years have seen a rise in the development of ML models for outcomes prediction in neurosurgery.21 To safeguard methodological quality, Christodoulou et al37 have discussed sources of bias when comparing different ML and classical statistical models for outcome prediction based on clinical data. They formulated 5 potential sources of bias when comparing performance of different methodologies: (1) unclear or biased performance validation; (2) discrepancy in whether data-driven variable selection was used in different models; (3) different ways of handling continuous variables, eg, using inappropriate cutoffs; (4) considering different candidate variables in different models; and (5) inconsistent use of corrections for imbalanced outcomes. When these sources of bias were addressed, more complex ML models did not outperform simpler algorithms such as logistic regression. Our results support this finding: although we experimented with more complex ensemble models, which should in theory yield a more balanced prediction, they did not improve any performance metric in the test set.
Limitations and Strengths
Several limitations of this study should be considered. The current model is constructed and validated internally using retrospective data. External validation, preferably in a prospective setting, is a necessary follow-up step. Although this model demonstrated predictive capability for median survival, this was not the main aim of our study; instead, we focused on survival at a cutoff, ie, 6 months. Furthermore, some intraoperative and postoperative factors (eg, extent of resection and surgical complications) influence survival and, therefore, leave an inherent uncertainty in survival prediction. Last, this tool should not be used to determine which patients should or should not receive surgery; randomized trials remain the gold standard for such decision making. Rather, it can be used to estimate risk and outcomes for those who are set to undergo surgery. Strengths of this study include that it is the first large study to create a prognostic model for patients with neurosurgically treated BM. The large sample size of both training and test sets and the internal validation of our model constitute additional strengths because they should reduce the risk of overfitting, which would otherwise inflate apparent performance relative to future external validation.
CONCLUSION
Here, we presented ML models to predict 6-month survival in patients who underwent craniotomy for BMs. A logistic regression achieved a good AUC (0.71), calibration (slope 0.76, intercept 0.03), and Brier score (0.17). Future studies should aim to externally validate the present model.
Footnotes
Alexander F. C. Hulsbergen and Yu Tung Lo contributed equally to this work.
Preliminary results of this work have been presented at the 2020 Annual Meeting of the American Association of Neurological Surgeons on April 25-29, 2020, and at the Annual Meeting of the European Association of Neurosurgical Societies on October 19-21, 2020 (both held online).
Supplemental digital content is available for this article at neurosurgery-online.com.
Funding
This study did not receive any funding or financial support. Dr Yu is supported in part by the National Institute of General Medical Sciences (NIGMS) grant R35GM142879 and the Blavatnik Center for Computational Biomedicine Award.
Disclosures
The authors have no personal, financial, or institutional interest in any of the drugs, materials, or devices described in this article. Dr Yu is also the inventor of a digital pathology analytical system (quantitative pathology analysis and diagnosis using neural networks; US Patent 10,832,406). This patent is assigned to Harvard University and is unrelated to the submitted work.
Supplemental Digital Content
Supplementary Materials. Supplementary Table S1. Hyperparameters of the stacked model. Nr = number.
Supplementary Figure S2. Calibration curves. Calibration curves of A, the logistic regression model and B, the ds-GPA based on its training (top line) and validation (bottom line) cohorts.
Supplementary Figure S3. Survival plots by risk group. Kaplan–Meier plot of the overall test group, as well as the regular-risk, high-risk and very high-risk groups (P < .0005 between these 3 groups).
REFERENCES
- 1. Fox BD, Cheung VJ, Patel AJ, Suki D, Rao G. Epidemiology of metastatic brain tumors. Neurosurg Clin N Am. 2011;22(1):1-6, v.
- 2. Hatiboglu MA, Wildrick DM, Sawaya R. The role of surgical resection in patients with brain metastases. Ecancermedicalscience. 2013;7:308.
- 3. Horton J, Baxter DH, Olson KB. The management of metastases to the brain by irradiation and corticosteroids. Am J Roentgenol Radium Ther Nucl Med. 1971;111(2):334-336.
- 4. Patchell RA, Tibbs PA, Walsh JW, et al. A randomized trial of surgery in the treatment of single metastases to the brain. N Engl J Med. 1990;322(8):494-500.
- 5. Vecht CJ, Haaxma-Reiche H, Noordijk EM, et al. Treatment of single brain metastasis: radiotherapy alone or combined with neurosurgery? Ann Neurol. 1993;33(6):583-590.
- 6. Iorgulescu JB, Harary M, Zogg CK, et al. Improved risk-adjusted survival for melanoma brain metastases in the era of checkpoint blockade immunotherapies: results from a national cohort. Cancer Immunol Res. 2018;6(9):1039-1045.
- 7. Lin NU, Bellon JR, Winer EP. CNS metastases in breast cancer. J Clin Oncol. 2004;22(17):3608-3617.
- 8. Petrelli F, Lazzari C, Ardito R, et al. Efficacy of ALK inhibitors on NSCLC brain metastases: a systematic review and pooled analysis of 21 studies. PLoS One. 2018;13(7):e0201425.
- 9. Sherman JH, Lo SS, Harrod T, et al. Congress of Neurological Surgeons systematic review and evidence-based guidelines on the role of chemotherapy in the management of adults with newly diagnosed metastatic brain tumors. Neurosurgery. 2019;84(3):E175-E177.
- 10. Smith TR, Lall RR, Lall RR, et al. Survival after surgery and stereotactic radiosurgery for patients with multiple intracranial metastases: results of a single-center retrospective study. J Neurosurg. 2014;121(4):839-845.
- 11. Pollock BE, Brown PD, Foote RL, Stafford SL, Schomberg PJ. Properly selected patients with multiple brain metastases may benefit from aggressive treatment of their intracranial disease. J Neurooncol. 2003;61(1):73-80.
- 12. Kamp MA, Fischer I, Dibue-Adjei M, et al. Predictors for a further local in-brain progression after re-craniotomy of locally recurrent cerebral metastases. Neurosurg Rev. 2018;41(3):813-823.
- 13. Schackert G, Schmiedel K, Lindner C, Leimert M, Kirsch M. Surgery of recurrent brain metastases: retrospective analysis of 67 patients. Acta Neurochir (Wien). 2013;155(10):1823-1832.
- 14. Ammirati M, Nahed BV, Andrews D, Chen CC, Olson JJ. Congress of Neurological Surgeons systematic review and evidence-based guidelines on treatment options for adults with multiple metastatic brain tumors. Neurosurgery. 2019;84(3):E180-E182.
- 15. Gaspar L, Scott C, Rotman M, et al. Recursive partitioning analysis (RPA) of prognostic factors in three Radiation Therapy Oncology Group (RTOG) brain metastases trials. Int J Radiat Oncol Biol Phys. 1997;37(4):745-751.
- 16. Sperduto PW, Chao ST, Sneed PK, et al. Diagnosis-specific prognostic factors, indexes, and treatment outcomes for patients with newly diagnosed brain metastases: a multi-institutional analysis of 4,259 patients. Int J Radiat Oncol Biol Phys. 2010;77(3):655-661.
- 17. Sperduto PW, Jiang W, Brown PD, et al. Estimating survival in melanoma patients with brain metastases: an update of the graded prognostic assessment for melanoma using molecular markers (Melanoma-molGPA). Int J Radiat Oncol Biol Phys. 2017;99(4):812-816.
- 18. Sperduto PW, Kased N, Roberge D, et al. Summary report on the graded prognostic assessment: an accurate and facile diagnosis-specific tool to estimate survival for patients with brain metastases. J Clin Oncol. 2012;30(4):419-425.
- 19. Sperduto PW, Yang TJ, Beal K, et al. Estimating survival in patients with lung cancer and brain metastases: an update of the graded prognostic assessment for lung cancer using molecular markers (Lung-molGPA). JAMA Oncol. 2017;3(6):827-831.
- 20. Subbiah IM, Lei X, Weinberg JS, et al. Validation and development of a modified breast graded prognostic assessment as a tool for survival in patients with breast cancer and brain metastases. J Clin Oncol. 2015;33(20):2239-2245.
- 21. Senders JT, Staples PC, Karhade AV, et al. Machine learning and neurosurgical outcome prediction: a systematic review. World Neurosurg. 2018;109:476-486.e1.
- 22. Staartjes VE, Zattra CM, Akeret K, et al. Neural network-based identification of patients at high risk for intraoperative cerebrospinal fluid leaks in endoscopic pituitary surgery. J Neurosurg. Published online ahead of print June 21, 2019. DOI: 10.3171/2019.4.JNS19477.
- 23. Senders JT, Zaki MM, Karhade AV, et al. An introduction and overview of machine learning in neurosurgical care. Acta Neurochir (Wien). 2018;160(1):29-38.
- 24. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29-36.
- 25. Steyerberg EW, Vergouwe Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur Heart J. 2014;35(29):1925-1931.
- 26. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128-138.
- 27. Brier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Rev. 1950;78(1):1-3.
- 28. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825-2830.
- 29. Chollet F. Keras. GitHub repository; 2015.
- 30. Davidson-Pilon C. Lifelines: survival analysis in Python. J Open Source Softw. 2019;4(40):1317.
- 31. Gilbride L, Siker M, Bovi J, Gore E, Schultz C, Hall WA. Current predictive indices and nomograms to enable personalization of radiation therapy for patients with secondary malignant neoplasms of the central nervous system: a review. Neurosurgery. 2018;82(5):595-603.
- 32. Nieder C, Mehta MP, Geinitz H, Grosu AL. Prognostic and predictive factors in patients with brain metastases from solid tumors: a review of published nomograms. Crit Rev Oncol Hematol. 2018;126:13-18.
- 33. Zindler JD, Rodrigues G, Haasbeek CJ, et al. The clinical utility of prognostic scoring systems in patients with brain metastases treated with radiosurgery. Radiother Oncol. 2013;106(3):370-374.
- 34. Rades D, Dziggel L, Haatanen T, et al. Scoring systems to estimate intracerebral control and survival rates of patients irradiated for brain metastases. Int J Radiat Oncol Biol Phys. 2011;80(4):1122-1127.
- 35. Barnholtz-Sloan JS, Yu C, Sloan AE, et al. A nomogram for individualized estimation of survival among patients with brain metastasis. Neuro Oncol. 2012;14(7):910-918.
- 36. Ji X, Zhuang Y, Yin X, Zhan Q, Zhou X, Liang X. Survival time following resection of intracranial metastases from NSCLC-development and validation of a novel nomogram. BMC Cancer. 2017;17(1):774.
- 37. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol. 2019;110:12-22.