Abstract
Objectives
Evaluate a machine learning model designed to predict mortality for Medicare beneficiaries aged >65 years treated for hip fracture in Inpatient Rehabilitation Facilities (IRFs).
Design
We employed a retrospective cohort design using Centers for Medicare and Medicaid Services Inpatient Rehabilitation Facility–Patient Assessment Instrument data.
Setting and Participants
A total of 17,140 persons admitted to Medicare-certified IRFs in 2015 following hospitalization for hip fracture.
Measures
Patient characteristics included sociodemographic factors (age, gender, race, and social support), clinical factors (functional status at admission, chronic conditions), and IRF length of stay. Outcomes were 30-day and one-year all-cause mortality. We trained two classification models, logistic regression and a multilayer perceptron (MLP), to predict the probability of 30-day and one-year mortality, and evaluated the calibration, discrimination, and precision of each model.
Results
For 30-day mortality, the MLP (accuracy = 0.74, AUROC = 0.76, average precision = 0.10, calibration slope = 1.14) performed as well as the logistic regression model (accuracy = 0.78, AUROC = 0.76, average precision = 0.09, calibration slope = 1.20). For one-year mortality, performance was similar for the MLP (accuracy = 0.68, AUROC = 0.75, average precision = 0.32, calibration slope = 0.96) and logistic regression (accuracy = 0.68, AUROC = 0.75, average precision = 0.32, calibration slope = 0.95).
Conclusion and Implications
A scoring system based on logistic regression may be more feasible to run in current electronic medical records, whereas MLP models may reduce cognitive burden and be easier to calibrate to local data, yielding clinically specific mortality predictions so that palliative care resources can be allocated more effectively.
Keywords: Functional status, mortality, hip fracture, inpatient rehabilitation facilities
Manuscript summary
Machine learning models had similar performance in predicting 30-day and one-year mortality among hip fracture patients treated in Inpatient Rehabilitation Facilities.
Each year 300,000+ older adults are treated for hip fracture in the United States.1 For the majority who suffer a fractured hip, the likelihood of full functional recovery remains low, and other related poor health outcomes include permanent nursing home placement and excess mortality.2 Research has identified demographic (age, sex, and race/ethnicity) and clinical factors (comorbidities, physical and cognitive function) associated with mortality after hip fracture.3 About 7% of patients die within 30 days and upwards of 27% within a year of hip fracture.4–6 However, palliative care resources remain scarce. In the age of data, new clinical algorithms are needed to identify hip fracture patients most likely to be nearing end of life and offer clinical insights to inform palliative care consults.
Palliative care focuses on managing symptoms, facilitating goals of care discussions, and attending to psychosocial and spiritual concerns of patients with life-limiting illness; the goal for clinicians is to improve the quality of end-of-life care for patients and their families.7,8 Although the number of hospital palliative care programs has been growing, workforce shortages have limited their sustainability, and prognostic tools used by existing programs are difficult to implement at scale because they involve face-to-face clinical assessment.9 Palliative care prognostic tools have been designed for patients with advanced disease, rather than for early identification of, for example, the 27% of hip fracture patients most at risk of dying.10,11
Current prognostic tools that identify high-risk patients rely on logistic regression to predict mortality, deriving data primarily from small datasets focused on geographic location or from survey and/or small feature sets that potentially underestimate the predicted probability of rare outcomes like death.12 In a systematic review of 16 prognostic indices that predict risk of mortality from 6-months to 5 years for older adults, the majority (13) had areas under the receiver operating characteristics (AUROCs) ≤ 0.70, suggesting unexplained variance in mortality and overall moderate model performance.13
Machine learning is a technique that allows computers to learn patterns directly from data. Machine learning has shown immense potential in numerous medical fields including radiology,14 dermatology,15 and pathology;16 one of machine learning's greatest strengths is its high performance on numerous clinically relevant tasks. However, machine learning has rarely been applied to hip fracture mortality risk research, despite the abundance of retrospective data on older adults (65 years of age or older) available in Medicare administrative claims, data whose national scope would increase the generalizability of the resulting models.17,18
To extend these findings, our study (1) included functional status, an additional dimension of health that is highly correlated with mortality; (2) used stratified cross-validation (i.e., splitting all available data into folds under criteria that ensure, for example, that each fold has the same proportion of observations with a given categorical value [e.g., mortality: yes/no]); and (3) identified patients who died within 30 days and one year following hospital discharge, so that the prediction models have timelines appropriate for maximizing the benefits of end-of-life care. We chose inpatient rehabilitation as an exemplar for this study because (a) hip fracture patients treated in post-acute care settings have a complex care regimen and returning to the community can be challenging,19 and (b) clinicians within IRFs are required to document functional status routinely using a validated instrument, the Functional Independence Measure.20
The objective of this study was to develop and evaluate a machine learning model designed to predict 30-day and one-year mortality in patients treated for hip fracture discharged from IRFs. This was the first study to investigate machine learning for mortality prediction in older adult hip fracture patients using Medicare assessment and administrative claims data. In the following sections, we describe our training and evaluation datasets, develop and evaluate eight iterations of a machine learning model with varying specifications, identify the best predictive models of mortality, and discuss model performance in the context of the extant literature, with the overall goal of allocating palliative care resources more effectively.
Methods
Design and Sample
This study used a retrospective design. The study sample was constructed from data in the Inpatient Rehabilitation Facility–Patient Assessment Instrument (IRF-PAI), the Medicare Provider Analysis and Review (MedPAR), and Master Beneficiary Summary files. In our CONSORT-style diagram (Figure 1), eligible patients from the 2014–2015 MedPAR dataset (n = 252,477) were excluded if they were without a primary diagnosis of hip fracture (ICD-9-CM codes 820.0–820.9) at admission (n = 113,416), were not discharged between January 1 and November 30, 2014 (n = 14,711), were aged 65 years or younger (n = 7,091), were not discharged to an IRF (n = 10,186), had sex/age discrepancies between visits (n = 5,343), or died during the index hospitalization (n = 244). Eligible patients from the 2014 IRF-PAI (n = 43,105) were excluded if they were duplicate beneficiaries (n = 38,235). Merged MedPAR and IRF-PAI patients (n = 18,172) were excluded if they did not live at home prior to hospitalization, as we were interested in community-residing older adults (n = 935), or had sex/age discrepancies between the datasets (n = 97). The effective sample was n = 17,140. The study was considered exempt by the Duke University Institutional Review Board.
Features
Sociodemographic characteristics were operationalized as: age in years at the time of IRF admission; self-reported race as Asian, Black or African American, White, or Other (American Indian, Alaska Native, Native Hawaiian, Other Pacific Islander) and ethnicity (Hispanic or Latino); sex; and co-habitation, measured as living with someone (yes or no) prior to hospitalization.
Clinical factors were operationalized as chronic conditions and functional status on admission to the IRF. We examined eight chronic conditions that are potentially amenable to public health or clinical interventions, meet the definition for chronicity, or both, identified using specific algorithms and stored for research access: stroke, diabetes, liver disease, chronic kidney disease, asthma, heart disease (acute myocardial infarction, coronary artery disease, congestive heart failure, and peripheral vascular disease), lung disease, and depression.21 Functional status was measured using the total Functional Independence Measure (FIM) score on admission: motor ability (13 items: self-care, sphincter control, mobility, and locomotion) and cognitive ability (5 items: communication and social cognition), each rated on a 7-point Likert scale ranging from 1 (total assistance) to 7 (complete independence). A higher score indicates better functioning, with possible ranges of 13–91 for motor scores, 5–35 for cognitive scores, and 18–126 for total FIM scores. FIM items scored 0 (activity did not occur) were converted to 1 (total assistance), per CMS guidelines. All measures have been found to be reliable and valid.20,22
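As a concrete sketch of this scoring rule, the snippet below recodes FIM items of 0 to 1 and sums the motor and cognitive subscales; the item keys are hypothetical placeholders, not actual IRF-PAI field names.

```python
# Hypothetical FIM scoring sketch; keys "motor_0".."motor_12" and
# "cog_0".."cog_4" are illustrative placeholders, not IRF-PAI field names.
def fim_scores(items):
    """items: dict of 18 FIM item ratings (0-7). Returns (motor, cognitive, total)."""
    # Per CMS guidelines, recode 0 ("activity did not occur") to 1 ("total assistance")
    recoded = {k: max(1, v) for k, v in items.items()}
    motor = sum(v for k, v in recoded.items() if k.startswith("motor_"))
    cognitive = sum(v for k, v in recoded.items() if k.startswith("cog_"))
    return motor, cognitive, motor + cognitive
```

With all 18 items at the maximum of 7, the subscale and total scores reach the 91/35/126 ceilings described above; with all items at 0, the recoding yields the 13/5/18 floors.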
Utilization was operationalized as length of stay (LOS), i.e., the total number of nights in the IRF.
The primary outcomes, all cause 30-day and one-year mortality, were operationalized as death (yes or no) within 30 days or the first year, respectively, following IRF discharge. Death was ascertained on the basis of the death date recorded in the Medicare denominator/vital status file.
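A minimal sketch of this outcome definition, assuming a 365-day window for one-year mortality (the exact window used by the authors is not stated):

```python
from datetime import date

def mortality_flags(discharge_date, death_date):
    """Return (died_within_30d, died_within_1yr); death_date is None if alive."""
    if death_date is None or death_date < discharge_date:
        return (False, False)
    days = (death_date - discharge_date).days
    # Assumed windows: 30 days and 365 days after IRF discharge
    return (days <= 30, days <= 365)
```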
Building and Validating Classification Models
We used two types of classification models, logistic regression and a multilayer perceptron (MLP), to predict the probability of each binary outcome. For logistic regression, we explored different regularization techniques (L1 and L2) and strengths. Regularization strength reflects the constraint against an overfitted model: smaller values of the regularization parameter C specify stronger regularization, which leads to less overfitting. The values of C tested were 0.0001, 0.001, 0.01, 0.1, 1, 10, 100, and 1000. The MLP is a type of feedforward neural network with at least three layers of nodes (an input layer, one or more hidden layers, and an output layer). We trained multiple MLPs using different hyperparameters: architectures, learning rates, numbers of epochs, dropout rates, and ensemble sizes. Three architectures were tested: [120, 60, 1] (i.e., 120 nodes in the input layer, 60 nodes in the hidden layer, and 1 node in the output layer), [30, 20, 1], and [120, 100, 20, 1]. We varied the learning rate, which controls how quickly the model learns from the data, from 0.0001 to 1000, and the number of epochs, which is the number of times the algorithm passes through the entire training data set, from 15 to 300. To avoid overfitting, we employed dropout regularization and ensembling: the dropout rate, varied from 0.1 to 0.9, is the probability that a node is not used in a single epoch, and the ensemble size is the number of separately trained models combined to make the final prediction. Combining several models through ensembling generally improves predictive performance relative to a single model, whereas using more layers, nodes, and epochs improves the model's ability to learn from the training data but increases the risk of overfitting.
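The scikit-learn sketch below illustrates this kind of search, tuning logistic regression over the stated penalty types and C grid and fitting one MLP on synthetic data. It is illustrative only: scikit-learn's MLPClassifier does not support dropout or ensembling, so those elements of the pipeline described above are omitted, and the architecture and learning rate shown are single examples from the grids in the text.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the IRF cohort: imbalanced binary outcome
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.85], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# L1/L2 regularization searched over the C grid described in the text
lr_search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    {"penalty": ["l1", "l2"],
     "C": [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000]},
    cv=cv, scoring="roc_auc",
).fit(X, y)

# One MLP in the spirit of the [*, 60, 1] architecture (20 inputs here)
mlp = MLPClassifier(hidden_layer_sizes=(60,), learning_rate_init=0.001,
                    max_iter=300, random_state=0).fit(X, y)
```

`GridSearchCV` selects the penalty/C pair with the best cross-validated AUROC, mirroring the paper's use of discrimination as the tuning criterion.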
Statistical Analysis
Stratified k-fold cross validation was used to evaluate the logistic regression and MLP models described above. K-fold cross validation divides the data set into k sections, or folds, allowing a single fold at a time to serve as a validation set while the remaining k-1 folds become the training set. The performance of the model is estimated by taking the average performance of the folds. To ensure that the performance of the model more closely reflects its performance in practice, we applied stratified k-fold cross validation where k = 10 and each validation fold has a similar target distribution.
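For instance, with a synthetic outcome at roughly the observed 30-day mortality prevalence, each validation fold produced by stratified splitting preserves the overall event rate:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# 1,000 synthetic patients, 10% with the outcome (illustrative prevalence)
y = np.array([1] * 100 + [0] * 900)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# Every validation fold matches the overall 10% prevalence
fold_rates = [y[val].mean() for _, val in skf.split(X, y)]
```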
We calculated the calibration slope, which measures the relationship between the frequency of the true target outcome and the frequency of the model's predictions.23 If the model is well calibrated, then 80% of the examples for which the model predicts a 0.80 risk probability have the target outcome, 70% of the examples for which the model predicts a 0.70 risk probability have the target outcome, and so on. In a poorly calibrated model, there is a mismatch between the proportion of examples with the target outcome and the model's predicted probability for those examples. A perfectly calibrated classifier has a slope of 1, with slopes closer to 1 indicating better calibration of the classifier.
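One common way to estimate this quantity (a sketch of the standard logistic-recalibration approach, not necessarily the authors' exact implementation) is to regress the observed outcome on the logit of the predicted probability; the fitted coefficient is the calibration slope.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def calibration_slope(y_true, y_prob, eps=1e-8):
    """Logistic regression of outcomes on logit(predicted probability);
    a slope of 1 indicates good calibration."""
    p = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    logit = np.log(p / (1 - p)).reshape(-1, 1)
    # Near-unregularized fit (large C) so the slope is not shrunk
    return LogisticRegression(C=1e9).fit(logit, np.asarray(y_true)).coef_[0, 0]
```

Applied to predictions that are well calibrated by construction (outcomes drawn with probability equal to the prediction), the estimated slope lands close to 1.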
Discriminatory power of the models was evaluated by means of the area under the receiver operating characteristic curve (AUROC). Specifically, the AUROC measures how well the model ranks hip fracture patients who experienced the designated outcome (e.g., mortality within 30 days following discharge from the IRF) as higher risk than those who did not. To identify the optimum threshold for each model, we used the ROC curve, which traces the balance between sensitivity (a.k.a. true positive rate or recall) and false positive rate (equal to 1 − specificity), and applied Youden's J index to select a decision threshold.24
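A minimal sketch of threshold selection via Youden's J (sensitivity + specificity − 1), maximized over the points of the ROC curve:

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_prob):
    """Return the decision threshold maximizing J = TPR - FPR."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    return thresholds[np.argmax(tpr - fpr)]
```

For scores that separate the classes cleanly, the selected threshold sits at the boundary of the two groups; with overlapping scores it trades sensitivity against specificity.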
In this study, we compared the performance of models based on the calibration slope, the AUROC, average precision, and accuracy, as well as the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Each performance measure is defined in Table 1. We considered 2-sided P < .05 to be statistically significant. All analyses were performed in Python 3.7.
Table 1.
Measure | Definition | Formula/Calculation |
---|---|---|
AUROC | Area Under the Receiver Operating Characteristic Curve, also known as the c-statistic, indicates how well the model distinguishes between positive and negative cases. | Area under the plot of sensitivity vs. (1 − specificity) across thresholds |
Accuracy | Accuracy is the proportion of instances the algorithm predicts correctly. | (TP + TN) / (TP + TN + FP + FN) |
Average Precision | AP summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight. | Σn (Rn − Rn−1) Pn |
Sensitivity | True positive (TP) rate is the model's ability to predict positive outcomes accurately. | TP / (TP + FN) |
Specificity | True negative (TN) rate is the model's ability to predict negative outcomes accurately. | TN / (TN + FP) |
Positive Predictive Value | PPV is the probability that an individual with a positive prediction truly has a positive outcome. | TP / (TP + FP) |
Negative Predictive Value | NPV is the probability that an individual with a negative prediction truly has a negative outcome. | TN / (TN + FN) |
TP: true positives, TN: true negatives, FP: false positives, FN: false negatives; Rn: nth recall, Pn: nth precision
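The count-based measures in Table 1 can be computed directly from the confusion-matrix cells; a brief sketch:

```python
def classification_measures(tp, tn, fp, fn):
    """Count-based performance measures from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }
```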
Results
Sample characteristics
The effective analytic sample (n = 17,140) received post-acute care following hip fracture in 1,112 US IRFs. Overall, the mean age was 82 years; the majority of patients were female (72%) and White (93%), with a mean index hospital stay of 12 days. Those who died (n = 2,507) were significantly older and more likely to be male, to have a positive history of chronic conditions (except asthma), higher Charlson scores, and more prior hospitalizations, and to have longer LOSs during their index hospitalization and IRF stays (Table 2).
Table 2.
Patient Characteristics | Overall (n= 17,140) | Did not die (n= 14,633) | Died (n= 2,507) | p-value* |
---|---|---|---|---|
Demographic | ||||
Age (years) | 82.2 (7.6) | 81.8 (7.6) | 84.2 (7.7) | <0.0001 |
Female | 12359 (72.11%) | 10813 (73.9%) | 1546 (61.7%) | <0.0001 |
White | 15925 (92.91%) | 13610 (93%) | 2315 (92.3%) | 0.2289 |
Clinical | ||||
Stroke | 614 (3.58%) | 472 (3.2%) | 142 (5.7%) | <0.0001 |
Diabetes | 4303 (25.11%) | 3630 (24.8%) | 673 (26.8%) | 0.0297 |
Liver Disease | 394 (2.30%) | 291 (2%) | 103 (4.1%) | <0.0001 |
Chronic Kidney Disease | 4745 (27.68%) | 3684 (25.2%) | 1061 (42.3%) | <0.0001 |
Asthma | 1154 (6.73%) | 969 (6.6%) | 185 (7.4%) | 0.1621 |
Chronic Heart Failure | 3073 (17.93%) | 2227 (15.2%) | 846 (33.7%) | <0.0001 |
Lung Disease | 351 (2.05%) | 210 (1.4%) | 141 (5.6%) | <0.0001 |
Depression | 3480 (20.30%) | 2908 (19.9%) | 572 (22.8%) | 0.0007 |
Charlson Comorbidity Score | 1.7 (1.8) | 1.6 (1.7) | 2.6 (2.1) | <.0001 |
Utilization | ||||
Number of prior year hospitalizations | 0.8 (1.5) | 0.7 (1.4) | 1.3 (2.0) | <.0001 |
IRF LOS | 13.4 (4.5) | 13.4 (4.4) | 13.6 (5.1) | 0.0016 |
Index hospital LOS | 12.3 (7.78) | 12.1 (7.6) | 12.9 (8.6) | 0.0004 |
*p-value comparing patient characteristics between the "Did not die" and "Died" groups.
Model performance
For 30-day mortality, the logistic regression and MLP models performed equally well, with the MLP showing slightly higher AUROC and average precision and the logistic regression showing slightly higher accuracy (Table 3). For one-year mortality, the logistic regression and MLP models had similar performance; the AUROC and average precision of the MLP were again slightly higher than those of the logistic regression. For 30-day mortality, logistic regression had a calibration slope of 0.905 and the MLP a slope of 0.990; for one-year mortality, logistic regression had a calibration slope of 0.958 and the MLP a slope of 1.09. All MLP models had slightly higher AUROCs than the logistic regression models, but the differences were of minimal clinical importance.
Table 3.
AUROC | Accuracy | Average Precision | Sensitivity (TPR) | Specificity (TNR) | PPV | NPV | |
---|---|---|---|---|---|---|---|
30-day LR | 0.760 | 0.780 | 0.097 | 0.66 | 0.783 | 0.071 | 0.99 |
30-day MLP | 0.765 | 0.728 | 0.101 | 0.725 | 0.728 | 0.062 | 0.991 |
One-year LR | 0.756 | 0.684 | 0.326 | 0.729 | 0.677 | 0.266 | 0.942 |
One-year MLP | 0.758 | 0.681 | 0.327 | 0.743 | 0.672 | 0.263 | 0.944 |
LR, logistic regression; MLP, multilayer perceptron; TPR, true positive rate; TNR, true negative rate; PPV, positive predictive value; NPV, negative predictive value
Notes for 30-day mortality:
P-value: 0.191.
95% confidence intervals with 10-fold stratified cross-validation:
Confidence interval for MLP's AUROC: [0.741, 0.787]
Confidence interval for LR's AUROC: [0.735, 0.783]
Notes for one-year mortality:
P-value: 0.014.
95% confidence intervals with 10-fold stratified cross-validation:
Confidence interval for MLP's AUROC: [0.741, 0.764]
Confidence interval for LR's AUROC: [0.746, 0.766]
Discussion
Logistic regression and MLP models that estimated mortality risk in a cohort of hip fracture patients treated in IRFs performed well across demographic, social, and clinical variables. Both had high AUROCs and may serve as valuable tools for identifying patients with hip fracture at high risk for 30-day and one-year mortality. However, an important implication of our findings is that the LR model may be the more practical choice, given that it performed nearly identically to the MLP, is easier for clinicians to interpret, and requires fewer computational resources.
Our study advances evidence developed to build or validate mortality models using logistic regression among hip fracture patients in acute care hospitals. Pugely et al.25 and Karres et al.16 developed similar 30-day mortality risk models for hip fracture patients using logistic regression and found reasonable AUROCs of 0.70 and 0.81, respectively. The largest (N = 341,062) and only US-based study to develop a 30-day mortality risk model using logistic regression reported neither discrimination nor calibration data.26 Nor were evaluation criteria reported for the models developed by Heyes et al.27 and Novoa-Parra et al.,28 the only studies found to develop one-year risk models for patients with hip fracture using logistic regression. In our work to build predictive models for both 30-day and one-year mortality of patients with hip fracture treated in IRFs, both the logistic regression and MLP models demonstrated good discrimination and calibration.
Based on analysis of two national datasets representing care delivered to the largest older adult population in the US, the fee-for-service Medicare population, our findings have implications for both clinical practice and research. For clinical practice, the logistic regression and MLP models have resource-dependent advantages. For medical settings with the infrastructure and expertise to run artificial neural networks, machine learning models rely on machine-guided computational methods rather than on human-guided data analysis.29 They are also based on data routinely collected as part of current clinical work processes at the point of care and represent standard data elements collected in a structured format. Where available, these models' advantages would support early identification to inform scarce palliative care resource allocation. However, the majority of hospital systems and IRFs in the US lack such resources; for them, the LR models perform similarly, are easily interpretable by clinicians in terms of odds ratios and average marginal effects, and can be computed with available technology.
Comparing models for 30-day and one-year mortality, we found that both methodologies predicted 30-day mortality more accurately (accuracy of 0.780 and 0.728 for 30-day mortality vs. 0.684 and 0.681 for one-year mortality; Table 3). In either case, both models in this study accurately identify hip fracture patients at risk for 30-day and one-year mortality and therefore appear to be valuable tools for refining decision-making by clinicians and facilitating end-of-life discussions among hip fracture patients and their families.
Research on model development must proceed carefully to ensure performance optimization. The optimal tool for predicting mortality should incorporate appropriate assessment domains yet be sufficiently simple and minimally time-consuming for use in the clinical setting. Previous efforts have been based on manually derived scores that included a limited set of features. Among these, sex and age have been consistently represented as key predictive features for mortality after hospitalization for hip fracture across 30-day30 and one-year6 prognostic indices, although sex and age were not top features in the algorithms developed in our study. Furthermore, earlier efforts to develop such tools could not incorporate the weight of risk specific to each preexisting chronic condition, potentially limiting their predictive abilities. In our study, the top features of the 30-day and one-year logistic regression models were specific morbidities, including lung disease, liver disease, and chronic kidney disease. Other machine learning algorithms, such as random forest classifiers, may demonstrate advantages over logit models and prove informative for understanding how to maximize prediction accuracy. Moreover, valid and causal conclusions regarding predictions of mortality depend on time-to-event survival data, which suggests a comparison of Cox proportional hazards and random survival forest models. Future research may use DeepHit or a partial logistic artificial neural network approach to analyze the hazard function as a function of time and covariates for censored survival data.31 Finally, implementation and translation of machine learning should be pilot-tested in IRFs to demonstrate benefits. IRFs vary by type: nearly 80% of IRFs are hospital-based units while the remainder are stand-alone facilities, and many specialize in the rehabilitation of specific patient groups (e.g., stroke, traumatic brain injury). Therefore, the effectiveness of machine learning models could vary depending on IRF type and patient group.
Limitations
This study has several limitations. First, administrative claims data lack laboratory results, medications, and socio-behavioral information, which may be important predictors of mortality. Second, our prediction models and findings, which were derived from the fee-for-service Medicare population, may not be generalizable to enrollees in Medicare Advantage plans, who make up nearly 35% of all individuals with Medicare. In addition, because Medicare data have a lag time, the models would require validation in the electronic medical record. Third, our data represented a largely homogeneous group in terms of demographics, with 72% of patients female and 93% White; however, this breakdown reflects the national population of patients with hip fracture. Fourth, use of the J index to set a threshold did not take clinical judgment into consideration. Finally, our study did not investigate complex time-dependent effects of predictors32 on mortality over the follow-up period.
Conclusions and Implications
This study presented an analysis of prognostic factors for hip fracture mortality using logistic regression and MLP models. Although model evaluation suggested that the MLP may yield slightly better discrimination than logistic regression, both models had high AUROCs and good calibration and can serve as valuable tools for accurately identifying patients with hip fracture at high risk for 30-day and one-year mortality. However, the feasibility of MLP implementation may be limited by institutional information technology infrastructure. Given this limitation, the LR model may be the better choice for this task based on its interpretability and performance equivalent to the MLP.
Acknowledgements
We thank Judith Hays, PhD, RN and Donnalee Frega, PhD, RN for editorial and technical assistance with this manuscript. Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health (KL2TR002554). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. A portion of Dr. Colon-Emeric’s time (K24 AG049077-02, 5P30AG028716-15) was supported by grants from the National Institute on Aging.
References
- 1. National and regional estimates on hospital use for all patients from the HCUP Nationwide Inpatient Sample (NIS). Available at: http://hcupnet.ahrq.gov/HCUPnet.jsp. Accessed February 6, 2020.
- 2. Haentjens P, Magaziner J, Colon-Emeric CS, et al. Meta-analysis: excess mortality after hip fracture among older women and men. Ann Intern Med. 2010;152(6):380–390.
- 3. Chang W, Lv H, Feng C, et al. Preventable risk factors of mortality after hip fracture surgery: Systematic review and meta-analysis. Int J Surg. 2018;52:320–328.
- 4. Becker DJ, Arora T, Kilgore ML, et al. Trends in the utilization and outcomes of Medicare patients hospitalized for hip fracture, 2000–2008. J Aging Health. 2014;26(3):360–379.
- 5. Ali AM, Gibbons CE. Predictors of 30-day hospital readmission after hip fracture: a systematic review. Injury. 2017;48(2):243–252.
- 6. Cenzer IS, Tang V, Boscardin WJ, et al. One-Year Mortality After Hip Fracture: Development and Validation of a Prognostic Index. J Am Geriatr Soc. 2016;64(9):1863–1868.
- 7. De Lima L, Radbruch L. The International Association for Hospice and Palliative Care: Advancing Hospice and Palliative Care Worldwide. J Pain Symptom Manage. 2018;55(2s):S96–S103.
- 8. Barawid E, Covarrubias N, Tribuzio B, Liao S. The benefits of rehabilitation for palliative care patients. Am J Hosp Palliat Care. 2015;32(1):34–43.
- 9. Kamal AH, Wolf SP, Troy J, Leff V, Dahlin C, Rotella JD, Handzo G, Rodgers PE, Myers ER. Policy Changes Key To Promoting Sustainability And Growth Of The Specialty Palliative Care Workforce. Health Aff (Millwood). 2019;38(6):910–918.
- 10. Lau F, Downing GM, Lesperance M, Shaw J, Kuziemsky C. Use of Palliative Performance Scale in end-of-life prognostication. J Palliat Med. 2006;9(5):1066–1075.
- 11. Simmons CPL, McMillan DC, McWilliams K, et al. Prognostic Tools in Patients With Advanced Cancer: A Systematic Review. J Pain Symptom Manage. 2017;53(5):962–970.e10.
- 12. Fischer SM, Gozansky WS, Sauaia A, Min S-J, Kutner JS, Kramer A. A Practical Tool to Identify Patients Who May Benefit from a Palliative Approach: The CARING Criteria. J Pain Symptom Manage. 2006;31(4):285–292.
- 13. Yourman LC, Lee SJ, Schonberg MA, Widera EW, Smith AK. Prognostic Indices for Older Adults: A Systematic Review. JAMA. 2012;307(2):182–192.
- 14. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts H. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18(8):500–510.
- 15. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–118.
- 16. Mobadersany P, Yousefi S, Amgad M, et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc Natl Acad Sci U S A. 2018;115(13):E2970–E2979.
- 17. Karnuta JM, Navarro SM, Haeberle HS, Billow DG, Krebs VE, Ramkumar PN. Bundled Care for Hip Fractures: A Machine-Learning Approach to an Untenable Patient-Specific Payment Model. J Orthop Trauma. 2019;33(7):324–330.
- 18. Lund JL, Kuo TM, Brookhart MA, et al. Development and validation of a 5-year mortality prediction model using regularized regression and Medicare data. Pharmacoepidemiol Drug Saf. 2019;28(5):584–592.
- 19. Cary MP Jr., Prvu Bettger J, Jarvis JM, Ottenbacher KJ, Graham JE. Successful Community Discharge Following Postacute Rehabilitation for Medicare Beneficiaries: Analysis of a Patient-Centered Quality Measure. Health Serv Res. 2017.
- 20. Ottenbacher KJ, Hsu Y, Granger CV, Fiedler RC. The reliability of the functional independence measure: a quantitative review. Arch Phys Med Rehabil. 1996;77(12):1226–1232.
- 21. Goodman RA, Ling SM, Briss PA, Parrish RG, Salive ME, Finke BS. Multimorbidity Patterns in the United States: Implications for Research and Clinical Practice. J Gerontol A Biol Sci Med Sci. 2016;71(2):215–220.
- 22. Stineman MG, Shea JA, Jette A, et al. The Functional Independence Measure: tests of scaling assumptions, structure, and reliability across 20 diverse impairment categories. Arch Phys Med Rehabil. 1996;77(11):1101–1108.
- 23. Cohen I, Goldszmidt M. Properties and Benefits of Calibrated Classifiers. 2004; Berlin, Heidelberg.
- 24. Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cutoff point. Biom J. 2005;47(4):458–472.
- 25. Pugely AJ, Martin CT, Gao Y, Klocke NF, Callaghan JJ, Marsh JL. A risk calculator for short-term morbidity and mortality after hip fracture surgery. J Orthop Trauma. 2014;28(2):63–69.
- 26. Dodd AC, Bulka C, Jahangir A, Mir HR, Obremskey WT, Sethi MK. Predictors of 30-day mortality following hip/pelvis fractures. Orthop Traumatol Surg Res. 2016;102(6):707–710.
- 27. Heyes GJ, Tucker A, Marley D, Foster A. Predictors for 1-year mortality following hip fracture: a retrospective review of 465 consecutive patients. Eur J Trauma Emerg Surg. 2017;43(1):113–119.
- 28. Novoa-Parra CD, Hurtado-Cerezo J, Morales-Rodriguez J, Sanjuan-Cervero R, Rodrigo-Perez JL, Lizaur-Utrilla A. Factors predicting one-year mortality of patients over 80 years operated after femoral neck fracture. Rev Esp Cir Ortop Traumatol. 2019;63(3):202–208.
- 29. Beam AL, Kohane IS. Big Data and Machine Learning in Health Care. JAMA. 2018;319(13):1317–1318.
- 30. Maxwell MJ, Moran CG, Moppett IK. Development and validation of a preoperative scoring system to predict 30 day mortality in patients undergoing hip fracture surgery. Br J Anaesth. 2008;101(4):511–517.
- 31. Lee C, Yoon J, Schaar MV. Dynamic-DeepHit: A Deep Learning Approach for Dynamic Survival Analysis With Competing Risks Based on Longitudinal Data. IEEE Trans Biomed Eng. 2020;67(1):122–133.
- 32. Chapfuwa P, Tao C, Li C, Page C, Goldstein B, Carin L, Henao R. Adversarial Time-to-Event Modeling. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden; 2018.