Table 2. Key characteristics of included studies.
| Lead author (year) | Setting | Aim | Principal methods | Data and sample | Key results |
|---|---|---|---|---|---|
| Einav (2018) 33 | US: random sample of Medicare fee-for-service beneficiaries in 2008 | To analyse healthcare spending by predicted 12-month mortality, i.e. whether high end-of-life care costs can be identified ex ante | Ensemble of RF, gradient boosting and LASSO | Administrative data: demographics, ICD codes, chronic conditions and prior utilization for a baseline sample of 5,631,168. Trajectories of health care use and diagnosis in the prior 12-month period were included. | ML model attributed a higher risk score to those who died within one year than to those who did not in 87% of cases. End-of-life spending is high, but deaths do not account heavily for high spending in Medicare overall. Focusing on end-of-life spending is not a useful way to identify inappropriate treatment choices. |
| Makar (2015) 34 | US: Medicare fee-for-service beneficiaries in 2010 | To quantify six-month mortality risk in four disease cohorts: cancer, COPD, CHF and dementia | Six ML approaches and logistic regression were used in each cohort, of which RF models performed best in the primary analysis | Administrative data: demographics, ICD codes, chronic conditions, functional status, durable medical equipment and prior utilization for 20,000 randomly selected subjects in each disease cohort. Traditional baseline characteristics were augmented with values from the prior 12-month period, thus capturing disease progression, functional decline, etc. | ML model attributed a higher risk score to those who died within six months than to those who did not in 82% of cases. Augmented variables were key to predictive power; models using only traditional variables were less accurate. |
| Sahni (2018) 35 | Minnesota, US: six-hospital network (one large tertiary care centre; five community hospitals), 2012–2016 | To quantify one-year mortality risk in a cohort of clinically diverse hospitalized patients | RF models and logistic regression were applied separately and their performance compared | Electronic medical record data, including vital signs, blood count, metabolic panel, demographics and ICD codes for 59,848 patients | ML model attributed a higher risk score to those who died within one year than to those who did not in 86% of cases. The RF model outperformed logistic regression. Demographic and lab data were key to predictive power; models using ICD codes alone were less reliable. |
US: United States; RF: random forest; ML: machine learning; COPD: chronic obstructive pulmonary disease; CHF: congestive heart failure; ICD: International Classification of Diseases; SEER: Surveillance, Epidemiology and End Results.
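
The key results in Table 2 are all phrased as the share of cases in which the model gave a higher risk score to a person who died than to one who did not. This pairwise measure is commonly called the concordance (c-statistic), and for a binary outcome it equals the area under the ROC curve. The following is a minimal sketch of how such a figure could be computed, not the method of any included study; the function name `concordance` and the risk scores are hypothetical and invented purely for illustration, and counting ties as half is an assumed convention.

```python
from itertools import product


def concordance(scores_died, scores_survived):
    """Share of (died, survived) pairs in which the person who died
    received the higher risk score. Ties count as half (an assumed convention)."""
    pairs = list(product(scores_died, scores_survived))
    if not pairs:
        raise ValueError("need at least one person in each group")
    wins = sum(1.0 if d > s else 0.5 if d == s else 0.0 for d, s in pairs)
    return wins / len(pairs)


# Toy risk scores, invented for illustration only (not data from any study).
died = [0.9, 0.7, 0.4]
survived = [0.8, 0.3, 0.2, 0.1]

print(f"{concordance(died, survived):.0%} of pairs ranked correctly")
```

Comparing every pair directly is quadratic in the sample size, so for cohorts of the size reported in Table 2 a rank-based calculation would be needed in practice; the brute-force version is shown only because it mirrors the wording of the table.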