2019 Aug 12;2:13. Originally published 2019 Jul 15. [Version 2] doi: 10.12688/hrbopenres.12923.2

Table 2. Key characteristics of included studies.

| Lead author (year) | Setting | Aim | Principal methods | Data and sample | Key results |
| --- | --- | --- | --- | --- | --- |
| Einav (2018) [33] | US: random sample of Medicare fee-for-service beneficiaries in 2008 | To analyse healthcare spending by predicted 12-month mortality, i.e. whether high end-of-life care costs can be identified ex ante | Ensemble of RF, gradient boosting and LASSO | Administrative data: demographics, ICD codes, chronic conditions and prior utilization for a baseline sample of 5,631,168. Trajectories of health care use and diagnosis in the prior 12-month period were included. | ML model attributed a higher risk score to those who died within one year than to those who did not in 87% of cases. End-of-life spending is high, but deaths do not account heavily for high spending in Medicare overall. Focusing on end-of-life spending is not a useful way to identify inappropriate treatment choices. |
| Makar (2015) [34] | US: Medicare fee-for-service beneficiaries in 2010 | To quantify six-month mortality risk in four disease cohorts: cancer, COPD, CHF and dementia | Six ML approaches and logistic regression were used in each cohort, of which RF models performed best in the primary analysis | Administrative data: demographics, ICD codes, chronic conditions, functional status, durable medical equipment and prior utilization for 20,000 randomly selected subjects in each disease cohort. Traditional baseline characteristics were augmented with values in the prior 12-month period, thus capturing disease progression, functional decline, etc. | ML model attributed a higher risk score to those who died within six months than to those who did not in 82% of cases. Augmented variables were key to predictive power; models using only traditional variables were less accurate. |
| Sahni (2018) [35] | Minnesota, US: six-hospital network (one large tertiary care centre; five community hospitals), 2012–2016 | To quantify 1-year mortality risk in a cohort of clinically diverse hospitalized patients | RF models and logistic regression were applied separately and their performance compared | Electronic medical record data, including vital signs, blood count, metabolic panel, demographics and ICD codes, for 59,848 patients | ML model attributed a higher risk score to those who died within one year than to those who did not in 86% of cases. RF model outperforms logistic regression. Demographic and lab data are key to predictive power; models using ICD codes alone are less reliable. |

US: United States; RF: random forest; ML: machine learning; COPD: chronic obstructive pulmonary disease; CHF: congestive heart failure; ICD: International Classification of Diseases; SEER: Surveillance, Epidemiology and End Results.
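
The headline result reported for each study (the model gave a higher risk score to a person who died than to a person who survived in 87%, 82% and 86% of cases) reads like a pairwise concordance, a quantity often reported as the C-statistic or AUC. The table does not name the metric, so that reading is an assumption. The sketch below shows how such a figure could be computed under that assumption. The function name `concordance` and the toy scores are hypothetical and are not taken from any of the included studies.

```python
from itertools import product


def concordance(risk_scores, died):
    """Share of (died, survived) pairs in which the person who died
    has the higher predicted risk score. Ties count as half a pair."""
    dead = [s for s, d in zip(risk_scores, died) if d]
    alive = [s for s, d in zip(risk_scores, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")

    concordant = 0.0
    for s_dead, s_alive in product(dead, alive):
        if s_dead > s_alive:
            concordant += 1.0
        elif s_dead == s_alive:
            concordant += 0.5
    return concordant / (len(dead) * len(alive))


# Made-up toy data for illustration only (not from any included study).
toy_scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_died = [True, True, True, False, False, False]
toy_c = concordance(toy_scores, toy_died)  # 8 of 9 pairs are concordant
```

A value of 0.5 corresponds to a model that ranks pairs no better than chance and 1.0 to perfect ranking. The exhaustive pairwise loop is quadratic in sample size, so for cohorts of the size listed in the table a rank-based AUC routine would be the practical choice.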