Abstract
Background:
Emergency department (ED) visits are common at the end of life, but the identification of patients with life-limiting illness remains a key challenge in providing timely and resource-sensitive advance care planning (ACP) and palliative care services. To date, there are no validated, automatable instruments for ED end-of-life screening. Here, we developed a novel electronic health record (EHR) prognostic model to screen older ED patients at high risk of six month mortality and compared its performance to validated comorbidity indices.
Methods:
This was a retrospective, observational cohort study of ED visits from adults aged ≥ 65 years who visited any of 9 EDs across a large regional health system between 2014 and 2019. Multivariable logistic regression that included clinical and demographic variables, vital signs, and laboratory data was used to develop a six month mortality predictive model, the Geriatric End-of-life Screening Tool (GEST), using five-fold cross-validation on data from 8 EDs. Performance was compared to the Charlson and Elixhauser comorbidity indices using area under the receiver-operating characteristic curve (AUROC), calibration, and decision curve analyses. Reproducibility was tested against data from the remaining independent ED within the health system. We then used GEST to investigate rates of ACP documentation availability and code status orders in the EHR across risk strata.
Results:
A total of 431,179 encounters by 123,128 adults were included in this study with a six month mortality rate of 12.2%. Charlson (AUROC (95% CI): 0.65 (0.64-0.69)) and Elixhauser indices (0.69 (0.68-0.70)) were outperformed by GEST (0.82 (0.82-0.83)). GEST displayed robust performance across demographic subgroups and in our independent validation site. Among patients with a greater than 30% mortality risk using GEST, only 5.0% had ACP documentation; 79.0% had a code status previously ordered, of which 70.7% were full code. In decision curve analysis, GEST provided greater net benefit than the Charlson and Elixhauser scores.
Conclusions:
Prognostic models using EHR data robustly identify high mortality risk older adults in the ED for whom code status, ACP, or palliative care interventions may be of benefit. While all tested methods identified patients approaching the end-of-life, GEST was most performant. These tools may enable resource-sensitive end-of-life screening in the ED.
Keywords: end-of-life, goals of care, emergency medicine, machine learning, prognostication
Introduction
Emergency department (ED) visits are common at the end of life, with approximately 75% of decedents visiting an emergency room within six months of death.1 Moreover, ED visits often mark an inflection point in illness, heralding a more rapid decline.2 For this reason, ED visits present an opportunity to consider the prognoses of older adults, and there is growing interest in the early identification of ED patients who may benefit from advance care planning (ACP) or palliative care.
While qualitative work has revealed that proactive identification of patients who may benefit from palliative care is a key facilitator of ED palliative care integration,3 in our review of the literature, we identified no automated tools designed to identify older adults in the ED at risk for near-term mortality.4,5 The surprise question (“Would I be surprised if this patient died in the next twelve months?”) has been broadly applied as an ACP and palliative care screening question, but it has limited prognostic accuracy and faces barriers to effective implementation.6-9 Other proposed ED-based screening tools such as P-CaRES and SPEED have content validity but as yet unknown performance characteristics in identifying end-of-life needs.10 None of these methods are automated; all require clinician screening input at the point of care.
Efforts to design automated tools addressing palliative needs among hospitalized patients have shown promising results in increasing rates of ACP documentation and palliative service consultation. Palliative Connect is a machine learning algorithm developed at the Hospital of the University of Pennsylvania to predict death within six months among hospitalized patients.11 A similar algorithm was developed at New York University Langone Health to predict two month mortality among hospitalized patients.12,13 While both appear to perform well in their validation studies, neither algorithm is readily available for external validation, and neither has published ED performance data. Other promising tools such as the modified Hospitalised-patient One-year Mortality Risk (mHOMR) score rely on admission-level information such as the admitting service, limiting their applicability to the ED.14 More significantly, the generalizability of these models to heterogeneous ED populations that include patients who will be discharged is uncertain.
To address the gap in automatable end-of-life screening in the ED, we developed a new electronic health record (EHR) based prediction algorithm, the Geriatric End-of-life Screening Tool (GEST), using logistic regression (LR) and clinical variables routinely available in the ED, including age, vital signs, past diagnoses, and laboratory results, to identify older adults at high risk of six month mortality. Because we are unable to retrospectively apply current screening tools such as the surprise question or P-CaRES to a new cohort, we compare GEST performance to the Charlson and Elixhauser comorbidity indices, which have been robustly validated as prognostic tools in many contexts, including in-hospital and one-year mortality following hospital discharge.15,16 To our knowledge, neither the Charlson nor the Elixhauser score has been proposed as a potential end-of-life screening tool or tested in a general population of older ED patients.
We then use GEST as a lens to study the code status and ACP availability of high-risk older adults in the ED. While prior studies have shown low rates of ACP completion in decedent populations, a finding mirrored in ED code status availability,17,18 decedents are not necessarily the same individuals as those predicted to die.19,20 We propose that an analysis focusing on patients at ex ante risk of life-limiting illness (i.e., those predicted to die) provides a useful and distinct perspective compared with ex post analyses of decedents, namely that the ex ante approach describes cohorts that can be prospectively identified for intervention.
The goals of this study are to develop and test a new EHR-based prediction algorithm, GEST, to predict six month mortality in a cohort of older ED patients. We will compare GEST’s performance to the Charlson and Elixhauser comorbidity indices. We then evaluate opportunities for code status or ACP documentation interventions in those identified as high-risk for mortality.
Methods
Study setting and participants
This is a retrospective observational cohort study of electronic health records (EHRs) from adults ≥ 65 years of age who visited one of the Yale New Haven Health System (YNHHS) EDs between 2014 and 2019 and had a home address in the state of Connecticut. The cohort was followed through December 2021. The YNHHS includes 9 EDs across Connecticut and Rhode Island. Model development and internal validation were performed using data from 8 EDs within the health system, while the final ED (Bridgeport, CT) was used as an independent site for reproducibility testing. This site was selected because it is geographically distinct from the other EDs while having a high clinical volume. The study was deemed exempt by all institutional review boards. The TRIPOD checklist for prediction model development was followed.21
Measures
Our primary outcome was mortality at 6 months. While various mortality models have addressed diverse outcome timelines including one,22 two,13 three, six,5 and twelve months,5 we selected six months to align with hospice eligibility.23 We combined documented in-system mortality current through December 2021 with Connecticut state mortality data through November 2022 as evidence of mortality (Supplementary Methods S1). To confirm that non-decedents were survivors, we used documentation of future ED visits or future in-system vital signs through December 2021. Patients with unknown outcomes at 6 months were excluded from this study. Similarly, patients who presented to the ED in cardiac arrest, as determined by a chief complaint of cardiac arrest, or who died in the ED (ED disposition of “expired”) were excluded, as the high mortality risk of these patients was obvious. Because of several cases in which patients documented as discharged had, in fact, expired in the ED, patients whose deaths occurred within 24 hours of ED arrival were also excluded.
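As a minimal illustration of these cohort exclusions, the sketch below applies the criteria above to an encounter-level table; the column names (`outcome_known_6mo`, `chief_complaint`, `ed_disposition`, `death_time`, `arrival_time`) are hypothetical placeholders rather than the study's actual schema.

```python
import pandas as pd

def apply_exclusions(visits: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the exclusion criteria described above (illustrative column names)."""
    # Exclude encounters with unknown 6-month outcome (loss to follow-up).
    visits = visits[visits["outcome_known_6mo"]]

    # Exclude presentations in cardiac arrest and deaths occurring in the ED.
    visits = visits[visits["chief_complaint"].str.lower() != "cardiac arrest"]
    visits = visits[visits["ed_disposition"].str.lower() != "expired"]

    # Exclude deaths within 24 hours of ED arrival (possible ED deaths
    # mis-documented as discharges).
    hours_to_death = (visits["death_time"] - visits["arrival_time"]).dt.total_seconds() / 3600
    return visits[hours_to_death.isna() | (hours_to_death >= 24)]
```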
EHR algorithm development and methods of assessment
We developed the GEST EHR-data based model using LR. LR is a regression technique that forms the basis for many clinical prediction tools.24 LR offers the benefits of interpretability and robust performance often on-par with more complex machine learning methods.24 Based on our prior work developing prediction models for the ED setting, we included routinely available structured EHR data including patient demographics, chief complaints, summarized medical histories, vital signs, outpatient medications, laboratory results, procedures, and imaging orders in this study as candidate predictors before reducing our model to a more limited set of predictors (Supplementary Methods S2).25 Code status was not included as a predictor. Extensive testing was performed on the development cohort to identify parsimonious sets of candidate variables to include in the model (Supplementary Methods S3).
We used 5-fold cross validation to develop and internally validate the LR model using data from 8 ED sites. In this process, the model is trained on 80% of the data and tested on the remaining 20%, iterating until each fold has served as the validation set. Since a single patient could have multiple visits, we assigned folds at the person level rather than the visit level. For each fold, missing data were standardized and imputed using the development sample median.26 Areas under the receiver-operating characteristic curve (AUROCs) and logistic regression coefficients, which reveal the contributions of model variables to predictive performance,27,28 were derived from the means and standard deviations across these folds. As a sensitivity analysis, we present the AUROC for a dataset in which all non-decedents, including those lost to follow-up, are assumed to have survived.
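A minimal sketch of this person-level cross-validation is shown below, assuming a scikit-learn workflow (the study does not specify its implementation). GroupKFold keeps all visits from one patient in the same fold, and the median imputer is fit only on each training fold.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cross_validate_gest(X, y, patient_ids, n_splits=5):
    """Person-level 5-fold CV: impute with the training-fold median, standardize,
    fit logistic regression, and report the mean and SD of per-fold AUROC.
    X, y, and patient_ids are NumPy arrays aligned by ED visit."""
    aurocs = []
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups=patient_ids):
        model = make_pipeline(
            SimpleImputer(strategy="median"),  # development-fold median imputation
            StandardScaler(),                  # standardize predictors
            LogisticRegression(max_iter=1000),
        )
        model.fit(X[train_idx], y[train_idx])
        aurocs.append(roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1]))
    return np.mean(aurocs), np.std(aurocs)
```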
We then combined all five validation folds to generate receiver-operating characteristic curves and precision-recall curves.29 All five folds were also used to test threshold-dependent outputs including sensitivity, specificity, positive predictive value, and negative predictive value. To focus on detection of high mortality risk patients, we used a threshold of 30% mortality risk at six months.11,23 As a sensitivity analysis, we also report the results of “threshold tuning,” in which we modify the cutoff at which a patient is considered high risk to either 10% or 50% and again evaluate the above outputs. Calibration, or how well predicted probabilities match observed probabilities, is reported using calibration plots.
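The threshold-dependent outputs can be computed directly from the pooled validation-fold predictions; a short sketch is below, assuming binary outcome labels and predicted risks on [0, 1]. Threshold tuning amounts to re-running the same calculation with a cutoff of 0.10 or 0.50.

```python
import numpy as np

def threshold_metrics(y_true, risk, threshold=0.30):
    """Sensitivity, specificity, PPV, NPV, and screen-positive rate at a given
    mortality-risk cutoff (30% in the primary analysis)."""
    y_true = np.asarray(y_true).astype(bool)
    flagged = np.asarray(risk) >= threshold
    tp = np.sum(flagged & y_true)
    fp = np.sum(flagged & ~y_true)
    fn = np.sum(~flagged & y_true)
    tn = np.sum(~flagged & ~y_true)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "screen_positive_rate": flagged.mean(),
    }
```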
To further assess for LR model reproducibility, we report model AUROC and calibration for an external ED site within the health system that was not part of the model development process. For this analysis, we use DeLong’s method to estimate AUROC confidence intervals.30
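A sketch of this reproducibility check on the held-out site follows. The confidence interval here uses a percentile bootstrap as a simple stand-in for DeLong's method reported in the study (computed, for example, via R's pROC); variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def external_auroc(y_holdout, risk_holdout, n_boot=2000, seed=0):
    """AUROC on the held-out ED site with a percentile-bootstrap 95% CI
    (a stand-in for the DeLong interval used in the paper)."""
    y_holdout = np.asarray(y_holdout)
    risk_holdout = np.asarray(risk_holdout)
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_holdout, risk_holdout)
    n = len(y_holdout)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if np.unique(y_holdout[idx]).size < 2:  # need both classes for AUROC
            continue
        boots.append(roc_auc_score(y_holdout[idx], risk_holdout[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)
```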
To assess for model demographic bias, we evaluated areas under the receiver-operating characteristic curve across age deciles and across racial, ethnic, and sex subgroups. These variables were extracted directly from the EHR. Due to small sample sizes, we limited our sensitivity analysis to patients documented as “Black or African American”, “Hispanic or Latino”, and/or female. Recognizing that patients with more tenuous access to care may have less complete information available in the EHR, we performed a sensitivity analysis of GEST in which all data prior to the index ED visit were masked (Supplementary Methods S4). This analysis was not performed for the Charlson or Elixhauser indices since those methods rely exclusively on historical information.
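Subgroup performance can be summarized by computing the AUROC within each demographic stratum, as in the sketch below; the column names are illustrative, and each stratum is assumed to contain both decedents and survivors.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_aurocs(df: pd.DataFrame) -> pd.Series:
    """AUROC within each demographic stratum. Assumes columns 'died_6mo'
    (binary outcome), 'risk' (predicted probability), and 'subgroup'
    (e.g., age decile, race, ethnicity, or sex); names are illustrative."""
    return df.groupby("subgroup").apply(
        lambda g: roc_auc_score(g["died_6mo"], g["risk"])
    )
```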
Comorbidity Indices
For each patient encounter, we assigned a Charlson and Elixhauser comorbidity score based on all prior documented comorbidities in the EHR using the R package comorbidity (Supplementary Methods S5). For Charlson, ICD-9 and ICD-10 codes were mapped using the Quan weightings for the prediction of one-year mortality following hospital discharge,16 while, for Elixhauser, the van Walraven weights for predicting in-hospital mortality were used.15 Because our goal is to assess the accuracy of automated scoring, patients for whom there were no prior encounters were assigned a score of 0, since no past medical history would have been available. The R package pROC and Python packages scikit-learn and pyroc were used for model comparisons.
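To illustrate the scoring logic (the study used the full Quan mapping via the R comorbidity package), the sketch below scores an encounter from its prior ICD-10 codes using a small, deliberately incomplete subset of Charlson categories and Quan weights; encounters with no prior codes default to a score of 0.

```python
# Illustrative only: a tiny subset of Charlson categories with Quan weights and
# example ICD-10 prefixes; not the full mapping used in the study.
CHARLSON_EXAMPLE = {
    "congestive_heart_failure": {"weight": 2, "icd10_prefixes": ("I50",)},
    "dementia": {"weight": 2, "icd10_prefixes": ("F00", "F01", "F02", "F03", "G30")},
    "metastatic_cancer": {"weight": 6, "icd10_prefixes": ("C77", "C78", "C79", "C80")},
}

def charlson_score(prior_icd10_codes: list[str]) -> int:
    """Sum category weights for any category matched by the patient's prior
    ICD-10 codes; patients with no prior encounters score 0."""
    score = 0
    for category in CHARLSON_EXAMPLE.values():
        if any(code.startswith(category["icd10_prefixes"]) for code in prior_icd10_codes):
            score += category["weight"]
    return score
```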
Decision curve analysis
We employ decision curve analysis using the Python package statkit to assess the net benefit of Charlson, Elixhauser, and GEST predictions across risk thresholds. Decision curve analysis enables calculation of a “net benefit” for prediction models in comparison to universal intervention for all or intervention for none.31 Net benefit equals the rate of true positives minus the rate of false positives weighted by the odds of the threshold probability at which intervention would be warranted.32 In our study, the risk thresholds under consideration are equivalent to the mortality risk at which a clinician would refer a patient for ACP or palliative needs assessment.
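A minimal sketch of the net benefit calculation underlying the decision curves is shown below (the study itself used the statkit package). The "treat all" reference strategy depends only on outcome prevalence, roughly 12% in this cohort, and "treat none" has a net benefit of zero by definition.

```python
import numpy as np

def net_benefit(y_true, risk, threshold):
    """Net benefit of a 'refer if predicted risk >= threshold' policy:
    (TP/n) - (FP/n) * threshold / (1 - threshold)."""
    y_true = np.asarray(y_true).astype(bool)
    flagged = np.asarray(risk) >= threshold
    n = len(y_true)
    tp_rate = np.sum(flagged & y_true) / n
    fp_rate = np.sum(flagged & ~y_true) / n
    return tp_rate - fp_rate * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy of referring everyone, which depends only on prevalence."""
    prevalence = np.mean(y_true)  # ~0.12 for six month mortality in this cohort
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```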
Code status and ACP analysis
For patient code statuses, we first extracted all code status orders from the EHR and then mapped the most recent available order to the ED encounter. The YNHHS employs a specific ACP note type that is displayed prominently when entering a patient chart (Figure S1). While there is no perfect ascertainment measure for ACP activities, we used these ACP documentation fields as a primary measure of ACP activities because it is a highly pragmatic endpoint used in other relevant studies13,33 and reflects the practice patterns of patient-facing clinicians.34 As with prior studies in this space, we do not include code status orders as evidence of ACP activity.35 For code status and ACP analyses, patients were assigned risk quintiles (e.g., 0-20% 6-month mortality risk) using the outputs of the GEST model.
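A sketch of this stratified summary is below, assuming an encounter-level table with hypothetical columns for the GEST risk estimate, ACP note availability, and presence of a code status order.

```python
import pandas as pd

def acp_by_risk_stratum(df: pd.DataFrame) -> pd.DataFrame:
    """Group encounters into 20%-wide GEST risk strata and summarize ACP note
    and code status availability (column names are illustrative)."""
    bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
    labels = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]
    df = df.assign(
        risk_stratum=pd.cut(df["gest_risk"], bins=bins, labels=labels, include_lowest=True)
    )
    return df.groupby("risk_stratum").agg(
        n=("gest_risk", "size"),
        acp_note_rate=("has_acp_note", "mean"),
        code_status_rate=("has_code_status_order", "mean"),
    )
```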
Results
Of the 460,448 ED encounters from 137,125 unique patients screened for our analysis, a total of 27,452 visits (6.0%) by 13,356 patients were excluded for loss to follow-up (Figure 1). A further 1,817 (0.4%) encounters were excluded for either a presenting complaint of cardiac arrest or in-ED mortality. The remaining cohort of 431,179 encounters by 123,128 unique patients was included in our analysis (Figure 1). The mean age of the cohort was 78.1 years (SD 8.9); 57.6% were female, 77.1% White, 14.0% Black, and 8.7% Hispanic or Latino (Table 1). Demographics of the development dataset comprising 8 EDs and the independent validation dataset from a single ED are shown in Table S1. The 6-month mortality rate in this cohort was 12.2%.
Figure 1:
Consort diagram
Table 1:
Study demographics.
| Characteristic | | Overall | Decedents |
|---|---|---|---|
| n | | 431179 | 53308 |
| Age, mean (SD) | | 78.1 (8.9) | 82.2 (9.2) |
| Sex, n (%) | Female | 248167 (57.6) | 27504 (51.6) |
| | Male | 182999 (42.4) | 25804 (48.4) |
| | Unknown | 13 (0.0) | |
| Race, n (%) | Asian | 4324 (1.0) | 367 (0.7) |
| | Black or African American | 60499 (14.0) | 5867 (11.0) |
| | Other | 34087 (7.9) | 3088 (5.8) |
| | White or Caucasian | 332269 (77.1) | 43986 (82.5) |
| Ethnicity, n (%) | Hispanic or Latino | 37706 (8.7) | 3095 (5.8) |
| | Non-Hispanic | 391127 (90.7) | 49819 (93.5) |
| | Unknown | 2346 (0.5) | 394 (0.7) |
| Insurance, n (%) | Other | 2421 (0.6) | 469 (0.9) |
| | Private | 152189 (35.3) | 15279 (28.7) |
| | Public | 273995 (63.5) | 37439 (70.2) |
| | Self-Pay | 2574 (0.6) | 121 (0.2) |
| Code status, n (%) | Comfort measures only | 116 (0.0) | 67 (0.1) |
| | Full | 190827 (44.3) | 26578 (49.9) |
| | No code/DNR | 37549 (8.7) | 11464 (21.5) |
| | None | 202687 (47.0) | 15199 (28.5) |
| Comorbidities* | | | |
| Congestive Heart Failure, n (%) | | 75323 (18.6) | 16590 (32.0) |
| Dementia, n (%) | | 14196 (3.5) | 3058 (5.9) |
| Cerebrovascular Disease, n (%) | | 73098 (18.1) | 10944 (21.1) |
| Chronic Obstructive Pulmonary Disease, n (%) | | 93389 (23.1) | 14036 (27.1) |
| Diabetes, n (%) | | 105457 (26.1) | 13733 (26.5) |
| Renal Disease, n (%) | | 62802 (15.5) | 12531 (24.2) |
| Cancer, n (%) | | 85099 (21.0) | 17422 (33.6) |
| Comorbidity scores* | | | |
| Charlson, mean (SD) | | 1.7 (2.1) | 2.8 (2.6) |
| Elixhauser, mean (SD) | | 7.4 (8.2) | 12.3 (9.3) |

* Comorbidities and comorbidity scores include 26,408 patients with no documented medical histories. As described in the Methods, patients with no documented histories were included as having no comorbidities.
The GEST model was developed using 18 predictors with a cross-validation AUROC of 0.82 (0.82-0.83) (Figure 2A) and robust calibration (Figure 2B). The top model coefficients were ED minimum systolic blood pressure, age, and history of lung cancer (Figure S2, Table S2). GEST identified 10.5% of patients as having a greater than 30% six month mortality risk (Figure 2C), with an observed mortality rate of 46.1%. Test characteristics for six month mortality using a 30% risk threshold are shown in Table 2.
Figure 2:
Model performance characteristics. (A) Receiver-operating characteristic curve and (B) calibration plots for the Charlson, Elixhauser, and GEST models with 95% CI shown with shading. Calibration plots were fit to a second-order polynomial. (C) Mortality risk distribution across the validation cohort using GEST. (D) Decision curve analysis comparing Charlson, Elixhauser, and GEST screening methods to universal referral (always act) or inaction (never act).
Table 2:
Model performance characteristics.
| Model | AUC | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Percent of patients identified with >30% 6-month mortality risk |
|---|---|---|---|---|---|---|
| Charlson | 0.65 (0.64-0.67) | 10.2 (10.0 - 11.0) | 97.1 (97.0 - 97.0) | 33.1 (32.0 - 34.0) | 88.6 (89.0 - 89.0) | 3.80% |
| Elixhauser | 0.69 (0.68-0.70) | 11.1 (11.0 - 11.0) | 96.9 (97.0 - 97.0) | 32.8 (32.0 - 34.0) | 88.7 (89.0 - 89.0) | 4.10% |
| GEST | 0.82 (0.82-0.83) | 39.6 (39.0 - 40.0) | 93.6 (93.0 - 94.0) | 46.1 (46.0 - 47.0) | 91.8 (92.0 - 92.0) | 10.50% |
Areas under receiver-operating characteristic curves (AUROCs), sensitivities, specificities, positive predictive values (PPV), and negative predictive values (NPV) for 6-month mortality using an assigned model cutoff of 30% mortality risk are shown with 95% CI. Additionally, the percent of patients in the validation cohort who screen positive is shown. Charlson - Charlson comorbidity index with Quan weights; Elixhauser - Elixhauser comorbidity index with van Walraven weights; GEST - Geriatric End-of-life Screening Tool.
The mean Charlson and Elixhauser scores in our cohort were 1.7 and 7.4, respectively (Figure S3). Comorbidities and comorbidity scores were unavailable for 26,408 (6.1%) encounters. In cross-validation, AUROCs for the two indices were 0.65 (0.64-0.69) and 0.69 (0.68-0.70) (Figure 2A) and model calibration showed decay above 40% (Figure 2B). Charlson and Elixhauser identified 3.8% and 4.1% of all ED patient encounters as having a >30% mortality risk with observed mortality rates of 33.1% and 32.8%. As compared to GEST, both Charlson and Elixhauser had significantly lower sensitivity, but maintained high specificity (Table 2). Precision-recall curves for all models are shown in Figure S4. When the Charlson, Elixhauser, and GEST models were tested against the single-site held out dataset, AUROCs were similar to those from cross-validation (Charlson: 0.66 (0.65-0.66), Elixhauser: 0.68 (0.67-0.68), GEST: 0.82 (0.81-0.82)).
We then redefined our “high mortality risk” at higher and lower thresholds and reevaluated model sensitivity and specificity. By decreasing the risk threshold to 10% mortality risk at 6 months, the sensitivity of all three models improved (GEST: 78.6 (78.0-79.0); Charlson: 65.0 (65.0-65.0); Elixhauser 69.1 (69.0 - 70.0)) at the cost of specificity (GEST: 70.7 (71.0-78.0); Charlson: 78.6 (78.0-79.0); Elixhauser: 71.9 (72.0-72.0)) (Table S3). GEST identified 26.7% of patients as having greater than 10% mortality risk, while Charlson and Elixhauser identified 23.5% and 30.8% of patients, respectively. Conversely, increasing the risk threshold to 50% increased specificity at the cost of sensitivity (Table S4). Notably, GEST identified 2.7% of patients in the validation cohort as having a greater than 50% mortality risk at 6 months, while Charlson and Elixhauser identified 0.1%.
To assess for model bias, we then compared model AUROCs across age deciles as well as race, ethnicity, and gender. All models performed better for patients in their seventh decade of life as compared to patients greater than 90 years of age, but the degradation was worse for Charlson and Elixhauser (60s: Charlson 0.74 (0.72-0.76), Elixhauser 0.75 (0.72-0.77), GEST 0.82 (0.82-0.83); >90: Charlson 0.57 (0.57-0.58), Elixhauser 0.60 (0.59-0.61), GEST 0.74 (0.73-0.74)) (Table S5). Model performance was stable across Black, Hispanic or Latino, and female patient subsets (Table S6). Model calibration across these patient subsets is shown in Figure S5. GEST without historical diagnosis information performed with an AUROC of 0.82 (0.82-0.83). When applied to our cohort without exclusions for loss to follow-up, GEST performed with an AUROC of 0.82 (0.82-0.82).
In assessment of end-of-life screening tools, decision curve analysis enables comparisons of the “net benefit” of different instruments across various mortality risk thresholds. Universal referral to end-of-life resources yields positive net benefit for threshold probabilities less than the population six month mortality rate of 12% (Figure 2D); mathematically, the low risk threshold compensates for the low screening yield.31 Conversely, at higher risk thresholds, universal non-referral is preferred to universal referral because the low rate of true positives is outweighed by the false positives. A perfect screening tool has a net benefit equal to the population mortality rate of 12% because every patient referred for intervention is a true positive with no false positives. Consistent with our AUROC and calibration findings, across a range of risk thresholds from 10% to 50%, GEST has a higher net benefit than both the Charlson and Elixhauser scores (Figure 2D).32
ACP documentation was available for 2.0% of encounters and for only 5.0% of patients with a mortality risk of greater than 30% using GEST (Figure 3). 52.7% of patients had a code status documented prior to their ED encounter, which increased to 79.0% for patients at high mortality risk. When documentation was present, 83.3% of patients were full code which decreased to 70.7% in high mortality risk patients (Figure 3).
Figure 3:
(A) ACP availability and (B, C) code status documentation across GEST risk strata. For (C), y-axis is percent of patients who were full code given code status documentation was available.
Discussion
Using data from a large observational cohort of older adults who visited an ED, we show that a novel, fully automatable EHR screening instrument, GEST, outperforms the Charlson and Elixhauser comorbidity indices in identifying patients at high risk of six month mortality. While the Charlson and Elixhauser comorbidity indices identified approximately 4% of all ED visits by older adults as having a greater than 30% six month mortality risk, GEST identified more than twice as many patients as being high risk while maintaining robust specificity. Moreover, GEST had a higher net benefit than Charlson or Elixhauser across a range of mortality risk thresholds.
GEST uses a mixture of demographic, past medical, vital sign, laboratory, and healthcare utilization data. Among GEST variables, we note the importance of a history of delirium or dementia, a finding that mirrors a recent study showing persons living with dementia have a higher risk of one-year mortality following an ED visit than individuals without dementia.36 Similarly, though not often used as a prognostic marker in the ED, red cell distribution width is associated with increased inpatient mortality, hospital readmission, and length of stay.37 We note that a unique aspect of our model is its ability to blend historical and demographic risk factors with real-time data like vital signs and laboratory results.
Qualitative work has revealed that proactive identification of patients who may benefit from palliative care is a key facilitator of ED palliative care integration,3 and to our knowledge, none of the ED-specific screening tools, including the surprise question, P-CaRES, or SPEED, have been evaluated as automated approaches. By automating screening, GEST, and to a lesser extent the Charlson and Elixhauser scores, may enable facile capture of at-risk populations, allowing clinicians to focus their energies on high-risk groups. Moreover, in contrast to current screening tools, all three of the methods studied here permit risk threshold tuning, which allows users to increase or decrease mortality risk sensitivity in exchange for specificity. In the context of screening for ACP or palliative care needs, this feature allows health systems to match their algorithmic sensitivity to their operational bandwidth.23 Our data indicate that GEST has a clear advantage over the Charlson and Elixhauser models, offering significantly better sensitivity at the various tested thresholds. Tiered or joint screening with both an automated instrument and a previously validated tool such as the surprise question is an important area for future research.
ED end-of-life screening tools may serve different purposes among admitted and discharged cohorts. For patients being admitted, automated screening can flag ED clinicians or social workers to initiate a conversation that can then be documented and continued by the inpatient team. We anticipate that a small cohort of patients may be appropriate for ED referral to hospice services, preventing admission and facilitating patient access to care that best aligns with their wishes.36 Health systems with more extensive palliative care resources may be able to initiate primary palliative consultation for admitted ED patients, many of whom will have prolonged ED boarding times.38,39 For discharged patients, screening can serve as a key digital linkage to primary care and oncologic practices and facilitate timely post-discharge discussions. Regardless of ED disposition, many patients will have care partners with them in the ED, and automated screening may facilitate engagement and documentation of healthcare proxies, often a key element of end-of-life decision-making.40,41
We observed very low rates of ACP notes across the health system. We also noted that most patients with code status documentation were full code. Specifically, at the moment of ED care provision, identifiably high-risk older adults lack readily accessible ACP documentation and, according to the health system EHR, a majority are full code. These observations highlight the opportunity and importance of ED-based end-of-life screening. Significant efforts are being made to advance the science of ED-based end-of-life care through the development of conversation and decision aids as well as trials of ED-based palliative care.2,42-44 If validated, the work described here presents a pragmatic approach to initiating interventions with at-risk patient populations in a resource-sensitive manner.
Limitations
This study was performed across 9 EDs, but our findings may not generalize to other health systems and require prospective validation. Our ACP and code status analysis was pragmatic and based on available EHR data, but we recognize that these data are communicated in numerous other ways, including through transfer facility documentation and by accompanying family members. Moreover, we did not perform a natural language processing analysis to identify ACP documentation in other types of EHR notes, though this approach has been shown to enable capture of ACP activities.34
We performed numerous analyses to assess the robustness and reproducibility of our models across ages, genders, races, and ethnicities; within our primary cohort of 8 EDs; in a second cohort from a single, geographically distinct ED within our health system; and among patients without known survival data. Nevertheless, we are cognizant of the possibility that our screening tool may have unintended consequences for healthcare resource allocation. Sources of possible bias in this study include unclear methods of assignment for race, ethnicity, or gender within our dataset and structural racism in laboratory testing and healthcare utilization, among others.45,46
Conclusions
In this observational study of older adults in EDs, our novel screening instrument, GEST, provided facile identification of older adults at risk of six month mortality, outperforming the Charlson and Elixhauser indices. Most high mortality risk older adults were full code and had limited ACP documentation available. These data suggest an identifiable cohort of at-risk older adults that could benefit from targeted ED end-of-life screening efforts. External, prospective validation is needed to test the generalizability of this work.
Supplementary Material
Methods S1: Mortality linkage between electronic health records and state records.
Methods S2: Electronic health records candidate features.
Methods S3: Logistic regression (LR) model development and validation.
Methods S4: Model robustness to missing data.
Methods S5: Calculation of Charlson and Elixhauser comorbidity indices.
Figure S1: Advance care planning note and code status display in the electronic health record.
Figure S2: Logistic regression coefficients for GEST.
Figure S3: Charlson and Elixhauser comorbidity score distributions across the study cohort.
Figure S4: Precision-recall curves. Average precision (AP) is shown in the legend.
Figure S5: Calibration curves for GEST across Black, Hispanic or Latino, and female patient subsets.
Table S1: Demographics of 8 EDs used for cross-validation and remaining ED used as a held-out cohort.
Table S2: GEST model coefficients from 5-fold cross validation across 8 ED sites.
Table S3: Test characteristics using 10% mortality risk threshold.
Table S4: Test characteristics using 50% mortality risk threshold.
Table S5: Model performance across age deciles
Table S6: Model performance across Black, Hispanic, and Female cohorts.
Key Points:
There is growing interest in using emergency department visits for older adults as an opportunity to link patients with advance care planning and palliative services, but there is a dearth of pragmatic screening tools for this purpose.
We developed a novel prognostic model, the Geriatric End-of-life Screening Tool (GEST), to identify older adults in the emergency department at risk of six month mortality and show that it outperforms the Charlson and Elixhauser comorbidity indices.
When implemented into the electronic health record, GEST may enable robust, automated end-of-life screening in the emergency department.
Why does this matter?
A majority of older adults will visit the emergency department within six months of dying, but the identification of patients with life-limiting illness remains a key challenge in providing timely and resource-sensitive advance care planning (ACP) and palliative care services. Given the lack of tools for this purpose, we developed a novel screening tool, the Geriatric End-of-life Screening Tool (GEST) and compared it to the Charlson and Elixhauser comorbidity indices in predicting six month mortality. We show that GEST provides robust mortality risk stratification and that high-risk patients often do not have available documentation of code status or advance care planning. Taken together, we propose that GEST may provide a pragmatic route to connect vulnerable, high risk older adults in the ED with end-of-life services.
Acknowledgements:
The authors would like to acknowledge the contributions of Huan Li and Sabrina Su.
Disclosures:
ADH is funded by NIA T32AG023480 and NIH KL2TR002542. MS is supported by a NIH/NIA K24 AG071906 mid-career investigator award. UH is supported by NIH/NIA R61 AG069822, NIH/NIA R33 AG058926, VA HS002421, and the John A. Hartford Foundation and Gary & Mary West Health Institute.
Footnotes
The authors report no conflicts of interest.
Sponsors had no role in study design, methods, recruitment, data collection, analysis, or manuscript preparation.
References
- 1. Smith AK, McCarthy E, Weber E, Cenzer IS, Boscardin J, Fisher J & Covinsky K. Half of older Americans seen in emergency department in last month of life; most admitted to hospital, and many die there. Health Aff. 31, 1277–1285 (2012).
- 2. Ouchi K, George N, Schuur JD, Aaronson EL, Lindvall C, Bernstein E, Sudore RL, Schonberg MA, Block SD & Tulsky JA. Goals-of-Care Conversations for Older Adults With Serious Illness in the Emergency Department: Challenges and Opportunities. Ann. Emerg. Med 74, 276–284 (2019).
- 3. Gips A, Daubman B-R, Petrillo LA, Bowman J, Ouchi K, Traeger L, Jackson V, Grudzen C, Ritchie CS & Aaronson EL. Palliative care in the emergency department: A qualitative study exploring barriers, facilitators, desired clinician qualities, and future directions. Palliat. Support. Care 1–6 (2021).
- 4. George N, Phillips E, Zaurova M, Song C, Lamba S & Grudzen C. Palliative Care Screening and Assessment in the Emergency Department: A Systematic Review. J. Pain Symptom Manage 51, 108–19.e2 (2016).
- 5. Yourman LC, Lee SJ, Schonberg MA, Widera EW & Smith AK. Prognostic indices for older adults: a systematic review. JAMA 307, 182–192 (2012).
- 6. Downar J, Goldman R, Pinto R, Englesakis M & Adhikari NKJ. The “surprise question” for predicting death in seriously ill patients: a systematic review and meta-analysis. CMAJ 189, E484–E493 (2017).
- 7. Ouchi K, Strout T, Haydar S, Baker O, Wang W, Bernacki R, Sudore R, Schuur JD, Schonberg MA, Block SD & Tulsky JA. Association of Emergency Clinicians’ Assessment of Mortality Risk With Actual 1-Month Mortality Among Older Adults Admitted to the Hospital. JAMA Netw Open 2, e1911139 (2019).
- 8. Ouchi K, Jambaulikar G, George NR, Xu W, Obermeyer Z, Aaronson EL, Schuur JD, Schonberg MA, Tulsky JA & Block SD. The “Surprise Question” Asked of Emergency Physicians May Predict 12-Month Mortality among Older Emergency Department Patients. J. Palliat. Med 21, 236–240 (2018).
- 9. Ernecoff NC, Abdel-Kader K, Cai M, Yabes J, Shah N, Schell JO & Jhamb M. Implementation of Surprise Question Assessments using the Electronic Health Record in Older Adults with Advanced CKD. Kidney360 2, 966–973 (2021).
- 10. Kirkland SW, Yang EH, Garrido Clua M, Kruhlak M, Campbell S, Villa-Roel C & Rowe BH. Screening tools to identify patients with unmet palliative care needs in the emergency department: A systematic review. Acad. Emerg. Med 29, 1229–1246 (2022).
- 11. Courtright KR, Chivers C, Becker M, Regli SH, Pepper LC, Draugelis ME & O’Connor NR. Electronic Health Record Mortality Prediction Model for Targeted Palliative Care Among Hospitalized Medical Patients: a Pilot Quasi-experimental Study. J. Gen. Intern. Med 34, 1841–1847 (2019).
- 12. Major VJ & Aphinyanaphongs Y. Development, implementation, and prospective validation of a model to predict 60-day end-of-life in hospitalized adults upon admission at three sites. BMC Med. Inform. Decis. Mak 20, 214 (2020).
- 13. Wang E, Major VJ, Adler N, Hauck K, Austrian J, Aphinyanaphongs Y & Horwitz LI. Supporting Acute Advance Care Planning with Precise, Timely Mortality Risk Predictions. NEJM Catalyst Innovations in Care Delivery 2, (2021).
- 14. Wegier P, Koo E, Ansari S, Kobewka D, O’Connor E, Wu P, Steinberg L, Bell C, Walton T, van Walraven C, Embuldeniya G, Costello J & Downar J. mHOMR: a feasibility study of an automated system for identifying inpatients having an elevated risk of 1-year mortality. BMJ Qual. Saf 28, 971–979 (2019).
- 15. van Walraven C, Austin PC, Jennings A, Quan H & Forster AJ. A modification of the Elixhauser comorbidity measures into a point system for hospital death using administrative data. Med. Care 47, 626–633 (2009).
- 16. Quan H, Li B, Couris CM, Fushimi K, Graham P, Hider P, Januel J-M & Sundararajan V. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am. J. Epidemiol 173, 676–682 (2011).
- 17. Palmer MK, Jacobson M & Enguidanos S. Advance Care Planning For Medicare Beneficiaries Increased Substantially, But Prevalence Remained Low. Health Aff. 40, 613–621 (2021).
- 18. Portanova J, Ailshire J, Perez C, Rahman A & Enguidanos S. Ethnic Differences in Advance Directive Completion and Care Preferences: What Has Changed in a Decade? J. Am. Geriatr. Soc 65, 1352–1357 (2017).
- 19. Newhouse JP. An iconoclastic view of health cost containment. Health Aff. 12 Suppl, 152–171 (1993).
- 20. Einav L, Finkelstein A, Mullainathan S & Obermeyer Z. Predictive modeling of U.S. health care spending in late life. Science 360, 1462–1465 (2018).
- 21. Collins GS, Reitsma JB, Altman DG & Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med. 13, 1 (2015).
- 22. Elfiky AA, Pany MJ, Parikh RB & Obermeyer Z. Development and Application of a Machine Learning Approach to Assess Short-term Mortality Risk Among Patients With Cancer Starting Chemotherapy. JAMA Netw Open 1, e180926 (2018).
- 23. Parikh RB, Manz C, Chivers C, Regli SH, Braun J, Draugelis ME, Schuchter LM, Shulman LN, Navathe AS, Patel MS & O’Connor NR. Machine Learning Approaches to Predict 6-Month Mortality Among Patients With Cancer. JAMA Netw Open 2, e1915997 (2019).
- 24. Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY & Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol 110, 12–22 (2019).
- 25. Hong WS, Haimovich AD & Taylor RA. Predicting hospital admission at emergency department triage using machine learning. PLoS One 13, e0201016 (2018).
- 26. Berkelmans GFN, Read SH, Gudbjörnsdottir S, Wild SH, Franzen S, van der Graaf Y, Eliasson B, Visseren FLJ, Paynter NP & Dorresteijn JAN. Population median imputation was noninferior to complex approaches for imputing missing values in cardiovascular prediction models in clinical practice. J. Clin. Epidemiol 145, 70–80 (2022).
- 27. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, Katz R, Himmelfarb J, Bansal N & Lee S-I. From Local Explanations to Global Understanding with Explainable AI for Trees. Nat Mach Intell 2, 56–67 (2020).
- 28. Haimovich AD, Ravindra NG, Stoytchev S, Young HP, Wilson FP, van Dijk D, Schulz WL & Taylor RA. Development and Validation of the Quick COVID-19 Severity Index: A Prognostic Tool for Early Clinical Decompensation. Ann. Emerg. Med 76, 442–453 (2020).
- 29. Saito T & Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 10, e0118432 (2015).
- 30. Justice AC, Covinsky KE & Berlin JA. Assessing the generalizability of prognostic information. Ann. Intern. Med 130, 515–524 (1999).
- 31. Vickers AJ, Van Calster B & Steyerberg EW. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ 352, i6 (2016).
- 32. Vickers AJ, van Calster B & Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res 3, 18 (2019).
- 33. Manz CR, Parikh RB, Small DS, Evans CN, Chivers C, Regli SH, Hanson CW, Bekelman JE, Rareshide CAL, O’Connor N & others. Effect of integrating machine learning mortality estimates with behavioral nudges to clinicians on serious illness conversations among patients with cancer: a stepped-wedge cluster randomized clinical trial. JAMA Oncology 6, e204759 (2020).
- 34. Lindvall C, Deng C-Y, Moseley E, Agaronnik N, El-Jawahri A, Paasche-Orlow MK, Lakin JR, Volandes A & Tulsky JA. Natural Language Processing to Identify Advance Care Planning Documentation in a Multisite Pragmatic Clinical Trial. J. Pain Symptom Manage (2021). doi: 10.1016/j.jpainsymman.2021.06.025
- 35. Lee RY, Brumback LC, Lober WB, Sibley J, Nielsen EL, Treece PD, Kross EK, Loggers ET, Fausto JA, Lindvall C, Engelberg RA & Curtis JR. Identifying Goals of Care Conversations in the Electronic Health Record Using Natural Language Processing and Machine Learning. J. Pain Symptom Manage 61, 136–142.e2 (2021).
- 36. Hill JD, Schmucker AM, Siman N, Goldfeld KS, Cuthel AM, Chodosh J, Bouillon-Minois J-B & Grudzen CR. Emergency and post-emergency care of older adults with Alzheimer’s disease/Alzheimer’s disease related dementias. J. Am. Geriatr. Soc (2022). doi: 10.1111/jgs.17833
- 37. Kim KM, Nerlekar R, Tranah GJ, Browner WS & Cummings SR. Higher red cell distribution width and poorer hospitalization-related outcomes in elderly patients. J. Am. Geriatr. Soc 70, 2354–2362 (2022).
- 38. Lee J, Abrukin L, Flores S, Gavin N, Romney M-L, Blinderman CD & Nakagawa S. Early intervention of palliative care in the emergency department during the COVID-19 pandemic. JAMA Intern. Med 180, 1252–1254 (2020).
- 39. Kelen GD, Wolfe R, D’Onofrio G, Mills AM, Diercks D, Stern SA, Wadman MC & Sokolove PE. Emergency department crowding: The canary in the health care system. NEJM Catalyst 2, (2021).
- 40. Sean Morrison R, Meier DE & Arnold RM. What’s Wrong With Advance Care Planning? JAMA (2021). doi: 10.1001/jama.2021.16430
- 41. Loffredo AJ, Chan GK, Wang DH, Goett R, Isaacs ED, Pearl R, Rosenberg M, Aberger K & Lamba S. United States Best Practice Guidelines for Primary Palliative Care in the Emergency Department. Ann. Emerg. Med (2021). doi: 10.1016/j.annemergmed.2021.05.021
- 42. Ouchi K, Lawton AJ, Bowman J, Bernacki R & George N. Managing code status conversations for seriously ill older adults in respiratory failure. Ann. Emerg. Med 76, 751–756 (2020).
- 43. Leiter RE, Yusufov M, Hasdianda MA, Fellion LA, Reust AC, Block SD, Tulsky JA & Ouchi K. Fidelity and Feasibility of a Brief Emergency Department Intervention to Empower Adults With Serious Illness to Initiate Advance Care Planning Conversations. J. Pain Symptom Manage 56, 878–885 (2018). doi: 10.1016/j.jpainsymman.2018.09.003
- 44. Schmucker AM, Flannery M, Cho J, Goldfeld KS, Grudzen C & EMPallA Investigators. Data from emergency medicine palliative care access (EMPallA): a randomized controlled trial comparing the effectiveness of specialty outpatient versus telephonic palliative care of older adults with advanced illness presenting to the emergency department. BMC Emerg. Med 21, 83 (2021).
- 45. Flanagin A, Frey T, Christiansen SL & AMA Manual of Style Committee. Updated guidance on the reporting of race and ethnicity in medical and science journals. JAMA 326, 621–627 (2021).
- 46. Rojas JC, Fahrenbach J, Makhni S, Cook SC, Williams JS, Umscheid CA & Chin MH. Framework for Integrating Equity Into Machine Learning Models: A Case Study. Chest 161, 1621–1627 (2022).