Abstract
Objective –
Use health records data to predict suicide death following emergency department visits.
Methods –
Electronic health records and insurance claims from seven health systems were used to: identify emergency department visits with mental health or self-harm diagnoses by members aged 11 or older; extract approximately 2500 potential predictors including demographic, historical, and baseline clinical characteristics; and ascertain subsequent deaths by self-harm. Logistic regression with lasso and random forest models predicted self-harm death over 90 days after each visit.
Results –
Records identified 2,069,170 eligible visits, 899 followed by suicide death within 90 days. The best-fitting logistic regression with lasso model yielded an area under the receiver operating curve of 0.823 (95% CI 0.810 - 0.836). Visits above the 95th percentile of predicted risk included 34.8% (95% CI 31.1 – 38.7) of subsequent suicide deaths and had a 0.303% (95% CI 0.261 - 0.346) suicide death rate over the following 90 days. Model performance was similar across subgroups defined by age, sex, race, and ethnicity.
Conclusions –
Machine learning models using coded data from health records have moderate performance in predicting suicide death following emergency department visits for mental health or self-harm diagnosis and could be used to identify patients needing more systematic follow-up.
Keywords: Suicide, emergency department, prediction, epidemiology, machine learning, self-harm
People seen in emergency departments for mental health problems or self-harm are at high risk for subsequent death by suicide. Among visitors to California emergency departments, those presenting with self-harm experienced one-year suicide mortality 60 times that of the general population, and those presenting with suicidal ideation experienced suicide mortality 30 times as high1. A similar pattern, but with smaller increases in risk, was seen among Medicaid beneficiaries visiting emergency rooms for suicidal ideation or self-harm2. Among US health plan members who died by suicide, nearly half had at least one emergency department visit in the prior year with approximately one quarter visiting the emergency department for a mental health or substance use condition3.
Accurate identification of emergency department visitors at highest risk for suicidal behavior remains a challenge. While the Joint Commission recommends systematic screening of people receiving mental health care in emergency departments4, standard self-report questionnaires have limited sensitivity and only moderate predictive accuracy when evaluated in emergency department settings5–9.
A large literature describes the accuracy of machine learning models based on health records data to identify people at risk for suicidal behavior after outpatient visits or hospitalizations10. Some of these models achieve overall classification performance (as measured by area under the receiver operating curve or c-statistic) exceeding 85%10–12. Rather than identifying previously unknown predictors or categories of risk, these statistical models outperform screening questionnaires or simple risk checklists by simultaneously considering large numbers of risk factors recorded in health records13.
Less previous research has evaluated classification performance of prediction models using health records to identify emergency department visitors at increased risk for suicide. Among patients seen for self-harm in the Canadian province of Alberta14, machine learning models using linked health care and social service databases predicted suicide mortality over the following 90 days with overall classification performance or AUC of approximately 0.88. In a sample of patients seen by a Boston psychiatric emergency service8, a machine learning model using health records data and patients’ self-reports of suicidal ideation and other risk factors predicted subsequent suicide attempt with an AUC of 0.79. In a pediatric emergency department population9, a prediction model using records data predicted subsequent suicide-related visit with an AUC of 0.78. In a sample of California emergency department visits, machine learning models limited to data from that single emergency department visit predicted subsequent suicide death with AUCs ranging from 0.69 to 0.77 across different racial and ethnic groups15. No previous reports including US samples describe prediction of suicide death following emergency department care using coded or discrete data extracted from longitudinal health records data.
We describe here the development and validation of a model predicting death by suicide following an emergency department visit involving a mental health or self-harm diagnosis using records data from seven large integrated health systems.
METHODS
Data were collected from records of seven health systems (HealthPartners, Henry Ford Health, and the Colorado, Hawaii, Northwest, Southern California, and Washington regions of Kaiser Permanente) serving a combined population of approximately 6 million members or patients in the states of California, Colorado, Hawaii, Michigan, Minnesota, Oregon, and Washington. Members are insured or enrolled via employer sponsored insurance, individual insurance, Medicare, Medicaid, and subsidized insurance exchange plans. Patients served by these systems are generally representative of service area populations in terms of age, race, and ethnicity16. Responsible institutional review boards for each health system approved the use of de-identified records data for this research.
Research centers in each participating health system maintain harmonized research data warehouses integrating data from electronic health records (for services provided by the health system) and insurance claims (for external services covered by the health insurance plan)17. Health system data are linked to state and/or national vital statistics data to ascertain date and cause of death.
The study sample included all encounters by health system members to emergency departments or urgent care centers for which any diagnosis of a mental health condition, a substance use disorder, or a self-harm injury or poisoning was recorded. A list of included diagnoses can be found at https://mhresearchnetwork.org/resources/mhrn-data-resources/useful-tools/ . Eligible visits were limited to those for which the patient was enrolled in the participating health system on the visit date, but no duration of prior enrollment was required. An individual with more than one eligible encounter during the study period would contribute multiple encounters to the sample. Eligible encounters were excluded from analyses in cases of death by cause other than self-harm within 90 days of the index encounter.
For each eligible encounter, potential predictors of subsequent suicide death recorded during the prior 60 months were extracted from research data warehouses. Predictors included age, sex recorded in the electronic health record (most often indicating sex assigned at birth), self-reported race and ethnicity recorded in the electronic health record (typically reported at an outpatient visit in response to standard categories), source of insurance coverage, mental health and substance use disorder diagnoses (in 14 categories), chronic pain diagnoses, traumatic brain injury diagnoses, dispensed prescriptions for mental health medications (in 8 categories), prior injury or poisoning diagnoses (in 4 categories), prior emergency department or inpatient encounters with mental health diagnoses, prior outpatient mental health specialty visits, responses to PHQ-9 depression questionnaires (both total scores and response to the 9th item regarding thoughts of death or self-harm), 17 indicators of chronic medical illness following the Charlson Comorbidity score, and neighborhood-level indicators of household income and educational attainment. To improve generalizability across health systems with potentially different prescribing patterns, diagnosis coding systems, and coding practices, representation of diagnoses and prescriptions collapsed large numbers of potential predictors into relatively broad clinical categories (e.g., all depression diagnoses combined into a single category, all benzodiazepines combined into a single category). For each category of mental health or substance use diagnosis or service use, count data for each of the prior 60 months (e.g., number of days on which diagnosis was recorded in each of the 60 months, number of days’ supply of medication dispensed during each of the 60 months) were transformed into 52 possible time patterns reflecting various permutations of first onset, most recent occurrence, and increase or decrease over time.
Prescription dispensing data were transformed into 24 possible time patterns. Time categories for diagnosis, utilization, and medication predictors all included potential adjustments for duration of prior enrollment in the participating health system. A data dictionary describing potential predictors and time patterns is included in Appendix A.
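As an illustration of how 60 months of count data can be collapsed into time-pattern predictors, the following is a minimal sketch. The specific summaries, their names, and the ordering of the input (most recent month first) are assumptions made for illustration only; the actual 52 and 24 time patterns are those defined in Appendix A.

```python
def summarize_monthly_counts(counts):
    """Collapse 60 monthly counts into a few illustrative time-pattern features.

    counts[0] is assumed to be the month just before the index visit and
    counts[59] the month 60 months earlier (hypothetical ordering).
    """
    if len(counts) != 60:
        raise ValueError("expected one count for each of the prior 60 months")

    # Indices of months in which the diagnosis, visit, or dispensing occurred.
    active = [i for i, c in enumerate(counts) if c > 0]

    def months_active(window):
        # Number of months with any occurrence within the most recent `window` months.
        return sum(1 for c in counts[:window] if c > 0)

    return {
        "any_in_last_60": bool(active),
        # Most recent occurrence (1 = the month just before the visit).
        "months_since_most_recent": active[0] + 1 if active else None,
        # First recorded occurrence within the 60-month look-back.
        "months_since_first": active[-1] + 1 if active else None,
        "months_active_last_3": months_active(3),
        "months_active_last_12": months_active(12),
        "months_active_last_60": months_active(60),
        # Crude increase/decrease indicator: last 12 months minus the 12 before.
        "change_last_12_vs_prior_12": sum(counts[:12]) - sum(counts[12:24]),
    }
```

For example, a member with a diagnosis recorded on two days in the second month before the visit and on one day 31 months before the visit would have `months_since_most_recent` of 2 and `months_since_first` of 31.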
Suicide deaths and deaths from other causes were ascertained by linking health system records to state mortality records. These linkages ascertain date and cause of death for all members or patients regardless of duration of health plan enrollment. Deaths with any ICD-10 cause of death code in the range X60 through X84 or Y87.0 (indicating intentional self-harm) or the range Y10 through Y34 or Y87.2 (indicating undetermined intent) were considered suicide deaths in these analyses. Inclusion of those deaths coded as having undetermined intent increased the count of suicide deaths by approximately 5 percent.
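The cause-of-death rule described above can be sketched as a small classifier. This is an illustrative sketch rather than the study's actual code; the function names are hypothetical, and it assumes WHO-style ICD-10 codes (a letter followed by digits, with or without a decimal point).

```python
import re

def classify_icd10_cause(code):
    """Return 'intentional', 'undetermined', or 'other' for one ICD-10 cause-of-death code."""
    normalized = code.strip().upper().replace(".", "")
    match = re.fullmatch(r"([A-Z])(\d{2})(\d*)", normalized)
    if match is None:
        return "other"
    letter, category = match.group(1), int(match.group(2))
    # X60 through X84: intentional self-harm.
    if letter == "X" and 60 <= category <= 84:
        return "intentional"
    # Y10 through Y34: event of undetermined intent.
    if letter == "Y" and 10 <= category <= 34:
        return "undetermined"
    # Sequelae codes named in the text.
    if normalized == "Y870":
        return "intentional"
    if normalized == "Y872":
        return "undetermined"
    return "other"

def is_suicide_death(cause_codes):
    """True if any recorded cause code falls in either range counted as suicide here."""
    return any(classify_icd10_cause(c) != "other" for c in cause_codes)
```

Keeping the two categories distinct makes it straightforward to rerun analyses with and without deaths of undetermined intent.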
Logistic regression with lasso models18 were developed in RStudio19 using the CRAN package keras20. Five-fold cross-validation was used to find the optimal degree of penalization by maximizing the area under the receiver operating curve (AUC) on the left-out folds. Confidence intervals (CIs) for AUC and other performance measures were constructed using 10,000 bootstrap replicates21. Logistic models considered a large number of potential interaction terms between the predictors listed in Appendix A, yielding 12,066 candidate predictors for logistic models. Random forest models were estimated using the CRAN package ranger22. R19 was used to fit models, using a variation of the predictors in Appendix A that allowed the model to assign missing values to either side of a split in a node. Again, five-fold cross-validation and the AUC were used to establish optimal values for the minimal size for splitting a node, as well as the number of predictors selected to consider splitting on at each node. All potential predictors were standardized to have a range from 0 to 1 prior to model fitting.
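The AUC and its bootstrap confidence interval can be illustrated with a short sketch. This is not the study's R code: it is a plain-Python illustration that assumes a percentile bootstrap resampling individual visits, whereas the text does not specify the interval type or the resampling unit. Function names and defaults are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random event outranks a random non-event (ties count half)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one event and one non-event")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank shared by tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc_ci(labels, scores, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling visits with replacement (an assumption)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("need both events and non-events")
    rng = random.Random(seed)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled = [labels[i] for i in idx]
        if 0 < sum(resampled) < n:  # skip degenerate resamples with a single class
            replicates.append(auc(resampled, [scores[i] for i in idx]))
    replicates.sort()
    lower = replicates[max(0, int(alpha / 2 * n_boot))]
    upper = replicates[min(n_boot - 1, int((1 - alpha / 2) * n_boot) - 1)]
    return lower, upper
```

Because some patients contribute more than one visit, a resampling scheme that draws patients rather than visits could also be considered; the sketch above does not attempt this.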
Model performance characteristics were evaluated using the out-of-sample cross-validation predictions generated during the cross-validation process. Previous research indicates that this approach provides adequate protection against optimism or over-fitting when predicting infrequent events23 and specifically for predicting suicide death using health records data24. We report the AUC of the models with the optimal parameter values for the out-of-sample predictions as well as positive predictive value and sensitivity using a variety of cut-points and calibration tables. Performance was evaluated overall as well as by age, type of mental health diagnosis at the index emergency department visit (with or without self-harm event), sex (male, female), and race (Asian, Black/African American, Native Hawaiian/Pacific Islander, American Indian/Alaskan Native, White, more than one race selected, Other, unknown) and ethnicity (Hispanic, non-Hispanic).
RESULTS
The criteria described above identified 2,070,638 eligible visits, of which 1,468 were excluded due to death from cause other than self-harm within 90 days. Of the remaining 2,069,170 eligible visits by 989,306 unique patients, 899 were followed by a self-harm death within 90 days. The 899 visits followed by self-harm death represented 807 unique self-harm deaths as some self-harm deaths were linked with more than one prior visit. Characteristics of the final visit sample, including those followed and not followed by self-harm death, are shown in Table 1.
Table 1 –
Characteristics of emergency department visits with mental health or self-harm diagnoses
| | All eligible visits (n=2,069,170) | % | Fatal self-harm within 90 days (n=899) | % |
|---|---|---|---|---|
| Diagnoses at index encounter | ||||
| Includes injury or poisoning | 380,677 | 18.40% | 210 | 23.36% |
| No injury or poisoning | 1,688,493 | 81.60% | 689 | 76.64% |
| Female | 1,241,034 | 59.98% | 428 | 47.61% |
| Race | ||||
| Asian | 83,138 | 4.02% | 39 | 4.34% |
| Black | 208,644 | 10.08% | 33 | 3.67% |
| Native Hawaiian/Pacific Islander | 9,316 | 0.45% | 1 | 0.11% |
| American Indian/Alaskan Native | 11,801 | 0.57% | 5 | 0.56% |
| White | 1,427,362 | 68.98% | 683 | 75.97% |
| More than one | 76,146 | 3.68% | 21 | 2.34% |
| Other | 10,702 | 0.52% | 7 | 0.78% |
| Unknown | 242,061 | 11.70% | 110 | 12.24% |
| Ethnicity | ||||
| Hispanic | 568,799 | 27.49% | 88 | 9.79% |
| Not Hispanic | 1,255,115 | 60.66% | 590 | 65.63% |
| Unknown | 245,256 | 11.85% | 221 | 24.58% |
| Diagnoses in prior 12 months | ||||
| Depressive disorder | 931,002 | 44.99% | 559 | 62.18% |
| Anxiety Disorder | 895,934 | 43.30% | 540 | 60.07% |
| Bipolar Disorder | 161,495 | 7.80% | 168 | 18.69% |
| Schizophrenia Spectrum Disorder | 65,865 | 3.18% | 60 | 6.67% |
| Personality Disorder | 148,850 | 7.19% | 163 | 18.13% |
| Alcohol Use Disorder | 223,389 | 10.80% | 261 | 29.03% |
| Other Substance Use Disorder | 270,069 | 13.05% | 261 | 29.03% |
| Maximum Item 9 score in past 12 mos | ||||
| None | 1,591,003 | 76.89% | 586 | 65.18% |
| Not at all | 315,940 | 15.27% | 165 | 18.35% |
| Several days | 82,597 | 3.99% | 68 | 7.56% |
| More than half the days | 37,760 | 1.82% | 21 | 2.34% |
| Nearly every day | 41,870 | 2.02% | 59 | 6.56% |
| Self-harm in prior 12 months | 53,727 | 2.60% | 138 | 15.35% |
| MH ED visit in prior 12 months | 850,802 | 41.12% | 515 | 57.29% |
| MH Inpatient stay in prior 12 months | 488,900 | 23.63% | 461 | 51.28% |
| MH Specialty visit in prior 12 months | 683,664 | 33.04% | 585 | 65.07% |
The best-fitting logistic regression with lasso model included 496 predictors and had an overall classification performance or AUC in out-of-sample folds of 0.823 (95% CI 0.810 to 0.836). AUC was similar across the 5 out-of-sample folds (Appendix Figure 1). The best-fitting random forest model (including 200 trees with a minimum node size of 100,000) had poorer classification performance, with an out-of-sample AUC of 0.772 (95% CI 0.755 to 0.788). More detailed performance metrics for the best-fitting lasso model are described below. Parallel metrics for the best-fitting random forest model are presented in Appendix Tables 2 and 3.
Table 2 displays classification performance (sensitivity and positive predictive value or PPV) of the best-fitting logistic regression with lasso model at cut-points of the risk score distribution selected a priori. A cut-point at the 95th percentile of the risk distribution (i.e., top 5% of visits) had a sensitivity of 34.8% (95% CI 31.1% to 38.7%) and yielded a sample of visits with a risk of suicide death over the following 90 days of 0.303% (95% CI 0.261% to 0.345%). Table 3 displays calibration performance (predicted and observed risk) for strata of the risk distribution selected a priori. Visits in the top 0.5% of the risk distribution had an observed risk of approximately 1 in 200 or approximately 80 times as high as visits in the bottom 50% (0.567% vs 0.007%). Calibration performance for ten equally-sized decile strata is shown in Appendix Table 4 and Figure 2.
Table 2 –
Classification performance of the best-fitting lasso logistic model at cut-points defined a priori
| Percentile Cut-Point | Sensitivity | PPV |
|---|---|---|
| 50% | 92.3% | 0.080% |
| 75% | 73.7% | 0.128% |
| 90% | 47.2% | 0.205% |
| 95% | 34.8% | 0.303% |
| 99% | 11.6% | 0.503% |
| 99.5% | 6.7% | 0.580% |
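The quantities in Table 2 can be computed from visit-level predictions roughly as follows. This is a minimal sketch, not the study's code; the function name is hypothetical, and it assumes that visits are flagged by rank, with ties at the cut-point broken arbitrarily.

```python
def sensitivity_ppv_at_percentile(labels, scores, percentile):
    """Sensitivity and PPV when visits above a percentile of predicted risk are flagged.

    labels: 1 if the visit was followed by the event within the window, else 0.
    scores: predicted risk for each visit (out-of-sample predictions).
    percentile: cut-point on the score distribution, e.g. 95 flags the top 5%.
    """
    if not 0 <= percentile < 100:
        raise ValueError("percentile must be in [0, 100)")
    n_flagged = max(1, round(len(scores) * (100 - percentile) / 100))
    # Rank visits from highest to lowest predicted risk and flag the top n_flagged.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:n_flagged])
    total_events = sum(labels)
    sensitivity = true_positives / total_events if total_events else float("nan")
    ppv = true_positives / n_flagged
    return sensitivity, ppv
```

As a rough consistency check on the reported figures, a sensitivity of 34.8% among 899 events is about 313 events, and the top 5% of 2,069,170 visits is about 103,459 visits, giving a PPV of about 0.30%, in line with the 0.303% shown for the 95th percentile.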
Table 3 –
Calibration of the best-fitting lasso logistic model across risk strata selected a priori
| Percentile Stratum | % of Events in this stratum | Predicted Risk | Observed Risk |
|---|---|---|---|
| 0% to 50% | 7.7% | 0.009% | 0.007% |
| 50% to 75% | 19.2% | 0.031% | 0.033% |
| 75% to 90% | 25.8% | 0.068% | 0.075% |
| 90% to 95% | 13.0% | 0.123% | 0.113% |
| 95% to 99% | 22.9% | 0.229% | 0.250% |
| 99% to 99.5% | 4.7% | 0.450% | 0.406% |
| 99.5% to 100% | 6.6% | 0.911% | 0.567% |
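A calibration table of this form can be built from visit-level predictions as sketched below. This is an illustrative sketch under stated assumptions: the function name is hypothetical, strata are formed by rank position, and predicted risk within a stratum is taken to be the mean predicted probability, which the text does not state explicitly.

```python
def calibration_by_strata(labels, predicted,
                          boundaries=(0, 50, 75, 90, 95, 99, 99.5, 100)):
    """Compare mean predicted risk with observed risk within percentile strata."""
    pairs = sorted(zip(predicted, labels), key=lambda pair: pair[0])
    n = len(pairs)
    total_events = sum(labels)
    rows = []
    for low, high in zip(boundaries, boundaries[1:]):
        # Visits whose rank falls between the two percentile boundaries.
        chunk = pairs[int(n * low / 100):int(n * high / 100)]
        if not chunk:
            continue
        events = sum(label for _, label in chunk)
        rows.append({
            "stratum": f"{low}% to {high}%",
            "share_of_events": events / total_events if total_events else 0.0,
            "predicted_risk": sum(p for p, _ in chunk) / len(chunk),
            "observed_risk": events / len(chunk),
        })
    return rows
```

A well-calibrated model shows `predicted_risk` close to `observed_risk` in every stratum; in Table 3 the largest gap is in the top 0.5% of visits, where predicted risk (0.911%) exceeds observed risk (0.567%).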
The 20 most influential positive and negative predictors in the best-fitting logistic regression with lasso model, determined by the magnitude of the corresponding coefficient, are shown in Table 4. The most influential predictors included many expected predictors of suicidal behavior (prior self-harm or injury and poisoning diagnoses, prior mental health inpatient care, mental health diagnoses), often selected in interaction terms.
Table 4 –
Most influential predictors selected by best-fitting lasso logistic model. Asterisks indicate interaction terms.
| MOST INFLUENTIAL POSITIVE (HIGHER RISK) PREDICTORS | Coefficient | MOST INFLUENTIAL NEGATIVE (LOWER RISK) PREDICTORS | Coefficient |
|---|---|---|---|
| Age | 1.069965 | Dementia Dx at visit | −0.84173 |
| Depression Dx at visit | 0.578624 | Days with ADD Rx in last 2 years * race=black | −0.6589 |
| Insurance not through ACA exchange | 0.447669 | Months with ADD Dx in past 2 years * Hispanic=yes | −0.56789 |
| Any injury/poisoning in last month * male sex | 0.369343 | Months since last month with max number of self-harm Dx | −0.54792 |
| # of PHQ9 Item 9 scores = 0 in last 12 mos * male sex | 0.285185 | Months since 2nd-generation antipsychotic Rx * Hispanic=yes | −0.4357 |
| # mos in last 3 mos with mental health inpatient care | 0.251476 | Number of months in last year with antidepressant Rx | −0.40999 |
| # of last 3 mos with 2nd-gen antipsychotic Rx * Item 9 score = 0 | 0.225704 | No 2nd-generation antipsychotic Rx in past 5 years | −0.4044 |
| Days supply anticonvulsant dispensed in last 3 mos * male sex | 0.225379 | No benzodiazepine Rx in past 5 years | −0.40408 |
| # of last 3 mos with 2nd-gen antipsychotic Rx * Item 9 score missing at visit | 0.211783 | No self-harm Dx in last 5 years | −0.39867 |
| # mos since initial 2nd-gen antipsychotic Rx * male sex | 0.21109 | PHQ8 score never greater than 20 in past 5 years | −0.36773 |
| Most recent month with 2nd-gen antipsychotic Rx * Hispanic=unknown | 0.21049 | Pain Dx at visit | −0.36515 |
| % of outpatient visits in last month with depression Dx | 0.199566 | # Months with autism Dx last 5 yrs * Age 11-17 | −0.36312 |
| Proportion of outpatient visits in last 3 mos with depression Dx | 0.181717 | Days in last 5 yrs with non-laceration self-harm violent injury * age 11-17 | −0.3616 |
| No Dx of bipolar disorder in last 5 yrs * male sex | 0.163581 | No lithium Rx in last 5 yrs | −0.33591 |
| # of Item 9 scores = 3 in last 3 mos * Item 9 missing at visit | 0.154509 | No anticonvulsant Rx in last 5 yrs | −0.33564 |
| # mos in last 24 mos with antidepressant Rx * male sex | 0.149195 | No non-laceration self-harm violent injury in last 5 yrs | −0.33327 |
| # mos in last 3 mos with anxiety Dx * male sex | 0.148294 | # months with 2nd-gen antipsychotic Rx in last 12 mos | −0.31127 |
| Alcohol use disorder Dx at index visit | 0.148202 | % of last 60 mos with high specialty mental health visit rate | −0.2685 |
| # mos in last 3 mos with 1st-gen antipsychotic Rx * male sex | 0.128839 | # mos in last 12 mos with alcohol use disorder Dx * male sex | −0.25132 |
| # of Item 9 scores = 3 in last month * male sex | 0.115935 | Mos since this person’s highest specialty mental health visit rate | −0.23629 |
Additional analyses examined consistency of results across patient subgroups defined by sex, age, race, ethnicity, and type of diagnosis at the index emergency department visit; results are shown in Table 5. Numbers of suicide deaths in some subgroups were too small for stable estimates of model performance. Otherwise, overall classification performance, as measured by AUC, as well as sensitivity and PPV at the 95th percentile were similar across all subgroups examined. Classification performance was also similar across participating health systems and across years in the study period (Appendix Table 5).
Table 5 –
Classification performance of the best-fitting lasso model across patient subgroups
| | # of visits | # (%) of visits followed by suicide death | AUC | 95% CI for AUC | Sensitivity at 95th %ile | PPV at 95th %ile |
|---|---|---|---|---|---|---|
| Total | 2,069,170 | 899 (0.043%) | 0.823 | (0.810, 0.836) | 34.8% | 0.303% |
| Diagnoses at index encounter | ||||||
| Includes injury or poisoning | 380,677 | 210 (0.055%) | 0.840 | (0.814, 0.865) | 34.0% | 0.289% |
| No injury or poisoning | 1,688,493 | 689 (0.041%) | 0.819 | (0.803, 0.836) | 35.0% | 0.306% |
| Recorded Sex | ||||||
| Female | 1,241,034 | 428 (0.035%) | 0.824 | (0.805, 0.843) | 35.0% | 0.311% |
| Male | 828,074 | 348 (0.042%) | 0.821 | (0.796, 0.845) | 34.2% | 0.287% |
| Age | ||||||
| 11 thru 21 | 240,462 | 51 (0.021%) | 0.826 | (0.786, 0.863) | 34.4% | 0.266% |
| 22 thru 35 | 375,615 | 157 (0.042%) | 0.839 | (0.811, 0.864) | 31.6% | 0.330% |
| 36 thru 50 | 412,426 | 211 (0.051%) | 0.805 | (0.775, 0.838) | 27.1% | 0.247% |
| 51 thru 65 | 431,794 | 293 (0.068%) | 0.823 | (0.786, 0.858) | 39.6% | 0.371% |
| 66 and older | 608,873 | 187 (0.031%) | 0.823 | (0.790, 0.854) | 39.5% | 0.286% |
| Race | ||||||
| Asian | 83,138 | 39 (0.047%) | 0.844 | (0.770, 0.907) | 40.0% | 0.289% |
| Black | 208,644 | 33 (0.016%) | 0.822 | (0.767, 0.873) | 35.9% | 0.316% |
| Native Hawaiian/Pacific Islander | 9,316 | 1 (0.011%) | n/a | n/a | n/a | n/a |
| American Indian/Alaskan Native | 11,801 | 5 (0.042%) | n/a | n/a | n/a | n/a |
| White | 1,427,362 | 683 (0.048%) | 0.819 | (0.801, 0.837) | 34.3% | 0.291% |
| More than one | 76,146 | 21 (0.028%) | 0.815 | (0.754, 0.865) | 26.8% | 0.289% |
| Other | 10,702 | 7 (0.065%) | n/a | n/a | n/a | n/a |
| Unknown | 242,061 | 110 (0.045%) | 0.829 | (0.789, 0.871) | 34.2% | 0.322% |
| Ethnicity | ||||||
| Hispanic | 568,799 | 88 (0.016%) | 0.839 | (0.811, 0.866) | 42.4% | 0.355% |
| Not Hispanic | 1,255,115 | 590 (0.047%) | 0.818 | (0.798, 0.837) | 32.6% | 0.279% |
| Unknown | 245,256 | 221 (0.090%) | 0.809 | (0.768, 0.852) | 27.4% | 0.277% |
| Site | ||||||
| 1 | 123,388 | 90 (0.073%) | 0.772 | (0.728, 0.816) | 29.2% | 0.421% |
| 2 | 178,007 | 142 (0.080%) | 0.766 | (0.716, 0.812) | 20.8% | 0.180% |
| 4 | 1,429,368 | 445 (0.031%) | 0.838 | (0.822, 0.853) | 39.3% | 0.323% |
| 5 | 30,043 | 18 (0.060%) | 0.774 | (0.656, 0.874) | 16.7% | 0.133% |
| 6 | 136,482 | 105 (0.077%) | 0.813 | (0.771, 0.853) | 30.4% | 0.322% |
| 7 | 161,472 | 96 (0.059%) | 0.815 | (0.767, 0.860) | 23.6% | 0.161% |
| 10 | 10,410 | 3 (0.029%) | n/a | n/a | n/a | n/a |
| Year | ||||||
| 2009-2010 | 327,152 | 187 (0.057%) | 0.824 | (0.794, 0.852) | 31.3% | 0.275% |
| 2011-2012 | 407,202 | 184 (0.045%) | 0.796 | (0.762, 0.828) | 33.1% | 0.280% |
| 2013-2014 | 468,128 | 193 (0.041%) | 0.813 | (0.786, 0.838) | 34.8% | 0.333% |
| 2015-2016 | 606,493 | 241 (0.040%) | 0.845 | (0.821, 0.866) | 39.6% | 0.333% |
| 2017 | 260,195 | 94 (0.036%) | 0.837 | (0.801, 0.870) | 30.8% | 0.246% |
n/a - number of events is too small for reliable estimate
DISCUSSION
We find that a prediction model using only discrete or coded data extracted from health system records was able to predict suicide death after an emergency department visit with overall classification performance or AUC of approximately 0.82. Overall classification performance was similar to that of previous efforts to predict suicide death following emergency department visits from records data8, 9, 14. Performance far exceeded that previously reported for screening using the Columbia Suicide Severity Rating Scale, where reported AUCs were 0.667 for prediction of suicide death5 and 0.589 for prediction of suicide attempt6 following emergency department visit.
This encounter-based prediction model is designed to inform interventions that might occur during or soon after an emergency department encounter, such as more intensive assessment, safety planning, or more intensive outpatient follow-up. This encounter-based approach contrasts with population-based approaches, such as the Veterans Affairs REACH-VET model25, intended to inform outreach independent of any encounter.
As shown in Table 1, visits followed by self-harm death more often exhibited expected risk factors for suicidal behavior, including current or prior mental health diagnoses, prior self-reports of suicidal ideation, prior self-harm events, prior use of specialty mental health outpatient care, and prior emergency department or inpatient care with mental health diagnoses. As expected26, none of these individual risk factors were strongly enough associated with subsequent suicide death to guide decisions by emergency department clinicians. Combination of these and other risk indicators into a high-dimensional prediction score yielded much more accurate classification of risk than any individual risk factor or simple combination of risk factors.
Overall classification performance was generally similar across patient subgroups defined by presenting problems (with or without self-harm event), sex, age, race, and ethnicity, with the caveat that numbers of suicide deaths in some racial groups were too small to support stable estimates of performance. As expected, PPV at any risk percentile cut-point was greater in subgroups with higher overall suicide mortality rate. Given that PPV varies with average risk, users of this or any prediction model might choose different cut-points or thresholds for groups with higher and lower overall risk of self-harm. Selection of the optimal threshold for action for any group depends on the relative priority of avoiding false negative errors (sensitivity) and avoiding false positive errors (PPV). The consequences of errors (such as “welfare checks” by armed law enforcement) certainly could vary across racial and ethnic groups.
Overall classification performance as measured by AUC was modestly lower than parallel models developed in these health systems to predict suicide death following mental health specialty visits or general medical visits with mental health diagnoses, for which AUC statistics for prediction of fatal self-harm were 0.861 and 0.833 respectively11. This lower performance of the same method among emergency department visitors may reflect several possible differences between prediction tasks. The number of fatal self-harm events included in this sample was smaller than the numbers of events in earlier samples of outpatient visits. Patients in this sample were less likely to have prior outpatient mental health visits where risk factors for suicide would have been identified and recorded. Suicide following an emergency department visit may simply be less predictable, even if typical risk factors were identified and recorded.
Fortunately, suicide death was infrequent even among those with high predicted risk. While risk among visits above the 99.5th percentile was approximately 80 times as high as for visits below the 50th percentile, the observed rate for visits above the 99.5th percentile was still only slightly higher than 1 in 200. That absolute risk or positive predictive value is certainly not high enough to determine clinical decisions regarding intensive or aggressive interventions such as immediate hospitalization or involuntary treatment. Statistical predictions could, however, identify patients in need of additional assessment, safety planning, or more active follow-up care27. For comparison, a mortality rate of 1 in 200 is similar to head injury mortality rates among adult28 and pediatric29 emergency department patients identified by widely accepted decision rules recommending additional imaging assessment after minor head injury. National quality metrics have long recommended timely follow-up after an emergency department visit for a mental health condition30, and failure to attend follow-up is associated with higher risk of subsequent suicide death31. Nevertheless, fewer than half of patients in US commercial health plans (and fewer than 40% insured by Medicaid or Medicare) complete a follow-up visit within 7 days30. Prediction models could identify patients for whom more vigorous efforts to arrange and assure follow-up care are needed. We focus on predicting risk over 90 days following an emergency department visit, because that is the period during which interventions such as safety planning and assurance of follow-up care might occur or act to reduce risk.
The relative rarity of suicide death guarantees that even a highly accurate prediction model would yield a high rate of false positives. Consequently, any implementation should focus on interventions with low potential for harm or infringement of autonomy.
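The effect of a low base rate on false positives follows directly from Bayes' rule and can be illustrated with a short calculation. The event rate below is taken from this sample (899 of 2,069,170 visits); the sensitivity and specificity values are hypothetical, chosen only to represent a classifier far more accurate than the one reported here.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """PPV from Bayes' rule: true positives divided by all positives."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)

# Event rate observed in this sample of visits.
prevalence = 899 / 2_069_170

# Hypothetical, highly accurate classifier (assumed values, not study results).
ppv = positive_predictive_value(prevalence, sensitivity=0.90, specificity=0.99)
print(f"PPV = {ppv:.1%}")
```

Under these assumed values, fewer than 1 in 20 flagged visits would be followed by suicide death, so the great majority of flagged visits would still be false positives.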
The most influential predictors of suicide death (Table 4) included many expected risk factors, such as prior depression diagnoses, prior injury and poisoning events, and prior responses to item 9 of the PHQ-9 regarding suicidal ideation. Those expected risk factors were sometimes selected through interactions with sex, age, or race and ethnicity. We should caution against over-emphasizing or over-interpreting any individual predictors, since models include large numbers of overlapping or correlated predictors. We should especially caution against any causal interpretation of individual risk factors. For example, the finding that absence of any prior lithium use was associated with lower risk likely reflects confounding or selection (people at low risk are less likely to be prescribed lithium) rather than a causal effect (use of lithium increases risk of suicide death).
Overall classification performance of this and similar prediction models8, 9, 14 appears superior to reported performance of self-report questionnaires for prediction of suicide attempt6, 7 or suicide death5 following emergency department care. In any clinical implementation, however, prediction models would supplement rather than replace or override self-report questionnaires or clinician assessments8, 13. Empirical comparison of screening questionnaire results and statistical models for predicting risk of self-harm finds that the combination of the two methods is superior to either alone7–9.
A model or algorithm including approximately 500 predictors extracted from health records could be challenging to implement in routine care. We have previously reported that, for prediction of suicidal behavior following mental health outpatient visits, simpler models considering only a few hundred potential predictors performed approximately as well as more complex models considering several thousand predictors32. Simpler models might perform as well in the emergency department setting, but that question must be addressed empirically.
This prediction model does not consider other sources or types of data that might contribute to prediction of suicide death. Text of clinical notes, during or before the index encounter, might include additional information regarding suicidal ideation, history of suicidal behavior, early life experiences, or recent negative events – all of which might improve prediction of suicidal behavior33, 34. Medical records would not capture information regarding risk factors not reported to treating clinicians or events occurring between the emergency department encounter and a subsequent self-harm event.
The performance or potential utility of this prediction model depends on the availability of comprehensive records data for people seen in emergency departments. Records data in these health systems capture both care provided in health system facilities (via EHR databases) and care provided at outside facilities (via insurance claims databases). We expect that prediction models would be less accurate in settings where complete records of care prior to an emergency department encounter are not available. Reeves and colleagues15 reported somewhat poorer overall performance of a prediction model using data from a single emergency department visit.
We cannot speculate how the performance of this specific prediction model would generalize to other health systems, other patient populations, or later time periods. While prediction performance was generally similar across various subgroups in this sample of visits, prediction performance in other settings could differ due to differences in patients served, differences in patterns of health service use, or differences in health records systems. Models developed in these health systems to predict suicide attempts after outpatient visits have performed similarly in other settings35, 36 and in subsequent time periods37, but similar testing for generalizability or portability would be necessary for models predicting suicide mortality after emergency department visits. Approximately 0.1% of emergency department encounters were excluded due to death from causes other than suicide within 90 days, and our findings may not apply to that group.
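The following is a minimal sketch, not the authors' code, of how such portability testing might be scored: a risk cut-point is fixed in a development sample and then applied unchanged to an external sample, where the area under the receiver operating curve, sensitivity, and event rate above the cut-point are recomputed. All data are simulated, and the function names (`auc`, `percentile_cutpoint`, `metrics_above`, `simulate_visits`) and simulation parameters are hypothetical choices made only for illustration.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for this tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def percentile_cutpoint(scores, percentile=95):
    """Score at the given (integer) percentile of a sample."""
    ranked = sorted(scores)
    index = min(len(ranked) - 1, len(ranked) * percentile // 100)
    return ranked[index]


def metrics_above(scores, labels, threshold):
    """Sensitivity and event rate among visits scoring at or above a fixed threshold."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    sensitivity = sum(flagged) / sum(labels)
    event_rate = sum(flagged) / len(flagged)
    return sensitivity, event_rate


def simulate_visits(n, event_prob, separation, rng):
    """Simulated visits (assumption): events score higher on average by `separation`."""
    scores, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < event_prob else 0
        scores.append(rng.gauss(separation * y, 1.0))
        labels.append(y)
    return scores, labels


rng = random.Random(0)
# Hypothetical development sample and a weaker-signal external sample.
dev_scores, dev_labels = simulate_visits(20000, 0.01, 1.3, rng)
ext_scores, ext_labels = simulate_visits(20000, 0.01, 1.0, rng)

# The cut-point is set once in the development sample and reused as-is.
cut = percentile_cutpoint(dev_scores, 95)
for name, s, y in [("development", dev_scores, dev_labels),
                   ("external", ext_scores, ext_labels)]:
    sens, rate = metrics_above(s, y, cut)
    print(f"{name}: AUC={auc(s, y):.3f} sensitivity={sens:.3f} event rate={rate:.4f}")
```

Holding the cut-point fixed, rather than re-deriving it in each new sample, mirrors how a deployed model would be used and lets a drop in sensitivity or event rate reflect a real change in performance.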
CONCLUSIONS
Among visits to emergency departments with mental health, substance use disorder, or self-harm diagnoses, discrete data elements extracted from health system records can accurately classify risk of suicide death over the following 90 days. While these predictions are not accurate enough to direct decisions regarding need for hospitalization or involuntary treatment, they can assist clinicians in identifying patients needing higher levels of follow-up care.
Acknowledgments
Supported by NIMH cooperative agreement U19 MH121738
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
CRediT Author Statement
Gregory E Simon – Conceptualization, Funding acquisition, Methodology, Supervision, Writing – original draft
Eric Johnson – Data curation, Formal analysis, Methodology, Writing – review and editing
Susan M Shortreed – Conceptualization, Formal analysis, Methodology, Supervision, Writing – review and editing
Rebecca A Ziebell – Data curation, Methodology, Writing – review and editing
Rebecca C Rossom – Methodology, Supervision, Writing – review and editing
Brian K Ahmedani – Methodology, Supervision, Writing – review and editing
Karen J Coleman – Methodology, Supervision, Writing – review and editing
Arne Beck – Methodology, Supervision, Writing – review and editing
Frances L Lynch – Methodology, Supervision, Writing – review and editing
Yihe G Daida – Methodology, Supervision, Writing – review and editing
The authors have no relevant financial interests or other competing interests to disclose.
REFERENCES
- 1. Goldman-Mellor S, Olfson M, Lidon-Moyano C, Schoenbaum M. Association of Suicide and Other Mortality With Emergency Department Presentation. JAMA Netw Open. 2019;2(12):e1917571.
- 2. Olfson M, Gao YN, Xie M, Wiesel Cullen S, Marcus SC. Suicide Risk Among Adults With Mental Health Emergency Department Visits With and Without Suicidal Symptoms. J Clin Psychiatry. 2021;82(6).
- 3. Ahmedani BK, Simon GE, Stewart C, Beck A, Waitzfelder BE, Rossom R, Lynch F, Owen-Smith A, Hunkeler EM, Whiteside U, Operskalski BH, Coffey MJ, Solberg LI. Health care contacts in the year before suicide death. J Gen Intern Med. 2014;29(6):870–7.
- 4. Patient Safety Advisory Group. Detecting and treating suicidal ideation in all settings. The Joint Commission Sentinel Event Alerts. 2016;56.
- 5. Simpson SA, Goans C, Loh R, Ryall K, Middleton MCA, Dalton A. Suicidal ideation is insensitive to suicide risk after emergency department discharge: Performance characteristics of the Columbia-Suicide Severity Rating Scale Screener. Acad Emerg Med. 2021;28(6):621–9.
- 6. Brown LA, Boudreaux ED, Arias SA, Miller IW, May AM, Camargo CA Jr., Bryan CJ, Armey MF. C-SSRS performance in emergency department patients at high risk for suicide. Suicide Life Threat Behav. 2020;50(6):1097–104.
- 7. Wilimitis D, Turer RW, Ripperger M, McCoy AB, Sperry SH, Fielstein EM, Kurz T, Walsh CG. Integration of Face-to-Face Screening With Real-time Machine Learning to Predict Risk of Suicide Among Adults. JAMA Netw Open. 2022;5(5):e2212095.
- 8. Nock MK, Millner AJ, Ross EL, Kennedy CJ, Al-Suwaidi M, Barak-Corren Y, Castro VM, Castro-Ramirez F, Lauricella T, Murman N, Petukhova M, Bird SA, Reis B, Smoller JW, Kessler RC. Prediction of Suicide Attempts Using Clinician Assessment, Patient Self-report, and Electronic Health Records. JAMA Netw Open. 2022;5(1):e2144373.
- 9. Haroz EE, Kitchen C, Nestadt PS, Wilcox HC, DeVylder JE, Kharrazi H. Comparing the predictive value of screening to the use of electronic health record data for detecting future suicidal thoughts and behavior in an urban pediatric emergency department: A preliminary analysis. Suicide Life Threat Behav. 2021;51(6):1189–202.
- 10. Boudreaux ED, Rundensteiner E, Liu F, Wang B, Larkin C, Agu E, Ghosh S, Semeter J, Simon G, Davis-Martin RE. Applying Machine Learning Approaches to Suicide Prediction Using Healthcare Data: Overview and Future Directions. Front Psychiatry. 2021;12:707916.
- 11. Simon GE, Johnson E, Lawrence JM, Rossom RC, Ahmedani B, Lynch FL, Beck A, Waitzfelder B, Ziebell R, Penfold RB, Shortreed SM. Predicting Suicide Attempts and Suicide Deaths Following Outpatient Visits Using Electronic Health Records. Am J Psychiatry. 2018:appiajp201817101167.
- 12. Kessler RC, Bossarte RM, Luedtke A, Zaslavsky AM, Zubizarreta JR. Suicide prediction models: a critical review of recent research with recommendations for the way forward. Mol Psychiatry. 2020;25(1):168–79.
- 13. Simon GE, Matarazzo BB, Walsh CG, Smoller JW, Boudreaux ED, Yarborough BJH, Shortreed SM, Coley RY, Ahmedani BK, Doshi RP, Harris LI, Schoenbaum M. Reconciling Statistical and Clinicians’ Predictions of Suicide Risk. Psychiatr Serv. 2021;72(5):555–62.
- 14. Sanderson M, Bulloch AG, Wang J, Williams KG, Williamson T, Patten SB. Predicting death by suicide following an emergency department visit for parasuicide with administrative health care system data and machine learning. EClinicalMedicine. 2020;20:100281.
- 15. Reeves M, Bhat HS, Goldman-Mellor S. Resampling to address inequities in predictive modeling of suicide deaths. BMJ Health Care Inform. 2022;29(1).
- 16. Coleman KJ, Stewart C, Waitzfelder BE, Zeber JE, Morales LS, Ahmed AT, Ahmedani BK, Beck A, Copeland LA, Cummings JR, Hunkeler EM, Lindberg NM, Lynch F, Lu CY, Owen-Smith AA, Trinacty CM, Whitebird RR, Simon GE. Racial-Ethnic Differences in Psychiatric Diagnoses and Treatment Across 11 Health Care Systems in the Mental Health Research Network. Psychiatr Serv. 2016;67(7):749–57.
- 17. Ross TR, Ng D, Brown JS, Pardee R, Hornbrook MC, Hart G, Steiner JF. The HMO Research Network Virtual Data Warehouse: A Public Data Model to Support Collaboration. eGEMs. 2014;2(1).
- 18. Tibshirani R. Regression shrinkage and selection via the lasso. J Royal Stat Soc (B). 1996;58:267–88.
- 19. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2013.
- 20. Chollet F. Keras. 2015. [updated 2015; cited]; Available from: https://github.com/fchollet/keras.
- 21. Efron B, Tibshirani R. An introduction to the bootstrap. New York: Chapman & Hall; 1994.
- 22. Wright MN, Ziegler A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. arXiv. 2015;1508:04409.
- 23. Smith GC, Seaman SR, Wood AM, Royston P, White IR. Correcting for optimistic prediction in small data sets. Am J Epidemiol. 2014;180(3):318–24.
- 24. Coley RY, Liao Q, Simon N, Shortreed SM. Empirical evaluation of internal validation methods for prediction in large-scale clinical data with rare-event outcomes: a case study in suicide risk prediction. BMC Med Res Methodol. 2023;23(1):33.
- 25. McCarthy JF, Cooper SA, Dent KR, Eagan AE, Matarazzo BB, Hannemann CM, Reger MA, Landes SJ, Trafton JA, Schoenbaum M, Katz IR. Evaluation of the Recovery Engagement and Coordination for Health-Veterans Enhanced Treatment Suicide Risk Modeling Clinical Program in the Veterans Health Administration. JAMA Netw Open. 2021;4(10):e2129900.
- 26. Franklin JC, Ribeiro JD, Fox KR, Bentley KH, Kleiman EM, Huang X, Musacchio KM, Jaroszewski AC, Chang BP, Nock MK. Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychol Bull. 2017;143(2):187–232.
- 27. Miller IW, Camargo CA Jr., Arias SA, Sullivan AF, Allen MH, Goldstein AB, Manton AP, Espinola JA, Jones R, Hasegawa K, Boudreaux ED, Investigators E-S. Suicide Prevention in an Emergency Department Population: The ED-SAFE Study. JAMA Psychiatry. 2017;74(6):563–70.
- 28. Stiell IG, Wells GA, Vandemheen K, Clement C, Lesiuk H, Laupacis A, McKnight RD, Verbeek R, Brison R, Cass D, Eisenhauer ME, Greenberg G, Worthington J. The Canadian CT Head Rule for patients with minor head injury. Lancet. 2001;357(9266):1391–6.
- 29. Kuppermann N, Holmes JF, Dayan PS, Hoyle JD Jr., Atabaki SM, Holubkov R, Nadel FM, Monroe D, Stanley RM, Borgialli DA, Badawy MK, Schunk JE, Quayle KS, Mahajan P, Lichenstein R, Lillis KA, Tunik MG, Jacobs ES, Callahan JM, Gorelick MH, Glass TF, Lee LK, Bachman MC, Cooper A, Powell EC, Gerardi MJ, Melville KA, Muizelaar JP, Wisner DH, Zuspan SJ, Dean JM, Wootton-Gorges SL, Pediatric Emergency Care Applied Research N. Identification of children at very low risk of clinically-important brain injuries after head trauma: a prospective cohort study. Lancet. 2009;374(9696):1160–70.
- 30. National Committee for Quality Assurance. Follow-Up After Emergency Department Visit for Mental Illness (FUM). Washington, DC: National Committee for Quality Assurance; 2022. [updated 2022; cited 2022 July 8]; Available from: https://www.ncqa.org/hedis/measures/follow-up-after-emergency-department-visit-for-mental-illness/.
- 31. Qin P, Stanley B, Melle I, Mehlum L. Association of Psychiatric Services Referral and Attendance Following Treatment for Deliberate Self-harm With Prospective Mortality in Norwegian Patients. JAMA Psychiatry. 2022;79(7):651–8.
- 32. Shortreed SM, Walker RL, Johnson E, Wellman R, Cruz M, Ziebell R, Coley RY, Yaseen ZS, Dharmarajan S, Penfold RB, Ahmedani BK, Rossom RC, Beck A, Boggs JM, Simon GE. Electronic health record-based suicide risk prediction: Incorporating detailed temporal predictors and complex modeling strategies does not substantially improve performance. Nature Digital Medicine. 2023. (in press).
- 33. Kessler RC, Bauer MS, Bishop TM, Bossarte RM, Castro VM, Demler OV, Gildea SM, Goulet JL, King AJ, Kennedy CJ, Landes SJ, Liu H, Luedtke A, Mair P, Marx BP, Nock MK, Petukhova MV, Pigeon WR, Sampson NA, Smoller JW, Miller A, Haas G, Benware J, Bradley J, Owen RR, House S, Urosevic S, Weinstock LM. Evaluation of a Model to Target High-risk Psychiatric Inpatients for an Intensive Postdischarge Suicide Prevention Intervention. JAMA Psychiatry. 2023;80(3):230–40.
- 34. Llamocca EN, Yeh HH, Miller-Matero LR, Westphal J, Frank CB, Simon GE, Owen-Smith AA, Rossom RC, Lynch FL, Beck A, Waring SC, Lu CY, Daida YG, Fontanella CA, Ahmedani BK. Association between adverse social determinants of health and suicide death. Med Care. 2023;(in press).
- 35. Shaw JL, Beans JA, Noonan C, Smith JJ, Mosley M, Lillie KM, Avey JP, Ziebell R, Simon G. Validating a predictive algorithm for suicide risk with Alaska Native populations. Suicide Life Threat Behav. 2022;52(4):696–704.
- 36. Kline-Simon AH, Sterling S, Young-Wolff K, Simon G, Lu Y, Does M, Liu V. Estimates of Workload Associated With Suicide Risk Alerts After Implementation of Risk-Prediction Model. JAMA Netw Open. 2020;3(10):e2021189.
- 37. Simon GE, Cruz M, Shortreed SM, Sterling SA, Coleman KJ, Ahmedani BK, Yaseen ZS, Mosholder AD. Stability of Suicide Risk Prediction Models During Changes in Health Care Delivery. Psychiatr Serv. 2023:0.
