JAMA Psychiatry. 2024 Mar 27;81(7):700–707. doi: 10.1001/jamapsychiatry.2024.0189

Validation of a Multivariable Model to Predict Suicide Attempt in a Mental Health Intake Sample

Santiago Papini 1,2, Honor Hsin 3, Patricia Kipnis 1, Vincent X Liu 1, Yun Lu 1, Kristine Girard 3, Stacy A Sterling 1, Esti M Iturralde 1
PMCID: PMC10974695  PMID: 38536187

This prognostic study examines whether a model predicting suicide attempts can accurately stratify suicide risk among individuals scheduled for an intake visit to outpatient mental health care.

Key Points

Question

Can a model predicting suicide attempts accurately stratify suicide risk among individuals scheduled for an intake visit to outpatient mental health care?

Findings

In this prognostic study testing a previously validated model of suicide attempts using a sample of 1 623 232 mental health intake appointments scheduled during the past decade, the model showed good overall classification performance. The 10% of appointments at the highest risk level accounted for 48.8% of the appointments followed by a suicide attempt within 90 days.

Meaning

These findings suggest that risk for suicidal behavior may be accurately stratified for mental health care intake appointments to facilitate targeted preventive interventions for individuals who are seeking to initiate an episode of care.

Abstract

Importance

Given that suicide rates have been increasing over the past decade and the demand for mental health care is at an all-time high, targeted prevention efforts are needed to identify individuals seeking to initiate mental health outpatient services who are at high risk for suicide. Suicide prediction models have been developed using outpatient mental health encounters, but their performance among intake appointments has not been directly examined.

Objective

To assess the performance of a predictive model of suicide attempts among individuals seeking to initiate an episode of outpatient mental health care.

Design, Setting, and Participants

This prognostic study tested the performance of a previously developed machine learning model designed to predict suicide attempts within 90 days of any mental health outpatient visit. All mental health intake appointments scheduled between January 1, 2012, and April 1, 2022, at Kaiser Permanente Northern California, a large integrated health care delivery system serving over 4.5 million patients, were included. Data were extracted and analyzed from August 9, 2022, to July 31, 2023.

Main Outcomes and Measures

Suicide attempts (including completed suicides) within 90 days of the appointment, determined by diagnostic codes and government databases. All predictors were extracted from electronic health records.

Results

The study included 1 623 232 scheduled appointments from 835 616 unique patients. There were 2800 scheduled appointments (0.17%) followed by a suicide attempt within 90 days. The mean (SD) age across appointments was 39.7 (15.8) years, and most appointments were for women (1 103 184 [68.0%]). The model had an area under the receiver operating characteristic curve of 0.77 (95% CI, 0.76-0.78), an area under the precision-recall curve of 0.02 (95% CI, 0.02-0.02), an expected calibration error of 0.0012 (95% CI, 0.0011-0.0013), and sensitivities of 37.2% (95% CI, 35.5%-38.9%) and 18.8% (95% CI, 17.3%-20.2%) at specificities of 95% and 99%, respectively. The 10% of appointments at the highest risk level accounted for 48.8% (95% CI, 47.0%-50.6%) of the appointments followed by a suicide attempt.

Conclusions and Relevance

In this prognostic study involving mental health intakes, a previously developed machine learning model of suicide attempts showed good overall classification performance. Implementation research is needed to determine appropriate thresholds and interventions for applying the model in an intake setting to target high-risk cases in a manner that is acceptable to patients and clinicians.

Introduction

The number of deaths by suicide in the US has increased dramatically over the past 2 decades, with a provisional estimate of 49 449 in 2022.1 This trend has coincided with rising demand for mental health services across the country.2 Health care systems that contend with these public health burdens, therefore, must ensure that patients at high risk who present to mental health services are assessed by clinicians as early in their care journey as possible.

Machine learning (ML) models developed to predict suicide attempts from electronic health record (EHR) data have shown acceptable to good accuracy using a wide range of algorithms, predictors, and outcome windows.3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 Although the practical utility of suicide prediction models has been debated,20,21,22,23,24 recent studies suggest that several models achieve good enough levels of accuracy at thresholds that take clinician and health care system burden into account.25,26 Several health care delivery systems have already begun implementing suicide prediction algorithms into clinical practice, providing augmented risk information during mental health visits to enhance treatment planning and collaboration between patients and clinicians.27,28

Suicide risk prediction models also provide an opportunity for health systems to approach population health management at key points of care. Because over 70% of treatment dropout occurs after the first or second mental health visit,29 accurate identification of high-risk status among patients early in an episode of care is crucial. Suicide risk prediction results, in combination with clinician assessment, can provide an additional level of vigilance during the process of initiating care to ensure that patients at high risk are adequately evaluated and connected to appropriate treatment pathways. Although prior models have been developed and validated in large samples of mental health encounters that included intake, treatment, and follow-up appointments, to our knowledge, the performance of predictive models for suicide prevention has not yet been examined for intake appointments specifically. Compared with the general mental health treatment population, an intake population is likely to differ in key ways that may impact the accuracy of previously validated suicide risk prediction models, specifically lower suicide-related mortality and less psychiatric history documented in health records. Therefore, validation of suicide risk prediction with this new use case is needed. To address this gap, we tested an ML model developed to predict suicide attempts after specialty mental health care visits in a sample of mental health intakes. We also assessed model performance across self-reported race, ethnicity, and sex given that underestimating or overestimating risk for some groups could potentially lead to reduced access to preventive care or unfair burden, respectively.

Methods

In this prognostic study, we followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline. Study procedures were approved by the institutional review board of Kaiser Permanente Northern California, which waived the requirement for informed consent because this was a data-only study with no participant contact.

Study Sample

The sample included all mental health intake appointments scheduled between January 1, 2012, and April 1, 2022, across the Kaiser Permanente Northern California (KPNC) health care system. A mental health visit is coded as an intake when a patient initiates a new episode of mental health care, which includes first-time mental health visits, initial visits to a new KPNC clinic after a move, and the first visit after a gap of 2 or more years.

Data Sources

The original model was a logistic regression with least absolute shrinkage and selection operator (LASSO)-penalized variable selection, developed in a sample of 10 275 853 specialty mental health outpatient visits from 2009 to 2015 in 7 health systems7 and adapted to a cohort of 1 408 682 mental health outpatient visits at KPNC.15,25 Data for all predictors were extracted from the EHR and included sociodemographic characteristics, mental health diagnoses, psychiatric medications, medical comorbidities, use of mental health services, and self-reported suicidal ideation. The eTable in Supplement 1 contains descriptive statistics and coefficients for the full list of predictors included in the model.

Outcome

The outcome was defined as suicide attempt at any point 1 to 90 days after the date of the scheduled intake appointment. To be consistent with the original model, suicide attempts also included completed suicides. Suicide attempts were determined using diagnostic codes in the EHR, and completed suicides were determined using California Department of Public Health Vital Records, Social Security Administration records, and the National Death Index.
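The outcome definition above reduces to a date comparison per appointment. As a minimal sketch (illustration only, not the study's actual extraction code; the function and variable names are hypothetical), an appointment is labeled positive when any recorded suicide attempt or suicide death falls 1 to 90 days after the scheduled date:

```python
from datetime import date, timedelta

def attempt_within_window(appointment_date, event_dates, lo=1, hi=90):
    """Flag an appointment as a positive outcome if any recorded suicide
    attempt (or death by suicide) falls 1 to 90 days after the visit date."""
    return any(
        timedelta(days=lo) <= d - appointment_date <= timedelta(days=hi)
        for d in event_dates
    )

appt = date(2020, 3, 1)
print(attempt_within_window(appt, [date(2020, 3, 15)]))  # 14 days out: True
print(attempt_within_window(appt, [date(2020, 3, 1)]))   # same day: False
```

Note that day 0 (the appointment date itself) is excluded, matching the "1 to 90 days after" definition.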

Statistical Analysis

Data were extracted and analyzed from August 9, 2022, to July 31, 2023. For each scheduled appointment, a predicted probability of suicide attempt was calculated by summing the products of model coefficients and corresponding patient characteristics and applying an inverse-logit (logistic) transformation. These model-predicted probabilities were compared with actual outcomes using several performance metrics and related plots. We examined model discrimination with an area under the receiver operating characteristic curve (AUROC), which summarizes the trade-off between sensitivity and specificity across the full range of thresholds. Given the low prevalence of the outcome, we also examined the area under the precision-recall curve. To summarize calibration, we plotted a logistic calibration curve and calculated the expected calibration error, which measures the degree of correspondence between model-predicted probabilities and observed outcomes.30
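The scoring and calibration steps above can be sketched as follows. This is an illustration, not the published model: the coefficients and features are hypothetical (the real values are in the eTable in Supplement 1), the study's analyses were run in R, and the binned expected calibration error shown here is one common formulation (the study's exact variant follows reference 30 and may differ):

```python
import math

def inverse_logit(log_odds):
    """Map a logistic-regression linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def predicted_probability(intercept, coefficients, features):
    """Sum coefficient-feature products, then apply the inverse logit."""
    return inverse_logit(intercept + sum(b * x for b, x in zip(coefficients, features)))

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE: group observations by predicted probability, then take the
    size-weighted mean gap between observed rate and mean predicted probability."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    n = len(y_true)
    ece = 0.0
    for b in bins:
        if b:
            obs = sum(y for y, _ in b) / len(b)    # observed outcome rate in bin
            pred = sum(p for _, p in b) / len(b)   # mean predicted probability in bin
            ece += (len(b) / n) * abs(obs - pred)
    return ece

# Hypothetical intercept, coefficients, and patient features for illustration only.
p = predicted_probability(-6.0, [0.8, 1.2, -0.3], [1, 0, 2])  # roughly 0.003
```

A well-calibrated model yields an ECE near zero, which is why the reported value of 0.0012 indicates close agreement between predicted and observed risk.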

Several additional metrics were calculated to examine performance at specific thresholds. Sensitivity, specificity, and positive predictive value were estimated at each decile of predicted risk. Given that prior work suggests that suicide prediction models need low false-alarm rates to not overwhelm the capacities of clinicians25 and health care systems,26 sensitivity was estimated at thresholds that yielded 95% or 99% specificity.
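Selecting a threshold that yields a target specificity amounts, when scores are distinct, to taking a quantile of the predicted scores among appointments not followed by an attempt. A minimal sketch with synthetic data (not the study's code; ties among scores would need extra handling):

```python
import math

def threshold_for_specificity(y_true, y_score, target):
    """Smallest threshold (flagging score >= threshold as high risk) at which
    the fraction of negatives scoring below it meets the target specificity."""
    neg = sorted(s for s, y in zip(y_score, y_true) if y == 0)
    k = math.ceil(target * len(neg))
    return neg[k] if k < len(neg) else neg[-1] + 1e-12

def sensitivity_at(y_true, y_score, threshold):
    """Share of positives flagged at the given threshold (true-positive rate)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    return sum(s >= threshold for s in pos) / len(pos)

# Synthetic scores: 100 negatives spread over [0, 1), 4 positives.
y_true = [0] * 100 + [1] * 4
y_score = [i / 100 for i in range(100)] + [0.90, 0.96, 0.97, 0.98]

t = threshold_for_specificity(y_true, y_score, 0.95)  # -> 0.95
print(sensitivity_at(y_true, y_score, t))             # 3 of 4 positives flagged -> 0.75
```

Raising the target specificity raises the threshold and lowers sensitivity, which is the trade-off behind the reported 37.2% and 18.8% sensitivities at 95% and 99% specificity.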

Finally, we examined whether there was evidence of differential performance across sociodemographic categories by examining model classification performance across self-reported race (using US Census categories; American Indian or Alaska Native, Asian, Black or African American, Native Hawaiian or Other Pacific Islander, White, multiracial, other [individuals who self-identified as other], or unknown), ethnicity (Hispanic or Latino or not Hispanic or Latino), and sex. Research on algorithmic fairness is rapidly developing, but there is currently no consensus on the optimal criteria, particularly in situations in which the prevalence of the target outcome varies across groups.31 We examined ROC curves, which can reveal threshold regions where differences in sensitivity and specificity may exist across subgroups. To complement these analyses, we also reestimated the remaining performance metrics across the subgroups of interest. All statistics and 95% CIs were estimated using bootstrapping (1000 replications). Analyses were conducted using R, version 4.1.1 (R Core Team).
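The percentile bootstrap used for the CIs can be sketched as below. This is a pure-Python illustration with synthetic data under assumed defaults; the study used R with 1000 replications, and the AUROC here uses the Mann-Whitney formulation:

```python
import random

def auroc(y_true, y_score):
    """Mann-Whitney formulation: probability a random positive outscores a
    random negative, counting ties as half."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(y_true, y_score, stat=auroc, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample (outcome, score) pairs with replacement
    and take the alpha/2 and 1 - alpha/2 quantiles of the statistic."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # resample must contain both classes
            stats.append(stat(yt, [y_score[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Synthetic example: positives tend to score higher than negatives.
rng = random.Random(1)
y_true = [0] * 50 + [1] * 10
y_score = [rng.gauss(0.0, 1.0) for _ in range(50)] + [rng.gauss(1.5, 1.0) for _ in range(10)]
lo, hi = bootstrap_ci(y_true, y_score, n_boot=200)
```

The same resampling loop can wrap any of the reported metrics (sensitivity, PPV, expected calibration error) by swapping the `stat` function.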

Results

Across the 1 623 232 scheduled mental health intake appointments among 835 616 unique patients, the mean (SD) age was 39.7 (15.8) years. Most appointments were for women (1 103 184 [68.0%]); 520 048 (32.0%) were for men. Self-reported ethnicity was Hispanic or Latino for 305 756 appointments (18.8%) and not Hispanic or Latino for 1 317 476 (81.2%). Self-reported race was American Indian or Alaska Native for 8388 appointments (0.5%), Asian for 180 995 (11.2%), Black or African American for 162 171 (10.0%), Native Hawaiian or Other Pacific Islander for 10 676 (0.7%), White for 860 993 (53.0%), other for 48 218 (3.0%), multiracial for 31 916 (2.0%), and unknown for 319 875 (19.7%). These appointment-level patient summary statistics were not meaningfully different from those for the sample of unique patients (Table 1).

Table 1. Characteristics of the Sample at the Appointment Level and Patient Level Based on the First Appointment.

| Characteristic | Appointments (N = 1 623 232) | Unique patients (n = 835 616) |
| --- | --- | --- |
| Age, mean (SD), y | 39.7 (15.8) | 39.5 (15.9) |
| Sex, No. (%) | | |
| Female | 1 103 184 (68.0) | 545 029 (65.2) |
| Male | 520 048 (32.0) | 290 587 (34.8) |
| Ethnicity, No. (%) | | |
| Hispanic or Latino | 305 756 (18.8) | 154 598 (18.5) |
| Not Hispanic or Latino | 1 317 476 (81.2) | 681 018 (81.5) |
| Race, No. (%) | | |
| American Indian or Alaska Native | 8388 (0.5) | 4237 (0.5) |
| Asian | 180 995 (11.2) | 97 942 (11.7) |
| Black or African American | 162 171 (10.0) | 74 079 (8.9) |
| Native Hawaiian or Other Pacific Islander | 10 676 (0.7) | 5546 (0.7) |
| White | 860 993 (53.0) | 447 145 (53.5) |
| Multiracial | 31 916 (2.0) | 15 100 (1.8) |
| Other^a | 48 218 (3.0) | 24 961 (3.0) |
| Unknown | 319 875 (19.7) | 166 606 (19.9) |
| Medicare coverage, No. (%) | 196 979 (12.1) | 93 556 (11.2) |
^a Included participants who self-reported their race as other.

There were 2800 scheduled appointments (0.17%) that were followed by a suicide attempt within 90 days. At the patient level, 2023 unique patients (0.24%) had a suicide attempt, and 78 of these attempts (3.9%) were fatal. Most of the appointments were completed (1 077 435 [66.4%]); however, 328 671 (20.2%) were canceled, and 217 126 (13.4%) resulted in a no-show. Sensitivity analyses suggested that there were no meaningful differences in AUROC between appointments that were and were not completed (eFigure in Supplement 1), and the predicted probabilities largely overlapped (completed: median, 0.0019 [IQR, 0.0012-0.0029]; not completed: median, 0.0021 [IQR, 0.0013-0.0035]); therefore, the remainder of analyses were conducted on the complete sample of appointments.

In the evaluation of performance, the model AUROC was 0.77 (95% CI, 0.76-0.78), the area under the precision-recall curve was 0.02 (95% CI, 0.02-0.02), and the expected calibration error was 0.0012 (95% CI, 0.0011-0.0013) (Figure 1). At the highest decile of predicted risk, the model identified nearly half of the appointments that were followed by a suicide attempt (sensitivity, 48.8%; 95% CI, 47.0%-50.6%), with a relatively low false-positive rate (specificity, 90.1%; 95% CI, 90.1%-90.1%) and a positive predictive value of 0.8% (95% CI, 0.8%-0.9%) (Table 2). At the threshold that achieved 95% specificity (0.0075), the model achieved 37.2% sensitivity (95% CI, 35.5%-38.9%); at the threshold that achieved 99% specificity (0.0186), the model achieved 18.8% sensitivity (95% CI, 17.3%-20.2%).

Figure 1. Model Performance.


ROC indicates receiver operating characteristic.

Table 2. Threshold-Dependent Metrics at Deciles of Predicted Risk.

| Decile | Threshold value | Cumulative sensitivity, % (95% CI) | Cumulative specificity, % (95% CI) | Cumulative PPV, % (95% CI) |
| --- | --- | --- | --- | --- |
| 1 | 0.0052 | 48.8 (47.0-50.6) | 90.1 (90.1-90.1) | 0.8 (0.8-0.9) |
| 2 | 0.0035 | 62.7 (60.9-64.7) | 80.1 (80.0-80.1) | 0.5 (0.5-0.6) |
| 3 | 0.0028 | 70.9 (69.2-72.5) | 70.1 (70.0-70.1) | 0.4 (0.4-0.4) |
| 4 | 0.0023 | 77.4 (75.9-79.0) | 60.0 (59.9-60.1) | 0.3 (0.3-0.4) |
| 5 | 0.0019 | 82.8 (81.3-84.2) | 50.0 (50.0-50.1) | 0.3 (0.3-0.3) |
| 6 | 0.0016 | 87.2 (85.9-88.4) | 40.0 (40.0-40.1) | 0.2 (0.2-0.3) |
| 7 | 0.0013 | 90.3 (89.1-91.3) | 30.0 (30.0-30.0) | 0.2 (0.2-0.2) |
| 8 | 0.0011 | 93.6 (92.7-94.6) | 20.0 (20.0-20.0) | 0.2 (0.2-0.2) |
| 9 | 0.0008 | 97.4 (96.8-97.9) | 10.0 (10.0-10.0) | 0.2 (0.2-0.2) |
| 10 | 0.0001 | 100 (100-100) | 0 (0-0) | 0.2 (0.2-0.2) |

Abbreviation: PPV, positive predictive value.

The proportion of appointments followed by a suicide attempt within 90 days was examined across race, ethnicity, and sex. Across racial subgroups, the proportion was lowest among appointments for patients who identified as other race (0.11%; 95% CI, 0.08%-0.14%) and highest among patients who identified as American Indian or Alaska Native (0.27%; 95% CI, 0.17%-0.39%). For ethnicity, the proportion was lower for patients who identified as Hispanic or Latino compared with those who did not (0.15% [95% CI, 0.14%-0.16%] vs 0.18% [95% CI, 0.17%-0.19%]). Finally, for sex, the proportion was higher for males compared with females (0.19% [95% CI, 0.17%-0.20%] vs 0.17% [95% CI, 0.16%-0.17%]). There was variability across ROC curves for race (Figure 2). Table 3 summarizes model performance metrics across sociodemographic subsamples. AUROCs were in the range of 0.69 (95% CI, 0.60-0.77) for other race to 0.89 (95% CI, 0.81-0.95) for Native Hawaiian or Other Pacific Islander. ROC curves (and corresponding AUROC) mostly overlapped across ethnicity, but across sex, the AUROC was higher for appointments with women than with men (0.79 [95% CI, 0.78-0.80] vs 0.74 [95% CI, 0.72-0.76]). At the threshold that achieved a global specificity of 95%, sensitivity across racial categories ranged from 24.3% (95% CI, 19.8%-29.1%) for Asian to 66.3% (95% CI, 42.9%-87.5%) for Native Hawaiian or Other Pacific Islander; for ethnicity categories, sensitivity was 36.2% (95% CI, 32.1%-40.5%) for Hispanic or Latino and 37.4% (95% CI, 35.6%-39.2%) for not Hispanic or Latino; and for sex, it was 32.7% (95% CI, 29.8%-35.6%) for male and 39.6% (95% CI, 37.4%-41.8%) for female.
At the threshold that achieved a global specificity of 99%, sensitivity across racial categories ranged from 5.4% (95% CI, 0%-12.7%) for other race to 27.2% (95% CI, 7.1%-50.0%) for Native Hawaiian or Other Pacific Islander; for ethnicity categories, sensitivity was 16.8% (95% CI, 13.5%-20.3%) for Hispanic or Latino and 19.2% (95% CI, 17.6%-20.8%) for not Hispanic or Latino; and for sex, it was 16.0% (95% CI, 13.6%-18.3%) for male and 20.2% (95% CI, 18.5%-22.1%) for female.

Figure 2. Receiver Operating Characteristic Curves Across Sociodemographic Subgroups.


Table 3. Performance Metrics Based on the Full Sample and Sociodemographic Subsamples.

| Sample | Appointments followed by suicide attempt, % (95% CI) | AUROC (95% CI) | PRAUC (95% CI) | Expected calibration error (95% CI) | Sensitivity at 95% specificity threshold^a, % (95% CI) | Specificity at 95% specificity threshold^a, % (95% CI) | Sensitivity at 99% specificity threshold^b, % (95% CI) | Specificity at 99% specificity threshold^b, % (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Full sample | 0.17 (0.17-0.18) | 0.77 (0.76-0.78) | 0.02 (0.02-0.02) | 0.0012 (0.0011-0.0013) | 37.2 (35.5-38.9) | 95.0 (95.0-95.0) | 18.8 (17.3-20.2) | 99.0 (99.0-99.0) |
| Race | | | | | | | | |
| American Indian or Alaska Native | 0.27 (0.17-0.39) | 0.71 (0.57-0.84) | 0.13 (0.04-0.26) | 0.0018 (0.0012-0.0026) | 39.6 (22.2-61.9) | 94.3 (93.8-94.8) | 26.3 (9.7-47.4) | 99.2 (99.0-99.4) |
| Asian | 0.14 (0.13-0.16) | 0.75 (0.72-0.78) | 0.01 (0.01-0.02) | 0.0013 (0.0011-0.0015) | 24.3 (19.8-29.1) | 96.1 (96.0-96.2) | 8.1 (5.2-11.5) | 99.4 (99.4-99.4) |
| Black or African American | 0.16 (0.14-0.18) | 0.81 (0.78-0.84) | 0.02 (0.02-0.04) | 0.0011 (0.0009-0.0013) | 37.6 (31.7-43.4) | 95.6 (95.5-95.7) | 15.9 (11.9-20.4) | 99.2 (99.1-99.2) |
| Native Hawaiian or Other Pacific Islander | 0.17 (0.10-0.25) | 0.89 (0.81-0.95) | 0.09 (0.04-0.16) | 0.0017 (0.0012-0.0023) | 66.3 (42.9-87.5) | 94.8 (94.4-95.2) | 27.2 (7.1-50.0) | 99.2 (99.0-99.4) |
| White | 0.19 (0.18-0.20) | 0.78 (0.77-0.79) | 0.02 (0.02-0.03) | 0.0011 (0.0010-0.0012) | 40.3 (38.0-42.5) | 94.4 (94.4-94.5) | 22.0 (20.1-24.1) | 98.8 (98.8-98.8) |
| Multiracial | 0.27 (0.21-0.32) | 0.79 (0.74-0.84) | 0.06 (0.03-0.12) | 0.0011 (0.0007-0.0015) | 43.5 (33.0-54.1) | 92.8 (92.5-93.1) | 22.4 (13.7-31.6) | 98.6 (98.5-98.7) |
| Other | 0.11 (0.08-0.14) | 0.69 (0.60-0.77) | 0.02 (0.01-0.03) | 0.0015 (0.0012-0.0018) | 29.1 (17.0-41.7) | 96.3 (96.2-96.5) | 5.4 (0-12.7) | 99.5 (99.4-99.5) |
| Unknown | 0.14 (0.12-0.15) | 0.74 (0.71-0.77) | 0.02 (0.01-0.02) | 0.0015 (0.0014-0.0016) | 31.6 (27.5-35.9) | 95.7 (95.6-95.8) | 14.6 (11.3-18.0) | 99.2 (99.2-99.3) |
| Ethnicity | | | | | | | | |
| Hispanic or Latino | 0.15 (0.14-0.16) | 0.77 (0.74-0.79) | 0.02 (0.01-0.03) | 0.0015 (0.0013-0.0016) | 36.2 (32.1-40.5) | 95.1 (95.0-95.2) | 16.8 (13.5-20.3) | 99.1 (99.1-99.1) |
| Not Hispanic or Latino | 0.18 (0.17-0.19) | 0.77 (0.76-0.78) | 0.02 (0.02-0.02) | 0.0012 (0.0011-0.0012) | 37.4 (35.6-39.2) | 95.0 (95.0-95.0) | 19.2 (17.6-20.8) | 99.0 (99.0-99.0) |
| Sex | | | | | | | | |
| Female | 0.17 (0.16-0.17) | 0.79 (0.78-0.80) | 0.02 (0.02-0.03) | 0.0013 (0.0012-0.0014) | 39.6 (37.4-41.8) | 95.0 (95.0-95.0) | 20.2 (18.5-22.1) | 99.0 (99.0-99.0) |
| Male | 0.19 (0.17-0.20) | 0.74 (0.72-0.76) | 0.02 (0.01-0.02) | 0.0011 (0.0010-0.0012) | 32.7 (29.8-35.6) | 95.0 (95.0-95.1) | 16.0 (13.6-18.3) | 99.0 (98.9-99.0) |

Abbreviations: AUROC, area under the receiver operating characteristic curve; PRAUC, area under the precision-recall curve.

^a The threshold to achieve a global specificity of 95% was 0.0075.

^b The threshold to achieve a global specificity of 99% was 0.0186.

Discussion

Our objective was to determine whether a suicide prediction model that was developed using a sample of mental health outpatient visits could be applied to mental health intakes. Suicide risk prediction in an intake sample could be challenging due to a lack of documented mental health treatment history and potentially lower psychiatric severity. In our sample, we found that most appointments were not preceded by prior-year mental health care use and that the proportion of appointments followed by a suicide attempt was around one-quarter of the rate in general mental health outpatient samples used to validate the same15 or similar3 models (0.17% vs 0.65% of appointments with a suicide attempt within 90 days). Nevertheless, in a large sample of intake appointments spanning 1 decade, we found that model discrimination was within an acceptable range (AUROC, 0.77), which is lower than in a previous validation of the model conducted on a sample of all mental health encounters in the same health care system25 and in a similar model validation conducted on mental health specialty visits and primary care visits with mental health diagnoses in a consortium of health care systems (all AUROCs, 0.85).7 Model performance was slightly higher than what has been observed in models that were developed and evaluated in samples of individuals with 3 or more outpatient or inpatient visits (AUROC, 0.75-0.76)6 and higher than in EHR models developed using psychiatric inpatient visits to predict suicide attempts at 1 or 6 months (AUROC, 0.71 and 0.65, respectively).16 Similarly, sensitivity in the highest decile of risk was 48.8% in this sample, which is lower than what was found in mental health outpatient visits or primary care visits with mental health diagnoses (58.3%-61.0%)7 but slightly higher than in an outpatient or inpatient sample (44%-46%)6 and more than double the sensitivity found in a psychiatric inpatient sample with a 1-month outcome window (21.6%).16

Given that this study is the first, to our knowledge, to focus on an intake sample, more research is needed to determine whether a model with this level of performance can satisfy other criteria necessary for implementation. The model’s sensitivity was examined at high levels of specificity selected to limit the consequences for the workload of clinicians. However, the maximum frequency of alerts that generates an acceptable increase in workload depends on the intervention strategy and the degree of overlap with existing alerts25; these and other site-specific factors need to be considered in the selection of alert thresholds. For example, in an intake setting where there is already an intervention strategy for reported suicidal ideation on the Patient Health Questionnaire–9, alerts generated by the prediction model may add unique burdens when the model flags a patient who did not report suicidal ideation or did not complete the Patient Health Questionnaire–9. In addition to clinician burden, health care system burden must be taken into account. While the current study’s performance fell within the cost-effective range for low-burden intervention estimated in a simulation study that selected parameters based on primary care populations,26 an accurate assessment of the cost-effectiveness of incorporating prediction models to intake clinic workflows requires site-specific information about the population, insurance coverage, and cost of the intervention strategy, which are likely to diverge from what is observed in primary care.

Our use of a large, diverse sample facilitated comparisons of model performance across race, ethnicity, and sex. Across all but 1 of the subgroups, AUROC was in the acceptable to good range (0.71-0.89). The exception was the subsample of appointments for patients in the other race category (AUROC, 0.69), which also had the lowest prevalence rate of the subgroups (0.11% vs 0.14%-0.27% for all other categories). Findings have been mixed on performance of suicide prediction models across racial categories, with 1 study finding evidence of poor performance for some groups (despite good overall performance)32 and other studies finding comparable performance across all groups.3,15 Part of the challenge of comparing results across studies is that there are differences in how categories are operationalized.33 In our study, we examined race and ethnicity separately (given that Hispanic and Latino patients may self-identify across any racial category). We also distinguished among patients who identified as multiracial, those who identified as other race, or those whose race was unknown based on the EHR, whereas in the original model development, multiracial and other were collapsed into a single predictor.7 Another challenge is the lack of consensus on the optimal ethical and mathematical criteria for algorithmic fairness, which needs to take the relative costs of false positives (eg, increased patient burden and potential for stigma) and false negatives (eg, reduced access to intervention) into account.31,34 The sensitivities observed across all subgroups in our study fell within the range found to be cost-effective for low-burden suicide prevention interventions; however, this comparison is limited by potential differences in populations (mental health intake vs primary care) and potential trade-offs between maximizing benefit across the entire population and reducing disparities across subgroups.35,36 Distributional cost-benefit analyses, which take health inequality explicitly into account,37,38 are needed to better understand how the incorporation of prediction models may impact both net health benefit and equity in the context of suicide prevention.

From an implementation perspective, these results suggest that the existing suicide prediction model we evaluated can accurately stratify risk among patients seeking to establish an episode of care in outpatient mental health. Although adoption of universal screening measures for suicide risk at individual appointments remains a subject of further investigation,39 evidence suggests that health system–based organizational frameworks that include standardized workflows to identify at-risk patients, among other practices, can reduce the rate of suicide-related outcomes and improve quality of care.28,40 The performance characteristics identified here provide evidence in support of augmenting clinical decision-making with model-based risk scores at a key point of care in a large integrated health system,41 although the acceptability of targeted delivery of interventions above a specific threshold or targeted deprioritization of services below a specific threshold remains to be determined. Further efforts to understand the perceived cost-benefit trade-offs of interventions to patients and clinicians are needed. Process improvement efforts can also identify potential areas of opportunity unique to health systems, such as low-burden outreach to high-risk patients who do not show up for an intake appointment.

Limitations

We note several limitations. This suicide prediction model, like others, uses a wide outcome window,20 which means it does not necessarily distinguish between someone who is at imminent risk for suicide and someone whose suicide attempt may occur months after the point of prediction. Moreover, the EHR data that serve as predictors are collected during encounters with the health care system, which occur on a time frame that is unlikely to capture daily fluctuations in the internal and external factors that may precipitate a suicide attempt. Pilot studies incorporating ecological momentary assessment to understand time-sensitive risk level show promise but may be difficult to scale at the health care system level.42 Although we conducted an exhaustive search of the EHR for diagnostic codes related to suicidal behavior, it is possible that some suicide attempts were missed or some unintentional injuries were miscoded. Moreover, in the original model and current validation, all suicide attempts were coded the same regardless of the lethality of means, which has a bearing on the resulting injury when the attempt is not fatal. Future research may benefit from incorporating this type of information during model development given that some scalable evidence-based prevention strategies directly target the means (eg, firearm restriction).43 Although we compared model performance across the sociodemographic characteristics that are well characterized in the EHR, future research can examine the intersection of these characteristics as well as additional factors that may have effects on model performance, including engagement with mental (and overall) health services, neighborhood characteristics, and social determinants of health, which can be extracted from clinician notes using natural language processing.44 Last, we identified our intake visit sample using specific intake EHR codes, and it is possible that some return appointments, level-of-care changes, or transfer-of-care visits may have been incorrectly coded as intakes due to human error.

Conclusions

In this prognostic study of scheduled mental health intake appointments, an ML model previously developed on a sample of mental health outpatient visits (including intake, treatment, and follow-up appointments) showed good classification performance and calibration. Implementation research is needed to determine appropriate thresholds and interventions for applying the model in an intake setting to draw attention to individuals identified at high risk in a manner that is acceptable to patients and clinicians.

Supplement 1.

eTable. Descriptive Statistics of Variables Included in the Predictive Model and Corresponding Model Coefficients

eFigure. Comparison of Receiver Operating Characteristic Curves Between Completed and Canceled or Missed Intake Appointments

Supplement 2.

Data Sharing Statement




Articles from JAMA Psychiatry are provided here courtesy of American Medical Association
