JAMA Psychiatry. 2021 Jan 6;78(4):1–9. doi: 10.1001/jamapsychiatry.2020.4165

Identification of Suicide Attempt Risk Factors in a National US Survey Using Machine Learning

Ángel García de la Garza 1, Carlos Blanco 2, Mark Olfson 3, Melanie M Wall 1,3
PMCID: PMC7788508  PMID: 33404590

This study evaluates future suicide attempt risk factors in the general population using a data-driven machine learning approach that includes more than 2500 questions from a large, nationally representative survey of US adults.

Key Points

Question

Can survey data identify risk factors of nonfatal suicide attempt in the general population?

Findings

This study used a large, nationally representative longitudinal survey of US adults to create a suicide attempt model addressing risk factors of suicide. The most important factors included previous suicidal ideation or behavior, feeling downhearted, doing activities less carefully or accomplishing less because of emotional problems, younger age, lower educational achievement, and recent financial crisis.

Meaning

By using an algorithmic approach to analyze survey data and identify new risk factors, this study offers new avenues to guide future clinical assessment and development of suicide risk scales in the general population.

Abstract

Importance

Because more than one-third of people making nonfatal suicide attempts do not receive mental health treatment, it is essential to extend suicide attempt risk factors beyond high-risk clinical populations to the general adult population.

Objective

To identify future suicide attempt risk factors in the general population using a data-driven machine learning approach including more than 2500 questions from a large, nationally representative survey of US adults.

Design, Setting, and Participants

Data came from wave 1 (2001 to 2002) and wave 2 (2004 to 2005) of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). NESARC is a face-to-face longitudinal survey conducted with a nationally representative sample of the noninstitutionalized civilian population 18 years and older in the US. The cumulative response rate across both waves was 70.2%, resulting in 34 653 wave 2 interviews. A balanced random forest was trained using cross-validation to develop a suicide attempt risk model. Out-of-fold model prediction was used to assess model performance, including the area under the receiver operating characteristic curve, sensitivity, and specificity. Survey design and nonresponse weights allowed estimates to be representative of the US civilian population based on the 2000 census. Analyses were performed between May 15, 2019, and June 10, 2020.

Main Outcomes and Measures

Attempted suicide in the 3 years between wave 1 and wave 2 interviews.

Results

Of 34 653 participants, 20 089 were female (weighted proportion, 52.1%). The weighted mean (SD) age was 45.1 (17.3) years at wave 1 and 48.2 (17.3) years at wave 2. Attempted suicide during the 3 years between wave 1 and wave 2 interviews was self-reported by 222 of 34 653 participants (0.6%). Using survey questions measured at wave 1, the suicide attempt risk model yielded a cross-validated area under the receiver operating characteristic curve of 0.857 with a sensitivity of 85.3% (95% CI, 79.8-89.7) and a specificity of 73.3% (95% CI, 72.8-73.8) at an optimized threshold. The model identified 1.8% of the US population to be at a 10% or greater risk of suicide attempt. The most important risk factors were 3 questions about previous suicidal ideation or behavior; 3 items from the 12-Item Short Form Health Survey, namely feeling downhearted, doing activities less carefully, or accomplishing less because of emotional problems; younger age; lower educational achievement; and recent financial crisis.

Conclusions and Relevance

In this study, after searching through more than 2500 survey questions, several well-known risk factors of suicide attempt were confirmed, such as previous suicidal behaviors and ideation, and new risks were identified, including functional impairment resulting from mental disorders and socioeconomic disadvantage. These results may help guide future clinical assessment and the development of new suicide risk scales.

Introduction

Between 2001 and 2017, suicide mortality in the US increased by 31% from 10.7 to 14.0 cases per 100 000 population.1 Previous studies estimate that between 8.5% and 13% of all suicide attempts are fatal2,3,4 and that around 3% of index attempts lead to death.5 Roughly half of suicide deaths do not occur during a first attempt.5,6 Thus, preventing nonfatal attempts presents an opportunity for early intervention in a substantial number of people at high risk of suicide7 and for decreasing the public health burden of suicide behaviors.

Despite extensive work over the last 50 years to improve prediction of suicide attempt, a meta-analysis of 365 studies concluded that using known suicide risk factors leads to only slightly better than chance prediction (weighted area under the receiver operating characteristic curve [AUC], 0.58).8 Machine learning methods and big data sources, such as electronic health records and social media text monitoring, have led to substantial improvements in predicting suicide attempt in clinical samples (AUC, 0.71-0.93).9,10,11,12,13,14 However, most of the published literature on nonfatal suicide attempt prediction has focused on high-risk patients who have received mental health treatment.15,16 More than one-third of people making nonfatal suicide attempts do not receive mental health treatment,17,18 and those who engage in mental health treatment represent only one-third of all fatal suicide attempts in the US.11,15,17,19,20,21 These findings underscore the importance of extending suicide attempt prediction models beyond high-risk populations to the general adult population.22,23

In the present study, we aimed to identify important risk factors of future suicide attempt in the general population by taking advantage of the richness of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) data set using an explanatory machine learning model. We extended prior research in 3 important directions. First, we used a large, nationally representative longitudinal sample to identify risk factors of suicide attempt in the general population. Second, we used an extensive assessment instrument that includes detailed evaluation of substance use, psychiatric disorders, and symptoms that are not routinely available in electronic health records or administrative data. Third, we incorporated class imbalance as a feature in our model to address the limitations of more generic algorithms, as few studies have previously done this.15 Overall, we expected to confirm previously identified risk factors found in clinical samples and, more importantly, identify new risk factors to expand our understanding of the etiology of suicide attempts.

Methods

Sample

Data were drawn from NESARC, a face-to-face survey conducted with a nationally representative sample of the US adult population by the National Institute on Alcohol Abuse and Alcoholism.24 The target population included the noninstitutionalized civilian population 18 years and older in the US. Wave 1 NESARC survey data (2001 to 2002) and self-reported nonfatal suicide attempts at follow-up 3 years later (wave 2, 2004 to 2005)25 were used to build a suicide attempt risk model. The cumulative response rate at wave 2 was 70.2%, resulting in 34 653 wave 2 interviews. Survey design and nonresponse weights allowed estimates to be representative of the US civilian population based on the 2000 Census.26 Data were analyzed from May 15, 2019, to June 10, 2020. The research protocol received full human subjects review and approval from the US Census Bureau and the Office of Management and Budget. All participants provided written informed consent. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guideline.

Risk Factors From Wave 1

At wave 1, participants were assessed using the Alcohol Use Disorder and Associated Disabilities Interview Schedule DSM-IV (AUDADIS-IV),27,28 a lay-administered structured interview to assess alcohol use, drug use, and mental health disorders according to DSM-IV criteria. Axis I disorders evaluated in the past 12 months included substance use disorders (alcohol use, drug use, and nicotine dependence), mood disorders (major depressive disorder, dysthymic disorder, and bipolar disorder), anxiety disorders (panic disorder, social anxiety disorder, specific phobia, and generalized anxiety disorder), and pathological gambling. Axis II disorders included avoidant, dependent, obsessive-compulsive, histrionic, paranoid, schizoid, and antisocial personality disorders assessed on a lifetime basis. Demographic and background information was collected. Response patterns for each of the 14 sections of the survey are summarized in eTable 1 in the Supplement. The test-retest reliability of AUDADIS-IV and its validity for measuring DSM-IV mental disorders is good to excellent for substance use (κ = 0.51-0.74) and fair to good for other disorders (κ = 0.40-0.67).27,29,30,31

The wave 1 survey contained 2805 separate questions. To reduce interview burden, participants skipped entire sections based on their responses to gate questions. Additionally, there were 180 derived variables for DSM-IV past-year, prior-to-past-year, and lifetime diagnoses of mental disorders, including personality disorders. For each wave 1 participant, there were between 643 and 2985 available features.
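Skip patterns like these leave each respondent with a different set of answered questions. One common way to encode such structural gaps, and the one the Model Building section names, is the missing-indicator method. The sketch below is only an illustration of that idea, not the authors' code: the question names, the `None` sentinel, and the fill value of 0 are all assumptions.

```python
from typing import Dict, List, Optional


def add_missing_indicators(rows: List[Dict[str, Optional[float]]],
                           fill_value: float = 0.0) -> List[Dict[str, float]]:
    """Replace skipped answers with a constant and add a 0/1 flag per question."""
    # Collect every question name that appears in any respondent's record.
    questions = sorted({q for row in rows for q in row})
    encoded = []
    for row in rows:
        out: Dict[str, float] = {}
        for q in questions:
            answer = row.get(q)          # None means "not asked" (assumed sentinel)
            skipped = answer is None
            out[q] = fill_value if skipped else float(answer)
            out[q + "_skipped"] = 1.0 if skipped else 0.0
        encoded.append(out)
    return encoded


# Hypothetical two-question section guarded by a gate question.
respondents = [
    {"ever_drank": 1, "drinks_per_week": 12},    # passed the gate question
    {"ever_drank": 0, "drinks_per_week": None},  # section skipped
]
encoded = add_missing_indicators(respondents)
```

The added `_skipped` columns let a tree-based model split on whether a question was asked at all, separately from the value that was filled in.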

Outcome at Wave 2: Nonfatal Suicide Attempt

At wave 2, a similar face-to-face structured interview follow-up was conducted. The primary outcome was retrospective and was defined as having attempted suicide at any point in the 3 years prior to the wave 2 interview. This variable was derived by combining responses to the wave 2 questions: “In your entire life, did you ever attempt suicide?” and, if affirmative, “How old were you the first time?” and “How old were you the most recent time?”32 If the most recent suicide attempt occurred within the last 3 years, the participant was considered to have met the primary outcome; otherwise the participant was not considered to have met the primary outcome. At wave 2, a total of 222 participants confirmed having attempted suicide since the wave 1 interview.

Statistical Analysis

Model Building

We performed an initial data analysis33 and addressed the survey structural missingness by using the missing-indicator method34,35 described in the eMethods in the Supplement. We used balanced random forest (BRF) to build a model to identify factors associated with suicide attempts by taking the processed 2978 wave 1 features to classify dichotomous suicide attempt at wave 2. BRF has better performance than regular random forest for classification models with class-imbalanced data.36,37 As detailed in the eMethods in the Supplement, we tuned the BRF parameters by using 10-fold cross-validation and further validated our classification model by using nested cross-validation.38
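The core idea that separates a balanced random forest from a regular one is how each tree's training sample is drawn. The sketch below shows the usual textbook form of that step: bootstrap the minority class, then draw an equal number of majority-class rows. It is an illustration under assumptions (labels coded 1 for attempt and 0 otherwise, a hypothetical `balanced_bootstrap` helper), not the tuned procedure from the eMethods.

```python
import random
from typing import List, Sequence


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> List[int]:
    """Return row indices for one tree: equal draws from each class."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n = len(minority)
    # Sample the minority class with replacement ...
    sample = [rng.choice(minority) for _ in range(n)]
    # ... then draw the same number of majority-class rows.
    sample += [rng.choice(majority) for _ in range(n)]
    return sample


# Toy labels with the study's class counts: 222 attempts, 34 431 non-attempts.
labels = [1] * 222 + [0] * 34431
idx = balanced_bootstrap(labels, random.Random(0))
positives_in_sample = sum(labels[i] for i in idx)
```

Each tree therefore sees a 50/50 class mix even though attempts make up well under 1% of the sample.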

We summarized the final model’s performance by aggregating the out-of-fold classifications of our optimal model and used this aggregated probability (threshold) to calculate an out-of-fold AUC. We weighted our results based on design and nonresponse weights to allow our estimates to be representative of the US civilian population based on the 2000 Census. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), alarms per 100 evaluations, and number needed to evaluate (NNE) to find 1 new suicide attempt case were examined against the threshold value.
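As a rough illustration of how survey weights can enter an AUC, the sketch below computes the weighted probability that a randomly chosen case scores higher than a randomly chosen non-case, with ties counted as half. The text does not specify the exact estimator used, so this pairwise definition, the function name `weighted_auc`, and the toy data are assumptions.

```python
from typing import Sequence


def weighted_auc(scores: Sequence[float], labels: Sequence[int],
                 weights: Sequence[float]) -> float:
    """Weighted probability that a case outscores a non-case (ties count half)."""
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    concordant = 0.0
    total = 0.0
    for sp, wp in pos:
        for sn, wn in neg:
            pair_weight = wp * wn  # each pair counts by the product of its weights
            total += pair_weight
            if sp > sn:
                concordant += pair_weight
            elif sp == sn:
                concordant += 0.5 * pair_weight
    return concordant / total


# Toy out-of-fold scores (made up for illustration).
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
labels = [1, 1, 0, 0, 0]
auc_unweighted = weighted_auc(scores, labels, [1.0, 1.0, 1.0, 1.0, 1.0])
auc_weighted = weighted_auc(scores, labels, [1.0, 1.0, 1.0, 3.0, 1.0])
```

Upweighting the high-scoring non-case (weight 3) lowers the AUC, which shows why weighted and unweighted summaries of the same scores can differ.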

Identifying Top Risk Factors

To quantify model variable importance, we calculated the decrease in classification ability after any individual feature was permuted (ie, no longer used in the suicide attempt model) across the data set.39,40 The importance measure was scaled between 0 and 100 by subtracting the smallest importance from all observations and dividing by the largest importance. To facilitate interpretation of suicide attempt risk, we defined 4 interpretable risk severity groups that could be used as a reference of suicide attempt subgroups in the US adult population based on the BRF model as defined in the eMethods in the Supplement. The first risk severity group was based on the Youden J statistic (sensitivity + specificity − 1), which was used to determine low risk vs anything higher than low risk. The cut point for this risk severity group was defined at a sensitivity of 85.3% and a specificity of 73.3%. This cut point placed 73.1% of the US population in the low-risk category. The remaining 26.9% of the population was categorized according to 2 meaningful population benchmarks: the cut point corresponding to the top decile of risk across the sample (differentiating the medium-risk group from the high-risk group) and the cut point corresponding to a PPV of 10% or greater (designating the very high-risk group). Sample weighting identified 7.6% of the US population as high risk and 1.8% as very high risk. The PPV of the very high-risk group was 10.4% (Table 1). We calculated summary statistics of suicide attempt broken down by risk groups. Finally, we quantified the risk associated with each of the top-performing risk factors by generating response plots of the distribution of probabilities for all observations in the 4 empirically derived risk groups.
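The Youden J cut point used for the low-risk boundary can be sketched as a search over candidate thresholds for the one that maximizes sensitivity + specificity − 1. The code below is a minimal unweighted illustration on made-up scores. The function name and the convention that a score at or above the threshold is flagged are assumptions, and the study's survey weighting is omitted.

```python
from typing import Sequence, Tuple


def youden_cut_point(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    A row is flagged when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (float("inf"), -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best[1]:  # keep the lowest threshold among ties
            best = (t, j)
    return best


# Toy scores and outcomes (made up for illustration).
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, j = youden_cut_point(scores, labels)

# J implied by the reported cut point: 0.853 + 0.733 - 1, about 0.586.
reported_j = 0.853 + 0.733 - 1.0
```

On the toy data the search returns the threshold 0.30 with J = 0.75.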

Table 1. Summary of Suicide Attempt Risk Groups Among a Longitudinal Representative Sample of US Adults[a]

| Suicide attempt risk group[b] | Projected at-risk individuals based on model-calculated risk score, No. (%)[c] | New attempt by wave 2, No. (%)[c] | No new attempt by wave 2, No. (%)[c] | Probability of new attempt by wave 2 within risk group, %[c] | Probability of no new attempt by wave 2 within risk group, %[c] | Mean model-calculated risk score[b,c] |
| Low | 24 862 (73.1) | 32 (14.7) | 24 830 (73.4) | 0.13 | 99.87 | 16.2 |
| Medium | 6116 (17.5) | 52 (24.6) | 6064 (17.5) | 0.88 | 99.12 | 42.3 |
| High | 2945 (7.6) | 69 (31.1) | 2876 (7.5) | 2.57 | 97.43 | 62.6 |
| Very high | 730 (1.8) | 69 (29.6) | 661 (1.6) | 10.40 | 89.59 | 81.9 |
| Total | 34 653 | 222 | 34 431 | 0.63[c] | 99.37[c] | 25.5 |

[a] Data come from the National Epidemiologic Survey on Alcohol and Related Conditions. Wave 1 was conducted in 2001 and 2002 and wave 2 in 2004 and 2005.

[b] Risk scores are based on the out-of-sample classification from the machine learning balanced random forest model. The model provides a continuous risk score for suicide attempt within 3 years between surveys based on wave 1 survey responses. Stratification to 4 risk group categories is described in the eMethods in the Supplement.

[c] Statistics presented are raw numbers and weighted percentages, means, and proportions using National Epidemiologic Survey on Alcohol and Related Conditions sampling weights to be representative of the US adult population based on the 2000 Census.

Model Validation

We further validated our model in 3 ways. First, we calculated classification performance stratified by time-to-suicide attempt from the first interview. Second, we stratified classification performance across sex, age, self-reported race/ethnicity (White vs non-White), and income to test the robustness against demographic characteristics. Lastly, we examined erosion in model accuracy with fewer features by running additional BRFs using only the top 5 and 10 risk factors selected from the random forest importance measure.

Results

Performance of the Suicide Attempt Model

Of 34 653 participants, 20 089 were female. The weighted mean (SD) age was 45.1 (17.3) years at wave 1 and 48.2 (17.3) years at wave 2. We found that 222 participants (0.6%) attempted suicide. The out-of-sample AUC for the best model, including all wave 1 features, was 0.857 (range, 0.803-0.909) with a sensitivity of 85.3% (95% CI, 79.8-89.7) and a specificity of 73.3% (95% CI, 72.8-73.8) at an optimized threshold. The optimal cross-validated number of variables to sample at each split was 1700, representing 57.1% of all features. The out-of-sample generalizability, defined as the correlation between our final model and our nested cross-validated model, was 0.997. Figure 1 presents the distribution of model-calculated risk scores across the whole sample, stratified by whether participants reported a suicide attempt at wave 2. Suicide attempt risk strata (low, medium, high, very high) are summarized in Table 1. Based on our model, 73.1% of the US population are at low risk, 17.5% at medium risk, 7.6% at high risk, and 1.8% at very high risk of suicide attempt. Within these categories, 138 of 222 individuals (62.2%) who attempted suicide between waves 1 and 2 were in the high-risk or very high-risk groups based on their wave 1 survey responses and 32 (14.4%) were classified as being low risk.

Figure 1. Distribution of the Model-Calculated Risk Scores Based on 2985 Features Collected in the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) Wave 1, Weighted to be Representative of the US Population.

A, Distribution of scores across the entire sample. B, Distribution of scores broken down by those who reported a suicide attempt (n = 222) and those who did not report a suicide attempt (n = 34 431). Risk scores are based on the out-of-sample classification from our machine learning balanced random forest model. NESARC wave 1 was conducted in 2001 and 2002 and wave 2 in 2004 and 2005. Data are weighted to be representative of the US adult population for region, age, sex, and race/ethnicity, based on the 2000 Census. The model-calculated risk groups are defined on interpretable thresholds for suicide attempt risk in the US adult population. The cut points for the risk groups are detailed in the eMethods in the Supplement.

Figure 2 displays sensitivity, specificity, PPV, NPV, alarms per 100 evaluations, and NNE to find 1 new suicide attempt case across various classification thresholds. For each plot, we labeled the 3 thresholds used to classify the 4 risk groups. If using the very high-risk group as a classification threshold, there would be a PPV of 10.4%, an NPV of 99.6%, 2 alarms, and an NNE of 10. If using the high-risk group, or the top decile of risk, as a threshold for classification, there would be a PPV of 3.9%, an NPV of 99.7%, 10 alarms, and an NNE of 26. Using the threshold that optimizes the Youden statistic (the low-risk group), there would be a PPV of 2.0%, an NPV of 99.9%, 27 alarms, and an NNE of 51.
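The threshold measures above can be approximated from the raw counts in Table 1. The sketch below treats one or more risk groups as the "alarm" set and computes each measure without survey weights, so its values differ somewhat from the weighted figures reported (for example, an unweighted PPV of about 9.5% vs the weighted 10.4% for the very high-risk group). Taking NNE as 1/PPV is an assumption that is consistent with the reported pairs, and the function and variable names are illustrative.

```python
from typing import Dict, Sequence, Tuple

# Unweighted counts from Table 1: (new attempt, no new attempt) per risk group.
TABLE_1: Dict[str, Tuple[int, int]] = {
    "low": (32, 24830),
    "medium": (52, 6064),
    "high": (69, 2876),
    "very high": (69, 661),
}


def threshold_metrics(flagged: Sequence[str]) -> Dict[str, float]:
    """Treat the listed groups as 'alarm' and everyone else as 'no alarm'."""
    tp = sum(TABLE_1[g][0] for g in flagged)
    fp = sum(TABLE_1[g][1] for g in flagged)
    fn = sum(a for g, (a, _) in TABLE_1.items() if g not in flagged)
    tn = sum(n for g, (_, n) in TABLE_1.items() if g not in flagged)
    total = tp + fp + fn + tn
    ppv = tp / (tp + fp)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "alarms_per_100": 100.0 * (tp + fp) / total,
        "nne": 1.0 / ppv,  # assumed definition: evaluations per true case found
    }


very_high = threshold_metrics(["very high"])
high_or_above = threshold_metrics(["high", "very high"])
```

Lowering the threshold from the very high-risk group to the high-risk group raises sensitivity but also raises the number of alarms and the NNE, which is the trade-off Figure 2 traces across all thresholds.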

Figure 2. Summary Measures of the Classification Ability of the Model Based on 2985 Features Collected in the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) Wave 1 Responses Calculated Across All Possible Classification Thresholds.

The highlighted cut points are those used to define our model-calculated suicide attempt risk groups. Results were weighted to be representative of the US population. NESARC wave 1 was conducted in 2001 and 2002 and wave 2 was conducted in 2004 and 2005. The model-calculated risk groups are defined on interpretable thresholds for suicide attempt risk. Data are weighted to be representative of the US adult population for region, age, sex, and race/ethnicity, based on the 2000 Census. PPV indicates positive predictive value; NPV, negative predictive value; NNE, number needed to evaluate.

Variable Importance and Risk Factor Effects

Table 2 shows the 20 most important variables from the BRF model. The 3 most important risk factors were whether the individual felt at any point like they wanted to die, whether they thought about committing suicide, and previous suicide attempt. Several of the most important variables were associated with past month low energy and mood periods, such as feeling downhearted, feeling less accomplished, or paying less attention to work or other activities. Other features identified as important were age, family income and financial crisis, marital status, education level, paternal alcohol misuse, and parental separation. The eFigure in the Supplement shows the distribution of model-calculated scores as a function of the top 10 most important variables identified by the BRF algorithm allowing interpretation for how each variable was associated with suicide attempt.

Table 2. Top 20 Most Important Variables Based on Suicide Attempt Model Using 2985 Features Collected from the National Epidemiologic Survey on Alcohol and Related Conditions Wave 1 Responses[a]

| Description | Importance score[b] |
| Felt like wanted to die | 100 |
| Thought about committing suicide | 48.425 |
| Attempted suicide | 21.932 |
| During past 4 wk, how often felt downhearted and depressed | 14.033 |
| Age | 13.731 |
| During past 4 wk, how often did work or other activities less carefully than usual as result of emotional problems | 13.051 |
| Experienced major financial crisis, bankruptcy, or been unable to pay bills on time in last 12 mo | 11.478 |
| During past 4 wk, how often accomplished less than would like as result of emotional problems | 11.213 |
| Grade level during 2000-2001 school year | 10.319 |
| Highest grade or year of school completed | 7.938 |
| During past 4 wk, how often physical health or emotional problems interfered with social activities | 7.746 |
| Blood/natural father ever an alcoholic or problem drinker | 7.377 |
| Occupation: current or most recent job | 6.059 |
| Current marital status | 4.727 |
| Family income in last year | 4.472 |
| Age when biological/adoptive parents stopped living together | 4.471 |
| Thought a lot about own death | 4.135 |
| Present situation includes in school part time | 4.128 |
| Personal income in last year | 4.122 |
| Parent lived with after biological or adoptive parents stopped living together | 4.037 |
[a] NESARC wave 1 was conducted in 2001 and 2002 and wave 2 in 2004 and 2005.

[b] The importance score was calculated by permuting each variable's values and estimating the decrease in classification performance.
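Permutation importance of the kind described under Identifying Top Risk Factors, followed by the 0 to 100 scaling, can be sketched as below. This is a generic illustration rather than the authors' implementation: accuracy stands in for "classification ability", the toy classifier and data are hypothetical, and the scaling assumes that "dividing by the largest importance" refers to the largest value after the minimum has been subtracted.

```python
import random
from typing import Callable, List, Sequence


def permutation_importance(predict: Callable[[Sequence[float]], int],
                           rows: List[List[float]], labels: Sequence[int],
                           rng: random.Random) -> List[float]:
    """Drop in accuracy after shuffling one feature column at a time."""
    def accuracy(data: List[List[float]]) -> float:
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    drops = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Copy each row with feature j replaced by its shuffled value.
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(permuted))
    return drops


def scale_0_100(values: Sequence[float]) -> List[float]:
    """Shift so the minimum is 0, then rescale so the maximum is 100."""
    lo = min(values)
    shifted = [v - lo for v in values]
    hi = max(shifted)
    if hi == 0:  # all importances equal: nothing to rank
        return [0.0 for _ in shifted]
    return [100.0 * v / hi for v in shifted]


# Toy classifier (hypothetical): uses feature 0 only, so feature 1 should score 0.
def predict(row: Sequence[float]) -> int:
    return 1 if row[0] > 0.5 else 0


rows = [[float(i % 2), float(i)] for i in range(20)]
labels = [predict(r) for r in rows]
drops = permutation_importance(predict, rows, labels, random.Random(0))
scaled = scale_0_100(drops)
```

Under this scaling the top variable always scores 100, as in Table 2, and every other score is read relative to it.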

Model Robustness Results

We conducted a series of sensitivity and complementary analyses. First, eTable 2 in the Supplement shows that the classification ability of the model decreased as time-to-suicide attempt from the first interview increased. Among participants who attempted suicide within the first year, 21 (50.5%) were classified as very high risk. Among participants who attempted suicide between the first and second year, 16 (33.1%) were classified as very high risk, while among participants who attempted suicide between the second and third year, 21 (30.3%) were classified as very high risk. Finally, among participants who attempted suicide between the third year and follow-up, 11 (16.5%) were classified as very high risk.

Second, we examined the classification ability of our model across demographic characteristics. As shown in eTable 3 in the Supplement, the AUC was 0.808 (95% CI, 0.765-0.851) for participants aged 18 to 36 years, 0.867 (95% CI, 0.827-0.906) for those aged 37 to 53 years, and 0.872 (95% CI, 0.800-0.945) for those 54 years and older. We found that the AUC was 0.850 (95% CI, 0.813-0.886) for female individuals (weighted proportion, 52.1%) and 0.872 (95% CI, 0.840-0.904) for male individuals. The AUC was 0.877 (95% CI, 0.845-0.909) for White participants and 0.831 (95% CI, 0.788-0.873) for non-White participants. Finally, the AUC was 0.845 (95% CI, 0.813-0.877) for participants with an income lower than $20 000, 0.848 (95% CI, 0.786-0.910) for those with an income between $20 000 and $34 999, 0.794 (95% CI, 0.676-0.911) for those with incomes between $35 000 and $69 999, and 0.944 (95% CI, 0.893-0.994) for those with incomes higher than $70 000.

Finally, we measured the decrease in model accuracy when fewer features were included by building new separate BRF models with the top 5 and 10 most important variables previously found using the entire feature set. The out-of-sample AUCs (SE) for these models were 0.818 (0.017) and 0.845 (0.016), respectively.

Discussion

We built a model to classify nonfatal suicide attempts using a large, nationally representative sample of US adults. It confirmed several well-known risk factors of suicide attempt and identified several new ones. When tested outside the training set, our model performed at levels similar to models restricted to data from high-risk mental health patients8,9 for the full sample and when stratified by demographic characteristics, indicating its robustness. Its classification power decreased with time elapsed from the baseline interview, providing an indirect measure of its validity. These results are encouraging given the recent emphasis on models in the general adult population using rich data sets and their usefulness to develop precision treatment rules for individuals who attempt suicide.15,22,23

We found significant conceptual overlap of our most important risk factors with items commonly used in suicide risk scales. In accord with previous findings,41,42,43,44,45 the strongest risk factors of future suicide attempts were related to previous suicidal behaviors. For example, whether the individual felt at any point like they wanted to die is covered in the Patient Health Questionnaire-9,46 the Columbia Suicide Severity Rating Scale,47 and Beck’s Scale of Suicide Ideation.48 Previous suicide ideation is covered in Beck’s Scale of Suicide Ideation, the SAD PERSONS Scale,49 and the Suicide Assessment Scale,50 while previous suicide attempt is covered in the SAD PERSONS scale and the Columbia Suicide Severity Rating Scale. Feeling downhearted and depressed is covered in the Patient Health Questionnaire-946 (item 2) as well as the Beck Depression Inventory (item 1).51,52

Our results extend prior work by revealing the predictive value of variables related to functional impairment resulting from mental disorders, which are not generally covered in screening tools for suicide risk assessment. The questions identified from the impairment construct, including accomplishing less than you would like and not performing work or other activities as carefully as usual, are covered in the 12-Item Short Form Health Survey.53,54,55 These findings may offer new avenues to improve suicidal behavior prediction through functional assessment. We note that sex was not one of the most important variables, suggesting that sex differences in other risk factors are likely to mediate the difference in suicide attempt prevalence across sexes.7,56,57

Other important novel risk factors identified were related to socioeconomic disadvantage. Lower educational level and experiencing a financial crisis in the last year were among the 10 most important variables. Seeking to alleviate the economic and emotional effects of financial crises might be an important aspect of suicide risk prevention, particularly in the context of deaths of despair formulations of suicide risk.58,59 This is of particular contemporary relevance, given increased unemployment and economic stress in the US related to the coronavirus disease 2019 pandemic.60 Our study identifies an individual-level association between economic strain and suicide attempt risk that extends beyond the findings of previous studies, which showed a population-level association between economic recessions and increased suicide rates,61,62,63 a link between financial debt and suicide ideation,64 and case-control research linking unemployment and personal debt to suicide risk.65 Although this association has previously been reported in NESARC,66 our data-driven results highlight this risk factor as one of the most important for suicide attempt in the general population.

We incorporated technical advances in the modeling by using BRF to address the extreme class imbalance and by using the missing-indicator approach to address gate questions and skip patterns common to survey data. We ensured population-level generalizability by incorporating a complex survey design and sampling weights. The algorithms in this study may be useful for the analysis of other large survey data sets. Our methods may have wide applications given the NIH’s recent decision to link research samples to the National Death Index67 and the greater availability of longitudinal mortality outcomes for cross-sectional surveys.

Limitations

This study had some limitations. First, we only had data from participants who were 18 years and older, and some of the risk factors identified, such as financial crisis, might only be relevant to adult populations. Furthermore, suicide risk is highest for people aged 15 to 25 years.68 Second, we did not have information about suicide attempts among participants lost to follow-up (ie, wave 2 nonresponders, including participants who died of suicide), which would have enhanced our ability to detect differences between fatal and nonfatal suicide attempts. Nevertheless, we found lower rates of prior suicidal behaviors and ideation at wave 1 among wave 2 nonresponders, suggesting that selection bias related to suicide attempts is likely small (eTable 4 in the Supplement). Third, there is potential for misclassification of suicide attempt. The reliability of self-reported suicide attempt over such a long recall period may be uncertain and may be affected by participants’ willingness to disclose previous attempts in a face-to-face interview.69,70 However, the fact that our findings confirm previous risk factors adds validity to our results. Fourth, our study examined occurrence of suicide attempts within 3 years of assessment. Shorter and more clinically relevant time horizons should also be explored. Furthermore, the association between risk factors and future suicide attempts may vary over time. Fifth, the data were collected from 2001 to 2005, and there may have been recent secular changes in risk factors of suicide attempts. Sixth, the survey was not designed to study suicide, and some important covariates, such as stress and adjustment disorders, were not included. Furthermore, wave 1 suicide symptoms were only asked of participants who endorsed depressed mood or anhedonia. Given the important role this item assumed as a risk factor of future suicide attempt, an item that was asked of everyone might have increased the accuracy of the models.

Conclusions

Our study demonstrates the ability of machine learning methods to generate powerful and parsimonious suicide attempt models in general adult population samples that build on and complement knowledge derived from clinical and high-risk samples. We confirmed several well-known risk factors of suicide attempts, such as previous suicidal behaviors and depression, while identifying new important risks. Specifically, functional impairment and socioeconomic disadvantage emerged as novel important factors of suicide attempt in the general population, with lower educational level and recent financial crisis as individual-level risks of future suicide attempts. We hope that these results deepen our understanding of the etiology of suicide attempts in adults and improve suicidal behavior prediction by identifying new risk variables to guide clinical assessment and development of suicide risk scales.

Supplement.

eMethods.

eTable 1. Summary of response patterns across wave 1 National Epidemiologic Survey on Alcohol and Related Conditions survey sections

eFigure. Response plots for 10 most important variables from suicide attempt model using 2985 features collected from National Epidemiologic Survey on Alcohol and Related Conditions wave 1 responses colored by model-calculated risk group

eTable 2. Summary of the model-calculated risk scores based on 2985 features collected from National Epidemiologic Survey on Alcohol and Related Conditions wave 1 responses weighted to be representative of the US population broken down by year of suicide attempt (n = 222)

eTable 3. Summary of suicide attempts and model performance based on National Epidemiologic Survey on Alcohol and Related Conditions wave 1 demographic information

eTable 4. Comparison of suicide attempt related characteristics from National Epidemiologic Survey on Alcohol and Related Conditions wave 1 across overall National Epidemiologic Survey on Alcohol and Related Conditions wave 1 sample and wave 2 responders and nonresponders sample

eReferences.

References

  • 1. Hedegaard H, Curtin SC, Warner M. Suicide mortality in the United States, 1999–2017. Accessed June 25, 2020. https://www.cdc.gov/nchs/data/databriefs/db330-h.pdf
  • 2. Miller M, Azrael D, Hemenway D. The epidemiology of case fatality rates for suicide in the northeast. Ann Emerg Med. 2004;43(6):723-730. doi: 10.1016/j.annemergmed.2004.01.018
  • 3. Miller M, Azrael D, Barber C. Suicide mortality in the United States: the importance of attending to method in understanding population-level disparities in the burden of suicide. Annu Rev Public Health. 2012;33(1):393-408. doi: 10.1146/annurev-publhealth-031811-124636
  • 4. Conner A, Azrael D, Miller M. Suicide case-fatality rates in the United States, 2007 to 2014: a nationwide population-based study. Ann Intern Med. 2019;171(12):885-895. doi: 10.7326/M19-1324
  • 5. Bostwick JM, Pabbati C, Geske JR, McKean AJ. Suicide attempt as a risk factor for completed suicide: even more lethal than we knew. Am J Psychiatry. 2016;173(11):1094-1100. doi: 10.1176/appi.ajp.2016.15070854
  • 6. Anestis MD. Prior suicide attempts are less common in suicide decedents who died by firearms relative to those who died by other means. J Affect Disord. 2016;189:106-109. doi: 10.1016/j.jad.2015.09.007
  • 7. Olfson M, Blanco C, Wall M, et al. National trends in suicide attempts among adults in the United States. JAMA Psychiatry. 2017;74(11):1095-1103. doi: 10.1001/jamapsychiatry.2017.2582
  • 8. Franklin JC, Ribeiro JD, Fox KR, et al. Risk factors for suicidal thoughts and behaviors: a meta-analysis of 50 years of research. Psychol Bull. 2017;143(2):187-232. doi: 10.1037/bul0000084
  • 9. Walsh CG, Ribeiro JD, Franklin JC. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci. 2017;5(3):457-469. doi: 10.1177/2167702617691560
  • 10. Walsh CG, Ribeiro JD, Franklin JC. Predicting suicide attempts in adolescents with longitudinal clinical data and machine learning. J Child Psychol Psychiatry. 2018;59(12):1261-1270. doi: 10.1111/jcpp.12916
  • 11. Belsher BE, Smolenski DJ, Pruitt LD, et al. Prediction models for suicide attempts and deaths: a systematic review and simulation. JAMA Psychiatry. 2019;76(6):642-651. doi: 10.1001/jamapsychiatry.2019.0174
  • 12. Choi SB, Lee W, Yoon J-H, Won J-U, Kim DW. Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea. J Affect Disord. 2018;231:8-14. doi: 10.1016/j.jad.2018.01.019
  • 13. Torous J, Larsen ME, Depp C, et al. Smartphones, sensors, and machine learning to advance real-time prediction and interventions for suicide prevention: a review of current progress and next steps. Curr Psychiatry Rep. 2018;20(7):51. doi: 10.1007/s11920-018-0914-y
  • 14. Birjali M, Beni-Hssane A, Erritali M. Machine learning and semantic sentiment analysis based algorithms for suicide sentiment prediction in social networks. Procedia Comput Sci. 2017;113:65-72. doi: 10.1016/j.procs.2017.08.290
  • 15. Kessler RC, Bossarte RM, Luedtke A, Zaslavsky AM, Zubizarreta JR. Suicide prediction models: a critical review of recent research with recommendations for the way forward. Mol Psychiatry. 2020;25(1):168-179. doi: 10.1038/s41380-019-0531-0
  • 16. Kessler RC, Hwang I, Hoffmire CA, et al. Developing a practical suicide risk prediction model for targeting high-risk patients in the Veterans Health Administration. Int J Methods Psychiatr Res. 2017;26(3):e1575. doi: 10.1002/mpr.1575
  • 17. Houston K, Haw C, Townsend E, Hawton K. General practitioner contacts with patients before and after deliberate self harm. Br J Gen Pract. 2003;53(490):365-370.
  • 18. Suominen K, Isometsä E, Martunnen M, Ostamo A, Lönnqvist J. Health care contacts before and after attempted suicide among adolescent and young adult versus older suicide attempters. Psychol Med. 2004;34(2):313-321. doi: 10.1017/S0033291703008882
  • 19. Ahmedani BK, Simon GE, Stewart C, et al. Health care contacts in the year before suicide death. J Gen Intern Med. 2014;29(6):870-877. doi: 10.1007/s11606-014-2767-3
  • 20. Luoma JB, Martin CE, Pearson JL. Contact with mental health and primary care providers before suicide: a review of the evidence. Am J Psychiatry. 2002;159(6):909-916. doi: 10.1176/appi.ajp.159.6.909
  • 21. Schaffer A, Sinyor M, Kurdyak P, et al. Population-based analysis of health care contacts among suicide decedents: identifying opportunities for more targeted suicide prevention strategies. World Psychiatry. 2016;15(2):135-145. doi: 10.1002/wps.20321
  • 22. Kessler RC. Clinical epidemiological research on suicide-related behaviors-where we are and where we need to go. JAMA Psychiatry. 2019;76(8):777-778. doi: 10.1001/jamapsychiatry.2019.1238
  • 23. Gordon JA, Avenevoli S, Pearson JL. Suicide prevention research priorities in health care. JAMA Psychiatry. 2020;77(9):885-886. doi: 10.1001/jamapsychiatry.2020.1042
  • 24. Grant BF, Moore TC, Shepard J, Kaplan K. Source and Accuracy Statement: Wave 1 National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). National Institute on Alcohol Abuse and Alcoholism. 2003;52.
  • 25. Grant BF, Kaplan KK, Stinson FS. Source and accuracy statement: the wave 2 National Epidemiologic Survey on Alcohol and Related Conditions. Accessed June 25, 2020. https://ntrl.ntis.gov/NTRL/dashboard/searchResults/titleDetail/PB2008109530.xhtml
  • 26. Bureau of the Census. Demographic profile: 2000. Accessed June 25, 2020. https://www.census.gov/prod/cen2000/doc/ProfileTD.pdf
  • 27. Grant BF, Dawson DA, Stinson FS, Chou PS, Kay W, Pickering R. The Alcohol Use Disorder and Associated Disabilities Interview Schedule-IV (AUDADIS-IV): reliability of alcohol consumption, tobacco use, family history of depression and psychiatric diagnostic modules in a general population sample. Drug Alcohol Depend. 2003;71(1):7-16. doi: 10.1016/S0376-8716(03)00070-X
  • 28. Hasin DS, Greenstein E, Aivadyan C, et al. The Alcohol Use Disorder and Associated Disabilities Interview Schedule-5 (AUDADIS-5): procedural validity of substance use disorders modules through clinical re-appraisal in a general population sample. Drug Alcohol Depend. 2015;148:40-46. doi: 10.1016/j.drugalcdep.2014.12.011
  • 29. Canino G, Bravo M, Ramírez R, et al. The Spanish Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS): reliability and concordance with clinical diagnoses in a Hispanic population. J Stud Alcohol. 1999;60(6):790-799. doi: 10.15288/jsa.1999.60.790
  • 30. Chatterji S, Saunders JB, Vrasti R, Grant BF, Hasin D, Mager D. Reliability of the alcohol and drug modules of the Alcohol Use Disorder and Associated Disabilities Interview Schedule—Alcohol/Drug-Revised (AUDADIS-ADR): an international comparison. Drug Alcohol Depend. 1997;47(3):171-185. doi: 10.1016/S0376-8716(97)00088-4
  • 31. Hasin D, Carpenter KM, McCloud S, Smith M, Grant BF. The Alcohol Use Disorder and Associated Disabilities Interview Schedule (AUDADIS): reliability of alcohol and drug modules in a clinical sample. Drug Alcohol Depend. 1997;44(2-3):133-141. doi: 10.1016/S0376-8716(97)01332-X
  • 32. Grant BF, Goldstein RB, Chou SP, et al. Sociodemographic and psychopathologic predictors of first incidence of DSM-IV substance use, mood and anxiety disorders: results from the wave 2 National Epidemiologic Survey on Alcohol and Related Conditions. Mol Psychiatry. 2009;14(11):1051-1066. doi: 10.1038/mp.2008.41
  • 33. Huebner M, Vach W, le Cessie S. A systematic approach to initial data analysis is good research practice. J Thorac Cardiovasc Surg. 2016;151(1):25-27. doi: 10.1016/j.jtcvs.2015.09.085
  • 34. Groenwold RHH, White IR, Donders ART, Carpenter JR, Altman DG, Moons KGM. Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis. CMAJ. 2012;184(11):1265-1269. doi: 10.1503/cmaj.110977
  • 35. White IR, Thompson SG. Adjusting for partially missing baseline measurements in randomized trials. Stat Med. 2005;24(7):993-1007. doi: 10.1002/sim.1981
  • 36. Chen C, Liaw A, Breiman L. Using random forest to learn imbalanced data. Accessed June 25, 2020. https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf
  • 37. Khalilia M, Chakraborty S, Popescu M. Predicting disease risks from highly imbalanced data using random forest. BMC Med Inform Decis Mak. 2011;11(1):51. doi: 10.1186/1472-6947-11-51
  • 38. Poldrack RA, Huckins G, Varoquaux G. Establishment of best practices for evidence for prediction: a review. JAMA Psychiatry. 2020;77(5):534-540. doi: 10.1001/jamapsychiatry.2019.3671
  • 39. Breiman L. Random forests. Mach Learn. 2001;45(1):5-32. doi: 10.1023/A:1010933404324
  • 40. Wright MN, Ziegler A, König IR. Do little interactions get lost in dark random forests? BMC Bioinformatics. 2016;17(1):145. doi: 10.1186/s12859-016-0995-8
  • 41. Kuo C-J, Gunnell D, Chen C-C, Yip PS, Chen Y-Y. Suicide and non-suicide mortality after self-harm in Taipei City, Taiwan. Br J Psychiatry. 2012;200(5):405-411. doi: 10.1192/bjp.bp.111.099366
  • 42. Hawton K, Bergen H, Cooper J, et al. Suicide following self-harm: findings from the Multicentre Study of Self-Harm in England, 2000-2012. J Affect Disord. 2015;175:147-151. doi: 10.1016/j.jad.2014.12.062
  • 43. Fedyszyn IE, Erlangsen A, Hjorthøj C, Madsen T, Nordentoft M. Repeated suicide attempts and suicide among individuals with a first emergency department contact for attempted suicide: a prospective, nationwide, Danish register-based study. J Clin Psychiatry. 2016;77(6):832-840. doi: 10.4088/JCP.15m09793
  • 44. Ribeiro JD, Huang X, Fox KR, Franklin JC. Depression and hopelessness as risk factors for suicide ideation, attempts and death: meta-analysis of longitudinal studies. Br J Psychiatry. 2018;212(5):279-286. doi: 10.1192/bjp.2018.27
  • 45. Ribeiro JD, Franklin JC, Fox KR, et al. Self-injurious thoughts and behaviors as risk factors for future suicide ideation, attempts, and death: a meta-analysis of longitudinal studies. Psychol Med. 2016;46(2):225-236. doi: 10.1017/S0033291715001804
  • 46. Kroenke K, Spitzer RL. The PHQ-9: a new depression diagnostic and severity measure. Psychiatr Ann. 2002;32(9):509-515. doi: 10.3928/0048-5713-20020901-06
  • 47. Posner K, Brent D, Lucas C, et al. Columbia-Suicide Severity Rating Scale (C-SSRS). Columbia University Medical Center; 2008.
  • 48. Beck AT, Steer RA, Ranieri WF. Scale for suicide ideation: psychometric properties of a self-report version. J Clin Psychol. 1988;44(4):499-505.
  • 49. Patterson WM, Dohn HH, Bird J, Patterson GA. Evaluation of suicidal patients: the SAD PERSONS scale. Psychosomatics. 1983;24(4):343-345, 348-349. doi: 10.1016/S0033-3182(83)73213-5
  • 50. Niméus A, Alsén M, Träskman-Bendz L. The suicide assessment scale: an instrument assessing suicide risk of suicide attempters. Eur Psychiatry. 2000;15(7):416-423. doi: 10.1016/S0924-9338(00)00512-5
  • 51. Beck AT. Measuring depression: the depression inventory. In: Williams TA, Katz MM, Shield JA, eds. Recent Advances in the Psychobiology of the Depressive Illnesses. Government Printing Office; 1970:299-302.
  • 52. Beck AT, Steer RA, Carbin MG. Psychometric properties of the Beck Depression Inventory: twenty-five years of evaluation. Clin Psychol Rev. 1988;8(1):77-100. doi: 10.1016/0272-7358(88)90050-5
  • 53. McHorney CA, Ware JE Jr, Raczek AE. The MOS 36-Item Short-Form Health Survey (SF-36): II. psychometric and clinical tests of validity in measuring physical and mental health constructs. Med Care. 1993;31(3):247-263. doi: 10.1097/00005650-199303000-00006
  • 54. Vilagut G, Forero CG, Pinto-Meza A, et al; ESEMeD Investigators. The mental component of the Short-Form 12 Health Survey (SF-12) as a measure of depressive disorders in the general population: results with three alternative scoring methods. Value Health. 2013;16(4):564-573. doi: 10.1016/j.jval.2013.01.006
  • 55. Ware J Jr, Kosinski M, Keller SDA. A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care. 1996;34(3):220-233. doi: 10.1097/00005650-199603000-00003
  • 56. Hawton K. Sex and suicide. gender differences in suicidal behaviour. Br J Psychiatry. 2000;177(6):484-485. doi: 10.1192/bjp.177.6.484
  • 57. Skogman K, Alsén M, Öjehagen A. Sex differences in risk factors for suicide after attempted suicide: a follow-up study of 1052 suicide attempters. Soc Psychiatry Psychiatr Epidemiol. 2004;39(2):113-120. doi: 10.1007/s00127-004-0709-9
  • 58. Case A, Deaton A. Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century. Proc Natl Acad Sci U S A. 2015;112(49):15078-15083. doi: 10.1073/pnas.1518393112
  • 59. Kaufman JA, Salas-Hernández LK, Komro KA, Livingston MD. Effects of increased minimum wages by unemployment rate on suicide in the USA. J Epidemiol Community Health. 2020;74(3):219-224. doi: 10.1136/jech-2019-212981
  • 60. Reger MA, Stanley IH, Joiner TE. Suicide mortality and coronavirus disease 2019–a perfect storm? JAMA Psychiatry. 2020;77(11):1093-1094. doi: 10.1001/jamapsychiatry.2020.1060
  • 61. Reeves A, McKee M, Stuckler D. Economic suicides in the Great Recession in Europe and North America. Br J Psychiatry. 2014;205(3):246-247. doi: 10.1192/bjp.bp.114.144766
  • 62. Stuckler D, Basu S, Suhrcke M, Coutts A, McKee M. The public health effect of economic crises and alternative policy responses in Europe: an empirical analysis. Lancet. 2009;374(9686):315-323. doi: 10.1016/S0140-6736(09)61124-7
  • 63. Fountoulakis KN, Savopoulos C, Apostolopoulou M, et al. Rate of suicide and suicide attempts and their relationship to unemployment in Thessaloniki Greece (2000-2012). J Affect Disord. 2015;174:131-136. doi: 10.1016/j.jad.2014.11.047
  • 64. Meltzer H, Bebbington P, Brugha T, Jenkins R, McManus S, Dennis MS. Personal debt and suicidal ideation. Psychol Med. 2011;41(4):771-778. doi: 10.1017/S0033291710001261
  • 65. Chen EY, Chan WS, Wong PW, et al. Suicide in Hong Kong: a case-control psychological autopsy study. Psychol Med. 2006;36(6):815-825. doi: 10.1017/S0033291706007240
  • 66. Elbogen EB, Lanier M, Montgomery AE, Strickland S, Wagner HR, Tsai J. Financial strain and suicide attempts in a nationally representative sample of US adults. Am J Epidemiol. 2020;189(11):1266-1274. doi: 10.1093/aje/kwaa146
  • 67. Office of Behavioral and Social Sciences Research. Notice of information: National Death Index linkage access for NIH-supported investigators. Accessed April 7, 2020. https://grants.nih.gov/grants/guide/notice-files/NOT-OD-20-057.html
  • 68. Ivey-Stephenson AZ, Crosby AE, Jack SPD, Haileyesus T, Kresnow-Sedacca MJ. Suicide trends among and within urbanization levels by sex, race/ethnicity, age group, and mechanism of death—United States, 2001-2015. MMWR Surveill Summ. 2017;66(18):1-16. doi: 10.15585/mmwr.ss6618a1
  • 69. Mars B, Cornish R, Heron J, et al. Using data linkage to investigate inconsistent reporting of self-harm and questionnaire non-response. Arch Suicide Res. 2016;20(2):113-141. doi: 10.1080/13811118.2015.1033121
  • 70. Hom MA, Stanley IH, Duffy ME, et al. Investigating the reliability of suicide attempt history reporting across five measures: a study of US military service members at risk of suicide. J Clin Psychol. 2019;75(7):1332-1349. doi: 10.1002/jclp.22776

Articles from JAMA Psychiatry are provided here courtesy of American Medical Association