Prediction models for high risk of suicide in Korean adolescents using machine learning techniques

Jun Su Jung; Sung Jin Park; Eun Young Kim; Kyoung-Sae Na; Young Jae Kim; Kwang Gi Kim

doi:10.1371/journal.pone.0217639

. 2019 Jun 6;14(6):e0217639. doi: 10.1371/journal.pone.0217639

Prediction models for high risk of suicide in Korean adolescents using machine learning techniques

Jun Su Jung ^1,^#, Sung Jin Park ^2,^#, Eun Young Kim ^3,^4,^‡,^*, Kyoung-Sae Na ⁵, Young Jae Kim ², Kwang Gi Kim ^2,^‡,^*

Editor: Vincenzo De Luca⁶

PMCID: PMC6553749 PMID: 31170212

Abstract

Objective

Suicide in adolescents is a major problem worldwide and previous history of suicide ideation and attempt represents the strongest predictors of future suicidal behavior. The aim of this study was to develop prediction model to identify Korean adolescents of high risk suicide (= who have history of suicide ideation/attempt in previous year) using machine learning techniques.

Methods

A nationally representative dataset of Korea Youth Risk Behavior Web-based Survey (KYRBWS) was used (n = 59,984 of middle and high school students in 2017). The classification process was performed using machine learning techniques such as logistic regression (LR), random forest (RF), support vector machine (SVM), artificial neural network (ANN), and extreme gradient boosting (XGB).

Results

A total of 7,443 adolescents (12.4%) had a previous history of suicidal ideation/attempt. In the multivariable analysis, sadness (odds ratio [OR], 6.41; 95% confidence interval [95% CI], 6.08–6.87), violence (OR, 2.32; 95% CI, 2.01–2.67), substance use (OR, 1.93; 95% CI, 1.52–2.45), and stress (OR, 1.63; 95% CI, 1.40–1.86) were associated factors. Taking into account 26 variables as predictors, the accuracy of models of machine learning techniques to predict the high-risk suicidal was comparable with that of LR; the accuracy was best in XGB (79.0%), followed by SVM (78.7%), LR (77.9%), RF (77.8%), and ANN (77.5%).

Conclusions

The machine leaning techniques showed comparable performance with LR to classify adolescents who have previous history of suicidal ideation/attempt. This model will hopefully serve as a foundation for decreasing future suicides as it enables early identification of adolescents at risk of suicide and modification of risk factors.

Introduction

In South Korea, suicide in adolescents has been emerging as a major public health problem. The suicide rate has increased annually in adolescents and is recorded as not only one of the highest, but also the most rapidly increasing feature among Organization for Economic Cooperation and Development (OECD) countries.

Although several studies have identified risk factors of suicide [1–5], a recent meta-analysis reveals that the ability to predict suicide behaviors have remained limited [6]. New application of machine learning techniques are gaining attention to identify suicide risk at various clinical setting [7]; Passos et al. classified individuals with a history of suicide attempt among patients with mood disorders based on demographic and clinical data [8]. Oh et al. distinguished suicide attempters from non-suicide attempters among patients with depression or anxiety disorders, applying ANN to multiple psychiatric scales and sociodemographic data [5]. Using general characteristics and insurance data from the National Health Insurance Service cohort in Korea, one recent study analyzed the probability of death by suicide [9].

Since the presence of previous suicide ideation/attempt represent one of the strongest predictors of future suicide behavior and death by suicide [6], it is important to identify adolescents who have history of previous suicide ideation/attempt. Herein, the purpose of this study was to establish prediction models for high-risk of suicide in Korean adolescents using machine learning techniques.

Materials and methods

Data collection and preparation

Data used in this study was brought from the Korean Young Risk Behavior Web-based Survey (KYRBWS) XIII in 2017. The KYRBWS is a self-administered online survey and it was approved by the Institutional Review Board (Certificate Number: 11758) of the Korea Centers for Disease Control and Prevention (KCDC).

This survey intends to grasp South Korean adolescents’ health-risk behaviors such as smoking, alcohol use, obesity, physical activity, eating habits, injury prevention, mental health, sexual behaviors, oral health, allergic disorders, personal hygiene, internet addiction, and health equity. Participants were provided with identification numbers and were guaranteed anonymity, and all participants completed an online, self-reported questionnaire in a school computer room after the survey had been fully explained. All data used in this study have been fully anonymized before we accessed them. All procedures and terms and conditions of the survey have been complied with were performed in accordance with the Declaration of Helsinki 7th version and informed consent was obtained from all participants. The test–retest reliability of the KYRBWS questionnaire has been reported to be stable [10]. The dataset and questionnaire is provided with guidelines for calculating a health-related index through the KCDC online site (http://www.cdc.go.kr/CDC/eng/main.jsp).

In 2017, the KYRBWS dataset included a total 62,276 adolescents from 799 middle and high schools (response rate: 95.8%), using a complex sampling design which involves stratification, clustering, and multistage sampling.

Suicide

High risk of suicide, as a dependent variable, was categorized as adolescents who had either suicidal ideation or suicidal attempt in previous year. Suicidal ideation was defined as a yes response to the question, “Did you consider suicide in the last 12 months?” and suicidal attempt was defined as a yes response to the question, “Did you attempt suicide in the last 12 months?” The respondents who experienced either suicidal ideation or suicidal attempt were categorized within the high risk of suicide group.

Independent variables

Independent variables included socio-demographic variables (sex, grade, city type, academic achievement, family structure, family socioeconomic status, and education level of father and mother), health-related lifestyle factors (current smoking, current alcohol consumption, substance use, physical activity, obesity, sexual experience, and internet addiction), and psychological stress factors (sadness, stress, self-rated health, sleep satisfaction, self-rated weight, distorted weight perception, school injury, and violence). Comorbidities included asthma, allergic rhinitis, and atopic dermatitis.

School grade was divided as middle school (Grades 1–3, corresponding age 12–15 years) and high school (Grades 4–6, corresponding age 16–18 years). City type was categorized as big cities, small and medium-sized cities, and countryside. Academic achievement was categorized as high, high middle, middle, low middle, and low. Family structure was categorized as having both parents, having either parent, and neither parent. Family socioeconomic status (SES) was categorized as high, high middle, middle, low middle, and low. Education level of father and mother was categorized as unknown, middle school graduate or less, high school graduate, and college or graduate degree.

Current smoking, current alcohol consumption, and substance use were defined as a yes response to the questions: “Did you smoke or drink alcohol more than once within the last 30 days?” and “Have you ever used any substance or sniffed glue or butane habitually on purpose?”

Physical activity was categorized as “active” (vigorous physical activities more than two days among the last seven days) or “inactive.” Vigorous physical activities were defined as those that make one sweat or feel breathless for 20 minutes or more in the questionnaire.

Body mass index (BMI) was calculated based on the self-reported height and weight, and was categorized as underweight (≤ 5^th percentile), normal (5-85^th percentile), overweight (85-95^th percentile), and obesity (≥ 95^th percentile or BMI ≥ 25 kg/m²). Self-rated weight was categorized as very fat, fat, normal, thin, and very thin. Distorted weight perception was defined when respondents answered “very fat” or “fat” for the self-rated weight question, while his or her actual weight was categorized as underweight or normal.

Information regarding sexual experience, school injury, and internet addiction was also collected. For sadness, the adolescents were asked, “In the last 12 months, has a feeling of sadness interrupted your daily activities for at least two weeks?” In addition, stress, self-rated health, and sleep satisfaction were categorized in five levels by the extent of these symptoms.

Models to predict high risk of suicide

To prevent learning bias resulting from an imbalanced dataset (the proportion of the non-suicide group was about 7 times larger than the suicide group in the entire dataset), a balanced dataset (same number of age- and sex-matched non-suicide group for the suicide group, n = 7,647 for each group) was selected from preprocessed data in terms of down-sampling (Fig 1). To prevent overfitting, the preprocessed dataset was split in five equally-sized random groups using a 5-fold cross validation. One group was used as the test set and the other groups were used as the training sets for the machine learning prediction models. Five machine learning methods were trained: logistic regression (LR), random forest (RF), support vector machine (SVM), artificial neural network (ANN), and extreme gradient boosting (XGB). Optimal parameters for each machine learning method were selected through a grid search (Table 1). The variables used in the model were categorical; hence, a 0 or 1 value was applied by one-hot encoding.

Table 1. Optimal parameters for each machine learning model are selected through the grid search.

Model	Optimal parameters
LR	Penalty: ‘l2,’ C: 0.1
SVM	C: 0.1, gamma: 0.01, kernel: ‘rbf’
RF	n_estimators: 3000, max depth: 5, min samples leaf: 4, min samples split: 10
ANN	Optimizer: ‘Adam’, learning rate: 0.0001, batch size: 200, epoch: 60
XGB	n_estimators: 5000, learning rate: 0.05, colsample bytree: 0.3, max depth: 4, gamma: 1, lambda: 0.5, alpha: 0.5

Open in a new tab

A comparison of LR and other machine learning discriminations for each model was performed, in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy to predict adolescents who had a history of suicidal ideation or attempt. For test dataset, the area under the receiver operating characteristic curve (AUC) for each model was also calculated to evaluate general prediction performance.

Statistical analysis

Results are presented as percentages for categorical variables and as means (± standard deviation) for continuous variables. Categorical variables and continuous variables were compared using the chi-square test or the Student’s t-test for comparisons between adolescents with/without risk of suicide. Multivariate regression analysis was used to identify factors associated with previous suicidal ideation or attempt using the backward stepwise selection method.

The analysis and machine learning models and diagnostic performance was evaluated using the open-source statistical software Python version 3.6.0. P-values of less than 0.05 (two-sided) were considered significant.

Results

The clinical characteristics for a total of 59,984 subjects with valid information regarding previous history of suicidal ideation/attempt are summarized in Table 2. The high risk suicide group showed higher proportions of girl, low school grade, low academic achievement, those not living with both parents, low family SES, low parental education level, current smoking, current alcohol drinking, substance use, inactive physical activity, sexual experience, internet addiction, sadness, high stress, poor self-rated health, low sleep satisfaction, high self-rated weight, distorted weight perception, experience of school injury and violence, and presence of comorbid diseases (asthma, allergic rhinitis, atopic dermatitis).

Table 2. Characteristics of high-risk suicide (n = 7,443) and no high-risk suicide (n = 52,541).

	no high-risk suicide (n = 52,541)	high-risk suicide (n = 7,443)	P value^*
Sex, boy	27493 (52.3%)	2891 (38.8%)	<0.001
Age (yrs.)	15.0±1.7	15.0±1.8	0.695
School			<0.001
Middle school	25876 (49.2%)	3869 (52.0%)
High school	26665 (50.8%)	3574 (48.0%)
School grade			<0.001
G1	8800 (16.7%)	1039 (14.0%)
G2	8520 (16.2%)	1435 (19.3%)
G3	8556 (16.3%)	1395 (18.7%)
G4	8760 (16.7%)	1057 (14.2%)
G5	9071 (17.3%)	1327 (17.8%)
G6	8834 (16.8%)	1190 (16.0%)
City type			0.654
countryside	4094 (7.8%)	563 (7.6%)
small/medium-sized cities	25154 (47.9%)	3545 (47.6%)
big cities	23293 (44.3%)	3335 (44.8%)
Academic achievement			<0.001
high	7221 (13.7%)	878 (11.8%)
high middle	13830 (26.3%)	1632 (21.9%)
middle	15286 (29.1%)	1926 (25.9%)
low middle	11439 (21.8%)	1922 (25.8%)
low	4765 (9.1%)	1085 (14.6%)
Family structure			<0.001
live with both parents	43647 (83.1%)	5740 (77.1%)
live with one parent	4955 (9.4%)	938 (12.6%)
neither parent	3939 (7.5%)	765 (10.3%)
Family SES			<0.001
high	5663 (10.8%)	700 (9.4%)
high middle	15518 (29.5%)	1987 (26.7%)
middle	24500 (46.6%)	3103 (41.7%)
low middle	5790 (11.0%)	1260 (16.9%)
low	1070 (2.0%)	393 (5.3%)
Education, father			<0.001
unknown	11405 (21.7%)	1614 (21.7%)
middle school graduate or less	932 (1.8%)	190 (2.6%)
high school graduate	13488 (25.7%)	1902 (25.6%)
college or graduate degree	26716 (50.8%)	3737 (50.2%)
Education, mother			<0.001
unknown	10695 (20.4%)	1515 (20.4%)
middle school graduate or less	779 (1.5%)	175 (2.4%)
high school graduate	16530 (31.5%)	2254 (30.3%)
college or graduate degree	24537 (46.7%)	3499 (47.0%)
Current smoking (yes)	2748 (5.2%)	753 (10.1%)	<0.001
Current alcohol drinking (yes)	7474 (14.2%)	1659 (22.3%)	<0.001
Substance use (yes)	303 (0.6%)	196 (2.6%)	<0.001
Physical activity			<0.001
active	20243 (38.5%)	2689 (36.1%)
inactive	32298 (61.5%)	4754 (63.9%)
Body mass index (kg/m²)	21.1±3.4	21.2±3.4	0.033
Obesity			0.228
underweight	4088 (7.8%)	621 (8.3%)
normal	39827 (75.8%)	5648 (75.9%)
overweight	1326 (2.5%)	178 (2.4%)
obesity	7300 (13.9%)	996 (13.4%)
Sexual experience (yes)	2128 (4.1%)	616 (8.3%)	<0.001
Internet addiction (yes)	1766 (3.4%)	563 (7.6%)	<0.001
Sadness (yes)	9548 (18.2%)	5389 (72.4%)	<0.001
Stress			<0.001
very high	3648 (6.9%)	2545 (34.2%)
high	13050 (24.8%)	3086 (41.5%)
middle	23913 (45.5%)	1507 (20.2%)
low	9621 (18.3%)	227 (3.0%)
very low	2309 (4.4%)	78 (1.0%)
Self-rated health			<0.001
very good	15525 (29.5%)	1204 (16.2%)
good	23892 (45.5%)	2695 (36.2%)
normal	10561 (20.1%)	2336 (31.4%)
poor	2438 (4.6%)	1081 (14.5%)
very poor	125 (0.2%)	127 (1.7%)
Sleep satisfaction			<0.001
very high	4570 (8.7%)	323 (4.3%)
high	9869 (18.8%)	755 (10.1%)
middle	17375 (33.1%)	1975 (26.5%)
low	14392 (27.4%)	2428 (32.6%)
very low	6335 (12.1%)	1962 (26.4%)
Self-rated weight			<0.001
very thin	2144 (4.1%)	351 (4.7%)
thin	11176 (21.3%)	1409 (18.9%)
normal	19141 (36.4%)	2281 (30.6%)
fat	16914 (32.2%)	2667 (35.8%)
very fat	3166 (6.0%)	735 (9.9%)
Distorted weight perception (yes)	16701 (31.8%)	2867 (38.5%)	<0.001
School injury (yes)	12105 (23.0%)	2382 (32.0%)	<0.001
Violence (yes)	893 (1.7%)	529 (7.1%)	<0.001
Asthma (yes)	4343 (8.3%)	827 (11.1%)	<0.001
Allergic rhinitis (yes)	18073 (34.4%)	2906 (39.0%)	<0.001
Atopic dermatitis (yes)	12839 (24.4%)	2152 (28.9%)	<0.001

Open in a new tab

Note. Values are means ± standard deviation, median (range), or number (percentages).

*Chi-squared test or Student's t test.

SES: socio-economic status

A multivariate regression analysis was performed to identify factors associated with high risk of suicide (Table 3). Sadness (odds ratio [OR], 6.41; 95% confidence interval [95% CI], 6.08–6.87), violence (OR, 2.32; 95% CI, 2.01–2.67), substance use (OR, 1.93; 95% CI, 1.52–2.45), and stress (OR, 1.63; 95% CI, 1.40–1.86) showed relatively strong associations with previous suicide ideation/attempt. There were other factors that showed associations with suicide: girl sex, grade, academic achievement, family structure, family SES, parental education level, current smoking, current alcohol drinking, physical activity, overweight, self-rated health, sleep satisfaction, sexual experience, school injury, and violence.

Table 3. Multivariate logistic regression analysis to identify factors associated with high risk of suicide.

	Adjusted OR	(95% CI)	P value
Sex
boy	Reference
girl	1.250	1.174 to 1.330	<0.001
School grade
G1	Reference
G2	0.911	0.829 to 1.000	0.051
G3	0.767	0.697 to 0.844	<0.001
G4	0.531	0.479 to 0.590	<0.001
G5	0.532	0.480 to 0.589	<0.001
G6	0.447	0.403 to 0.497	<0.001
City type
countryside	Reference
small/medium-sized cities	0.655	0.595 to 0.720	<0.001
big cities	0.674	0.612 to 0.741	<0.001
Academic achievement
high	Reference
high middle	0.766	0.695 to 0.844	<0.001
middle	0.801	0.728 to 0.882	<0.001
low middle	0.909	0.824 to 1.004	0.059
low	0.859	0.765 to 0.964	0.010
Family structure
live with both parents	Reference
live with one parent	1.116	1.020 to 1.222	0.017
neither	1.081	0.971 to 1.204	0.155
Family SES
high	Reference
high middle	0.822	0.743 to 0.909	<0.001
middle	0.804	0.728 to 0.882	<0.001
low middle	1.023	0.910 to 1.152	0.699
low	1.094	0.920 to 1.300	0.308
Education, father
unknown	0.844	0.767 to 0.929	0.001
middle school graduate or less	1.003	0.817 to 1.230	0.981
high school graduate	0.967	0.894 to 1.046	0.406
college or graduate degree	Reference
Education, mother
unknown	0.911	0.827 to 1.005	0.062
middle school graduate or less	1.075	0.868 to 1.331	0.508
high school graduate	0.852	0.790 to 0.918	<0.001
college or graduate degree	Reference
Current smoking (yes)	1.235	1.097 to 1.391	<0.001
Current alcohol drinking (yes)	1.184	1.093 to 1.282	<0.001
Substance use (yes)	1.932	1.523 to 2.450	<0.001
Physical activity
active	Reference
inactive	0.879	0.827 to 0.935	<0.001
Obesity
normal	Reference
underweight	1.089	0.980 to 1.210	0.113
overweight	0.767	0.636 to 0.924	0.005
obesity	0.937	0.862 to 1.018	0.122
Sexual experience (yes)	1.193	1.054 to 1.351	0.005
Internet addiction (yes)	1.230	0.911 to 1.660	0.177
Sadness (yes)	6.464	6.083 to 6.868	<0.001
Stress
very high	1.626	1.398 to 1.892	<0.001
high	0.843	0.729 to 0.975	0.021
middle	0.360	0.311 to 0.416	<0.001
low	0.182	0.151 to 0.218	<0.001
very low	Reference
Self-rated health
very good	Reference
good	1.116	1.030 to 1.208	0.007
normal	1.537	1.409 to 1.677	<0.001
poor	2.009	1.794 to 2.249	<0.001
very poor	2.901	2.131 to 3.950	<0.001
Sleep satisfaction
very high	Reference
high	0.690	0.604 to 0.788	<0.001
middle	0.777	0.688 to 0.877	<0.001
low	0.851	0.753 to 0.961	0.010
very low	0.883	0.776 to 1.005	0.059
Self-rated weight
very thin	Reference
thin	0.446	0.395 to 0.503	<0.001
normal	0.426	0.380 to 0.478	<0.001
fat	0.501	0.421 to 0.598	<0.001
very fat	0.578	0.476 to 0.703	<0.001
Distorted weight perception (yes)	0.967	0.828 to 1.129	0.671
School injury (yes)	1.078	1.012 to 1.148	0.020
Violence (yes)	2.317	2.014 to 2.666	<0.001
Asthma (yes)	1.022	0.929 to 1.124	0.653
Allergic rhinitis (yes)	0.988	0.930 to 1.050	0.702
Atopic dermatitis (yes)	1.053	0.987 to 1.122	0.117

Open in a new tab

Note. SES: socio-economic status

For the test dataset, the confusion matrix and receiver operating characteristic (ROC) curve show that the diagnostic performance of machine learning techniques are comparable with that of the LR result (Table 4 and Fig 2). XGB showed the best performance, with a sensitivity of 78.5%, specificity of 79.4%, PPV of 79.2%, NPV of 78.7%, classification accuracy of 79.0%, and AUC of 0.863.

Table 4. Confusion matrix for prediction models (Test set).

	Model	Sensitivity	Specificity	PPV	NPV	Accuracy	AUC
Test set	LR	78.2%	77.6%	77.7%	78.0%	77.9%	0.851
	SVM	78.4%	78.9%	78.8%	78.5%	78.7%	0.853
	RF	77.5%	78.0%	77.9%	77.6%	77.8%	0.857
	ANN	77.3%	77.8%	77.7%	77.4%	77.5%	0.851
	XGB	78.5%	79.4%	79.2%	78.7%	79.0%	0.863

Open in a new tab

Note. LR: logistic regression; SVM: support vector machine; RF: random forest; ANN: artificial neural network; XGB: extreme gradient boosting; PPV: positive predictive value; NPV: negative predictive value; AUC: area under ROC curve

Discussion

Machine learning techniques offer promise to improve risk prediction for suicide. A systematic review revealed greater prediction accuracy of self-injurious thoughts and behaviors than in previous studies using traditional statistical methods [7].

Machine learning techniques have advantages beyond traditional statistical approaches in psychological research [11]. For example, traditional approaches greatly minimize the number of variables and impose linearity on relationships that likely have more complex associations. On the other hand, machine learning approaches enable the simultaneous testing of numerous variables and their complex interactions and allow for non-linearity in producing predictive models [11].

The purpose of this study was to develop models to determine adolescent at risk of suicide using nationally representative survey dataset in Korea by using machine learning methods. In this study, we applied the LR method and several other machine learning algorithms, and XGB showed the best performance in the test dataset with an accuracy of 79.0% (AUC = 0.863). XGB, one of the machine learning techniques, is highly efficient and flexible and can be easily used on distributed platforms for further computational efficiency [12]. Ensemble learning is possible by attaching another algorithm to XGB. Future studies would possibly show a better performance if XGB is combined with various algorithms rather than a single algorithm model.

However, the machine leaning techniques showed an overall comparable diagnostic performance with LR. The main reason might be due to the type of dataset used in the present study. The KYRBWS survey data are composed of general health-risk behaviors and we arbitrarily select 26 categorical variables to develop prediction models. Further study is warranted to explore the increasing accuracy using latent variables.

The present study has several limitations. First, the KYRBWS was developed to cover general health-risk behaviors including psychological status and previous suicidal behavior, which were examined by simple questions and scales. If the survey had been composed of more detailed questions regarding suicide behavior or psychological status, the performance of models might have improved. Second, this model was developed using the KYRBWS dataset, it does not guarantee the same diagnostic performance with other datasets or populations. In the present study, we used pairing cross validation for imbalance outcome to avoid the problem of “limited generalization” or “overfitting.” Nevertheless, despite these limitations, this is the first study to adopt machine learning techniques to a nationally representative, and large number (n = 59,984) of Korean adolescents.

In conclusion, this study showed that machine learning techniques have the potential to identify Korean adolescents at risk of suicide using nationally representative survey dataset of general health-risk behaviors. Several machine learning models have comparable performance with the conventional LR method, which have potential for development. Establishment of accurate prediction models through additional studies would facilitate early screening of high risk adolescents and correction of modifiable risk factors, so that society can prevent future suicidal behavior and death by suicide.

Data Availability

These analyses were performed using data from the Korean Young Risk Behavior Web-based Survey (KYRBWS) link: http://www.cdc.go.kr/CDC/eng/main.jsp). The code used in this study has been uploaded to the following GithHub link: https://github.com/gcubme/PredictSuicide.git.

Funding Statement

This research was supported by MD-PhD research through the Korea Research-Driven Hospital (grant 2018-5287) and Future Innovation Challenges through the Gachon University (grant 2018-0669).

References

1.Byeon KH, Kimm H, Jee SH, Sull JW, Choi B. Relationship between heavy drinking experience and suicide attempts in Korean adolescents: Based on 2013 The Korea Youth Risk Behavior Web-based Survey. Epidemiology and health. 2018:e2018046 Epub 2018/10/20. 10.4178/epih.e2018046 . [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Choi SB, Lee W, Yoon JH, Won JU, Kim DW. Risk factors of suicide attempt among people with suicidal ideation in South Korea: a cross-sectional study. BMC public health. 2017;17(1):579 Epub 2017/06/18. 10.1186/s12889-017-4491-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Jordan P, Shedden-Mora MC, Lowe B. Predicting suicidal ideation in primary care: An approach to identify easily assessable key variables. General hospital psychiatry. 2018;51:106–11. Epub 2018/02/13. 10.1016/j.genhosppsych.2018.02.002 . [DOI] [PubMed] [Google Scholar]
4.Kim YJ, Moon SS, Lee JH, Kim JK. Risk Factors and Mediators of Suicidal Ideation Among Korean Adolescents. Crisis. 2018;39(1):4–12. Epub 2016/11/22. 10.1027/0227-5910/a000438 . [DOI] [PubMed] [Google Scholar]
5.Oh J, Yun K, Hwang JH, Chae JH. Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales. Frontiers in psychiatry. 2017;8:192 Epub 2017/10/19. 10.3389/fpsyt.2017.00192 [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Franklin JC, Ribeiro JD, Fox KR, Bentley KH, Kleiman EM, Huang X, et al. Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychol Bull. 2017;143(2):187–232. 10.1037/bul0000084 . [DOI] [PubMed] [Google Scholar]
7.Burke TA, Ammerman BA, Jacobucci R. The use of machine learning in the study of suicidal and non-suicidal self-injurious thoughts and behaviors: A systematic review. Journal of affective disorders. 2019;245:869–84. 10.1016/j.jad.2018.11.073 . [DOI] [PubMed] [Google Scholar]
8.Passos IC, Mwangi B, Cao B, Hamilton JE, Wu MJ, Zhang XY, et al. Identifying a clinical signature of suicidality among patients with mood disorders: A pilot study using a machine learning approach. Journal of affective disorders. 2016;193:109–16. Epub 2016/01/17. 10.1016/j.jad.2015.12.066 [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Choi SB, Lee W, Yoon JH, Won JU, Kim DW. Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea. J Affect Disord. 2018;231:8–14. Epub 2018/02/07. 10.1016/j.jad.2018.01.019 . [DOI] [PubMed] [Google Scholar]
10.Bae J, Joung H, Kim JY, Kwon KN, Kim YT, Park SW. Test-retest reliability of a questionnaire for the Korea Youth Risk Behavior Web-based Survey. J Prev Med Public Health. 2010;43(5):403–10. 10.3961/jpmph.2010.43.5.403 . [DOI] [PubMed] [Google Scholar]
11.McArdle JJ, Ritschard G. Contemporary Issues In Exploratory Data Mining in the Behavior Sciences.2014. [Google Scholar]
12.Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Statist. 2001;29(5):1189–232. 10.1214/aos/1013203451 [DOI] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

[pone.0217639.ref001] 1.Byeon KH, Kimm H, Jee SH, Sull JW, Choi B. Relationship between heavy drinking experience and suicide attempts in Korean adolescents: Based on 2013 The Korea Youth Risk Behavior Web-based Survey. Epidemiology and health. 2018:e2018046 Epub 2018/10/20. 10.4178/epih.e2018046 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0217639.ref002] 2.Choi SB, Lee W, Yoon JH, Won JU, Kim DW. Risk factors of suicide attempt among people with suicidal ideation in South Korea: a cross-sectional study. BMC public health. 2017;17(1):579 Epub 2017/06/18. 10.1186/s12889-017-4491-5 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0217639.ref003] 3.Jordan P, Shedden-Mora MC, Lowe B. Predicting suicidal ideation in primary care: An approach to identify easily assessable key variables. General hospital psychiatry. 2018;51:106–11. Epub 2018/02/13. 10.1016/j.genhosppsych.2018.02.002 . [DOI] [PubMed] [Google Scholar]

[pone.0217639.ref004] 4.Kim YJ, Moon SS, Lee JH, Kim JK. Risk Factors and Mediators of Suicidal Ideation Among Korean Adolescents. Crisis. 2018;39(1):4–12. Epub 2016/11/22. 10.1027/0227-5910/a000438 . [DOI] [PubMed] [Google Scholar]

[pone.0217639.ref005] 5.Oh J, Yun K, Hwang JH, Chae JH. Classification of Suicide Attempts through a Machine Learning Algorithm Based on Multiple Systemic Psychiatric Scales. Frontiers in psychiatry. 2017;8:192 Epub 2017/10/19. 10.3389/fpsyt.2017.00192 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0217639.ref006] 6.Franklin JC, Ribeiro JD, Fox KR, Bentley KH, Kleiman EM, Huang X, et al. Risk factors for suicidal thoughts and behaviors: A meta-analysis of 50 years of research. Psychol Bull. 2017;143(2):187–232. 10.1037/bul0000084 . [DOI] [PubMed] [Google Scholar]

[pone.0217639.ref007] 7.Burke TA, Ammerman BA, Jacobucci R. The use of machine learning in the study of suicidal and non-suicidal self-injurious thoughts and behaviors: A systematic review. Journal of affective disorders. 2019;245:869–84. 10.1016/j.jad.2018.11.073 . [DOI] [PubMed] [Google Scholar]

[pone.0217639.ref008] 8.Passos IC, Mwangi B, Cao B, Hamilton JE, Wu MJ, Zhang XY, et al. Identifying a clinical signature of suicidality among patients with mood disorders: A pilot study using a machine learning approach. Journal of affective disorders. 2016;193:109–16. Epub 2016/01/17. 10.1016/j.jad.2015.12.066 [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0217639.ref009] 9.Choi SB, Lee W, Yoon JH, Won JU, Kim DW. Ten-year prediction of suicide death using Cox regression and machine learning in a nationwide retrospective cohort study in South Korea. J Affect Disord. 2018;231:8–14. Epub 2018/02/07. 10.1016/j.jad.2018.01.019 . [DOI] [PubMed] [Google Scholar]

[pone.0217639.ref010] 10.Bae J, Joung H, Kim JY, Kwon KN, Kim YT, Park SW. Test-retest reliability of a questionnaire for the Korea Youth Risk Behavior Web-based Survey. J Prev Med Public Health. 2010;43(5):403–10. 10.3961/jpmph.2010.43.5.403 . [DOI] [PubMed] [Google Scholar]

[pone.0217639.ref011] 11.McArdle JJ, Ritschard G. Contemporary Issues In Exploratory Data Mining in the Behavior Sciences.2014. [Google Scholar]

[pone.0217639.ref012] 12.Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Statist. 2001;29(5):1189–232. 10.1214/aos/1013203451 [DOI] [Google Scholar]

PERMALINK

Prediction models for high risk of suicide in Korean adolescents using machine learning techniques

Jun Su Jung

Sung Jin Park

Eun Young Kim

Kyoung-Sae Na

Young Jae Kim

Kwang Gi Kim

Roles

Abstract

Objective

Methods

Results

Conclusions

Introduction

Materials and methods

Data collection and preparation

Suicide

Independent variables

Models to predict high risk of suicide

Fig 1. Scheme prediction model development.

Table 1. Optimal parameters for each machine learning model are selected through the grid search.

Statistical analysis

Results

Table 2. Characteristics of high-risk suicide (n = 7,443) and no high-risk suicide (n = 52,541).

Table 3. Multivariate logistic regression analysis to identify factors associated with high risk of suicide.

Table 4. Confusion matrix for prediction models (Test set).

Fig 2. Receiver operating characteristic (ROC) curve.

Discussion

Data Availability

Funding Statement

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Cite

Add to Collections

PERMALINK

Prediction models for high risk of suicide in Korean adolescents using machine learning techniques

Jun Su Jung

Sung Jin Park

Eun Young Kim

Kyoung-Sae Na

Young Jae Kim

Kwang Gi Kim

Roles

Abstract

Objective

Methods

Results

Conclusions

Introduction

Materials and methods

Data collection and preparation

Suicide

Independent variables

Models to predict high risk of suicide

Fig 1. Scheme prediction model development.

Table 1. Optimal parameters for each machine learning model are selected through the grid search.

Statistical analysis

Results

Table 2. Characteristics of high-risk suicide (n = 7,443) and no high-risk suicide (n = 52,541).

Table 3. Multivariate logistic regression analysis to identify factors associated with high risk of suicide.

Table 4. Confusion matrix for prediction models (Test set).

Fig 2. Receiver operating characteristic (ROC) curve.

Discussion

Data Availability

Funding Statement

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases