Abstract
Machine learning (ML) techniques have been widely used to address mental health questions. We discuss two main aspects of ML in psychiatry in this paper, that is, supervised learning and unsupervised learning. Examples are used to illustrate how ML has been implemented in recent mental health research.
Keywords: models, statistical; psychiatry
Introduction
The development of new technologies has significantly changed research and treatment methods in psychiatry. Technologies such as social media, smartphones and wearable devices enable psychiatric clinicians and researchers to collect a wide range of data on subjects or patients within a relatively short period of time, to monitor their psychological status,1 and to offer more accurate and personalised treatments. While enjoying the convenience these technologies bring, we face the challenge of analysing the large data sets they generate and of making good predictions of outcomes for a new subject. Whereas traditional statistical methods try to find a good fit to the data in order to interpret the association between the outcome and potential features, medical researchers and clinicians are often more interested in predicting treatment choices (for example, the dosage of a drug) and treatment outcomes (eg, 5-year survival probability) given a comprehensive measurement of a patient's features.
Machine learning (ML) combines advanced statistical methods with computer science techniques and is now widely used to analyse ‘big data’.2 The common types of ML techniques used in the psychiatric field include supervised learning (SL) and unsupervised learning (USL).3
SL is used for data with a labelled response variable. The purpose of SL is to develop a model in which the outcome is formulated as a function of the features (covariates), so that the model can predict the outcome for future subjects when only the features are given. For instance, suppose we are interested in classifying a patient as having either major depressive disorder or no depression, based on measurements of some patient factors. SL methods build a model relating the outcome (eg, depression or not) to a series of features, such as age, gender, educational background, work type and so on, which are collected from different data sources. Commonly used SL algorithms include logistic regression (LR) and support vector machine (SVM);4 LR was borrowed directly from traditional statistics and SVM was invented by computer scientists.5 We will discuss LR in detail in the next section.
USL is applied to data without a labelled outcome.6 The algorithms try to recognise similarities and dissimilarities between subjects through the input variables (features) without the aid of a labelled outcome, which is why the approach is called ‘unsupervised’. One of the most commonly used USL methods is k-means clustering, which partitions observations into k clusters by minimising the within-cluster variances. The lack of labels makes USL more challenging, but it can also help to reveal the underlying data structure without a possible prior bias. We will discuss k-means clustering through a concrete example in a later section.
Supervised learning
LR is a widely used statistical method for modelling the conditional expectation of a binary outcome given the covariates. It has also been generalised to polytomous outcomes.7 For this reason, it is a natural choice of classification method in multivariate analysis. After estimating the parameters of the regression function from the training data, we predict the probability that a new subject belongs to each outcome category by substituting the subject's observed features into the regression function.
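The two steps just described, estimating the parameters from training data and then substituting a new subject's features, can be sketched as follows. This is a minimal illustration rather than the procedure used in any cited study: the toy data are hypothetical, the function names (`fit_logistic`, `predict_proba`) are invented for this sketch, and plain gradient ascent on the log-likelihood stands in for the Newton-type fitting routines that statistical software normally uses.

```python
import math

def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] multiply the features in x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def fit_logistic(X, y, step=0.5, n_iter=2000):
    """Estimate the coefficients by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(linear_predictor(beta, xi))
            grad[0] += resid
            for j, v in enumerate(xi, start=1):
                grad[j] += resid * v
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, x):
    """Predicted probability that Y = 1 for a subject with features x."""
    return sigmoid(linear_predictor(beta, x))

# Hypothetical toy training data (not taken from any study): one binary feature.
X_train = [[0], [0], [0], [0], [1], [1], [1], [1]]
y_train = [0, 0, 0, 1, 0, 1, 1, 1]

beta_hat = fit_logistic(X_train, y_train)
p_exposed = predict_proba(beta_hat, [1])    # new subject with the feature present
p_unexposed = predict_proba(beta_hat, [0])  # new subject with the feature absent
```

With a single binary feature the model is saturated, so the fitted probabilities should approach the observed proportions in each group (3/4 and 1/4 in this toy data set).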
LR is one of the most popular SL tools in biomedical studies. Recently, Lee et al developed an LR model for adolescent suicide attempt prediction using sociodemographic characteristics, risk behaviours and psychological variables.8 This study is based on a sample of 247 222 subjects in the Korea Youth Risk Behavior Web-based Survey. The LR model was used to predict the risk of suicide with 13 different variables selected through univariate analysis screening.9
For simplicity, assume that, after variable selection, adolescent suicide attempt (a binary outcome variable $Y$, with 1 for ‘Yes’ and 0 for ‘No’) is strongly associated with age (a continuous variable $X_1$), gender (a binary variable $X_2$, with 1 for male and 0 for female), experience of violence (a binary variable $X_3$, with 1 for ‘Yes’ and 0 for ‘No’), feelings of sadness (a binary variable $X_4$, with 1 for ‘Yes’ and 0 for ‘No’) and current alcohol drinking (a binary variable $X_5$, with 1 for ‘Yes’ and 0 for ‘No’). The LR model assumes that the conditional distribution of $Y$ given the covariates is of the form

$$P(Y = 1 \mid X_1, \ldots, X_5) = \frac{\exp(\eta)}{1 + \exp(\eta)},$$

where

$$\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5$$

is called the linear predictor, a linear combination of covariates.10
Suppose we have a new subject with the features in table 1 and want to predict the probability of a suicide attempt.
Table 1.
Variables | Coefficients | Covariate values of the subject |
Age | −0.285 | 15 |
Gender: female | 0.403 | 0 |
Experience of violence: yes | 1.257 | 1 |
Feeling of sadness: yes | 1.760 | 1 |
Current alcohol drinking: yes | 0.382 | 1 |
Intercept | −2.497 |
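Then the linear predictor is

$$\eta = -2.497 + (-0.285)(15) + (0.403)(0) + (1.257)(1) + (1.760)(1) + (0.382)(1) = -3.373$$

and the predicted probability is

$$P(Y = 1 \mid X_1, \ldots, X_5) = \frac{\exp(-3.373)}{1 + \exp(-3.373)} \approx 0.0332.$$

Therefore, the predicted probability of a suicide attempt for this subject is 3.32%, which places the subject in the high-risk group (>0.12%).8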
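The arithmetic of this worked example can be checked with a few lines of code. The sketch below simply plugs the coefficients and covariate values from table 1 into the logistic function; the dictionary keys are labels chosen for this illustration only.

```python
import math

# Coefficients and covariate values copied from table 1.
intercept = -2.497
coefficients = {
    "age": -0.285,
    "gender_female": 0.403,
    "violence": 1.257,
    "sadness": 1.760,
    "alcohol": 0.382,
}
subject = {"age": 15, "gender_female": 0, "violence": 1, "sadness": 1, "alcohol": 1}

# Linear predictor: intercept plus the sum of coefficient times covariate value.
eta = intercept + sum(coefficients[name] * subject[name] for name in coefficients)

# Logistic transformation of the linear predictor gives the predicted probability.
prob = 1.0 / (1.0 + math.exp(-eta))

print(f"linear predictor: {eta:.3f}")
print(f"predicted probability: {prob:.2%}")
```

The result is a linear predictor of about −3.373 and a probability of roughly 3.3%, in line with the value reported above.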
Unsupervised learning
K-means clustering is a statistical technique that has been used to recognise the patterns of different types or levels of severity of a specific illness, based on related variables with no outcome labels provided. Fuente-Tomas et al proposed an easy-to-use, cluster-based severity classification for bipolar disorder (BD) that may help clinicians in the processes of personalised medicine and shared decision-making.11 In this study, 224 subjects with a diagnosis of BD (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition, Text Revision) under ambulatory treatment were classified into five different clusters based on 12 variables from five domains.
In their methods, k-means clustering was used to reduce the dimension of four types of variables: patients’ sociodemographic characteristics, BD characteristics, psychometric instruments and laboratory results. In k-means clustering, the criterion is the within-point scatter,

$$W(C) = \sum_{j=1}^{k} N_j \sum_{C(i) = j} \lVert x_i - \bar{x}_j \rVert^2,$$

where $x_i$ is the feature vector of subject $i$, $\bar{x}_j$ is the mean vector of the $j$th cluster, $N_j = \sum_{i=1}^{n} I(C(i) = j)$ is the number of subjects in the $j$th cluster, and $C(i)$ refers to the cluster to which subject $i$ is assigned. The number of clusters k can be chosen by the elbow method,12 a heuristic that plots the criterion against k and selects the value beyond which adding further clusters yields little additional reduction. K-means clustering aims to minimise the criterion by assigning n observations to k clusters in such a way that, within each cluster, the variance between the observations and the cluster mean is minimised.
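A minimal sketch of the procedure, under several stated assumptions, is given below. It implements the standard Lloyd iteration (assign each observation to the nearest cluster mean, then recompute the means) and reports the plain, unweighted within-cluster sum of squared distances, which is the quantity usually inspected in an elbow plot. The six two-dimensional points are hypothetical, and initialising the centres with the first k points is a simplification chosen to keep the example deterministic; practical analyses typically use several random restarts.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def mean_vector(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, k, n_iter=100):
    """Lloyd's algorithm with the first k points as initial centres (a simplification)."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        # Update step: each centre becomes the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = mean_vector(members)
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centres

def within_cluster_ss(points, labels, centres):
    """Unweighted sum of squared distances from each point to its cluster centre."""
    return sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))

# Hypothetical toy data: two well-separated groups of three points each.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.9, 5.3]]

labels, centres = k_means(data, k=2)

# Elbow-style scan: the criterion for each candidate number of clusters.
criterion = {}
for k in (1, 2, 3):
    labs, cents = k_means(data, k)
    criterion[k] = within_cluster_ss(data, labs, cents)
```

In this toy data set the criterion drops sharply from k = 1 to k = 2 and only marginally afterwards, which is the kind of 'elbow' the heuristic looks for.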
The variables were selected by testing the between-group differences using the χ2 test or one-way analysis of variance. Along with other variables chosen by expert criteria, 12 variables were included in the global severity formula.11 These variables are listed in box 1.
Box 1. Variables in the global severity formula.
Variables
(1) Clinical characteristics of the bipolar disorder (BD)
Number of hospitalisations (HospN)
Number of suicide attempts (SuicAttN)
Comorbid personality disorder (ComPD)
(2) Physical health
Body mass index (BMI)
Metabolic syndrome (MetS)
Number of comorbid physical illnesses (IllnessN)
(3) Cognition
Screen for Cognitive Impairment in Psychiatry score (SCIPTr4)
(4) Real-world functioning
Permanently disabled due to BD (PD ×BD)
Functioning Assessment Short Test Total Score (FASTT)
Functioning Assessment Short Test Leisure Time Subscale Score (FASTleisure)
(5) Health-related quality of life
SF-36 Physical Functioning Scale Score (SFPF)
SF-36 Mental Health Scale Score (SFMH).
Since each of the 12 selected variables takes values from 0 to 1 and all of them have equal weights, the sum of the variables needs to be multiplied by 10/12 so that the result represents severity on a scale from low (0) to high (10).
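The rescaling can be written as a short function. The sketch below assumes that each of the 12 variables has already been coded on the 0 to 1 range; how each clinical variable is mapped onto that range is not specified here, and the example values are hypothetical.

```python
def global_severity(values):
    """Rescale the sum of 12 equally weighted variables in [0, 1] to a 0-10 score."""
    if len(values) != 12:
        raise ValueError("the formula expects exactly 12 variables")
    if any(v < 0 or v > 1 for v in values):
        raise ValueError("each variable must lie between 0 and 1")
    return sum(values) * 10 / 12

# Hypothetical patient: 12 variables already coded between 0 and 1.
example_values = [1, 0, 0, 0.5, 1, 0.25, 0.5, 0, 0.5, 0.5, 0.25, 0.5]
score = global_severity(example_values)
print(f"global severity score: {score:.2f}")
```

A patient scoring 0 on every variable receives a global score of 0, and a patient scoring 1 on every variable receives a score of 10, matching the intended range.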
The severity clusters were defined by the 5th, 25th, 50th, 75th and 95th percentiles of the score calculated by this formula. Patients can be classified into different clusters by recognising which range (defined by the centiles) their global severity score falls into.
Conclusion and discussion
In this paper we give a brief introduction to two types of ML methods, SL and USL, through LR and k-means clustering. Examples have been used to show how they can be applied in practice. As data structures become more and more complicated in mental health studies, we need advanced and flexible methods to analyse the data and to offer precise and personalised treatments for patients. We believe that ML, as a combination of statistical methods and computer science, will play an important role in psychiatry. We plan to introduce other developments in ML methods in future issues.
Biography
Zhirou Zhou obtained her BEc in Economic Statistics from Beijing University of Technology in 2018. She is currently a master's student in Statistics in the Department of Biostatistics and Computational Biology at the University of Rochester Medical Center. Her research interests include variable selection and causal inference.
Footnotes
Correction notice: This article has been corrected since it was first published. The second equation under the section heading 'Unsupervised Learning' was missing an end parenthesis. This has since been updated.
Contributors: ZZ and T-CW: collected the data and wrote the draft. BW and HW: reviewed and revised the draft. XMT: reviewed the article. CF: proposed the topic and reviewed the final draft.
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests: None declared.
Patient consent for publication: Not required.
Provenance and peer review: Commissioned; internally peer reviewed.
References
- 1. Chen M, Mao S, Liu Y. Big data: a survey. Mobile Netw Appl 2014;19:171–209. doi:10.1007/s11036-013-0489-0
- 2. Jordan MI, Mitchell TM. Machine learning: trends, perspectives, and prospects. Science 2015;349:255–60. doi:10.1126/science.aaa8415
- 3. Cho G, Yim J, Choi Y, et al. Review of machine learning algorithms for diagnosing mental illness. Psychiatry Investig 2019;16:262–9. doi:10.30773/pi.2018.12.21.2
- 4. Bzdok D, Krzywinski M, Altman N. Machine learning: supervised methods. Nat Methods 2018;15:5–6. doi:10.1038/nmeth.4551
- 5. Cortes C, Vapnik V. Support-vector networks. Mach Learn 1995;20:273–97. doi:10.1007/BF00994018
- 6. Miotto R, Li L, Kidd BA, et al. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 2016;6:26094. doi:10.1038/srep26094
- 7. Agresti A. An introduction to categorical data analysis. 3rd edn. New York: Wiley, 2018.
- 8. Lee J, Jang H, Kim J, et al. Development of a suicide index model in general adolescents using the South Korea 2012–2016 national representative survey data. Sci Rep 2019;9:1846. doi:10.1038/s41598-019-38886-z
- 9. Wang H, Peng J, Wang B, et al. Inconsistency between univariate and multiple logistic regressions. Shanghai Arch Psychiatry 2017;29:124–8. doi:10.11919/j.issn.1002-0829.217031
- 10. McCullagh P, Nelder JA. Generalized linear models. 2nd edn. Chapman and Hall/CRC, 1989.
- 11. de la Fuente-Tomas L, Arranz B, Safont G, et al. Classification of patients with bipolar disorder using k-means clustering. PLoS One 2019;14:e0210314. doi:10.1371/journal.pone.0210314
- 12. Thorndike RL. Who belongs in the family? Psychometrika 1953;18:267–76. doi:10.1007/BF02289263