Psychological Medicine. 2025 Feb 6;55:e18. doi: 10.1017/S0033291724003295

Table 2.

Studies on AI-assisted diagnosis in mental health

Ref. Subject description Mental health condition Aim AI-based method Models Variables Predictors Results and accuracy Conclusions
Chen et al. (2024) Patients with MDD and healthy controls (n = 156) MDD To detect lifetime diagnosis of MDD and nonremission status Machine learning models combined with natural language processing
  • RF
  • Logistic regression
  • SVC
  • K-Nearest neighbors
  • DT
  • Naive Bayes
  • Artificial neural networks
Clinical psychiatric diagnosis and HAMD–17 (cutoff score of 7)
  • Subjective happiness level
  • Actigraphy (physical activity and sleep estimation)
  • Facial expression (inner brow raising, brow lowering, cheek raising, lip corner pulling, lip corner depressing)
  • Voice (articulation rate, pause duration, pause variability, pause rate)
  • Self-reference and negative emotion
  • HADS-D
Artificial neural networks (all variables used as predictors to identify patients with MDD and HAMD–17 > 7)
  • Sensitivity: 0.64
  • Specificity: 0.96
  • PPV: 0.78
  • NPV: 0.91
Artificial neural networks (excluding HADS-D as a predictor, to identify patients with MDD)
  • Sensitivity: 0.86
  • Specificity: 0.74
  • PPV: 0.76
  • NPV: 0.85
Naive Bayes (excluding HADS-D as a predictor, to identify patients with MDD)
  • Sensitivity: 0.90
  • Specificity: 0.71
  • PPV: 0.74
  • NPV: 0.88
With the fusion of all digital variables, the prediction performance of artificial neural networks was generally more favorable than that of the other machine learning methods for both lifetime MDD diagnosis and nonremission (HAMD–17 > 7)
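A note on the screening metrics reported throughout this table (sensitivity, specificity, PPV, NPV, as in the Chen et al. row above): all four are derived from the same 2×2 confusion matrix. The sketch below is purely illustrative; the counts are hypothetical and are not taken from any of the studies.

```python
# Illustrative only: how sensitivity, specificity, PPV, and NPV relate to a
# 2x2 confusion matrix. Counts below are made up, not taken from Chen et al.
def diagnostic_metrics(tp, fp, tn, fn):
    """Return the four screening metrics reported throughout this table."""
    sensitivity = tp / (tp + fn)   # true-positive rate: detected cases / all cases
    specificity = tn / (tn + fp)   # true-negative rate: cleared controls / all controls
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return sensitivity, specificity, ppv, npv

# Hypothetical counts chosen only to show the calculation.
print(diagnostic_metrics(tp=32, fp=9, tn=86, fn=18))
```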
Das and Naskar (2024) Patients with depression and controls (DAIC-WOZ dataset, n = 219; MODMA dataset, n = 52) Depression To identify symptoms of depression among individuals from their speech and responses Machine learning models (e.g., SVM, CNN, LSTM, DT)
PHQ–8 binary score and professional judgment
Variables from an audio spectrogram
DAIC-WOZ dataset
Accuracy: 90.26%
MODMA dataset
Accuracy: 90.47%
A novel deep learning-based approach using audio signals for automatic depression recognition has demonstrated superior detection accuracy compared to existing methods
Maekawa et al. (2024) Individuals with depressive symptoms and healthy controls (n = 35,628) Depressive symptoms To identify individuals with depressive symptoms Machine learning algorithms Stochastic gradient descent (evaluated with two different feature selection methods: Bayesian network or Markov blanket) PHQ–9
  • Age, education, gender, income
  • Postural balance problems, shortness of breath, how old people feel they are, the ability to do usual activities, chest pain, chronic back problems, sleep problems, verbal abuse
Bayesian network: AUCs of 0.736, 0.801, and 0.809 across three datasets (using different variables as predictors); the Bayesian network feature selection method outperformed the Markov blanket selection method
The models emphasized the ability to do usual activities, chest pain, and sleep problems as key indicators for detecting depressive symptoms
Yang et al. (2024) Suicidal ideators and suicide attempters (n = 438) Suicide attempts To identify predictors for suicide attempts and suicides Machine learning
  • Logistic regression
  • Penalized regression (elastic net regression)
The number of suicide attempts reported
136 variables in total:
  • Sociodemographic characteristics (age, sex, living status, employment status, religion)
  • Clinical information (medical and psychiatric illness, treatment, and previous suicidal thoughts and attempts)
  • Psychopathological evaluation (PHQ–9, BAI, AUDIT, BIS–11, ETI-SF, SRS, SQ for KNHANES-SF, C-SSRS)
Classical logistic regression (136 variables included):
AUC: 0.535
Elastic net regression (136 variables included):
AUC: 0.812
Classical logistic regression (15 variables included):
  • AUC: 0.926
  • Accuracy: 91.2%
Elastic net regression (15 variables included):
  • AUC: 0.912
  • Accuracy: 90.0%
Young age, suicidal ideation, previous suicide attempts, anxiety, alcohol abuse, stress, and impulsivity were significant predictors of suicide attempts
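The Yang et al. row above contrasts classical logistic regression with elastic net penalized regression over 136 candidate predictors. The sketch below illustrates that kind of comparison with scikit-learn on synthetic data; it is a generic illustration under assumed settings (l1_ratio, fold count), not the authors' pipeline.

```python
# Sketch: classical vs. elastic-net-penalized logistic regression on many predictors.
# Synthetic data stands in for the 136 candidate variables; not the authors' code.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=438, n_features=136, n_informative=15,
                           random_state=0)

classical = LogisticRegression(penalty=None, max_iter=5000)   # requires scikit-learn >= 1.2
elastic = LogisticRegression(penalty="elasticnet", solver="saga",
                             l1_ratio=0.5, C=1.0, max_iter=5000)

for name, model in [("classical", classical), ("elastic net", elastic)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```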
Manikis et al. (2023) Women with highly treatable breast cancer (n = 706) Depression, anxiety, overall mental health, and QoL To identify women at risk of poor mental health, declining mental health, and declining global QoL following a diagnosis of breast cancer Machine learning algorithms Balanced RF
  • HADS–14
  • European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire-Cancer
  • Sociodemographics: six variables (month 0), two variables (month 3)
  • Lifestyle: four variables (month 0)
  • Medical: 11 variables
  • Breast cancer and treatment-related: 17 variables
  • Psychosocial characteristics: seven domains
Model A for patients with poor mental health at month 0
Model B for patients with good mental health at month 0
Model C for patients with good QoL at month 0
i: using variables at month 0 and month 3 as predictors
ii: using clinical and biological variables at month 0 together with other variables at month 6 as predictors
12-month AUC
  • Model Ai: 0.81
  • Model Aii: 0.78
  • Model Bi: 0.86
  • Model Bii: 0.79
  • Model Ci: 0.77
  • Model Cii: 0.79
The top predictors of adverse mental health and QoL outcomes include common variables in clusters: negative affect, cancer coping responses/self-efficacy to cancer, sense of control/optimism, social support, lifestyle factors, and treatment-related symptoms
Geng et al. (2023) Patients with MDD and healthy controls (n = 80) MDD To optimize initial screening for MDD in both male and female patients Machine learning algorithms
  • SVM
  • ERTC
PHQ–9
  • 24 HRV-related variables (analyzed by 5-min short-term electrocardiogram signals during nighttime sleep stages): time domain and frequency domain
  • Gender
SVM:
  • AUC: 0.853
  • Accuracy: 79.29%
ERTC:
  • AUC: 0.945
  • Accuracy: 86.32%
Feature importance analysis identified MeanNN, MedianNN, pNN20, and gender as the most important features
HRV parameters during sleep stages can be used to identify patients with MDD
Kourou et al. (2023) Women diagnosed with stage I–III breast cancer with a curative treatment intention (n = 600) Symptoms of anxiety and depression To predict adverse mental health outcomes among patients who manifest a fairly good initial emotional response to the diagnosis and the prospect of cancer treatments Adaptive machine learning algorithms Balanced RF HADS–14 Sociodemographic, lifestyle, medical variables, and self-reported psychological characteristics recorded at diagnosis and assessed 3 months after diagnosis
Model 1: using all variables at month 0 and month 3 as predictors
Model 2: excluding mental health and subjective QoL ratings at months 0 and 3
12-month AUC
  • Model 1: 0.864
  • Model 2: 0.790
The top predictors of adverse mental health and QoL outcomes include common variables in clusters: negative affect, cancer coping responses/self-efficacy to cancer, sense of control/optimism, social support, lifestyle factors, and treatment-related symptoms
Lønfeldt et al. (2023) Adolescents with mild-to-moderately-severe obsessive-compulsive disorder (n = 9) Obsessive-compulsive disorder To detect obsessive-compulsive disorder episodes in the daily lives of adolescents Machine learning models
  • Logistic regression
  • RF
  • Feedforward neural networks
  • Mixed-effect RF
Obsessive-compulsive disorder events marked by participants
Blood volume pulse, external skin temperature, electrodermal activity, and heart rate (calculated from blood volume pulse)
10-fold random cross-validation
  • Average accuracy: >70%
  • Recall: 50%
  • Precision: 66%
  • Average AUC: 0.8
Better performance was obtained when generalizing across time than when generalizing across patients
Generalized temporal models trained on multiple patients outperformed personalized single-patient models
RF and mixed-effect RF models consistently achieved superior accuracy, reaching 70% in both random and participant cross-validation
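The Lønfeldt et al. row above distinguishes random cross-validation from participant-wise cross-validation (generalizing across time vs. across patients). The sketch below shows that distinction with scikit-learn splitters; the wearable-style features, labels, and participant count are invented for illustration only.

```python
# Sketch: random 10-fold CV vs. leaving whole participants out, mirroring the
# distinction drawn by Lønfeldt et al. Data and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_participants, samples_each = 9, 200
X = rng.normal(size=(n_participants * samples_each, 4))   # e.g. BVP, EDA, temperature, HR features
y = rng.integers(0, 2, size=len(X))                       # event vs. no-event labels (placeholder)
groups = np.repeat(np.arange(n_participants), samples_each)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

random_cv = cross_val_score(
    clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0), scoring="accuracy"
).mean()
participant_cv = cross_val_score(
    clf, X, y, groups=groups, cv=GroupKFold(n_splits=9), scoring="accuracy"
).mean()
print(f"random 10-fold: {random_cv:.2f}  participant-wise: {participant_cv:.2f}")
```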
Adler et al. (2022) Patients with schizophrenia, schizoaffective disorder, or psychosis non-specified in treatment, and university students (n = 109) Mental health symptoms To explore if machine learning models can be trained and validated across multiple mobile sensing longitudinal studies (CrossCheck and StudentLife) to predict mental health symptoms Machine learning algorithms
  • GBRT
  • SVM
EMA of sleep quality and stress
Mobile sensing data
Improved model performance for predicting sleep:
  • CrossCheck (W = 53,200, p = 0.007, RBC = 0.14)
  • StudentLife (W = 63,089, p < 0.001, RBC = 0.35)
Improved model performance for predicting stress:
  • CrossCheck (W = 55,373, p < 0.001, RBC = 0.18)
Machine learning models trained across longitudinal mobile sensing study datasets generalized and provided a more efficient way to build models predicting outcomes such as sleep and stress
Chilla et al. (2022) Patients with schizophrenia and healthy controls (n = 234) Schizophrenia To classify schizophrenia and healthy control cohorts using a diverse set of neuroanatomical measures Machine learning
  • k-Nearest Neighbors
  • Logistic regression
  • SVC
  • Linear SVC
  • Nu-SVC
  • Decision trees
  • RF
A structured clinical interview for DSM-IV Disorders-Patient Version
Clinical history, existing medical records, and interviews with significant others (e.g., family members, spouse, children)
MRI measures of subcortical volumes, cortical volumes, cortical areas, cortical thickness, and mean cortical curvature
Classification performance was comparable between independent measure sets, with accuracy, sensitivity, and specificity ranging from 70%–73%, 73%–81%, and 57%–61%, respectively
Employing a diverse set of measures (measures were merged and used in Ensemble) resulted in improved accuracy, sensitivity, and specificity, with ranges of 77%–87%, 79%–98%, and 65%–74%, respectively
Subcortical and cortical measures and Ensemble methods achieved better classification performance on people with schizophrenia
Hüfner et al. (2022) Individuals residing in Austria aged ≥16 or in Italy aged ≥18 with confirmed SARS-CoV–2 infection who were not hospitalized (n = 2,050) Depression, anxiety, overall mental health, and QoL To identify indicators for poor mental health following COVID–19 outpatient management and to identify high-risk individuals
Machine learning algorithm
RF
PHQ–4
Self-perceived Overall Mental Health and QoL rated with 4-point Likert scale
201 surveyed demographic, socioeconomic, medical history, COVID–19 course, and recovery parameters
RMSE: 0.15–0.18 (Austria data) and 0.21–0.23 (Italy data)
Machine learning achieved moderate-to-good performance in mental health risk prediction
Jacobson et al. (2022)
Note: Also included in the monitoring domain.
Users who made queries related to mental health screening tools to the Microsoft Bing search engine between December 1, 2018, and January 31, 2020 (n = 126,060) Suicidal ideation, active suicidal intent To examine the impact and qualities of widely used, freely available online mental health screening on potential benefits, including suicidal ideation and active suicidal intent Machine learning algorithm RF
  • Suicidal ideation and suicidal intent search queries identified by seed keywords
  • Rating of common queries by two independent raters
  • Filtering of a multivariate regression model
Exposure to online screening tools and past search behaviors
AUC of:
  • Suicidal ideation: 0.58
  • Suicidal intent: 0.60
Websites with referrals to in-person treatments could put persons at greater risk of active suicidal intent. Machine learning’s prediction accuracy of suicidal ideation and intent was moderate
Matsuo et al. (2022) Pregnant women who delivered at ≥35 weeks of gestation (n = 34,710) PPD To develop and validate machine learning models for the prediction of postpartum depression and to compare the predictive accuracy of the machine learning models with conventional logistic regression models Four machine learning algorithms
  • Conventional logistic regression models
  • Ridge regression
  • Elastic net
  • Kernel-based SVM
  • RF
EPDS
  • Maternal baseline (18 variables)
  • Pregnancy-related (four variables)
  • Delivery-related (eight variables)
  • Neonatal (eight variables)
  • Postpartum at two-week postpartum checkup (three variables)
AUC assessing the predictive accuracy:
Model 1 (using variables collected in the first to second trimester):
  • Logistic (0.634)
  • Ridge regression (0.638)
  • Elastic net (0.637)
  • Kernel-based SVM (0.530)
  • RF (0.629)
Model 2 (using variables collected before discharge from hospitals):
  • Logistic (0.626)
  • Ridge regression (0.630)
  • Elastic net (0.628)
  • Kernel-based SVM (0.569)
  • RF (0.613)
Model 3 (using all variables, including the two-week postpartum checkup):
  • Logistic (0.697)
  • Ridge regression (0.702)
  • Elastic net (0.701)
  • Kernel-based SVM (0.642)
  • RF (0.688)
The approach used did not achieve better predictive performance than the conventional logistic regression models
Susai et al. (2022) Participants from NEURAPRO, aged between 13 and 40, who fulfilled at least one of the CAARMS criteria for an at-risk mental state (n = 158) Psychosis: functioning To investigate the combined predictive ability of blood-based biological markers on functional outcome Machine learning model SVM SOFAS Clinical predictors:
Four demographic variables (sex, age, smoking status, and BMI) and seven symptom scale scores
Biomarker predictors: ten cytokines; 157 proteomic markers; and ten fatty acid markers
Model based on clinical predictors:
  • Accuracy: 56.4%
  • AUC: 0.63
Model based on biomarker predictors:
  • Accuracy: 58.9%
  • AUC: 0.62
Model based on clinical and biomarker predictors:
  • Accuracy: 57.5%
  • AUC: 0.58
Machine learning model based on clinical and biological data poorly predicted functional outcome in clinical high-risk participants
Andersson et al. (2021) Pregnant women who were 18 years of age or older (n = 4,313) PPD To predict women at risk for depressive symptoms at six weeks postpartum, from clinical, demographic, and psychometric questionnaire data available after childbirth
Machine learning algorithm
  • Ridge Regression
  • LASSO Regression
  • Gradient Boosting Machines
  • DRF
  • XRT
  • Naive Bayes
  • Stacked Ensembles models
EPDS
  • BP
  • Psychometric data from RS, SOC, and VPSQ
Accuracy based on BP dataset:
  • Ridge Regression: 70%
  • LASSO Regression: 71%
  • DRF: 70%
  • XRT: 72%
  • Gradient Boosting Machines: 70%
  • Stacked Ensembles models: 70%
  • Naive Bayes: 70%
Accuracy based on combined dataset (BP + RS, SOC, and VPSQ):
  • Ridge Regression: 67%
  • LASSO Regression: 70%
  • DRF: 71%
  • XRT: 73%
  • Gradient Boosting Machines: 68%
  • Stacked Ensembles models: 65%
  • Naive Bayes: 69%
All machine learning models had similar performance based solely on BP dataset; there were greater variations in model performance for the combined dataset
Du et al. (2021) College students (n = 30) Depression To design a deep learning-based mental health monitoring scheme to detect depression in college students Deep learning Convolutional neural network model
Confirmation of a depression diagnosis based on questionnaires and bodily feelings
EEG signal
The model showed a classification accuracy of 97.54%
The proposed deep learning-based mental health monitoring scheme achieved high accuracy in detecting depression from EEG data
Mongan et al. (2021)
  • EU-GEI participants who met clinical high-risk criteria of psychosis at baseline
  • ALSPAC participants who did not report psychotic experiences at age 12
(n = 344)
Psychosis To investigate whether proteomic biomarkers may aid prediction of:
  • Transition to psychotic disorder in people at clinical high risk of psychosis
  • Adolescent psychotic experiences in the general population
Machine learning algorithms SVM For the transition to psychotic disorder in the clinical high-risk group:
  • CAARMS interview
  • Contact with the clinical team or review of clinical records
For the adolescent psychotic experiences in the general population:
  • Psychosis-Like Symptoms Interview at age 18
Proteomic data from plasma samples
For the transition to psychotic disorder in the clinical high-risk group:
Model based on clinical and proteomic data:
  • AUC: 0.95
  • PPV: 75.0%
  • NPV: 98.6%
Model based on clinical data:
  • AUC: 0.48
  • PPV: 37.1%
  • NPV: 63.4%
Model based on clinical and proteomic data:
  • AUC: 0.96
  • PPV: 79.0%
  • NPV: 100%
For the adolescent psychotic experiences in the general population:
  • AUC: 0.74
  • PPV: 67.8%
  • NPV: 75.8%
Models based on proteomic data demonstrated excellent predictive performance for the transition to psychotic disorder in clinically high-risk individuals.
Models based on proteomic data at age 12 had fair predictive performance for psychotic experiences at age 18
Tsui et al. (2021) Inpatients and emergency department patients aged 10–75 (n = 45,238) First-time suicide attempt To predict first-time suicide attempts from unstructured (narrative) clinical notes and structured EHR data NLP
  • Naive Bayes
  • LASSO regression
  • RF
  • EXGB
ICD–9 and ICD–10 codes
Unstructured data (clinical notes): history and physical examination, progress, and discharge summary notes
Structured data: demographics, diagnosis, healthcare utilization data, and medications
AUC for prediction windows of ≤30 days:
  • Full-feature (involved both structured and unstructured data) EXGB: 0.932
  • Structured-feature only EXGB: 0.901
  • Full-feature LASSO: 0.909
  • Full-feature LASSO: 0.884
  • Full-feature Naive Bayes: 0.766
  • Full-feature Random Forest: 0.900
Using both structured and unstructured data resulted in significantly higher accuracy than structured data alone
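The Tsui et al. row above reports that fusing unstructured clinical notes with structured EHR fields outperformed structured data alone. The sketch below shows one common, generic way to combine the two feature types (TF-IDF text features alongside tabular columns); the column names, toy data, and the use of gradient boosting in place of EXGB are assumptions, not the authors' implementation.

```python
# Sketch: fusing free-text notes with structured fields in one model.
# Column names and rows are hypothetical; gradient boosting stands in for EXGB.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

notes = [
    "reports worsening mood and hopelessness",
    "routine follow-up, no acute issues",
    "expresses thoughts of self-harm after job loss",
    "sleeping well, mood stable on current plan",
]
df = pd.DataFrame({"note": notes, "age": [34, 52, 29, 61], "prior_visits": [7, 2, 11, 3]})
y = [1, 0, 1, 0]  # hypothetical outcome labels

features = ColumnTransformer(
    [
        ("text", TfidfVectorizer(), "note"),                   # unstructured clinical notes
        ("tabular", "passthrough", ["age", "prior_visits"]),   # structured EHR fields
    ],
    sparse_threshold=0.0,  # force dense output for the downstream classifier
)
model = Pipeline([("features", features),
                  ("clf", GradientBoostingClassifier(random_state=0))])
model.fit(df, y)
print(model.predict_proba(df)[:, 1])
```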
Maglanoc et al. (2020) Patients with depression from outpatient clinics and healthy controls (n = 241) Depression, anxiety To classify patients and controls and to predict symptoms of depression and anxiety Machine learning Shrinkage discriminant analysis
  • M.I.N.I.
  • Beck Depression Inventory
  • Beck Anxiety Inventory
Brain components, including cortical macrostructure (thickness, area, gray matter density), white matter diffusion properties, radial diffusivity and resting-state functional magnetic resonance imaging (fMRI) default mode network amplitude
Sex
Age
Classifying patients and controls:
  • AUC: 0.6194
  • Accuracy (proportion of correct classification): 0.6169
  • Sensitivity (ability to correctly detect cases): 0.6991
  • Specificity (ability to correctly detect controls): 0.4292
Predicting depression symptoms:
  • RMSE: 10.72
Predicting anxiety symptoms:
  • RMSE: 8.181
Predicting age:
  • RMSE: 6.764
Machine learning revealed low model performance for discriminating patients from controls and predicting symptoms for depression and anxiety, but had high accuracy for age
Tate et al. (2020) Twins born between 1994 and 1999 (n = 7,638) Mental health problems: parent-rated emotional symptoms, conduct problems, prosocial behavior, hyperactivity/inattention, and peer relationship problems To investigate whether various machine learning techniques outperform logistic regression in predicting mental health problems in mid-adolescence Machine learning algorithms
  • RF
  • XGBoost
  • Logistic regression
  • Neural network
  • SVM
Strengths and Difficulties Questionnaire Birth information, physical illness, mental health symptoms, environmental factors such as neighborhood and parental income AUC and 95% interval of:
  • Logistic Regression: 0.700 (0.665–0.734)
Compared to:
  • XGBoost: 0.692 (0.660–0.723)
  • Random Forest: 0.739 (0.708–0.769)
  • SVM: 0.736 (0.707–0.765)
  • Neural Network: 0.705 (0.671–0.737)
All models performed with relatively similar accuracy; the machine learning algorithms did not perform statistically significantly better than logistic regression
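The Tate et al. row above reports each model's AUC with a 95% interval. The sketch below illustrates one common way to obtain such an interval, a bootstrap over held-out predictions; it is not necessarily the interval method used in the study, and the labels and scores are synthetic.

```python
# Sketch: bootstrap 95% interval for an AUC, given true labels and predicted scores.
# The labels and scores here are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = y_true * 0.4 + rng.normal(scale=0.5, size=500)  # weakly informative scores

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))   # resample with replacement
    if len(np.unique(y_true[idx])) < 2:                     # AUC needs both classes present
        continue
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC = {roc_auc_score(y_true, y_score):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```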
Byun et al. (2019) MDD patients and healthy controls who were matched for age and gender (n = 78) MDD To investigate the feasibility of automated MDD detection based on heart rate variability features Machine learning algorithm SVM-RFE for feature selection
SVM for classification
HAMD–17
Heart rate variability features extracted from electrocardiogram recordings
The best AUC by heart rate variability feature selection method:
  • SVM-RFE (based on two features): 0.742
  • Statistical filter (based on five features): 0.734
The highest accuracy of the SVM classifier:
  • SVM-RFE (based on two features): 74.4%
  • Statistical filter (based on five features): 73.1%
SVM-RFE marginally outperformed the statistical filter while requiring fewer heart rate variability features for MDD classification
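The Byun et al. row above uses SVM-based recursive feature elimination (SVM-RFE) to select a small subset of heart-rate-variability features before classification. The sketch below is a minimal, generic SVM-RFE pipeline on synthetic data; the feature counts and evaluation scheme are assumptions, not the study's implementation.

```python
# Sketch: recursive feature elimination driven by a linear SVM (SVM-RFE),
# then classification with the selected features. Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=78, n_features=20, n_informative=4, random_state=0)

svm_rfe = Pipeline([
    ("select", RFE(SVC(kernel="linear"), n_features_to_select=2)),  # keep 2 features
    ("clf", SVC(kernel="linear")),
])
auc = cross_val_score(svm_rfe, X, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC with 2 selected features: {auc:.3f}")
```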
Ebdrup et al. (2019) Antipsychotic-naive first-episode schizophrenia patients and healthy controls (n = 104) Schizophrenia, schizoaffective psychosis To investigate whether machine learning algorithms on multimodal data can serve as a framework for clinically translating into diagnostic utility Machine learning algorithms
  • Naive Bayes
  • Logistic regression
  • SVM
  • Decision tree
  • RF
  • Auto-sklearn
A structured diagnostic interview to ensure fulfillment of ICD–10 diagnostic criteria of schizophrenia or schizoaffective psychosis
Four modalities
  1. Neurocognitive: DART, WAIS III, BACS, CANTAB
  2. Electrophysiology: CPTB
  3. Neuroanatomy: MRI scans
  4. Diffusion tensor imaging
Unimodal diagnostic accuracy:
Diagnostic accuracy of cognition ranged between 60% and 69%
Diagnostic accuracy for electrophysiology, sMRI, and DTI ranged between 49% and 56% and did not exceed chance accuracy: 'chance accuracy' = 58/(46 patients + 58 healthy controls) × 100% = 56%
Multimodal diagnostic accuracy:
None of the multimodal analyses with cognition plus any combination of one or more of the remaining modalities (electrophysiology, sMRI, and DTI) showed significantly higher accuracy than cognition alone; the accuracy ranged between 51% and 68%
Only cognitive data, but no other modality, significantly discriminated patients from healthy controls
No enhanced accuracies were noted by combining cognition with other modalities
Jaroszewski et al. (2019) Koko app users who signed up for the service (n = 39,450) Mental health crisis: suicide (ideation, plan, and attempt), self-harm, eating disorder, physical abuse, unspecified abuse, emotional abuse, and otherwise unspecified To develop and evaluate a brief, automated risk assessment and intervention platform designed to increase the use of crisis resources among individuals routed to a digital mental health app who were identified as likely experiencing a mental health crisis Machine learning classifiers Recurrent neural networks with word embeddings
A binary classification of "crisis" or "not crisis"; "crisis" was defined as possibly at risk of serious, imminent physical harm, either through self-inflicted actions or through abuse from a third party
Semantic content of posts in real time
Performance:
  • AUC: 0.93
  • Sensitivity: 0.64
  • Specificity: 0.98
  • PPV: 0.90
  • NPV: 0.93
  • Accuracy: 0.93
The classifiers demonstrated excellent performance in classifying crisis risk from real-time posts, regardless of whether the posts referred to the writer or to a third party
Lyu and Zhang (2019) Suicide attempters randomly recruited through the hospital emergency and patient registration system (n = 659) Suicide attempt To establish a prediction model based on the Back Propagation neural network to improve prediction accuracy Artificial neural network Back Propagation neural network
Whether a suicide attempt had been made
Demographic information (such as age, gender, education level, and marital status), family history of suicide, mental health problems, aspiration strain, health status variables, hopelessness level, impulsivity, anxiety, depression, attitudes toward suicide, negative life events, social support, coping skills, community environment, etc.
The Back Propagation neural network:
  • Sensitivity: 67.6%
  • Specificity: 93.9%
  • Total coincidence rate: 84.6%
Traditional statistical methods such as multivariate logistic regression:
  • Sensitivity: 80.2%
  • Specificity: 83.8%
  • Total coincidence rate: 82.2%
Back Propagation neural network prediction model was superior in predicting suicide attempt
Simon et al. (2019) Members of seven health systems who had outpatient visits either to a specialty mental health provider or to a general medical provider when a mental health diagnosis was recorded (n = 25,373) Suicide death
Probable suicide attempt
To evaluate how availability of different types of health records data affect the accuracy of machine learning models predicting suicidal behavior
Machine learning models
Logistic regression with penalized LASSO variable selection
ICD–9th Revision cause-of-injury codes indicating intentional self-harm (E950–E958) or undetermined intent (E980–E989)
ICD–10th Revision diagnoses of self-inflicted injury (X60–X84) or injury or poisoning with undetermined intent (Y10–Y34)
Historical insurance claims data
Sociodemographic characteristics (race, ethnicity, and neighborhood characteristics)
Past patient-reported outcome questionnaires from electronic health records
Data (diagnoses and questionnaires) recorded during medical visit
Prediction of suicide attempt following mental health visits:
  • AUC of model 1 (limited to data typically available to an insurer or health plan): 0.843
  • AUC of model 4 (reflecting data that might inform predictions in an EHR environment capable of real-time calculation or updating risk scores): 0.850
Prediction of suicide death following mental health visits:
  • AUC of model 1: 0.836
  • AUC of model 4: 0.861
Prediction of suicide attempt following general medical visits:
  • AUC of model 1: 0.836
  • AUC of model 4: 0.853
Prediction of suicide death following general medical visits:
  • AUC of model 1: 0.819
  • AUC of model 4: 0.833
For prediction of suicide attempt following mental health visits, model limited to historical insurance claims data performed approximately as well as model using all available data
For prediction of suicide attempt following general medical visits, addition of data recorded during visits yielded improvement in model accuracy
Carrillo et al. (2018) Patients with treatment-resistant depression (n = 35) Depression To classify patients with depression and healthy controls with a machine learning algorithm Natural speech algorithm combined with machine learning Gaussian Naive Bayes classifier
Quick Inventory of Depressive Symptoms
AMT structured interview in which participants were asked to provide specific autobiographical memories in response to specific cue words
Mean accuracy of identifying patients with depression versus controls was 82.85%
The natural speech analysis distinguished patients with depression from healthy controls
Liang et al. (2018) First-episode patients with schizophrenia, MDD, and demographically matched healthy controls (n = 577) Schizophrenia, MDD To investigate the accuracy of neurocognitive graphs in classifying individuals with first-episode schizophrenia and MDD in comparison with healthy controls Machine learning algorithm Graphical LASSO logistic regression
  • Wechsler Adult Intelligence Scale-Revised in China
  • The computerized CANTAB
  • Trail Making Test, parts A and B- Modified
Neurocognitive graphs based on cognitive features including general intelligence, immediate and delayed logical memory, processing speed, visual memory, planning, shifting, and psychosocial functioning
Classification accuracy of:
  • First-episode schizophrenia vs. healthy controls: 73.41%
  • MDD vs. healthy controls: 67.07%
  • First-episode schizophrenia vs. MDD: 59.48%
Machine learning algorithm achieved moderate accuracy in classifying first-episode schizophrenia and MDD against healthy controls. Classification accuracy between first-episode schizophrenia and MDD was substantially lower
Xu et al. (2018) Postmenopausal obese or overweight, early-stage breast cancer survivors participating in a weight loss treatment (n = 333) Depression and QOLm To elicit bio-behavioral pathways implicated in obesity and health in breast cancer survivorship Machine learning Bayesian networks
  • SF–36 for QOLm
  • The variable(s) used for depression were not specified
  • Demographics and lifestyle
  • Clinical factors
  • Cancer treatment
  • Coping
  • Neighborhood
  • Health
  • Health behaviors
Insomnia predicted depression:
  • Strength = 0.93
  • Direction = 0.60
  • SE = 1.000 (0.257)
Depression predicted QOLm:
  • Strength = 0.95
  • Direction = 0.97
  • SE = −7.029 (1.717)
Sleep impairment predicted QOLm:
  • Strength = 1.00
  • Direction = 0.95
  • SE = −1.060 (0.094)
Higher levels of insomnia were associated with higher levels of depression
More severe depression and poorer sleep were associated with poorer QOLm
Cook et al. (2016) Adults discharged after self-harm from emergency services or after a short hospitalization (n = 1,453) Suicidal ideation, heightened psychiatric symptoms To develop and employ a predictive algorithm in a free-text platform (i.e., physician notes in EHRs, texts, and social media) to predict suicidal ideation and heightened psychiatric symptoms Machine learning algorithm NLP
Suicidal ideation assessed by the question: "Have you felt that you do not have the will to live?"
Heightened psychiatric symptoms measured by GHQ–12
Structured items (e.g., relating to sleep and well-being)
Responses to one unstructured question, “how do you feel today?”
Suicidal ideation:
  • NLP-based models using unstructured question: PPV: 0.61, Sensitivity: 0.56, Specificity: 0.57
  • Logistic regression prediction models using structured data: PPV: 0.73, Sensitivity: 0.76, Specificity: 0.62
Heightened psychiatric symptoms:
  • NLP-based models using unstructured question: PPV: 0.56, Sensitivity: 0.59, Specificity: 0.60
  • Logistic regression prediction models using structured data: PPV: 0.79, Sensitivity: 0.79, Specificity: 0.85
NLP-based models were able to generate relatively high predictive values based solely on responses to a simple general mood question
Pestian et al. (2016) Suicidal (intervention group) or orthopedic (control group) teenage patients aged 13 to 17 admitted to the emergency department (n = 61) Suicidal ideation To evaluate whether machine learning methods discriminate between conversations of suicidal and non-suicidal individuals NLP SVM
  • C-SSRS
  • SIQ
  • UQ
Language
Machine learning classification matched the gold-standard C-SSRS with 96.67% accuracy
Machine learning methods accurately distinguished between suicidal and non-suicidal teenagers
Setoyama et al. (2016) Patients with any depressive symptoms (HAMD–17 > 0), including both medicated and medication free (n = 115) Depression and suicidal ideation To create a more objective system evaluating the severity of depression, especially suicidal ideation Machine learning Partial least squares regression model
Logistic regression
Support vector machine
Random Forest
  • HAMD–17
  • PHQ–9
  • Structured interview using M.I.N.I.
Aqueous metabolites in blood plasma Each model on evaluating severity of depression showed a fairly good correlation with either value R2 = 0.24 (PHQ–9) and R2 = 0.263 (HAMD–17)
The three models discriminated depressive patients with or without SI showed true rate > 0.7
Plasma metabolome analysis is a useful tool to evaluate the severity of depression
An algorithm to estimate a grade of SI using only a few metabolites was successfully created
Schnack et al. (2014) Schizophrenia patients, bipolar disorder patients, and healthy controls selected from database (n = 334) Schizophrenia and bipolar disorder To classify patients with schizophrenia, bipolar disorder, and healthy controls on the basis of their structural MRI scans Machine learning algorithms Three SVM:
  • M(sz-hc) to separate schizophrenia from healthy controls
  • M(bp-sz) to separate bipolar from schizophrenia
  • M(bp-hc) to separate bipolar from healthy controls
DSM-IV criteria for schizophrenia
DSM-IV criteria for bipolar disorder
Gray matter density
M(sz-hc):
  • Average accuracy rate 90.1%
  • 92.4% of schizophrenia and 87.9% of healthy controls correctly classified
M(bp-sz):
  • Average accuracy rate 87.9%
  • 86.4% of bipolar and 89.4% of schizophrenia correctly classified
M(bp-hc):
  • Average accuracy rate 59.8%
  • 53.0% of bipolar and 66.7% of healthy controls correctly classified
Models based on gray matter density separated schizophrenia patients from healthy controls and bipolar disorder patients with high accuracy rate, and separated bipolar disorder from healthy control with much lower accuracy rate
Marquand, Mourão-Miranda, Brammer, Cleare, and Fu (2008) Patients meeting criteria for major depression and in an acute episode of moderate severity with a minimum score of 18 on the 17-item HRSD
Healthy controls with no history of psychiatric disorder, neurological disorder or head injury resulting in a loss of consciousness, and an HRSD score < 7 (n = 40)
Depression To examine the sensitivity and specificity of the diagnosis of depression achieved with the neural correlates of verbal working memory Machine learning algorithms SVM
  • DSM-IV criteria for major depression
  • HRSD
fMRI data
Accuracy of 68%, with sensitivity of 65% and specificity of 70%, with the blood-oxygenation-level-dependent convolution model at the mid-level of difficulty, which corresponded to a distributed network of cerebral regions involved in verbal working memory
The functional neuroanatomy of verbal working memory provides a statistically significant but clinically moderate contribution as a diagnostic biomarker for depression

Abbreviations: AUDIT: Alcohol Use Disorder Identification Test; ALSPAC: Avon Longitudinal Study of Parents and Children; AMT: Autobiographical memory test; AUROC and AUC: Area under the receiver operating characteristic curve; BACS: Brief Assessment of Cognition in Schizophrenia; BAI: Beck Anxiety Inventory; BIS-11: Barratt Impulsiveness Scale-11; BP: Background, medical history, and pregnancy/delivery variables; CAARMS: Comprehensive Assessment of At-Risk Mental State; CANTAB: Cambridge Neuropsychological Test Automated Battery; CNN: Convolutional Neural Network; CPTB: Copenhagen Psychophysiology Test Battery; C-SSRS: Columbia Suicide Severity Rating Scale; DART: Danish Adult Reading Test; DRF: Distributed Random Forests; DSM-IV: Diagnostic and Statistical Manual of Mental Disorders-IV; DT: Decision tree; DTI: Diffusion tensor imaging; EEG: Electroencephalogram; EHR: Electronic Health Record; EMA: Ecological momentary assessment; EPDS: Edinburgh Postnatal Depression Scale; ERTC: Extremely randomized trees classifier; ETI-SF: Early Trauma Inventory-Short Form; EU-GEI: European Network of National Schizophrenia Networks Studying Gene–Environment Interactions; EXGB: Ensemble of extreme gradient boosting; fMRI: Functional magnetic resonance imaging; GBRT: Gradient Boosting Regression Trees; GHQ: General Health Questionnaire; HADS: Hospital Anxiety and Depression Scale; HAMD and HRSD: Hamilton Rating Scale for Depression; ICD: International Classification of Diseases; Koko: An online peer-to-peer crowdsourcing platform that teaches users cognitive reappraisal strategies that they use to help other users manage negative emotions; LASSO: Least absolute shrinkage and selection operator; LSTM: Long Short-Term Memory; MDD: Major Depressive Disorder; M.I.N.I.: Mini-International Neuropsychiatric Interview; MRI: Magnetic resonance imaging; p: p-value; NEURAPRO: A clinical trial conducted between March 2010 and the end of September 2014 that tested the potential preventive role of omega-3 fatty acids in clinical high-risk participants; NLP: Natural Language Processing; NPV: Negative predictive value; PHQ: Patient Health Questionnaire; PPD: Postpartum depression; PPV: Positive Predictive Value; QoL: Quality of life; QOLm: Mental quality of life; RBC: Rank-biserial correlation; RF: Random Forest; RMSE: Root mean square error; RS: Resilience-14; SOC: Sense of Coherence-29; VPSQ: Vulnerable Personality Scale Questionnaire; SE: Regression coefficients; SF-36: 36-Item Short Form Survey; SIQ: Suicidal Ideation Questionnaire; sMRI: Structural magnetic resonance imaging; SOFAS: Social and Occupational Functional Assessment Score; SQ for KNHANES-SF: Stress Questionnaire for Korean National Health and Nutrition Examination Survey-Short Form; SVC and SVM: Support Vector Machine; SVM-RFE: Support Vector Machine Recursive Feature Elimination; UQ: Ubiquitous Questionnaire; WAIS III: Wechsler Adult Intelligence Scale® – Third Edition; W: Wilcoxon signed-rank test (one-sided) statistics; XRT: Extreme randomized forest.