Psychological Medicine. 2025 Feb 6;55:e18. doi: 10.1017/S0033291724003295

Table 2.

Studies on AI-assisted diagnosis in mental health

Ref. Subject description Mental health condition Aim AI-based method Models Variables Predictors Results and accuracy Conclusions
Chen et al. (2024) Patients with MDD and healthy controls (n = 156) MDD To detect lifetime diagnosis of MDD and nonremission status Machine learning models combined with natural language processing
  • RF
  • Logistic regression
  • SVC
  • K-Nearest neighbors
  • DT
  • Naive Bayes
  • Artificial neural networks
Clinical psychiatric diagnosis and HAMD–17 (cutoff score of 7)
  • Subjective happiness level
  • Actigraphy (physical activity and sleep estimation)
  • Facial expression (inner brow raising, brow lowering, cheek raising, lip corner pulling, lip corner depressing)
  • Voice (articulation rate, pause duration, pause variability, pause rate)
  • Self-reference and negative emotion
  • HADS-D
Artificial neural networks (all variables used as predictors to identify patients with MDD and HAMD–17 > 7)
  • Sensitivity: 0.64
  • Specificity: 0.96
  • PPV: 0.78
  • NPV: 0.91
Artificial neural networks (excluding HADS-D as a predictor, to identify patients with MDD)
  • Sensitivity: 0.86
  • Specificity: 0.74
  • PPV: 0.76
  • NPV: 0.85
Naive Bayes (excluding HADS-D as a predictor, to identify patients with MDD)
  • Sensitivity: 0.90
  • Specificity: 0.71
  • PPV: 0.74
  • NPV: 0.88
With the fusion of all digital variables, the prediction performance of artificial neural networks was generally more favorable than that of the other machine learning methods for both lifetime MDD diagnosis and nonremission (HAMD–17 > 7)
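A note on the screening metrics reported throughout this table (sensitivity, specificity, PPV, NPV, as in the Chen et al. row above): all four are derived from the same 2×2 confusion matrix. The sketch below is purely illustrative; the counts are hypothetical and are not taken from any of the studies.

```python
# Illustrative only: how sensitivity, specificity, PPV, and NPV relate to a
# 2x2 confusion matrix. Counts below are made up, not taken from Chen et al.
def diagnostic_metrics(tp, fp, tn, fn):
    """Return the four screening metrics reported throughout this table."""
    sensitivity = tp / (tp + fn)   # true-positive rate: detected cases / all cases
    specificity = tn / (tn + fp)   # true-negative rate: cleared controls / all controls
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return sensitivity, specificity, ppv, npv

# Hypothetical counts chosen only to show the calculation.
print(diagnostic_metrics(tp=32, fp=9, tn=86, fn=18))
```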
Das and Naskar (2024) Patients with depression and controls (DAIC-WOZ dataset, n = 219; MODMA dataset, n = 52) Depression To identify symptoms of depression among individuals from their speech and responses Machine learning models (e.g., SVM, CNN, LSTM, DT)
PHQ–8 binary score and professional judgment
Variables from an audio spectrogram
DAIC-WOZ dataset
Accuracy: 90.26%
MODMA dataset
Accuracy: 90.47%
A novel deep learning-based approach using audio signals for automatic depression recognition has demonstrated superior detection accuracy compared to existing methods
Maekawa et al. (2024) Individuals with depressive symptoms and healthy controls (n = 35,628) Depressive symptoms To identify individuals with depressive symptoms Machine learning algorithms Stochastic gradient descent (evaluated with two different feature selection methods: Bayesian network or Markov blanket) PHQ–9
  • Age, education, gender, income
  • Postural balance problems, shortness of breath, how old people feel they are, the ability to do usual activities, chest pain, chronic back problems, sleep problems, verbal abuse
Bayesian network: AUCs of 0.736, 0.801, and 0.809 across three datasets (using different variables as predictors); the Bayesian network feature selection method outperformed the Markov blanket selection method
The models emphasized the ability to do usual activities, chest pain, and sleep problems as key indicators for detecting depressive symptoms
Yang et al. (2024) Suicidal ideators and suicide attempters (n = 438) Suicide attempts To identify predictors for suicide attempts and suicides Machine learning
  • Logistic regression
  • Penalized regression (elastic net regression)
The number of suicide attempts reported
136 variables in total:
  • Sociodemographic characteristics (age, sex, living status, employment status, religion)
  • Clinical information (medical and psychiatric illness, treatment, and previous suicidal thoughts and attempts)
  • Psychopathological evaluation (PHQ–9, BAI, AUDIT, BIS–11, ETI-SF, SRS, SQ for KNHANES-SF, C-SSRS)
Classical logistic regression (136 variables included):
AUC: 0.535
Elastic net regression (136 variables included):
AUC: 0.812
Classical logistic regression (15 variables included):
  • AUC: 0.926
  • Accuracy: 91.2%
Elastic net regression (15 variables included):
  • AUC: 0.912
  • Accuracy: 90.0%
Young age, suicidal ideation, previous suicide attempts, anxiety, alcohol abuse, stress, and impulsivity were significant predictors of suicide attempts
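The Yang et al. row above contrasts classical logistic regression with elastic net penalized regression over 136 candidate predictors. The sketch below illustrates that kind of comparison with scikit-learn on synthetic data; it is a generic illustration under assumed settings (l1_ratio, fold count), not the authors' pipeline.

```python
# Sketch: classical vs. elastic-net-penalized logistic regression on many predictors.
# Synthetic data stands in for the 136 candidate variables; not the authors' code.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=438, n_features=136, n_informative=15,
                           random_state=0)

classical = LogisticRegression(penalty=None, max_iter=5000)   # requires scikit-learn >= 1.2
elastic = LogisticRegression(penalty="elasticnet", solver="saga",
                             l1_ratio=0.5, C=1.0, max_iter=5000)

for name, model in [("classical", classical), ("elastic net", elastic)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```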
Manikis et al. (2023) Women with highly treatable breast cancer (n = 706) Depression, anxiety, overall mental health, and QoL To identify women at risk of poor mental health, declining mental health, and declining global QoL following a diagnosis of breast cancer Machine learning algorithms Balanced RF
  • HADS–14
  • European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire-Cancer
  • Sociodemographics: six variables (month 0), two variables (month 3)
  • Lifestyle: four variables (month 0)
  • Medical: 11 variables
  • Breast cancer and treatment-related: 17 variables
  • Psychosocial characteristics: seven domains
Model A for patients with poor mental health at month 0
Model B for patients with good mental health at month 0
Model C for patients with good QoL at month 0
i: using variables at month 0 and month 3 as predictors
ii: using clinical and biological variables at month 0 together with other variables at month 6 as predictors
12-month AUC
  • Model Ai: 0.81
  • Model Aii: 0.78
  • Model Bi: 0.86
  • Model Bii: 0.79
  • Model Ci: 0.77
  • Model Cii: 0.79
The top predictors of adverse mental health and QoL outcomes include common variables in clusters: negative affect, cancer coping responses/self-efficacy to cancer, sense of control/optimism, social support, lifestyle factors, and treatment-related symptoms
Geng et al. (2023) Patients with MDD and healthy controls (n = 80) MDD To optimize initial screening for MDD in both male and female patients Machine learning algorithms
  • SVM
  • ERTC
PHQ–9
  • 24 HRV-related variables (analyzed by 5-min short-term electrocardiogram signals during nighttime sleep stages): time domain and frequency domain
  • Gender
SVM:
  • AUC: 0.853
  • Accuracy: 79.29%
ERTC:
  • AUC: 0.945
  • Accuracy: 86.32%
Feature importance analysis identified MeanNN, MedianNN, pNN20, and gender as the most important features
HRV parameters during sleep stages can be used to identify patients with MDD
Kourou et al. (2023) Women diagnosed with stage I–III breast cancer with a curative treatment intention (n = 600) Symptoms of anxiety and depression To predict adverse mental health outcomes among patients who manifest a fairly good initial emotional response to the diagnosis and the prospect of cancer treatments Adaptive machine learning algorithms Balanced RF HADS–14 Sociodemographic, lifestyle, medical variables, and self-reported psychological characteristics recorded at diagnosis and assessed 3 months after diagnosis
Model 1: using all variables at month 0 and month 3 as predictors
Model 2: excluding mental health and subjective QoL ratings at months 0 and 3
12-month AUC
  • Model 1: 0.864
  • Model 2: 0.790
The top predictors of adverse mental health and QoL outcomes include common variables in clusters: negative affect, cancer coping responses/self-efficacy to cancer, sense of control/optimism, social support, lifestyle factors, and treatment-related symptoms
Lønfeldt et al. (2023) Adolescents with mild-to-moderately-severe obsessive-compulsive disorder (n = 9) Obsessive-compulsive disorder To detect obsessive-compulsive disorder episodes in the daily lives of adolescents Machine learning models
  • Logistic regression
  • RF
  • Feedforward neural networks
  • Mixed-effect RF
Obsessive-compulsive disorder events marked by participants
Blood volume pulse, external skin temperature, electrodermal activity, and heart rate (calculated from blood volume pulse)
10-fold random cross-validation
  • Average accuracy: >70%
  • Recall: 50%
  • Precision: 66%
  • Average AUC: 0.8
Better performance was obtained when generalizing across time than when generalizing across patients
Generalized temporal models trained on multiple patients outperformed personalized single-patient models
RF and mixed-effect RF models consistently achieved superior accuracy, reaching 70% in both random and participant cross-validation
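The Lønfeldt et al. row above distinguishes random cross-validation from participant-wise cross-validation (generalizing across time vs. across patients). The sketch below shows that distinction with scikit-learn splitters; the wearable-style features, labels, and participant count are invented for illustration only.

```python
# Sketch: random 10-fold CV vs. leaving whole participants out, mirroring the
# distinction drawn by Lønfeldt et al. Data and labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_participants, samples_each = 9, 200
X = rng.normal(size=(n_participants * samples_each, 4))   # e.g. BVP, EDA, temperature, HR features
y = rng.integers(0, 2, size=len(X))                       # event vs. no-event labels (placeholder)
groups = np.repeat(np.arange(n_participants), samples_each)

clf = RandomForestClassifier(n_estimators=200, random_state=0)

random_cv = cross_val_score(
    clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0), scoring="accuracy"
).mean()
participant_cv = cross_val_score(
    clf, X, y, groups=groups, cv=GroupKFold(n_splits=9), scoring="accuracy"
).mean()
print(f"random 10-fold: {random_cv:.2f}  participant-wise: {participant_cv:.2f}")
```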
Adler et al. (2022) Patients with schizophrenia, schizoaffective disorder, or psychosis non-specified in treatment, and university students (n = 109) Mental health symptoms To explore if machine learning models can be trained and validated across multiple mobile sensing longitudinal studies (CrossCheck and StudentLife) to predict mental health symptoms Machine learning algorithms
  • GBRT
  • SVM
EMA of sleep quality and stress
Mobile sensing data
Improved model performance for predicting sleep:
  • CrossCheck (W = 53,200, p = 0.007, RBC = 0.14)
  • StudentLife (W = 63,089, p < 0.001, RBC = 0.35)
Improved model performance for predicting stress:
  • CrossCheck (W = 55,373, p < 0.001, RBC = 0.18)
Machine learning models trained across longitudinal mobile sensing study datasets generalized and provided a more efficient way to build models predicting outcomes such as sleep and stress
Chilla et al. (2022) Patients with schizophrenia and healthy controls (n = 234) Schizophrenia To classify schizophrenia and healthy control cohorts using a diverse set of neuroanatomical measures Machine learning
  • k-Nearest Neighbors
  • Logistic regression
  • SVC
  • Linear SVC
  • Nu-SVC
  • Decision trees
  • RF
A structured clinical interview for DSM-IV Disorders-Patient Version
Clinical history, existing medical records, and interviews with significant others (e.g., family members, spouse, children)
MRI measures of subcortical volumes, cortical volumes, cortical areas, cortical thickness, and mean cortical curvature
Classification performance was comparable between independent measure sets, with accuracy, sensitivity, and specificity ranging from 70%–73%, 73%–81%, and 57%–61%, respectively
Employing a diverse set of measures (measures were merged and used in Ensemble) resulted in improved accuracy, sensitivity, and specificity, with ranges of 77%–87%, 79%–98%, and 65%–74%, respectively
Subcortical and cortical measures and Ensemble methods achieved better classification performance on people with schizophrenia
Hüfner et al. (2022) Individuals residing in Austria aged ≥16 or in Italy aged ≥18 with confirmed SARS-CoV–2 infection who were not hospitalized (n = 2,050) Depression, anxiety, overall mental health, and QoL To identify indicators for poor mental health following COVID–19 outpatient management and to identify high-risk individuals
Machine learning algorithm
RF
PHQ–4
Self-perceived Overall Mental Health and QoL rated with 4-point Likert scale
201 surveyed demographic, socioeconomic, medical history, COVID–19 course, and recovery parameters
RMSE: 0.15–0.18 (Austria data) and 0.21–0.23 (Italy data)
Machine learning achieved moderate-to-good performance in mental health risk prediction
Jacobson et al. (2022)
Note: Also included in the monitoring domain.
Users who made queries related to mental health screening tools to the Microsoft Bing search engine between December 1, 2018, and January 31, 2020 (n = 126,060) Suicidal ideation, active suicidal intent To examine the impact and qualities of widely used, freely available online mental health screening on potential benefits, including suicidal ideation and active suicidal intent Machine learning algorithm RF
  • Suicidal ideation and suicidal intent search queries identified by seed keywords
  • Rating of common queries by two independent raters
  • Filtering of a multivariate regression model
Exposure to online screening tools and past search behaviors
AUC of:
  • Suicidal ideation: 0.58
  • Suicidal intent: 0.60
Websites with referrals to in-person treatments could put persons at greater risk of active suicidal intent. Machine learning’s prediction accuracy of suicidal ideation and intent was moderate
Matsuo et al. (2022) Pregnant women who delivered at ≥35 weeks of gestation (n = 34,710) PPD To develop and validate machine learning models for the prediction of postpartum depression and to compare the predictive accuracy of the machine learning models with conventional logistic regression models Four machine learning algorithms
  • Conventional logistic regression models
  • Ridge regression
  • Elastic net
  • Kernel-based SVM
  • RF
EPDS
  • Maternal baseline (18 variables)
  • Pregnancy-related (four variables)
  • Delivery-related (eight variables)
  • Neonatal (eight variables)
  • Postpartum at two-week postpartum checkup (three variables)
AUC assessing the predictive accuracy:
Model 1 (using variables collected in the first to second trimester):
  • Logistic (0.634)
  • Ridge regression (0.638)
  • Elastic net (0.637)
  • Kernel-based SVM (0.530)
  • RF (0.629)
Model 2 (using variables collected before discharge from hospitals):
  • Logistic (0.626)
  • Ridge regression (0.630)
  • Elastic net (0.628)
  • Kernel-based SVM (0.569)
  • RF (0.613)
Model 3 (using all variables, including the two-week postpartum checkup):
  • Logistic (0.697)
  • Ridge regression (0.702)
  • Elastic net (0.701)
  • Kernel-based SVM (0.642)
  • RF (0.688)
The approach used did not achieve better predictive performance than the conventional logistic regression models
Susai et al. (2022) Participants from NEURAPRO, aged between 13 and 40, who fulfilled at least one of the CAARMS criteria for an at-risk mental state (n = 158) Psychosis: functioning To investigate the combined predictive ability of blood-based biological markers on functional outcome Machine learning model SVM SOFAS Clinical predictors:
Four demographic variables (sex, age, smoking status, and BMI) and seven symptom scale scores
Biomarker predictors: ten cytokines; 157 proteomic markers; and ten fatty acid markers
Model based on clinical predictors:
  • Accuracy: 56.4%
  • AUC: 0.63
Model based on biomarker predictors:
  • Accuracy: 58.9%
  • AUC: 0.62
Model based on clinical and biomarker predictors:
  • Accuracy: 57.5%
  • AUC: 0.58
Machine learning model based on clinical and biological data poorly predicted functional outcome in clinical high-risk participants
Andersson et al. (2021) Pregnant women who were 18 years of age or older (n = 4,313) PPD To predict women at risk for depressive symptoms at six weeks postpartum, from clinical, demographic, and psychometric questionnaire data available after childbirth
Machine learning algorithm
  • Ridge Regression
  • LASSO Regression
  • Gradient Boosting Machines
  • DRF
  • XRT
  • Naive Bayes
  • Stacked Ensembles models
EPDS
  • BP
  • Psychometric data from RS, SOC, and VPSQ
Accuracy based on BP dataset:
  • Ridge Regression: 70%
  • LASSO Regression: 71%
  • DRF: 70%
  • XRT: 72%
  • Gradient Boosting Machines: 70%
  • Stacked Ensembles models: 70%
  • Naive Bayes: 70%
Accuracy based on combined dataset (BP + RS, SOC, and VPSQ):
  • Ridge Regression: 67%
  • LASSO Regression: 70%
  • DRF: 71%
  • XRT: 73%
  • Gradient Boosting Machines: 68%
  • Stacked Ensembles models: 65%
  • Naive Bayes: 69%
All machine learning models had similar performance based solely on BP dataset; there were greater variations in model performance for the combined dataset
Du et al. (2021) College students (n = 30) Depression To design a deep learning-based mental health monitoring scheme to detect depression in college students Deep learning Convolutional neural network model
Confirmation of a depression diagnosis based on questionnaires and bodily feelings
EEG signal
The model showed a classification accuracy of 97.54%
The proposed deep learning-based mental health monitoring scheme achieved high accuracy in detecting depression from EEG data
Mongan et al. (2021)
  • EU-GEI participants who met clinical high-risk criteria of psychosis at baseline
  • ALSPAC participants who did not report psychotic experiences at age 12
(n = 344)
Psychosis To investigate whether proteomic biomarkers may aid prediction of:
  • Transition to psychotic disorder in people at clinical high risk of psychosis
  • Adolescent psychotic experiences in the general population
Machine learning algorithms SVM For the transition to psychotic disorder in the clinical high-risk group:
  • CAARMS interview
  • Contact with the clinical team or review of clinical records
For the adolescent psychotic experiences in the general population:
  • Psychosis-Like Symptoms Interview at age 18
Proteomic data from plasma samples
For the transition to psychotic disorder in the clinical high-risk group:
Model based on clinical and proteomic data:
  • AUC: 0.95
  • PPV: 75.0%
  • NPV: 98.6%
Model based on clinical data:
  • AUC: 0.48
  • PPV: 37.1%
  • NPV: 63.4%
Model based on clinical and proteomic data:
  • AUC: 0.96
  • PPV: 79.0%
  • NPV: 100%
For the adolescent psychotic experiences in the general population:
  • AUC: 0.74
  • PPV: 67.8%
  • NPV: 75.8%
Models based on proteomic data demonstrated excellent predictive performance for the transition to psychotic disorder in clinically high-risk individuals.
Models based on proteomic data at age 12 had fair predictive performance for psychotic experiences at age 18
Tsui et al. (2021) Inpatients and emergency department patients aged 10–75 (n = 45,238) First-time suicide attempt To predict first-time suicide attempts from unstructured (narrative) clinical notes and structured EHR data NLP
  • Naive Bayes
  • LASSO regression
  • RF
  • EXGB
ICD–9 and ICD–10 codes
Unstructured data (clinical notes): history and physical examination, progress, and discharge summary notes
Structured data: demographics, diagnosis, healthcare utilization data, and medications
AUC for prediction windows of ≤30 days:
  • Full-feature (involved both structured and unstructured data) EXGB: 0.932
  • Structured-feature only EXGB: 0.901
  • Full-feature LASSO: 0.909
  • Full-feature LASSO: 0.884
  • Full-feature Naive Bayes: 0.766
  • Full-feature Random Forest: 0.900
Using both structured and unstructured data resulted in significantly higher accuracy than structured data alone
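The Tsui et al. row above reports that fusing unstructured clinical notes with structured EHR fields outperformed structured data alone. The sketch below shows one common, generic way to combine the two feature types (TF-IDF text features alongside tabular columns); the column names, toy data, and the use of gradient boosting in place of EXGB are assumptions, not the authors' implementation.

```python
# Sketch: fusing free-text notes with structured fields in one model.
# Column names and rows are hypothetical; gradient boosting stands in for EXGB.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

notes = [
    "reports worsening mood and hopelessness",
    "routine follow-up, no acute issues",
    "expresses thoughts of self-harm after job loss",
    "sleeping well, mood stable on current plan",
]
df = pd.DataFrame({"note": notes, "age": [34, 52, 29, 61], "prior_visits": [7, 2, 11, 3]})
y = [1, 0, 1, 0]  # hypothetical outcome labels

features = ColumnTransformer(
    [
        ("text", TfidfVectorizer(), "note"),                   # unstructured clinical notes
        ("tabular", "passthrough", ["age", "prior_visits"]),   # structured EHR fields
    ],
    sparse_threshold=0.0,  # force dense output for the downstream classifier
)
model = Pipeline([("features", features),
                  ("clf", GradientBoostingClassifier(random_state=0))])
model.fit(df, y)
print(model.predict_proba(df)[:, 1])
```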
Maglanoc et al. (2020) Patients with depression from outpatient clinics and healthy controls (n = 241) Depression, anxiety To classify patients and controls and to predict symptoms of depression and anxiety Machine learning Shrinkage discriminant analysis
  • M.I.N.I.
  • Beck Depression Inventory
  • Beck Anxiety Inventory
Brain components, including cortical macrostructure (thickness, area, gray matter density), white matter diffusion properties, radial diffusivity and resting-state functional magnetic resonance imaging (fMRI) default mode network amplitude
Sex
Age
Classifying patients and controls:
  • AUC: 0.6194
  • Accuracy (proportion of correct classification): 0.6169
  • Sensitivity (ability to correctly detect cases): 0.6991
  • Specificity (ability to correctly detect controls): 0.4292
Predicting depression symptoms:
  • RMSE: 10.72
Predicting anxiety symptoms:
  • RMSE: 8.181
Predicting age:
  • RMSE: 6.764
Machine learning revealed low model performance for discriminating patients from controls and predicting symptoms for depression and anxiety, but had high accuracy for age
Tate et al. (2020) Twins born between 1994 and 1999 (n = 7,638) Mental health problems: parent-rated emotional symptoms, conduct problems, prosocial behavior, hyperactivity/inattention, and peer relationship problems To investigate whether various machine learning techniques outperform logistic regression in predicting mental health problems in mid-adolescence Machine learning algorithms
  • RF
  • XGBoost
  • Logistic regression
  • Neural network
  • SVM
Strengths and Difficulties Questionnaire Birth information, physical illness, mental health symptoms, environmental factors such as neighborhood and parental income AUC and 95% interval of:
  • Logistic Regression: 0.700 (0.665–0.734)
Compared to:
  • XGBoost: 0.692 (0.660–0.723)
  • Random Forest: 0.739 (0.708–0.769)
  • SVM: 0.736 (0.707–0.765)
  • Neural Network: 0.705 (0.671–0.737)
All models performed with relatively similar accuracy; the machine learning algorithms did not perform statistically significantly better than logistic regression
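The Tate et al. row above reports each model's AUC with a 95% interval. The sketch below illustrates one common way to obtain such an interval, a bootstrap over held-out predictions; it is not necessarily the interval method used in the study, and the labels and scores are synthetic.

```python
# Sketch: bootstrap 95% interval for an AUC, given true labels and predicted scores.
# The labels and scores here are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = y_true * 0.4 + rng.normal(scale=0.5, size=500)  # weakly informative scores

boot = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))   # resample with replacement
    if len(np.unique(y_true[idx])) < 2:                     # AUC needs both classes present
        continue
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC = {roc_auc_score(y_true, y_score):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```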
Byun et al. (2019) MDD patients and healthy controls who were matched for age and gender (n = 78) MDD To investigate the feasibility of automated MDD detection based on heart rate variability features Machine learning algorithm SVM-RFE for feature selection
SVM for classification
HAMD–17
Heart rate variability features extracted from electrocardiogram recordings
The best AUC by heart rate variability feature selection method:
  • SVM-RFE (based on two features): 0.742
  • Statistical filter (based on five features): 0.734
The highest accuracy of the SVM classifier:
  • SVM-RFE (based on two features): 74.4%
  • Statistical filter (based on five features): 73.1%
SVM-RFE marginally outperformed the statistical filter while requiring fewer heart rate variability features for MDD classification
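The Byun et al. row above uses SVM-based recursive feature elimination (SVM-RFE) to select a small subset of heart-rate-variability features before classification. The sketch below is a minimal, generic SVM-RFE pipeline on synthetic data; the feature counts and evaluation scheme are assumptions, not the study's implementation.

```python
# Sketch: recursive feature elimination driven by a linear SVM (SVM-RFE),
# then classification with the selected features. Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=78, n_features=20, n_informative=4, random_state=0)

svm_rfe = Pipeline([
    ("select", RFE(SVC(kernel="linear"), n_features_to_select=2)),  # keep 2 features
    ("clf", SVC(kernel="linear")),
])
auc = cross_val_score(svm_rfe, X, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC with 2 selected features: {auc:.3f}")
```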
Ebdrup et al. (2019) Antipsychotic-naive first-episode schizophrenia patients and healthy controls (n = 104) Schizophrenia, schizoaffective psychosis To investigate whether machine learning algorithms on multimodal data can serve as a framework for clinically translating into diagnostic utility Machine learning algorithms
  • Naive Bayes
  • Logistic regression
  • SVM
  • Decision tree
  • RF
  • Auto-sklearn
A structured diagnostic interview to ensure fulfillment of ICD–10 diagnostic criteria of schizophrenia or schizoaffective psychosis
Four modalities
  1. Neurocognitive: DART, WAIS III, BACS, CANTAB
  2. Electrophysiology: CPTB
  3. Neuroanatomy: MRI scans
  4. Diffusion tensor imaging
Unimodal diagnostic accuracy:
Diagnostic accuracy of cognition ranged between 60% and 69%
Diagnostic accuracy for electrophysiology, sMRI, and DTI ranged between 49% and 56% and did not exceed chance accuracy: 'chance accuracy' = 58/(46 patients + 58 healthy controls) × 100% = 56%
Multimodal diagnostic accuracy:
None of the multimodal analyses with cognition plus any combination of one or more of the remaining modalities (electrophysiology, sMRI, and DTI) showed significantly higher accuracy than cognition alone; the accuracy ranged between 51% and 68%
Only cognitive data, but no other modality, significantly discriminated patients from healthy controls
No enhanced accuracies were noted by combining cognition with other modalities
Jaroszewski et al. (2019) Koko app users who signed up for the service (n = 39,450) Mental health crisis: suicide (ideation, plan, and attempt), self-harm, eating disorder, physical abuse, unspecified abuse, emotional abuse, and otherwise unspecified To develop and evaluate a brief, automated risk assessment and intervention platform designed to increase the use of crisis resources among individuals routed to a digital mental health app who were identified as likely experiencing a mental health crisis Machine learning classifiers Recurrent neural networks with word embeddings
A binary classification of "crisis" or "not crisis"; "crisis" was defined as possibly at risk of serious, imminent physical harm, either through self-inflicted actions or through abuse from a third party
Semantic content of posts in real time
Performance:
  • AUC: 0.93
  • Sensitivity: 0.64
  • Specificity: 0.98
  • PPV: 0.90
  • NPV: 0.93
  • Accuracy: 0.93
The classifiers demonstrated excellent performance in classifying crisis risk from real-time posts, regardless of whether the posts referred to the writer or to a third party
Lyu and Zhang (2019) Suicide attempters randomly recruited through the hospital emergency and patient registration system (n = 659) Suicide attempt To establish a prediction model based on the Back Propagation neural network to improve prediction accuracy Artificial neural network Back Propagation neural network
Whether a suicide attempt had been made
Demographic information (such as age, gender, education level, and marital status), family history of suicide, mental health problems, aspiration strain, health status variables, hopelessness level, impulsivity, anxiety, depression, attitudes toward suicide, negative life events, social support, coping skills, community environment, etc.
The Back Propagation neural network:
  • Sensitivity: 67.6%
  • Specificity: 93.9%
  • Total coincidence rate: 84.6%
Traditional statistical methods such as multivariate logistic regression:
  • Sensitivity: 80.2%
  • Specificity: 83.8%
  • Total coincidence rate: 82.2%
Back Propagation neural network prediction model was superior in predicting suicide attempt
Simon et al. (2019) Members of seven health systems who had outpatient visits either to a specialty mental health provider or to a general medical provider when a mental health diagnosis was recorded (n = 25,373) Suicide death
Probable suicide attempt
To evaluate how availability of different types of health records data affect the accuracy of machine learning models predicting suicidal behavior
Machine learning models
Logistic regression with penalized LASSO variable selection
ICD–9th Revision cause-of-injury codes indicating intentional self-harm (E950–E958) or undetermined intent (E980–E989)
ICD–10th Revision diagnoses of self-inflicted injury (X60–X84) or injury or poisoning with undetermined intent (Y10–Y34)
Historical insurance claims data
Sociodemographic characteristics (race, ethnicity, and neighborhood characteristics)
Past patient-reported outcome questionnaires from electronic health records
Data (diagnoses and questionnaires) recorded during medical visit
Prediction of suicide attempt following mental health visits:
  • AUC of model 1 (limited to data typically available to an insurer or health plan): 0.843
  • AUC of model 4 (reflecting data that might inform predictions in an EHR environment capable of real-time calculation or updating risk scores): 0.850
Prediction of suicide death following mental health visits:
  • AUC of model 1: 0.836
  • AUC of model 4: 0.861
Prediction of suicide attempt following general medical visits:
  • AUC of model 1: 0.836
  • AUC of model 4: 0.853
Prediction of suicide death following general medical visits:
  • AUC of model 1: 0.819
  • AUC of model 4: 0.833
For prediction of suicide attempt following mental health visits, model limited to historical insurance claims data performed approximately as well as model using all available data
For prediction of suicide attempt following general medical visits, addition of data recorded during visits yielded improvement in model accuracy
Carrillo et al. (2018) Patients with treatment-resistant depression (n = 35) Depression To classify patients with depression and healthy controls with a machine learning algorithm Natural speech algorithm combined with machine learning Gaussian Naive Bayes classifier
Quick Inventory of Depressive Symptoms
AMT structured interview in which participants were asked to provide specific autobiographical memories in response to specific cue words
Mean accuracy of identifying patients with depression versus controls was 82.85%
The natural speech analysis distinguished patients with depression from healthy controls
Liang et al. (2018) First-episode patients with schizophrenia, MDD, and demographically matched healthy controls (n = 577) Schizophrenia, MDD To investigate the accuracy of neurocognitive graphs in classifying individuals with first-episode schizophrenia and MDD in comparison with healthy controls Machine learning algorithm Graphical LASSO logistic regression
  • Wechsler Adult Intelligence Scale-Revised in China
  • The computerized CANTAB
  • Trail Making Test, parts A and B- Modified
Neurocognitive graphs based on cognitive features including general intelligence, immediate and delayed logical memory, processing speed, visual memory, planning, shifting, and psychosocial functioning
Classification accuracy of:
  • First-episode schizophrenia vs. healthy controls: 73.41%
  • MDD vs. healthy controls: 67.07%
  • First-episode schizophrenia vs. MDD: 59.48%
Machine learning algorithm achieved moderate accuracy in classifying first-episode schizophrenia and MDD against healthy controls. Classification accuracy between first-episode schizophrenia and MDD was substantially lower
Xu et al. (2018) Postmenopausal obese or overweight, early-stage breast cancer survivors participating in a weight loss treatment (n = 333) Depression and QOLm To elicit bio-behavioral pathways implicated in obesity and health in breast cancer survivorship Machine learning Bayesian networks
  • SF–36 for QOLm
  • The variable(s) used for depression were not specified
  • Demographics and lifestyle
  • Clinical factors
  • Cancer treatment
  • Coping
  • Neighborhood
  • Health
  • Health behaviors
Insomnia predicted depression:
  • Strength = 0.93
  • Direction = 0.60
  • SE = 1.000 (0.257)
Depression predicted QOLm:
  • Strength = 0.95
  • Direction = 0.97
  • SE = −7.029 (1.717)
Sleep impairment predicted QOLm:
  • Strength = 1.00
  • Direction = 0.95
  • SE = −1.060 (0.094)
Higher levels of insomnia were associated with higher levels of depression
More severe depression and poorer sleep were associated with poorer QOLm
Cook et al. (2016) Adults discharged after self-harm from emergency services or after a short hospitalization (n = 1,453) Suicidal ideation, heightened psychiatric symptoms To develop and employ a predictive algorithm in a free-text platform (i.e., physician notes in EHRs, texts, and social media) to predict suicidal ideation and heightened psychiatric symptoms Machine learning algorithm NLP
Suicidal ideation assessed by the question: "Have you felt that you do not have the will to live?"
Heightened psychiatric symptoms measured by GHQ–12
Structured items (e.g., relating to sleep and well-being)
Responses to one unstructured question, “how do you feel today?”
Suicidal ideation:
  • NLP-based models using unstructured question: PPV: 0.61, Sensitivity: 0.56, Specificity: 0.57
  • Logistic regression prediction models using structured data: PPV: 0.73, Sensitivity: 0.76, Specificity: 0.62
Heightened psychiatric symptoms:
  • NLP-based models using unstructured question: PPV: 0.56, Sensitivity: 0.59, Specificity: 0.60
  • Logistic regression prediction models using structured data: PPV: 0.79, Sensitivity: 0.79, Specificity: 0.85
NLP-based models were able to generate relatively high predictive values based solely on responses to a simple general mood question
Pestian et al. (2016) Suicidal (intervention group) or orthopedic (control group) teenage patients aged 13 to 17 admitted to the emergency department (n = 61) Suicidal ideation To evaluate whether machine learning methods discriminate between conversations of suicidal and non-suicidal individuals NLP SVM
  • C-SSRS
  • SIQ
  • UQ
Language
Machine learning classification matched the gold-standard C-SSRS with 96.67% accuracy
Machine learning methods accurately distinguished between suicidal and non-suicidal teenagers
Setoyama et al. (2016) Patients with any depressive symptoms (HAMD–17 > 0), including both medicated and medication free (n = 115) Depression and suicidal ideation To create a more objective system evaluating the severity of depression, especially suicidal ideation Machine learning Partial least squares regression model
Logistic regression
Support vector machine
Random Forest
  • HAMD–17
  • PHQ–9
  • Structured interview using M.I.N.I.
Aqueous metabolites in blood plasma Each model on evaluating severity of depression showed a fairly good correlation with either value R2 = 0.24 (PHQ–9) and R2 = 0.263 (HAMD–17)
The three models discriminated depressive patients with or without SI showed true rate > 0.7
Plasma metabolome analysis is a useful tool to evaluate the severity of depression
An algorithm to estimate a grade of SI using only a few metabolites was successfully created
Schnack et al. (2014) Schizophrenia patients, bipolar disorder patients, and healthy controls selected from database (n = 334) Schizophrenia and bipolar disorder To classify patients with schizophrenia, bipolar disorder, and healthy controls on the basis of their structural MRI scans Machine learning algorithms Three SVM:
  • M(sz-hc) to separate schizophrenia from healthy controls
  • M(bp-sz) to separate bipolar from schizophrenia
  • M(bp-hc) to separate bipolar from healthy controls
DSM-IV criteria for schizophrenia
DSM-IV criteria for bipolar disorder
Gray matter density
M(sz-hc):
  • Average accuracy rate 90.1%
  • 92.4% of schizophrenia and 87.9% of healthy controls correctly classified
M(bp-sz):
  • Average accuracy rate 87.9%
  • 86.4% of bipolar and 89.4% of schizophrenia correctly classified
M(bp-hc):
  • Average accuracy rate 59.8%
  • 53.0% of bipolar and 66.7% of healthy controls correctly classified
Models based on gray matter density separated schizophrenia patients from healthy controls and bipolar disorder patients with high accuracy rate, and separated bipolar disorder from healthy control with much lower accuracy rate
Marquand, Mourão-Miranda, Brammer, Cleare, and Fu (2008) Patients meeting criteria for major depression and in an acute episode of moderate severity with a minimum score of 18 on the 17-item HRSD
Healthy controls with no history of psychiatric disorder, neurological disorder or head injury resulting in a loss of consciousness, and an HRSD score < 7 (n = 40)
Depression To examine the sensitivity and specificity of the diagnosis of depression achieved with the neural correlates of verbal working memory Machine learning algorithms SVM
  • DSM-IV criteria for major depression
  • HRSD
fMRI data
Accuracy of 68%, with sensitivity of 65% and specificity of 70%, with the blood-oxygenation-level-dependent convolution model at the mid-level of difficulty, which corresponded to a distributed network of cerebral regions involved in verbal working memory
The functional neuroanatomy of verbal working memory provides a statistically significant but clinically moderate contribution as a diagnostic biomarker for depression

Abbreviations: AUDIT: Alcohol Use Disorder Identification Test; ALSPAC: Avon Longitudinal Study of Parents and Children; AMT: Autobiographical memory test; AUROC and AUC: Area under the receiver operating characteristic curve; BACS: Brief Assessment of Cognition in Schizophrenia; BAI: Beck Anxiety Inventory; BIS-11: Barratt Impulsiveness Scale-11; BP: Background, medical history, and pregnancy/delivery variables; CAARMS: Comprehensive Assessment of At-Risk Mental State; CANTAB: Cambridge Neuropsychological Test Automated Battery; CNN: Convolutional Neural Network; CPTB: Copenhagen Psychophysiology Test Battery; C-SSRS: Columbia Suicide Severity Rating Scale; DART: Danish Adult Reading Test; DRF: Distributed Random Forests; DSM-IV: Diagnostic and Statistical Manual of Mental Disorders-IV; DT: Decision tree; DTI: Diffusion tensor imaging; EEG: Electroencephalogram; EHR: Electronic Health Record; EMA: Ecological momentary assessment; EPDS: Edinburgh Postnatal Depression Scale; ERTC: Extremely randomized trees classifier; ETI-SF: Early Trauma Inventory-Short Form; EU-GEI: European Network of National Schizophrenia Networks Studying Gene–Environment Interactions; EXGB: Ensemble of extreme gradient boosting; fMRI: Functional magnetic resonance imaging; GBRT: Gradient Boosting Regression Trees; GHQ: General Health Questionnaire; HADS: Hospital Anxiety and Depression Scale; HAMD and HRSD: Hamilton Rating Scale for Depression; ICD: International Classification of Diseases; Koko: An online peer-to-peer crowdsourcing platform that teaches users cognitive reappraisal strategies that they use to help other users manage negative emotions; LASSO: Least absolute shrinkage and selection operator; LSTM: Long Short-Term Memory; MDD: Major Depressive Disorder; M.I.N.I.: Mini-International Neuropsychiatric Interview; MRI: Magnetic resonance imaging; p: p-value; NEURAPRO: A clinical trial conducted between March 2010 and the end of September 2014 that tested the potential preventive role of omega-3 fatty acids in clinical high-risk participants; NLP: Natural Language Processing; NPV: Negative predictive value; PHQ: Patient Health Questionnaire; PPD: Postpartum depression; PPV: Positive Predictive Value; QoL: Quality of life; QOLm: Mental quality of life; RBC: Rank-biserial correlation; RF: Random Forest; RMSE: Root mean square error; RS: Resilience-14; SOC: Sense of Coherence-29; VPSQ: Vulnerable Personality Scale Questionnaire; SE: Regression coefficients; SF-36: 36-Item Short Form Survey; SIQ: Suicidal Ideation Questionnaire; sMRI: Structural magnetic resonance imaging; SOFAS: Social and Occupational Functional Assessment Score; SQ for KNHANES-SF: Stress Questionnaire for Korean National Health and Nutrition Examination Survey-Short Form; SVC and SVM: Support Vector Machine; SVM-RFE: Support Vector Machine Recursive Feature Elimination; UQ: Ubiquitous Questionnaire; WAIS III: Wechsler Adult Intelligence Scale® – Third Edition; W: Wilcoxon signed-rank test (one-sided) statistics; XRT: Extreme randomized forest.