. 2025 Feb 6;55:e18. doi: 10.1017/S0033291724003295

Table 3.

Studies on AI-assisted monitoring in mental health

Columns: Ref. | Subject description | Mental health condition | Aim | AI-based method | Models | Variables for monitoring/prediction | Results and accuracy | Conclusions
Carreiro et al. (2024) Patients with substance use disorder (n = 30)
  • Stress
  • Craving
To use continuous physiologic data to detect high-risk behavioral states (stress and craving) during substance use disorder recovery
Machine learning models
  • DT
  • Discriminant analysis
  • LR
  • Naive Bayes classifiers
  • SVM
  • Nearest neighbor classifiers
  • Ensemble classifiers
  • Time-series raw physiologic data from the commercial sensor
  • Basic demographics, phone and operating system information, current medications, and self-reported past medical, mental health, and substance use history
  • Stress detection
AUC: 0.78
  • Craving detection
AUC: 0.74
  • Stress vs Craving detection
AUC: 0.75
All models performed close to previously validated models from a research-grade sensor
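The AUC values in this table (for example, 0.78 for stress detection above) have a pairwise reading: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case. The following minimal sketch computes AUC that way in plain Python. The function name and the scores are hypothetical and are not taken from Carreiro et al. (2024).

```python
from itertools import product


def auc_pairwise(scores, labels):
    """Estimate ROC AUC as the share of (positive, negative) pairs in which
    the positive case gets the higher score; ties count as half a pair
    (the Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical detector scores for six time windows (1 = stress, 0 = no stress).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(auc_pairwise(scores, labels))
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9 ≈ 0.89; random ordering would give about 0.5.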
Choo et al. (2024) People with borderline personality disorder (n = 80) Suicidal ideation To explore predicting suicidal ideation in individuals with borderline personality disorder using EMA data Machine learning models
  • MEM
  • RNN
  • Baseline: Sex, any prior suicide attempt, baseline BDI, Affective Lability Scale, MDD diagnosis, BSSI, Barrett Impulsivity Scale, Childhood Trauma Questionnaire, and HDRS
  • EMA: Suicidal ideation, stressful events, coping strategies, affect items, and suicidal behavior
  • MEM (RMSE = 3.84, MAPE = 56%, pseudo-R2 = 16%)
  • RNN (RMSE = 3.41, MAPE = 42%, pseudo-R2 = 26%)
RNN showed enhanced predictive accuracy for higher SI values and participants with depression diagnoses or higher baseline depression score
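RMSE and MAPE, as reported for the MEM and RNN models above, measure prediction error on different scales: RMSE is expressed in the units of the outcome score, while MAPE expresses each error relative to the observed value. A minimal sketch follows; the function names and the scores are hypothetical and do not come from Choo et al. (2024).

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    It is undefined when an observed value is zero, so that case is rejected."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Hypothetical observed ideation scores and model predictions.
observed = [2.0, 4.0, 5.0, 10.0]
predicted = [3.0, 3.0, 6.0, 8.0]
print(rmse(observed, predicted), mape(observed, predicted))
```

Because MAPE divides by the observed value, it grows very large (or becomes undefined) when observed scores are at or near zero, so the two metrics are not interchangeable.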
Dong et al. (2024) Patients with a diagnosis of schizophrenia (n = 92) Schizophrenia To predict the responsiveness of patients with schizophrenia to rTMS treatment Machine learning models
  • Base model
  • Stacker model
  • Sequential model
  • 16 clinical variables (e.g., PANSS, CDSS, CGI, GAF, MADRS)
  • 4 comorbidity variables (lifetime history of alcohol abuse, alcohol addiction, substance abuse, substance addiction prior to study recruitment)
  • 5 sociodemographic variables (marital status, employment status, housing status, education, sum of education years from parents)
  • PRS
  • sMRI imaging data
Balanced accuracy for predicting ≥20% reduction in negative symptoms of PANSS:
  • Active treatment group: 94%
  • Sham treatment group: 50%
Key predictors of non-response:
  • Clinical + PRS model: Apparent sadness, inability to feel, education level, unemployment
  • sMRI model: Gray matter density reductions in default mode network, limbic networks, cerebellum
Sequential modeling approach enhanced predictive accuracy while reducing diagnostic complexity
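Balanced accuracy, used above for the active and sham groups, is the mean of sensitivity and specificity. A classifier that always predicts the majority class therefore scores 0.5 however imbalanced the classes are, which is why 50% corresponds to chance-level discrimination. A short sketch with invented confusion-matrix counts (the function name is hypothetical):

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (recall on positives) and specificity
    (recall on negatives); 0.5 is chance level for a binary task."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical: 20 patients, 6 responders and 14 non-responders.
good = balanced_accuracy(tp=6, fn=0, tn=12, fp=2)      # informative model
majority = balanced_accuracy(tp=0, fn=6, tn=14, fp=0)  # always "non-response"
print(good, majority)
```

The second classifier has a plain accuracy of 14/20 = 70% but a balanced accuracy of only 0.5, while the first reaches 13/14 ≈ 0.93.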
Hammelrath et al. (2024) Patients with mild-to-moderate depression (training sample: n = 1270; test sample: n = 318) Mild-to-moderate depression To compare algorithms using features collected at baseline or early in treatment to predict non-response to a 6-week online depression program Machine learning algorithm RF
Baseline variables:
  • Sociodemographic variables (e.g., age, sex, marital status, education, occupation, BMI)
  • Processing (registration year, study variant, treatment affected by the corona pandemic)
  • Healthcare system usage (e.g., previous treatment, usage during the last 4 weeks)
  • Clinical variables (e.g., SCID, BDI-II, PHQ-Depression, COSTA, QoL)
Variables of early treatment (week 2)
  • PHQ-Depression
  • COSTA
  • SEWIP
Best performance from early treatment variables
AUC: 0.71–0.77
Recall: 0.75–0.76
Therapeutic alliance and early symptom change constituted the most important predictors
Hilbert et al. (2024) Patients with a diagnosis of panic disorder, agoraphobia, social anxiety disorder, or multiple specific phobias (n = 309) Panic disorder, agoraphobia, social anxiety disorder, or multiple specific phobias To test if functional neuroimaging data maintains strong prediction accuracy in larger samples using rs-fMRI data Machine learning models
  • RF
  • LR
  • Majority voting
  • Softmax voting
  • Weighted softmax voting
  • Clinical and demographic variables (e.g., age, sex, baseline severity)
  • Resting state-fMRI data (ROI-to-ROI and edge-functional connectivity, sliding-windows, and graph measures)
Accuracy: 0.465–0.600
Balanced accuracy: 0.465–0.613
Sensitivity: 0.460–0.687
Specificity: 0.375–0.539
Caution is advised when interpreting promising prediction results from neuroimaging data in small samples
Wang, Wu, et al. (2024) College students with symptoms of anxiety or depression (n = 107) Symptoms of anxiety, depression, stress To predict efficacy and response using machine learning in college students undergoing biofeedback therapy Machine learning model ANN
  • Heart rate variability characteristics (time and frequency domains)
  • Acoustic variables extracted using a 32-ms speech frame
Model accuracy for anxiety treatment response: 62%
Speech features, such as energy parameters, were identified as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy
Wang, Wu, et al. (2024) Patients with MDD (training samples: n = 85; test samples: n = 147) MDD To predict treatment response by using neuroimaging data Machine learning models
  • RF
  • GBDT
  • XGBoost
  • Penalized LR
  • SVM
  • Neural network
  • 307 brain imaging variables
  • 49 questionnaire variables from QIDS and HDRS (including baseline and week 8 HDRS scores)
  • 4 clinical and demographic variables (age, total years of education, sex, and medication use)
  • Training set model AUC: 0.615–0.8257
  • Testing set model AUC: 0.4884–0.4941
The machine learning pipeline exhibited high accuracy and AUC (>0.80) on the training set but encountered challenges when applied to an external validation dataset, prompting an investigation into site heterogeneity issues
Zainal and Newman (2024) Patients with GAD (N = 110) GAD To identify which clients with generalized anxiety disorder benefit from mindfulness ecological momentary intervention versus self-monitoring app Machine learning models
  • LR
  • SVM - radial kernel
  • RF
  • Demographic variables (i.e., age, gender, and race/ethnicity)
  • GAD-Questionnaire-IV
  • FFMQ
  • Wechsler Adult Intelligence Scale–Fourth Edition
  • Controlled Oral Word Association Test
GAD severity prediction
SVM nested leave-one-out cross-validation:
AUC = 0.817, accuracy = 0.800, balanced accuracy = 0.795, sensitivity = 0.767, specificity = 0.822
RF nested leave-one-out cross-validation:
AUC = 0.817, accuracy = 0.819, balanced accuracy = 0.814, sensitivity = 0.791, specificity = 0.837
Predictors of greater benefit from the intervention were higher anxiety severity, higher trait perseverative cognition, lower set-shifting deficits, older age, and stronger trait mindfulness
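Nested leave-one-out cross-validation, as used for the SVM and RF models above, holds out one case in an outer loop and tunes hyperparameters with a second leave-one-out loop over the remaining cases only, so the held-out case never influences model selection. The sketch below illustrates that structure; to stay dependency-free it swaps in a one-feature k-nearest-neighbor classifier in place of an SVM or RF, and all names and data are hypothetical.

```python
def knn_predict(train_x, train_y, x, k):
    """Majority vote of the k training points closest to x (one feature)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    votes_for_1 = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_1 > k else 0


def loo_accuracy(xs, ys, k):
    """Ordinary (inner) leave-one-out accuracy of k-NN on (xs, ys)."""
    hits = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        hits += knn_predict(train_x, train_y, xs[i], k) == ys[i]
    return hits / len(xs)


def nested_loo_predictions(xs, ys, candidate_ks=(1, 3, 5)):
    """Outer loop: hold out one case. Inner loop: pick k by leave-one-out
    on the remaining cases only, then predict the held-out case."""
    predictions = []
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        best_k = max(candidate_ks, key=lambda k: loo_accuracy(train_x, train_y, k))
        predictions.append(knn_predict(train_x, train_y, xs[i], best_k))
    return predictions


# Hypothetical baseline severity scores and benefit labels (1 = benefited).
severity = [1, 2, 3, 4, 10, 11, 12, 13]
benefit = [0, 0, 0, 0, 1, 1, 1, 1]
preds = nested_loo_predictions(severity, benefit)
print(sum(p == y for p, y in zip(preds, benefit)) / len(benefit))
```

The inner loop fits the model roughly n² times per candidate setting, which stays affordable at sample sizes like the N = 110 in this row.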
Brandt et al. (2023) Participants with schizophrenia or schizoaffective disorder (aged ≥18 years) (n = 1392) Schizophrenia or schizoaffective disorder To identify general prognostic factors of relapse for all participants (irrespective of treatment continuation or discontinuation) and specific predictors of relapse for treatment discontinuation Machine learning
  • Proportional hazard regression model (for multivariate analysis)
  • Random survival forests (for exploratory analysis to improve the predictive ability)
36 variables:
  • Demography (sex, age)
  • Somatic history (somatic illness, BMI)
  • Psychiatric history (e.g., disorganized type, catatonic type, paranoid type, residual type, duration of illness)
  • Substance use (smoking, drug-positive urine screening)
  • Standardized scales (PANSS, CGI, AIMS, BARS, PSP)
  • Treatment characteristics before randomization (e.g., last dosage of the antipsychotic study drug, treatment duration of antipsychotic study drug)
  • Comedication
  • Adverse events
  • Laboratory results (e.g., alanine aminotransferase, prolactin, white blood cell count)
The concordance index for predictive performance was 0.707, meaning that the algorithm’s prediction about which of two participants will relapse sooner is correct in 71% of cases
Out of the 36 baseline variables, general prognostic factors of increased risk of relapse for all participants were drug-positive urine; paranoid, disorganized, and undifferentiated types of schizophrenia; psychiatric and neurological adverse events; higher severity of akathisia; antipsychotic discontinuation; lower social performance; younger age; lower glomerular filtration rate; and benzodiazepine comedication
Predictors of increased risk specifically after antipsychotic discontinuation were increased prolactin concentration, higher number of hospitalizations, and smoking
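The concordance index above counts, over all comparable pairs of participants, how often the participant predicted to be at higher risk actually relapsed sooner. A pair is comparable only if the participant with the shorter follow-up time had an observed relapse, which is how censored participants are handled. The following is a minimal sketch of Harrell's C; it ignores tied event times for simplicity, and the function name and data are hypothetical rather than drawn from Brandt et al. (2023).

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored time-to-event data.

    Pair (i, j) is comparable when subject i had an observed event before
    subject j's time. It is concordant when i also has the higher predicted
    risk; equal risks count as half. Tied times are not handled here.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (months), relapse indicator (0 = censored), predicted risk.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.5, 0.7, 0.1]
print(harrell_c_index(times, events, risk))
```

In this toy example four of the five comparable pairs are ordered correctly, giving C = 0.8; by the same reading, 0.707 means about 71% of comparable pairs are ranked correctly, and 0.5 is chance.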
Barrigon et al. (2023) Patients with a history of suicidal thoughts and behavior (n = 225) Suicidal ideation To predict short-term (one week) suicide risk by using smartphone data in suicidal patients Machine learning algorithm Bayesian algorithm
  • Distance traveled
  • Time spent at home
  • Steps taken
  • Use of any app
AUC: 0.78
Unsupervised machine learning on smartphone data from patients with suicidal ideation effectively predicts suicide risk
Dougherty et al. (2023) Patients with TRD (n = 233) TRD To predict which participants with treatment-resistant depression would be week 3 responders and sustained responders through week 12 to psilocybin treatment Machine learning algorithms and models
  • NLP
  • LR
Two-dimensional sentiment from the first session (computed by NLP), emotional breakthrough index, treatment dose
At week 3:
Accuracy: 85%
AUC: 88%
At week 12:
Accuracy: 88%
AUC: 85%
Treatment response to psilocybin is accurately predicted using a logistic regression model incorporating NLP metrics, EBI scale responses, and treatment arm data
Harrer et al. (2023) Patients with chronic back pain and depressive symptoms (n = 504) Depressive symptoms To predict treatment effects of an Internet-based depression intervention for patients with chronic back pain Machine learning models DT (developed by multilevel model-based recursive partitioning)
  • Sociodemographic variables (e.g., age, gender, marital status, education, internet affinity, social support, medication, sick leave)
  • Symptom severity and quality of life (PHQ–9, HAMD, NPRS, QoL)
  • Pain-related risk factors (PSEQ, ODI, SPE)
  • Decision tree model: R2app = 52%
  • After bootstrap bias-correction, R2adj = 45%
  • During external validation, R2adj = 33%
Predictions of the multivariate tree learning model suggest a pattern in which patients with moderate depression and relatively low pain self-efficacy benefit most, while no benefits arise when patients’ self-efficacy is already high
Jankowsky et al. (2024) Naturalistic inpatients (n = 723) Anxious and depressive symptoms To compare machine learning algorithms for predicting treatment response in naturalistic inpatient samples Machine learning algorithms
  • Linear regression
  • EN regression
  • Gradient boosting machines
  • Sociodemographic background variables (e.g., gender, age)
  • Indicators of physical health (e.g., subjective health, BMI, smoking)
  • Indicators of personality and mental health (e.g., maladaptive personality traits, anxiety or depression scores)
  • Treatment variables (e.g., number of treatments within the last 12 months)
Training:
R2: 0.329–0.70
Test:
R2: 0.315–0.441
Treatment-related variables were the most predictive, followed by psychological indicators
Ricka et al. (2023) Patients with MDD (n = 26) MDD To identify markers of mood disorders using six months of physiological and clinical data by machine learning Machine learning algorithm Label extension and detrending processes, a feature selection, and a deep learning multilayer perceptron model
  • Physical activity (12 variables)
  • Heart rate (25 variables)
  • Heart rate variability (39 variables)
  • Breathing rate (12 variables)
  • Sleep (13 variables)
  • MADRS scores
2-class prediction (depressed/not depressed)
Accuracy: 86%
Sensitivity: 79%
Specificity: 94%
A supervised ML system can efficiently predict a patient’s clinical score by identifying their biosignature of symptoms during an MDD episode
Scodari et al. (2023) Patients with subclinical depression (n = 236) Minor depressive symptoms To forecast symptom changes among patients with subclinical depression receiving stepped care or usual care
Machine learning models Tree-based and nested framework
  • 15 categorical variables: Gender, marital status, parental birthplace, rural residential area, employment status, education level, excessive alcohol usage, current smoking behavior, normal exercise behavior, onset of depression, baseline dysthymia status, presence of comorbid illness, and comorbid conditions
  • 8 continuous variables: The number of chronic diseases, BMI, number of historical depressive episodes, baseline locus of control, baseline social support, baseline HADS scores, baseline PHQ–9 scores
For the intervention group, R2 values for the models at the various treatment time intervals were as follows:
  • 0–3 months: 0.15
  • 0–6 months: 0.13
  • 0–9 months: 0.21
  • 0–12 months: 0.12
For the usual care group, R2 values for the models at the different treatment time intervals were as follows:
  • 0–3 months: 0.24
  • 0–6 months: 0.15
  • 0–9 months: 0.15
  • 0–12 months: 0.11
Patients who received stepped care were more likely to reduce PHQ–9 scores if they had high PHQ–9 but low HADS-Anxiety scores at baseline, a low number of chronic illnesses, and an internal locus of control
Zou et al. (2023) Patients with MDD (N = 245) MDD To use passive sensing data to predict treatment response in patients with MDD Machine learning models
  • SVM
  • LR
  • RF
  • LSTM
  • GRU
  • GRU-Decay
  • Call log (type of phone call, mean and SD time of all calls being made, mean and SD of duration, number and entropy of phone calls)
  • Phone usage (frequency and duration of smartphone usage in a day, duration of phone usage for each period (6–12 pm, 12–6 pm, 6–0 am), earliest and latest phone usage time)
  • App usage (duration of social apps, content-providing apps, shopping apps, and entertainment apps)
  • Sleep and step data (duration and ratio of both light and deep sleep, wake-up and sleep times)
GRU-Decay
Precision: 0.61
Recall: 0.64
F1 score: 0.58
AUC: 0.65
Other models
Precision: 0.57–0.71
Recall: 0.22–0.59
F1 score: 0.33–0.54
AUC: 0.54–0.59
In terms of recall, F1 score, and AUC, the sequence model based on GRU-Decay achieved the best performance
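F1 is the harmonic mean of precision and recall. When precision, recall, and F1 are each averaged separately (for example over cross-validation folds or over classes), the averaged F1 need not equal the harmonic mean of the averaged precision and recall, so the three figures reported for one model do not always line up arithmetically. A small sketch with two hypothetical folds (the function name and numbers are illustrative only):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two hypothetical cross-validation folds as (precision, recall).
folds = [(0.9, 0.3), (0.3, 0.9)]
mean_p = sum(p for p, _ in folds) / len(folds)
mean_r = sum(r for _, r in folds) / len(folds)
mean_f1 = sum(f1(p, r) for p, r in folds) / len(folds)
print(f1(mean_p, mean_r), mean_f1)
```

Both folds have F1 = 0.45, so the averaged F1 is 0.45, yet the harmonic mean of the averaged precision and recall (0.6 and 0.6) is 0.6.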
Weintraub et al. (2023) Youth aged 13 to 19 who had active mood symptoms, mood instability, and at least one parent with bipolar disorder or MDD (n = 44) Depressive symptoms To use machine learning to identify the speech features most strongly correlated with concurrent depressive symptoms over 18 weeks Machine learning algorithm SVM
PSRs from the Adolescent Longitudinal Interval Follow-up Evaluation
20 speech features reflecting affective processes, social processes, drives, informal, time orientation words etc.
Strongest correlated combination of features:
affective processes, drives, informal, leisure, and risk (r = 0.47, 95% CI: 0.37–0.56, R2 = 0.12)
Strongest association of features from subject’s first speech features:
affective processes, nonfluencies, drives, and risks (r = 0.68, 95% CI: 0.48–0.81, R2 = 0.11)
Speech features identified by machine learning analysis achieved moderate correlation
Jacobson et al. (2022) (Note: also included in the diagnosis domain) Participants with a mean age of 38.5 years (n = 126,060) MDD
Generalized anxiety disorder
Social anxiety disorder
Panic disorder
Borderline personality disorder
Paranoid personality disorder
Schizophrenia
To examine the effectiveness of prediction of mental health outcomes based on exposure to online screening tools Machine learning RF
Cox Proportional Hazards Models
Screening tool topic
Screening tool attributes
Hour of the day and day of the week at which the screening tool was clicked
Whether the screening tool was a Mental Health America screening tool or from another online web domain
Number of previous searches which resulted in a click to a screening tool
Past interests, e.g., distribution of query topics prior to clicking on the first screening tool by each user (to ascertain whether the online screen information provided incremental information beyond their general search pattern types)
Prediction accuracy was highest for mental health self-references, self-diagnosis, and care-seeking: screen content predicted later searches with mental health self-references (AUC = 0.73), mental health self-diagnosis (AUC = 0.69), and mental health care-seeking (AUC = 0.61)
Other outcomes were more difficult to predict: psychoactive medications (AUC = 0.55), suicidal ideation (AUC = 0.58), and suicidal intent (AUC = 0.60)
Cox proportional hazards models suggested that individuals utilizing tools with in-person care referral were significantly more likely to subsequently search for methods to actively end their life (HR = 1.727)
Online screens may influence help-seeking behavior, suicidal ideation, and suicidal intent
Websites with referrals to in-person treatments could put persons at greater risk of active suicidal intent
Nguyen et al. (2022) Participants with MDD, including early-onset (before the age of 30) and chronic (episode duration of two years) or recurrent (2+ episodes) disease episodes (n = 222) MDD To determine whether pretreatment reward task-based fMRI can predict treatment-specific outcomes Deep learning models Feedforward neural network
(a separate model was trained for each treatment: sertraline, bupropion, and placebo)
Reward task-based fMRI, which was acquired during a block-design number-guessing task that probes reward processing neural circuitry known to be altered in MDD
Clinical measurements
Demographic features
For predicting change in HAMD
  • Model 1 on sertraline R2: 48%; RMSE: 5.15
  • Model 2 on bupropion R2: 34%; RMSE: 4.46
  • Model 3 on placebo R2: 28%; RMSE: 5.87
All the models explained a substantial proportion of the variance in change in HAMD. The combination of these predictive models presented a possible precision medicine approach for antidepressant selection, and each model would be applied to provide a prediction of response to each treatment
Webb et al. (2022) School district employees aged 18 or above who owned a smartphone, had limited exposure to meditation apps, and had depressive symptoms below the severe range (n = 662) Depression and anxiety To use a data-driven algorithm to predict which individuals are most likely to benefit from app-based meditation training Machine learning ENRR
Pre-intervention distress, anxiety, depression, stress, repetitive negative thinking, the mindfulness aspect of acting with awareness, loneliness, diffusion, presence, search for meaning, self-compassion, well-being, age, gender, race, marital status, and income
Anxiety measure, PROMIS Depression measures, and 10-item Perceived Stress Scale
Multivariable ENRR model:
Higher baseline levels of the following variables predicted a greater reduction in distress:
  • distress (r = −0.30)
  • depression (r = −0.30)
  • stress (r = −0.26)
Higher baseline scores of the following variables predicted greater reduction in distress in the control condition: diffusion, presence, distress, anxiety, stress, depression, and loneliness
Linear regression model:
Higher levels of repetitive negative thinking predicted:
  • a greater reduction in distress from the mindfulness app (B = −0.02)
  • significantly poorer outcomes in the control condition (B = 0.01)
Overall:
A significant group × PAI interaction was observed
  • linear regression model including repetitive negative thinking as the sole baseline predictor: r2 = 0.11
  • multivariable ENRR model: r2 = 0.10
Either the linear regression model with a single predictor of baseline levels of repetitive negative thinking, or the multivariable ENRR model with multiple predictors can predict changes in the level of distress
Athreya et al. (2021) People with nonpsychotic MDD who received at least 8 weeks of treatment with a study drug (SSRIs, SNRIs, or TCAs) or placebo (n = 3,518) Depression To identify specific depressive symptoms and thresholds of improvement that were predictive of antidepressant response Machine learning Gaussian mixture models
Probabilistic graphical models
Four HDRS items (depressed mood, psychic anxiety, guilt feelings/delusions, and work/activities)
Thresholds of change in prognostic symptom severity, derived based on the absolute difference in median scores on symptom dynamic paths between baseline and four-week strata
Four depressive symptoms and specific thresholds of four-week change in each symptom predicted the eventual eight-week outcome of SSRI therapy with an average accuracy of 77%.
The symptoms and thresholds derived from patients treated with SSRIs correctly predicted outcomes in 72% of patients treated with other antidepressants
Combining the two AI models yielded consistently high predictive accuracy across commonly prescribed antidepressants, providing interpretable and accurate prognoses of antidepressant treatment outcomes
Bao et al. (2021) Depressive patients receiving six intravenous infusions of ketamine over 2 weeks (n = 83) MDD To identify a set of biomarkers that could be used to predict clinical outcomes for treatment in MDD Machine learning
  • SVM
  • RF
  • kNN
  • LR
  • DT
  • LR with EN
Age, sex, BMI, smoking status, and the HAMD score
Accuracy:
  • SVM: 0.62 ± 0.23
  • RF: 0.56 ± 0.15
  • kNN: 0.63 ± 0.12
  • LR: 0.62 ± 0.12
  • DT: 0.57 ± 0.12
  • LR with EN: 0.63 ± 0.19
Machine learning approach could predict treatment outcomes of multiple ketamine infusions on the basis of the genotyping information
Lee et al. (2021) Adults aged 18 to 65 with bipolar disorder (n = 60) Bipolar depression To identify biologically relevant moderators of response to the TNF-α inhibitor infliximab Machine learning CART
Plasma cytokine and neuronal origin-enriched extracellular vesicle protein concentrations, intervention assignment and week
SHAPS
MADRS
Accuracy of predicting reduction in anhedonic symptoms with baseline cytokine biotype, intervention allocation, week, and baseline and change in neuronal origin-enriched extracellular vesicle factor scores:
  • r2 = 0.22
  • RMSE = 0.08
No significant moderation of the MADRS total score by baseline biotype was observed
Pretreatment biotypes derived from peripheral cytokine measurements can predict the antianhedonic efficacy of infliximab
Solomonov et al. (2021) Older adults over 60 who suffered from unipolar, nonpsychotic MDD (n = 221) Suicidal ideation To identify baseline predictors of the course of suicidal ideation Machine learning algorithms
  • LASSO
  • RF
  • GBM
  • Classification tree
Demographics, treatment assignment, age of onset, length of current episode, number of previous episodes, severity of depression, disability, cognitive impairment, executive functioning, neuroticism, apathy, hopelessness, activation, avoidance/rumination, work/school impairment, social impairment, anhedonia, rumination response style scale, and digit span
Predictive performance:
  • LASSO: AUC = 0.735
  • GBM: AUC = 0.725
  • RF: AUC = 0.684
  • Classification tree: AUC = 0.670
Four machine learning algorithms identified hopelessness, neuroticism, and low general self-efficacy as the strongest predictors of an unfavorable trajectory of suicidal ideation
Van Bronswijk et al. (2021) Adult outpatients recruited from the mood disorders unit with a primary diagnosis of MDD (n = 151) MDD To extend the PAI to long-term depression outcomes after acute-phase psychotherapy Two-step machine learning
  • RF
  • Regression model
38 pretreatment variables from six domains:
  1. depression variables
  2. demographics
  3. psychological distress
  4. general functioning
  5. psychological processes
  6. life and family history
For parental alcohol abuse, the regression coefficients across the bootstrapped samples were stable, with a positive value in 99.8% of the samples
A history of parental alcohol abuse was associated with higher BDI-II scores during the 17-month follow-up phase. Therefore, parental alcohol abuse could be used as a predictor for long-term depression outcomes following cognitive therapy and interpersonal psychotherapy
Busk et al. (2020) Patients with bipolar disorder who had previously been treated (n = 15,975) Bipolar disorder To examine the feasibility of forecasting daily subjective mood scores based on daily self-assessments Multi-task learning Hierarchical Bayesian models
Daily self-assessments via Android smartphone app, including activity, alcohol, anxiety, irritability, cognitive difficulty, medicine intake, presence of mixed mood, mood, sleep, stress
Clinical evaluations with HDRS and YMRS to assess depression and mania
Historical mood was the most important predictor of future mood; self-reported mood scores were negatively correlated with HDRS scores (r = −0.40) and positively correlated with YMRS scores (r = 0.22)
Application of hierarchical Bayesian models could forecast subjective mood for up to 7 days, thus improving continuous disease monitoring
Furukawa et al. (2020) Patients aged 25 to 75 years with a nonpsychotic unipolar MDD episode who had received no antidepressant, antipsychotic, or mood stabilizer in the previous month (n = 2,011) MDD To predict depression severity from a large set of baseline predictors through a web app Machine learning Penalized linear regression models using LASSO
Penalized linear regression models using the ridge penalty
SVM with a polynomial or radial kernel
Artificial neural networks with one hidden layer, three or four nodes
Sociodemographic variables including age, sex, education, employment status, and marital status
Baseline clinical characteristics include age at onset of depression, number of previous depressive episodes, length of index episode, and concurrent physical illness
Depression characteristics by week three include individual item scores of PHQ–9 for the index episode; individual item scores of the BDI-II; individual item scores of the FIBSER; and adherence to pharmacotherapy
SVMs showed the lowest prediction error in both internal and internal-external cross-validation (MAE = 1.5)
Three SVMs with a radial kernel, one per treatment arm, could be chosen to predict treatment outcome
Rajpurkar et al. (2020) Outpatients aged 18 to 65 from primary or specialty care practices with a diagnosis of MDD (n = 518) MDD To identify the extent to which a machine learning approach can predict acute improvement for individual depressive symptoms with antidepressants based on pretreatment symptom scores and EEG measures Machine learning ELECTREE Score algorithm using GBDTs
Resting-state EEG continuously recorded
Symptoms of HRSD–21
C index scores, which indicate discriminative performance, were reported for 12 symptoms. The highest C index scores were found for:
  • loss of insight (C index, 0.963 [95% CI, 0.939–1.000])
  • unreality and nihilism (C index, 0.951 [95% CI, 0.932–0.976])
  • weight loss (C index, 0.923 [95% CI, 0.896–0.953])
The most critical predictor for each symptom was the baseline symptoms severity
Any single EEG feature was higher than 5% predictors for seven symptoms
Combining EEG and baseline symptom features significantly increased the C index for improvement in four symptoms:
  • Energy loss (C index increase, 0.035 [95% CI, 0.011–0.059])
  • Appetite changes (C index increase, 0.017 [95% CI, 0.003–0.030])
  • Psychomotor retardation (C index increase, 0.020 [95% CI, 0.008–0.032])
  • Loss of insight (C index increase, 0.012 [95% CI, 0.001–0.020])
The machine learning model could predict the improvement in depressive symptoms most accurately with baseline symptom severity in combination with EEG features
Rozek et al. (2020) Army soldiers reporting active suicidal ideation with intent to die during the previous week and/or a suicide attempt during the previous month (n = 152) Suicide To examine predictors of suicidal behaviors among high-risk suicidal soldiers who received outpatient mental health services in an RCT of Brief CBT for Suicide Prevention compared with treatment as usual Machine learning MondoBrain Augmented Intelligence® System
  • BSSI-W
  • Prior attempts
  • Treatment Group
  • SCS
  • Sex
This combination of variables correctly classified eight of 26 participants who attempted suicide during the two-year follow-up period (30.8%) and misclassified only one of 126 participants who did not attempt suicide (0.8%), yielding a positive predictive value of 88.9% and a negative predictive value of 87.4%
This combination of variables correctly classified almost one-third of participants who attempted suicide in the subsequent two years, with good positive and negative predictive values
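The predictive values in this row follow directly from its counts: 8 true positives and 18 false negatives among the 26 participants who attempted suicide, and 1 false positive and 125 true negatives among the 126 who did not. A minimal sketch that recomputes them from the 2×2 confusion matrix (the function name is hypothetical):

```python
def classification_summary(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts from the row above: 8 of 26 attempters flagged,
# 1 of 126 non-attempters flagged.
summary = classification_summary(tp=8, fn=18, fp=1, tn=125)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

This reproduces the reported 30.8%, 88.9%, and 87.4%, and gives a specificity of 99.2%. Because PPV and NPV depend on how common the outcome is (26 of 152 participants here), the same model would yield different predictive values in a population with a different attempt rate.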
Browning et al. (2019) Depressive patients whose treating clinician had made the decision to prescribe citalopram (n = 239) Depression To assess whether changes in emotional processing and subjective symptoms over the first week of antidepressant treatment predict clinical response after four to eight weeks of treatment Machine learning SVM
QIDS-SR16, ECAT, EREC, FERT
Accuracy:
  • QIDS-SR16: ~60%
  • FERT: 70%
  • ECAT & EREC: 50–60%
  • QIDS-SR16 & FERT: 77%
  • QIDS-SR16, FERT, ECAT & EREC: 79%
Cognitive and symptomatic measures could be used to guide antidepressant treatment in depressed patients
Foster et al. (2019) Adolescents aged 12–17 with MDD (n = 439) MDD To estimate patient-specific inter-treatment differences among three treatment conditions (CBT, FLX, and the combination of CBT and FLX) as a function of patients’ baseline characteristics Machine learning Model-based Random Forest
Gender, race, family income, referral source, dysthymia, anxiety disorder, ADHD, childhood trauma, study site, age, verbal intelligence, current episode duration, baseline depression severity, functional impairment, suicidal ideation, melancholic features, number of comorbid diagnoses, caregiver depression, conflict with caregiver, hopelessness, cognitive distortions, treatment expectations from parents, treatment expectations from adolescents
FLX-CBT difference:
FLX was more effective (b = −0.13, 95% CI: −0.22 to −0.05), especially with more severe baseline depression
CBT-combination difference:
Combination was more effective (b = −0.25, 95% CI: −0.33 to −0.17)
FLX-combination difference:
Combination was more effective (b = −0.11, 95% CI: −0.21 to −0.02), especially with less severe baseline depression and higher treatment expectations from patients
Combined treatment with CBT and FLX was consistently superior to either therapy administered alone across a broad range of patients
Vitinius et al. (2019) Depressed patients with CAD (n = 570) Depression To identify somatic and sociodemographic predictors of depression outcome among depressed patients with CAD Machine learning LR and linear or binomial linear model with LASSO regularization
141 potential sociodemographic and somatic predictors, including blood tests, medical history, current drug use, comorbidities, and sociodemographic data
HADS
Predictors to favorable depression outcome:
higher heart rate variability during numeracy tests (p = 0.020), unknown previous myocardial infarction (p = 0.013), higher age (p = 0.002)
Predictors to unfavorable depression outcome:
anticholinergic drugs (p = 0.045), state after resuscitation (p ≤ 0.042), uric acid drugs (p ≤ 0.039), beta blockers (p = 0.035), New York Heart Association (NYHA) class III (p ≤ 0.028), analgesic drugs (p = 0.027), antidiabetic drugs (p = 0.015), higher triglycerides (p = 0.014), intake of thyroid hormones (p = 0.007), and hyperuricemia (p ≤ 0.003)
Machine learning could identify somatic and sociodemographic predictors of depression outcome in patients with CAD
Bailey et al. (2018) Patients with TRD and healthy controls aged 20 to 72 with normal or corrected-to-normal vision (n = 50) Depression To determine whether working memory-related power, connectivity, and theta-gamma coupling measures could be used to predict responders to rTMS treatment for treatment-resistant depression Multivariate machine learning SVM
  • Mood: Montgomery-Åsberg Depression Rating Scale
  • Behavior: working memory accuracy, average reaction time
  • EEG: alpha, theta, and gamma power, connectivity, and theta-gamma coupling
Prediction of individual responders:
  • mean sensitivity: 0.91 (±0.06 SD)
  • specificity: 0.92 (±0.02 SD)
  • balanced accuracy: 91% (±3.64 SD)
Baseline and week 1 frontal-midline theta power and theta connectivity showed good potential for predicting response to rTMS treatment for depression
Kautzky et al. (2018) Patients diagnosed with MDD (n = 55) MDD To generate a prediction model for TRD using machine learning featuring a large set of clinical and sociodemographic predictors of treatment outcome Machine learning
RF 47 predictors documented in the GSRD database, which can be classified into:
  • Sociodemographic
  • MDD history
  • Axis II comorbidity
  • Axis III comorbidity
  • Clinical features
  • Other predictors, e.g., inpatient or outpatient, quality of social life, quality of work life, quality of family life, retrospective MADRS score
The full model with 47 predictors yielded an accuracy of 75.0% for predicting TRD and treatment response, with a positive predictive value of 79.6% and a negative predictive value of 67.9%
When the number of predictors was reduced to 15, accuracies between 67.6% and 71.0% were attained for different test sets
Machine learning techniques showed promising results in predicting TRD by considering interaction and main effects equally and produced reliable classification with high accuracy
Lenhard et al. (2018) Adolescents aged 12–17 with OCD who had received either immediate or delayed (12 weeks) internet-delivered CBT (n = 61) Pediatric OCD To test four different machine learning methods in the prediction of treatment response in a sample of pediatric OCD patients who had received internet-delivered CBT Machine learning Linear model with best subset predictor selection
L1 Elastic Net (LASSO)
RF
SVM
46 demographic and clinical baseline variables, related to:
  • Parental education level
  • Referral to study
  • Medication
  • Previous treatment experience
  • Comorbidity
  • Number of comorbid diagnoses
  • Baseline OCD symptoms
  • Clinical Global Impression
  • Self-rated baseline measures
  • Parent-rated baseline measures
  • Outcome at posttreatment
  • Outcomes at three-month follow-up
Accuracy:
  • Linear model with best subset predictor selection: 83%
  • L1 Elastic Net (LASSO): 75%
  • RF: 75%
  • SVM: 75%
Machine learning models were able to predict treatment outcome in internet-delivered CBT for pediatric OCD with good to excellent accuracy
Maciukiewicz et al. (2018) Individuals diagnosed with MDD from three clinical trials who received duloxetine or placebo for up to eight weeks (n = 186) MDD To use supervised machine learning to build predictive models of duloxetine outcome for MDD with genome-wide data Machine learning models LASSO regression
CRT
SVM
SNPs Accuracy on remission prediction:
  • CRT = 0.51
  • SVM = 0.52
Accuracy on treatment response prediction:
  • CRT = 0.57
  • SVM = 0.64
(chance accuracy = 0.57)
Of the 19 most robust SNPs, 17 were characterized by large LASSO coefficients
None of the machine learning models performed satisfactorily in remission prediction. For treatment response, SVM achieved moderate performance whereas CRT’s performance was just equal to chance accuracy
Nie et al. (2018) STAR*D cohort:
Patients with MDD.
RIS-INT–93 cohort:
Patients with MDD who had a history of resistance to therapy with antidepressant medication and were treated prospectively with citalopram for up to six weeks (n = 5686)
MDD To identify risk factors for treatment resistance by extending predictive modeling of treatment-resistant depression, partitioning data from the STAR*D cohort and a completely independent cohort (RIS-INT–93) into training and testing datasets Machine learning
  • L2-penalized LR
  • RF
  • GBDT
  • XGBoost
  • EN
CRS, demographics, PHX, MHX, PRISE, PDSQ, and data from baseline and week two of level 1 treatment, including records from the Clinic Visit Form, QIDS-C16, QIDS-SR16, Bech melancholia scale, the Maier-Phillipp severity subscale, the Santen Subscale, Gibbons’ global depression severity scale, and HAM-D7 AUC of 0.70–0.78 on the STAR*D testing dataset and 0.72–0.77 on the RIS-INT–93 independent dataset The machine learning models were able to predict treatment-resistant depression using clinical and sociodemographic data
Chekroud et al. (2016) STAR*D trial:
Patients from primary and psychiatric care settings with nonpsychotic MDD, a score of at least 14 on the 17-item HAMD, and age 18–75
COMED trial:
Patients with nonpsychotic MDD and recurrent or chronic depression, a score of at least 16 on the 17-item HAMD, and age 18–75 (n = 4041)
MDD To develop an algorithm to assess whether patients will achieve symptomatic remission from a 12-week course of citalopram Machine learning EN Overlapping variables in the two clinical trials including sociodemographic features, DSM-IV-based diagnostic items, depressive severity checklists, eating disorder diagnoses, whether the patient had previously taken specific antidepressant drugs, the number and age of onset of previous major depressive episodes, and the first 100 items of the psychiatric diagnostic symptom questionnaire Accuracy in internal validation:
  • STAR*D cohort: 64.6%
Accuracy in external validation:
  • COMED cohort (escitalopram treatment group): 59.6%
  • COMED cohort (escitalopram-bupropion treatment group): 59.7%
  • COMED cohort (venlafaxine-mirtazapine treatment group): 51.4%
Machine learning achieved moderate performance in internal validation. Performance in external validation varied across treatment groups, ranging from fair to moderate accuracy
Iniesta et al. (2016) Treatment-seeking adults with MDD and a current depressive episode (n = 793) MDD To optimize prediction of symptom improvement and remission during treatment with escitalopram or nortriptyline Machine and statistical learning ENRR Demographic data including current age, age at onset of depression, sex, smoking status, BMI, occupation, marital status, years of education, and number of children
Baseline severity measures including the clinician-rated MADRS, the 17-item HRSD and the self-report BDI
Individual depressive symptoms from the SCAN interview and depression subtypes
Observed mood, cognitive and neurovegetative symptom factors, and six dimensions (mood, anxiety, pessimism, interest-activity, sleep, and appetite) from a published factor analysis
Stressful life events experienced during the six months prior to the baseline assessment, measured with the LTE-Q
Medication history, established with the Medication History Form: antidepressant use at the time of recruitment, any prior antidepressant treatment, and the number and types of antidepressants tried
Accuracy of prediction on different outcomes:
  • Reduction in depressive symptoms: a model including 29 of the 60 predictors explained 3.85% of the variance in MADRS score change across treatment arms
  • Remission: AUC = 0.72, R2 = 0.15
Predictors with strong contribution:
  • Symptoms of depressed mood, reduced interest, decreased activity, indecisiveness, pessimism, and anxiety significantly predicted symptom improvement
  • BMI, appetite, interest-activity symptom dimension, and anxious-somatizing depression subtype predicted remission
Easily obtained demographic and clinical variables could predict therapeutic response to escitalopram with clinically meaningful accuracy
Amminger et al. (2015) Individuals at ultra-high risk for psychosis meeting at least one of the following operationally defined groups of risk factors for psychosis:
  1. Attenuated positive psychotic symptoms
  2. Transient psychosis
  3. Genetic risk plus a significant decrease in functioning
(n = 81)
Psychosis To determine biological and clinical factors associated with treatment response indexed by functional improvement in a pre–post examination of a 12-week intervention in individuals at ultra-high risk for psychosis Machine learning Linear regression models
Gaussian Process Classification
Erythrocyte fatty acid composition of the phosphatidylethanolamine phospholipid fraction Univariate analysis:
Variance in prediction of functional improvement:
  • In ω–3 PUFA group:
ALA and negative symptoms explained 14% and 10% of the variance
  • In placebo group:
Positive symptoms and functioning explained 23% and 11% of the variance
Multivariate analysis:
Overall accuracy of fatty acid-based prediction of treatment response:
  • In ω–3 PUFA group:
86.7%
  • In placebo group:
79.2%
Univariate analysis:
Higher levels of erythrocyte membrane ALA (parent fatty acid of the ω–3 family) and more severe negative symptoms at baseline predicted subsequent functional improvement in the treatment group
Less severe positive symptoms and lower functioning at baseline predicted functional improvement in the placebo group
Multivariate analysis:
Fatty acids predicted response to treatment in both ω–3 PUFA and placebo groups with a high level of accuracy
Guilloux et al. (2015) Anxious-depressed adults with a nonpsychotic MDD episode of sufficient severity (score ≥ 15 on the 25-item HRSD) and elevated symptoms of panic or anxiety (score ≥ 7 on the past-month panic and agoraphobic spectrum self-report)
Nonpatient controls not meeting criteria for any mood or anxiety disorder (n = 67)
MDD To identify biomarkers predicting nonremission prior to treatment initiation Machine learning prediction model Random intercept model
SVM
Peripheral blood-based gene expression Average cross-validated accuracy (i.e., corrected for model selection bias) of 79.4% in predicting remission status, with the 13-gene model showing the highest individual uncorrected prediction value (88%)
A new prediction model built in the validation cohort from the same 13 genes identified in the initial cohort found, through another round of leave-one-out cross-validation, that a 6-gene model achieved the highest accuracy (76.2%)
At pretreatment assessment, the gene expression profiles obtained from blood samples of MDD subjects who will not attain remission after treatment differ from nondepressed controls and also from MDD patients who will remit with treatment
Six out of 13 genes identified in the initial cohort could predict remission in an independent cohort, which demonstrated the potential of pretreatment peripheral gene expression profiles to predict nonremission following an eight- to 12-week course of citalopram treatment

Abbreviations: ADHD: Attention-Deficit/Hyperactivity Disorder; AIMS: Abnormal Involuntary Movement Scale; ALA: α-linolenic acid; ANN: Artificial neural network; AUC: Area under the receiver operating characteristic curve; BARS: Barnes Akathisia Rating Scale; BDI: Beck Depression Inventory; BMI: Body mass index; BSSI-W: Beck Scale for Suicide Ideation, Worst Point; CAD: Coronary artery disease; CART: Classification and regression trees; CBT: Cognitive behavioral therapy; CDSS: Sum of Calgary Depression Scale for Schizophrenia; CGI: Clinical Global Impression; COSTA: Cognitive Style Assessment measuring cognitive distortions; CRS: Cumulative Illness Rating Scale; CRT: Classification and regression tree; DT: Decision tree; EBI: Emotional Breakthrough Index; ECAT: Emotional categorization task; EEG: Electroencephalography; EMA: Ecological Momentary Assessment; EN: Elastic net; ENRR: Elastic net regularized regression; EREC: Emotional recall task; FERT: Face-based emotional recognition task; FFMQ: Five Factor Mindfulness Questionnaire; FLX: Fluoxetine; fMRI: Functional magnetic resonance imaging; GAD: Generalized anxiety disorder; GAF: Global Assessment of Functioning; GBDT: Gradient-boosted decision trees; GRU: Gated Recurrent Unit; GSRD: Group for the Study of Resistant Depression; HADS: Hospital Anxiety and Depression Scale; HAMD: Hamilton Depression Rating Scale; HDRS: Hamilton Depression Rating Scale; HRSD: Hamilton Rating Scale for Depression; kNN: K-nearest neighbor; LASSO: Least absolute shrinkage and selection operator; LR: Logistic regression; LSTM: Long Short-Term Memory; LTE-Q: List of Threatening Experiences Questionnaire; MADRS: Montgomery-Åsberg Depression Rating Scale; MAPE: Mean absolute percent error; MDD: Major depressive disorder; MEM: Mixed-effects linear regression models; MHX: Medication history; NLP: Natural language processing; NPRS: Numerical pain rating scale; ODI: Oswestry Disability Index; OCD: Obsessive-compulsive disorder; PAI: Personalized Advantage Index; PANSS: Positive and Negative Syndrome Scale; PDSQ: Psychiatric Diagnostic Screening Questionnaire; PHQ-9: Patient Health Questionnaire-9; PHX: Psychiatric history; PRISE: Patient Rated Inventory of Side Effect; PROMIS: Patient-Reported Outcomes Measurement Information System; PRS: Polygenic risk score; PSEQ: Pain Self-Efficacy Questionnaire; PSP: Personal and Social Performance; PSRs: Psychiatric Status Ratings; QIDS-C16: Quick Inventory of Depressive Symptomatology (Clinician-Rated); QIDS-SR16: Quick Inventory of Depressive Symptomatology (Self-Report); QoL: Quality of life; RCT: randomized controlled trial; RF: Random Forest; rTMS: Repetitive transcranial magnetic stimulation; RMSE: Root mean squared error; RNN: Recurrent neural networks; SCAN: Schedules for Clinical Assessment in Neuropsychiatry; SCS: Suicide Cognitions Scale; SEWIP: Scale for the Multiperspective Assessment of General Change Mechanisms in Psychotherapy; SHAPS: Snaith Hamilton Pleasure Scale; SICD: Structured clinical interview for DSM-IV; sMRI: Structural Magnetic Resonance Imaging; SNPs: Single nucleotide polymorphisms; SNRIs: Serotonin-norepinephrine reuptake inhibitors; SPE: Subjective Prognostic Employment Scale; SSRIs: Selective serotonin reuptake inhibitors; SVM: Support vector machine; TCAs: Tricyclic antidepressants; TNF: Tumor necrosis factor; TRD: Treatment-resistant depression; XGBoost: Extreme gradient boosting; YMRS: Young Mania Rating Scale; ω-3 PUFA: Omega-3 polyunsaturated fatty acids.
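Several studies in the table report classification performance as sensitivity, specificity, balanced accuracy, positive and negative predictive values, or accuracy relative to a chance level. As a minimal illustrative sketch (not code from any of the cited studies), the Python snippet below shows how these metrics follow from the four cells of a binary confusion matrix. The class name `ConfusionMatrix` and the counts are hypothetical. The source does not state how the "chance accuracy" for Maciukiewicz et al. (2018) was computed, so treating it as the majority-class rate is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts for a binary classifier (e.g., responder vs. non-responder).

    Hypothetical helper for illustration; not taken from any cited study.
    """

    tp: int  # predicted positive, truly positive
    fp: int  # predicted positive, truly negative
    tn: int  # predicted negative, truly negative
    fn: int  # predicted negative, truly positive

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def sensitivity(self) -> float:
        # Proportion of true positives the model detects (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of true negatives the model correctly rejects.
        return self.tn / (self.tn + self.fp)

    def balanced_accuracy(self) -> float:
        # Mean of sensitivity and specificity; a trivial classifier scores 0.5
        # regardless of class imbalance.
        return (self.sensitivity() + self.specificity()) / 2

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def ppv(self) -> float:
        # Positive predictive value: P(truly positive | predicted positive).
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Negative predictive value: P(truly negative | predicted negative).
        return self.tn / (self.tn + self.fn)

    def majority_class_rate(self) -> float:
        # Accuracy of always predicting the more common class. Used here as
        # one possible definition of "chance accuracy" (an assumption).
        positives = self.tp + self.fn
        return max(positives, self.total - positives) / self.total


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from any cited study.
    cm = ConfusionMatrix(tp=40, fp=10, tn=30, fn=20)
    print(f"sensitivity       = {cm.sensitivity():.3f}")          # 0.667
    print(f"specificity       = {cm.specificity():.3f}")          # 0.750
    print(f"balanced accuracy = {cm.balanced_accuracy():.3f}")    # 0.708
    print(f"accuracy          = {cm.accuracy():.3f}")             # 0.700
    print(f"PPV               = {cm.ppv():.3f}")                  # 0.800
    print(f"NPV               = {cm.npv():.3f}")                  # 0.600
    print(f"majority-class    = {cm.majority_class_rate():.3f}")  # 0.600
```

Because balanced accuracy is the mean of sensitivity and specificity, the mean sensitivity (0.91) and specificity (0.92) reported by Bailey et al. (2018) imply a balanced accuracy of roughly 0.915, close to the reported 91%. If chance accuracy is defined as the majority-class rate, a model whose accuracy merely equals it, as reported for CRT in treatment response prediction by Maciukiewicz et al. (2018), does no better than always predicting the more common outcome.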