2025 Feb 6;55:e18. doi: 10.1017/S0033291724003295

Table 3.

Studies on AI-assisted monitoring in mental health

Ref.	Subject description	Mental health condition	Aim	AI-based method	Models	Variables for monitoring/prediction	Results and accuracy	Conclusions
Carreiro et al. (2024)	Patients with substance use disorder (n = 30)	Stress Craving	To use continuous physiologic data to detect high-risk behavioral states (stress and craving) during substance use disorder recovery	Machine learning models	DT Discriminant analysis LR Naive Bayes classifiers SVM Nearest neighbor classifiers Ensemble classifiers	Time-series raw physiologic data from the commercial sensor Basic demographics, phone and operating system information, current medications, and self-reported past medical, mental health, and substance use history	Stress detection AUC: 0.78 Craving detection AUC: 0.74 Stress vs Craving detection AUC: 0.75	All models performed close to previously validated models from a research grade sensor
Choo et al. (2024)	People with borderline personality disorder (n = 80)	Suicidal ideation	To explore predicting suicidal ideation in individuals with borderline personality disorder using EMA data	Machine learning models	MEM RNN	Baseline: Sex, any prior suicide attempt, baseline BDI, Affective Lability Scale, MDD diagnosis, BSSI, Barrett Impulsivity Scale, Childhood Trauma Questionnaire, and HDRS EMA: Suicidal ideation, stressful events, coping strategies, affect items, and suicidal behavior	MEM (RMSE = 3.84, MAPE = 56%, pseudo-R² = 16%) RNN (RMSE = 3.41, MAPE = 42%, pseudo-R² = 26%)	RNN showed enhanced predictive accuracy for higher SI values and participants with depression diagnoses or higher baseline depression score
Dong et al. (2024)	Patients with diagnosis of schizophrenia (n = 92)	Schizophrenia	To predict the responsiveness of patients with schizophrenia to rTMS treatment	Machine learning models	Base model Stacker model Sequential model	16 clinical variables (e.g., PANSS, CDSS, CGI, GAF, MADRS) 4 comorbidity variables (lifetime history of alcohol abuse, alcohol addiction, substance abuse, substance addiction prior to study recruitment) 5 sociodemographic variables (marital status, employment status, housing status, education, sum of education years from parents) PRS sMRI imaging data	Balanced accuracy for predicting ≥20% reduction in negative symptoms of PANSS: Active treatment group: 94% Sham treatment group: 50%	Key predictors of non-response: Clinical + PRS model: Apparent sadness, inability to feel, education level, unemployment sMRI model: Gray matter density reductions in default mode network, limbic networks, cerebellum Sequential modeling approach enhanced predictive accuracy while reducing diagnostic complexity
Hammelrath et al. (2024)	Patients with mild-to-moderate depression (training sample: n = 1270; test sample: n = 318)	Mild-to-moderate depression	To compare algorithms using features collected at baseline or early in treatment to predict non-response to a 6-week online depression program	Machine learning algorithm	RF	Baseline variables: Sociodemographic variables (e.g., age, sex, marital status, education, occupation, BMI) Processing (registration year, study variant, treatment affected by the corona pandemic) Healthcare system usage (e.g., previous treatment, usage during the last 4 weeks) Clinical variables (e.g., SCID, BDI-II, PHQ-Depression, COSTA, QoL) Variables of early treatment (week 2) PHQ-Depression COSTA SEWIP	Best performance from early treatment variables AUC: 0.71–0.77 Recall: 0.75–0.76	Therapeutic alliance and early symptom change constituted the most important predictors
Hilbert et al. (2024)	Patients with a diagnosis of panic disorder, agoraphobia, social anxiety disorder, or multiple specific phobias (n = 309)	Panic disorder, agoraphobia, social anxiety disorder, or multiple specific phobias	To test if functional neuroimaging data maintains strong prediction accuracy in larger samples using rs-fMRI data	Machine learning models	RF LR Majority voting Softmax voting Weighted softmax voting	Clinical and demographic variables (e.g., age, sex, baseline severity) Resting state-fMRI data (ROI-to-ROI and edge-functional connectivity, sliding-windows, and graph measures)	Accuracy: 0.465–0.600 Balanced accuracy: 0.465–0.613 Sensitivity: 0.460–0.687 Specificity: 0.375–0.539	Caution is advised when interpreting promising prediction results from neuroimaging data in small samples
Wang, Wu, et al. (2024)	College students with symptoms of anxiety or depression (n = 107)	Symptoms of anxiety, depression, stress	To predict efficacy and response using machine learning in college students undergoing biofeedback therapy	Machine learning model	ANN	Heart rate variability characteristics (time and frequency domains) Acoustic variables from the data using a speech frame (32 ms)	Model accuracy for anxiety treatment response: 62%	Speech features, such as the energy parameters, may serve as more accurate and objective indicators for tracking biofeedback therapy response and predicting efficacy
Wang, Wu, et al. (2024)	Patients with MDD (training samples: n = 85; test samples: n = 147)	MDD	To predict treatment response by using neuroimaging data	Machine learning models	RF GBDT XGBoost Penalized LR SVM Neural network	307 brain imaging variables 49 questionnaire variables from QIDS and HDRS (including baseline and week 8 HDRS scores) 4 clinical and demographic variables (age, total years of education, sex, and medication use)	Training set model AUC: 0.615–0.8257 Testing set model AUC: 0.4884–0.4941	The machine learning pipeline exhibited high accuracy and AUC (>0.80) on the training set but encountered challenges when applied to an external validation dataset, prompting an investigation into site heterogeneity issues
Zainal and Newman (2024)	Patients with GAD (N = 110)	GAD	To identify which clients with generalized anxiety disorder benefit from mindfulness ecological momentary intervention versus self-monitoring app	Machine learning models	LR SVM - radial kernel RF	Demographic variables (i.e., age, gender, and race/ethnicity) GAD-Questionnaire-IV FFMQ Wechsler Adult Intelligence Scale–Fourth Edition Controlled Oral Word Association Test	GAD severity prediction SVM nested leave-one-out cross-validation: AUC = 0.817, accuracy = 0.800, balanced accuracy = 0.795, sensitivity = 0.767, specificity = 0.822 RF nested leave-one-out cross-validation: AUC = 0.817, accuracy = 0.819, balanced accuracy = 0.814, sensitivity = 0.791, specificity = 0.837	Predictors of optimization to the intervention were higher anxiety severity, higher trait perseverative cognition, lower set-shifting deficits, older age, and stronger trait mindfulness
Brandt et al. (2023)	Participants with schizophrenia or schizoaffective disorder (aged ≥18 years) (n = 1392)	Schizophrenia or schizoaffective disorder	To identify general prognostic factors of relapse for all participants (irrespective of treatment continuation or discontinuation) and specific predictors of relapse for treatment discontinuation	Machine learning	Proportional hazard regression model (for multivariate analysis) Random survival forests (for exploratory analysis to improve the predictive ability)	36 variables: Demography (sex, age) Somatic history (somatic illness, BMI) Psychiatric history (e.g., disorganized type, catatonic type, paranoid type, residual type, duration of illness) Substance use (smoking, drug-positive urine screening) Standardized scales (PANSS, CGI, AIMS, BARS, PSP) Treatment characteristics before randomization (e.g., last dosage of the antipsychotic study drug, treatment duration of antipsychotic study drug) Comedication Adverse events Laboratory results (e.g., alanine aminotransferase, prolactin, white blood cell count)	The concordance index for predictive performance was 0.707, meaning that the algorithm’s prediction about which of the two participants will relapse sooner is correct in 71% of the cases	Out of the 36 baseline variables, general prognostic factors of increased risk of relapse for all participants were drug-positive urine; paranoid, disorganized, and undifferentiated types of schizophrenia; psychiatric and neurological adverse events; higher severity of akathisia; antipsychotic discontinuation; lower social performance; younger age; lower glomerular filtration rate; benzodiazepine comedication Predictors of increased risk specifically after antipsychotic discontinuation were increased prolactin concentration, higher number of hospitalizations, and smoking
Barrigon et al. (2023)	Patients with a history of suicidal thoughts and behavior (n = 225)	Suicidal ideation	To predict short-term (one week) suicide risk by using smartphone data in suicidal patients	Machine learning algorithm	Bayesian algorithm	Distance traveled Time spent at home Steps taken Use of any app	AUC: 0.78	Unsupervised machine learning on smartphone data from patients with suicidal ideation effectively predicts suicide risk
Dougherty et al. (2023)	Patients with TRD (n = 233)	TRD	To predict which participants with treatment-resistant depression would be week 3 responders and sustained responders through week 12 to psilocybin treatment	Machine learning algorithms and models	NLP LR	Two-dimensional sentiment from the first session (computed by NLP), emotional breakthrough index, treatment dose	At week 3: Accuracy: 85% AUC: 88% At week 12: Accuracy: 88% AUC: 85%	Treatment response to psilocybin is accurately predicted using a logistic regression model incorporating NLP metrics, EBI scale responses, and treatment arm data
Harrer et al. (2023)	Patients with chronic back pain and depressive symptoms (n = 504)	Depressive symptoms	To predict treatment effects of an Internet-based depression intervention for patients with chronic back pain	Machine learning models	DT (developed by multilevel model-based recursive partitioning)	Sociodemographic variables (e.g., age, gender, marital status, education, internet affinity, social support, medication, sick leave) Symptom severity and quality of life (PHQ–9, HAMD, NPRS, QoL) Pain-related risk factors (PSEQ, ODI, SPE)	Decision tree model: R²app = 52% After bootstrap bias-correction, R²adj = 45% During external validation, R²adj = 33%	Predictions of the multivariate tree learning model suggest a pattern in which patients with moderate depression and relatively low pain self-efficacy benefit most, while no benefits arise when patients’ self-efficacy is already high
Jankowsky et al. (2024)	Naturalistic inpatients (n = 723)	Anxious and depressive symptoms	To compare machine learning algorithms for predicting treatment response in naturalistic inpatient samples	Machine learning algorithms	Linear regression EN regression Gradient boosting machines	Sociodemographic background variables (e.g., gender, age) Indicators of physical health (e.g., subjective health, BMI, smoking) Indicators of personality and mental health (e.g., maladaptive personality traits, anxiety or depression scores) Treatment variables (e.g., number of treatments within the last 12 months)	Training: R²: 0.329–0.70 Test: R²: 0.315–0.441	Treatment-related variables were the most predictive, followed by psychological indicators
Ricka et al. (2023)	Patients with MDD (n = 26)	MDD	To identify markers of mood disorders using six months of physiological and clinical data by machine learning	Machine learning algorithm	Label extension and detrending processes, a feature selection, and a deep learning multilayer perceptron model	Physical activity (12 variables) Heart rate (25 variables) Heart rate variability (39 variables) Breathing rate (12 variables) Sleep (13 variables) MADRS scores	2-class prediction (depressed/not depressed) Accuracy: 86% Sensitivity: 79% Specificity: 94%	A supervised ML system can efficiently predict a patient’s clinical score by identifying their biosignature of symptoms during an MDD episode
Scodari et al. (2023)	Patients with subclinical depression (n = 236)	Minor depressive symptoms	To forecast symptom changes among subclinical depression patients receiving stepped care or usual care	Machine learning models	Tree-based and nested framework	15 categorical variables: Gender, marital status, parental birthplace, rural residential area, employment status, education level, excessive alcohol usage, current smoking behavior, normal exercise behavior, onset of depression, baseline dysthymia status, and presence of comorbid illness, and comorbid conditions 8 continuous variables: The number of chronic diseases, BMI, number of historical depressive episodes, baseline locus of control, baseline social support, baseline HADS scores, baseline PHQ–9 scores	For the intervention group, the R² for models at various treatment time intervals are as follows: 0–3 months: 0.15 0–6 months: 0.13 0–9 months: 0.21 0–12 months: 0.12 For the usual care group, the R² for models at different treatment time intervals are as follows: 0–3 months: 0.24 0–6 months: 0.15 0–9 months: 0.15 0–12 months: 0.11	Patients who received stepped care were more likely to reduce PHQ–9 scores if they had high PHQ–9 but low HADS-Anxiety scores at baseline, a low number of chronic illnesses, and an internal locus of control
Zou et al. (2023)	Patients with MDD (N = 245)	MDD	Using passive sensing data to predict treatment response in patients with MDD	Machine learning models	SVM LR RF LSTM GRU GRU-Decay	Call log (type of phone call, mean and SD time of all calls being made, mean and SD of duration, number and entropy of phone calls) Phone usage (frequency and duration of smartphone usage in a day, duration of phone usage for each period (6–12 pm, 12–6 pm, 6–0 am), earliest and latest phone usage time) App usage (duration of social apps, content-providing apps, shopping apps, and entertainment apps) Sleep and step data (duration and ratio of both light and deep sleep, wake-up and sleep times)	GRU-Decay Precision: 0.61 Recall: 0.64 F1 score: 0.58 AUC: 0.65 Other models Precision: 0.57–0.71 Recall: 0.22–0.59 F1 score: 0.33–0.54 AUC: 0.54–0.59	In terms of recall, F1 score, and AUC, the sequence model based on GRU-Decay achieved the best performance
Weintraub et al. (2023)	Youth aged 13 to 19 who had active mood symptoms, mood instability, and at least one parent with bipolar or MDD (n = 44)	Depressive symptoms	Use of machine learning to identify the speech features that most strongly correlated with concurrent depressive symptoms over 18 weeks	Machine learning algorithm	SVM	PSRs from the Adolescent Longitudinal Interval Follow-up Evaluation 20 speech features reflecting affective processes, social processes, drives, informal, time orientation words etc.	Strongest correlated combination of features: affective processes, drives, informal, leisure, and risk (r = 0.47, 95% CI: 0.37–0.56, R² = 0.12) Strongest association of features from subject’s first speech features: affective processes, nonfluencies, drives, and risks (r = 0.68, 95% CI: 0.48–0.81, R² = 0.11)	Speech features identified by machine learning analysis achieved moderate correlation
Jacobson et al. (2022) Note: Also included in the diagnosis domain	Participants aged 38.5 years old on average (n = 126,060)	MDD Generalized anxiety disorder Social anxiety disorder Panic disorder Borderline personality Paranoid personality disorder Schizophrenia	To examine the effectiveness of prediction of mental health outcomes based on exposure to online screening tools	Machine learning	RF Cox Proportional Hazards Models	Screening tool topic Screening tool attributes Hour of the day and day of the week at which the screening tool was clicked Whether the screening tool was a Mental Health America screening tool or from another online web domain Number of previous searches which resulted in a click to a screening tool Past interests, e.g., distribution of query topics prior to the clicking on the first screening tool by each user (to ascertain whether the online screen information provided incremental information to their general search pattern types)	Prediction accuracy was high for mental health self-references, self-diagnosis, and seeking care: screen content predicted later searches with mental health self-references (AUC = 0.73), mental health self-diagnosis (AUC = 0.69), mental health care-seeking (AUC = 0.61) Other outcomes were more difficult to predict: psychoactive medications (AUC = 0.55), suicidal ideation (AUC = 0.58), and suicidal intent (AUC = 0.60) Cox proportional hazards models suggested individuals utilizing tools with in-person care referral were significantly more likely to subsequently search for methods to actively end their life (HR = 1.727)	Online screens may influence help-seeking behavior, suicidal ideation, and suicidal intent Websites with referrals to in-person treatments could put persons at greater risk of active suicidal intent
Nguyen et al. (2022)	Participants with MDD, included early onset (before the age of 30) and chronic (episode duration of two years) or recurrent (2+ episodes) disease episodes (n = 222)	MDD	To determine whether pretreatment reward task-based fMRI can predict treatment-specific outcome	Deep learning models	Feedforward neural network (a separate model was trained for each treatment: sertraline, bupropion, and placebo)	Reward task-based fMRI, which was acquired during a block-design number-guessing task that probes reward processing neural circuitry known to be altered in MDD Clinical measurements Demographic features	For predicting change in HAMD Model 1 on sertraline R²: 48%; RMSE: 5.15 Model 2 on bupropion R²: 34%; RMSE: 4.46 Model 3 on placebo R²: 28%; RMSE: 5.87	All the models explained a substantial proportion of the variance in change in HAMD. The combination of these predictive models presented a possible precision medicine approach for antidepressant selection, and each model would be applied to provide a prediction of response to each treatment
Webb et al. (2022)	School district employees aged 18 or above who owned a smartphone, had limited exposure to meditation app, and had depressive symptoms below the severe range (n = 662)	Depression and anxiety	To use a data-driven algorithm to predict which individuals are most likely to benefit from app-based meditation training	Machine learning	ENRR	Pre-intervention distress, anxiety, depression, stress, repetitive negative thinking, the mindfulness aspect of acting with awareness, loneliness, diffusion, presence, search for meaning, self-compassion, well-being, age, gender, race, marital status, and income Anxiety measure, PROMIS Depression measures, and 10-item Perceived Stress Scale	Multivariable ENRR model: Higher baseline levels of the following variables predicted a greater reduction in distress: distress (r = −0.30) depression (r = −0.30) stress (r = −0.26) Higher baseline scores of the following variables predicted greater reduction in distress in the control condition: diffusion, presence, distress, anxiety, stress, depression, and loneliness Linear regression model: Higher levels of repetitive negative thinking predicted: a greater reduction in distress from the mindfulness app (B = −0.02) higher levels of repetitive negative thinking were significantly associated with poorer outcomes in the control condition (B = 0.01) Overall: A significant group with PAI interaction was observed linear regression model including repetitive negative thinking as the sole baseline predictor: r² = 0.11 multivariable ENRR model: r² = 0.10	Either the linear regression model with a single predictor of baseline levels of repetitive negative thinking, or the multivariable ENRR model with multiple predictors can predict changes in the level of distress
Athreya et al. (2021)	People with nonpsychotic MDD who received at least 8 weeks of treatment with a study drug, including SSRIs, SNRIs or TCAs, placebo (n = 3,518)	Depression	To identify specific depressive symptoms and thresholds of improvement that were predictive of antidepressant response	Machine learning	Gaussian mixture models Probabilistic graphical models	Four HDRS items (depressed mood, psychic anxiety, guilt feelings/delusions, and work/activities) Thresholds of change in prognostic symptom severity, derived based on the absolute difference in median scores on symptom dynamic paths between baseline and four-week strata	Four depressive symptoms and specific thresholds of four-week change in each symptom predicted the eventual eight-week outcome of SSRI therapy with an average accuracy of 77%. The symptoms and thresholds derived from patients treated with SSRIs correctly predicted outcomes in 72% of patients treated with other antidepressants	Conjunction of the two AI models derived consistently high predictive accuracies across numerous commonly prescribed antidepressants, and hence interpretable and accurate prognoses of antidepressant treatment outcomes
Bao et al. (2021)	Depressive patients receiving six intravenous infusions of ketamine over 2 weeks (n = 83)	MDD	To identify a set of biomarkers that could be used to predict clinical outcomes for treatment in MDD	Machine learning	SVM RF kNN LR DT LR with EN	Age, sex, BMI, smoking status, and the HAMD score	Accuracy: SVM: 0.62 ± 0.23 RF: 0.56 ± 0.15 kNN: 0.63 ± 0.12 LR: 0.62 ± 0.12 DT: 0.57 ± 0.12 LR with EN: 0.63 ± 0.19	Machine learning approach could predict treatment outcomes of multiple ketamine infusions on the basis of the genotyping information
Lee et al. (2021)	Adults aged 18 to 65 with bipolar disorder (n = 60)	Bipolar depression	To identify biologically relevant moderators of response to TNF-α inhibitor, infliximab	Machine learning	CART	Plasma cytokine and neuronal origin-enriched extracellular vesicle protein concentrations, intervention assignment and week SHAPS MADRS	Accuracy of predicting reduction in anhedonic symptoms with baseline cytokine biotype, intervention allocation, week, and baseline and change in neuronal origin-enriched extracellular vesicle factor scores: r² = 0.22 RMSE = 0.08 No significant moderation effect is observed in MADRS total score by baseline biotype	Pretreatment biotypes, which derived from peripheral cytokine measurements, can predict antianhedonic efficacy with infliximab
Solomonov et al. (2021)	Older adults over 60 who suffered from unipolar, nonpsychotic MDD (n = 221)	Suicidal ideation	To identify baseline predictors of the course of suicidal ideation	Machine learning algorithms	LASSO RF GBM Classification tree	Demographics, treatment assignment, age of onset, length of current episode, number of previous episodes, severity of depression, disability, cognitive impairment, executive functioning, neuroticism, apathy, hopelessness, activation, avoidance/rumination, work/school impairment avoidance/rumination; social impairment; anhedonia; rumination response style scale; and digit span	Predictive performance: LASSO: AUC = 0.735 GBM: AUC = 0.725 RF: AUC = 0.684 Classification tree: AUC = 0.670	Four machine learning algorithms identified hopelessness, neuroticism, and low general self-efficacy as the strongest predictors of an unfavorable trajectory of suicidal ideation
Van Bronswijk et al. (2021)	Adult outpatients recruited from the mood disorders unit with a primary diagnosis of MDD (n = 151)	MDD	To extend the PAI to long-term depression outcomes after acute-phase psychotherapy	Two-step machine learning	RF Regression model	38 pretreatment variables from six domains: depression variables demographics psychological distress general functioning psychological processes life and family history	For parental alcohol abuse, the regression coefficients across the bootstrapped samples were stable with a positive value in 99.8% of the samples	A history of parental alcohol abuse was associated with higher BDI-II scores during the 17-month follow-up phase. Therefore, parental alcohol abuse could be used as a predictor for long-term depression outcomes following cognitive therapy and interpersonal psychotherapy
Busk et al. (2020)	Patients with bipolar disorder who had previously been treated (n = 15,975)	Bipolar disorder	To examine the feasibility of forecasting daily subjective mood scores based on daily self-assessments	Multi-task learning	Hierarchical Bayesian models	Daily self-assessments via Android smartphone app, including activity, alcohol, anxiety, irritability, cognitive difficulty, medicine intake, presence of mixed mood, mood, sleep, stress Clinical evaluations with HDRS and YMRS to assess depression and mania	Historical mood was the most important predictor of future mood; self-reported mood scores and HDRS scores were negatively correlated (r = −0.40), whereas self-reported mood scores and YMRS scores were positively correlated (r = 0.22)	Application of hierarchical Bayesian models could forecast subjective mood for up to 7 days, thus improving continuous disease monitoring
Furukawa et al. (2020)	Patients aged 25 to 75 years, with nonpsychotic unipolar MDD episode, and having received no antidepressant, antipsychotic, or mood stabilizer in the previous month (n = 2,011)	MDD	To predict depression severity from a large set of baseline predictors through a web app	Machine learning	Penalized linear regression models using LASSO Penalized linear regression models using the ridge penalty SVM with a polynomial or radial kernel Artificial neural networks with one hidden layer, three or four nodes	Sociodemographic variables including age, sex, education, employment status, and marital status Baseline clinical characteristics include age at onset of depression, number of previous depressive episodes, length of index episode, and concurrent physical illness Depression characteristics by week three include individual item scores of PHQ–9 for the index episode; individual item scores of the BDI-II; individual item scores of the FIBSER; and adherence to pharmacotherapy	SVMs are observed with a lower prediction error in both internal and internal-external cross-validation (MAE = 1.5)	Three different SVMs with a radial kernel, one SVM per treatment arm, could be chosen to predict treatment outcome
Rajpurkar et al. (2020)	Outpatients aged 18 to 65 from primary or specialty care practices with a diagnosis of MDD (n = 518)	MDD	To identify the extent to which a machine learning approach can predict acute improvement for individual depressive symptoms with antidepressants based on pretreatment symptom scores and EEG measures	Machine learning	ELECTREE Score algorithm using GBDTs	Resting-state EEG continuously recorded Symptoms of HRSD–21	C index score, which is indicative of discriminative performance, was found for 12 symptoms. The highest C index score was found on: loss of insight (C index, 0.963 [95% CI 0.939–1.000]) unreality and nihilism (C index, 0.951 [95% CI, 0.932–0.976]) weight loss (C index, 0.923 [95% CI, 0.896–0.953]) The most critical predictor for each symptom was the baseline symptom severity Any single EEG feature was higher than 5% predictors for seven symptoms Combination of EEG and baseline symptom features significantly increased the C index for improvement in four symptoms: Energy loss (C index increase, 0.035 [95% CI, 0.011–0.059]) Appetite changes (C index increase, 0.017 [95% CI, 0.003–0.030]) Psychomotor retardation (C index increase, 0.020 [95% CI, 0.008–0.032]) Loss of insight (C index increase, 0.012 [95% CI, 0.001–0.020])	The machine learning model could predict the improvement in depressive symptoms most accurately with baseline symptom severity in combination with EEG features
Rozek et al. (2020)	Army soldiers reporting active suicide ideation with intent to die during the previous week and/or a suicide attempt during the previous month (n = 152)	Suicide	To examine predictors of suicidal behaviors among high-risk suicidal soldiers who received outpatient mental health services in a RCT of Brief CBT for Suicide Prevention compared to treatment as usual	Machine learning	MondoBrain Augmented Intelligence® System	BSSI-W Prior attempts Treatment Group SCS Sex	This combination of variables correctly classified eight of 26 participants who attempted suicide during the two-year follow-up period (30.8%) and misclassified only one of 126 participants who did not attempt suicide (0.8%), yielding 88.9% positive predictive value, and 87.4% negative predictive value	This combination of variables correctly classified almost one-third of participants who attempted suicide in the subsequent two years with good positive predictive value and negative predictive value
Browning et al. (2019)	Depressive patients whose treating clinician had made the decision to prescribe citalopram (n = 239)	Depression	To assess whether changes in emotional processing and subjective symptoms over the first week of antidepressant treatment predict clinical response after four–eight weeks of treatment	Machine learning	SVM	QIDS-SR16, ECAT, EREC, FERT	Accuracy: QIDS-SR16: ~60% FERT: 70% ECAT & EREC: 50–60% QIDS-SR16 & FERT: 77% QIDS-SR16, FERT, ECAT & EREC: 79%	Cognitive and symptomatic measures could be used to guide antidepressant treatment in depressed patients
Foster et al. (2019)	Adolescents aged 12–17 with MDD (n = 439)	MDD	To estimate patient-specific inter-treatment differences among three treatment conditions: CBT, FLX, and the combination of CBT and FLX, as a function of patients’ baseline characteristics	Machine learning	Model-based Random Forest	Gender, race, family income, referral source, dysthymia, anxiety disorder, ADHD, childhood trauma, study site, age, verbal intelligence, current episode duration, baseline depression severity, functional impairment, suicidal ideation, melancholic features, number comorbid diagnoses, caregiver depression, conflict with caregiver, hopelessness, cognitive distortions, treatment expectations from parent, treatment expectations from adolescents	FLX-CBT difference: FLX was more effective (b = −0.13, 95% CI: −0.22 to −0.05), especially with more severe baseline depression CBT-combination difference: Combination was more effective (b = −0.25, 95% CI: −0.33 to −0.17) FLX-combination difference: Combination was more effective (b = −0.11, 95% CI: −0.21 to −0.02), especially with less severe baseline depression and higher treatment expectations from patients	Combined treatment with CBT and FLX was consistently superior to either therapy administered alone across a broad range of patients
Vitinius et al. (2019)	Depressed patients with CAD (n = 570)	Depression	To identify somatic and sociodemographic predictors of depression outcome among depressed patients with CAD	Machine learning	LR and linear or binomial linear model with LASSO regularization	141 potential sociodemographic and somatic predictors including blood tests, medical history, current drug use, comorbidities, and sociodemographic data. HADS	Predictors to favorable depression outcome: higher heart rate variability during numeracy tests (p = 0.020), unknown previous myocardial infarction (p = 0.013), higher age (p = 0.002) Predictors to unfavorable depression outcome: anticholinergic drugs (p = 0.045), state after resuscitation (p ≤ 0.042), uric acid drugs (p ≤ 0.039), beta blockers (p = 0.035), New York Heart Association (NYHA) class III (p ≤ 0.028), analgesic drugs (p = 0.027), antidiabetic drugs (p = 0.015), higher triglycerides (p = 0.014), intake of thyroid hormones (p = 0.007), and hyperuricemia (p ≤ 0.003)	Machine learning could identify somatic and sociodemographic predictors of depression outcome in patients with CAD
Bailey et al. (2018)	Patients with TRD and healthy controls aged 20 to 72 with normal or corrected to normal vision (n = 50)	Depression	To determine whether working memory related power, connectivity, and theta-gamma coupling measures could be used to predict responders to rTMS treatment for treatment-resistant depression	Multivariate machine learning	SVM	Mood: Montgomery-Asberg depression rating scale Behavior: working memory accuracy, average reaction time EEG: alpha, theta, and gamma power, connectivity, and theta-gamma coupling	Prediction of individual responders: mean sensitivity: 0.91 (±0.06 SD) specificity: 0.92 (±0.02 SD) balanced accuracy: 91% (±3.64 SD)	Baseline and week 1 frontal-midline theta power and theta connectivity showed good potential for predicting response to rTMS treatment for depression
Kautzky et al. (2018)	Patients diagnosed with MDD (n = 55)	MDD	To generate a prediction model for TRD using machine learning featuring a large set of clinical and sociodemographic predictors of treatment outcome	Machine learning	RF	47 predictors documented in the GSRD database, which can be classified into: Sociodemographic MDD history Axis II comorbidity Axis III comorbidity Clinical features Other predictors, e.g., inpatient or outpatient, quality of social life, quality of work life, quality of family life, retrospective MADRS score	The full model with 47 predictors yielded an accuracy of 75.0% for predicting TRD and treatment response, with positive predictive value of 79.6%, and negative predictive value of 67.9% When the number of predictors was reduced to 15, accuracies between 67.6% and 71.0% were attained for different test sets	Machine learning techniques have shown promising results on prediction of TRD by considering interaction and main effects equally and producing reliable classification with high accuracy
Lenhard et al. (2018)	Adolescents aged 12–17 with OCD who had received either immediate or delayed (12 weeks) internet-delivered CBT (n = 61)	Pediatric OCD	To test four different machine learning methods in the prediction of treatment response in a sample of pediatric OCD patients who had received internet-delivered CBT	Machine learning	Linear model with best subset predictor selection L1 Elastic Net (LASSO) RF SVM	46 demographic and clinical baseline variables, related to: Parental education level Referral to study Medication Previous treatment experience Comorbidity Number of comorbid diagnoses Baseline OCD symptoms Clinical Global Impression Self-rated baseline measures Parent-rated baseline measures Outcome at posttreatment Outcomes at three-month follow-up	Accuracy: Linear model with best subset predictor selection: 83% L1 Elastic Net (LASSO): 75% RF: 75% SVM: 75%	Machine learning models were able to predict treatment outcome in internet-delivered CBT for pediatric OCD with good to excellent accuracy
Maciukiewicz et al. (2018)	Individuals diagnosed with MDD from three clinical trials who received duloxetine or placebo for up to eight weeks (n = 186)	MDD	To use supervised machine learning to build predictive models of duloxetine outcome for MDD with genome-wide data	Machine learning models	LASSO regression CRT SVM	SNPs	Accuracy on remission prediction: CRT = 0.51 SVM = 0.52 Accuracy on treatment response prediction: CRT = 0.57 SVM = 0.64 (chance accuracy = 0.57) Of the 19 most robust SNPs, 17 were characterized by large LASSO coefficients	None of the machine learning models performed satisfactorily in remission prediction. For treatment response, SVM achieved moderate performance whereas CRT’s performance was just equal to chance accuracy
Nie et al. (2018)	STAR*D cohort: Patients with MDD. RIS-INT–93 cohort: Patients with MDD who had a history of resistance to therapy with antidepressant medication and were treated prospectively with citalopram for up to six weeks (n = 5686)	MDD	To identify risk factors of treatment resistance by extending the work in predictive modeling of treatment-resistant depression via partition of the data from the STAR*D cohort and completely independent cohort RIS-INT–93 into training and testing datasets	Machine learning	l₂ penalized LR RF GBDT XGBoost EN	CRS, demographics, PHX, MHX, PRISE, PDSQ, baseline and week two of level 1 treatment which include records from Clinic Visit Form, QIDS-C₁₆, QIDS-SR₁₆, Bech melancholia scale, the Maier-Phillipp severity subscale, the Santen Subscale, the Gibbons’ global depression severity scale, HAM-D₇	STAR*D testing dataset and RIS-INT–93 independent dataset with an AUC of 0.70–0.78 and 0.72–0.77, respectively	The series of machine learning models were able to predict treatment-resistant depression using clinical and sociodemographic data
Chekroud et al. (2016)	STAR*D trial: Patients from primary and psychiatric care settings, with nonpsychotic MDD, with a score of at least 14 on the 17-item HAMD, and aged 18–75 COMED trial: Patients with nonpsychotic MDD, had recurrent or chronic depression, with a score of at least 16 on the 17-item HAMD, and aged 18–75 (n = 4041)	MDD	To develop an algorithm to assess whether patients will achieve symptomatic remission from a 12-week course of citalopram	Machine learning	EN	Overlapping variables in the two clinical trials including sociodemographic features, DSM-IV-based diagnostic items, depressive severity checklists, eating disorder diagnoses, whether the patient had previously taken specific antidepressant drugs, the number and age of onset of previous major depressive episodes, and the first 100 items of the psychiatric diagnostic symptom questionnaire	Accuracy in internal validation: STAR*D cohort: 64.6% Accuracy in external validation: COMED cohort (escitalopram treatment group): 59.6% COMED cohort (escitalopram-bupropion treatment group): 59.7% COMED cohort (venlafaxine-mirtazapine treatment group): 51.4%	Machine learning achieved moderate performance for internal prediction. Performance in the external cohort varied across treatment groups, with fair to moderate accuracy
Iniesta et al. (2016)	Treatment-seeking adults with MDD and a current depressive episode (n = 793)	MDD	To optimize prediction of symptom improvement and remission during treatment with escitalopram or nortriptyline	Machine and statistical learning	ENRR	Demographics data including current age, age at onset of depression, sex, smoking status, BMI, occupation, marital status, years of education and number of children Baseline severity measures including the clinician-rated MADRS, the 17-item HRSD and the self-report BDI Individual depressive symptoms from the SCAN interview and depression subtypes Observed mood, cognitive and neurovegetative symptom factors, and six dimensions (mood, anxiety, pessimism, interest-activity, sleep, and appetite) from a published factor analysis Stressful life events experienced during the six months prior to the baseline assessment, measured with the LTE-Q Medication history included the use of antidepressant at the time of recruitment, any prior antidepressant treatment, number and types of antidepressants tried established with Medication History Form	Accuracy of prediction on different outcomes: Reduction in depressive symptoms: a model including 29 of the 60 predictors explained 3.85% of the variance in MADRS scores change across treatment arms Remission: AUC = 0.72, R² = 0.15 Predictors with strong contribution: Symptoms of depressed mood, reduced interest, decreased activity, indecisiveness, pessimism, and anxiety significantly predicted symptom improvement BMI, appetite, interest-activity symptom dimension, and anxious-somatizing depression subtype predicted remission	Easily obtained demographic and clinical variables could predict therapeutic response to escitalopram with clinically meaningful accuracy
Amminger et al. (2015)	Individuals with ultra-high risk for psychosis and meeting at least one of the operationally defined groups of risk factors for psychosis: Attenuated positive psychotic symptoms Transient psychosis Genetic risk plus a significant decrease in functioning (n = 81)	Psychosis	To determine biological and clinical factors associated with treatment response indexed by functional improvement in a pre–post examination of a 12-week intervention in individuals at ultra-high risk for psychosis	Machine learning	Linear regression models Gaussian Process Classification	Erythrocyte fatty acid composition of the phosphatidylethanolamine phospholipid fraction	Univariate analysis: Variance in prediction of functional improvement: In ω–3 PUFA group: ALA and negative symptoms explained 14% and 10% of the variance In placebo group: Positive symptoms and functioning explained 23% and 11% of the variance Multivariate analysis: Overall accuracy of fatty acid prediction in treatment response: In ω–3 PUFA group: 86.7% In placebo group: 79.2%	Univariate analysis: Higher levels of erythrocyte membrane ALA (parent fatty acid of the ω–3 family) and more severe negative symptoms at baseline predicted subsequent functional improvement in the treatment group Less severe positive symptoms and lower functioning at baseline were predictive of functional improvement in the placebo group Multivariate analysis: Fatty acids predicted response to treatment in both ω–3 PUFA and placebo groups with a high level of accuracy
Guilloux et al. (2015)	Anxious-depressed adults with nonpsychotic MDD episode of sufficient severity (score ≥ 15 on the 25-item HRSD) and elevated symptoms of panic or anxiety (score ≥ 7 on the past-month panic and agoraphobic spectrum self-report) Nonpatient controls not meeting criteria for any mood or anxiety disorder (n = 67)	MDD	To identify the biomarkers predicting nonremission prior to treatment initiation	Machine learning prediction model	Random intercept model SVM	Peripheral blood-based gene expression	The results from these studies indicate an average cross-validated accuracy (i.e., model selection bias corrected) of 79.4% in predicting remission status, with the 13-gene model displaying the highest individual noncorrected prediction value (88%). A prediction model newly built in the validation cohort using the same 13 genes identified in the initial cohort found, through another round of leave-one-out cross-validation, that a 6-gene model achieved the highest accuracy (76.2%)	At pretreatment assessment, the gene expression profiles obtained from blood samples of MDD subjects who will not attain remission after treatment differ from nondepressed controls and also from MDD patients who will remit with treatment Six out of 13 genes identified in the initial cohort could predict remission in an independent cohort, which demonstrated the potential of pretreatment peripheral gene expression profiles to predict nonremission following an eight- to 12-week course of citalopram treatment

Abbreviations: ADHD: Attention-Deficit/Hyperactivity Disorder; AIMS: Abnormal Involuntary Movement Scale; ALA: α-linolenic acid; ANN: Artificial neural network; AUC: Area under the receiver operating characteristic curve; BARS: Barnes Akathisia Rating Scale; BDI: Beck Depression Inventory; BMI: Body mass index; BSSI-W: Beck Scale for Suicide Ideation, Worst Point; CAD: Coronary artery disease; CART: Classification and regression trees; CBT: Cognitive behavioral therapy; CDSS: Sum of Calgary Depression Scale for Schizophrenia; CGI: Clinical Global Impression; COSTA: Cognitive Style Assessment measuring cognitive distortions; CRS: Cumulative Illness Rating Scale; CRT: Classification and regression tree; DT: Decision tree; EBI: Emotional Breakthrough Index; ECAT: Emotional categorization task; EEG: Electroencephalographic; EMA: Ecological Momentary Assessment; EN: Elastic net; ENRR: Elastic net regularized regression; EREC: Emotional recall task; FERT: Face-based emotional recognition task; FFMQ: Five Factor Mindfulness Questionnaire; FLX: Fluoxetine; fMRI: Functional magnetic resonance imaging; GAD: Generalized anxiety disorder; GAF: Global Assessment of Functioning; GBDT: Gradient-boosted decision trees; GRU: Gated Recurrent Unit; GSRD: Group for the Study of Resistant Depression; HADS: Hospital Anxiety and Depression Scale; HAMD: Hamilton Depression Rating Scale; HDRS: Hamilton Depression Rating Scale; HRSD: Hamilton Rating Scale for Depression; kNN: K-nearest neighbor; LASSO: Least absolute shrinkage and selection operator; LR: Logistic regression; LSTM: Long Short-Term Memory; LTE-Q: List of Threatening Experiences Questionnaire; MADRS: Montgomery-Åsberg Depression Rating Scale; MAPE: Mean absolute percent error; MDD: Major depressive disorder; MEM: Mixed-effects linear regression models; MHX: Medication history; NLP: Natural language processing; NPRS: Numerical pain rating scale; ODI: Oswestry Disability Index; OCD: Obsessive-compulsive disorder; PAI: Personalized Advantage Index; PANSS: Positive and Negative Syndrome Scale; PDSQ: Psychiatric Diagnostic Screening Questionnaire; PHQ-9: Personal Health Questionnaire-9; PHX: Psychiatric history; PRISE: Patient Rated Inventory of Side Effect; PROMIS: Patient-Reported Outcomes Information System; PRS: Polygenic risk score; PSEQ: Pain Self-Efficacy Questionnaire; PSP: Personal and Social Performance; PSRs: Psychiatric Status Ratings; QIDS-C₁₆: Quick Inventory of Depressive Symptomatology (Clinician-Rated); QIDS-SR₁₆: Quick Inventory of Depressive Symptomatology (Self-assessment); QoL: Quality of life; RCT: randomized controlled trial; RF: Random Forest; rTMS: Repetitive transcranial magnetic stimulation; RMSE: Root mean squared error; RNN: Recurrent neural networks; SCAN: Schedules for Clinical Assessment in Neuropsychiatry; SCS: Suicide Cognitions Scale; SEWIP: Scale for the Multiperspective Assessment of General Change Mechanisms in Psychotherapy; SHAPS: Snaith Hamilton Pleasure Scale; SICD: Structured clinical interview for DSM-IV; sMRI: Structural Magnetic Resonance Imaging; SNPs: Single nucleotide polymorphisms; SNRIs: Serotonin-norepinephrine reuptake inhibitors; SPE: Subjective Prognostic Employment Scale; SSRIs: Selective serotonin reuptake inhibitors; SVM: Support vector machine; TCAs: Tricyclic antidepressants; TNF: Tumor necrosis factor; TRD: Treatment-resistant depression; XGBoost: Extreme gradient boosting; YMRS: Young Mania Rating Scale; ω-3 PUFA: Omega-3 polyunsaturated fatty acids.