Author manuscript; available in PMC: 2020 Nov 7.
Published in final edited form as: Curr Psychiatry Rep. 2019 Nov 7;21(11):116. doi: 10.1007/s11920-019-1094-0

Table 2.

Summary of Studies of AI and Mental Health

Columns: Authors; Primary Goal; Study Setting; Subjects; Sample size (n); Age Range; Predictors; AI Method (SML, UML, DL, NLP, CompV); Validation (cv, in-sample test, out-of-sample test); Best algorithm/performance; Primary Conclusions
Clinical Assessments
A. Electronic health record (EHR) data
Arun et al., 2018 [65] Predict depression (Euro-depression inventory) from hospital EHR data South India Research Unit, CSI Holdworth Memorial Hospital, Mysuru Persons born between 1934-1966, from the MYNAH database 270 patient records 27 – 67 yrs Depression, physical frailty, pulmonary function, BMI, LDL SML X XGBoost: accuracy= 98% SN=98% SP=94% Clinical measures can be used to distinguish whether someone has depression.
Choi et al., 2018 [66] Predict probability of suicide death using health insurance records S. Korea Random sample from health insurance registry of all S. Korean residents Subjects in the National Health Insurance Service (NHIS)–Cohort Database from 2004-2013; Suicide deaths based on ICD-10 codes 819,951 (573,965 training; 1782 suicide deaths) (245,986 testing; 764 suicide deaths) 14+ yrs of age Baseline sociodemographic, ICD-10 coded medical conditions SML DL Stats: Cox regression X X Cox regression: training AUC=0.72 Testing: AUC=0.69 Male sex, older age, lower income, medical aid insurance, & disability were linked with suicide deaths at 10-year follow-up.
Fernandes et al., 2018 [67] Detect suicide ideation & attempts using NLP from EHRs South London (Lambeth, Southwark, Lewisham, & Croydon) The Clinical Record Interactive Search (CRIS) system from the South London & Maudsley (SLaM) NHS Trust 500 events & correspondence documents Not reported Suicide terms from epidemiology literature, documents of patients with past suicide attempts, clinician suggestions NLP SML SVM: Suicide ideation SN=88% precision= 92% Suicide attempts SN=98% Precision= 83% NLP approaches can be used to identify & classify suicide ideation & attempts in EHR data
Jackson et al., 2016 [68] Identify symptoms of SMI from clinical EHR text using NLP 23,128 discharge summaries from 7962 patients with SMI; 13,496 discharge summaries from 7575 patients non-SMI Not reported SMI symptoms identified by a team of psychiatrists NLP SML X Extracted data for 46 symptoms with a median F1= 0.88 precision= 90% recall=85% Extracted symptoms in 87% patients with SMI & 60% patients with non-SMI diagnosis NLP approaches can be used to extract psychiatric symptoms from EHR data
Kessler et al., 2017 [69] Identify veterans at high suicide risk from EHRs Harvard Medical School Data from: US Veterans Health Administration (VHA) National Death Index (NDI; CDC & Department of HHS, 2015) as having died by suicide in fiscal years 2009–2011 6,360 cases 2,108,496 controls (1% probability sample) Not reported VHA service use, sociodemographic variables SML Stats: penalized logistic regression model X X Previous study McCarthy et al. (2015) BART: best sensitivity 11% of suicides occurred among 1% of Veterans with highest predicted risk & 28% among 5% with highest predicted risk A different ML model can predict suicide based on fewer predictors than the McCarthy 2015 model.
Sau & Bhakta 2017 [70] Predict depression from sociodemographic variables & clinical data Kolkata, India Bagbazar Urban Health & Training Centre Older adults (43% F) living in a slum with or without depression based on GDS score 105 66.6 ± 5.6 yrs Sociodemographic & physical comorbidities DL X ANN: Accuracy= 97% AUC=0.99 Sociodemographic & comorbid conditions can be used to predict the presence of depression.
B. Mood rating scales
Chekroud et al., 2016 [71] Predict whether patients with depression will achieve symptomatic remission after a 12-week course of citalopram Training data from Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial Testing data from Combining Medications to Enhance Depression Outcomes (CO-MED) trial Adults with MDD per DSM-IV criteria. Remission based on QIDS-SR follow-up score Training n=1,949 Test n=151 18 to 75 yrs Sociodemographic variables, psychiatric history, mood ratings SML X X GBM: Training accuracy =65% AUC=0.70 p<9.8 × 10^-33 SN=63% remission & SP=66% non-remitters test accuracy= 60%, p<0.04 Escitalopram + bupropion accuracy = 60%, p<0.02 Venlafaxine + Mirtazapine accuracy= 51%, p=0.53 Model based on clinical history, sociodemographics, & mood can predict which patients with MDD will respond & remit after taking citalopram.
Chekroud et al., 2017 [72] Determine efficacy of antidepressant treatments on empirically defined groups of symptoms & examine replicability of these groups Adults with MDD based on DSM-IV diagnosis. Training 63% F Testing: 66% F Training n=4,039 Testing n=640 Training 41.2 ± 13.3 yrs Testing 42.7 ± 12.2 yrs Items from the QIDS-SR & HAM-D UML SML X X 3 clusters in QIDS-SR (core emotional, insomnia, & atypical symptoms) identified in training 3 clusters replicated in testing GBM: sleep symptom cluster most predictable (R²=20%; p <.01) Antidepressants (8 of 9) more effective for core emotional symptoms than for sleep or atypical symptoms 3 patient clusters (based on type of depressive symptoms) had varied responses to different antidepressants.
Zilcha-Mano et al., 2018 [73] Predict who will respond better to placebo or medication in antidepressant (citalopram) trials 15 clinical sites in US (8-week placebo-controlled RCT trial of citalopram) Community-dwelling older adults (75+ yrs) with unipolar depression, diagnosed by HAM-D Trajectory based on weekly HAM-D scores (58% F) 174 79.6 ± 4.4 yrs Sociodemographic, baseline depression, anxiety, cognition, IADLs. SML X Medication superior for those with ≤12 years education & longer duration depression (>3.57 yrs) (B = 2.5, t(32) = 3.0, p = 0.004). Placebo best for those with >12 years education; almost outperformed medication (B = −0.57, t(96) = −1.9, p = 0.06) Patients with less education & longer duration of depression more likely to respond to citalopram (than placebo). Patients with more education more likely to respond to placebo (than citalopram).
Research Assessments
C. Brain Imaging
Drysdale et al., 2017 [74] Diagnose subtypes of depression with biomarkers from fMRI 5 academic sites in US & Canada Adults with MDD based on DSM-IV (59% F) HC (58% F) 1,188 Training (711; n = 333 with depression; n = 378 HC) Test (477; n=125 depression; n=352 HC) Training mean= 40.6 yrs depression Mean= 38.0 yrs HC Connectivity in limbic and fronto-striatal networks from fMRI data UML SML X X UL: 4-cluster solution SVM: training accuracy= 89% SN=84–91% & SP=84–93% test: accuracy= 86% Different patterns of fMRI connectivity may distinguish biotypes of MDD with different clinical features & responsiveness to TMS therapy.
Kalmady et al., 2019 [75] Classify SZ using fMRI data National Institute of Mental Health & Neurosciences (NIMHANS, India) SZ who are antipsychotic drug-naïve & met DSM-IV criteria Age & sex-matched HC 81 SZ 93 HC Not reported Regional activity & functional connectivity from fMRI data SML X Ensemble model: accuracy= 87% (vs. chance 53%) fMRI measures can distinguish between SZ & HC
Dwyer et al., 2018 [76] Identify neuroanatomical subtypes of chronic SZ; determine if subtypes enhance computer-aided discrimination of SZ from HC Publicly available US data repository of the Mind Research Network & the University of New Mexico Adults with SZ based on DSM-IV (20% F) HC (31% F) n=71 schizophrenia n=74 HC 316 test set 38.1 ± 14 years schizophrenia 35.8 ± 11.6 HC Brain volume measures based on structural MRI data UML SML X X UL: 2 subgroups SVM: training subgroup improved accuracy 68%-73% (subgroup 1) & 79% (subgroup 2) testing: accuracy decreased: 64%-71% (subgroup 1) & 67% (subgroup 2) Two neuroanatomical subtypes of SZ have distinct clinical characteristics, cognitive & symptom courses.
Nenadic et al., 2017 [77] Detect accelerated brain aging in SZ compared to BD or HC using BrainAGE scores Jena University Adults with SZ or BD type I based on DSM-IV criteria & HCs (42% F) 137 (45 SZ, 22 BD, 70 HC) SZ mean 33.7±10.5, (21.4–64.9); HC mean 33.8±9.4, (21.7–57.8); BD mean 37.7±10.7, (23.8–57.7) Structural MRI data SML: Stats: ANOVA RVR: no accuracy reported significant effect of group on BrainAGE score (ANOVA, p = 0.009) SZ had higher mean BrainAGE score than both BD & HC Using a brain aging algorithm derived from structural MRI data, different diagnostic groups can be compared.
Patel et al., 2015 [78] Predict late-life MDD diagnosis from multi-modal imaging & network-based features Pittsburgh, PA, USA MRI laboratory Late-life MDD based on DSM-IV criteria age-matched HC 68 Depression n=33 HC n=35 Not reported Multi-modal MRI data (functional connectivity, atrophy, integrity, lesions) SML X ADTree: diagnosis accuracy= 87% treatment response accuracy= 89% MRI data can distinguish late-life depression patients from HCs.
Cai et al., 2018 [79] Classify depression vs. HC using EEG features from the prefrontal cortex China Laboratory setting (quiet room) Existing Psycho-physiological database of 250 individuals 213 (92 with clinically diagnosed depression 121 HC) Not reported EEG data while at rest & with sound stimulation SML UML X KNN: average accuracy= 77% EEG patterns can distinguish between persons with depression & HCs.
Erguzel et al., 2016 [80] Classify unipolar MDD & BD using EEG data Istanbul, Turkey Out-patients with MDD or BD, based on DSM-IV criteria. Medication free for ≥ 48 hrs 89 Not reported EEG data over 12-hr period SML X ANN: AUC=0.76 no feature selection AUC=0.91 with feature selection EEG patterns may distinguish between MDD & BD patients.
D. Novel monitoring system
Bain et al., 2017 [81] RCT of medication adherence in subjects with SZ using a novel AI platform AiCure 10 US sites Medication adherence in a 24-week clinical trial of drug ABT-126 [ClinicalTrials.gov NCT01655680] Stable adult out-patients with SZ who do not smoke (45% F) 75 (53 monitored with AI platform 22 monitored with directly observed therapy) 45.9 ± 10.9 years Medication adherence monitored by modified directly observed therapy (mDOT) CompV 90% adherence for subjects monitored with AI platform; 72% for subjects monitored by mDOT A novel AI platform has better medication adherence than directly observed therapy in persons with SZ.
Kacem et al., 2018 [82] Predict depression severity from automated assessments of psychomotor retardation using video data Not reported Recruited from a clinical trial of depression treatment Adults with MDD based on DSM-IV criteria Depression severity based on HAM-D scores (60% F) 126 sessions from 49 participants: (56 moderate-severely depressed, 35 mildly depressed, & 35 remitted) Not reported Measurement of face & head motion based on video recordings SML X SVM: accuracy of facial movement= 66% head movement= 61% Combined= 71% Highest accuracy for severe vs. mild depression 84% Facial (but not head) movements may be used to distinguish severity levels of depression.
Chattopadhyay 2017 [83] Mathematically model how psychiatrists clinically perceive depression symptoms & diagnose depression states India Hospital Adults with depression, based on DSM-IV criteria. Depression severity rated by clinicians HC—not described 302 depression 50 HC 19-50 yrs Psychiatrists’ ratings of individual symptoms DL X Fuzzy neural hybrid model: accuracy= 96% The link between clinicians’ assessments of symptoms & overall depression severity can be modeled by AI.
Wahle et al., 2016 [84] Identify subjects with clinically meaningful depression from smartphone data Zürich, Switzerland Community smart-phone usage over 4 weeks Clinically depressed adults from Switzerland & Germany Change in depression based on PHQ-9 scale (64% F) 28 (64% F) 20 to 57 yrs Smartphone usage, accelerometer, Wi-Fi, & GPS data (movement, activity) SML X RF accuracy= 62% Smartphone sensor data can distinguish between those with & without depression at follow-up.
E. Social Media
Cook et al., 2016 [85] Predict suicide ideation & heightened psychiatric symptoms from survey data & text messaging data Madrid, Spain Community text message data over 12 months Adults (65% F) with recent hospital-based treatment for self-harm who endorsed suicidal ideation (SI) OR did not endorse SI based on text or GHQ-12 1,453 n=609 never suicidal n=844 suicidal at some point half of data used for training; half testing 40.5 yrs 40.0 ± 13.8 yrs never suicidal 41.6 ± 13.9 yrs suicidal Survey (sleep, depressive symptoms, medications), & text response to “how are you feeling today?” NLP SML X multivariate logistic model: suicide ideation (structured better) PPV=0.73, SN=0.76, SP=0.62 Heightened psychiatric symptoms (structured better) PPV=0.79, SN=0.79, SP=0.85 NLP-based models of unstructured texts have high predictive value for SI, & may require less time & effort from subjects.
Aldarwish & Ahmed 2017 [86] Identify social network users with depression based on their posts Saudi Arabia Posts from Saudi Arabian social network users Training: Depressed post if ≥ 1 DSM-IV MDD symptom mentioned Testing: Depressed subject based on BDI-II scale Training set= 6773 posts (2073 depressed, 2073 not depressed) Testing set=30 (15 depressed, 15 not depressed) Not reported Social network posts from LiveJournal, Twitter, & Facebook NLP SML NB: accuracy= 63% precision= 100% recall= 58% Social media posts could be used to identify which users are depressed.
Deshpande et al., 2017 [87] Classify Tweets which demonstrate signs of depression & emotional ill-health from those that do not Twitter API platform Tweets from all over the world collected at random Categorized based on a curated word-list that suggest poor mental health 10,000 Tweets (8,000 training; 2,000 test) Not reported Unstructured text (Tweet) NLP SML X NB: Precision= 0.836 Accuracy= 83% Text-based emotion can detect depression from Twitter data.
Dos Reis & Culotta 2015 [88] Detect mood from Twitter data & examine effect of users’ physical activity on mental health Twitter platform Twitter users who are: Physically active, based on hashtags for activity tracking apps Control users who are not active n=1,161 active n=1,161 controls Matched based on gender, location, & online activity Not reported 2,367 unstructured text (Tweets) that were hand-classified as expressing either anxiety, depression, anger, or none SML Stats: Wilcoxon signed rank X Logistic regression classifier: hostility AUC= 0.901 dejection AUC= 0.870 anxiety AUC= 0.850 physically active users had 2.7% fewer anxious tweets & 3.9% fewer dejected tweets than a matched user Social media posts can be used to infer negative mood states. Physically active social media users post fewer Tweets reflecting negative mood states.
Gkotsis et al., 2017 [92] Classify mental health-related Reddit posts according to theme-based subreddit (topic-specific) groupings Reddit dataset from (https://www.reddit.com/dev/api) from 1/1/2006 to 8/31/2015 Reddit users 80/20 training/ testing split 348,040 users 458,240 mental health-related posts 476,388 non mental health-related posts Not reported Identified subreddits related to mental health using keywords Semi-SML DL X CNN: accuracy = 91.08% distinguishing mental health posts precision= 0.72 recall=0.71 for which theme a post belonged to Can distinguish mental health-related Reddit posts from unrelated posts as well as the mental health theme they relate to; identified 11 mental health themes
Mowery et al., 2016 [89] Classify whether a Twitter post represents evidence of depression & depression subtype Depressive Symptoms & Psychosocial Stressors Associated with Depression (SAD) dataset User information not reported Tweets classified using linguistic annotation scheme based on DSM-5 & DSM-IV criteria. 9,300 tweets queried using a subset of the Linguistic Inquiry Word Count Not reported Unstructured text (Tweet) NLP SML X SVM: F1 score=52 for a tweet with evidence of depression Text analysis of tweets can be used to identify depressive symptoms & subtype.
Ricard et al., 2018 [90] Predict depression from community-generated vs. individual-generated social media content Dartmouth Hanover, NH Clickworker crowd-sourcing platform Participants on the Clickworker crowd-sourcing platform MDD by PHQ-8 (69% F) 749 10% (78/749) held out as a test set 26.7 ± 7.29 yrs Unstructured text data (Instagram posts & comment), demographics, other survey data NLP SML X X Training: not reported Testing: Elastic-net RLR model: community-generated AUC=0.71, p<0.03 Combination AUC=0.72, p<0.02 User-generated AUC=0.63, p=0.11 Instagram posts (both user-generated & community-generated content) can distinguish people with depression.
Tung & Lu 2016 [91] Predict depression tendency from web posts PTT online discussion forum Chinese web posts between 2004-2011 724 posts selected as training/ test data Annotated as T/F for depression tendency Not reported Unstructured text data (posts) NLP EDDTW highest recall= 0.67 & F measure= 0.62 DSM precision= 0.666 NLP of web posts can identify depressive tendencies.

ADTree=alternating decision tree; ANN=artificial neural network; BAI=Beck anxiety inventory; BDI=Beck depression inventory; cTAKES=clinical text analysis knowledge extract system; CompV=computer vision; DL=deep learning; EDDTW= event-driven depression tendency warning; GAD-7=generalized anxiety disorder; GHQ-12=general health questionnaire; GMM=Gaussian mixture models; HAMD=Hamilton rating scale for depression; HC=healthy control; HHS=health and human services; JSON=JavaScript Object Notation; LES=life event scale; LDA=linear discriminant analysis; MDD=major depressive disorder; MMSE=mini mental state examination; NLP=natural language processing; PANSS=positive and negative syndrome scale; PHQ-9=patient health questionnaire; PPV=positive predictive value; PSQI=Pittsburgh sleep quality index; QIDS-SR= Quick Inventory of Depressive Symptomatology; SL=supervised learning; SMI=severe mental illness; SN=sensitivity; SP=specificity; SCID-I=Structured Clinical Interview for Axis I Disorders; SVM=support vector machine; UL=unsupervised learning
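
The performance columns above report sensitivity (SN), specificity (SP), positive predictive value (PPV, also called precision), accuracy, and F1. As a reading aid, the sketch below shows how these quantities are conventionally derived from a binary confusion matrix. The counts are hypothetical and are not taken from any study in the table; the names `ConfusionCounts` and `summarize` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical, for illustration)."""
    tp: int  # cases correctly flagged as positive
    fp: int  # controls incorrectly flagged as positive
    tn: int  # controls correctly flagged as negative
    fn: int  # cases incorrectly flagged as negative


def summarize(c: ConfusionCounts) -> dict:
    """Return the standard metrics reported in the table's performance column."""
    sn = c.tp / (c.tp + c.fn)            # sensitivity (recall)
    sp = c.tn / (c.tn + c.fp)            # specificity
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value (precision)
    acc = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * ppv * sn / (ppv + sn)       # harmonic mean of precision and recall
    return {"SN": sn, "SP": sp, "PPV": ppv, "accuracy": acc, "F1": f1}


# Hypothetical example: 50 cases and 50 controls.
metrics = summarize(ConfusionCounts(tp=40, fp=5, tn=45, fn=10))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV depends on how common the condition is in the sample, so values from balanced case-control samples do not transfer directly to population screening.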