Table 2.
Authors | Primary Goal | Study Setting | Subjects | Sample size (n) | Age Range | Predictors | AI Method (SML, UML, DL, NLP, CompV) | Validation: cv | Validation: In-sample test | Validation: Out-of-sample test | Best algorithm/performance | Primary Conclusions |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Clinical Assessments | ||||||||||||
A. Electronic health record (EHR) data | ||||||||||||
Arun et al., 2018 [65] | Predict depression (Euro-depression inventory) from hospital EHR data | South India Research Unit, CSI Holdsworth Memorial Hospital, Mysuru | Persons born between 1934-1966, from the MYNAH database | 270 patient records | 27 – 67 yrs | Depression, physical frailty, pulmonary function, BMI, LDL | SML | X | XGBoost: accuracy= 98% SN=98% SP=94% | Clinical measures can be used to distinguish whether someone has depression. | ||
Choi et al., 2018 [66] | Predict probability of suicide death using health insurance records | S. Korea Random sample from health insurance registry of all S. Korean residents | Subjects in the National Health Insurance Service (NHIS)–Cohort Database from 2004-2013; Suicide deaths based on ICD-10 codes | 819,951 (573,965 training; 1782 suicide deaths) (245,986 testing; 764 suicide deaths) | 14+ yrs of age | Baseline sociodemographic, ICD-10 coded medical conditions | SML DL Stats: Cox regression | X | X | Cox regression: training AUC=0.72 Testing: AUC=0.69 | Male sex, older age, lower income, medical aid insurance, & disability were linked with suicide deaths at 10-year follow-up. | |
Fernandes et al., 2018 [67] | Detect suicide ideation & attempts using NLP from EHRs | South London (Lambeth, Southwark, Lewisham, & Croydon) | The Clinical Record Interactive Search (CRIS) system from the South London & Maudsley (SLaM) NHS Trust | 500 events & correspondence documents | Not reported | Suicide terms from epidemiology literature, documents of patients with past suicide attempts, clinician suggestions | NLP SML | SVM: Suicide ideation SN=88% precision= 92% Suicide attempts SN=98% Precision= 83% | NLP approaches can be used to identify & classify suicide ideation & attempts in EHR data | |||
Jackson et al., 2016 [68] | Identify symptoms of SMI from clinical EHR text using NLP | 23,128 discharge summaries from 7962 patients with SMI; 13,496 discharge summaries from 7575 patients non-SMI | Not reported | SMI symptoms identified by a team of psychiatrists | NLP SML | X | Extracted data for 46 symptoms with a median F1= 0.88 precision= 90% recall=85% Extracted symptoms in 87% patients with SMI & 60% patients with non-SMI diagnosis | NLP approaches can be used to extract psychiatric symptoms from EHR data | ||||
Kessler et al., 2017 [69] | Identify veterans at high suicide risk from EHRs | Harvard medical school Data from: US Veterans Health Administration (VHA) | National Death Index (NDI; CDC & Department of HHS, 2015) as having died by suicide in fiscal years 2009–2011 | 6,360 cases 2,108,496 controls (1% probability sample) | Not reported | VHA service use, sociodemographic variables | SML Stats: penalized logistic regression model | X | X | Previous study McCarthy et al. (2015) BART: best sensitivity 11% of suicides occurred among 1% of Veterans with highest predicted risk & 28% among 5% with highest predicted risk | A different ML model can predict suicide based on fewer predictors than the McCarthy 2015 model. | |
Sau & Bhakta 2017 [70] | Predict depression from sociodemographic variables & clinical data | Kolkata, India Bagbazar Urban Health & Training Centre | Older adults (43% F) living in a slum with or without depression based on GDS score | 105 | 66.6 ± 5.6 yrs | Sociodemographic & physical comorbidities | DL | X | ANN: Accuracy= 97% AUC=0.99 | Sociodemographic & comorbid conditions can be used to predict the presence of depression. | ||
B. Mood rating scales | ||||||||||||
Chekroud et al., 2016 [71] | Predict whether patients with depression will achieve symptomatic remission after a 12-week course of citalopram | Training data from Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial Testing data from Combining Medications to Enhance Depression Outcomes (CO-MED) trial | Adults with MDD per DSM-IV criteria. Remission based on QIDS-SR follow-up score | Training n=1,949 Test n=151 | 18 to 75 yrs | Sociodemographic variables, psychiatric history, mood ratings | SML | X | X | GBM: Training accuracy =65% AUC=0.70 p<9.8 × 10⁻³³ SN=63% remission & SP=66% non-remitters test accuracy= 60%, p<0.04 Escitalopram + bupropion accuracy = 60%, p<0.02 Venlafaxine + Mirtazapine accuracy= 51%, p=0.53 | Model based on clinical history, sociodemographics, & mood can predict which patients with MDD will respond & remit after taking citalopram. | |
Chekroud et al., 2017 [72] | Determine efficacy of antidepressant treatments on empirically defined groups of symptoms & examine replicability of these groups | Adults with MDD based on DSM-IV diagnosis. Training 63% F Testing: 66% F | Training n=4,039 Testing n=640 | Training 41.2 ± 13.3 yrs Testing 42.7 ± 12.2 yrs | Items from the QIDS-SR & HAM-D | UML SML | X | X | 3 clusters in QIDS-SR (core emotional, insomnia, & atypical symptoms) identified in training 3 clusters replicated in testing GBM: sleep symptom cluster most predictable (R²=20%; p<0.01) Antidepressants (8 of 9) more effective for core emotional symptoms than for sleep or atypical symptoms | 3 patient clusters (based on type of depressive symptoms) had varied responses to different antidepressants. | ||
Zilcha-Mano et al., 2018 [73] | Predict who will respond better to placebo or medication in antidepressant (citalopram) trials | 15 clinical sites in US (8-week placebo-controlled RCT trial of citalopram) | Community-dwelling older adults (75+ yrs) with unipolar depression, diagnosed by HAM-D Trajectory based on weekly HAM-D scores (58% F) | 174 | 79.6 ± 4.4 yrs | Sociodemographic, baseline depression, anxiety, cognition, IADLs. | SML | X | Medication superior for those with ≤12 years education & longer duration depression (>3.57 yrs) (B = 2.5, t(32) = 3.0, p = 0.004). Placebo best for those with >12 years education; almost outperformed medication (B = −0.57, t(96) = −1.9, p = 0.06) | Patients with less education & longer duration of depression more likely to respond to citalopram (than placebo). Patients with more education more likely to respond to placebo (than citalopram). | ||
Research Assessments | ||||||||||||
C. Brain Imaging | ||||||||||||
Drysdale et al., 2017 [74] | Diagnose subtypes of depression with biomarkers from fMRI | 5 academic sites in US & Canada | Adults with MDD based on DSM-IV (59% F) HC (58% F) | 1,188 Training (711; n = 333 with depression; n = 378 HC) Test (477; n=125 depression; n=352 HC) | Training mean= 40.6 yrs depression Mean= 38.0 yrs HC | Connectivity in limbic and fronto-striatal networks from fMRI data | UML SML | X | X | UL: 4-cluster solution SVM: training accuracy= 89% SN=84–91% & SP=84–93% test: accuracy= 86% | Different patterns of fMRI connectivity may distinguish biotypes of MDD with different clinical features & responsiveness to TMS therapy. | |
Kalmady et al., 2019 [75] | Classify SZ using fMRI data | National Institute of Mental Health & Neurosciences (NIMHANS, India) | SZ who are antipsychotic drug-naïve & met DSM-IV criteria Age & sex-matched HC | 81 SZ 93 HC | Not reported | Regional activity & functional connectivity from fMRI data | SML | X | Ensemble model: accuracy= 87% (vs. chance 53%) | fMRI measures can distinguish between SZ & HC | ||
Dwyer et al., 2018 [76] | Identify neuroanatomical subtypes of chronic SZ; determine if subtypes enhance computer-aided discrimination of SZ from HC | Publicly available US data repository of the Mind Research Network & the University of New Mexico | Adults with SZ based on DSM-IV (20% F) HC (31% F) | n=71 schizophrenia n=74 HC 316 test set | 38.1 ± 14 years schizophrenia 35.8 ± 11.6 HC | Brain volume measures based on structural MRI data | UML SML | X | X | UL: 2 subgroups SVM: training subgroup improved accuracy 68%-73% (subgroup 1) & 79% (subgroup 2) testing: accuracy decreased: 64%-71% (subgroup 1) & 67% (subgroup 2) | Two neuroanatomical subtypes of SZ have distinct clinical characteristics, cognitive & symptom courses. | |
Nenadic et al., 2017 [77] | Detect accelerated brain aging in SZ compared to BD or HC using BrainAGE scores | Jena University | Adults with SZ or BD type I based on DSM-IV criteria & HCs (42% F) | 137 (45 SZ, 22 BD, 70 HC) | SZ mean 33.7±10.5, (21.4–64.9); HC mean 33.8±9.4, (21.7–57.8); BD mean 37.7±10.7, (23.8–57.7) | Structural MRI data | SML Stats: ANOVA | RVR: no accuracy reported significant effect of group on BrainAGE score (ANOVA, p = 0.009) SZ had higher mean BrainAGE score than both BD & HC | Using a brain aging algorithm derived from structural MRI data, different diagnostic groups can be compared. | |||
Patel et al., 2015 [78] | Predict late-life MDD diagnosis from multi-modal imaging & network-based features | Pittsburgh, PA, USA MRI laboratory | Late-life MDD based on DSM-IV criteria age-matched HC | 68 Depression n=33 HC n=35 | Not reported | Multi-modal MRI data (functional connectivity, atrophy, integrity, lesions) | SML | X | ADTree: diagnosis accuracy= 87% treatment response accuracy= 89% | MRI data can distinguish late-life depression patients from HCs. | ||
Cai et al., 2018 [79] | Classify depression vs. HC using EEG features from the prefrontal cortex | China Laboratory setting (quiet room) | Existing Psycho-physiological database of 250 individuals | 213 (92 with clinically diagnosed depression 121 HC) | Not reported | EEG data while at rest & with sound stimulation | SML UML | X | KNN: average accuracy= 77% | EEG patterns can distinguish between persons with depression & HCs. | ||
Erguzel et al., 2016 [80] | Classify unipolar MDD & BD using EEG data | Istanbul, Turkey | Out-patients with MDD or BD, based on DSM-IV criteria. Medication free for ≥ 48 hrs | 89 | Not reported | EEG data over 12-hr period | SML | X | ANN: AUC=0.76 no feature selection AUC=0.91 with feature selection | EEG patterns may distinguish between MDD & BD patients. | ||
D. Novel monitoring system | ||||||||||||
Bain et al., 2017 [81] | RCT of medication adherence in subjects with SZ using a novel AI platform AiCure | 10 US sites Medication adherence in a 24-week clinical trial of drug ABT-126 [ClinicalTrials.gov NCT01655680]) | Stable adult out-patients with SZ who do not smoke (45% F) | 75 (53 monitored with AI platform 22 monitored with directly observed therapy) | 45.9 ± 10.9 years | Medication adherence monitored by modified directly observed therapy (mDOT) | CompV | 90% adherence for subjects monitored with AI platform; 72% for subjects monitored by mDOT | A novel AI platform has better medication adherence than directly observed therapy in persons with SZ. | |||
Kacem et al., 2018 [82] | Predict depression severity from automated assessments of psychomotor retardation using video data | Not reported Recruited from a clinical trial of depression treatment | Adults with MDD based on DSM-IV criteria Depression severity based on HAM-D scores (60% F) | 126 sessions from 49 participants: (56 moderate-severely depressed, 35 mildly depressed, & 35 remitted) | Not reported | Measurement of face & head motion based on video recordings | SML | X | SVM: accuracy of facial movement= 66% head movement= 61% Combined= 71% Highest accuracy for severe vs. mild depression 84% | Facial (but not head) movements may be used to distinguish severity levels of depression. | ||
Chattopadhyay 2017 [83] | Mathematically model how psychiatrists clinically perceive depression symptoms & diagnose depression states | India Hospital | Adults with depression, based on DSM-IV criteria. Depression severity rated by clinicians HC—not described | 302 depression 50 HC | 19-50 yrs | Psychiatrists’ ratings of individual symptoms | DL | X | Fuzzy neural hybrid model: accuracy= 96% | The link between clinicians’ assessments of symptoms & overall depression severity can be modeled by AI. | ||
Wahle et al., 2016 [84] | Identify subjects with clinically meaningful depression from smartphone data | Zürich, Switzerland Community smart-phone usage over 4 weeks | Clinically depressed adults from Switzerland & Germany Change in depression based on PHQ-9 scale (64% F) | 28 (64% F) | 20 to 57 yrs | Smartphone usage, accelerometer, Wi-Fi, & GPS data (movement, activity) | SML | X | RF accuracy= 62% | Smartphone sensor data can distinguish between those with & without depression at follow-up. | ||
E. Social Media | ||||||||||||
Cook et al., 2016 [85] | Predict suicide ideation & heightened psychiatric symptoms from survey data & text messaging data | Madrid, Spain Community text message data over 12 months | Adults (65% F) with recent hospital-based treatment for self-harm who endorsed suicidal ideation (SI) OR did not endorse SI based on text or GHQ-12 | 1,453 n=609 never suicidal n=844 suicidal at some point half of data used for training; half testing | 40.5 yrs 40.0 ± 13.8 yrs never suicidal 41.6 ± 13.9 yrs suicidal | Survey (sleep, depressive symptoms, medications), & text response to “how are you feeling today?” | NLP SML | X | multivariate logistic model: suicide ideation (structured better) PPV=0.73, SN=0.76, SP=0.62 Heightened psychiatric symptoms (structured better) PPV=0.79, SN=0.79, SP=0.85 | NLP-based models of unstructured texts have high predictive value for SI, & may require less time & effort from subjects. | ||
Aldarwish & Ahmed 2017 [86] | Identify social network users with depression based on their posts | Saudi Arabia | Posts from Saudi Arabian social network users Training: Depressed post if ≥ 1 DSM-IV MDD symptom mentioned Testing: Depressed subject based on BDI-II scale | Training set= 6773 posts (2073 depressed, 2073 not depressed) Testing set=30 (15 depressed, 15 not depressed) | Not reported | Social network posts from LiveJournal, Twitter, & Facebook | NLP SML | NB: accuracy= 63% precision= 100% recall= 58% | Social media posts could be used to identify which users are depressed. | |||
Deshpande et al., 2017 [87] | Classify Tweets which demonstrate signs of depression & emotional ill-health from those that do not | Twitter API platform | Tweets from all over the world collected at random Categorized based on a curated word-list that suggest poor mental health | 10,000 Tweets (8,000 training; 2,000 test) | Not reported | Unstructured text (Tweet) | NLP SML | X | NB: Precision= 0.836 Accuracy= 83% | Text-based emotion can detect depression from Twitter data. | ||
Dos Reis & Culotta 2015 [88] | Detect mood from Twitter data & examine effect of users’ physical activity on mental health | Twitter platform | Twitter users who are: Physically active, based on hashtags for activity tracking apps Control users who are not active | n=1,161 active n=1,161 controls Matched based on gender, location, & online activity | Not reported | 2,367 unstructured text (Tweets) that were hand-classified as expressing either anxiety, depression, anger, or none | SML Stats: Wilcoxon signed rank | X | Logistic regression classifier: hostility AUC= 0.901 dejection AUC= 0.870 anxiety AUC= 0.850 physically active users had 2.7% fewer anxious tweets & 3.9% fewer dejected tweets than a matched user | Social media posts can be used to infer negative mood states. Physically active social media users post fewer Tweets reflecting negative mood states. | ||
Gkotsis et al., 2017 [92] | Classify mental health-related Reddit posts according to theme-based subreddit (topic-specific) groupings | Reddit dataset from (https://www.reddit.com/dev/api) from 1/1/2006 to 8/31/2015 | Reddit users | 80/20 training/ testing split 348,040 users 458,240 mental health-related posts 476,388 non mental health-related posts | Not reported | Identified subreddits related to mental health using keywords | Semi-SML DL | X | CNN: accuracy = 91.08% distinguishing mental health posts precision= 0.72 recall=0.71 for which theme a post belonged to | Can distinguish mental health-related Reddit posts from unrelated posts as well as the mental health theme they relate to; identified 11 mental health themes | ||
Mowery et al., 2016 [89] | Classify whether a Twitter post represents evidence of depression & depression subtype | Depressive Symptoms & Psychosocial Stressors Associated with Depression (SAD) dataset | User information not reported Tweets classified using linguistic annotation scheme based on DSM-5 & DSM-IV criteria. | 9,300 tweets queried using a subset of the Linguistic Inquiry Word Count | Not reported | Unstructured text (Tweet) | NLP SML | X | SVM: F1 score=52 for a tweet with evidence of depression | Text analysis of tweets can be used to identify depressive symptoms & subtype. | ||
Ricard et al., 2018 [90] | Predict depression from community-generated vs. individual-generated social media content | Dartmouth Hanover, NH Clickworker crowd-sourcing platform | Participants on the Clickworker crowd-sourcing platform MDD by PHQ-8 (69% F) | 749 10% (78/749) held out as a test set | 26.7 ± 7.29 yrs | Unstructured text data (Instagram posts & comment), demographics, other survey data | NLP SML | X | X | Training: not reported Testing: Elastic-net RLR model: community-generated AUC=0.71, p<0.03 Combination AUC=0.72, p<0.02 User-generated AUC=0.63, p=0.11 | Instagram posts (both user-generated & community-generated content) can distinguish people with depression. | |
Tung & Lu 2016 [91] | Predict depression tendency from web posts | PTT online discussion forum | Chinese web posts between 2004-2011 | 724 posts selected as training/ test data Annotated as T/F for depression tendency | Not reported | Unstructured text data (posts) | NLP | EDDTW highest recall= 0.67 & F measure= 0.62 DSM precision= 0.666 | NLP of web posts can identify depressive tendencies. |
ADTree=alternating decision tree; ANN=artificial neural network; BAI=Beck anxiety inventory; BDI=Beck depression inventory; cTAKES=clinical text analysis knowledge extract system; CompV=computer vision; DL=deep learning; EDDTW=event-driven depression tendency warning; GAD-7=generalized anxiety disorder; GHQ-12=general health questionnaire; GMM=Gaussian mixture models; HAM-D=Hamilton rating scale for depression; HC=healthy control; HHS=health and human services; JSON=JavaScript Object Notation; LES=life event scale; LDA=linear discriminant analysis; MDD=major depressive disorder; MMSE=mini mental state examination; NLP=natural language processing; PANSS=positive and negative syndrome scale; PHQ-9=patient health questionnaire; PPV=positive predictive value; PSQI=Pittsburgh sleep quality index; QIDS-SR=Quick Inventory of Depressive Symptomatology; SL=supervised learning; SMI=severe mental illness; SN=sensitivity; SP=specificity; SCID-I=Structured Clinical Interview for Axis I Disorders; SVM=support vector machine; UL=unsupervised learning
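
The performance column above mixes several metrics (SN, SP, PPV/precision, accuracy, F1), which makes rows hard to compare directly. As a reading aid, the following minimal sketch shows how these quantities relate to the four cells of a binary confusion matrix. The class name `ConfusionCounts`, the helper functions, and the example counts are hypothetical illustrations and are not taken from any study in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier scored against a reference label."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summarize(c: ConfusionCounts) -> dict:
    """Compute the metrics abbreviated in the table footnote."""
    return {
        "SN": ratio(c.tp, c.tp + c.fn),    # sensitivity (recall)
        "SP": ratio(c.tn, c.tn + c.fp),    # specificity
        "PPV": ratio(c.tp, c.tp + c.fp),   # positive predictive value (precision)
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "F1": ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),  # harmonic mean of PPV and SN
    }


# Hypothetical counts, chosen only for illustration: 50 cases and 100 controls.
example = ConfusionCounts(tp=40, fp=10, tn=90, fn=10)
metrics = summarize(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these illustrative counts, SN and PPV are both 0.80 and SP is 0.90, while accuracy is about 0.87. Accuracy depends on the case-to-control ratio, so it is not directly comparable across studies with very different class balance.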