2021 Mar 8;23(3):e24870. doi: 10.2196/24870

Table 9.

Summary of research methodologies employed in highly cited publications.

| Data source | Reference | Mental health | Data description | Machine learning: model | Machine learning: feature | Machine learning: output | Machine learning: annotation | Machine learning: results |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Twitter | Budenz et al [28] | Mental illness, bipolar disorder | 1,270,902 tweets including bipolar or mental health-related words | Logistic regression | Term frequency-inverse document frequency | Related to mental illness or bipolar disorder | Manually annotated 2047 tweets with topic, stigma, and social support messaging | 10-fold cross validation (AUC^a=0.83) |
| Twitter | Du et al [29] | Suicide | 1,962,766 tweets including 21 suicide-related keywords/phrases | CNN^b, SVM^c, extra trees, random forest, logistic regression, Bi-LSTM^d | One-hot-vector mapped to pretrained GloVe Twitter embedding | Related to suicide | Manually annotated 3263 tweets and trained classifier to select 3000 additional suicide-related tweets | Accuracy 0.74, recall 0.96, precision 0.78, F1 0.83 |
| Facebook, Twitter | Guntuku et al [30] | Psychological stress | 601 users’ Facebook and Twitter posts | Linear regression with several regularization methods (eg, ridge, elastic-net, LASSO^e, and L2 penalized SVMs) | LIWC^f, latent Dirichlet allocation topic modeling, stress lexicon, user engagement | Stress | Qualtrics survey; fill out the demographic questions and Cohen 10-item Stress Scale | 5-fold cross-validation (Pearson r=0.24); trained on Facebook and Twitter, tested on Twitter |
| Facebook, Instagram | Shuai et al [31] | Social network mental disorders (eg, cyber-relationship addiction, information overload, and net compulsion) | 3126 users’ Instagram and Facebook data | Decision tree learning, SVM, logistic regression, DTSVM^g, SNMDD^h (newly proposed model; tensor technique for deriving latent features) | Social interaction, personal profile, duration | Social network mental disorders | MTurk survey: fill out the standard social network mental disorder questionnaires; professional psychiatrists labeled users having a social network mental disorder | 5-fold cross validation (accuracy 0.78 for Instagram and 0.83 for Facebook) |
| Instagram | Reece and Danforth [32] | Depression | 43,950 users’ Instagram images | Random forest classifier | Number of comments and likes, number of faces in photograph, 3 color properties (hue, saturation, value) | Depression | MTurk survey; Center for Epidemiologic Studies Depression Scale to measure depression level | Recall: 0.697; precision: 0.604; F1: 0.647 |
| Weibo | Lin et al [33] | Stress | 1 billion Weibo posts | SVM, softmax regression, gradient-boosted decision tree, LASSO-MTL^i, L2-MTL^j, cASO-MTL^k | CNN features or word vector representations with hand-crafted features | 12 stressor events (eg, marriage, financial, illness, school), 6 stressor subjects | 30 volunteers manually annotated the stressor events and subjects | 10-fold cross validation (F1>0.80) |
| Weibo | Cheng et al [34] | Suicide risk, depression, anxiety, stress | 974 participants’ Weibo posts, suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress | SVM | Simplified Chinese-LIWC | Suicide risk, emotional distress (depression, anxiety, stress), WSC | Survey and psychological test tools (ie, Suicide Probability Scale, Depression Anxiety Stress Scales-21) | Leave-one-out cross-validation: suicide probability (AUC=0.61, P=.04), severe anxiety (AUC=0.75, P<.001) |
| Reddit | Gkotsis et al [35] | Bipolar, schizophrenia, anxiety, depression, self-harm, suicide watch, addiction, crippling alcoholism, opiates, autism | 1,014,660 posts | CNN, FF^l, linear regression, SVM | Word vector representation (16 vector size) | Mental health | N/A^m | Accuracy: 91.8% (binary classification task), 79.8% (multiclass classification task) |
| Facebook, Twitter, Instagram, Reddit | Coppersmith et al [36] | Suicide risk | 197,615 posts from 418 users | LSTM with attention | One-hot-vector mapped to pretrained GloVe | Suicide risk | Examining public self-stated data and using data donated through OurDataHelps.org | 10-fold cross validation (AUC=0.94) |
| Online Community - Live Journal | Saha et al [37] | Mental health | 620,060 posts from 78,647 users | MTL | Linguistic features of LIWC; topics by LDA^n | Mental health subreddit (eg, Abuse, Anorexia, Anxiety, Bipolar disorder, Cutting, Death, Drugs, Eating disorders, Insomnia, Pain, Self-injury, and Suicide) | N/A | AUC=0.94 with the community on eating disorders |

^a AUC: area under the curve.

^b CNN: convolutional neural network.

^c SVM: support vector machine.

^d Bi-LSTM: bidirectional long short-term memory.

^e LASSO: least absolute shrinkage and selection operator.

^f LIWC: Linguistic Inquiry and Word Count.

^g DTSVM: decision tree support vector machine.

^h SNMDD: social network mental disorder detection.

^i MTL: multitask learning.

^j L2-MTL: multitask learning considering L2 loss.

^k cASO-MTL: clustered alternating structure optimization multitask learning.

^l FF: feed-forward.

^m N/A: not applicable.

^n LDA: latent Dirichlet allocation.
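
Two of the evaluation ideas that recur in the Results column can be illustrated with a minimal Python sketch: the F1 score as the harmonic mean of precision and recall, and k-fold cross-validation as a partition of sample indices. The helper names `f1_score` and `k_fold_indices` are hypothetical and written for this illustration only; they are not taken from any of the cited studies, which may have used shuffled or stratified splits and their own tooling. The precision and recall values are the ones listed for Reece and Danforth [32]; the 10-sample data set is a made-up toy size.

```python
from typing import Iterator, List, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds.

    No shuffling or stratification is applied (an assumption of this sketch).
    Setting k == n_samples gives leave-one-out cross-validation.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Precision and recall as listed for Reece and Danforth [32].
reece_f1 = f1_score(precision=0.604, recall=0.697)
print(f"F1 = {reece_f1:.3f}")  # F1 = 0.647, matching the table entry

# Hypothetical toy data set: 10 samples split into 5 folds of 2.
for train, test in k_fold_indices(10, 5):
    print("test fold:", test, "train size:", len(train))
```

Each sample lands in exactly one test fold, so averaging a metric over the folds uses every sample for testing once. Reported figures in the table need not reproduce exactly from this formula when a study averaged per-fold or per-class scores.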