Table 9.
Summary of research methodologies employed in highly cited publications.
| Data Source | Reference | Mental Health | Data Description | Machine learning model | Feature | Output | Annotation | Results |
|---|---|---|---|---|---|---|---|---|
| Twitter | Budenz et al [28] | Mental illness, bipolar disorder | 1,270,902 tweets including bipolar or mental health-related words | Logistic regression | Term frequency-inverse document frequency | Related to mental illness or bipolar disorder | Manually annotated 2047 tweets with topic, stigma, and social support messaging | 10-fold cross validation (AUC<sup>a</sup>=0.83) |
| Twitter | Du et al [29] | Suicide | 1,962,766 tweets including 21 suicide-related keywords/phrases | CNN<sup>b</sup>, SVM<sup>c</sup>, extra trees, random forest, logistic regression, Bi-LSTM<sup>d</sup> | One-hot-vector mapped to pretrained GloVe Twitter embedding | Related to suicide | Manually annotated 3263 tweets and trained classifier to select 3000 additional suicide-related tweets | Accuracy 0.74, recall 0.96, precision 0.78, F1 0.83 |
| Facebook, Twitter | Guntuku et al [30] | Psychological stress | 601 users’ Facebook and Twitter posts | Linear regression with several regularization methods (eg, ridge, elastic-net, LASSO<sup>e</sup> and L2 penalized SVMs) | LIWC<sup>f</sup>, latent Dirichlet allocation topic modeling, stress lexicon, user engagement | Stress | Qualtrics survey; fill out the demographic questions and Cohen 10-item Stress Scale | 5-fold cross-validation (Pearson r=0.24); trained on Facebook and Twitter, tested on Twitter |
| Facebook, Instagram | Shuai et al [31] | Social network mental disorders (eg, cyber-relationship addiction, information overload, and net compulsion) | 3126 users' Instagram and Facebook data | Decision tree learning, SVM, logistic regression, DTSVM<sup>g</sup>, SNMDD<sup>h</sup> (newly proposed model; tensor technique for deriving latent features) | Social interaction, personal profile, duration | Social network mental disorders | MTurk survey - fill out the standard social network mental disorder questionnaires; professional psychiatrists labeled users having a social network mental disorder | 5-fold cross validation (accuracy=0.78 for Instagram and 0.83 for Facebook) |
| Instagram | Reece and Danforth [32] | Depression | 43,950 users’ Instagram images | Random forest classifier | Number of comments and likes, number of faces in photograph, 3 color properties (hue, saturation, value) | Depression | MTurk survey; Center for Epidemiologic Studies Depression Scale to measure depression level | Recall: 0.697; precision: 0.604; F1: 0.647 |
| Weibo | Lin et al [33] | Stress | 1 billion Weibo posts | SVM, softmax regression, gradient-boosted decision tree, LASSO-MTL<sup>i</sup>, L2-MTL<sup>j</sup>, cASO-MTL<sup>k</sup> | CNN features or word vector representations with hand-crafted features | 12 stressor events (eg, marriage, financial, illness, school), 6 stressor subjects | 30 volunteers manually annotated the stressor events and subjects | 10-fold cross validation (F1>0.80) |
| Weibo | Cheng et al [34] | Suicide risk, depression, anxiety, stress | 974 participants’ Weibo posts, suicide probability, Weibo suicide communication (WSC), depression, anxiety, and stress | SVM | Simplified Chinese-LIWC | Suicide risk, emotional distress (depression, anxiety, stress), WSC | Survey and psychological test tools (ie, Suicide Probability Scale, Depression Anxiety Stress Scales-21) | Leave-one-out cross-validation: suicide probability (AUC=0.61, P=.04), severe anxiety (AUC=0.75, P<.001) |
| Reddit | Gkotsis et al [35] | Bipolar, schizophrenia, anxiety, depression, self-harm, suicide watch, addiction, crippling alcoholism, opiates, autism | 1,014,660 posts | CNN, FF<sup>l</sup>, linear regression, SVM | Word vector representation (16 vector size) | Mental health | N/A<sup>m</sup> | Accuracy: 91.8% (binary classification task), 79.8% (multiclass classification task) |
| Facebook, Twitter, Instagram, Reddit | Coppersmith et al [36] | Suicide risk | 197,615 posts from 418 users | LSTM with attention | One-hot-vector mapped to pretrained GloVe | Suicide risk | Examining public self-stated data and using data donated through OurDataHelps.org | 10-fold cross validation (AUC=0.94) |
| Online Community - Live Journal | Saha et al [37] | Mental health | 620,060 posts from 78,647 users | MTL | Linguistic features of LIWC; topics by LDA<sup>n</sup> | Mental health subreddit (eg, Abuse, Anorexia, Anxiety, Bipolar disorder, Cutting, Death, Drugs, Eating disorders, Insomnia, Pain, Self-injury, and Suicide) | N/A | AUC=0.94 with the community on eating disorders |
<sup>a</sup>AUC: area under the curve.
<sup>b</sup>CNN: convolutional neural network.
<sup>c</sup>SVM: support vector machine.
<sup>d</sup>Bi-LSTM: bidirectional long short-term memory.
<sup>e</sup>LASSO: least absolute shrinkage and selection operator.
<sup>f</sup>LIWC: Linguistic Inquiry and Word Count.
<sup>g</sup>DTSVM: decision tree support vector machine.
<sup>h</sup>SNMDD: social network mental disorder detection.
<sup>i</sup>MTL: multitask learning.
<sup>j</sup>L2-MTL: multitask learning considering l2 loss.
<sup>k</sup>cASO-MTL: clustered alternating structure optimization multitask learning.
<sup>l</sup>FF: feed-forward.
<sup>m</sup>N/A: not applicable.
<sup>n</sup>LDA: latent Dirichlet allocation.
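
Several rows of Table 9 report precision, recall, and F1 together. As a reading aid, the following minimal sketch shows how F1 relates to the other two values, assuming F1 is the harmonic mean of precision and recall for a single positive class. The function name `f1_score` is illustrative and does not come from any of the cited studies. Rows that average metrics over several classes or folds need not satisfy this identity exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for Reece and Danforth [32] in Table 9.
reece_f1 = f1_score(precision=0.604, recall=0.697)
print(round(reece_f1, 3))  # 0.647, matching the F1 reported in the table
```

Under this assumption, the values reported for Reece and Danforth [32] are internally consistent.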