2022 Apr 29;24(4):e35788. doi: 10.2196/35788

Table 2.

Top system performance within studies using machine learning or natural language processing (result metrics are reflected here as reported in the original publications).

| Study | Classifier | ML^a model | Features | Reported accuracy | Reported F1 score | Reported area under curve |
|---|---|---|---|---|---|---|
| Pennacchiotti and Popescu, 2011 [68] | Binary | GBDT^b | Images, text, topics, and sentiment | N/A^c | 0.66 | N/A |
| Pennacchiotti and Popescu, 2011 [67] | Binary | GBDT | Images, text, topics, sentiment, and network | N/A | 0.70 | N/A |
| Bergsma et al, 2013 [38] | Binary | SVM^d | Names and name clusters | 0.85 | N/A | N/A |
| Ardehaly and Culotta, 2017 [35] | Binary | DLLP^e | Text and images | N/A | 0.95 (image); 0.92 (text) | N/A |
| Volkova and Bachrach, 2018 [76] | Binary | LR^f | Text, sentiment, and emotion | N/A | N/A | 0.97 |
| Wood-Doughty et al, 2018 [79] | Binary | CNN^g | Name | 0.73 | 0.72 | N/A |
| Saravanan, 2017 [72] | Ternary | CNN | Text | NR^h | NR | NR |
| Ardehaly and Culotta, 2017 [33] | Ternary | DLLP | Text and images | N/A | 0.84 (image); 0.83 (text) | N/A |
| Gunarathne et al, 2019 [94] | Ternary | CNN | Text | N/A | 0.88 | N/A |
| Wood-Doughty et al, 2018 [79] | Ternary | CNN | Name | 0.62 | 0.43 | N/A |
| Culotta et al, 2016 [47] | Quaternary | Regression | Network and text | N/A | 0.86 | N/A |
| Chen et al, 2015 [46] | Quaternary | SVM | n-grams, topics, self-declarations, and image | 0.79 | 0.79 | 0.72 |
| Markson, 2017 [61] | Quaternary | CNN | Synonym expansion and topics | 0.76 | N/A | N/A |
| Wang et al, 2016 [189] | Quaternary | CNN | Images | 0.84 | N/A | N/A |
| Xu et al, 2016 [82] | Quaternary | SVM | Synonym expansion and topics | 0.76 | N/A | N/A |
| Ardehaly and Culotta, 2015 [34] | Quaternary | Multinomial logistic regression | Census, name, network, and tweet language | 0.83 | N/A | N/A |
| Ardehaly, 2014 [64] | Quaternary | LR | Census and image tweets | 0.82 | 0.81 | N/A |
| Barbera, 2016 [37] | Quaternary | LR with EN^i | Tweets, emojis, and network | 0.81 | N/A | N/A |
| Wood-Doughty, 2020 [81] | Quaternary | CNN | Name, profile metadata, and text | 0.83 | 0.46 | N/A |
| Preotiuc-Pietro and Ungar, 2018 [96] | Quaternary | LR with EN | Text, topics, sentiment, part-of-speech tagging, name, perceived race labels, and ensemble | N/A | N/A | 0.88 (African American), 0.78 (Latino), 0.83 (Asian), and 0.83 (White) |
| Mueller et al, 2021 [91] | Quaternary | CNN | Text and accounts followed | N/A | 0.25 (Asian), 0.63 (African American or Black), 0.28 (Hispanic), and 0.90 (White) | N/A |
| Bergsma et al, 2013 [38] | Multinomial (>4) | SVM | Name and name clusters | 0.81 | N/A | N/A |
| Nguyen et al, 2018 [66] | Multinomial (>4) | Neural network | Images | 0.53 | N/A | N/A |

^a ML: machine learning.

^b GBDT: gradient-boosted decision tree.

^c N/A: not applicable.

^d SVM: support vector machine.

^e DLLP: deep learning from label proportions.

^f LR: logistic regression.

^g CNN: convolutional neural network.

^h NR: not reported.

^i EN: elastic net.