Table 2.
Study | Classifier | MLa model | Features | Reported accuracy | Reported F1 score | Reported area under curve |
Pennacchiotti and Popescu, 2011 [68] | Binary | GBDTb | Images, text, topics, and sentiment | N/Ac | 0.66 | N/A |
Pennacchiotti and Popescu, 2011 [67] | Binary | GBDT | Images, text, topics, sentiment, and network | N/A | 0.70 | N/A |
Bergsma et al, 2013 [38] | Binary | SVMd | Names and name clusters | 0.85 | N/A | N/A |
Ardehaly and Culotta, 2017 [35] | Binary | DLLPe | Text and images | N/A | 0.95 (image); 0.92 (text) | N/A |
Volkova and Bachrach, 2018 [76] | Binary | LRf | Text, sentiment, and emotion | N/A | N/A | 0.97 |
Wood-Doughty et al, 2018 [79] | Binary | CNNg | Name | 0.73 | 0.72 | N/A |
Saravanan, 2017 [72] | Ternary | CNN | Text | NRh | NR | NR |
Ardehaly and Culotta, 2017 [33] | Ternary | DLLP | Text and images | N/A | 0.84 (image); 0.83 (text) | N/A |
Gunarathne et al, 2019 [94] | Ternary | CNN | Text | N/A | 0.88 | N/A |
Wood-Doughty et al, 2018 [79] | Ternary | CNN | Name | 0.62 | 0.43 | N/A |
Culotta et al, 2016 [47] | Quaternary | Regression | Network and text | N/A | 0.86 | N/A |
Chen et al, 2015 [46] | Quaternary | SVM | n-grams, topics, self-declarations, and image | 0.79 | 0.79 | 0.72 |
Markson, 2017 [61] | Quaternary | CNN | Synonym expansion and topics | 0.76 | N/A | N/A |
Wang et al, 2016 [189] | Quaternary | CNN | Images | 0.84 | N/A | N/A |
Xu et al, 2016 [82] | Quaternary | SVM | Synonym expansion and topics | 0.76 | N/A | N/A |
Ardehaly and Culotta, 2015 [34] | Quaternary | Multinomial logistic regression | Census, name, network, and tweet language | 0.83 | N/A | N/A |
Ardehaly, 2014 [64] | Quaternary | LR | Census and image tweets | 0.82 | 0.81 | N/A |
Barbera, 2016 [37] | Quaternary | LR with ENi | Tweets, emojis, and network | 0.81 | N/A | N/A |
Wood-Doughty, 2020 [81] | Quaternary | CNN | Name, profile metadata, and text | 0.83 | 0.46 | N/A |
Preotiuc-Pietro and Ungar, 2018 [96] | Quaternary | LR with EN | Text, topics, sentiment, part-of-speech tagging, name, perceived race labels, and ensemble | N/A | N/A | 0.88 (African American), 0.78 (Latino), 0.83 (Asian), and 0.83 (White) |
Mueller et al, 2021 [91] | Quaternary | CNN | Text and accounts followed | N/A | 0.25 (Asian), 0.63 (African American or Black), 0.28 (Hispanic), and 0.90 (White) | N/A |
Bergsma et al, 2013 [38] | Multinomial (>4) | SVM | Name and name clusters | 0.81 | N/A | N/A |
Nguyen et al, 2018 [66] | Multinomial (>4) | Neural network | Images | 0.53 | N/A | N/A |
aML: machine learning.
bGBDT: gradient-boosted decision tree.
cN/A: not applicable.
dSVM: support vector machine.
eDLLP: deep learning from label proportions.
fLR: logistic regression.
gCNN: convolutional neural network.
hNR: not reported.
iEN: elastic net.