Table 3.
Top reported system performance for studies predicting the gender of Twitter users using deep learning-based machine learning (ML) methods. Metrics are shown as reported in the original publications and are not comparable across studies.
| Study | Language | ML method | Reported F1-score | Reported accuracy |
| --- | --- | --- | --- | --- |
| Ardehaly and Culotta [52], 2017 | English | Deep LLPa | 0.96 | N/Ab |
| Geng et al [71], 2017 | English | Ensemble: LDAc and CNNd | N/A | 0.87 |
| Kim et al [83], 2017 | English | GRNNe | N/A | 0.68 |
| Vijayaraghavan et al [108], 2017 | English | DMTf | 0.89 | N/A |
| Wang et al [111], 2017 | N/A | CNN | 0.91 | 0.9 |
| Bayot and Goncalves [55], 2017 | English and Spanish | CNN | N/A | 0.59-0.72 |
| Bsir and Zrigui [57], 2018 | Arabic | GRUg | N/A | 0.79 |
| Wood-Doughty et al [115], 2018 | English | RNNh | 0.84 | 0.84 |
| Bsir and Zrigui [58], 2019 | Arabic | BILSTMi with attention | N/A | 0.82 |
| Hashempour [75], 2019 | Portuguese, French, Dutch, Spanish, German, and Italian | FFNNj | N/A | 0.84-0.86 |
| Wang et al [112], 2019 | Multilingual | mmDNNk | 0.92 | N/A |
| ElSayed and Farouk [68], 2020 | Egyptian and Arabic dialects | Multichannel CNN-biGRUl | N/A | 0.84-0.91 |
| Imuede et al [93], 2020 | English | DNNm | N/A | 0.68 |
| Zhao et al [121], 2020 | English | CNN | 0.80 | N/A |
| Yang et al [117], 2021 | English | Ensemble: M3n and SVMo | 0.95 | 0.94 |
aLLP: learning with label proportions.
bN/A: not applicable.
cLDA: latent Dirichlet allocation.
dCNN: convolutional neural network.
eGRNN: graph recurrent neural network.
fDMT: deep multimodal multitask.
gGRU: gated recurrent unit.
hRNN: recurrent neural network.
iBILSTM: bidirectional long short-term memory.
jFFNN: feedforward neural network.
kmmDNN: multimodal deep neural network.
lbiGRU: bidirectional gated recurrent unit.
mDNN: deep neural network.
nM3: multimodal, multilingual, and multi-attribute system.
oSVM: support vector machine.