Table 2.
Top reported system performance for studies predicting the gender of Twitter users using traditional machine learning (ML) methods. Metrics are shown as reported in the original publications and are not necessarily comparable across studies.
| Study | Language | ML method | Reported performance: F1-score | Reported performance: accuracy |
| --- | --- | --- | --- | --- |
| Cesare et al [122], 2017 | English | Ensemble: lexical match and SVM<sup>a</sup> and DT<sup>b</sup> | 0.84 | 0.83 |
| Jurgens et al [80], 2017 | English | RF<sup>c</sup> ensemble | 0.78 | 0.80 |
| Ljubešić et al [85], 2017 | Portuguese, French, Dutch, Spanish, German, and Italian | SVM | N/A<sup>d</sup> | 0.61-0.69 |
| Markov et al [87], 2016 | English, Spanish, Dutch, and Italian | LogR<sup>e</sup> | N/A | 0.57-0.77 |
| Mukherjee and Bala [92], 2016 | English | NB<sup>f</sup> | 0.75 | 0.71 |
| Verhoeven et al [106], 2017 | Slovenian | SVM | 0.93 | 0.93 |
| Volkova [110], 2015 | English and Spanish | LogR | N/A | 0.82 |
| Xiang et al [116], 2017 | English | SVM and PME<sup>g</sup> | N/A | 0.76 |
| Cheng et al [65], 2018 | English, Filipino, and Taglish | SVC<sup>h</sup> with lasso | 0.84 | 0.84 |
| Emmery et al [69], 2017 | English | fastText | N/A | 0.76 |
| Giannakopoulos et al [72], 2018 | N/A | SVM PNN<sup>i</sup> | N/A | 0.87 |
| Khandelwal et al [82], 2018 | Code-mixed Hindi-English | SVM | N/A | 0.9 |
| Miura et al [89], 2018 | Japanese | XGBoost<sup>j</sup> | N/A | 0.89 |
| van der Goot et al [104], 2018 | English, Dutch, French, Portuguese, and Spanish | SVM | N/A | 0.66-0.72 |
| Alessandra et al [49], 2019 | Italian | Ensemble: lexical match and SVM | N/A | 0.75 |
| Hirt et al [76], 2019 | German | Ensemble: binary classifiers | 0.81 | N/A |
| Hussein et al [79], 2019 | Dialect Egyptian Arabic | Ensemble: RF and LinR<sup>k</sup> | N/A | 0.77-0.88 |
| Vicente et al [107], 2018 | English and Portuguese | Ensemble: Face++, LinR, and SVM | N/A | 0.93-0.97 |
| Arafat et al [51], 2020 | Indonesian | Multinomial NB | N/A | 0.75 |
| Baxevanakis et al [54], 2020 | Greek | SVM | N/A | 0.7 |
| Garcia-Guzman et al [70], 2020 | English | Bag of trees | 0.64 | 0.64 |
| López-Monroy et al [86], 2020 | English and Spanish | Bag of trees | 0.64 | 0.64 |
| Pizarro [96], 2020 | English and Spanish | SVM | 0.82-0.84 | N/A |
| Vashisth and Meehan [105], 2020 | English | LogR | N/A | 0.57 |
| Wong et al [113], 2020 | English | SVM | 0.58-0.62 | 0.60 |
<sup>a</sup>SVM: support vector machine.
<sup>b</sup>DT: decision tree.
<sup>c</sup>RF: random forest.
<sup>d</sup>N/A: not applicable.
<sup>e</sup>LogR: logistic regression.
<sup>f</sup>NB: naive Bayes.
<sup>g</sup>PME: projection matrix extraction.
<sup>h</sup>SVC: support vector classifier.
<sup>i</sup>PNN: probabilistic neural network.
<sup>j</sup>XGBoost: extreme gradient boosting.
<sup>k</sup>LinR: linear regression.
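
The table caption notes that the reported metrics are not necessarily comparable. One reason is that accuracy and F1-score can diverge when the classes are imbalanced. The following minimal sketch illustrates this with two hypothetical confusion matrices; the counts are invented for illustration and do not come from any study in the table, and the function names are assumptions of this example.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn, tn=0):
    """Harmonic mean of precision and recall for the positive class.

    tn is accepted only so both functions share a signature; F1 ignores it.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (not taken from any cited study).
balanced = dict(tp=40, fp=10, fn=10, tn=40)  # 50 positive, 50 negative users
skewed = dict(tp=5, fp=5, fn=15, tn=75)      # 20 positive, 80 negative users

for name, counts in [("balanced", balanced), ("skewed", skewed)]:
    print(f"{name}: accuracy={accuracy(**counts):.2f}, F1={f1_score(**counts):.2f}")
```

In this sketch both data sets yield an accuracy of 0.80, but the F1-score is 0.80 for the balanced set and about 0.33 for the skewed one. A study that reports only accuracy and a study that reports only F1-score therefore cannot be ranked against each other without knowing the class distribution of the underlying data set.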