2024 Mar 15;26:e47923. doi: 10.2196/47923

Table 2.

Best reported system performance for studies predicting the gender of Twitter users using traditional machine learning (ML) methods. Metrics are shown as reported in the original publications and are not necessarily comparable with one another.

| Study | Language | ML method | Reported F1-score | Reported accuracy |
|---|---|---|---|---|
| Cesare et al [122], 2017 | English | Ensemble: lexical match and SVM^a and DT^b | 0.84 | 0.83 |
| Jurgens et al [80], 2017 | English | RF^c ensemble | 0.78 | 0.80 |
| Ljubešić et al [85], 2017 | Portuguese, French, Dutch, Spanish, German, and Italian | SVM | N/A^d | 0.61-0.69 |
| Markov et al [87], 2016 | English, Spanish, Dutch, and Italian | LogR^e | N/A | 0.57-0.77 |
| Mukherjee and Bala [92], 2016 | English | NB^f | 0.75 | 0.71 |
| Verhoeven et al [106], 2017 | Slovenian | SVM | 0.93 | 0.93 |
| Volkova [110], 2015 | English and Spanish | LogR | N/A | 0.82 |
| Xiang et al [116], 2017 | English | SVM and PME^g | N/A | 0.76 |
| Cheng et al [65], 2018 | English, Filipino, and Taglish | SVC^h with lasso | 0.84 | 0.84 |
| Emmery et al [69], 2017 | English | fastText | N/A | 0.76 |
| Giannakopoulos et al [72], 2018 | N/A | SVM and PNN^i | N/A | 0.87 |
| Khandelwal et al [82], 2018 | Code-mixed Hindi-English | SVM | N/A | 0.9 |
| Miura et al [89], 2018 | Japanese | XGBoost^j | N/A | 0.89 |
| van der Goot et al [104], 2018 | English, Dutch, French, Portuguese, and Spanish | SVM | N/A | 0.66-0.72 |
| Alessandra et al [49], 2019 | Italian | Ensemble: lexical match and SVM | N/A | 0.75 |
| Hirt et al [76], 2019 | German | Ensemble: binary classifiers | 0.81 | N/A |
| Hussein et al [79], 2019 | Egyptian Arabic dialect | Ensemble: RF and LinR^k | N/A | 0.77-0.88 |
| Vicente et al [107], 2018 | English and Portuguese | Ensemble: Face++, LinR, and SVM | N/A | 0.93-0.97 |
| Arafat et al [51], 2020 | Indonesian | Multinomial NB | N/A | 0.75 |
| Baxevanakis et al [54], 2020 | Greek | SVM | N/A | 0.7 |
| Garcia-Guzman et al [70], 2020 | English | Bag of trees | 0.64 | 0.64 |
| López-Monroy et al [86], 2020 | English and Spanish | Bag of trees | 0.64 | 0.64 |
| Pizarro [96], 2020 | English and Spanish | SVM | 0.82-0.84 | N/A |
| Vashisth and Meehan [105], 2020 | English | LogR | N/A | 0.57 |
| Wong et al [113], 2020 | English | SVM | 0.58-0.62 | 0.60 |

^a SVM: support vector machine.
^b DT: decision tree.
^c RF: random forest.
^d N/A: not applicable.
^e LogR: logistic regression.
^f NB: naive Bayes.
^g PME: projection matrix extraction.
^h SVC: support vector classifier.
^i PNN: probabilistic neural network.
^j XGBoost: extreme gradient boosting.
^k LinR: linear regression.
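Table 2 reports F1-score for some studies and accuracy for others, and the caption cautions that these values are not directly comparable. The short sketch below illustrates why, under stated assumptions: it uses a binary female/male label set and made-up predictions for 10 users (`y_true` and `y_pred` are hypothetical, not data from any listed study). The table does not say which F1 variant each study used (per-class, macro-averaged, or weighted), so the sketch computes per-class F1 and an unweighted macro average alongside accuracy.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of users whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Binary F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical reference labels and predictions for 10 users (not study data).
y_true = ["F"] * 6 + ["M"] * 4
y_pred = ["F"] * 8 + ["M"] * 2

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"F1 (F)   = {f1_for_label(y_true, y_pred, 'F'):.2f}")
print(f"F1 (M)   = {f1_for_label(y_true, y_pred, 'M'):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

On these hypothetical predictions the script prints an accuracy of 0.80, an F1 of 0.86 for "F", 0.67 for "M", and a macro F1 of 0.76. The same classifier output can therefore yield noticeably different headline numbers depending on the metric and averaging scheme, which is one reason the values in the table should not be ranked against one another directly.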