2024 Mar 15;26:e47923. doi: 10.2196/47923

Table 4.

Top reported system performance for studies predicting the age of Twitter users using traditional machine learning (ML) methods. Result metrics are shown as reported in the original publications and are not directly comparable with each other. Studies are ordered by the number of classification groups.

| Study | Number of age groups | Age class detail (y) | Language | ML method | Reported F1-score | Reported accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| Jurgens et al [80], 2017 | N/A^a | Continuous | English | RF^b regression | N/A | 0.71 |
| Volkova [110], 2017 | 2 | 18-23 and 25-30 | English and Spanish | LogR^c | N/A | 0.77 |
| Xiang et al [116], 2017 | 2 | ≤30 and >30 | English | CPME^d | N/A | 0.74 |
| Ardehaly and Culotta [53], 2018 | 2 | <25 and >25 | English | LLP^e | N/A | 0.78 |
| Morgan-Lopez et al [90], 2017 | 3 | 13-17, 18-24, and >24 | English | LogR | 0.74 | N/A |
| Arafat et al [51], 2020 | 3 | ≤24, 25-39, and ≥40 | NR^f | LogR | N/A | 0.71 |
| Cornelisse and Pillai [66], 2020 | 3 | 18-24, 25-54, and >55 | English | LogR | 0.78 | N/A |
| Markov et al [87], 2017 | 5 | 18-24, 25-34, 35-49, 50-64, and >65 | English, Spanish, Dutch, and Italian | LogR | N/A | 0.56-0.65 |
| Cheng et al [65], 2018 | 5 | 18-24, 25-34, 35-44, 45-54, and 55-64 | English, Filipino, and Taglish | SVC^g | 0.61 | 0.86 |
| Garcia-Guzman et al [70], 2020 | 4 | 18-24, 25-34, 35-49, and >50 | English | Bag of trees | N/A | 0.67 |
| Chamberlain et al [64], 2017 | 10 (3 subgroups) | <12, 12-13, 14-15, 16-17, 18-24, 25-34, 35-44, 45-54, 55-64, and >64 | English, Spanish, French, and Portuguese | Bayesian probability | 0.31-0.86 (3 class) | N/A |

^a N/A: not applicable.

^b RF: random forest.

^c LogR: logistic regression.

^d CPME: coupled projection matrix extraction.

^e LLP: learning with label proportions.

^f NR: not reported.

^g SVC: support vector classifier.
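
The caption notes that the reported metrics are not directly comparable. One reason is that accuracy and F1-score can diverge sharply on the same predictions when age classes are imbalanced. The following is a minimal sketch of that effect. The labels are hypothetical toy data, not drawn from any study in the table, and the macro-averaged F1 used here is an assumption, since the studies do not all state which averaging they used.

```python
# Toy illustration: accuracy vs. macro-averaged F1 on imbalanced age classes.
# All labels below are hypothetical and chosen only to show the divergence.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical sample: 8 users aged 18-24 and 2 users aged >24.
y_true = ["18-24"] * 8 + [">24"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["18-24"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro F1 = {f1:.2f}")
```

In this toy case the majority-class predictor scores well on accuracy but poorly on macro F1, which is why an accuracy figure from one study and an F1-score from another should not be read as measures on the same scale.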