2022 Apr 29;24(4):e35788. doi: 10.2196/35788

Table 2.

Top system performance within studies using machine learning or natural language processing (result metrics are reflected here as reported in the original publications).

Study	Classifier	ML^a model	Features	Results reported
				Accuracy	F₁ score	Area under curve
Pennacchiotti and Popescu, 2011 [68]	Binary	GBDT^b	Images, text, topics, and sentiment	N/A^c	0.66	N/A
Pennacchiotti and Popescu, 2011 [67]	Binary	GBDT	Images, text, topics, sentiment, and network	N/A	0.70	N/A
Bergsma et al, 2013 [38]	Binary	SVM^d	Names and name clusters	0.85	N/A	N/A
Ardehaly and Culotta, 2017 [35]	Binary	DLLP^e	Text and images	N/A	0.95 (image); 0.92 (text)	N/A
Volkova and Bachrach, 2018 [76]	Binary	LR^f	Text, sentiment, and emotion	N/A	N/A	0.97
Wood-Doughty et al, 2018 [79]	Binary	CNN^g	Name	0.73	0.72	N/A
Saravanan, 2017 [72]	Ternary	CNN	Text	NR^h	NR	NR
Ardehaly and Culotta, 2017 [33]	Ternary	DLLP	Text and images	N/A	0.84 (image); 0.83 (text)	N/A
Gunarathne et al, 2019 [94]	Ternary	CNN	Text	N/A	0.88	N/A
Wood-Doughty et al, 2018 [79]	Ternary	CNN	Name	0.62	0.43	N/A
Culotta et al, 2016 [47]	Quaternary	Regression	Network and text	N/A	0.86	N/A
Chen et al, 2015 [46]	Quaternary	SVM	n-grams, topics, self-declarations, and image	0.79	0.79	0.72
Markson, 2017 [61]	Quaternary	CNN	Synonym expansion and topics	0.76	N/A	N/A
Wang et al, 2016 [189]	Quaternary	CNN	Images	0.84	N/A	N/A
Xu et al, 2016 [82]	Quaternary	SVM	Synonym expansion and topics	0.76	N/A	N/A
Ardehaly and Culotta, 2015 [34]	Quaternary	Multinomial logistic regression	Census, name, network, and tweet language	0.83	N/A	N/A
Ardehaly, 2014 [64]	Quaternary	LR	Census and image tweets	0.82	0.81	N/A
Barbera, 2016 [37]	Quaternary	LR with EN^i	Tweets, emojis, and network	0.81	N/A	N/A
Wood-Doughty et al, 2020 [81]	Quaternary	CNN	Name, profile metadata, and text	0.83	0.46	N/A
Preotiuc-Pietro and Ungar, 2018 [96]	Quaternary	LR with EN	Text, topics, sentiment, part-of-speech tagging, name, perceived race labels, and ensemble	N/A	N/A	0.88 (African American), 0.78 (Latino), 0.83 (Asian), and 0.83 (White)
Mueller et al, 2021 [91]	Quaternary	CNN	Text and accounts followed	N/A	0.25 (Asian), 0.63 (African American or Black), 0.28 (Hispanic), and 0.90 (White)	N/A
Bergsma et al, 2013 [38]	Multinomial (>4)	SVM	Name and name clusters	0.81	N/A	N/A
Nguyen et al, 2018 [66]	Multinomial (>4)	Neural network	Images	0.53	N/A	N/A

^aML: machine learning.

^bGBDT: gradient-boosted decision tree.

^cN/A: not applicable.

^dSVM: support vector machine.

^eDLLP: deep learning from label proportions.

^fLR: logistic regression.

^gCNN: convolutional neural network.

^hNR: not reported.

^iEN: elastic net.
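
Several rows in the table pair a fairly high accuracy with a much lower F₁ score. One common way this can happen is class imbalance combined with a macro-averaged F₁. The table does not state how each study averaged F₁, so macro-averaging is an assumption made here for illustration. The Python sketch below uses hypothetical toy labels, not data from any cited study, and the function names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for each label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical, imbalanced four-class toy data (assumption, not study data).
y_true = ["A"] * 80 + ["B"] * 10 + ["C"] * 5 + ["D"] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = ["A"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case the always-majority classifier scores well on accuracy while its macro F₁ is low, because three of the four classes are never predicted. This is why accuracy and F₁ in the table are best read together rather than compared across columns.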