2021 Nov 11;11:22083. doi: 10.1038/s41598-021-01487-w

Table 3.

Performance of our hate speech classification model on the training set (cross-validation results) and the out-of-sample evaluation set, compared with the inter-annotator agreement on the same datasets. Overall performance is measured by Krippendorff’s Alpha and accuracy (Acc); performance on individual classes is measured by F1. The model's performance is comparable to the annotator agreement, except for the Violent class, where the model's F1 is lower.

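| Performance and agreement | Overall Alpha | Overall Acc | Acceptable F1 | Inappropriate F1 | Offensive F1 | Violent F1 |
|---|---|---|---|---|---|---|
| Model, Training | 0.59 | 0.79 | 0.87 | 0.54 | 0.64 | 0.52 |
| Model, Evaluation | 0.55 | 0.84 | 0.91 | 0.59 | 0.58 | 0.39 |
| Inter-annotator, Training | 0.59 | 0.77 | 0.86 | 0.52 | 0.63 | 0.63 |
| Inter-annotator, Evaluation | 0.56 | 0.82 | 0.90 | 0.53 | 0.57 | 0.55 |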
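
To make the metrics in Table 3 concrete, the following minimal Python sketch shows how accuracy and per-class F1 can be computed from two label sequences, and then reads the model-versus-annotator gaps directly off the evaluation rows of the table. The function names and the toy labels are hypothetical, introduced only for illustration; this is not the authors' evaluation code, and Krippendorff’s Alpha is deliberately left out because the variant used is not specified in this section.

```python
CLASSES = ["Acceptable", "Inappropriate", "Offensive", "Violent"]


def accuracy(reference, predicted):
    """Fraction of items on which the two label sequences agree."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def f1_per_class(reference, predicted, classes=CLASSES):
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 if the class never appears."""
    scores = {}
    for c in classes:
        pairs = list(zip(reference, predicted))
        tp = sum(r == c and p == c for r, p in pairs)
        fp = sum(r != c and p == c for r, p in pairs)
        fn = sum(r == c and p != c for r, p in pairs)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy labels, invented for illustration (not data from the paper).
reference = ["Acceptable", "Acceptable", "Offensive",
             "Violent", "Inappropriate", "Offensive"]
predicted = ["Acceptable", "Offensive", "Offensive",
             "Offensive", "Inappropriate", "Acceptable"]

toy_accuracy = accuracy(reference, predicted)
toy_f1 = f1_per_class(reference, predicted)

# Evaluation-set rows copied from Table 3: Alpha, Acc, then F1 per class.
COLUMNS = ["Alpha", "Acc"] + ["F1 " + c for c in CLASSES]
model_eval = [0.55, 0.84, 0.91, 0.59, 0.58, 0.39]
annotator_eval = [0.56, 0.82, 0.90, 0.53, 0.57, 0.55]

# Positive gap: model score above annotator agreement; negative: below.
gaps = {name: round(m - a, 2)
        for name, m, a in zip(COLUMNS, model_eval, annotator_eval)}

if __name__ == "__main__":
    print("toy accuracy:", toy_accuracy)
    print("toy per-class F1:", toy_f1)
    for name, gap in gaps.items():
        print(f"{name}: {gap:+.2f}")
```

On the evaluation set, every gap is within 0.06 of zero except F1 for the Violent class, where the model is 0.16 below the annotator agreement; this is the exception the caption points out. The same two functions could be applied to a pair of annotators' labels by treating one annotator as the reference, though whether the agreement figures in the table were obtained that way is an assumption.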