. 2024 Apr 4;56(4):2782–2803. doi: 10.3758/s13428-024-02381-9

Table 4. Best-performing methods within each text mining method family

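| Dataset | Family | Best method | F-score | Acc. | Subset acc. |
|---|---|---|---|---|---|
| Feedback | RMD | Dict. 1 | .329 | .800 | .152 |
| Feedback | CMD | Dict. 2 | .537 | .806 | .260 |
| Feedback | SML | RoBERTa | .779 | .919 | .562 |
| Feedback | Zero-shot | GPT-4 | .499 | .666 | .032 |
| Election | RMD | Dict. 1 | .492 | .875 | .312 |
| Election | CMD | Dict. 3 | .477 | .879 | .298 |
| Election | SML | RoBERTa | .696 | .939 | .550 |
| Election | Zero-shot | GPT-4 | .409 | .836 | .227 |
| Reddit | RMD | Dict. 1 | .369 | .734 | .195 |
| Reddit | CMD | Dict. 3 | .501 | .837 | .376 |
| Reddit | SML | RoBERTa | .690 | .927 | .626 |
| Reddit | Zero-shot | GPT-3.5 | .492 | .861 | .386 |
| Hate speech^a | RMD | Dict. 1 | .611 | .611 | NA |
| Hate speech^a | CMD | Dict. 3 | .845 | .845 | NA |
| Hate speech^a | SML | RoBERTa | .907 | .907 | NA |
| Hate speech^a | Zero-shot | GPT-4 | .897 | .897 | NA |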

All scores are computed on the same holdout dataset for every method.

^a In the Hate speech dataset, categories cannot co-occur within a single text. The micro-averaged F-score and accuracy therefore take the same values, and subset accuracy is not calculated (NA).
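The table reports three metrics without defining them here. The following is a minimal Python sketch under stated assumptions, not the authors' evaluation code: each text carries a set of category labels, "F-score" is the micro-averaged F-score, "Acc." is label-wise accuracy (the share of correct per-category yes/no decisions) for the multi-label datasets, and "Subset acc." is the exact-match ratio. The category names and example data are hypothetical. The second example illustrates the footnote: when every text has exactly one true and one predicted category, exact-match accuracy reduces to ordinary accuracy, and the micro-averaged F-score equals it.

```python
"""Sketch of the metrics in Table 4, using assumed (not confirmed) definitions."""
from typing import Hashable, Sequence, Set

LabelSet = Set[Hashable]


def micro_f1(y_true: Sequence[LabelSet], y_pred: Sequence[LabelSet],
             categories: Sequence[Hashable]) -> float:
    """Micro-averaged F-score: pool TP, FP and FN over all texts and categories."""
    tp = fp = fn = 0
    for true, pred in zip(y_true, y_pred):
        for c in categories:
            if c in true and c in pred:
                tp += 1
            elif c in pred:
                fp += 1
            elif c in true:
                fn += 1
    denom = 2 * tp + fp + fn
    # F = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN).
    return 2 * tp / denom if denom else 0.0


def label_accuracy(y_true: Sequence[LabelSet], y_pred: Sequence[LabelSet],
                   categories: Sequence[Hashable]) -> float:
    """Share of all (text, category) yes/no decisions that are correct."""
    correct = sum((c in true) == (c in pred)
                  for true, pred in zip(y_true, y_pred)
                  for c in categories)
    return correct / (len(y_true) * len(categories))


def subset_accuracy(y_true: Sequence[LabelSet],
                    y_pred: Sequence[LabelSet]) -> float:
    """Share of texts whose predicted label set matches the true set exactly."""
    return sum(set(t) == set(p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Multi-label case with hypothetical categories: labels may co-occur.
    cats = ["praise", "criticism", "suggestion"]
    gold = [{"praise"}, {"criticism", "suggestion"}, set(), {"praise", "suggestion"}]
    pred = [{"praise"}, {"criticism"}, {"suggestion"}, {"praise", "suggestion"}]
    print(micro_f1(gold, pred, cats))        # 0.8
    print(label_accuracy(gold, pred, cats))  # 0.8333333333333334
    print(subset_accuracy(gold, pred))       # 0.5

    # Single-label case: exactly one true and one predicted category per text.
    gold1 = [{"a"}, {"b"}, {"c"}, {"a"}]
    pred1 = [{"a"}, {"c"}, {"c"}, {"b"}]
    print(micro_f1(gold1, pred1, ["a", "b", "c"]), subset_accuracy(gold1, pred1))  # 0.5 0.5
```

In the single-label case, each wrong prediction adds exactly one false positive (the predicted category) and one false negative (the true category). Micro-precision and micro-recall both reduce to the share of correctly classified texts, which is why the F-score and accuracy columns repeat the same value for the Hate speech data. Note that label-wise accuracy would not show this equality, so under these assumptions "Acc." for that dataset must be ordinary accuracy.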