Table 4. Best-performing methods within each text mining method family
| | Feedback data | | | | Election data | | | |
|---|---|---|---|---|---|---|---|---|
| Family | Method | F-score | Acc. | Subset acc. | Method | F-score | Acc. | Subset acc. |
| RMD | Dict. 1 | .329 | .800 | .152 | Dict. 1 | .492 | .875 | .312 |
| CMD | Dict. 2 | .537 | .806 | .260 | Dict. 3 | .477 | .879 | .298 |
| SML | RoBERTa | .779 | .919 | .562 | RoBERTa | .696 | .939 | .550 |
| Zero-shot | GPT-4 | .499 | .666 | .032 | GPT-4 | .409 | .836 | .227 |
| | Reddit data | | | | Hate speech data<sup>a</sup> | | | |
| Family | Method | F-score | Acc. | Subset acc. | Method | F-score | Acc. | Subset acc. |
| RMD | Dict. 1 | .369 | .734 | .195 | Dict. 1 | .611 | .611 | NA |
| CMD | Dict. 3 | .501 | .837 | .376 | Dict. 3 | .845 | .845 | NA |
| SML | RoBERTa | .690 | .927 | .626 | RoBERTa | .907 | .907 | NA |
| Zero-shot | GPT-3.5 | .492 | .861 | .386 | GPT-4 | .897 | .897 | NA |
All performance scores are reported on the holdout dataset, which is the same for every method.
<sup>a</sup> In the Hate speech dataset, different categories cannot co-occur in a single text. The micro-averaged F-score and accuracy therefore take identical values, and subset accuracy is not calculated (NA).
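The footnote relies on how the three reported metrics relate to one another. The following is a minimal sketch under stated assumptions: the F-score is taken to be micro-averaged, "Acc." on the multi-label datasets is assumed to be label-wise (Hamming-style) accuracy, and "Subset acc." is assumed to be the exact-match ratio. These definitions are assumptions, not stated in the table, and the toy label vectors are invented for illustration, not drawn from any of the four datasets. With multi-label data the three metrics diverge; once every text carries exactly one category, micro-averaged F-score equals ordinary accuracy, and exact-match accuracy would simply repeat that value.

```python
from typing import Sequence

# Assumed representation: each text's labels are a 0/1 indicator vector
# over the same K categories.
Matrix = Sequence[Sequence[int]]


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Micro-averaged F-score: pool TP, FP and FN over all texts and categories."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def labelwise_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Share of individual (text, category) decisions that are correct (assumed 'Acc.')."""
    pairs = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(t == p for t, p in pairs) / len(pairs)


def subset_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Share of texts whose entire label vector is predicted exactly."""
    return sum(list(tr) == list(pr) for tr, pr in zip(y_true, y_pred)) / len(y_true)


def single_label_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Accuracy when every text has exactly one category (one-hot rows)."""
    return sum(
        list(tr).index(1) == list(pr).index(1) for tr, pr in zip(y_true, y_pred)
    ) / len(y_true)


# Hypothetical multi-label example: categories may co-occur (row 3 has two).
ml_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
ml_pred = [[1, 0, 0], [0, 0, 0], [1, 1, 0], [0, 1, 1]]

# Hypothetical single-label example: exactly one category per text.
sl_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
sl_pred = [[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0]]

print(round(micro_f1(ml_true, ml_pred), 3))              # 0.8
print(round(labelwise_accuracy(ml_true, ml_pred), 3))    # 0.833
print(round(subset_accuracy(ml_true, ml_pred), 3))       # 0.5
print(round(micro_f1(sl_true, sl_pred), 3))              # 0.6
print(round(single_label_accuracy(sl_true, sl_pred), 3)) # 0.6
```

In the single-label case each misclassified text contributes exactly one false positive and one false negative, so pooled precision and recall both equal the share of correctly classified texts. This is why the F-score and accuracy columns coincide for the Hate speech data.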