PeerJ Comput Sci. 2023 May 9;9:e1377. doi: 10.7717/peerj-cs.1377

Table 17. Benchmark of the different LLMs evaluated for multi-label sentiment classification on the whole corpus and on the two splits.

For each model and dataset, weighted precision (W-P), weighted recall (W-R), weighted F1-score (W-F1), and macro F1-score (M-F1) are reported.

Dataset    Model        W-P      W-R      W-F1     M-F1
Tweets     BETO         0.7476   0.7362   0.7314   0.6397
Tweets     ALBETO       0.7170   0.6997   0.6928   0.5664
Tweets     DistilBETO   0.7060   0.7058   0.6971   0.5729
Tweets     MarIA        0.7669   0.7445   0.7395   0.6398
Tweets     BERTIN       0.7421   0.7171   0.7102   0.6079
Headlines  BETO         0.7400   0.7253   0.7284   0.6515
Headlines  ALBETO       0.7137   0.7006   0.7021   0.6133
Headlines  DistilBETO   0.6973   0.6710   0.6808   0.5892
Headlines  MarIA        0.7439   0.7266   0.7301   0.6542
Headlines  BERTIN       0.7301   0.7188   0.7165   0.6171
Total      BETO         0.7734   0.7640   0.7680   0.7113
Total      ALBETO       0.7399   0.7332   0.7324   0.6543
Total      DistilBETO   0.7468   0.7279   0.7338   0.6561
Total      MarIA        0.7827   0.7597   0.7678   0.7017
Total      BERTIN       0.7600   0.7553   0.7516   0.6679
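For reference, the four reported metrics are standard and can be computed with scikit-learn. The snippet below is a minimal sketch, not the paper's evaluation code: the indicator matrices y_true and y_pred are hypothetical placeholders standing in for the gold and predicted sentiment labels of a multi-label task.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, f1_score

# Hypothetical multi-label indicator matrices: one row per text,
# one column per sentiment class; a 1 means the label applies.
y_true = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 1], [1, 1, 0]])

# Weighted metrics (W-P, W-R, W-F1): per-class scores averaged by
# class support, so frequent classes dominate the average.
w_p, w_r, w_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

# Macro F1 (M-F1): the unweighted mean of per-class F1 scores, so
# minority classes count as much as majority ones.
m_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(f"W-P={w_p:.4f}  W-R={w_r:.4f}  W-F1={w_f1:.4f}  M-F1={m_f1:.4f}")
```

The gap between W-F1 and M-F1 in the table (e.g., 0.7314 vs. 0.6397 for BETO on Tweets) is typical of class-imbalanced corpora: weighted averaging favors the majority classes, while macro averaging exposes weaker performance on rare labels.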