Table 17. Benchmark of the different LLMs on the whole corpus and on each of the two splits, evaluated for multi-label sentiment classification. W-P, W-R, and W-F1 denote weighted precision, recall, and F1; M-F1 denotes macro F1.
| Dataset | Model | W-P | W-R | W-F1 | M-F1 |
|---|---|---|---|---|---|
| Tweets | BETO | 0.7476 | 0.7362 | 0.7314 | 0.6397 |
| | ALBETO | 0.7170 | 0.6997 | 0.6928 | 0.5664 |
| | DistilBETO | 0.7060 | 0.7058 | 0.6971 | 0.5729 |
| | MarIA | 0.7669 | 0.7445 | 0.7395 | 0.6398 |
| | BERTIN | 0.7421 | 0.7171 | 0.7102 | 0.6079 |
| Headlines | BETO | 0.7400 | 0.7253 | 0.7284 | 0.6515 |
| | ALBETO | 0.7137 | 0.7006 | 0.7021 | 0.6133 |
| | DistilBETO | 0.6973 | 0.6710 | 0.6808 | 0.5892 |
| | MarIA | 0.7439 | 0.7266 | 0.7301 | 0.6542 |
| | BERTIN | 0.7301 | 0.7188 | 0.7165 | 0.6171 |
| Total | BETO | 0.7734 | 0.7640 | 0.7680 | 0.7113 |
| | ALBETO | 0.7399 | 0.7332 | 0.7324 | 0.6543 |
| | DistilBETO | 0.7468 | 0.7279 | 0.7338 | 0.6561 |
| | MarIA | 0.7827 | 0.7597 | 0.7678 | 0.7017 |
| | BERTIN | 0.7600 | 0.7553 | 0.7516 | 0.6679 |