PeerJ Comput Sci. 2023 May 9;9:e1377. doi: 10.7717/peerj-cs.1377

Table 9. Benchmark of the different LLMs with the whole corpus and the two splits evaluated for sentiment towards the MET.

For each model and dataset, the weighted precision (W-P), weighted recall (W-R), weighted F1 (W-F1), and macro F1 (M-F1) are reported.

Dataset    Model        W-P      W-R      W-F1     M-F1
Tweet      BETO         0.8227   0.8266   0.8193   0.6284
           ALBETO       0.7813   0.8110   0.7950   0.5386
           DistilBETO   0.7652   0.7992   0.7818   0.5268
           MarIA        0.8200   0.8292   0.8238   0.6371
           BERTIN       0.7884   0.8120   0.7999   0.5409
Headlines  BETO         0.8336   0.8409   0.8341   0.6596
           ALBETO       0.8034   0.8031   0.8019   0.6629
           DistilBETO   0.7945   0.8018   0.7967   0.6214
           MarIA        0.8380   0.8370   0.8375   0.6949
           BERTIN       0.8006   0.8031   0.8018   0.6564
Total      BETO         0.8428   0.8422   0.8424   0.7259
           ALBETO       0.8428   0.8435   0.8428   0.7336
           DistilBETO   0.8068   0.8136   0.8089   0.6599
           MarIA        0.8580   0.8605   0.8588   0.7597
           BERTIN       0.8270   0.8201   0.8229   0.6743
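The gap between W-F1 and M-F1 in the table reflects class imbalance: weighted averaging scales each per-class F1 by its support, while macro averaging treats every class equally, so a class-imbalanced corpus with weaker minority-class performance shows a lower M-F1. A minimal pure-Python sketch of the two averaging schemes (the toy labels and class names here are illustrative, not from the paper's corpus):

```python
from collections import Counter

def f1_per_class(y_true, y_pred, labels):
    """Per-class F1 computed from one-vs-rest precision and recall."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

def macro_f1(y_true, y_pred, labels):
    # Unweighted mean over classes: each class counts the same.
    s = f1_per_class(y_true, y_pred, labels)
    return sum(s.values()) / len(labels)

def weighted_f1(y_true, y_pred, labels):
    # Mean over classes weighted by support (true count per class).
    s = f1_per_class(y_true, y_pred, labels)
    support = Counter(y_true)
    return sum(s[c] * support[c] for c in labels) / len(y_true)

# Illustrative imbalanced example: the majority class is predicted well,
# the two minority classes less so.
labels = ["positive", "negative", "neutral"]
y_true = ["positive"] * 6 + ["negative"] * 2 + ["neutral"] * 2
y_pred = ["positive"] * 6 + ["positive", "negative", "positive", "neutral"]

print(round(weighted_f1(y_true, y_pred, labels), 4))  # 0.781
print(round(macro_f1(y_true, y_pred, labels), 4))     # 0.7302
```

As in the benchmark, the weighted score exceeds the macro score whenever the high-support class is the easier one, which is why both are worth reporting.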