2023 May 9;9:e1377. doi: 10.7717/peerj-cs.1377

Table 9. Benchmark of the different LLMs with the whole corpus and the two splits evaluated for sentiment towards the MET.

For each model and dataset, the weighted precision (W-P), weighted recall (W-R), weighted F1-score (W-F1), and macro F1-score (M-F1) are reported.

Dataset	Model	W-P	W-R	W-F1	M-F1
Tweet	BETO	0.8227	0.8266	0.8193	0.6284
	ALBETO	0.7813	0.8110	0.7950	0.5386
	DistilBETO	0.7652	0.7992	0.7818	0.5268
	MarIA	0.8200	0.8292	0.8238	0.6371
	BERTIN	0.7884	0.8120	0.7999	0.5409
Headlines	BETO	0.8336	0.8409	0.8341	0.6596
	ALBETO	0.8034	0.8031	0.8019	0.6629
	DistilBETO	0.7945	0.8018	0.7967	0.6214
	MarIA	0.8380	0.8370	0.8375	0.6949
	BERTIN	0.8006	0.8031	0.8018	0.6564
Total	BETO	0.8428	0.8422	0.8424	0.7259
	ALBETO	0.8428	0.8435	0.8428	0.7336
	DistilBETO	0.8068	0.8136	0.8089	0.6599
	MarIA	0.8580	0.8605	0.8588	0.7597
	BERTIN	0.8270	0.8201	0.8229	0.6743
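
The gap between W-F1 and M-F1 in the table follows from how the two averages are defined. The sketch below is a minimal illustration, assuming the standard definitions: per-class scores averaged with class-frequency weights for the weighted metrics, and an unweighted mean for the macro score. The function names, the label names, and the toy data are hypothetical and are not taken from the paper.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, tp + fn)  # support = gold count
    return scores


def averaged_metrics(y_true, y_pred):
    """Compute W-P, W-R, W-F1 (support-weighted) and M-F1 (unweighted mean)."""
    scores = per_class_scores(y_true, y_pred)
    total = sum(support for _, _, _, support in scores.values())
    return {
        "W-P": sum(p * s for p, _, _, s in scores.values()) / total,
        "W-R": sum(r * s for _, r, _, s in scores.values()) / total,
        "W-F1": sum(f * s for _, _, f, s in scores.values()) / total,
        "M-F1": sum(f for _, _, f, _ in scores.values()) / len(scores),
    }


# Hypothetical, imbalanced toy data: the single "positive" example is missed.
y_true = ["neutral"] * 8 + ["positive"] + ["negative"]
y_pred = ["neutral"] * 8 + ["neutral"] + ["negative"]

metrics = averaged_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

On this toy data the missed minority class contributes an F1 of zero. That lowers M-F1 by a full third of its range, while W-F1 drops only in proportion to the class's small support. The same mechanism would explain why the M-F1 column sits well below W-F1 for every model in the table.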