PeerJ Comput. Sci. 2023 May 9;9:e1377. doi: 10.7717/peerj-cs.1377

Table 14. Benchmark of the different LLMs with the whole corpus and the two splits evaluated for sentiment towards other companies.

For each model and dataset, the weighted precision (W-P), weighted recall (W-R), weighted F1 (W-F1) and macro F1 (M-F1) scores are reported.

Dataset    Model       W-P     W-R     W-F1    M-F1
Tweets     BETO        0.7161  0.7288  0.7031  0.6056
Tweets     ALBETO      0.6532  0.6806  0.6600  0.5448
Tweets     DistilBETO  0.6765  0.7001  0.6757  0.5730
Tweets     MarIA       0.7088  0.7158  0.7049  0.6144
Tweets     BERTIN      0.6184  0.6441  0.6271  0.5055
Headlines  BETO        0.6832  0.6728  0.6770  0.6035
Headlines  ALBETO      0.6541  0.6375  0.6438  0.5657
Headlines  DistilBETO  0.6453  0.6375  0.6409  0.5491
Headlines  MarIA       0.6858  0.6780  0.6809  0.6054
Headlines  BERTIN      0.6569  0.5698  0.5924  0.5109
Total      BETO        0.7384  0.7445  0.7400  0.6711
Total      ALBETO      0.7251  0.7327  0.7280  0.6503
Total      DistilBETO  0.7155  0.7223  0.7177  0.6352
Total      MarIA       0.7373  0.7445  0.7382  0.6655
Total      BERTIN      0.7020  0.6741  0.6816  0.6028