. 2024 Jan 11;2023:1125–1134.

Table 1:

Zero-shot evaluation: precision, recall, and F1-score for different sentiment analysis models under three types of averaging (Macro, M; Micro, µ; and Mean, m). The best-performing model in each scenario is highlighted in bold. Results are statistically significant at p < 0.05.

Model	precision_M	recall_M	f1_M	precision_µ	recall_µ	f1_µ	precision_m	recall_m	f1_m
RoBERTa	0.7539	0.3683	0.3567	0.6604	0.6604	0.6604	0.6290	0.6604	0.6443
DistilBERT	0.1713	0.4436	0.2240	0.1927	0.1927	0.1927	0.0537	0.1927	0.0781
BLOOM	0.1092	0.4055	0.1702	0.1718	0.1718	0.1718	0.0415	0.1718	0.0665
DistilBERT_n	0.4647	0.3869	0.2750	0.2864	0.2864	0.2864	0.5986	0.2864	0.2740
BLOOM_n	0.3907	0.3473	0.2283	0.2708	0.2708	0.2708	0.5502	0.2708	0.2553
RoBERTa_c	0.7142	0.5178	0.4711	0.6604	0.6604	0.6604	0.6385	0.6604	0.6489
DistilBERT_c	0.4700	0.4763	0.3145	0.3177	0.3177	0.3177	0.5904	0.3177	0.2908
BLOOM_c	0.4400	0.4516	0.2993	0.3020	0.3020	0.3020	0.5508	0.3020	0.2769
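
To make the averaging schemes in Table 1 concrete, the following is a minimal sketch of how macro- and micro-averaged precision, recall, and F1 can be computed for a single-label, multi-class task. It is not the authors' evaluation code: the function name `averaged_scores` and the toy labels are hypothetical. The excerpt does not define the "Mean" columns, so the support-weighted average shown as the third scheme is an assumption; it is at least consistent with recall_m equalling recall_µ in every row of the table. The sketch also shows why the three micro columns coincide: when each example receives exactly one label, every false positive for one class is a false negative for another, so micro precision, recall, and F1 are all equal to accuracy.

```python
from collections import Counter


def averaged_scores(y_true, y_pred):
    """Return (precision, recall, F1) under macro, micro, and weighted averaging.

    Hypothetical helper for illustration; assumes single-label classification.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def safe_div(a, b):
        # Undefined ratios (no predictions or no examples) are scored as 0.
        return a / b if b else 0.0

    def f1(p, r):
        return safe_div(2 * p * r, p + r)

    # Per-class precision, recall, and F1.
    per_class = {}
    for c in labels:
        p = safe_div(tp[c], tp[c] + fp[c])
        r = safe_div(tp[c], tp[c] + fn[c])
        per_class[c] = (p, r, f1(p, r))

    # Macro: unweighted mean of the per-class scores.
    macro = tuple(
        sum(s[i] for s in per_class.values()) / len(labels) for i in range(3)
    )

    # Micro: pool the counts over all classes, then compute the scores once.
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_p = safe_div(tp_all, tp_all + fp_all)
    micro_r = safe_div(tp_all, tp_all + fn_all)
    micro = (micro_p, micro_r, f1(micro_p, micro_r))

    # Weighted (assumed reading of "Mean"): per-class scores weighted by support.
    support = Counter(y_true)
    weighted = tuple(
        sum(per_class[c][i] * support[c] for c in labels) / len(y_true)
        for i in range(3)
    )

    return {"macro": macro, "micro": micro, "weighted": weighted}


if __name__ == "__main__":
    # Toy three-class sentiment labels, invented for this example.
    gold = ["neg", "neg", "neu", "neu", "neu", "pos"]
    pred = ["neg", "neu", "neu", "neu", "pos", "pos"]
    for scheme, (p, r, f) in averaged_scores(gold, pred).items():
        print(f"{scheme:8s} precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

In this sketch the macro F1 is the mean of the per-class F1 scores rather than the harmonic mean of macro precision and macro recall, which is why a macro F1 can fall below both of the macro precision and recall values reported alongside it.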