. 2024 Jan 11;2023:1125–1134.

Table 1:

Zero-shot evaluation: precision, recall, and F1-score of the sentiment analysis models under three types of averaging (macro, subscript M; micro, subscript µ; and mean, subscript m). The best-performing model in each scenario is highlighted in bold. Results are statistically significant at p < 0.05.

| Model | Precision (M) | Recall (M) | F1 (M) | Precision (µ) | Recall (µ) | F1 (µ) | Precision (m) | Recall (m) | F1 (m) |
|---|---|---|---|---|---|---|---|---|---|
| RoBERTa | 0.7539 | 0.3683 | 0.3567 | 0.6604 | 0.6604 | 0.6604 | 0.6290 | 0.6604 | 0.6443 |
| DistilBERT | 0.1713 | 0.4436 | 0.2240 | 0.1927 | 0.1927 | 0.1927 | 0.0537 | 0.1927 | 0.0781 |
| BLOOM | 0.1092 | 0.4055 | 0.1702 | 0.1718 | 0.1718 | 0.1718 | 0.0415 | 0.1718 | 0.0665 |
| DistilBERTn | 0.4647 | 0.3869 | 0.2750 | 0.2864 | 0.2864 | 0.2864 | 0.5986 | 0.2864 | 0.2740 |
| BLOOMn | 0.3907 | 0.3473 | 0.2283 | 0.2708 | 0.2708 | 0.2708 | 0.5502 | 0.2708 | 0.2553 |
| RoBERTac | 0.7142 | 0.5178 | 0.4711 | 0.6604 | 0.6604 | 0.6604 | 0.6385 | 0.6604 | 0.6489 |
| DistilBERTc | 0.4700 | 0.4763 | 0.3145 | 0.3177 | 0.3177 | 0.3177 | 0.5904 | 0.3177 | 0.2908 |
| BLOOMc | 0.4400 | 0.4516 | 0.2993 | 0.3020 | 0.3020 | 0.3020 | 0.5508 | 0.3020 | 0.2769 |
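
The averaging schemes in Table 1 can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and the toy labels below are hypothetical, and reading the "mean" (m) average as a support-weighted average is an assumption. That reading is at least consistent with the table, where the recall (m) column equals the recall (µ) column in every row, as support-weighted recall must in single-label classification. The sketch also shows why the three micro columns are identical: with one label per example, every false positive for one class is a false negative for another, so micro precision, recall, and F1 all reduce to accuracy.

```python
from collections import Counter


def _safe_div(num, den):
    # Return 0.0 instead of raising when a class has no predictions or no examples.
    return num / den if den else 0.0


def _prf(tp, fp, fn):
    # Precision, recall and F1 from raw counts.
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


def averaged_scores(y_true, y_pred):
    """Macro, micro and support-weighted (precision, recall, F1) triples."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = {c: _prf(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)  # number of true examples per class
    n = len(y_true)

    # Macro: unweighted mean of the per-class scores.
    macro = tuple(sum(per_class[c][i] for c in labels) / len(labels) for i in range(3))
    # Micro: pool the counts over all classes, then compute the scores once.
    micro = _prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Weighted (assumed reading of the "mean" average): per-class scores weighted by support.
    weighted = tuple(sum(per_class[c][i] * support[c] for c in labels) / n for i in range(3))
    return {"macro": macro, "micro": micro, "weighted": weighted}


# Toy labels for illustration only; these are not data from the paper.
y_true = ["neg", "neg", "neu", "pos", "pos", "pos"]
y_pred = ["neg", "pos", "pos", "pos", "pos", "neg"]
scores = averaged_scores(y_true, y_pred)
for name, (p, r, f) in scores.items():
    print(f"{name:8s} precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

On imbalanced data the macro scores can sit far from the micro scores, because macro averaging gives a rare class the same weight as a frequent one.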