2024 Jan 11;2023:1125–1134.

Table 3:

Full-Trained Evaluation: precision, recall, and F1-score for the sentiment analysis models under three averaging schemes (macro, micro, and mean). The best-performing model in each scenario is shown in bold. Results are statistically significant at p < 0.05.

| Model | precision_M | recall_M | F1_M | precision_µ | recall_µ | F1_µ | precision_m | recall_m | F1_m |
|---|---|---|---|---|---|---|---|---|---|
| RoBERTa | 0.4250 | 0.4083 | 0.4043 | 0.7333 | 0.7333 | 0.7333 | 0.6461 | 0.7333 | 0.6793 |
| DistilBERT | 0.3675 | 0.3982 | 0.3803 | 0.7111 | 0.7111 | 0.7111 | 0.6159 | 0.7111 | 0.6589 |
| MiniLM | 0.2500 | 0.3030 | 0.2739 | 0.6666 | 0.6666 | 0.6666 | 0.5500 | 0.6666 | 0.6027 |
| BLOOM | 0.2416 | 0.2929 | 0.2648 | 0.6444 | 0.6444 | 0.6444 | 0.5316 | 0.6444 | 0.5826 |
| RoBERTa^c | **0.6875** | **0.5946** | **0.6011** | **0.7555** | **0.7555** | **0.7555** | **0.7283** | **0.7555** | **0.7169** |
| DistilBERT^c | 0.6346 | 0.5795 | 0.5833 | 0.7333 | 0.7333 | 0.7333 | 0.6974 | 0.7333 | 0.7000 |
| MiniLM^c | 0.5750 | 0.5378 | 0.5286 | 0.7111 | 0.7111 | 0.7111 | 0.6566 | 0.7111 | 0.6654 |
| BLOOM^c | 0.4625 | 0.4810 | 0.4560 | 0.6666 | 0.6666 | 0.6666 | 0.5850 | 0.6666 | 0.6140 |
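The three averaging schemes in the table differ in how per-class scores are combined: macro averaging weights every class equally, micro averaging pools true/false positives across classes (for single-label classification this makes micro precision, recall, and F1 all equal to accuracy, which is why the three micro columns are identical in each row), and a support-weighted mean weights each class by its frequency in the gold labels. A minimal sketch of the three schemes, with illustrative labels rather than the paper's data:

```python
from collections import Counter

def prf_averages(y_true, y_pred):
    """Return (macro, micro, weighted) tuples of (precision, recall, F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but true label was t
            fn[t] += 1  # missed the true label t
    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = (prec, rec, f1)
    # Macro: unweighted mean of the per-class scores.
    n = len(labels)
    macro = tuple(sum(v[i] for v in per_class.values()) / n for i in range(3))
    # Micro: pool counts across classes; equals accuracy for single-label tasks.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    mp = TP / (TP + FP) if TP + FP else 0.0
    mr = TP / (TP + FN) if TP + FN else 0.0
    mf = 2 * mp * mr / (mp + mr) if mp + mr else 0.0
    micro = (mp, mr, mf)
    # Support-weighted: weight each class by its true-label frequency.
    support, total = Counter(y_true), len(y_true)
    weighted = tuple(
        sum(per_class[c][i] * support[c] / total for c in labels) for i in range(3)
    )
    return macro, micro, weighted
```

With this, the pattern in the table is easy to reproduce: any single-label prediction set yields identical micro precision, recall, and F1, while macro scores drop whenever a rare class is predicted poorly.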