2025 Mar 11;12:424. doi: 10.1038/s41597-025-04718-1

Table 3.

Performance metrics of various models using the SGE feature set evaluated on the CT-ADE-SOC²⁸ test split. All metrics are micro-averaged.

Model Type	Parameters (×10⁹)	Backbone	Precision (%)	Recall (%)	F1-score (%)	Accuracy (%)	Balanced Accuracy (%)
MAJ	0	—	0.00	0.00	0.00	88.68	50.00
Discriminative	0.11	ChemBERTa & PubMedBERT	51.65	55.40	53.46	89.08	74.39
Generative	7–8	Meditron	52.82	53.84	53.32	89.33	73.85
Generative	7–8	OpenBioLLM	52.18	54.75	53.43	89.20	74.17
Generative	7–8	Llama-3	53.60	58.42	55.90	89.57	75.98
Generative	70	Meditron	61.01	44.10	51.20	90.49	70.25
Generative	70	OpenBioLLM	60.28	42.42	49.79	90.32	69.42
Generative	70	Llama-3	62.09	49.30	54.96	90.86	72.73
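
The micro-averaged metrics reported above can be illustrated with a minimal sketch. It assumes that micro-averaging pools every (sample, label) decision into a single confusion matrix and that balanced accuracy is the mean of recall and specificity; the source does not spell out these definitions, although they are consistent with the MAJ row (an all-negative predictor scores 0% precision and recall and 50% balanced accuracy). The function names `micro_metrics` and `f1_from` and the toy labels are hypothetical and are not taken from the paper.

```python
def micro_metrics(y_true, y_pred):
    """Micro-averaged metrics for multi-label 0/1 matrices (lists of rows).

    Assumption: all (sample, label) decisions are pooled into one
    confusion matrix before any ratio is computed.
    """
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall (same units as inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 3 samples, 4 labels, 2 positives in total.
y_true = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]]
# A majority-class predictor outputs the negative class everywhere.
y_majority = [[0, 0, 0, 0] for _ in y_true]

majority_scores = micro_metrics(y_true, y_majority)
print(majority_scores)

# Recompute the discriminative model's F1 from its reported precision and recall.
print(round(f1_from(51.65, 55.40), 2))
```

On the toy data the majority predictor reaches high accuracy only because negatives dominate, while its balanced accuracy stays at 0.5. This is why the table reports balanced accuracy alongside accuracy. The last line recomputes the F1-score of the discriminative row from its reported precision and recall.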