2025 Sep 2;11:e3007. doi: 10.7717/peerj-cs.3007

Table 2. Results after applying stacking ensemble.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-score (%)	AUC (%)	p-value (vs Stacking)	95% CI of F1 difference
DT	99.93	89.89	81.63	85.56	90.80	0.2027	[−0.0161 to 0.0550]
RF	99.96	97.40	76.53	85.71	97.25	0.0199	[−0.0450 to −0.0067]
SVM	99.94	97.02	66.33	78.79	95.13	0.0225	[0.0083 to 0.0635]
XGBoost	99.95	95.00	77.55	85.39	97.83	0.0628	[−0.0261 to 0.0011]
CatBoost	99.96	97.44	77.55	86.36	98.37	0.0315	[−0.0423 to −0.0033]
LR	99.92	88.06	60.20	71.52	97.01	0.0047	[0.0593 to 0.1727]
Stacking hybrid	99.96	98.73	79.59	88.14	89.80	----	----
Voting	99.95	97.37	75.51	85.06	97.45	----	----
Random Subspace	99.95	95.95	72.45	82.56	86.22	----	----
Bagging	99.95	97.30	73.47	83.72	86.73	----	----
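
As a reading aid, the F1-score column can be cross-checked against the precision and recall columns using the standard definition of F1 as the harmonic mean of the two. The sketch below is only an illustration of that arithmetic, not the authors' evaluation code; the function name `f1_score` and the `rows` dictionary are hypothetical, and the numbers are copied from the table. Because the table reports rounded percentages, a recomputed value may differ from the reported one by about 0.01.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (output uses the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # convention chosen here to avoid division by zero
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from Table 2
rows = {
    "DT": (89.89, 81.63, 85.56),
    "RF": (97.40, 76.53, 85.71),
    "Stacking hybrid": (98.73, 79.59, 88.14),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed {f1_score(p, r):.2f}, reported {reported:.2f}")
```

The gap between high precision and lower recall in most rows explains why the F1-scores sit well below the accuracy values, which are dominated by the majority class.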