2024 Dec 30;24:353. doi: 10.1186/s12880-024-01548-2

Table 4.

Prediction performance of each model in the training, testing, and validation sets

	Model	AUC (95% CI)	Accuracy	F1 score	Sensitivity	Specificity	DeLong test p (vs. Combined)
Training set	Clinical	0.982 (0.964–0.995)	0.927	0.927	0.911	0.944	0.061
	MRS	0.856 (0.798–0.911)	0.770	0.757	0.941	0.578	<0.001*
	Intra-radiomics	0.939 (0.905–0.968)	0.874	0.875	0.871	0.878	<0.001*
	Peri-radiomics	0.957 (0.928–0.980)	0.911	0.911	0.891	0.933	0.004*
	Combined	1.000 (0.999–1.000)	0.990	0.989	1.000	0.978	Ref
Testing set	Clinical	0.868 (0.782–0.947)	0.795	0.795	0.795	0.795	0.024
	MRS	0.824 (0.726–0.904)	0.735	0.720	0.909	0.538	0.004*
	Intra-radiomics	0.936 (0.884–0.980)	0.880	0.879	0.886	0.872	0.352
	Peri-radiomics	0.943 (0.894–0.983)	0.904	0.902	0.955	0.846	0.442
	Combined	0.968 (0.924–0.995)	0.928	0.927	0.932	0.923	Ref
Validation set	Clinical	0.834 (0.718–0.936)	0.737	0.732	0.821	0.688	<0.001*
	MRS	0.787 (0.660–0.896)	0.750	0.743	0.786	0.729	<0.001*
	Intra-radiomics	0.913 (0.843–0.974)	0.816	0.811	0.893	0.771	0.365
	Peri-radiomics	0.893 (0.795–0.969)	0.882	0.874	0.857	0.896	0.117
	Combined	0.940 (0.881–0.988)	0.895	0.890	0.923	0.875	Ref

AUC, area under the curve; CI, confidence interval

*, p < 0.0125. Multiple comparisons were corrected with the Bonferroni method: with four comparisons per set (each model vs. the combined model) and alpha = 0.05, the adjusted significance threshold is 0.05/4 = 0.0125.
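The following is a minimal sketch of how the Bonferroni threshold in the footnote produces the asterisks in Table 4. It assumes four comparisons per set (each single model vs. the combined model). It also represents "<0.001" entries by 0.001, which is an upper bound and is sufficient for the comparison. The names `P_VALUES`, `bonferroni_threshold`, and `significant_models` are illustrative and do not come from the original study.

```python
# Sketch (assumption): reproduce Table 4's significance markers by comparing
# each DeLong p-value (model vs. Combined) with a Bonferroni-corrected threshold.

ALPHA = 0.05
MODELS = ["Clinical", "MRS", "Intra-radiomics", "Peri-radiomics"]

# DeLong p-values transcribed from Table 4; 0.001 stands in for "<0.001".
P_VALUES = {
    "Training": [0.061, 0.001, 0.001, 0.004],
    "Testing": [0.024, 0.004, 0.352, 0.442],
    "Validation": [0.001, 0.001, 0.365, 0.117],
}


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons


def significant_models(p_values: list[float], threshold: float) -> list[str]:
    """Return the models whose p-value is strictly below the threshold."""
    return [model for model, p in zip(MODELS, p_values) if p < threshold]


# Four comparisons per set, so the threshold is 0.05 / 4 = 0.0125.
threshold = bonferroni_threshold(ALPHA, len(MODELS))
flags = {split: significant_models(ps, threshold) for split, ps in P_VALUES.items()}

if __name__ == "__main__":
    print(f"threshold = {threshold}")
    for split, names in flags.items():
        print(split, names)
```

This reproduces the table's markers. MRS, intra-radiomics, and peri-radiomics are flagged in the training set, only MRS in the testing set, and clinical and MRS in the validation set. The testing-set clinical model (p = 0.024) is not flagged because 0.024 exceeds 0.0125. An equivalent formulation multiplies each p-value by 4 and compares the result with 0.05.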