Table 3.

Test set performance (AUC with 95% confidence intervals (CI)) of the radiomics-based machine learning model (M_R), the clinical-based machine learning model (M_C), and CovSafeNet on the D₁^test, D₂, D₃, and D₄ datasets. The performance of CovSafeNet was compared with that of models M_R and M_C using DeLong’s test.

Model AUC (95% CI)		D₁^test N=419	D₂ N=113	D₃ N=2000	D₄ N=282	Combined (D₁^test, D₂, D₃, D₄) N=2814	Combined (D₁^test, D₂, D₄) N=814
Radiomics, M_R		0.893 (0.856, 0.931)	0.641 (0.523, 0.752)	0.723 (0.692, 0.742)	0.579 (0.514, 0.648)	0.662 (0.631, 0.684)	0.674 (0.631, 0.703)
Clinical, M_C		0.686 (0.638, 0.726)	0.668 (0.549, 0.781)	-	0.664 (0.599, 0.732)	-	0.602 (0.568, 0.651)
CovSafeNet		0.890 (0.860, 0.921)	0.769 (0.667, 0.870)	0.732 (0.704, 0.761)	0.654 (0.583, 0.724)	0.693^* (0.671, 0.716)	0.688^* (0.653, 0.724)
	DeLong’s Test with radiomics model M_R	(p=0.6028)	(p=0.0485)	(p=0.4116)	(p=0.0323)	(p<0.0001)	(p<0.0001)
	DeLong’s Test with clinical model M_C	(p<0.0001)	(p=0.0925)	-	(p=0.6321)	-	(p<0.0001)

^* indicates a statistically significant improvement according to DeLong’s test.