Table 2.

Test set performance (AUC with 95% confidence intervals (CI)) of M₁, M₂ and CovSafeNet on the D₁^test, D₂, D₃ and D₄ datasets. The performance of CovSafeNet was compared with models M₁ and M₂ using DeLong’s test.

Model AUC (95% CI)		D₁^test N=419	D₂ N=113	D₃ N=2000	D₄ N=282	Combined N=2814
M₁		0.850 (0.814, 0.888)	0.714 (0.612, 0.816)	0.709 (0.681, 0.738)	0.610 (0.542, 0.678)	0.655 (0.633, 0.678)
M₂		0.867 (0.833, 0.901)	0.770 (0.667, 0.873)	0.697 (0.666, 0.728)	0.650 (0.579, 0.723)	0.680 (0.658, 0.702)
CovSafeNet		0.890* (0.860, 0.921)	0.769 (0.667, 0.870)	0.732 (0.704, 0.761)	0.654 (0.583, 0.724)	0.693* (0.671, 0.716)
	DeLong’s Test with M₁	p=0.0342	p=0.9877	p<0.0001	p=0.812	p=0.0123
	DeLong’s Test with M₂	p=0.0001	p=0.0589	p=0.0548	p=0.0558	p<0.0001

* indicates a statistically significant improvement according to DeLong’s test.
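
For readers unfamiliar with the comparison reported in the p-value rows, the following is a minimal sketch of a two-sided DeLong test for two correlated AUCs computed on the same cases. It is not the authors' implementation: the function names (`delong_test`, `_placements`, `_cov`) are hypothetical, labels are assumed to be binary (1 = positive, 0 = negative), and the confidence intervals in the table are not reproduced here.

```python
import math


def _placements(pos, neg):
    """Return (auc, v10, v01) for one model's scores.

    v10[i] is the fraction of negatives ranked below positive i;
    v01[j] is the fraction of positives ranked above negative j.
    Ties count as 0.5.
    """
    def psi(x, y):
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance (denominator len - 1) of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test; returns (auc_a, auc_b, z, p_value).

    Assumes at least two positive and two negative cases.
    """
    pos_a = [s for s, l in zip(scores_a, labels) if l == 1]
    neg_a = [s for s, l in zip(scores_a, labels) if l == 0]
    pos_b = [s for s, l in zip(scores_b, labels) if l == 1]
    neg_b = [s for s, l in zip(scores_b, labels) if l == 0]
    m, n = len(pos_a), len(neg_a)

    auc_a, v10_a, v01_a = _placements(pos_a, neg_a)
    auc_b, v10_b, v01_b = _placements(pos_b, neg_b)

    # Variance of the AUC difference, from the placement-value covariances.
    var = (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
    var += (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n

    if var <= 0:
        if auc_a == auc_b:
            # Identical rankings: no evidence of any difference.
            return auc_a, auc_b, 0.0, 1.0
        raise ValueError("zero variance with differing AUCs; test is undefined")

    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p_value
```

Called as `delong_test(labels, scores_model_a, scores_model_b)`, the sketch returns both AUCs, the z statistic and the two-sided p-value. The pairwise loops are quadratic in the number of cases, which is acceptable at the sample sizes in the table but slower than rank-based formulations.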