Table 3.
Test set performance (AUC with 95% confidence interval (CI)) of the radiomics-based machine learning model (MR), the clinical-based machine learning model (MC), and CovSafeNet on the D1test, D2, D3, and D4 datasets. The performance of CovSafeNet was compared with that of MR and MC using DeLong’s test.
| Model, AUC (95% CI) | D1test (N = 419) | D2 (N = 113) | D3 (N = 2000) | D4 (N = 282) | Combined: D1test, D2, D3, D4 (N = 2814) | Combined: D1test, D2, D4 (N = 814) |
|---|---|---|---|---|---|---|
| Radiomics, MR | 0.893 (0.856, 0.931) | 0.641 (0.523, 0.752) | 0.723 (0.692, 0.742) | 0.579 (0.514, 0.648) | 0.662 (0.631, 0.684) | 0.674 (0.631, 0.703) |
| Clinical, MC | 0.686 (0.638, 0.726) | 0.668 (0.549, 0.781) | - | 0.664 (0.599, 0.732) | - | 0.602 (0.568, 0.651) |
| CovSafeNet | 0.890 (0.860, 0.921) | 0.769 (0.667, 0.870) | 0.732 (0.704, 0.761) | 0.654 (0.583, 0.724) | 0.693* (0.671, 0.716) | 0.688* (0.653, 0.724) |
| DeLong’s test vs. MR | p = 0.6028 | p = 0.0485 | p = 0.4116 | p = 0.0323 | p < 0.0001 | p < 0.0001 |
| DeLong’s test vs. MC | p < 0.0001 | p = 0.0925 | - | p = 0.6321 | - | p < 0.0001 |
\* Indicates a statistically significant improvement according to DeLong’s test.
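The comparisons in the last two rows rely on DeLong’s test for two correlated AUCs, that is, two models scored on the same cases. The following is a minimal sketch of the textbook O(mn) form of that test, assuming binary labels and continuous scores from both models on one shared test set. The function names (`delong_test`, `_structural_components`) and the toy data are hypothetical illustrations, not the authors’ implementation, and the sketch covers only the p-value comparison, not how the 95% CIs in the table were obtained.

```python
import math

import numpy as np


def _structural_components(pos: np.ndarray, neg: np.ndarray):
    """Return DeLong's structural components (V10 per positive, V01 per negative)."""
    diff = pos[:, None] - neg[None, :]
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on ties, else 0
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Hypothetical helper for illustration. Returns (auc_a, auc_b, z, p_value).
    """
    y = np.asarray(y_true).astype(bool)
    a = np.asarray(score_a, dtype=float)
    b = np.asarray(score_b, dtype=float)
    v10_a, v01_a = _structural_components(a[y], a[~y])
    v10_b, v01_b = _structural_components(b[y], b[~y])
    auc_a, auc_b = v10_a.mean(), v10_b.mean()  # AUC = mean of V10
    m, n = v10_a.size, v01_a.size  # number of positives, negatives
    # 2x2 covariance matrices (rows = models); np.cov uses ddof=1 by default
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    s = s10 / m + s01 / n
    var_diff = s[0, 0] + s[1, 1] - 2.0 * s[0, 1]
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return float(auc_a), float(auc_b), float(z), p_value


# Toy example (hypothetical data): model A separates the classes perfectly
y = [1, 1, 1, 1, 0, 0, 0, 0]
a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
b = [0.9, 0.4, 0.7, 0.3, 0.5, 0.8, 0.2, 0.1]
auc_a, auc_b, z, p = delong_test(y, a, b)  # auc_a = 1.0, auc_b = 0.6875
```

Because the test works on the paired per-case components of both models, it accounts for the correlation between AUCs measured on the same patients, which is why it suits comparisons such as CovSafeNet vs. MR within each dataset.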