Table 2.
Model | AUC | Sensitivity | Specificity | PPV | NPV |
---|---|---|---|---|---|
DLR model | | | | | |
Training set (n = 489) | 0.909 (0.879, 0.933) | 0.895 (0.862, 0.923) | 0.763 (0.634, 0.864) | 0.965 (0.942, 0.981) | 0.500 (0.392, 0.608) |
Validation set (n = 114) | 0.830 (0.748, 0.894) | 0.770 (0.675, 0.848) | 0.786 (0.492, 0.953) | 0.962 (0.894, 0.992) | 0.324 (0.174, 0.505) |
Test set (n = 90) | 0.815<sup>a</sup> (0.719, 0.889) | 0.878 (0.782, 0.943) | 0.500 (0.247, 0.753) | 0.890 (0.795, 0.951) | 0.471 (0.230, 0.722) |
DLP model | | | | | |
Training set (n = 489) | 0.882 (0.850, 0.909) | 0.730 (0.686, 0.772) | 0.915 (0.813, 0.972) | 0.984 (0.964, 0.995) | 0.318 (0.248, 0.394) |
Validation set (n = 114) | 0.827 (0.745, 0.892) | 0.660 (0.558, 0.752) | 0.857 (0.572, 0.982) | 0.971 (0.898, 0.996) | 0.261 (0.143, 0.411) |
Test set (n = 90) | 0.802<sup>b,d</sup> (0.704, 0.878) | 0.419 (0.305, 0.539) | 0.875 (0.617, 0.984) | 0.939 (0.798, 0.993) | 0.246 (0.141, 0.378) |
DLP-manual model | | | | | |
Training set (n = 489) | 0.889 (0.857, 0.915) | 0.754 (0.710, 0.794) | 0.881 (0.771, 0.951) | 0.979 (0.957, 0.991) | 0.329 (0.257, 0.408) |
Validation set (n = 114) | 0.872 (0.797, 0.927) | 0.660 (0.558, 0.752) | 0.929 (0.661, 0.998) | 0.985 (0.920, 1.000) | 0.277 (0.156, 0.426) |
Test set (n = 90) | 0.834<sup>c</sup> (0.740, 0.904) | 0.405 (0.293, 0.526) | 0.937 (0.698, 0.998) | 0.968 (0.833, 0.999) | 0.254 (0.150, 0.384) |
DLRP model | | | | | |
Training set (n = 489) | 0.975 (0.956, 0.987) | 0.907 (0.875, 0.933) | 1.000 (0.939, 1.000) | 1.000 (0.991, 1.000) | 0.596 (0.493, 0.693) |
Validation set (n = 114) | 0.929 (0.865, 0.968) | 0.940 (0.874, 0.978) | 0.643 (0.351, 0.872) | 0.949 (0.886, 0.984) | 0.600 (0.323, 0.837) |
Test set (n = 90) | 0.900 (0.819, 0.953) | 0.892 (0.798, 0.952) | 0.812 (0.544, 0.960) | 0.957 (0.878, 0.991) | 0.619 (0.384, 0.819) |
Note: Data in parentheses are 95% confidence intervals. DLR, deep learning radiomics; DLP, deep learning pathomics; DLP-manual, deep learning pathomics trained on manually annotated whole-slide images (WSI); DLRP, deep learning radiopathomics; AUC, area under the receiver operating characteristic curve; PPV, positive predictive value; NPV, negative predictive value.
<sup>a</sup> Indicates P = 0.027 (DeLong et al. test) in comparison with the DLRP model in the independent test set.
<sup>b</sup> Indicates P = 0.013 (DeLong et al. test) in comparison with the DLRP model in the independent test set.
<sup>c</sup> Indicates P = 0.023 (DeLong et al. test) in comparison with the DLRP model in the independent test set.
<sup>d</sup> Indicates P = 0.352 (DeLong et al. test) in comparison with the DLP-manual model in the independent test set.
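
For readers who want to relate the four threshold-dependent columns to one another, the following minimal sketch shows how sensitivity, specificity, PPV, and NPV follow from a 2 × 2 confusion matrix. The function name `diagnostic_metrics` is hypothetical, and the counts used (66 true positives, 8 false negatives, 13 true negatives, 3 false positives) are an assumption: they are back-calculated from the rates reported for the DLRP model in the test set (n = 90) and are not stated in the source.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all actual positives
        "specificity": tn / (tn + fp),  # true negatives among all actual negatives
        "ppv": tp / (tp + fp),          # true positives among all positive calls
        "npv": tn / (tn + fn),          # true negatives among all negative calls
    }


# Assumed counts, inferred from the DLRP test-set row (n = 90);
# the source reports only the rates, not the confusion matrix itself.
metrics = diagnostic_metrics(tp=66, fn=8, tn=13, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the test set would contain 74 positive and 16 negative cases. This may help explain why NPV stays well below PPV across all models: with few negative cases, even a small number of false negatives weighs heavily on NPV.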