Table 2.
Internal test dataset | Clinical model | Image-based DL predictions | OvcaFinder |
---|---|---|---|
AUC | 0.936 | 0.970 | 0.978 |
95% CI | (0.902, 0.975) | (0.934, 0.993) | (0.953, 0.998) |
p | 0.007 | 0.152 | Reference |
Sensitivity (%) | 97.3 | 97.3 | 97.3 |
95% CI | (93.3, 100.0) | (93.3, 100.0) | (93.3, 100.0) |
p | 1.00 | 1.00 | Reference |
Specificity (%) | 40.7 | 74.1 | 83.3 |
95% CI | (28.3, 52.8) | (62.3, 84.9) | (73.6, 92.4) |
p | 1.52 × 10⁻⁵ | 0.062 | Reference |
Accuracy (%) | 73.6 | 87.6 | 91.5 |
95% CI | (68.0, 79.7) | (82.0, 92.2) | (86.7, 96.1) |
PPV (%) | 69.5 | 83.9 | 89.0 |
95% CI | (65.5, 74.8) | (78.5, 90.1) | (83.3, 94.9) |
NPV (%) | 91.7 | 95.2 | 95.7 |
95% CI | (79.2, 100.0) | (88.6, 100.0) | (88.9, 100.0) |
External test dataset | | | |
AUC | 0.842 | 0.893 | 0.947 |
95% CI | (0.776, 0.895) | (0.855, 0.933) | (0.917, 0.970) |
p | 4.65 × 10⁻⁵ | 3.93 × 10⁻⁶ | Reference |
Sensitivity (%) | 85.2 | 88.9 | 88.9 |
95% CI | (76.5, 92.6) | (81.5, 95.1) | (81.5, 95.1) |
p | 0.581 | 1.000 | Reference |
Specificity (%) | 53.3 | 68.6 | 90.5 |
95% CI | (47.7, 58.5) | (64.0, 73.5) | (87.3, 93.8) |
p | 2.21 × 10⁻²⁹ | 1.36 × 10⁻²⁰ | Reference |
Accuracy (%) | 59.9 | 72.9 | 90.2 |
95% CI | (55.3, 64.3) | (68.5, 77.3) | (87.1, 93.0) |
PPV (%) | 32.5 | 42.9 | 71.3 |
95% CI | (29.5, 35.7) | (38.6, 47.8) | (64.2, 79.1) |
NPV (%) | 93.1 | 95.9 | 96.9 |
95% CI | (89.8, 96.3) | (93.3, 98.2) | (94.8, 98.6) |
Data in parentheses are 95% confidence intervals. DL Deep learning, AUC Area under the receiver operating characteristic curve, PPV Positive predictive value, NPV Negative predictive value. The average of the O-RADS scores was used as the input factor of OvcaFinder. All p values are for a comparison with OvcaFinder. The p values for AUC were calculated using the function ‘roc_test’ in the Python package of pROC. The p values for sensitivity and specificity were calculated using a two-sided McNemar test.
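
As an illustration of the two-sided McNemar test mentioned in the table note, the following is a minimal sketch of how a paired comparison of sensitivity or specificity between two models could be computed. It assumes the exact (binomial) form of the test; the note does not say whether the exact or the chi-square variant was used. The function names `mcnemar_exact` and `discordant_counts` and the toy labels are hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p value from the discordant pair counts.

    b: cases the first model classified correctly and the second incorrectly
    c: cases the first model classified incorrectly and the second correctly
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Lower tail of Binomial(n, 0.5) at or below the smaller discordant count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def discordant_counts(y_true, pred_a, pred_b, target):
    """Count discordant pairs among cases whose true label equals `target`.

    target=1 restricts to positives (sensitivity comparison);
    target=0 restricts to negatives (specificity comparison).
    """
    b = c = 0
    for y, a, m in zip(y_true, pred_a, pred_b):
        if y != target:
            continue
        a_ok, m_ok = (a == y), (m == y)
        if a_ok and not m_ok:
            b += 1
        elif m_ok and not a_ok:
            c += 1
    return b, c


# Toy data (hypothetical): 1 = malignant, 0 = benign.
y_true = [0, 0, 0, 0, 1, 1]
pred_a = [1, 1, 0, 0, 1, 1]  # e.g. a comparator model
pred_b = [0, 0, 0, 1, 1, 0]  # e.g. the reference model

b, c = discordant_counts(y_true, pred_a, pred_b, target=0)
p_specificity = mcnemar_exact(b, c)
```

Only the discordant pairs enter the test, which is why two models with identical sensitivity on the same cases, as in the internal test dataset, yield a p value of 1.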