2024 Mar 27;15:2681. doi: 10.1038/s41467-024-46700-2

Table 3.

Diagnostic performance of OvcaFinder and human readers using O-RADS

Internal dataset	Reader A		Reader B		Reader C		Reader D		Reader E
Internal dataset	O-RADS	OvcaFinder	O-RADS	OvcaFinder	O-RADS	OvcaFinder	O-RADS	OvcaFinder	O-RADS	OvcaFinder
AUC	0.907 (0.860, 0.946)	0.971 (0.943, 0.993)	0.900 (0.838, 0.946)	0.980 (0.957, 0.999)	0.958 (0.919, 0.987)	0.981 (0.949, 0.998)	0.947 (0.907, 0.978)	0.978 (0.949, 0.994)	0.924 (0.874, 0.966)	0.976 (0.951, 0.999)
p	0.002		1.50 × 10⁻³		0.120		0.056		0.007
Sensitivity (%)	97.3 (93.3, 100.0)	97.3 (93.3, 100.0)	96.0 (90.7, 100.0)	96.0 (89.3, 100.0)	93.3 (86.7, 98.7)	97.3 (93.3, 100.0)	97.3 (94.7, 100.0)	97.3 (92.0, 100.0)	97.3 (93.3, 100.0)	97.3 (93.3, 100.0)
p	1.00		1.00		0.375		1.00		1.00
Specificity (%)	61.1 (49.1, 75.5)	81.5 (71.7, 92.5)	72.2 (60.4, 81.1)	92.6 (83.0, 98.1)	87.0 (79.3, 94.3)	92.6 (83.0, 98.1)	77.8 (66.0, 88.7)	83.3 (73.6, 90.6)	68.5 (54.7, 81.1)	83.3 (71.7, 94.3)
p	0.013		9.77 × 10⁻⁴		0.375		0.549		0.057
Accuracy (%)	82.2 (77.3, 88.3)	90.7 (85.9, 94.5)	86.1 (81.3, 90.6)	94.6 (89.8, 97.7)	90.7 (85.2, 95.3)	95.4 (91.4, 97.7)	89.2 (84.4, 94.5)	91.5 (86.7, 95.3)	85.3 (80.5, 90.6)	91.5 (86.7, 95.3)
PPV (%)	77.7 (73.3, 85.2)	88.0 (83.0, 94.9)	82.8 (77.9, 88.1)	94.7 (88.9, 98.7)	90.9 (86.3, 96.0)	94.8 (88.9, 98.7)	85.9 (80.2, 92.4)	89.0 (84.1, 93.7)	81.1 (75.5, 87.5)	89.0 (83.3, 95.9)
NPV (%)	94.3 (86.5, 100.0)	95.7 (89.6, 100.0)	92.9 (84.4, 100.0)	94.3 (86.2, 100.0)	90.4 (82.0, 97.9)	96.2 (90.9, 100.0)	95.5 (90.0, 100.0)	95.7 (88.5, 100.0)	94.9 (87.8, 100.0)	95.7 (89.1, 100.0)
External test dataset
AUC	0.888 (0.847, 0.928)	0.941 (0.902, 0.966)	0.894 (0.854, 0.922)	0.935 (0.902, 0.967)	0.894 (0.855, 0.932)	0.942 (0.913, 0.971)	0.915 (0.882, 0.945)	0.943 (0.909, 0.968)	0.927 (0.890, 0.954)	0.946 (0.911, 0.971)
p	0.006		0.005		0.008		0.025		0.149
Sensitivity (%)	87.7 (79.0, 93.8)	88.9 (81.5, 96.3)	86.4 (79.0, 92.6)	87.7 (81.5, 92.6)	77.8 (69.1, 86.4)	87.7 (77.8, 95.1)	87.7 (80.3, 95.1)	88.9 (81.5, 95.1)	88.9 (79.0, 95.1)	88.9 (81.5, 93.8)
p	1.00		1.00		0.077		1.00		1.00
Specificity (%)	70.6 (65.0, 75.5)	87.6 (83.7, 91.2)	81.4 (76.1, 85.6)	90.5 (86.9, 93.5)	89.2 (85.3, 92.5)	91.8 (88.6, 94.8)	81.7 (78.4, 86.0)	89.5 (85.6, 92.5)	86.0 (82.0, 89.2)	90.9 (87.3, 93.5)
p	3.07 × 10⁻¹²		8.36 × 10⁻⁶		0.185		6.96 × 10⁻⁵		0.004
Accuracy (%)	74.2 (69.8, 78.0)	87.9 (83.7, 91.0)	82.4 (77.8, 86.1)	89.9 (86.1, 93.3)	86.8 (83.7, 89.9)	91.0 (87.9, 94.1)	83.0 (79.6, 86.6)	89.4 (85.8, 91.7)	86.6 (83.2, 89.4)	90.4 (87.6, 92.8)
PPV (%)	44.1 (39.7, 48.7)	65.5 (57.6, 72.6)	55.1 (48.0, 61.5)	71.0 (62.6, 78.4)	65.6 (58.5, 74.2)	74.0 (67.0, 82.0)	55.9 (50.8, 62.0)	69.2 (61.4, 75.3)	62.6 (56.5, 68.9)	72.0 (65.2, 78.3)
NPV (%)	95.6 (92.8, 97.8)	96.8 (94.6, 98.9)	95.8 (93.6, 97.7)	96.5 (94.6, 98.0)	93.8 (91.6, 96.1)	96.6 (94.1, 98.6)	96.2 (94.0, 98.4)	96.8 (94.7, 98.6)	96.7 (94.1, 98.5)	96.8 (95.0, 98.3)

Data in parentheses are 95% confidence intervals. O-RADS: Ovarian-Adnexal Reporting and Data System; AUC: area under the receiver operating characteristic curve; PPV: positive predictive value; NPV: negative predictive value. The p-values for AUC were calculated using the function ‘roc_test’ in the Python package of pROC. The p-values for sensitivity and specificity were calculated via a two-sided McNemar test.
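
For readers who want to see how a two-sided McNemar p-value of the kind reported for sensitivity and specificity can be obtained, the following is a minimal sketch. It assumes the exact (binomial) form of the test; the source does not state which variant was used. The function name `mcnemar_exact` and the discordant-pair counts in the usage lines are hypothetical illustrations, not data from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: cases classified correctly by method 1 only (hypothetical input)
    c: cases classified correctly by method 2 only (hypothetical input)
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: the two methods never disagree.
        return 1.0
    k = min(b, c)
    # Under H0 each discordant pair favours either method with probability 0.5,
    # so the smaller count follows Binomial(n, 0.5); double the lower tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative counts only (not taken from the paper).
print(mcnemar_exact(0, 11))
print(mcnemar_exact(1, 4))
```

Under this assumption, 11 discordant pairs that all favour one method give 2 × 0.5¹¹ ≈ 9.77 × 10⁻⁴. That is numerically consistent with one specificity p-value in the table, although the underlying paired counts are not reported here.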