2023 Dec 20;14:222. doi: 10.1186/s13244-023-01550-2

Table 2.

Performance of the four models and three radiologists on the two test sets

Modality	AUC	Accuracy	Sensitivity	Specificity	PPV	NPV	Kappa value	F1 score
a. Performance metrics of the models and US specialists on the primary internal test set A.
Clinical	0.70 (0.59–0.80)	0.63 (0.52–0.73)	0.62 (0.49–0.74)	0.64 (0.51–0.77)	0.63	0.63	0.26	0.62
BMUS	0.82 (0.74–0.90)	0.77 (0.67–0.85)	0.81 (0.70–0.91)	0.72 (0.60–0.85)	0.75	0.79	0.53	0.78
CDFI	0.77 (0.67–0.86)	0.70 (0.60–0.79)	0.85 (0.74–0.94)	0.55 (0.40–0.68)	0.66	0.79	0.40	0.74
Ensemble	0.86 (0.78–0.94)	0.79 (0.69–0.86)	0.83 (0.72–0.94)	0.74 (0.62–0.87)	0.78	0.82	0.57	0.79
Expert 1	N/A	0.63 (0.52–0.73)	0.62 (0.46–0.75)	0.64 (0.48–0.77)	0.63	0.63	0.26	0.62
Expert 2	N/A	0.55 (0.42–0.67)	0.43 (0.29–0.58)	0.68 (0.53–0.80)	0.57	0.54	0.15	0.49
Expert 3	N/A	0.49 (0.39–0.60)	0.47 (0.32–0.62)	0.51 (0.36–0.66)	0.49	0.49	0.11	0.48
b. Performance metrics of the models and US specialists on the secondary external test set B.
Clinical	0.62 (0.51–0.72)	0.60 (0.49–0.70)	0.66 (0.52–0.80)	0.58 (0.42–0.71)	0.60	0.63	0.24	0.63
BMUS	0.71 (0.61–0.82)	0.66 (0.54–0.75)	0.73 (0.57–0.85)	0.60 (0.44–0.74)	0.64	0.69	0.33	0.68
CDFI	0.72 (0.62–0.83)	0.67 (0.57–0.77)	0.77 (0.64–0.89)	0.58 (0.42–0.71)	0.64	0.72	0.39	0.70
Ensemble	0.77 (0.68–0.87)	0.72 (0.61–0.81)	0.75 (0.61–0.86)	0.69 (0.56–0.82)	0.70	0.74	0.44	0.72
Expert 1	N/A	0.66 (0.54–0.75)	0.67 (0.51–0.80)	0.66 (0.50–0.79)	0.67	0.66	0.33	0.67
Expert 2	N/A	0.58 (0.47–0.70)	0.62 (0.47–0.76)	0.55 (0.39–0.69)	0.58	0.59	0.17	0.60
Expert 3	N/A	0.52 (0.41–0.63)	0.44 (0.30–0.60)	0.59 (0.43–0.73)	0.53	0.51	0.03	0.48
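
As a reading aid, the F1 score column can be related to the PPV and sensitivity columns through the standard definition F1 = 2 × PPV × sensitivity / (PPV + sensitivity). The sketch below is not taken from the article: the function name `f1_from_ppv_sensitivity` is hypothetical, and because it starts from the rounded values in the table, a recomputed score may differ from the reported one by roughly 0.01.

```python
def f1_from_ppv_sensitivity(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0  # convention chosen for this sketch when both inputs are zero
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# (PPV, sensitivity, reported F1) copied from test set A in the table above.
rows = {
    "Clinical": (0.63, 0.62, 0.62),
    "BMUS": (0.75, 0.81, 0.78),
    "CDFI": (0.66, 0.85, 0.74),
    "Ensemble": (0.78, 0.83, 0.79),
}

for name, (ppv, sens, reported) in rows.items():
    recomputed = f1_from_ppv_sensitivity(ppv, sens)
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported:.2f}")
```

Small gaps between recomputed and reported values are expected here, since the inputs are already rounded to two decimals.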