2023 Dec 20;14:222. doi: 10.1186/s13244-023-01550-2

Table 2.

Performance of the four models and three radiologists on each test set

a. Performance metrics of the models and US specialists on the primary internal test set A.

| Modality | AUC | Accuracy | Sensitivity | Specificity | PPV | NPV | Kappa value | F1 score |
|---|---|---|---|---|---|---|---|---|
| Clinical | 0.70 (0.59–0.80) | 0.63 (0.52–0.73) | 0.62 (0.49–0.74) | 0.64 (0.51–0.77) | 0.63 | 0.63 | 0.26 | 0.62 |
| BMUS | 0.82 (0.74–0.90) | 0.77 (0.67–0.85) | 0.81 (0.70–0.91) | 0.72 (0.60–0.85) | 0.75 | 0.79 | 0.53 | 0.78 |
| CDFI | 0.77 (0.67–0.86) | 0.70 (0.60–0.79) | 0.85 (0.74–0.94) | 0.55 (0.40–0.68) | 0.66 | 0.79 | 0.40 | 0.74 |
| Ensemble | 0.86 (0.78–0.94) | 0.79 (0.69–0.86) | 0.83 (0.72–0.94) | 0.74 (0.62–0.87) | 0.78 | 0.82 | 0.57 | 0.79 |
| Expert 1 | N/A | 0.63 (0.52–0.73) | 0.62 (0.46–0.75) | 0.64 (0.48–0.77) | 0.63 | 0.63 | 0.26 | 0.62 |
| Expert 2 | N/A | 0.55 (0.42–0.67) | 0.43 (0.29–0.58) | 0.68 (0.53–0.80) | 0.57 | 0.54 | 0.15 | 0.49 |
| Expert 3 | N/A | 0.49 (0.39–0.60) | 0.47 (0.32–0.62) | 0.51 (0.36–0.66) | 0.49 | 0.49 | 0.11 | 0.48 |

b. Performance metrics of the models and US specialists on the secondary external test set B.

| Modality | AUC | Accuracy | Sensitivity | Specificity | PPV | NPV | Kappa value | F1 score |
|---|---|---|---|---|---|---|---|---|
| Clinical | 0.62 (0.51–0.72) | 0.60 (0.49–0.70) | 0.66 (0.52–0.80) | 0.58 (0.42–0.71) | 0.60 | 0.63 | 0.24 | 0.63 |
| BMUS | 0.71 (0.61–0.82) | 0.66 (0.54–0.75) | 0.73 (0.57–0.85) | 0.60 (0.44–0.74) | 0.64 | 0.69 | 0.33 | 0.68 |
| CDFI | 0.72 (0.62–0.83) | 0.67 (0.57–0.77) | 0.77 (0.64–0.89) | 0.58 (0.42–0.71) | 0.64 | 0.72 | 0.39 | 0.70 |
| Ensemble | 0.77 (0.68–0.87) | 0.72 (0.61–0.81) | 0.75 (0.61–0.86) | 0.69 (0.56–0.82) | 0.70 | 0.74 | 0.44 | 0.72 |
| Expert 1 | N/A | 0.66 (0.54–0.75) | 0.67 (0.51–0.80) | 0.66 (0.50–0.79) | 0.67 | 0.66 | 0.33 | 0.67 |
| Expert 2 | N/A | 0.58 (0.47–0.70) | 0.62 (0.47–0.76) | 0.55 (0.39–0.69) | 0.58 | 0.59 | 0.17 | 0.60 |
| Expert 3 | N/A | 0.52 (0.41–0.63) | 0.44 (0.30–0.60) | 0.59 (0.43–0.73) | 0.53 | 0.51 | 0.03 | 0.48 |
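
For readers who want to see how the threshold-based columns relate to one another, the following is a minimal sketch of how accuracy, sensitivity, specificity, PPV, NPV, kappa, and F1 can be derived from a single 2×2 confusion matrix. It assumes the kappa column is Cohen's kappa between predictions and the reference standard, which the table does not state. The function name `binary_metrics` and the counts in the usage example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives, and true negatives (illustrative inputs only).
    """
    n = tp + fp + fn + tn

    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall for the positive class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value

    # Cohen's kappa (assumed definition): agreement beyond chance between
    # the predicted labels and the reference labels.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    # F1 is the harmonic mean of PPV and sensitivity.
    f1 = 2 * tp / (2 * tp + fp + fn)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "kappa": kappa,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    example = binary_metrics(tp=40, fp=10, fn=10, tn=40)
    for name, value in example.items():
        print(f"{name}: {value:.2f}")
```

AUC is not included in the sketch because it depends on the full set of predicted scores rather than on one thresholded confusion matrix, which is consistent with the N/A entries for the human readers.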