2019 Sep 18;2(4):528–537. doi: 10.1093/jamiaopen/ooz040

Table 3.

Performance of distant labels and classification models using 146 manually reviewed gold-standard patients

Performance measurements^a, each given as point estimate (95% confidence interval):

| Model | Area under the curve (AUC) | Sensitivity | Specificity | PPV | NPV | F-1 score | Accuracy |
|---|---|---|---|---|---|---|---|
| Distant labels | NA | 0.889 (0.818, 0.957) | 0.797 (0.700, 0.889) | 0.810 (0.723, 0.899) | 0.881 (0.797, 0.952) | 0.848 (0.783, 0.908) | 0.842 (0.781, 0.904) |
| Classifier A (CCR) | 0.789 (0.716, 0.861) | 0.542 (0.423, 0.662) | 0.824 (0.727, 0.899) | 0.750 (0.633, 0.863) | 0.649 (0.543, 0.744) | 0.629 (0.521, 0.726) | 0.685 (0.603, 0.760) |
| Classifier B (NLP) | 0.917 (0.868, 0.966) | 0.861 (0.778, 0.933) | 0.878 (0.800, 0.944) | 0.873 (0.794, 0.943) | 0.867 (0.783, 0.936) | 0.867 (0.800, 0.925) | 0.870 (0.815, 0.925) |
| Classifier C (NLP + CCR) | 0.925 (0.880, 0.969) | 0.861 (0.778, 0.933) | 0.878 (0.800, 0.944) | 0.873 (0.794, 0.943) | 0.867 (0.783, 0.936) | 0.867 (0.800, 0.925) | 0.870 (0.815, 0.925) |
^a Note that positive predictive value (PPV), negative predictive value (NPV), F-1 score, and overall accuracy are highly dependent on the prevalence of the condition, which in our case is 72/146 = 0.49. The actual prevalence of recurrent metastatic breast cancer in our study population is likely to be much lower. However, sensitivity, specificity, and area under the curve (AUC) are intrinsic properties of the classifier and are insensitive to the prevalence of cases.^32,33
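To make the prevalence point concrete, here is a minimal Python sketch (an illustration, not the authors' code) that recomputes the threshold-based metrics from a 2×2 confusion matrix and then applies Bayes' rule to estimate PPV and NPV at a different prevalence while holding sensitivity and specificity fixed. The counts for Classifier B are our own back-calculation from its reported sensitivity and specificity and the 72 positive / 74 negative split; they are not reported in the table. The names `Confusion`, `metrics`, and `ppv_npv_at` are hypothetical, and the 5% prevalence is an arbitrary illustrative value, not an estimate from the study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # predicted positive, gold-standard positive
    fp: int  # predicted positive, gold-standard negative
    tn: int  # predicted negative, gold-standard negative
    fn: int  # predicted negative, gold-standard positive


def metrics(c: Confusion) -> dict[str, float]:
    """Threshold-based metrics for a binary classifier."""
    n = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
        "accuracy": (c.tp + c.tn) / n,
        "prevalence": (c.tp + c.fn) / n,
    }


def ppv_npv_at(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Bayes' rule: predictive values implied by fixed sensitivity and
    specificity when the condition occurs at the given prevalence."""
    p = prevalence
    ppv = sens * p / (sens * p + (1 - spec) * (1 - p))
    npv = spec * (1 - p) / (spec * (1 - p) + (1 - sens) * p)
    return ppv, npv


# Hypothetical counts, back-calculated (not reported) from Classifier B's
# sensitivity 0.861 and specificity 0.878 with 72 positives and 74 negatives.
classifier_b = Confusion(tp=62, fp=9, tn=65, fn=10)

m = metrics(classifier_b)
for name, value in m.items():
    print(f"{name:>11}: {value:.3f}")

# Illustrative lower prevalence; sensitivity and specificity are unchanged.
ppv, npv = ppv_npv_at(m["sensitivity"], m["specificity"], prevalence=0.05)
print(f"at 5% prevalence: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these counts the recomputed values round to the Classifier B row (0.861, 0.878, 0.873, 0.867, 0.867, 0.870). At the hypothetical 5% prevalence, the same sensitivity and specificity give a PPV of roughly 0.27 and an NPV of roughly 0.99, which illustrates why the footnote cautions that PPV, NPV, F-1 score, and accuracy would differ in a lower-prevalence population.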