2021 Jul 20;32(1):725–736. doi: 10.1007/s00330-021-08132-0

Table 3.

Binary classifier performance evaluated on reference-standard report labels and reference-standard image labels. Comparison is made with a logistic regression model using mean word2vec embeddings and N-grams (N = 1, 2, 3), which has previously been shown to classify head CT reports accurately [6]. AUC-ROC, balanced accuracy, sensitivity, specificity, and F1 score are reported with their corresponding 95% confidence intervals.

| Model | Test set | AUC-ROC | Balanced accuracy (%) | Sensitivity (%) | Specificity (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Our model | Report label (n = 600) | 0.991 ± 0.004 | 95.9 ± 0.2 | 96.5 ± 0.1 | 95.3 ± 0.2 | 96.2 ± 0.2 |
| Our model | Image label (n = 250) | 0.973 ± 0.004 | 91.8 ± 0.6 | 91.4 ± 0.3 | 92.1 ± 0.5 | 93.0 ± 0.5 |
| Word2vec model [6] | Report label (n = 600) | 0.969 ± 0.003 | 90.1 ± 0.3 | 89.1 ± 0.2 | 91.0 ± 0.2 | 90.3 ± 0.2 |
| Word2vec model [6] | Image label (n = 250) | 0.935 ± 0.004 | 86.2 ± 0.6 | 85.1 ± 0.4 | 87.3 ± 0.5 | 85.9 ± 0.5 |
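
For readers who want to relate the columns to one another, the following is a minimal sketch of how the threshold-based metrics in the table are conventionally derived from a binary confusion matrix. It is not the authors' evaluation code. The function name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; the sensitivity, specificity, and balanced-accuracy values in `TABLE_ROWS` are copied from the table. Balanced accuracy is assumed here to be the mean of sensitivity and specificity, which agrees with the reported point estimates to within rounding.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only; these are NOT from the study.
example = binary_metrics(tp=90, fp=20, tn=80, fn=10)

# Point estimates copied from the table:
# (sensitivity %, specificity %, reported balanced accuracy %)
TABLE_ROWS = {
    "Our model, report labels": (96.5, 95.3, 95.9),
    "Our model, image labels": (91.4, 92.1, 91.8),
    "Word2vec model, report labels": (89.1, 91.0, 90.1),
    "Word2vec model, image labels": (85.1, 87.3, 86.2),
}

# Difference between the recomputed and the reported balanced accuracy,
# in percentage points, for each row of the table.
balanced_accuracy_gap = {
    name: abs((sens + spec) / 2 - reported)
    for name, (sens, spec, reported) in TABLE_ROWS.items()
}

if __name__ == "__main__":
    for name, gap in balanced_accuracy_gap.items():
        print(f"{name}: gap = {gap:.2f} percentage points")
```

AUC-ROC is not included in the sketch because it depends on the full set of predicted scores rather than on a single confusion matrix.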