Table 2.
Metrics | BOW0^a | BOW1^b | BOW2^c | BOW3^d | Best system^e (95% CI) | Domain experts average |
True positives | 29 | 45 | 31 | 30 | 49 | 36 |
True negatives | 207 | 200 | 210 | 209 | 205 | 206 |
False positives | 6 | 13 | 3 | 4 | 8 | 7 |
False negatives | 40 | 24 | 38 | 39 | 20 | 33 |
Sensitivity | .420 | .652 | .449 | .435 | .710 (.683-.737) | .527 |
Positive predictive value | .829 | .776 | .912 | .882 | .860 (.833-.886) | .848 |
F1 measure | .556 | .709 | .602 | .583 | .778 | .650 |
Specificity | .972 | .939 | .986 | .981 | .962 (.951-.974) | .966 |
Accuracy | .837 | .869 | .855 | .847 | .901 (.883-.918) | .862 |
a BOW0: Initial bag-of-words.
b BOW1: First refined bag-of-words.
c BOW2: Second (more specific) refined bag-of-words.
d BOW3: Third (most specific) refined bag-of-words.
e BOW1 with refined dictionary.
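
The rate rows in Table 2 follow from the four count rows by the standard confusion-matrix definitions. The sketch below recomputes them for two of the columns as a consistency check. The class name `Confusion` and the dictionary `COUNTS` are illustrative names, not taken from the source. The 95% confidence intervals are not reproduced, since the table does not state how they were obtained, and the domain experts column is left out because it is labeled an average, so ratios of its averaged counts need not equal the averaged rates. Small last-digit differences from the printed values are possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one system (illustrative helper)."""
    tp: int
    tn: int
    fp: int
    fn: int

    def sensitivity(self) -> float:
        # Share of actual positives that were detected.
        return self.tp / (self.tp + self.fn)

    def ppv(self) -> float:
        # Positive predictive value: share of flagged items that are positive.
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        # Harmonic mean of sensitivity and PPV, written in terms of counts.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

    def specificity(self) -> float:
        # Share of actual negatives that were correctly left unflagged.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total


# Counts copied from the BOW1 and best-system columns of Table 2.
COUNTS = {
    "BOW1": Confusion(tp=45, tn=200, fp=13, fn=24),
    "Best system": Confusion(tp=49, tn=205, fp=8, fn=20),
}

if __name__ == "__main__":
    for name, c in COUNTS.items():
        print(
            f"{name}: sens={c.sensitivity():.3f} ppv={c.ppv():.3f} "
            f"f1={c.f1():.3f} spec={c.specificity():.3f} acc={c.accuracy():.3f}"
        )
```

Both columns sum to 282 cases with 69 positives, so every rate in a column shares the same denominators and the systems can be compared directly.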