Table 3.
Turker consensus in Phase III.
| Phase and trial | Number correct (mean)<sup>a</sup> | % correct (mean) | Number correct (mode)<sup>a</sup> | % correct (mode) | Sensitivity (%)<sup>b</sup> | Specificity (%)<sup>b</sup> |
| --- | --- | --- | --- | --- | --- | --- |
| Phase I: Four-category rating | 5 | 26.3 | 11 | 57.9 | 100.0 | 100.0 |
| Phase III: Trial 1 (improved training) | 4 | 21.1<sup>c</sup> | 8<sup>d</sup> | 42.1 | 100.0 | 57.1 |
| Phase III: Trial 2 (raised approval rating) | 10 | 52.6 | 11<sup>e</sup> | 57.9 | 100.0 | 100.0 |
| Phase III: Trial 3 (Master Graders) | 7 | 36.8 | 11 | 57.9 | 100.0 | 100.0 |
<sup>a</sup>Calculated by level (eg, Turker consensus matches expert designation as normal, mild, moderate, or severe).
<sup>b</sup>Calculated for normal versus any disease level using the mode consensus score.
<sup>c</sup>After excluding a single Turker with systematically higher scores, 42.1% correct.
<sup>d</sup>Three images had no mode and were considered incorrect for “Number correct” and “% correct” but recoded as abnormal for sensitivity and specificity.
<sup>e</sup>One image had no mode and was considered incorrect for “Number correct” and “% correct” but recoded as abnormal for sensitivity and specificity.
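
The scoring rules in the footnotes can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes grades are coded 0 = normal, 1 = mild, 2 = moderate, 3 = severe, and the function names (`mode_or_none`, `summarize`) and the four-image toy data set are hypothetical. An image whose Turker grades have no single mode counts as incorrect at the level-by-level stage and is recoded as abnormal for sensitivity and specificity.

```python
from collections import Counter

NORMAL = 0  # assumed coding: 0=normal, 1=mild, 2=moderate, 3=severe


def mode_or_none(grades):
    """Return the single most common grade, or None if the top count is tied."""
    counts = Counter(grades).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def summarize(turker_grades, expert_grades):
    """Score mode consensus against expert grades (hypothetical helper)."""
    modes = [mode_or_none(g) for g in turker_grades]

    # Level-by-level agreement: an image with no mode counts as incorrect.
    n_correct = sum(m is not None and m == e
                    for m, e in zip(modes, expert_grades))

    # Normal vs any disease: an image with no mode is recoded as abnormal.
    pred_abnormal = [m is None or m != NORMAL for m in modes]
    true_abnormal = [e != NORMAL for e in expert_grades]

    tp = sum(p and t for p, t in zip(pred_abnormal, true_abnormal))
    tn = sum(not p and not t for p, t in zip(pred_abnormal, true_abnormal))
    n_pos = sum(true_abnormal)
    n_neg = len(true_abnormal) - n_pos

    return {
        "number_correct": n_correct,
        "pct_correct": 100.0 * n_correct / len(expert_grades),
        "sensitivity": 100.0 * tp / n_pos if n_pos else None,
        "specificity": 100.0 * tn / n_neg if n_neg else None,
    }


# Toy data (not from the study): four images, three Turker grades each.
toy_turker = [[0, 0, 1], [1, 2, 2], [0, 1, 2], [3, 3, 3]]
toy_expert = [0, 1, 0, 3]
result = summarize(toy_turker, toy_expert)
```

In the toy data the third image has no mode, so it lowers the number correct and, because the expert graded it normal, it also lowers specificity. This is the same mechanism footnote d describes for Trial 1.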