2014 Oct 30;16(10):e233. doi: 10.2196/jmir.3807

Table 3.

Turker consensus in Phase III.

	Number correct (mean)^a	% correct (mean)	Number correct (mode)^a	% correct (mode)	Sensitivity^b	Specificity^b
Phase I: Four-category rating	5	26.3	11	57.9	100.0	100.0
Phase III: Trial 1 (improved training)	4	21.1^c	8^d	42.1	100.0	57.1
Phase III: Trial 2 (raised approval rating)	10	52.6	11^e	57.9	100.0	100.0
Phase III: Trial 3 (Master Graders)	7	36.8	11	57.9	100.0	100.0

^aCalculated by level (eg, Turker consensus matches expert designation as normal, mild, moderate, or severe).

^bCalculated for normal versus any disease level using the mode consensus score.

^cAfter excluding a single Turker with systematically higher scores, the mean consensus was 42.1% correct.

^dThree images had no mode and were considered incorrect for “Number correct” and “% correct” but were recoded as abnormal for sensitivity and specificity.

^eOne image had no mode and was considered incorrect for “Number correct” and “% correct” but was recoded as abnormal for sensitivity and specificity.
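The scoring rules in the footnotes can be illustrated with a minimal sketch. It assumes a hypothetical numeric coding (0 = normal, 1 = mild, 2 = moderate, 3 = severe) and uses invented toy ratings, not the study data. The function names `mode_consensus` and `summarize` are illustrative only and do not come from the article. Under these assumptions, an image with no unique mode counts as incorrect for level agreement but is treated as abnormal when computing sensitivity and specificity.

```python
from collections import Counter

NORMAL = 0  # hypothetical coding: 0 = normal, 1 = mild, 2 = moderate, 3 = severe


def mode_consensus(scores):
    """Return the single most common score, or None when the top count is tied."""
    counts = Counter(scores).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no unique mode
    return counts[0][0]


def summarize(turker_scores, expert_grades):
    """Score the mode consensus against expert grades, image by image."""
    n_correct = tp = fn = tn = fp = 0
    for scores, expert in zip(turker_scores, expert_grades):
        mode = mode_consensus(scores)
        # Level agreement: an image with no mode counts as incorrect.
        if mode is not None and mode == expert:
            n_correct += 1
        # Normal versus any disease: an image with no mode is recoded as abnormal.
        called_abnormal = mode is None or mode != NORMAL
        truly_abnormal = expert != NORMAL
        if truly_abnormal and called_abnormal:
            tp += 1
        elif truly_abnormal:
            fn += 1
        elif called_abnormal:
            fp += 1
        else:
            tn += 1
    return {
        "n_correct": n_correct,
        "pct_correct": 100.0 * n_correct / len(expert_grades),
        # Guard against a class being absent from the sample.
        "sensitivity": 100.0 * tp / (tp + fn) if (tp + fn) else None,
        "specificity": 100.0 * tn / (tn + fp) if (tn + fp) else None,
    }


# Toy ratings (invented for illustration): one list of Turker scores per image.
turker_scores = [
    [0, 0, 0, 1],  # mode 0
    [1, 1, 2, 2],  # tie, so no mode
    [2, 2, 3, 1],  # mode 2
    [1, 1, 1, 3],  # mode 1
]
expert_grades = [0, 0, 2, 3]

result = summarize(turker_scores, expert_grades)
print(result)
```

In this toy example the tied image is a normal image, so it lowers specificity while leaving sensitivity unaffected, which mirrors how a no-mode image is handled in footnotes d and e.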