Table 2. Performance* of the crowd under each configuration of platform, cost, and quality.

| Platform | Cost | Quality | Mean AUC | Mean Worker Sensitivity | Mean Worker Specificity |
|---|---|---|---|---|---|
| CrowdFlower | Low | Low | 0.52 | 0.67 | 0.71 |
| CrowdFlower | High | Low | 0.73 | 0.66 | 0.70 |
| CrowdFlower | Low | High | 0.58 | 0.66 | 0.73 |
| CrowdFlower | High | High | 0.62 | 0.67 | 0.73 |
| Mechanical Turk | Low | Low | 0.48 | 0.65 | 0.72 |
| Mechanical Turk | High | Low | 0.60 | 0.65 | 0.73 |
| Mechanical Turk | Low | High | 0.44 | 0.66 | 0.74 |
| Mechanical Turk | High | High | 0.44 | 0.62 | 0.72 |
We measured crowd performance via a bootstrapped AUC and estimated worker sensitivity/specificity under each configuration of platform, cost, and quality. We then compared every pair of configurations to test whether performance differed significantly.

Performance differed significantly between all configuration pairs except:

- Sensitivity: CrowdFlower Low-Cost, Low-Quality vs. CrowdFlower High-Cost, High-Quality
- AUC: Mechanical Turk Low-Cost, High-Quality vs. Mechanical Turk High-Cost, High-Quality
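As a rough illustration of this kind of analysis, the sketch below bootstraps AUC for two configurations and compares them with a simple percentile-interval test. The synthetic data, the `bootstrap_auc` helper, the resample count, and the 95% interval criterion are all assumptions for the sake of the example; they are not the paper's exact bootstrap or significance procedure, and the worker sensitivity/specificity estimates are not reproduced here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def bootstrap_auc(y_true, y_score, n_boot=2000):
    """Resample (label, score) pairs with replacement and return AUC draws."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        # Skip degenerate resamples that contain only one class.
        if len(np.unique(y_true[idx])) < 2:
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    return np.array(aucs)

# Hypothetical gold labels and aggregated crowd scores for two configurations.
labels_a, scores_a = rng.integers(0, 2, 200), rng.random(200)
labels_b, scores_b = rng.integers(0, 2, 200), rng.random(200)

auc_a = bootstrap_auc(labels_a, scores_a)
auc_b = bootstrap_auc(labels_b, scores_b)

# Call the pair "significantly different" if the 95% bootstrap interval
# of the AUC difference excludes zero.
n = min(len(auc_a), len(auc_b))
diff = auc_a[:n] - auc_b[:n]
lo, hi = np.percentile(diff, [2.5, 97.5])
print(f"AUC A = {auc_a.mean():.2f}, AUC B = {auc_b.mean():.2f}")
print(f"95% CI of difference: [{lo:.2f}, {hi:.2f}] -> "
      f"{'significant' if lo > 0 or hi < 0 else 'not significant'}")
```

Running the same comparison over every pair of (platform, cost, quality) configurations, as the text describes, would reproduce the pattern reported above: all pairs differing except the two listed exceptions.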