Figure 4:
Average difference in expected performance (Δ performance) between the average pair in a pairing scheme and the average single reader in the cohort, for each measure and case set, computed across all 10,395 pairing schemes in each cohort. Bootstrap resampling showed statistically significant differences (p < .05) for Δ sensitivity, Δ specificity, and Δ proportion correct in all three case sets.
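The count of 10,395 matches the number of ways to split 12 readers into 6 disjoint pairs (11!! = 11 × 9 × 7 × 5 × 3 × 1). The sketch below assumes, based only on that count, that each cohort has 12 readers and that a pairing scheme is one such split. It enumerates every scheme and averages the per-scheme Δ. `pair_perf` is a hypothetical stand-in for however a pair's combined reading is scored; the caption does not specify that rule.

```python
from statistics import mean


def pairing_schemes(readers):
    """Yield every way to split an even-length list of readers into disjoint unordered pairs."""
    if not readers:
        yield []
        return
    first, rest = readers[0], readers[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for scheme in pairing_schemes(remaining):
            yield [(first, partner)] + scheme


def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 (or 2)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def mean_delta(reader_perf, pair_perf, schemes):
    """Average over schemes of (mean pair performance - mean single-reader performance).

    reader_perf: dict mapping reader id -> single-reader performance.
    pair_perf: hypothetical function (a, b) -> combined performance of the pair.
    """
    single = mean(reader_perf.values())
    return mean(mean(pair_perf(a, b) for a, b in s) - single for s in schemes)


schemes = list(pairing_schemes(list(range(12))))
print(len(schemes))          # 10395
print(double_factorial(11))  # 10395
```

Because the average single reader does not depend on the scheme, Δ reduces to the mean pair performance (averaged over schemes) minus a constant. The recursive enumeration is exhaustive rather than sampled, which is feasible here because 10,395 schemes is small.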