Table 2.
Per-reader diagnostic performance without vs. with AI assistance.
| Reader (years of experience) | Metric | Without AI (95% CI) | With AI (95% CI) | Δ (95% CI) | p-value |
|---|---|---|---|---|---|
| Pooled | AUROC | 0.921 (0.889–0.954) | 0.953 (0.930–0.975) | +0.032 (0.013 to 0.050) | 0.002 |
| | AUPRC | 0.932 (0.921–0.943) | 0.933 (0.923–0.942) | +0.001 (−0.013 to 0.015) | 0.944 |
| | Sensitivity | 94.2% (90.6–96.5) | 96.3% (93.5–97.9) | +2.1% (−1.6 to 5.6) | 0.243 |
| | Specificity | 64.0% (57.9–69.6) | 71.6% (65.5–76.9) | +7.6% (−0.5 to 15.8) | 0.069 |
| | PPV | 58.4% (51.8–64.7) | 63.9% (57.1–70.2) | +5.5% (−3.7 to 14.8) | 0.243 |
| | NPV | 86.3% (80.1–90.9) | 90.8% (85.1–94.4) | +4.5% (−2.6 to 11.5) | 0.219 |
| | Accuracy | 79.1% (75.1–82.6) | 83.9% (80.2–87.0) | +4.8% (−2.4 to 11.9) | 0.061 |
| Reader 1 (17) | AUROC | 0.944 (0.918–0.970) | 0.952 (0.926–0.978) | +0.008 (−0.044 to 0.060) | 0.339 |
| | AUPRC | 0.922 (0.895–0.945) | 0.947 (0.927–0.965) | +0.025 (−0.003 to 0.057) | 0.495 |
| | Sensitivity | 90.7% (84.3–95.1) | 96.9% (92.3–99.1) | +6.2% (−2.8 to 14.8) | 0.013 |
| | Specificity | 75.2% (66.8–82.4) | 63.6% (54.6–71.9) | −11.6% (−27.8 to 5.1) | 0.005 |
| | PPV | 78.5% (71.1–84.8) | 72.7% (65.4–79.2) | −5.8% (−19.4 to 8.1) | 0.015 |
| | NPV | 89.0% (81.6–94.2) | 95.3% (88.5–98.7) | +6.3% (−5.7 to 17.1) | 0.019 |
| | Accuracy | 82.9% (77.8–87.3) | 80.2% (74.8–84.9) | −2.7% (−12.5 to 7.1) | 0.296 |
| Reader 2 (1) | AUROC | 0.913 (0.878–0.948) | 0.937 (0.908–0.966) | +0.024 (−0.040 to 0.088) | 0.078 |
| | AUPRC | 0.917 (0.890–0.942) | 0.917 (0.888–0.942) | +0.001 (−0.039 to 0.038) | 0.965 |
| | Sensitivity | 96.9% (92.3–99.1) | 94.6% (89.1–97.8) | −2.3% (−10.0 to 5.5) | 0.450 |
| | Specificity | 51.9% (43.0–60.8) | 72.9% (64.3–80.3) | +21.0% (3.5 to 37.3) | <0.001 |
| | PPV | 66.8% (59.6–73.5) | 77.7% (70.4–84.0) | +10.9% (−3.1 to 24.4) | <0.001 |
| | NPV | 94.4% (86.2–98.4) | 93.1% (86.2–97.2) | −1.3% (−12.2 to 11.0) | 0.660 |
| | Accuracy | 74.4% (68.6–79.6) | 83.7% (78.6–88.0) | +9.3% (−1.0 to 19.4) | <0.001 |
| Reader 3 (14) | AUROC | 0.931 (0.902–0.960) | 0.974 (0.960–0.989) | +0.043 (0.000 to 0.087) | <0.001 |
| | AUPRC | 0.931 (0.903–0.953) | 0.942 (0.920–0.960) | +0.011 (−0.018 to 0.043) | 0.546 |
| | Sensitivity | 91.5% (85.3–95.7) | 96.1% (91.2–98.7) | +4.6% (−4.5 to 13.4) | 0.114 |
| | Specificity | 73.6% (65.2–81.0) | 81.4% (73.6–87.7) | +7.8% (−7.4 to 22.5) | 0.034 |
| | PPV | 77.6% (70.2–84.0) | 83.8% (76.8–89.3) | +6.2% (−7.2 to 19.1) | 0.008 |
| | NPV | 85.4% (80.3–89.5) | 86.6% (81.8–90.5) | +1.2% (−7.7 to 10.2) | 0.604 |
| | Accuracy | 82.6% (77.4–87.0) | 88.8% (84.3–92.3) | +6.2% (−2.7 to 14.9) | 0.005 |
| Reader 4 (38) | AUROC | 0.942 (0.916–0.969) | 0.970 (0.953–0.987) | +0.028 (−0.016 to 0.071) | 0.008 |
| | AUPRC | 0.965 (0.949–0.977) | 0.938 (0.915–0.959) | −0.027 (−0.052 to 0.002) | 0.478 |
| | Sensitivity | 95.3% (90.2–98.3) | 97.7% (93.4–99.5) | +2.4% (−4.9 to 9.3) | 0.450 |
| | Specificity | 76.0% (67.7–83.1) | 72.9% (64.3–80.3) | −3.1% (−18.8 to 12.6) | 0.556 |
| | PPV | 79.9% (72.7–85.9) | 78.3% (71.1–84.4) | −1.6% (−14.8 to 11.7) | 0.533 |
| | NPV | 88.2% (83.5–92.0) | 86.6% (81.8–90.5) | −1.6% (−10.2 to 7.0) | 0.462 |
| | Accuracy | 85.7% (80.8–89.7) | 85.3% (80.3–89.4) | −0.4% (−9.4 to 8.6) | 1.000 |
| Reader 5 (1) | AUROC | 0.890 (0.851–0.930) | 0.941 (0.912–0.970) | +0.051 (−0.018 to 0.119) | 0.003 |
| | AUPRC | 0.948 (0.913–0.974) | 0.908 (0.871–0.938) | −0.040 (−0.086 to 0.004) | 0.531 |
| | Sensitivity | 96.9% (92.3–99.1) | 97.7% (93.4–99.5) | +0.8% (−5.7 to 7.2) | 1.000 |
| | Specificity | 41.1% (32.5–50.1) | 61.2% (52.3–69.7) | +20.1% (2.2 to 37.2) | <0.001 |
| | PPV | 62.2% (55.1–68.9) | 71.6% (64.3–78.1) | +9.4% (−4.6 to 23.0) | <0.001 |
| | NPV | 88.6% (82.7–93.0) | 86.6% (81.8–90.5) | −2.0% (−11.2 to 7.8) | 0.436 |
| | Accuracy | 69.0% (63.0–74.6) | 79.5% (74.0–84.2) | +10.5% (−0.6 to 21.2) | <0.001 |
| Reader 6 (1) | AUROC | 0.903 (0.865–0.941) | 0.937 (0.904–0.970) | +0.034 (−0.037 to 0.105) | 0.081 |
| | AUPRC | 0.911 (0.880–0.937) | 0.945 (0.922–0.965) | +0.035 (−0.003 to 0.072) | 0.493 |
| | Sensitivity | 93.8% (88.1–97.3) | 94.6% (89.1–97.8) | +0.8% (−8.2 to 9.7) | 1.000 |
| | Specificity | 65.9% (57.0–74.0) | 77.5% (69.3–84.4) | +11.6% (−4.7 to 27.4) | 0.012 |
| | PPV | 73.3% (65.9–79.9) | 80.8% (73.6–86.7) | +7.5% (−6.3 to 20.8) | 0.006 |
| | NPV | 86.5% (81.5–90.6) | 86.6% (81.8–90.5) | +0.1% (−8.8 to 9.0) | 0.967 |
| | Accuracy | 79.8% (74.4–84.6) | 86.0% (81.2–90.0) | +6.2% (−3.4 to 15.6) | 0.015 |
AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision–recall curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value; Δ, absolute change (with AI minus without AI), expressed in percentage points for the percentage-based metrics.
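The threshold-based rows of the table (sensitivity, specificity, PPV, NPV, accuracy) and the Δ column follow the standard 2×2 confusion-matrix definitions. The Python sketch below illustrates that arithmetic under stated assumptions: the counts are hypothetical, the names `ConfusionMatrix`, `metrics`, and `delta` are illustrative rather than taken from the study, and it does not attempt to reproduce the confidence-interval or p-value methods, which this table does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 counts for one reader in one reading condition (hypothetical values)."""
    tp: int  # reference-positive cases called positive
    fn: int  # reference-positive cases called negative
    tn: int  # reference-negative cases called negative
    fp: int  # reference-negative cases called positive


def metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return the threshold-based metrics as percentages."""
    total = cm.tp + cm.fn + cm.tn + cm.fp
    return {
        "Sensitivity": 100 * cm.tp / (cm.tp + cm.fn),
        "Specificity": 100 * cm.tn / (cm.tn + cm.fp),
        "PPV": 100 * cm.tp / (cm.tp + cm.fp),
        "NPV": 100 * cm.tn / (cm.tn + cm.fn),
        "Accuracy": 100 * (cm.tp + cm.tn) / total,
    }


def delta(without_ai: dict[str, float], with_ai: dict[str, float]) -> dict[str, float]:
    """Absolute change (with AI minus without AI), in percentage points."""
    return {name: with_ai[name] - without_ai[name] for name in without_ai}


if __name__ == "__main__":
    # Hypothetical counts for a single reader; not data from the study.
    before = ConfusionMatrix(tp=90, fn=10, tn=60, fp=40)
    after = ConfusionMatrix(tp=95, fn=5, tn=70, fp=30)
    for name, value in delta(metrics(before), metrics(after)).items():
        print(f"{name}: {value:+.1f} percentage points")
```

Because Δ is a difference of two percentages, it is an absolute change in percentage points: Reader 1's specificity moving from 75.2% to 63.6% gives Δ = −11.6, whereas the corresponding relative change would be about −15%.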