Table 2.
| | Sensitivity (95% CI); p value* | Specificity (95% CI); p value* | PPV (95% CI); p value† | NPV (95% CI); p value† | Recall rate (95% CI); p value† | Arbitration rate (95% CI); p value* |
---|---|---|---|---|---|---|
Standalone AI scenario | ||||||
First reader | 63.7 (61.6–65.8); ref. | 97.8 (97.7–97.8); ref. | 18.7 (17.8–19.6); ref. | 99.7 (99.7–99.7); ref. | 2.7 (2.6–2.8); ref. | NA |
Standalone AIsens | 63.7 (61.6–65.8); >0.99 | 96.5 (96.4–96.5); <0.0001 | 12.6 (11.9–13.2); <0.0001 | 99.7 (99.7–99.7); 0.71 | 4.0 (3.9–4.1); <0.0001 | NA |
Standalone AIspec | 58.6 (56.5–60.8); <0.0001 | 97.8 (97.7–97.8); 0.95 | 17.4 (16.5–18.3); 0.01 | 99.7 (99.6–99.7); 0.0002 | 2.7 (2.6–2.7); 0.24 | NA |
AI-integrated screening scenario | ||||||
Combined reading | 73.9 (72.0–75.8); ref. | 97.9 (97.9–98.0); ref. | 22.0 (21.0–23.0); ref. | 99.8 (99.8–99.8); ref. | 2.7 (2.6–2.7); ref. | 2.9 (2.8–3.0); ref. |
Integrated AIsens | 76.2 (74.3–78.0); 0.0004 | 97.3 (97.2–97.3); <0.0001 | 18.1 (17.3–19.0); <0.0001 | 99.8 (99.8–99.8); 0.07 | 3.3 (3.3–3.4); <0.0001 | 5.1 (5.1–5.2); <0.0001 |
Integrated AIspec | 74.6 (72.6–76.4); 0.32 | 97.9 (97.8–97.9); 0.54 | 22.0 (21.0–23.0); 0.99 | 99.8 (99.8–99.8); 0.60 | 2.7 (2.6–2.7); 0.49 | 4.0 (3.9–4.1); <0.0001 |
Data are % (95% CI); p value. PPV=positive predictive value. NPV=negative predictive value. AIsens=artificial intelligence score cut-off point matched at mean first reader sensitivity. AIspec=artificial intelligence score cut-off point matched at mean first reader specificity. NA=not applicable. *p values were calculated using McNemar’s test. †p values were calculated using exact binomial test.
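
For readers who want to reproduce this kind of summary from their own counts, the following is a minimal sketch of how the percentages in the table and a paired McNemar comparison could be computed. The function names, the input counts, and the choice of the exact (binomial) form of McNemar's test are assumptions made for illustration; the table does not state which variant of the test was used, and the definition of recall rate as all positive calls divided by all screened examinations is likewise an assumption.

```python
from math import comb


def diagnostic_metrics(tp, fp, fn, tn):
    """Return screening metrics as percentages from confusion-matrix counts.

    tp, fp, fn, tn are hypothetical counts of true positives, false
    positives, false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv": 100 * tp / (tp + fp),
        "npv": 100 * tn / (tn + fn),
        # Assumed definition: every positive call leads to a recall.
        "recall_rate": 100 * (tp + fp) / total,
    }


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p value from discordant pair counts.

    b = cases where only reader A was correct,
    c = cases where only reader B was correct.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(diagnostic_metrics(tp=60, fp=240, fn=40, tn=9660))
    print(mcnemar_exact(b=1, c=9))
```

Only the discordant pairs enter McNemar's test, which is why it suits paired comparisons such as sensitivity or specificity of two readers on the same examinations.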