Table 3.
COVID-19 discrimination performance of the machine learning (ML) model, compared with the clinical model, the radiologist consensus, and the combined radiologist + ML model.
Model | Positive/total, n | AUC^a, % (95% CI) | Accuracy, % (95% CI) | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) |
---|---|---|---|---|---|---|---|
Validation set 1 | |||||||
ML model | 40/605 | 89.9 (85.9–93.9) | 89.3 (86.5–91.6) | 57.5 (40.9–73.0) | 91.5 (88.9–93.7) | 32.6 (22.8–42.3) | 97.9 (96.6–99.1) |
Clinical model | 40/605 | N/A | 70.4 (66.6–74.0) | 30.0 (16.6–46.5) | 73.3 (69.4–76.9) | 7.4 (3.4–11.4) | 93.7 (91.4–95.9) |
Radiologist consensus | 40/605 | N/A | 73.2 (69.5–76.7) | 55.0 (38.5–70.7) | 74.5 (70.7–78.1) | 13.3 (8.1–18.4) | 95.9 (94.0–97.8) |
Radiologist + ML model | 40/605 | N/A | 68.4 (64.6–72.1) | 92.5 (79.6–98.4) | 66.7 (62.7–70.6) | 16.4 (11.6–21.3) | 99.2 (98.3–100.1) |
Validation set 2 | |||||||
ML model | 155/3121 | 91.3 (89.2–93.3) | 93.0 (92.0–93.9) | 57.4 (49.2–65.3) | 94.8 (94.0–95.6) | 36.8 (30.7–42.9) | 97.7 (97.2–98.3) |
Validation set 3 | |||||||
ML model | 27/382 | 95.8 (91.6–99.9) | 96.9 (94.6–98.4) | 77.8 (57.7–91.4) | 98.3 (96.4–99.4) | 77.8 (62.1–93.5) | 98.3 (97.0–99.7) |
Clinical model | 27/382 | N/A | 67.2 (62.2–71.9) | 57.7 (36.9–76.6) | 67.9 (62.7–72.8) | 11.8 (6.2–17.4) | 95.6 (93.0–98.1) |
Radiologist read^b | 27/382 | N/A | 92.3 (89.1–94.8) | 53.8 (33.4–73.4) | 95.1 (92.3–97.1) | 45.2 (27.6–62.7) | 96.5 (94.6–98.5) |
Radiologist + ML model | 27/382 | N/A | 55.5 (50.3–60.6) | 92.3 (74.9–99.1) | 52.7 (47.3–58.1) | 12.7 (8.0–17.4) | 98.9 (97.4–100.4) |
AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; CI, confidence interval; ML, machine learning.
^a AUC is not applicable to the clinical model, the radiologist reads, or the combined radiologist + ML model.
^b For validation set 3, only one radiologist interpreted the chest radiographs.
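
As a reading aid for the columns above, the following minimal sketch shows how accuracy, sensitivity, specificity, PPV and NPV follow from the four cells of a 2×2 confusion matrix. The names `ConfusionCounts` and `diagnostic_metrics` are illustrative, and the counts are hypothetical; they are not taken from the study, and the sketch does not reproduce the confidence intervals reported in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a 2x2 confusion matrix (hypothetical helper type)."""
    tp: int  # COVID-19 positive, classified positive
    fp: int  # COVID-19 negative, classified positive
    tn: int  # COVID-19 negative, classified negative
    fn: int  # COVID-19 positive, classified negative


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the table's point estimates in percent.

    Assumes every denominator is nonzero (at least one positive case,
    one negative case, one positive call and one negative call).
    """
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": 100 * (c.tp + c.tn) / total,
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.tn + c.fp),
        "ppv": 100 * c.tp / (c.tp + c.fp),
        "npv": 100 * c.tn / (c.tn + c.fn),
    }


# Hypothetical counts for illustration only; NOT data from the study.
example = ConfusionCounts(tp=30, fp=20, tn=140, fn=10)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

Unlike sensitivity and specificity, PPV and NPV depend on the share of positive cases in the evaluated set, so they are best compared between models within the same validation set.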