Table 2.
Metric | SH test set (n = 200): Reader 2 | SH test set: Reader 3 | SH test set: Averageb | SH test set: CNNE2 | SMC set (n = 200): Reader 3 | SMC set: Reader 4 | SMC set: Average | SMC set: CNNE2 | CBMC set (n = 200): Reader 1 | CBMC set: Reader 4 | CBMC set: Average | CBMC set: CNNE2 | KUH set (n = 200): Reader 1 | KUH set: Reader 2 | KUH set: Average | KUH set: CNNE2
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sensitivity (%)a | 89.2 (83.5–93.1) | 94.0 (89.2–96.7) | 91.6 (87.3–94.5) | 83.7 (77.3–88.6) | 93.7 (88.3–96.7) | 88.7 (82.4–93.0) | 91.2 (86.4–94.4) | 78.2 (70.6–84.2) | 89.0 (82.0–93.5) | 90.7 (84.0–94.8) | 89.8 (84.4–93.5) | 94.1 (88.1–97.2) | 91.8 (84.5–95.9) | 91.8 (84.5–95.9) | 91.8 (85.5–95.5) | 91.8 (84.5–95.9) |
Specificity (%)a | 67.7 (50.5–81.1) | 50.0 (33.8–66.2) | 58.8 (44.1–72.1) | 91.2 (76.0–97.1) | 39.7 (28.0–52.7) | 56.9 (44.0–68.9) | 48.3 (37.1–59.6) | 93.1 (83.0–97.4) | 67.1 (56.2–76.4) | 45.1 (34.7–56.0) | 56.1 (46.5–65.2) | 62.2 (51.3–72.0) | 60.8 (51.0–69.8) | 71.6 (62.1–79.5) | 66.2 (57.7–73.7) | 59.8 (50.0–68.9) |
Accuracy (%)a | 85.5 (79.9–89.7) | 86.5 (81.0–90.6) | 86.0 (81.3–89.7) | 85.0 (79.4–89.3) | 78.0 (71.7–83.2) | 79.5 (73.3–84.5) | 78.8 (73.2–83.4) | 82.5 (76.6–87.2) | 80.0 (73.9–85.0) | 72.0 (65.4–77.8) | 76.0 (70.4–80.8) | 81.0 (75.0–85.9) | 76.0 (69.6–81.4) | 81.5 (75.5–86.3) | 78.8 (73.3–83.4) | 75.5 (69.1–81.0) |
PPV (%)a | 93.1 (87.9–96.1) | 90.2 (84.8–93.8) | 91.6 (86.7–94.8) | 97.9 (93.7–99.3) | 79.2 (72.4–84.7) | 83.4 (76.6–88.6) | 81.2 (74.7–86.3) | 96.5 (91.1–98.7) | 79.6 (71.8–85.6) | 70.4 (62.7–77.1) | 74.7 (67.3–80.8) | 78.2 (70.6–84.2) | 69.2 (60.8–76.6) | 75.6 (67.1–82.5) | 72.3 (64.3–79.1) | 68.7 (60.3–76.1) |
NPV (%)a | 56.1 (40.8–70.3) | 63.0 (43.8–78.8) | 58.8 (43.8–72.4) | 53.5 (40.7–65.8) | 71.9 (54.2–84.7) | 67.4 (53.2–78.9) | 69.1 (55.4–80.2) | 63.5 (52.8–73.0) | 80.9 (69.8–88.6) | 77.1 (63.2–86.8) | 79.3 (68.9–86.9) | 87.9 (76.8–94.1) | 88.6 (78.8–94.2) | 90.1 (81.5–95.0) | 89.4 (81.3–94.3) | 88.4 (78.5–94.1) |
AUC | 0.842 (0.771–0.914) | 0.838 (0.762–0.913) | 0.840 (0.806–0.873) | 0.932 (0.885–0.978) | 0.799 (0.734–0.863) | 0.847 (0.793–0.901) | 0.823 (0.706–0.940) | 0.899 (0.858–0.940) | 0.850 (0.798–0.902) | 0.810 (0.754–0.866) | 0.830 (0.752–0.909) | 0.885 (0.839–0.930) | 0.842 (0.790–0.894) | 0.897 (0.855–0.940) | 0.870 (0.752–0.987) | 0.854 (0.800–0.908) |
F1 (%) | 91.1 | 92.0 | 91.6 | 90.3 | 85.8 | 86.0 | 85.9 | 86.4 | 84.0 | 79.3 | 81.5 | 85.4 | 79.0 | 83.0 | 80.9 | 78.6 |
a To calculate the diagnostic performance for each cohort, a cut-off value of 0.6 for cancer probability was used for CNNE2, and ACR TI-RADS category 4 was used for the readers.
b Average performance of the two readers in each set.
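
The threshold-based metrics in the table follow from a confusion matrix once the cut-off is fixed. The sketch below is illustrative only and is not the authors' code: the function names, the 1 = malignant / 0 = benign label encoding, and the choice to treat a probability equal to the cut-off as positive are all assumptions. It does not reproduce the confidence intervals or the AUC.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(probabilities, labels, cutoff=0.6):
    """Compute threshold-based metrics as fractions in [0, 1].

    Assumptions (not stated in the source): labels are 1 for malignant and
    0 for benign, and a probability >= cutoff counts as a positive call.
    """
    tp = fp = tn = fn = 0
    for prob, label in zip(probabilities, labels):
        predicted_positive = prob >= cutoff
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def f1_from_sensitivity_and_ppv(sensitivity, ppv):
    """F1 is the harmonic mean of sensitivity (recall) and PPV (precision)."""
    return _ratio(2 * sensitivity * ppv, sensitivity + ppv)
```

As a consistency check, `f1_from_sensitivity_and_ppv(89.2, 93.1)` gives about 91.1, which agrees with the F1 value reported for Reader 2 on the SH test set. Small differences can appear for other columns because the tabulated inputs are already rounded.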