TABLE 2.
Method | AUROC (95% CI) | p-value | Sensitivity, % (95% CI) | Specificity, % (95% CI) | Accuracy, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) |
--- | --- | --- | --- | --- | --- | --- | --- |
Internal validation | |||||||
3D multi-task DL | 0.949 (0.930–0.969) | Reference | 88.0 (80.9–95.9) | 91.6 (81.7–97.2) | 89.4 (86.6–92.1) | 92.2 (85.5–97.0) | 87.1 (81.3–94.7) |
Average RNFL thickness | 0.913 (0.888–0.939) | **< 0.001** | 80.1 (72.2–88.4) | 92.5 (84.0–97.2) | 85.5 (82.2–88.6) | 92.0 (85.7–96.9) | 80.5 (75.0–86.7) |
3D single-task DL | 0.941 (0.920–0.961) | 0.53 | 86.3 (73.4–95.0) | 88.3 (78.4–98.6) | 87.0 (84.1–90.1) | 89.4 (82.8–98.4) | 85.2 (76.4–93.5) |
2D multi-task DL | 0.940 (0.919–0.961) | 0.53 | 84.7 (78.4–92.1) | 92.5 (84.0–97.2) | 88.3 (85.5–91.0) | 92.6 (86.2–97.0) | 84.4 (79.4–90.7) |
External testing 1 | |||||||
3D multi-task DL | 0.890 (0.864–0.917) | Reference | 78.9 (70.4–86.4) | 86.1 (77.3–92.8) | 82.0 (78.7–85.1) | 86.9 (81.4–92.4) | 77.7 (72.1–83.3) |
Average RNFL thickness | 0.890 (0.864–0.916) | 0.96 | 69.9 (63.7–76.7) | 94.8 (89.2–98.0) | 81.2 (78.3–84.2) | 94.0 (88.7–97.5) | 73.0 (69.5–77.3) |
3D single-task DL | 0.893 (0.867–0.919) | 0.88 | 82.3 (70.1–89.8) | 81.7 (72.9–92.0) | 81.8 (78.7–85.0) | 84.2 (79.2–91.6) | 79.7 (72.0–86.2) |
2D multi-task DL | 0.900 (0.876–0.925) | 0.58 | 82.7 (75.5–91.8) | 82.1 (70.5–88.8) | 82.3 (78.9–85.3) | 84.3 (78.4–89.2) | 80.2 (74.7–88.2) |
External testing 2 | |||||||
3D multi-task DL | 0.903 (0.867–0.939) | Reference | 77.6 (67.1–86.7) | 91.9 (83.1–98.4) | 84.3 (80.2–88.4) | 92.1 (85.0–98.2) | 78.4 (72.0–85.4) |
Average RNFL thickness | 0.915 (0.881–0.949) | 0.38 | 85.3 (78.3–93.7) | 88.7 (77.4–93.6) | 86.5 (82.4–90.3) | 89.4 (82.5–94.1) | 83.7 (77.9–91.6) |
3D single-task DL | 0.883 (0.841–0.925) | 0.48 | 83.9 (69.2–93.7) | 83.1 (70.2–94.4) | 83.2 (79.0–87.3) | 85.2 (78.1–94.1) | 81.8 (72.1–90.8) |
2D multi-task DL | 0.882 (0.843–0.922) | 0.45 | 81.1 (67.1–89.5) | 83.1 (73.4–93.6) | 82.0 (77.5–86.2) | 84.9 (78.5–92.8) | 79.3 (70.8–86.8) |
External testing 3 | |||||||
3D multi-task DL | 0.906 (0.880–0.933) | Reference | 79.7 (68.5–88.1) | 88.9 (79.1–96.7) | 82.1 (76.5–86.6) | 94.4 (90.5–98.2) | 64.9 (56.2–74.7) |
Average RNFL thickness | 0.913 (0.885–0.941) | 0.53 | 84.8 (80.4–90.9) | 88.9 (81.1–94.1) | 86.2 (82.9–89.3) | 94.8 (91.7–97.1) | 71.4 (65.6–79.3) |
3D single-task DL | 0.898 (0.868–0.928) | 0.70 | 87.1 (78.0–92.5) | 79.9 (70.5–88.5) | 84.5 (79.4–88.2) | 90.6 (87.4–94.3) | 72.8 (62.2–81.4) |
2D multi-task DL | 0.903 (0.876–0.931) | 0.89 | 82.4 (68.2–89.0) | 84.9 (76.3–95.7) | 82.9 (76.4–86.9) | 92.7 (89.2–97.4) | 67.6 (56.6–76.4) |
External testing 4 | |||||||
3D multi-task DL | 0.950 (0.936–0.963) | Reference | 85.2 (79.0–92.5) | 94.0 (86.5–98.1) | 87.3 (83.2–91.1) | 97.9 (95.8–99.3) | 65.6 (58.1–77.4) |
Average RNFL thickness | 0.950 (0.937–0.963) | 0.85 | 87.0 (79.5–90.5) | 90.6 (85.9–96.2) | 87.9 (83.5–90.3) | 96.9 (95.5–98.7) | 67.5 (58.4–73.5) |
3D single-task DL | 0.929 (0.911–0.947) | 0.08 | 83.3 (74.1–92.9) | 87.4 (77.2–95.4) | 84.5 (78.8–89.6) | 95.7 (93.0–98.1) | 61.2 (52.3–76.8) |
2D multi-task DL | 0.939 (0.923–0.955) | 0.31 | 88.0 (76.3–93.2) | 84.7 (77.7–94.9) | 87.1 (80.1–90.4) | 95.1 (93.2–98.0) | 67.8 (54.0–77.8) |
External testing 5 | |||||||
3D multi-task DL | 0.930 (0.915–0.946) | Reference | 83.9 (80.4–87.2) | 92.2 (89.6–94.7) | 88.2 (86.3–90.0) | 90.9 (88.2–93.6) | 86.1 (83.6–88.6) |
Average RNFL thickness | 0.921 (0.905–0.937) | 0.15 | 80.2 (73.1–90.2) | 89.1 (78.1–94.5) | 84.5 (82.5–86.6) | 87.2 (79.1–92.9) | 83.1 (78.9–89.7) |
3D single-task DL | 0.936 (0.922–0.951) | 0.31 | 84.1 (80.8–88.8) | 92.4 (87.1–94.9) | 88.3 (86.3–90.2) | 91.1 (86.2–93.8) | 86.2 (83.8–89.5) |
2D multi-task DL | 0.938 (0.924–0.953) | 0.45 | 84.1 (80.2–88.2) | 93.8 (89.6–96.4) | 89.0 (87.2–90.8) | 92.5 (88.6–95.4) | 86.4 (83.7–89.4) |
AUROC, area under the receiver operating characteristic curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value; DL, deep learning; RNFL, retinal nerve fibre layer. p-values in bold indicate a statistically significant difference in AUROC.
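The threshold-dependent columns of Table 2 (sensitivity, specificity, accuracy, PPV and NPV) follow the standard confusion-matrix definitions. The sketch below is illustrative only, not the authors' code: `ConfusionCounts`, `diagnostic_metrics` and `predictive_values` are hypothetical names, and the counts in the example are made up. The second helper applies Bayes' rule to show that PPV and NPV shift with disease prevalence even when sensitivity and specificity stay fixed, so these two columns are not directly comparable across test sets whose case mix differs.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ConfusionCounts:
    """Counts for a binary classifier at a fixed decision threshold (hypothetical)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return the threshold-dependent metrics as percentages."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),  # recall on positives
        "specificity": 100 * c.tn / (c.tn + c.fp),  # recall on negatives
        "accuracy": 100 * (c.tp + c.tn) / total,
        "ppv": 100 * c.tp / (c.tp + c.fp),  # precision
        "npv": 100 * c.tn / (c.tn + c.fn),
    }


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """PPV and NPV (as fractions) implied by Bayes' rule at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical test set: 100 positives, 100 negatives.
    print(diagnostic_metrics(ConfusionCounts(tp=80, fp=10, tn=90, fn=20)))
    # Same sensitivity/specificity, balanced vs positive-heavy test set.
    print(predictive_values(0.8, 0.9, prevalence=0.5))
    print(predictive_values(0.8, 0.9, prevalence=0.8))
```

With these hypothetical numbers, raising prevalence from 0.5 to 0.8 increases PPV from about 0.89 to 0.97 while NPV falls from about 0.82 to 0.53.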