Table 3.
Comparison of G-RISK performance versus image-measured VCDR as glaucoma detection proxy.
| Data | #images | AUC G-RISK [95% CI] | AUC VCDR 1 [95% CI] | AUC VCDR 2 [95% CI] |
|---|---|---|---|---|
| BMES | 6927 | **0.967** [0.956–0.979] | 0.958 [0.940–0.976] | NA |
| PAPILA (suspect referable) | 488 | **0.769** [0.722–0.815] | 0.748 [0.699–0.798] | 0.743 [0.691–0.795] |
| PAPILA (suspect non-referable) | 488 | **0.882** [0.840–0.923] | 0.789 [0.728–0.851] | 0.782 [0.716–0.847] |
| REFUGE1 test | 400 | **0.986** [0.974–0.999] | 0.946 [0.907–0.984] | NA |
| REFUGE1 all | 1200 | **0.952** [0.925–0.979] | 0.929 [0.902–0.956] | NA |
| REFUGE2 test | 400 | **0.867** [NA] | 0.757 [0.693–0.815] | NA |
| RIM-ONE r3 | 159 | **0.934** [0.889–0.978] | 0.810 [0.723–0.897] | NA |
VCDR was either derived from the available cup and disc segmentation ground truth (PAPILA, REFUGE data, RIM-ONE r3) or provided directly by the data set owners (BMES). For PAPILA, G-RISK results are compared against the VCDR obtained from two independent human experts who each segmented disc and cup (VCDR 1 and VCDR 2). Best performance (AUC) per row is highlighted in bold text.
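
As an illustration of how a VCDR-based AUC of this kind could be obtained from segmentation ground truth, the following is a minimal sketch and not the procedure used for the table. It assumes that VCDR is taken as the ratio of the vertical extent of the cup mask to that of the disc mask, that masks are given as rows of 0/1 pixels, and that the AUC is computed with the rank-based (Mann-Whitney) estimator. The names `vertical_extent`, `vcdr`, and `auc` are hypothetical helpers introduced only for this example.

```python
from typing import Sequence

# Assumed mask format: a list of rows, each row a list of 0/1 pixels.
Mask = Sequence[Sequence[int]]


def vertical_extent(mask: Mask) -> int:
    """Rows spanned from the first to the last row with a foreground pixel."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0


def vcdr(cup_mask: Mask, disc_mask: Mask) -> float:
    """Vertical cup-to-disc ratio from binary cup and disc masks (assumed definition)."""
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: probability that a positive case scores above a negative one.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy masks: the disc spans 4 rows, the cup spans the middle 2 rows.
disc = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
cup = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
example_vcdr = vcdr(cup, disc)

# Toy scores: VCDR values used directly as the glaucoma score (1 = glaucoma).
example_auc = auc([0.8, 0.6, 0.4, 0.3], [1, 1, 0, 0])
```

Using VCDR directly as the score means no threshold has to be chosen, which matches an AUC-based comparison; the confidence intervals in the table would need an additional step (for example bootstrapping) that is not shown here.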