Table 3.
Comparison of Deep Learning vs. Ophthalmologist Performance in Pixel-Level Grading of RPD on Near-Infrared Reflectance Images with RPD∗
| Grader | Pixel-Level Comparison: DSC (Mean ± SD) [P Value: Ophthalmologist vs. AI†] | RPD Lesion Count (Mean ± SD)<br>ICC [P Value: Method vs. Ground Truth‡] | RPD Pixel Area, mm² (Mean ± SD)<br>ICC [P Value: Method vs. Ground Truth‡] | RPD Contour Area, mm² (Mean ± SD)<br>ICC [P Value: Method vs. Ground Truth‡] |
|---|---|---|---|---|
| ReticularNet | **0.36 ± 0.17** | 427.5 ± 214<br>**0.44 [P = 0.80]** | 6.7 ± 5.32<br>**0.56 [P = 0.52]** | 22.7 ± 14.3<br>**0.61 [P = 0.24]** |
| Ophthalmologist 1 | 0.19 ± 0.16 [P = 0.005] | 123.4 ± 122.1<br>0.23 [P = 0.001] | 2.2 ± 2.9<br>0.40 [P = 0.001] | 123.4 ± 122.1<br>0.58 [P = 0.001] |
| Ophthalmologist 2 | 0.23 ± 0.15 [P = 0.02] | 144.0 ± 115.1<br>0.12 [P = 0.003] | 3.0 ± 2.44<br>0.05 [P = 0.07] | 9.2 ± 7.1<br>0.21 [P = 0.03] |
| Ophthalmologist 3 | 0.03 ± 0.04 [P = 0.005] | 8.5 ± 7.8<br>0.00 [P = 0.001] | 0.3 ± 0.3<br>0.02 [P = 0.001] | 0.7 ± 0.7<br>0.01 [P = 0.001] |
| Ophthalmologist 4 | 0.13 ± 0.14 [P = 0.002] | 71.8 ± 70.6<br>−0.08 [P = 0.001] | 0.6 ± 0.7<br>−0.05 [P = 0.001] | 2.7 ± 2.9<br>−0.09 [P = 0.001] |
| Ground Truth | – | 399.4 ± 191 | 5.4 ± 2.79 | 17.9 ± 11.17 |
AI = artificial intelligence; DSC = Dice similarity coefficient; ICC = intraclass correlation coefficient; RPD = reticular pseudodrusen; SD = standard deviation.
For each performance metric, the highest performance is shown in bold type.
∗Dice similarity coefficient (mean ± standard deviation) is shown with P values comparing each ophthalmologist versus the deep learning model (second column). Intraclass correlation coefficients (2,1) for lesion count, pixel-based area, and contour-based area are reported alongside the raw mean ± standard deviation, with P values comparing each method versus ground truth. The ground truth row lists only raw mean ± standard deviation values for each metric.
†Paired Wilcoxon test comparing each ophthalmologist's DSC with the deep learning model's DSC, including only RPD-positive cases.
‡Paired Wilcoxon test comparing each method's (ophthalmologist's or deep learning model's) metric with the ground-truth metric, including only RPD-positive cases.
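As a minimal sketch of how the table's primary metric and test can be computed, the snippet below derives the Dice similarity coefficient from two binary segmentation masks and runs a paired Wilcoxon signed-rank test on per-image DSC values, assuming NumPy and SciPy; the arrays are illustrative placeholders, not the study's data, and the empty-mask convention (DSC = 1 when both masks are empty) is an assumption.

```python
import numpy as np
from scipy.stats import wilcoxon

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks.

    DSC = 2 * |pred ∩ truth| / (|pred| + |truth|).
    Convention (assumed here): both masks empty -> 1.0.
    """
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Illustrative per-image DSC values for the model and one grader
# on RPD-positive cases (placeholder numbers, not the study's data).
dsc_model = np.array([0.41, 0.35, 0.52, 0.30, 0.48, 0.22])
dsc_grader = np.array([0.20, 0.18, 0.33, 0.15, 0.25, 0.10])

# Paired Wilcoxon signed-rank test, as in footnote †.
stat, p = wilcoxon(dsc_model, dsc_grader)
```

The ICC(2,1) values in the table come from a two-way random-effects agreement model, which needs a dedicated implementation or library and is omitted from this sketch.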