Table 3.
Comparison of Deep Learning vs. Ophthalmologist Performance in Pixel-Level Grading of RPD on Near-Infrared Reflectance Images with RPD∗
| Grader | Pixel-Level Comparison: DSC (Mean ± SD) [P Value: Ophthalmologist vs. AI†] | RPD Lesion Count (Mean ± SD)<br>ICC [P Value: Method vs. Ground Truth‡] | RPD Pixel Area, mm² (Mean ± SD)<br>ICC [P Value: Method vs. Ground Truth‡] | RPD Contour Area, mm² (Mean ± SD)<br>ICC [P Value: Method vs. Ground Truth‡] |
|---|---|---|---|---|
| ReticularNet | **0.36 ± 0.17** | 427.5 ± 214<br>**0.44 [P = 0.80]** | 6.7 ± 5.32<br>**0.56 [P = 0.52]** | 22.7 ± 14.3<br>**0.61 [P = 0.24]** |
| Ophthalmologist 1 | 0.19 ± 0.16 [P = 0.005] | 123.4 ± 122.1<br>0.23 [P = 0.001] | 2.2 ± 2.9<br>0.40 [P = 0.001] | 123.4 ± 122.1<br>0.58 [P = 0.001] |
| Ophthalmologist 2 | 0.23 ± 0.15 [P = 0.02] | 144.0 ± 115.1<br>0.12 [P = 0.003] | 3.0 ± 2.44<br>0.05 [P = 0.07] | 9.2 ± 7.1<br>0.21 [P = 0.03] |
| Ophthalmologist 3 | 0.03 ± 0.04 [P = 0.005] | 8.5 ± 7.8<br>0.00 [P = 0.001] | 0.3 ± 0.3<br>0.02 [P = 0.001] | 0.7 ± 0.7<br>0.01 [P = 0.001] |
| Ophthalmologist 4 | 0.13 ± 0.14 [P = 0.002] | 71.8 ± 70.6<br>−0.08 [P = 0.001] | 0.6 ± 0.7<br>−0.05 [P = 0.001] | 2.7 ± 2.9<br>−0.09 [P = 0.001] |
| Ground Truth | – | 399.4 ± 191 | 5.4 ± 2.79 | 17.9 ± 11.17 |
AI = artificial intelligence; DSC = Dice similarity coefficient; ICC = intraclass correlation coefficient; RPD = reticular pseudodrusen; SD = standard deviation.
For each performance metric, the highest performance is shown in bold type.
∗Dice similarity coefficient (mean ± standard deviation) is shown with P values comparing each ophthalmologist versus the deep learning model (second column). Intraclass correlation coefficients (2,1) for lesion count, pixel-based area, and contour-based area are reported alongside the raw mean ± standard deviation, with P values comparing each method versus ground truth. The ground truth row lists only raw mean ± standard deviation values for each metric.
†Paired Wilcoxon test comparing each ophthalmologist's DSC with the deep learning model's DSC, including only RPD-positive cases.
‡Paired Wilcoxon test comparing each method's (ophthalmologist's or deep learning model's) metric with the ground-truth metric, including only RPD-positive cases.
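As a minimal sketch of how the table's primary metric and test can be computed, the snippet below derives the Dice similarity coefficient from two binary segmentation masks and runs a paired Wilcoxon signed-rank test on per-image DSC values, assuming NumPy and SciPy; the arrays are illustrative placeholders, not the study's data, and the empty-mask convention (DSC = 1 when both masks are empty) is an assumption.

```python
import numpy as np
from scipy.stats import wilcoxon

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks.

    DSC = 2 * |pred ∩ truth| / (|pred| + |truth|).
    Convention (assumed here): both masks empty -> 1.0.
    """
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Illustrative per-image DSC values for the model and one grader
# on RPD-positive cases (placeholder numbers, not the study's data).
dsc_model = np.array([0.41, 0.35, 0.52, 0.30, 0.48, 0.22])
dsc_grader = np.array([0.20, 0.18, 0.33, 0.15, 0.25, 0.10])

# Paired Wilcoxon signed-rank test, as in footnote †.
stat, p = wilcoxon(dsc_model, dsc_grader)
```

The ICC(2,1) values in the table come from a two-way random-effects agreement model, which needs a dedicated implementation or library and is omitted from this sketch.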