Abstract
This study uses fundus images from a national data set to assess 2 deep learning methods for referability classification of age-related macular degeneration.
In an extension of previous work,1,2 we assessed 2 deep learning (DL) methods for a 2-class age-related macular degeneration (AMD) referability classification: referable (intermediate or advanced AMD) or not referable.
Methods
We used 67 401 color fundus images (keeping only 1 image from each original stereo pair) from the National Eye Institute Age-Related Eye Disease Study (AREDS) data set3 that were taken from 4613 individuals (who provided written consent) over a 12-year study, including baseline and follow-up visits, from November 13, 1992, to November 30, 2005.1,2 The original AREDS image grades assigned by certified graders at a fundus photograph reading center3 were used as the gold standard. The present analysis was performed from January 22, 2018, through April 19, 2018. The AREDS data set was used following Johns Hopkins University School of Medicine Institutional Review Board approval.
2- and 4-Step Scales
This study addressed a standard 2-class classification problem (referable or nonreferable AMD) based on the original AREDS 4-step scale; details and criteria for the scale are described in the Box, and a minimal code sketch of the 2-class fusion follows the Box. Grades 3 and 4 in the 4-step scale correspond to higher risk for progression to advanced AMD.
Box. Description of the Classification Scales Used in This Study, Including the Eye-Based 4-Step Severity Scale Originally Used in AREDS to Provide Baseline Severity Levels of Enrolling Participants.
4-Step AREDS AMD Classificationa
AMD 1: Eye has no or only small drusen (DS, <63 μm) and no pigmentation abnormalities.
AMD 2: Eye has multiple small drusen or medium-sized drusen (DS, 63-125 μm) and/or pigmentation abnormalities related to AMD.
AMD 3: Eye has large drusen (DS, ≥125 μm) or numerous medium-sized drusen and pigmentation abnormalities.
AMD 4: Eye has lesions associated with CNV or GA (eg, retinal pigment epithelial detachment, subretinal pigment epithelial hemorrhage) if the fellow eye does not have central geographic atrophy or choroidal neovascular AMD.
2-Step AMD Classification
AMD 1 and 2: Nonreferable AMD class.
AMD 3 and 4: Referable AMD class.
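As an illustrative sketch only (not part of the original study), the fusion of the Box's 4-step grades into the 2 referability classes amounts to a simple lookup; the function name and label encoding (0 = nonreferable, 1 = referable) are assumptions.

```python
# Hypothetical illustration of the Box's 4-step to 2-step fusion:
# AREDS grades 1-2 -> nonreferable (0); grades 3-4 -> referable (1).
REFERABLE_MAP = {1: 0, 2: 0, 3: 1, 4: 1}

def to_referable(amd_grade: int) -> int:
    """Fuse a 4-step AREDS AMD grade into the 2-class referable label."""
    if amd_grade not in REFERABLE_MAP:
        raise ValueError(f"unexpected AREDS grade: {amd_grade}")
    return REFERABLE_MAP[amd_grade]
```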
Automated Classification Challenge
Recent studies have demonstrated the benefits of DL over traditional machine learning approaches1 for automated screening of both referable diabetic retinopathy and AMD.2,4 Compared with the aforementioned previous study,2 this study used 2 deep learning methods: DL-D, which leveraged the ResNet5 deep convolutional neural network (DCNN) architecture for direct 2-class classification, and DL4→2, which first performed a 4-step classification using ResNet and then fused the 4 classes into 2 (referable and not referable). For both methods, we used transfer learning and fine-tuning of the original DCNN weights for the 2-class referable AMD classification. We used stochastic gradient descent with Nesterov momentum of 0.9, a dynamic learning rate schedule with a base learning rate of 0.001, a patience of 20 epochs for stopping training, and a batch size of 32. Data augmentation included horizontal flipping, blurring and sharpening, and adjustments to saturation, brightness, contrast, and color balance. The specific machine learning details have been described previously.2
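A minimal sketch of this fine-tuning setup is shown below under stated assumptions: a Keras ResNet50 backbone with ImageNet weights, 224 × 224 inputs, and a reduce-on-plateau interpretation of the dynamic learning rate schedule. The original ResNet depth, preprocessing, augmentation pipeline, and schedule may differ, and train_ds and val_ds are hypothetical datasets of (image, label) pairs.

```python
# A minimal transfer-learning sketch (assumptions noted in the text above).
from tensorflow.keras import callbacks, layers, models, optimizers
from tensorflow.keras.applications import ResNet50

# Start from ImageNet weights and replace the classifier head with a
# 2-class output (referable vs nonreferable) for fine-tuning.
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
outputs = layers.Dense(2, activation="softmax")(base.output)
model = models.Model(inputs=base.input, outputs=outputs)

# Hyperparameters reported in the text: SGD with Nesterov momentum 0.9,
# base learning rate 0.001, batch size 32, and a patience of 20 epochs.
model.compile(
    optimizer=optimizers.SGD(learning_rate=1e-3, momentum=0.9, nesterov=True),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=20,
                                     restore_best_weights=True)
# "Dynamic learning rate schedule" is interpreted here as reduce-on-plateau;
# the schedule actually used in the study is not specified in this section.
lr_schedule = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                          patience=5)

# model.fit(train_ds.batch(32), validation_data=val_ds.batch(32),
#           epochs=200, callbacks=[early_stop, lr_schedule])
```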
Data Partition
Images were partitioned into 3 disjoint data sets: DCNN training (88% of the data); model validation, that is, selecting model hyperparameters and deciding when to stop training (2% of the data); and testing (10% of the data) for computing the resulting model performance. Care was taken to ensure that all images of a given study participant were contained wholly within a single partition.2
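A sketch of such a participant-level split, assuming hypothetical image_ids, labels, and participant_ids arrays and using scikit-learn's GroupShuffleSplit, is shown below; it only approximates the 88%/2%/10% image proportions because the split is made by participant.

```python
# Group-aware split so that no participant's images cross partitions.
from sklearn.model_selection import GroupShuffleSplit

def split_by_participant(image_ids, labels, participant_ids, seed=0):
    # Hold out roughly 10% of the data (by participant) for testing.
    outer = GroupShuffleSplit(n_splits=1, test_size=0.10, random_state=seed)
    trainval_idx, test_idx = next(
        outer.split(image_ids, labels, groups=participant_ids))

    # From the remainder, hold out roughly 2% of the full data
    # (2/90 of the remainder) for validation; the rest (~88%) is training.
    inner = GroupShuffleSplit(n_splits=1, test_size=2 / 90, random_state=seed)
    sub_groups = [participant_ids[i] for i in trainval_idx]
    train_sub, val_sub = next(inner.split(trainval_idx, groups=sub_groups))

    train_idx = [trainval_idx[i] for i in train_sub]
    val_idx = [trainval_idx[i] for i in val_sub]
    return train_idx, val_idx, list(test_idx)
```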
Human-Machine Comparisons
We also compared the DCNN algorithm performance with that of a highly trained retinal specialist (ophthalmologist; K.D.P.), who independently graded a subset of 5000 AREDS images from 2016 to 2017 using the criteria defined in the Box. Machine- and human (K.D.P.)–generated grades were compared with the gold standard AREDS AMD grades.2,3
Metrics
We used standard metrics including area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and κ.2
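As a sketch, these metrics can be computed from the gold-standard labels, the predicted labels, and the predicted referable-class scores with scikit-learn; the variable names below are hypothetical, and the error margins reported in the Table are not reproduced here.

```python
# Standard 2-class performance metrics from gold-standard vs predicted labels.
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

def summarize(y_true, y_pred, y_score):
    # labels=[0, 1]: 0 = nonreferable, 1 = referable AMD.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auc_roc": roc_auc_score(y_true, y_score),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate for referable AMD
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }
```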
Results
The Table summarizes the results for methods DL4→2 and DL-D and compares these methods with a previous best-performing DL approach2 and a retinal specialist, demonstrating comparable performance between the deep learning algorithms and the retinal specialist (κ = 0.829 for DL4→2, κ = 0.803 for DL-D, and κ = 0.800 for the retinal specialist). Other machine metrics included sensitivity and specificity (for DL4→2, sensitivity was 89.00% and specificity was 93.62%; for DL-D, sensitivity was 83.05% and specificity was 96.60%). In general, DL4→2 outperformed DL-D. By comparison, the retinal specialist’s annotation had a sensitivity of 86.45% and a specificity of 93.16%.
Table. Performance Results for 2-Class AMD Severity Problem.
Method | Samples per Class | AUC ROC (Error Margin for 95% CI) | Accuracy, % (Error Margin for 95% CI) | Sensitivity, % (Error Margin for 95% CI) | Specificity, % (Error Margin for 95% CI) | PPV, % (Error Margin for 95% CI) | NPV, % (Error Margin for 95% CI) | κ |
---|---|---|---|---|---|---|---|---|
Ophthalmologist | 1 and 2: 2779; 3 and 4: 2221 | NA | 90.20 (0.82) | 86.45 (1.42) | 93.16 (0.94) | 91.00 (1.22) | 89.58 (1.11) | 0.800 |
DL4→2a | 1 and 2: 3746; 3 and 4: 2908 | 0.972 (0.004) | 91.60 (0.67) | 89.00 (1.14) | 93.62 (0.78) | 91.55 (1.03) | 91.64 (1.03) | 0.829 |
DL-Db | 1 and 2: 3617; 3 and 4: 3198 | 0.970 (0.004) | 90.20 (0.7) | 83.05 (1.3) | 96.60 (0.59) | 95.57 (0.76) | 86.57 (1.05) | 0.803 |
DL from referencec | 1 and 2: 37 418; 3 and 4: 29 983 | 0.940 | 88.4 (0.5) | 84.5 (0.9) | 91.5 (0.7) | 88.9 (1.0) | 88.0 (0.5) | 0.764 |
Abbreviations: AMD, age-related macular degeneration; AUC ROC, area under the receiver operating characteristic curve; DL, deep learning; NA, not applicable; NPV, negative predictive value; PPV, positive predictive value.
a First used a 4-step classification, then fused the 4 classes into 2 (referable and not referable).
b Used a deep convolutional neural network for direct 2-class classification.
c Data are from a prior study2 in which testing was done using 5-fold cross-validation.
Discussion
A potential limitation of this study is that AREDS predominantly included white participants, although other studies suggest that this should not have a substantial influence on the analyses.6 Results demonstrated machine performance exceeding that of a recently published DCNN referable AMD classification2 and comparable to that of a highly trained retinal specialist: the machine's κ showed substantial agreement with the gold standard and was on par with that of the expert retinal specialist. These results show promise for public screening applications, an important endeavor considering that the number of individuals in the population at risk of intermediate or advanced AMD is expected to exceed 2.4 billion worldwide by 2050.7 Monitoring these individuals for advanced AMD and considering AREDS-like supplements to reduce the risk of progression to advanced AMD are warranted.
Footnotes
Abbreviations: AMD, age-related macular degeneration; AREDS, Age-Related Eye Disease Study; CNV, choroidal neovascularization; DS, drusen size; GA, geographic atrophy.
The 4-step scale3 has been proposed for use to identify individuals with intermediate- or advanced-stage AMD who should be seen by an ophthalmologist for monitoring for the development or treatment of advanced AMD and for consideration of dietary supplement treatment to reduce the risk of progression to vision loss from advanced AMD.
References
1. Burlina P, Freund DE, Dupas B, Bressler N. Automatic screening of age-related macular degeneration and retinal abnormalities. Conf Proc IEEE Eng Med Biol Soc. 2011;2011:3962-3966.
2. Burlina PM, Joshi N, Pekala M, Pacheco KD, Freund DE, Bressler NM. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 2017;135(11):1170-1176. doi:10.1001/jamaophthalmol.2017.3782
3. Age-Related Eye Disease Study Research Group. The Age-Related Eye Disease Study system for classifying age-related macular degeneration from stereoscopic color fundus photographs: the Age-Related Eye Disease Study report number 6. Am J Ophthalmol. 2001;132(5):668-681. doi:10.1016/S0002-9394(01)01218-1
4. Quellec G, Charrière K, Boudi Y, Cochener B, Lamard M. Deep image mining for diabetic retinopathy screening. Med Image Anal. 2017;39:178-193. doi:10.1016/j.media.2017.04.012
5. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; June 27-30, 2016; Las Vegas, NV. 770-778.
6. Ting DSW, Cheung CY, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318(22):2211-2223. doi:10.1001/jama.2017.18152
7. Velez-Montoya R, Oliver SCN, Olson JL, Fine SL, Quiroz-Mercado H, Mandava N. Current knowledge and trends in age-related macular degeneration: genetics, epidemiology, and prevention. Retina. 2014;34(3):423-441. doi:10.1097/IAE.0000000000000036