Abstract
This study uses fundus images from a national data set to assess 2 deep learning methods for referability classification of age-related macular degeneration.
In an extension of previous work,1,2 we assessed 2 deep learning (DL) methods for a 2-class age-related macular degeneration (AMD) referability classification: referable (intermediate or advanced AMD) or not referable.
Methods
We used 67 401 color fundus images (keeping only 1 image from each original stereo pair) from the National Eye Institute Age-Related Eye Disease Study (AREDS) data set3 that were taken from 4613 individuals (who provided written consent) over a 12-year study, including baseline and follow-up visits, from November 13, 1992, to November 30, 2005.1,2 The original AREDS image grades assigned by certified graders at a fundus photograph reading center3 were used as the gold standard. The present analysis was performed from January 22, 2018, through April 19, 2018. The AREDS data set was used following Johns Hopkins University School of Medicine Institutional Review Board approval.
2- and 4-Step Scales
This study addressed a standard 2-class classification problem (referable or nonreferable AMD) based on the original AREDS 4-step scale; details and criteria for the scale are described in the Box, and a minimal code sketch of the 2-class fusion follows the Box. Grades 3 and 4 in the 4-step scale correspond to higher risk for progression to advanced AMD.
Box. Description of the Classification Scales Used in This Study, Including the Eye-Based 4-Step Severity Scale Originally Used in AREDS to Provide Baseline Severity Levels of Enrolling Participants.
4-Step AREDS AMD Classificationa
AMD 1: Eye has no or only small drusen (DS, <63 μm) and no pigmentation abnormalities.
AMD 2: Eye has multiple small drusen or medium-sized drusen (DS, 63-125 μm) and/or pigmentation abnormalities related to AMD.
AMD 3: Eye has large drusen (DS, ≥125 μm) or numerous medium-sized drusen and pigmentation abnormalities.
AMD 4: Eye has lesions associated with CNV or GA (eg, retinal pigment epithelial detachment, subretinal pigment epithelial hemorrhage) if the fellow eye does not have central geographic atrophy or choroidal neovascular AMD.
2-Step AMD Classification
AMD 1 and 2: Nonreferable AMD class.
AMD 3 and 4: Referable AMD class.
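As an illustrative sketch only (not part of the original study), the fusion of the Box's 4-step grades into the 2 referability classes amounts to a simple lookup; the function name and label encoding (0 = nonreferable, 1 = referable) are assumptions.

```python
# Hypothetical illustration of the Box's 4-step to 2-step fusion:
# AREDS grades 1-2 -> nonreferable (0); grades 3-4 -> referable (1).
REFERABLE_MAP = {1: 0, 2: 0, 3: 1, 4: 1}

def to_referable(amd_grade: int) -> int:
    """Fuse a 4-step AREDS AMD grade into the 2-class referable label."""
    if amd_grade not in REFERABLE_MAP:
        raise ValueError(f"unexpected AREDS grade: {amd_grade}")
    return REFERABLE_MAP[amd_grade]
```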
Automated Classification Challenge
Recent studies have demonstrated the benefits of DL over traditional machine learning approaches1 for automated screening of both referable diabetic retinopathy and AMD.2,4 Compared with the aforementioned previous study,2 this study used 2 deep learning methods: DL-D, which leveraged the ResNet5 deep convolutional neural network (DCNN) architecture for direct 2-class classification, and DL4→2, which first performed a 4-step classification using ResNet and then fused the 4 classes into 2 (referable and not referable). For both methods, we used transfer learning and fine-tuning of the original DCNN weights for the 2-class referable AMD classification. We used stochastic gradient descent with Nesterov momentum of 0.9, a dynamic learning rate schedule with a base learning rate of 0.001, a patience of 20 epochs for stopping training, and a batch size of 32. Data augmentation included horizontal flipping, blurring and sharpening, and adjustments to saturation, brightness, contrast, and color balance. The specific machine learning details have been described previously.2
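A minimal sketch of this fine-tuning setup is shown below under stated assumptions: a Keras ResNet50 backbone with ImageNet weights, 224 × 224 inputs, and a reduce-on-plateau interpretation of the dynamic learning rate schedule. The original ResNet depth, preprocessing, augmentation pipeline, and schedule may differ, and train_ds and val_ds are hypothetical datasets of (image, label) pairs.

```python
# A minimal transfer-learning sketch (assumptions noted in the text above).
from tensorflow.keras import callbacks, layers, models, optimizers
from tensorflow.keras.applications import ResNet50

# Start from ImageNet weights and replace the classifier head with a
# 2-class output (referable vs nonreferable) for fine-tuning.
base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
outputs = layers.Dense(2, activation="softmax")(base.output)
model = models.Model(inputs=base.input, outputs=outputs)

# Hyperparameters reported in the text: SGD with Nesterov momentum 0.9,
# base learning rate 0.001, batch size 32, and a patience of 20 epochs.
model.compile(
    optimizer=optimizers.SGD(learning_rate=1e-3, momentum=0.9, nesterov=True),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=20,
                                     restore_best_weights=True)
# "Dynamic learning rate schedule" is interpreted here as reduce-on-plateau;
# the schedule actually used in the study is not specified in this section.
lr_schedule = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,
                                          patience=5)

# model.fit(train_ds.batch(32), validation_data=val_ds.batch(32),
#           epochs=200, callbacks=[early_stop, lr_schedule])
```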
Data Partition
Images were partitioned into 3 disjoint data sets: DCNN training (88% of the data); model validation, that is, selecting model hyperparameters and deciding when to stop training (2% of the data); and testing (10% of the data) for computing the resulting model performance. Care was taken to ensure that all images of a given study participant were contained wholly within a single partition.2
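A sketch of such a participant-level split, assuming hypothetical image_ids, labels, and participant_ids arrays and using scikit-learn's GroupShuffleSplit, is shown below; it only approximates the 88%/2%/10% image proportions because the split is made by participant.

```python
# Group-aware split so that no participant's images cross partitions.
from sklearn.model_selection import GroupShuffleSplit

def split_by_participant(image_ids, labels, participant_ids, seed=0):
    # Hold out roughly 10% of the data (by participant) for testing.
    outer = GroupShuffleSplit(n_splits=1, test_size=0.10, random_state=seed)
    trainval_idx, test_idx = next(
        outer.split(image_ids, labels, groups=participant_ids))

    # From the remainder, hold out roughly 2% of the full data
    # (2/90 of the remainder) for validation; the rest (~88%) is training.
    inner = GroupShuffleSplit(n_splits=1, test_size=2 / 90, random_state=seed)
    sub_groups = [participant_ids[i] for i in trainval_idx]
    train_sub, val_sub = next(inner.split(trainval_idx, groups=sub_groups))

    train_idx = [trainval_idx[i] for i in train_sub]
    val_idx = [trainval_idx[i] for i in val_sub]
    return train_idx, val_idx, list(test_idx)
```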
Human-Machine Comparisons
We also compared the DCNN algorithm performance with that of a highly trained retinal specialist (ophthalmologist; K.D.P.), who independently graded a subset of 5000 AREDS images from 2016 to 2017 using the criteria defined in the Box. Machine- and human (K.D.P.)–generated grades were compared with the gold standard AREDS AMD grades.2,3
Metrics
We used standard metrics including area under the receiver operating characteristic curve, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and κ.2
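As a sketch, these metrics can be computed from the gold-standard labels, the predicted labels, and the predicted referable-class scores with scikit-learn; the variable names below are hypothetical, and the error margins reported in the Table are not reproduced here.

```python
# Standard 2-class performance metrics from gold-standard vs predicted labels.
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

def summarize(y_true, y_pred, y_score):
    # labels=[0, 1]: 0 = nonreferable, 1 = referable AMD.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auc_roc": roc_auc_score(y_true, y_score),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate for referable AMD
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }
```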
Results
The Table summarizes the results for methods DL4→2 and DL-D and compares these methods with a previous best-performing DL approach2 and a retinal specialist, demonstrating comparable performance between the deep learning algorithms and the retinal specialist (κ = 0.829 for DL4→2, κ = 0.803 for DL-D, and κ = 0.800 for the retinal specialist). Other machine metrics included sensitivity and specificity (for DL4→2, sensitivity was 89.00% and specificity was 93.62%; for DL-D, sensitivity was 83.05% and specificity was 96.60%). In general, DL4→2 outperformed DL-D. By comparison, the retinal specialist’s annotation had a sensitivity of 86.45% and a specificity of 93.16%.
Table. Performance Results for 2-Class AMD Severity Problem.
Method | Samples per Class | AUC ROC (Error Margin for 95% CI) | Accuracy, % (Error Margin for 95% CI) | Sensitivity, % (Error Margin for 95% CI) | Specificity, % (Error Margin for 95% CI) | PPV, % (Error Margin for 95% CI) | NPV, % (Error Margin for 95% CI) | κ |
---|---|---|---|---|---|---|---|---|
Ophthalmologist | 1 and 2: 2779; 3 and 4: 2221 | NA | 90.20 (0.82) | 86.45 (1.42) | 93.16 (0.94) | 91.00 (1.22) | 89.58 (1.11) | 0.800 |
DL4→2a | 1 and 2: 3746; 3 and 4: 2908 | 0.972 (0.004) | 91.60 (0.67) | 89.00 (1.14) | 93.62 (0.78) | 91.55 (1.03) | 91.64 (1.03) | 0.829 |
DL-Db | 1 and 2: 3617; 3 and 4: 3198 | 0.970 (0.004) | 90.20 (0.7) | 83.05 (1.3) | 96.60 (0.59) | 95.57 (0.76) | 86.57 (1.05) | 0.803 |
DL from referencec | 1 and 2: 37 418; 3 and 4: 29 983 | 0.940 | 88.4 (0.5) | 84.5 (0.9) | 91.5 (0.7) | 88.9 (1.0) | 88.0 (0.5) | 0.764 |
Abbreviations: AMD, age-related macular degeneration; AUC ROC, area under the receiver operating characteristic curve; DL, deep learning; NA, not applicable; NPV, negative predictive value; PPV, positive predictive value.
a First used a 4-step classification, then fused the 4 classes into 2 (referable and not referable).
b Used a deep convolutional neural network for direct 2-class classification.
c Data are from a prior study2 in which testing was done using 5-fold cross-validation.
Discussion
A potential limitation of this study is that AREDS predominantly included white participants, although other studies suggest that this should not have a substantial influence on the analyses.6 Results demonstrated machine performance exceeding that of a recently published DCNN referable AMD classification2 and comparable to that of a highly trained retinal specialist: the machine's κ showed substantial agreement with the gold standard and was on par with that of the expert retinal specialist. These results show promise for public screening applications, an important endeavor considering that the number of individuals in the population at risk of intermediate or advanced AMD is expected to exceed 2.4 billion worldwide by 2050.7 Monitoring these individuals for advanced AMD and considering AREDS-like supplements to reduce the risk of progression to advanced AMD are warranted.
Footnotes
Abbreviations: AMD, age-related macular degeneration; AREDS, Age-Related Eye Disease Study; CNV, choroidal neovascularization; DS, drusen size; GA, geographic atrophy.
The 4-step scale3 has been proposed for use to identify individuals with intermediate- or advanced-stage AMD who should be seen by an ophthalmologist for monitoring for the development or treatment of advanced AMD and for consideration of dietary supplement treatment to reduce the risk of progression to vision loss from advanced AMD.
References
1. Burlina P, Freund DE, Dupas B, Bressler N. Automatic screening of age-related macular degeneration and retinal abnormalities. Conf Proc IEEE Eng Med Biol Soc. 2011;2011:3962-3966.
2. Burlina PM, Joshi N, Pekala M, Pacheco KD, Freund DE, Bressler NM. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 2017;135(11):1170-1176. doi:10.1001/jamaophthalmol.2017.3782
3. Age-Related Eye Disease Study Research Group. The Age-Related Eye Disease Study system for classifying age-related macular degeneration from stereoscopic color fundus photographs: the Age-Related Eye Disease Study report number 6. Am J Ophthalmol. 2001;132(5):668-681. doi:10.1016/S0002-9394(01)01218-1
4. Quellec G, Charrière K, Boudi Y, Cochener B, Lamard M. Deep image mining for diabetic retinopathy screening. Med Image Anal. 2017;39:178-193. doi:10.1016/j.media.2017.04.012
5. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; June 27-30, 2016; Las Vegas, NV. 770-778.
6. Ting DSW, Cheung CY, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318(22):2211-2223. doi:10.1001/jama.2017.18152
7. Velez-Montoya R, Oliver SCN, Olson JL, Fine SL, Quiroz-Mercado H, Mandava N. Current knowledge and trends in age-related macular degeneration: genetics, epidemiology, and prevention. Retina. 2014;34(3):423-441. doi:10.1097/IAE.0000000000000036