Indian Journal of Ophthalmology
2023 Dec 15;72(3):408–411. doi: 10.4103/IJO.IJO_613_23

Artificial intelligence in glaucoma detection using color fundus photographs

Zubin Sidhu 1, Tarannum Mansoori 1,
PMCID: PMC11001223  PMID: 38099383

Abstract

Purpose:

To explore the potential of artificial intelligence (AI) for glaucoma detection using a deep learning algorithm and to evaluate its accuracy in image classification of glaucomatous optic neuropathy (GON) from color fundus photographs.

Methods:

A total of 1375 color fundus photographs (735 of normal optic nerve heads and 640 with GON) were uploaded to the AI software for training, validation, and testing using a deep learning model based on Residual Network (ResNet) 50V2. For initial training and validation, 400 fundus images (200 normal and 200 GON) were uploaded; a further 975 (535 normal and 440 GON) were uploaded later for the final training and testing. Accuracy, sensitivity, and specificity were used to evaluate the image classification performance of the algorithm. Positive predictive value, negative predictive value, positive likelihood ratio, and negative likelihood ratio were also calculated.

Results:

The model used in the study showed an image classification accuracy of 81.3%, a sensitivity of 83%, and a specificity of 80% for the detection of GON. The false-negative grading was 17% and the false-positive grading was 20% for the image classification of GON. Coexistence of glaucoma in patients with high myopia, early glaucoma in a small disc, and software misclassification of GON were the reasons for false-negative results. Physiological large cupping in a large disc, a myopic or tilted disc, and software misclassification of a normal optic disc were the reasons for false-positive results.

Conclusion:

The model employed in this study achieved good accuracy and hence shows good potential for the detection of GON using color fundus photographs.

Keywords: Artificial intelligence, fundus photographs, glaucoma


Glaucoma is a chronic, progressive optic neuropathy that causes irreversible structural damage to the optic nerve head (ONH) and the retinal nerve fiber layer (RNFL) due to retinal ganglion cell loss, with high intraocular pressure considered the major risk factor. It is a leading cause of irreversible blindness worldwide.[1] The high incidence of undiagnosed disease can be attributed to glaucoma being asymptomatic until the advanced stages, when central vision is affected. As glaucoma progresses from the early to the late stage, it poses a financial burden on the health-care system due to the increase in health-care costs. As additional tools are required for effective glaucoma screening of large populations, artificial intelligence (AI) can interpret ONH images to support decision-making in the diagnosis of glaucoma.

AI has the potential to revolutionize the screening and diagnosis of glaucoma through the automated processing of large datasets.[2] An emerging application of AI in glaucoma detection is image interpretation and classification with machine learning (ML) or deep learning algorithms. In ML, the algorithm processes inputs from the training dataset without being explicitly programmed: it is trained on datasets of labeled images, so that the machine learns from predefined criteria in the data and provides a specific output to classify the input. Deep learning methods represent an advance in image recognition research, as they process information via interconnected neurons and transform the raw data into a suitable feature vector that can identify patterns in the input.

Recent evidence suggests that ML and deep learning algorithms can grade optic disc images with good diagnostic accuracy in identifying glaucomatous optic neuropathy (GON).[3,4,5]

Lobe (Microsoft 2021) is an open-access application that runs and trains project images using the residual network (ResNet) 50V2 model, a deep learning network.[6] Lobe provides a user interface for building an image-recognition model via a supervised learning process, in which the model is trained on labeled images that guide it to learn image pattern recognition.

The purpose of this study was to explore the potential of an AI model using online deep learning software for glaucoma detection and to evaluate its accuracy in image classification of GON from color fundus photographs of normal ONH and GON.

Methods

The internal institute review board approved the study and waived the requirement for written informed consent. The study adhered to the tenets of the Declaration of Helsinki and was conducted from December 2022 to January 2023.

In this study, 1375 color fundus photographs were randomly collected from the stored images of the fundus camera, of which 735 were graded as normal and 640 were labeled as having GON by a single senior glaucoma specialist. The field of view of the retinal fundus photographs was 30° and was constant for all images retrieved from the fundus camera. Fundus photographs of poor quality or with poorly located optic discs, proliferative diabetic retinopathy, retinal venous occlusion, or optic disc anomalies other than glaucoma were excluded from the dataset.

The dataset of fundus images was imported into the Lobe (Microsoft 2021) software, which uses a convolutional neural network (CNN) with deep residual learning to perform image recognition and is based on ResNet50V2. This deep learning model is created by training it with an initial dataset of images using the "supervised learning approach," in which the input images are explicitly labeled to help the model train itself before it can start the prediction process. ResNet50V2 provides a novel way to add more convolutional layers to a CNN without running into the vanishing gradient problem, using the concept of shortcut connections. A shortcut connection can "skip over" some layers, converting a regular network into a residual network. An additional attribute is that ResNet50V2 uses a "bottleneck" design for its building block, which reduces the number of parameters and matrix multiplications and thus enables much faster training of each layer. Given the complexity of identifying attributes on an image that indicate glaucoma, the ResNet50V2 architecture is well suited to this task.
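The shortcut and bottleneck ideas described above can be sketched in a few lines. This is a simplified illustration, not the actual ResNet50V2 implementation: real blocks use convolutions and batch normalization, while dense matrices are used here only to show how the bottleneck narrows the feature width and how the shortcut adds the unmodified input back to the transformed output.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def bottleneck_block(x, w_reduce, w_process, w_expand):
    """Simplified residual bottleneck: reduce -> process -> expand, plus shortcut."""
    out = relu(x @ w_reduce)     # bottleneck: shrink the feature width
    out = relu(out @ w_process)  # main transformation on fewer features
    out = out @ w_expand         # expand back to the input width
    return relu(out + x)         # shortcut connection: add the input itself

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 64))           # a 64-dimensional feature vector
w1 = rng.normal(size=(64, 16)) * 0.1   # 64 -> 16: far fewer parameters
w2 = rng.normal(size=(16, 16)) * 0.1
w3 = rng.normal(size=(16, 64)) * 0.1   # 16 -> 64: restore the width
y = bottleneck_block(x, w1, w2, w3)
print(y.shape)  # the block preserves the input width: (1, 64)
```

Because the shortcut passes the input through unchanged, gradients can flow directly around the transformed layers during backpropagation, which is what lets very deep networks such as ResNet50V2 train without vanishing gradients.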

The initial training dataset batch consisted of 200 normal and 200 GON fundus images annotated by a single glaucoma specialist, which were used to train the model and guide it to learn image pattern recognition. The rationale behind initially training on a small dataset was that, if the prediction accuracy on this smaller dataset tracked above 80%, model training would continue. If the accuracy was poor, the training run and dataset would be discarded and a new dataset would be uploaded to repeat the process.

Once validated with good prediction accuracy, 535 normal ONH and 440 GON images were uploaded, which were not merged with the initial 400 images that were used for pretraining the model and testing the initial prediction accuracy [Fig. 1]. Hence, a total of 975 images were used for the final analysis.

Figure 1.

Figure 1

Flow chart depicting overview of the image workflow in the study

The software randomly splits the dataset images with an 80%–20% split, in which 80% of the images are used to train the model and 20% are used for a "held-out" test of the model. The software output shows which images are predicted correctly (green labels) or incorrectly (red labels) by the model, and the width of the label bar represents the model's prediction accuracy [Fig. 2]: the wider the correct-prediction label bar, the better the model performance. The output was reviewed by the glaucoma specialist to identify any misclassification by the software and to classify the reasons for false-positive (FP) and false-negative (FN) results.
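Lobe performs this split internally; as an illustration of what such a random 80%/20% hold-out looks like, a minimal sketch follows. The file names are placeholders, and the exact shuffling procedure Lobe uses is not documented in the paper.

```python
import random

def train_test_split(items, test_fraction=0.20, seed=42):
    """Randomly partition items into (train, test) with a held-out fraction."""
    shuffled = items[:]                    # copy so the input list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder file names; 975 matches the final dataset size in the study
images = [f"fundus_{i:04d}.jpg" for i in range(975)]
train, test = train_test_split(images)
print(len(train), len(test))  # 780 195
```

With 975 images, the 20% held-out test set contains 195 images and the training set 780, so every image is used exactly once.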

Figure 2.

Figure 2

The software output of 975 fundus photographs used in the study for final analysis

Statistical analyses were performed using commercially available Minitab statistical software (version 21.3.0, Minitab, LLC). Accuracy, sensitivity, and specificity were used to evaluate the performance of the algorithm. Positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) were also calculated.

Results

The demographic details of the participants whose images were used in the study are given in Table 1. An accuracy of 81.3%, a sensitivity of 83%, and a specificity of 80% were obtained for the image classification of GON [Table 2, Fig. 2]. The FN grading was 17% (n = 75), and the FP grading was 20% (n = 107). The reasons for FN results were early glaucoma in a small disc in 17 eyes (22.7%), glaucoma in myopia in 12 eyes (16%), and misclassification of GON by the software in 46 eyes (61.3%). The reasons for FP results were physiological large cupping in a large disc in 40 eyes (37.4%), a myopic disc with peripapillary chorioretinal atrophy in 20 eyes (18.7%), and misclassification of a normal disc by the software in 47 eyes (43.9%).

Table 1.

Demography of study participants whose fundus images were used to test the model

                                    Initial testing dataset      Final dataset
                                    Normal       Glaucoma        Normal       Glaucoma
Total images                        200          200             535          440
Mean age (years)                    55.7±12.5    56.1±11.2       57.6±11.4    58.2±10.5
Gender (male/female)                97/103       110/90          235/300      230/210
Number of images by glaucoma severity
  Early glaucoma                    -            90              -            255
  Moderate glaucoma                 -            60              -            110
  Advanced glaucoma                 -            50              -            75

Table 2.

Statistical analysis of the output data

Test        Glaucoma              Normal                 Total
Positive    365 (true positive)   107 (false positive)   472
Negative    75 (false negative)   428 (true negative)    503
Total       440                   535                    975

Statistic                     Value     95% Confidence interval
Sensitivity                   82.95%    79.11%–86.35%
Specificity                   80.00%    76.36%–83.31%
Positive likelihood ratio     4.15      3.48–4.94
Negative likelihood ratio     0.21      0.17–0.26
Positive predictive value     77.33%    73.28%–81.03%
Negative predictive value     85.09%    81.67%–88.09%
Accuracy                      81.33%    78.74%–83.73%

PPV was 77.33% and NPV was 85.1%. PLR was 4.15 and NLR was 0.21 [Table 2].
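These summary statistics follow directly from the confusion matrix reported in Table 2 by the standard definitions; the short computation below reproduces them (the formulas are textbook definitions, not the paper's own code).

```python
# Confusion-matrix counts from Table 2 of the study
tp, fp, fn, tn = 365, 107, 75, 428

sensitivity = tp / (tp + fn)           # true-positive rate
specificity = tn / (tn + fp)           # true-negative rate
ppv = tp / (tp + fp)                   # positive predictive value
npv = tn / (tn + fn)                   # negative predictive value
plr = sensitivity / (1 - specificity)  # positive likelihood ratio
nlr = (1 - sensitivity) / specificity  # negative likelihood ratio
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"sensitivity {sensitivity:.2%}, specificity {specificity:.2%}")
print(f"PPV {ppv:.2%}, NPV {npv:.2%}")
print(f"PLR {plr:.2f}, NLR {nlr:.2f}, accuracy {accuracy:.2%}")
```

Running this yields sensitivity 82.95%, specificity 80.00%, PPV 77.33%, NPV 85.09%, PLR 4.15, NLR 0.21, and accuracy 81.33%, matching the values reported in Table 2.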

Discussion

In our study, the image classification accuracy for the detection of GON was 81.3%. A sensitivity of 83% and a specificity of 80% were obtained using the 975 fundus photographs used for training and testing the model. A diagnostic test with high specificity helps avoid incorrectly labeling patients as having the disease and overwhelming the health-care system.

Several reports of automated methods for glaucoma detection have been published recently.[2,3,4,5] To accurately diagnose glaucoma, we require a testing algorithm with high sensitivity and specificity.

Using an ML model, Kim et al.[3] reported a classification accuracy and sensitivity of 98% and a specificity of 97%. While their study used fundus photographs, perimetry, and optical coherence tomography, our study is based only on color fundus photographs. Chakrabarty et al.[4] reported a sensitivity of 71.6% and a specificity of 71.7% using a feature extraction technique. In the validation dataset used by Li et al.,[5] the deep learning system achieved a sensitivity of 95.6% and a specificity of 92%.

As the performance of the algorithm is affected by the quality of the dataset images used for training, we used only good-quality images in our study. In this study, the FN grading of images was 17% and the FP grading was 20%. We identified the reasons for these errors and, in future studies, would like to explore the model further by classifying GON images into small, medium, and large optic disc sizes and by taking glaucoma severity into consideration for the comparison. The clinician must keep in mind the possibility of misclassification by the software in challenging clinical scenarios, such as patients with a myopic or tilted disc who may not have glaucoma.

If this technology could be adopted to provide accurate identification of GON, it would help glaucoma screening programs for populations at risk. It would also improve access to health care by decreasing the cost of glaucoma screening, especially in remote areas and underserved communities. Before its implementation into the health-care system, its cost effectiveness has to be taken into consideration.

The strengths of this study are the use of images acquired from the same fundus camera, a uniform field of view of the fundus images, and the exclusion of fundus images with coexisting ocular diseases that can affect the structure of the ONH. Also, the Lobe model used for the study is open-access software and, hence, cost effective. The software does not upload images to the cloud, thereby protecting patients' privacy. Fundus photography is a relatively inexpensive method to identify GON, and a smartphone can be used to capture patients' fundus images in primary health-care or community screening settings.

A limitation of the study lies in the nature of the model, in which the software itself derives the decision rules from the inputs and outputs of the labeled images uploaded for training. Once the software has finished learning, it is not apparent to the programmer how exactly the program has generated the model output.[7] Also, the data were acquired from patients of the same ethnicity in a hospital-based setting, and hence the results cannot be extrapolated to populations of different ethnicities.

Future studies with a larger sample size, a more diverse dataset, and a combination of multiple inputs to the model would help increase the prediction accuracy of the model in glaucoma detection.

Conclusion

Glaucoma detection with an AI deep learning model using color fundus photographs shows good accuracy and has good potential as an adjunct tool in image classification of GON. Currently, it can be applied in population-based screening programs and teleophthalmology to aid glaucoma detection and identify patients in need of referral to a glaucoma specialist.

Financial support and sponsorship:

Nil.

Conflicts of interest:

There are no conflicts of interest.

References

  • 1.Tham YC, Li X, Wong TY, Quigley HA, Aung T, Cheng CY. Global prevalence of glaucoma and projections of glaucoma burden through 2040: A systematic review and meta-analysis. Ophthalmology. 2014;121:2081–90. doi: 10.1016/j.ophtha.2014.05.013. [DOI] [PubMed] [Google Scholar]
  • 2.Zheng C, Johnson TV, Garg A, Boland MV. Artificial intelligence in glaucoma. Curr Opin Ophthalmol. 2019;30:97–103. doi: 10.1097/ICU.0000000000000552. [DOI] [PubMed] [Google Scholar]
  • 3.Kim SJ, Cho KJ, Oh S. Development of machine learning models for diagnosis of glaucoma. PLoS One. 2017;12:e0177726. doi: 10.1371/journal.pone.0177726. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Chakrabarty L, Joshi GD, Chakravarty A, Raman GV, Krishnadas SR, Sivaswamy J. Automated detection of glaucoma from topographic features of the optic nerve head in color fundus photographs. J Glaucoma. 2016;25:590–7. doi: 10.1097/IJG.0000000000000354. [DOI] [PubMed] [Google Scholar]
  • 5.Li Z, He Y, Keel S, Meng W, Chang RT, He M. Efficacy of a deep learning system for detecting glaucomatous optic neuropathy based on color fundus photographs. Ophthalmology. 2018;125:1199–206. doi: 10.1016/j.ophtha.2018.01.023. [DOI] [PubMed] [Google Scholar]
  • 6.He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016:770–8. [Google Scholar]
  • 7.Akkara JD, Kuriakose A. Role of artificial intelligence and machine learning in ophthalmology. Kerala J Ophthalmol. 2019;31:150–60. doi: 10.4103/ijo.IJO_622_19. [DOI] [PMC free article] [PubMed] [Google Scholar]
