Middle East African Journal of Ophthalmology
. 2021 Sep 25;28(2):81–86. doi: 10.4103/meajo.meajo_406_20

Validation of Artificial Intelligence Algorithm in the Detection and Staging of Diabetic Retinopathy through Fundus Photography: An Automated Tool for Detection and Grading of Diabetic Retinopathy

Bhargavi Pawar 1,, Suneetha N Lobo 1, Mary Joseph 1, Sangeetha Jegannathan 1, Hariprasad Jayraj 2
PMCID: PMC8547660  PMID: 34759664

Abstract

PURPOSE:

Diabetic retinopathy (DR) is one of the leading causes of vision loss globally, and early detection plays a significant role in the prognosis. Several studies have examined single-field fundus photography and artificial intelligence (AI) in DR screening, but largely using standardized data sets in urban outpatient settings. This study was carried out to validate an AI algorithm in the detection of DR severity using fundus photography in a real-time rural setting.

METHODS:

This cross-sectional study was carried out among 138 patients who underwent routine ophthalmic examination, irrespective of their diabetic status. The participants underwent single-field color fundus photography using a nonmydriatic fundus camera. The images acquired were processed by the AI algorithm for image quality and the presence and referability of DR. The same images were graded by four ophthalmologists, and interobserver variability between the four observers was calculated.

RESULTS:

Of the 138 patients, 26 (18.84%) had some stage of DR, represented by 47 images (17.03%) positive for signs of DR. All 26 patients were in moderate or severe stages. About 6.5% of the images were considered not gradable due to poor optical quality. The average agreement between pairs of the four graders was 95.16% for referable DR (RDR). The AI showed 100% sensitivity in detecting DR, while the specificity for RDR was 91.47%.

CONCLUSION:

AI has shown excellent sensitivity and specificity in RDR detection, on par with the performance of individual ophthalmologists, and is an invaluable tool for DR screening.

Keywords: Artificial intelligence, diabetic retinopathy, fundus photography, referability

Introduction

India has emerged as the diabetic capital of the world, with currently over 40 million diabetics in the country.[1] Although several recent advances in the management of type 2 diabetes mellitus have significantly reduced mortality, complications and comorbidities such as diabetic retinopathy (DR) have risen significantly in recent years. DR continues to be one of the leading causes of vision loss globally. In addition to an increasing incidence, lack of awareness and poor access to early screening are the factors making DR a global epidemic.[1] The All India Ophthalmological Society Diabetic Retinopathy Eye Screening Study, done in 2014, estimated the prevalence of DR to be 21.7% in the Indian urban setting.[2] It is estimated that the number of people with DR will grow from 126.6 million in 2011 to 191.0 million by 2030.[3] Low- and middle-income countries face the biggest challenge from diabetes, while in Africa, two-thirds of diabetics remain undiagnosed.

All diabetic patients are recommended to undergo annual or more frequent retinal screening to enable early detection and intervention.[4] However, periodic retinal screening faces various challenges, ranging from a paucity of ophthalmologists in remote and rural areas to limited access to technology in resource-limited settings. Faced with this challenge, many innovative solutions have been proposed to build better access to DR screening, and one such innovation is the use of artificial intelligence (AI). AI can address the current barriers to early screening, i.e., the scarcity of ophthalmologists and trained resources for reading fundus images, and limited access to screening in rural and low-resource areas.[4] The literature includes several studies incorporating AI in DR screening; however, those studies were largely restricted to standard data sets and urban outpatient settings. Implementing innovations such as AI in larger population settings, such as rural areas, in real time is necessary to overcome the feasibility challenges that prevail in delivering ophthalmic care.

Objectives

This study was carried out to validate an AI algorithm for the detection of DR and to compare its findings with clinical evaluation of DR.

Methods

Study setting and participants

This study was carried out as a cross-sectional study by the department of ophthalmology in the rural community eye clinic attached to our tertiary teaching institution over a period of 3 months between November 2017 and January 2018. Patients with a clear view of the retina in at least one eye were considered for the study. Patients with bilateral cataracts were excluded.

Sample size and sampling technique

All patients who presented to the rural center of the department, on the outskirts of Bengaluru, for routine eye examination during the study period were considered. A total of 156 patients were selected by convenience sampling during the course of the study. Of these, 138 patients were selected for fundus imaging; 16 patients were excluded due to bilateral dense cataract obstructing a clear view of the retina, and 2 patients were excluded due to age <18 years. In this study, 114 patients (82.6%) underwent pupillary dilatation and 24 patients (17.4%) did not.

Ethical approval and consent

Institutional ethics committee approval was taken, and the study was conducted in accordance with the Declaration of Helsinki. Implied consent for fundoscopy and fundus photography was considered as the patients had presented themselves for routine ophthalmological assessment at the eye clinic.

Data collection

Fundus images were taken using Intucam Prime, a low-cost portable fundus camera (mydriatic and nonmydriatic) that provides color fundus images with a 40° field of view. Images were acquired by a technician with 6 months of experience in operating the device. For the purpose of this study, single-field, posterior pole-centered fundus imaging was used for DR screening based on imaging recommendations.[5,6] The technician was allowed to take more than one image per eye to obtain the best possible quality wherever necessary.

For the purpose of this study, two images were selected from the images taken for each patient – one per eye, as perceived by the technician as being the best quality images. A total of 276 images of good photographic quality were selected for the study. Images for each patient were stored in a folder in a numerical sequence from 1 to 138.

The selected images were uploaded to a cloud platform for DR analysis, and the AI output regarding the presence or absence of DR (of any grade) along with referable DR (RDR) (indicating DR stage moderate nonproliferative diabetic retinopathy [NPDR] or worse, i.e., those that needed referral to a tertiary center) was derived.

The AI algorithm is a deep learning algorithm programmed to detect and stage DR, similar to other deep learning algorithms.[4] AI results were not communicated to the patient, and the patient's journey in the eye examination was not affected by the process.

The images were independently graded by four ophthalmologists using an online annotation tool, without access to the patient or clinical data. The International Clinical Diabetic Retinopathy (ICDR) Classification Scale [Table 1][7] was followed for disease severity staging, as this helped the programmed algorithm classify disease as referable or nonreferable (compared with other standard classification systems such as the Early Treatment Diabetic Retinopathy Study [ETDRS] scale). A sample image classification with lesion markings is shown in Figure 1.

Table 1.

International clinical diabetic retinopathy classification

ICDR stage Findings on fundus image
No apparent retinopathy No abnormalities
Mild NPDR Microaneurysms only
Moderate NPDR More than just microaneurysms but less than severe NPDR
Severe NPDR Any of the following
 >20 intraretinal hemorrhages in each of the 4 quadrants
 Definite venous beading in 2 or more quadrants
 Prominent IRMA in 1+ quadrant
 And no signs of proliferative retinopathy
PDR One or more of the following
 Neovascularization
 Vitreous or preretinal hemorrhage

ICDR: International clinical diabetic retinopathy, NPDR: Nonproliferative diabetic retinopathy, IRMA: Intraretinal microvascular abnormality, PDR: Proliferative diabetic retinopathy

Operational definitions

RDR was defined as moderate NPDR or worse and/or the presence of macular edema. No DR and mild NPDR were considered not referable. Sight-threatening DR was defined as severe NPDR or PDR. Not gradable: image quality suboptimal for DR grading.

As only one field (posterior pole) was considered for this study, severe NPDR was considered without application of the 4-2-1 rule. The presence of more than 20 intraretinal hemorrhages or venous beading or intraretinal microvascular abnormality in the single posterior pole centered image was considered to be severe NPDR.

The most frequent grade among the four ophthalmologists was considered the ground truth for each image. In the event of a split decision, where two ophthalmologists each agreed on a grade, the higher grade was considered the ground truth. There were no instances where each ophthalmologist assigned a different grade to the same image. The ground truth grades thus derived were compared with the AI's output for the presence, staging, and referability of DR, based on the markings by the AI algorithm [Figure 1].
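The majority-vote rule with the tie-break toward the more severe grade can be sketched as follows (a minimal illustration of the protocol described above; the grade labels and function are ours, not taken from the study's software):

```python
from collections import Counter

# ICDR grades ordered by increasing severity.
ICDR_GRADES = ["no DR", "mild NPDR", "moderate NPDR", "severe NPDR", "PDR"]

def ground_truth(votes):
    """Most frequent grade among the four graders; on a 2-2 split,
    the more severe grade wins, per the study protocol."""
    counts = Counter(votes)
    top = max(counts.values())
    tied = [g for g, n in counts.items() if n == top]
    # Break ties toward the more severe ICDR grade.
    return max(tied, key=ICDR_GRADES.index)

print(ground_truth(["no DR", "no DR", "no DR", "mild NPDR"]))  # majority wins: no DR
print(ground_truth(["moderate NPDR", "moderate NPDR",
                    "severe NPDR", "severe NPDR"]))            # tie: severe NPDR
```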

Figure 1.

Figure 1

Annotations as marked by the AI algorithm

Interobserver reliability, ground truth, and artificial intelligence performance

The single-field fundus images were graded independently by four ophthalmologists. The images were accessed through an online grading tool using a computer and were graded using the ICDR scale. The results were collated and compared between the ophthalmologists to assess intergrader agreement and to derive the "ground truth" for each image, defined as the most common grade assigned to the image, or the consensus grading. During the process, the graders did not discuss the images to reach a consensus.

AI performance was then evaluated in comparison with the ground truth.

Statistical analysis

Data were analyzed using Microsoft Excel 2016 and MedCalc Statistical Software version 19.0.3 (MedCalc Software bvba, Ostend, Belgium). Interobserver agreement was calculated using the intraclass correlation coefficient (ICC) between all four graders and the kappa (κ) statistic between pairs of graders. Cohen's kappa (κ) was classified as per guidelines[8] as <0.20 (poor), 0.21–0.40 (fair), 0.41–0.60 (moderate), 0.61–0.80 (good), and 0.81–1.00 (very good).

Classification of the ICC followed the reference guideline by Koo and Li[9]: <0.50 (poor); 0.50–0.75 (moderate); 0.75–0.90 (good); >0.90 (excellent).
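For illustration, Cohen's κ for a pair of graders and the interpretation bands above can be computed as follows (a self-contained sketch with made-up example grades; the study itself used MedCalc):

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two raters over paired categorical labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (observed - expected) / (1 - expected)

def kappa_band(k):
    """Interpretation bands used in the paper (poor ... very good)."""
    for cutoff, label in [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"),
                          (0.80, "good"), (1.00, "very good")]:
        if k <= cutoff:
            return label

# Hypothetical grades from two graders over five images.
grades_a = ["no DR", "no DR", "mild", "moderate", "no DR"]
grades_b = ["no DR", "no DR", "mild", "moderate", "mild"]
k = cohens_kappa(grades_a, grades_b)
print(k, kappa_band(k))  # κ ≈ 0.69, i.e., "good" agreement
```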

Artificial intelligence performance in comparison with the ground truth

The images were processed through AI deep learning algorithm for 3 parameters:

  • Image quality assessment – gradable or not gradable

  • DR detection – yes (ICDR severity mild or worse) or no (no DR)

  • RDR detection– yes (ICDR severity moderate or worse) or no (no DR or mild NPDR).

The results were compared at the image level with the ground truth grades derived from the ophthalmologists' grading data. For this, the sensitivity, specificity, positive predictive value, negative predictive value, and receiver operating characteristic were calculated for DR detection and RDR detection. The AI identified all 18 poor-quality images as "ungradable."

Results

The study was carried out among 138 participants, of whom 84 (60.8%) were diabetic. The age and sex distribution of the study participants is given in Table 2. For the 276 images from the 138 patients, the agreement distribution between the four graders is given in Table 3. Where all four graders assigned the same grade, agreement was considered 100%; three graders agreeing was considered 75%, and two graders agreeing was considered 50% agreement. There were no instances where all 4 graders assigned different grades to a given image.
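The agreement percentage defined above is simply the share of graders backing the modal grade for an image; a one-line sketch (function name is ours):

```python
from collections import Counter

def agreement_pct(votes):
    """Percent agreement for one image: share of graders backing the modal grade."""
    return 100 * max(Counter(votes).values()) / len(votes)

print(agreement_pct(["no DR"] * 4))                            # 100.0 (all four agree)
print(agreement_pct(["no DR", "no DR", "no DR", "mild"]))      # 75.0
print(agreement_pct(["mild", "mild", "moderate", "moderate"])) # 50.0
```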

Table 2.

Age and gender distribution of patients undergoing fundus imaging

Characteristics Count (%)
Age group
 <20 6 (2.2)
 20-29 16 (5.8)
 30-39 36 (13)
 40-49 72 (26)
 50-59 36 (13)
 60-69 80 (29)
 70-79 20 (7.2)
 80-89 10 (3.6)
Gender
 Female 134 (48.55)
 Male 142 (51.45)

Table 3.

Diabetic retinopathy grade agreement between ophthalmologists

DR grade Ground truth (%) Grader 1 (%) Grader 2 (%) Grader 3 (%) Grader 4 (%)
No DR 76.4 77.5 77.5 77.2 73.9
Mild NPDR 0.0 2.2 0.4 1.8 2.2
Moderate NPDR 6.5 4.7 8.3 7.2 7.6
Severe NPDR 7.6 5.4 5.1 5.8 6.5
PDR 2.9 3.6 2.2 1.4 3.3
Not gradable 6.5 6.5 6.5 6.5 6.5

Not gradable: 18 images were considered not gradable due to insufficient image quality according to all 4 graders. NPDR: Nonproliferative diabetic retinopathy, PDR: Proliferative diabetic retinopathy, DR: Diabetic retinopathy

In the 15 instances of a split but balanced decision, where two different DR grades were assigned by two graders each, the higher grade was considered the ground truth DR grade for the image.

The ground truth grades/gold standard of ophthalmologist's evaluation

The prevalence of DR in this study was 18.84%, seen among 26 of the 138 participants evaluated, and was represented by 47 images (17.03%) positive for signs of DR. Twenty-six patients were considered to have RDR (DR severity moderate or worse), and 18 of the 276 images were considered not gradable due to image quality falling short of diagnostic clarity [Table 4].

Table 4.

Ground truth: Grading of diabetic retinopathy by ophthalmologist

DR grade Count of patient (%) Count of images (%)
No DR 112 (81.16) 211 (76.45)
Mild NPDR 0 0
Moderate NPDR 5 (3.62) 18 (6.52)
Severe NPDR 16 (11.59) 21 (7.61)
PDR 5 (3.62) 8 (2.90)
Not gradable 0 18 (6.52)

NPDR: Nonproliferative diabetic retinopathy, PDR: Proliferative diabetic retinopathy, DR: Diabetic retinopathy

The ICC between the four graders was 0.9174, indicating excellent consistency, and Cohen's kappa between pairs of graders showed "very good" agreement [Table 5].

Table 5.

Cohen’s kappa between any 2 graders and with ground truth grades to calculate interobserver variability

Grade pair DR grade (κ/agreement %) Any DR (κ/agreement %) RDR (κ/agreement %)
1 and 2 0.86 87.6 0.91 95.35 0.90 94.96
1 and 3 0.84 89.49 0.88 94.19 0.87 93.8
1 and 4 0.87 88.04 0.88 93.8 0.91 95.35
2 and 3 0.86 90.58 0.93 96.51 0.91 95.74
2 and 4 0.87 89.49 0.88 93.8 0.93 96.51
3 and 4 0.83 88.04 0.87 93.41 0.89 94.57
1 and GT 0.94 93.45 0.94 97.28 0.92 96.51
2 and GT 0.93 94.55 0.96 98 0.96 98.4
3 and GT 0.88 92.73 0.94 96.89 0.91 95.73
4 and GT 0.92 94.18 0.92 95.73 0.96 98.06
AI and GT 0.7 62.32 0.51 66.27 0.87 93.02

DR: Diabetic retinopathy, RDR: Referable diabetic retinopathy, GT: Ground truth

The average sensitivity and specificity among the graders in comparison with the ground truth grades were 93.18% and 97.77% for any DR, and 89.63% and 98.84% for RDR [Table 6]. For the AI, the sensitivity in detecting any grade of DR (mild or worse) was 100%, while the specificity was 59.24% and the area under the curve (AUC) was 0.796.

Table 6.

Sensitivity and specificity of individual graders and artificial intelligence in comparison with ground truth

Grade pair Any DR (Sn %/Sp %) RDR (Sn %/Sp %)
1 and GT 93.18 98.59 87.8 99.54
2 and GT 90.91 99.53 92.68 100
3 and GT 90.91 98.12 80.49 98.15
4 and GT 97.73 94.84 97.56 97.69
AI and GT 100 59.24 100 91.47

Sn: Sensitivity, Sp: Specificity, GT: Ground truth, DR: Diabetic retinopathy, RDR: Referable diabetic retinopathy

The AUC for individual ophthalmologists varied from 0.90 to 0.97 [Table 7 and Graph 1].

Table 7.

Artificial intelligence performance in the detection of any stage of diabetic retinopathy and referable diabetic retinopathy in comparison with ground truth grades

Row labels AI grades
Grand total
No RDR RDR Not gradable
Ground truth
 No RDR 193 18 0 211
 RDR 0 47 0 47
 Not gradable 0 0 18 18
Grand total 193 65 18 276

AI has shown 100% sensitivity and 91.47% specificity in the detection of RDR. This is comparable to the performance of individual ophthalmologists. RDR=Moderate NPDR or worse, No RDR=No DR or mild NPDR. RDR: Referable diabetic retinopathy, NPDR: Nonproliferative diabetic retinopathy, AI: Artificial intelligence
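The reported AI figures for RDR can be reproduced from the gradable cells of Table 7 (true positives 47, false negatives 0, false positives 18, true negatives 193); a quick arithmetic check (function name is ours):

```python
def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from 2x2 confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Gradable images from Table 7: ground truth vs. AI for RDR.
m = screening_metrics(tp=47, fp=18, fn=0, tn=193)
print(f"sensitivity = {m['sensitivity']:.2%}")  # 100.00%
print(f"specificity = {m['specificity']:.2%}")  # 91.47%
```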

Graph 1.

Graph 1

AUC for individual graders

The artificial intelligence performance

The AI algorithm falsely detected DR in 86 images that were otherwise classified as no DR under the ground truth/clinicians' observation. We identified most of these images as having artifacts, reflections, or innocuous pigmentary changes that the AI marked as cotton wool spots or dot hemorrhages.

When considering RDR detection (moderate NPDR or higher), the AI detected all images with RDR; therefore, sensitivity = 100% and specificity = 91.47% for RDR, with an AUC of 0.957.

Thus, the AI performed on par with the ophthalmologists in detecting RDR.

Discussion

AI is finding varied applications in the field of medical science, from screening for disease to assistance in robotic surgery. Screening for DR is the need of the hour, from rural, inaccessible areas to elite urban health-care institutes, due to the overwhelming increase in the prevalence of comorbidities of diabetes mellitus. When used as a screening tool in a primary or tertiary care center, AI minimizes the dependence on manpower, especially ophthalmologists, thereby limiting the patient load for each clinician and improving clinical productivity in an already overwhelmed health-care system with a high patient-to-doctor ratio.

Screening for DR can easily be performed by a technician in rural areas such as the one chosen in this study, or at the clinics of general physicians, endocrinologists, or nephrologists. In a prototype rural area such as the one where this algorithm was validated, there is tremendous scope for its application: tertiary medical care is less easily accessible, and referral by means of a screening tool will be well received among the local and medical communities. It holds particular relevance in rural and underdeveloped tribal areas of developing nations where diabetes remains undetected due to lack of awareness and lack of accessible health care.

This AI model has been trained to detect and refer all retinopathy classified as moderate NPDR or worse as per the ICDR standards. The high sensitivity of the AI (>95%) makes it a good screening tool for referable disease. It is trained to identify microaneurysms, hemorrhages, hard exudates, cotton wool spots, and neovascularization. The AI performs automated annotations of lesions and is an excellent tool for patient education and for accurate monitoring of disease progress at follow-up visits. In nonretinopathy fundi and mild NPDR, however, the algorithm can overdiagnose or underdiagnose retinopathy, as it can label artifacts and pigmentary changes as diabetes-related changes or miss an occasional microaneurysm that a trained clinical eye would detect. This is an area that needs to be improved upon, but as mild NPDR qualifies as nonreferable according to our study, it does not affect the validation of the tool.

In conditions that mimic DR, or conditions with similar fundoscopic attributes (vein occlusion, wet age-related macular degeneration, laser scars), the algorithm will annotate the lesions and hemorrhages and classify the patient as RDR, as it has not been trained to differentiate clinical conditions with similar findings based on the photographic lesions detected. Hence the lower specificity (59.24%) for any-DR detection, due to the high false positives. Although this may appear to be a shortfall of the algorithm, the clinical implications are not of concern, as most of these conditions would require an ophthalmic evaluation nonetheless.

Conclusion

The benefits of using AI in medical science are manifold, from rural outreach screening to robotic surgery at tertiary care. Although the present study does not suggest that the gold-standard ophthalmologist's examination should be replaced or substituted, AI should be used as a valuable screening tool and clinical aid to screen large populations and to help in the early detection, referral, and treatment of DR, with the goal of limiting morbidity due to the disease. As there has been no evidence in the existing literature regarding the validation of AI software in a real-time setting, the present study serves as a pilot study for that purpose.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References

  • 1. Mohan V, Sandeep S, Deepa R, Shah B, Varghese C. Epidemiology of type 2 diabetes: Indian scenario. Indian J Med Res. 2007;125:217–30.
  • 2. Gadkari SS, Maskati QB, Nayak BK. Prevalence of diabetic retinopathy in India: The All India Ophthalmological Society Diabetic Retinopathy Eye Screening Study 2014. Indian J Ophthalmol. 2016;64:38–44. doi: 10.4103/0301-4738.178144.
  • 3. Zheng Y, He M, Congdon N. The worldwide epidemic of diabetic retinopathy. Indian J Ophthalmol. 2012;60:428–31. doi: 10.4103/0301-4738.100542.
  • 4. Ting DS, Cheung CY, Lim G, Tan GS, Quang ND, Gan A, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318:2211–23. doi: 10.1001/jama.2017.18152.
  • 5. Williams GA, Scott IU, Haller JA, Maguire AM, Marcus D, McDonald HR. Single-field fundus photography for diabetic retinopathy screening. Ophthalmology. 2004;111:1055–62. doi: 10.1016/j.ophtha.2004.02.004.
  • 6. Solanki K, Ramachandra C, Bhat S, Bhaskaranand M, Nittala MG, Sadda SR. EyeArt: Automated, high-throughput, image analysis for diabetic retinopathy screening. Invest Ophthalmol Vis Sci. 2015;56:1429.
  • 7. International Clinical Diabetic Retinopathy Disease Severity Scale, Detailed Table. American Academy of Ophthalmology; 2010.
  • 8. Sun S. Meta-analysis of Cohen's kappa. Health Serv Outcomes Res Methodol. 2011;11:145–63.
  • 9. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155–63. doi: 10.1016/j.jcm.2016.02.012.

Articles from Middle East African Journal of Ophthalmology are provided here courtesy of Wolters Kluwer -- Medknow Publications
