Key Points
Question
How does an artificial intelligence (AI) system for autonomous detection of vision-threatening diabetic retinopathy (vtDR) and more than mild diabetic retinopathy (mtmDR) compare with the reading center clinical reference standard?
Findings
In this multicenter cross-sectional diagnostic study including 942 individuals with diabetes, the accuracy of the EyeArt autonomous AI system vs the reference standard was high (mtmDR sensitivity: 96%, specificity: 88% and vtDR sensitivity: 97%, specificity: 90%). The AI system successfully graded more than 97% of the eyes scored manually, with most not requiring dilation.
Meaning
An autonomous AI system can accurately detect vtDR and mtmDR without physician oversight or need for dilation in most individuals, facilitating diabetic eye examinations at nonspecialist facilities and enabling accelerated referral of vtDR.
Abstract
Importance
Diabetic retinopathy (DR) is a leading cause of blindness in adults worldwide. Early detection and intervention can prevent blindness; however, many patients do not receive their recommended annual diabetic eye examinations, primarily owing to limited access.
Objective
To evaluate the safety and accuracy of an artificial intelligence (AI) system (the EyeArt Automated DR Detection System, version 2.1.0) in detecting both more-than-mild diabetic retinopathy (mtmDR) and vision-threatening diabetic retinopathy (vtDR).
Design, Setting, and Participants
A prospective multicenter cross-sectional diagnostic study was preregistered (NCT03112005) and conducted from April 17, 2017, to May 30, 2018. A total of 942 individuals aged 18 years or older who had diabetes gave consent to participate at 15 primary care and eye care facilities. Data analysis was performed from February 14 to July 10, 2019.
Interventions
Retinal imaging for the autonomous AI system and Early Treatment Diabetic Retinopathy Study (ETDRS) reference standard determination.
Main Outcomes and Measures
Primary outcome measures included the sensitivity and specificity of the AI system in identifying participants’ eyes with mtmDR and/or vtDR by 2-field undilated fundus photography vs a rigorous clinical reference standard comprising reading center grading of 4 wide-field dilated images using the ETDRS severity scale. Secondary outcome measures included the evaluation of imageability, dilated-if-needed analysis, enrichment correction analysis, worst-case imputation, and safety outcomes.
Results
Of 942 consenting individuals, 893 patients (1786 eyes) met the inclusion criteria and completed the study protocol. The population included 449 men (50.3%). Mean (SD) participant age was 53.9 (15.2) years (median, 56; range, 18-88 years), 655 were White (73.3%), and 206 had type 1 diabetes (23.1%). Sensitivity and specificity of the AI system were high in detecting mtmDR (sensitivity: 95.5%; 95% CI, 92.4%-98.5% and specificity: 85.0%; 95% CI, 82.6%-87.4%) and vtDR (sensitivity: 95.1%; 95% CI, 90.1%-100% and specificity: 89.0%; 95% CI, 87.0%-91.1%) without dilation. Imageability was high without dilation, with the AI system able to grade 87.4% (95% CI, 85.2%-89.6%) of the eyes with reading center grades. When eyes with ungradable results were dilated per the protocol, the imageability improved to 97.4% (95% CI, 96.4%-98.5%), with the sensitivity and specificity being similar. After correcting for enrichment, the mtmDR specificity increased to 87.8% (95% CI, 86.3%-89.5%) and the sensitivity remained similar; for vtDR, both sensitivity (97.0%; 95% CI, 91.2%-100%) and specificity (90.1%; 95% CI, 89.4%-91.5%) improved.
Conclusions and Relevance
This prospective multicenter cross-sectional diagnostic study noted safety and accuracy with use of the EyeArt Automated DR Detection System in detecting both mtmDR and, for the first time, vtDR, without physician assistance. These findings suggest that improved access to accurate, reliable diabetic eye examinations may increase adherence to recommended annual screenings and allow for accelerated referral of patients identified as having vtDR.
This diagnostic study compares the accuracy of an automated diabetic retinopathy detection system with the Early Treatment Diabetic Retinopathy Study reference standard in adults with diabetic retinopathy.
Introduction
Worldwide, the prevalence of type 1 and type 2 diabetes in adults is expected to increase from approximately 415 million in 2015 to 642 million by 2040.1 Approximately 35% of patients are at risk of developing diabetic retinopathy (DR), with more than 10% at risk of more severe vision-threatening DR (vtDR).2 Vision loss from DR may occur asymptomatically, with patients unaware of progressive damage.3 Patients with diabetes name vision loss as their most feared complication twice as often as any other, yet recommended annual vision screening is frequently not completed.4 Indeed, 21% of patients with diabetes worldwide have never undergone DR screening,4 and in the US, only 60% of patients receive annual dilated examinations, with even lower rates among low-income patients of minority race and ethnicity.5,6 These levels may be even lower in low-income countries where patients experience waiting lists or long-distance referrals owing to the scarcity of specialists.7 Patients commonly report limited access to eye care specialists, including long wait times for appointments and high costs, as barriers to regular DR screening.4 In low-income communities, this difficulty is compounded by failure to attend existing appointments for DR screening.8 Limited access to eye care specialists highlights a need for efficient and convenient DR screening at easily accessible sites in primary care,9 facilitating early diagnosis and prioritization for treatment of vtDR before vision loss occurs.10
Recent advances in automated DR detection using artificial intelligence (AI) algorithms provide patients with increased opportunities for care, improving patient access to diabetic eye examinations and identifying patients requiring specialist referrals.11,12,13,14,15,16,17 Artificial intelligence facilitates safe detection of DR in local primary care offices.18 Consequently, adherence to regular eye examinations may improve, particularly in individuals with limited access to specialists.
Beyond providing regular examinations to patients with diabetes, AI-based automated DR detection allows consistent interpretation and real-time reporting of results.16 Automated systems are cost-effective alternatives to human grading17 and can increase access to screening for patients.10 In addition, automated DR screening systems reduce physician workload associated with manual grading of images.15 Several automated DR image assessment systems have been reported (eg, EyeArt, Eyenuk Inc; IDx-DR, Digital Diagnostics Inc; SELENA, Singapore Eye Research Institute; Retmarker, Retmarker Ltd; and Automated Retinal Disease Assessment, Google LLC).12,14,16,17,19,20,21 A diagnostic study has shown IDx-DR to be sensitive and specific for DR screening, and that AI system has been cleared by the US Food and Drug Administration (FDA) for the detection of more-than-mild DR (mtmDR) only (ie, not for vtDR).12
Many patients with mtmDR need to be referred for eye specialist care (hence, mtmDR is typically considered referable), but vtDR may need more urgent intervention. This need is particularly important in medical systems in which retinal screening and specialty care are limited.7,8,9,10 An AI approach that specifically detects vtDR allows for prioritized appointment schedules for vtDR that conform with recommended referral time guidelines for urgent treatment.9,22
The EyeArt AI Automated DR Detection System is an FDA-cleared cloud-based retinal diagnostic software device that analyzes digital color fundus photographs (CFPs) of patient eyes for signs of DR.14,19 The system is designed to detect both mtmDR and vtDR in each eye of patients with diabetes. A recent retrospective real-world study assessed the diagnostic outcomes of EyeArt, version 2.0, in 101 710 consecutive patient visits from more than 400 primary care centers in previously obtained CFPs.14 The system achieved a 91.3% sensitivity and 91.1% specificity compared with a clinical reference. Moreover, the AI system achieved a 98.5% sensitivity for a positive referral output in patients with vtDR (potentially treatable DR).14 The present prospective multicenter cross-sectional diagnostic study evaluated the use of EyeArt, version 2.1.0, in detecting mtmDR and vtDR in eyes of patients with diabetes.
Methods
The prospective multicenter cross-sectional diagnostic study was preregistered (NCT03112005) and conducted from April 17, 2017, to May 30, 2018. Data analysis was performed from February 14 to July 10, 2019. The overall study design is depicted in Figure 1 and described herein. The protocol was approved by the Alpha Institutional Review Board and site-specific institutional review boards, where required. All participants provided written informed consent and received nominal compensation that was reviewed and approved by the institutional review boards. The study was conducted in accordance with International Conference on Harmonization Good Clinical Practice,23 Declaration of Helsinki,24 and all applicable laws and regulations. This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline.
Study Population
Fifteen US study centers participated, including primary care (6), general ophthalmology (6), and retina specialty (3) centers. After a prespecified check for poolability, data from study centers were pooled for analysis.
Participants were aged 18 years or older and had diabetes. Exclusion criteria were persistent visual impairment in 1 or both eyes; history of macular edema or retinal vascular occlusion, ocular injections, retinal laser treatment, or intraocular surgery other than uncomplicated cataract surgery; and contraindication for fundus photography. Data on age, sex, and self-reported race and ethnicity were collected per requirements of the FDA and to show possible generalizability of the findings.
Patients with scheduled study center visits were sequentially assessed for eligibility by medical records review before being invited to participate during the sequential recruitment period. A subsequent enrichment-permitted period was performed to increase the likelihood of enrolling individuals with a more advanced level of disease. During this period, sites could invite patients who met eligibility requirements and 1 or more enrichment criterion by medical records review or a prescreening questionnaire including diagnosis of diabetes for 10 or more years, type 2 diabetes diagnosis with insulin dependence for 3 or more years, diagnosis of diabetes for 5 or more years with no prior diabetic eye examination, or diagnosis of diabetes for 5 or more years with hemoglobin A1c level 9% or higher (to convert to proportion of total hemoglobin, multiply by 0.01) within the past 6 months. During this period, enrichment was permitted but not required. To eliminate potential spectrum bias caused by enrichment, the study analysis plan included enrichment correction analyses by evaluating performance using disease prevalence statistics of the study population enrolled sequentially.
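The enrichment criteria listed above amount to an any-of rule applied after the general eligibility check. A minimal sketch in Python follows; the `Prescreen` record and its field names are hypothetical and introduced only for illustration, not taken from the study software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prescreen:
    """Hypothetical prescreening record; field names are illustrative."""
    years_since_diagnosis: float
    type2_insulin_years: float = 0.0        # years of insulin dependence with type 2 diabetes
    prior_diabetic_eye_exam: bool = True
    hba1c_percent: Optional[float] = None   # most recent value within the past 6 months, if any


def meets_enrichment_criterion(p: Prescreen) -> bool:
    """Return True if at least 1 of the 4 enrichment criteria is met."""
    long_duration = p.years_since_diagnosis >= 10
    insulin_dependent_t2 = p.type2_insulin_years >= 3
    never_examined = p.years_since_diagnosis >= 5 and not p.prior_diabetic_eye_exam
    poor_control = (
        p.years_since_diagnosis >= 5
        and p.hba1c_percent is not None
        and p.hba1c_percent >= 9.0
    )
    return long_duration or insulin_dependent_t2 or never_examined or poor_control
```

Because enrichment was permitted but not required, a rule of this kind would only widen who could be invited; it would not exclude sequentially eligible patients.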
Image Acquisition and Reference Standard
Two-field retinal CFP images (1 disc-centered and 1 macula-centered) were taken for each eye (Canon CR-2 AF or Canon CR-2 Plus AF; Canon USA Inc). The images were submitted to the cloud for analysis.
After 2-field CFP images were captured, participants underwent dilation followed by 4-wide-field stereoscopic CFP imaging in accordance with the Wisconsin Fundus Photograph Reading Center (FPRC)25 imaging protocol by FPRC-certified staff at all sites. Instances in which a participant’s pupil did not dilate or media opacities resulted in poor-quality images or narrow field of view (<45°) were noted by imaging staff. The dilate-if-needed imaging protocol allowed the inclusion of disc-centered and macula-centered images obtained following dilation.
The AI system results were compared with the clinical reference standard of Early Treatment Diabetic Retinopathy Study (ETDRS) grading of 4-wide-field stereoscopic dilated fundus photographs (equivalent to 7-field 30° ETDRS photographs) by the FPRC.26,27 Two independent certified graders masked to the AI system’s results examined the images using standardized procedures to establish the reference standards and provide the ETDRS level, which was translated to mtmDR and vtDR. Between-grader differences exceeding prespecified criteria were adjudicated by a third more senior grader.
Per the FPRC, the reference standard mtmDR was considered positive if the reading center determined an ETDRS level greater than or equal to 35 (but not equal to 90) and/or the presence of clinically significant macular edema (CSME) was detected. The reference standard mtmDR was considered negative if an ETDRS level less than or equal to 20 was given and CSME was absent. The reference standard vtDR was considered to be positive if the reading center determined an ETDRS level greater than or equal to 53 (but not equal to 90) and/or presence of CSME was determined. The reference standard vtDR was considered negative if the reading center determined an ETDRS level less than or equal to 47 and CSME was determined to be absent.
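The mapping from ETDRS level and CSME status to the 2 reference standard labels can be written as a small decision rule. The sketch below is one possible reading of the definitions above, not the reading center's software: function names are hypothetical, CSME is assumed to be available as a simple yes/no value, and any eye that meets neither the positive nor the negative definition is returned as `None`.

```python
from typing import Optional

EXCLUDED_LEVEL = 90  # the definitions above exclude level 90 from the positive criteria


def reference_label(etdrs_level: int, csme_present: bool,
                    positive_from: int, negative_up_to: int) -> Optional[bool]:
    """True = positive, False = negative, None = neither definition is met."""
    if csme_present:
        return True                       # CSME alone makes the eye positive
    if etdrs_level == EXCLUDED_LEVEL:
        return None                       # level 90 is neither positive nor negative here
    if etdrs_level >= positive_from:
        return True
    if etdrs_level <= negative_up_to:
        return False
    return None


def mtmdr_reference(etdrs_level: int, csme_present: bool) -> Optional[bool]:
    # Positive: level >= 35 and/or CSME; negative: level <= 20 and no CSME.
    return reference_label(etdrs_level, csme_present, positive_from=35, negative_up_to=20)


def vtdr_reference(etdrs_level: int, csme_present: bool) -> Optional[bool]:
    # Positive: level >= 53 and/or CSME; negative: level <= 47 and no CSME.
    return reference_label(etdrs_level, csme_present, positive_from=53, negative_up_to=47)
```

Under this reading, an eye at ETDRS level 35 without CSME is positive for mtmDR but negative for vtDR, which is why the 2 outcomes are analyzed separately.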
Outcome Measures
Primary outcome measures included the sensitivity and specificity of the AI system in identifying eyes with mtmDR or vtDR by 2-field undilated CFP vs the FPRC reference standard. Secondary outcome measures included the evaluation of imageability, sensitivity, and specificity of the AI system vs the reference standard using the dilate-if-needed protocol, comparison of sequential vs enriched enrollment populations, worst-case imputation, and safety outcomes. Additional prespecified analyses (eAppendix in Supplement 1) were performed on a subset of participants determined using FDA-specified criteria (eFigure 1 in Supplement 1) to support FDA clearance. Participant baseline characteristics per FDA-specified analysis are presented in eTable 1 in Supplement 1, the performance of the AI system in eTable 2 and eTable 3 in Supplement 1, and imageability in eTable 4 and eTable 5 in Supplement 1.
Statistical Analysis
Results for mtmDR and vtDR were examined independently. Sensitivity was defined as the accuracy among positive findings per the clinical reference standard, calculated as the proportion of eyes with positive findings per the reference standard that also test positive with the AI system. Specificity was defined as the accuracy among negative findings per the clinical reference standard, calculated as the proportion of eyes with negative findings per the reference standard that also test negative with the AI system. Imageability was defined as the percentage of eyes that received a disease detection result from the AI system (positive or negative) among all images determined gradable by the FPRC.
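These definitions reduce to simple ratios of eye counts. As a worked illustration, the sketch below applies them to the mtmDR counts reported for the dilate-if-needed protocol (296 true-positive, 14 false-negative, 1186 true-negative, and 205 false-positive eyes among 1746 eyes gradable by the FPRC). The function name and dictionary keys are illustrative only; positive and negative predictive values are included because they are reported alongside these measures.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int, gradable_by_reference: int) -> dict:
    """Eye-level accuracy measures, returned as proportions between 0 and 1."""
    graded_by_ai = tp + fn + tn + fp           # eyes that received an AI result
    return {
        "sensitivity": tp / (tp + fn),         # AI positive among reference positive
        "specificity": tn / (tn + fp),         # AI negative among reference negative
        "imageability": graded_by_ai / gradable_by_reference,
        "ppv": tp / (tp + fp),                 # reference positive among AI positive
        "npv": tn / (tn + fn),                 # reference negative among AI negative
    }


# mtmDR, dilate-if-needed protocol (counts as reported in this article)
mtmdr = diagnostic_metrics(tp=296, fn=14, tn=1186, fp=205, gradable_by_reference=1746)
mtmdr_percent = {name: round(100 * value, 1) for name, value in mtmdr.items()}
```

Point estimates computed this way do not account for the correlation between the 2 eyes of a participant; that correlation matters for the confidence intervals, not for the proportions themselves.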
A prespecified enrichment correction analysis to adjust for any potential spectrum bias introduced by transitioning from sequential enrollment to the enrichment-permitted period evaluated performance using disease prevalence statistics of the study population enrolled sequentially. Enrichment-corrected accuracies were computed as prevalence-weighted sum of accuracies at each DR severity level.
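The prevalence-weighted sum can be sketched as follows. This is a minimal illustration under stated assumptions: the weights are taken to be the share of each DR severity level among sequentially enrolled eyes, normalized over the levels that enter the measure, and every number in the example is hypothetical rather than drawn from the study data.

```python
def enrichment_corrected_accuracy(per_level_accuracy: dict, sequential_counts: dict) -> float:
    """Weight the accuracy observed at each severity level by that level's
    share of the sequentially enrolled population (assumed normalization)."""
    levels = list(per_level_accuracy)
    total = sum(sequential_counts[level] for level in levels)
    return sum(
        per_level_accuracy[level] * sequential_counts[level] / total
        for level in levels
    )


# Hypothetical specificity example: 2 reference-negative severity levels.
accuracy_by_level = {"no DR": 0.90, "mild NPDR": 0.60}   # accuracy in the full (enriched) cohort
sequential_eyes = {"no DR": 800, "mild NPDR": 200}       # eyes per level, sequential period only
corrected = enrichment_corrected_accuracy(accuracy_by_level, sequential_eyes)
```

In this toy example the corrected value is 0.9 × 0.8 + 0.6 × 0.2 = 0.84. If enrichment had over-represented the harder mild level, the uncorrected pooled accuracy would be lower than this, which is the direction of the specificity change reported after correction.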
Hypothesis tests with 1-sided 2.5% type I error for the null hypotheses (sensitivity ≤80% and specificity ≤77.5%) were designed per prespecified regulatory requirements and assessed using methods for correlated binary data.28 Alternative hypotheses of 90.0% for sensitivity and 82.5% for specificity were established. Statistical analyses in this study were conducted using Python, version 2.7 (Python Software Foundation) and NumPy, version 1.11.3 (NumPy). Significance threshold was P = .025.
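Because each participant contributes up to 2 eyes, eye-level outcomes are correlated, and the tables report 95% CIs estimated by clustered bootstrap. The sketch below shows one common form of that idea, a percentile bootstrap that resamples participants rather than eyes, together with a check of the lower bound against a null value. It is an illustration only and is not claimed to be the authors' implementation or the method of reference 28; the data are synthetic and all names are hypothetical.

```python
import random


def clustered_bootstrap_ci(participants, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI for a proportion with participants as resampling units.

    participants: list of non-empty lists of 0/1 eye-level outcomes
    (1 = AI result agrees with the reference standard).
    """
    rng = random.Random(seed)                  # seeded for reproducibility
    n = len(participants)
    estimates = []
    for _ in range(n_resamples):
        # Draw whole participants with replacement so both eyes stay together.
        sample = [participants[rng.randrange(n)] for _ in range(n)]
        agree = sum(sum(eyes) for eyes in sample)
        total = sum(len(eyes) for eyes in sample)
        estimates.append(agree / total)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def rejects_null(lower_bound: float, null_value: float) -> bool:
    """A 1-sided 2.5% test is approximated here by comparing the lower bound
    of the 2-sided 95% CI with the null value."""
    return lower_bound > null_value


# Synthetic example: 40 reference-positive participants, 2 eyes each.
synthetic = [[1, 1]] * 36 + [[1, 0]] * 3 + [[0, 0]] * 1
ci_lower, ci_upper = clustered_bootstrap_ci(synthetic)
sensitivity_null_rejected = rejects_null(ci_lower, 0.80)
```

Resampling participants instead of eyes keeps the within-person correlation intact in every replicate, which generally widens the interval relative to treating eyes as independent.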
Results
Of 942 consenting individuals, 915 participants (1830 eyes) met eligibility criteria and 893 participants (1786 eyes) (intent-to-screen) completed the study according to protocol (Figure 2). A total of 22 participants did not complete 2-field imaging for the AI system analysis and/or 4-wide-field imaging and hence were excluded. Of the intent-to-screen eyes, 1701 were analyzable for mtmDR and 1677 were analyzable for vtDR under the dilate-if-needed protocol. Mean (SD) participant age was 53.9 (15.2) years (median, 56; range, 18-88 years). Of the 893 participants in the overall cohort, 206 (23.1%) had type 1 diabetes. Race and ethnicity groups represented in the analyzable cohort were American Indian or Alaska Native (3 [0.3%]), Asian (22 [2.5%]), Black or African American (159 [17.8%]), Native Hawaiian or other Pacific Islander (4 [0.4%]), White (655 [73.3%]), and Other (50 [5.6%]) (eTable 1 in Supplement 1). A total of 444 participants were women (49.7%) and 449 were men (50.3%). No notable differences in age, ethnicity, and race were identified between the analyzable and nonanalyzable cohorts (Table 1). Complete characteristics are included in eTable 1 in Supplement 1.
Table 1. Demographic Characteristics for Analyzable (N = 1701) and Nonanalyzable (N = 85) Intent-to-Screen Eyes
Subgroup | Analyzable eyes (n = 1701), No. (%) | Nonanalyzable eyes (n = 85), No. (%) |
---|---|---|
Age, y | | |
<65 | 1278 (75.1) | 42 (49.4) |
≥65 | 423 (24.9) | 43 (50.6) |
Sex | | |
Men | 853 (50.1) | 45 (52.9) |
Women | 848 (49.9) | 40 (47.1) |
Ethnicity^a | | |
Hispanic/Latino | 374 (22.0) | 22 (25.9) |
Non-Hispanic/Latino | 1327 (78.0) | 63 (74.1) |
Race^a | | |
American Indian or Alaska Native | 6 (0.4) | 0 |
Asian | 38 (2.2) | 6 (7.1) |
Black or African American | 301 (17.7) | 17 (20.0) |
Native Hawaiian or other Pacific Islander | 8 (0.5) | 0 |
White | 1251 (73.5) | 59 (69.4) |
Other | 97 (5.7) | 3 (3.5) |
^a Race and ethnicity were self-reported; Other category did not specify groups.
Undilated and Dilate-if-Needed Imaging Protocols
With the undilated imaging protocol, the AI system exceeded the prespecified superiority end points for both sensitivity (>90.0%) and specificity (>82.5%) in detecting both mtmDR and vtDR. For mtmDR, the AI system detected 273 of 286 eyes identified as positive by the FPRC, for a sensitivity of 95.5% (95% CI, 92.4%-98.5%), and 1054 of 1240 eyes were identified as negative by the FPRC, for a specificity of 85.0% (95% CI, 82.6%-87.4%) (Table 2). For vtDR, the AI system detected 58 of 61 eyes identified as positive by the FPRC, for a sensitivity of 95.1% (95% CI, 90.1%-100%), and 1288 of 1447 eyes were identified as negative for vtDR, for a specificity of 89.0% (95% CI, 87.0%-91.1%) (Table 3).
Table 2. EyeArt Performance for Detecting mtmDR Using Undilated and Dilate-if-Needed Protocols^a,b
mtmDR^c,d | Undilated protocol: observed, % (95% CI) [No./total No.] | Undilated protocol: enrichment corrected, % (95% CI) | Dilate-if-needed protocol: observed, % (95% CI) [No./total No.] | Dilate-if-needed protocol: enrichment corrected, % (95% CI) |
---|---|---|---|---|
Sensitivity | 95.5 (92.4-98.5) [273/286] | 95.5 (92.6-97.7) | 95.5 (92.6-98.4) [296/310] | 95.5 (92.9-97.7) |
Specificity | 85.0 (82.6-87.4) [1054/1240] | 87.7 (86.0-89.5) | 85.3 (83.0-87.5) [1186/1391] | 87.8 (86.3-89.5) |
Imageability | 87.4 (85.2-89.6) [1526/1746] | 87.6 (85.0-89.3) | 97.4 (96.4-98.5) [1701/1746] | 97.7 (96.4-98.3) |
PPV | 59.5 (53.9-63.9) [273/459] | 62.7 (57.8-64.7) | 59.1 (53.8-64.4) [296/501] | 62.8 (58.1-64.7) |
NPV | 98.8 (98.2-99.4) [1054/1067] | 98.9 (98.3-99.5) | 98.8 (98.2-99.5) [1186/1200] | 98.9 (98.4-99.5) |
Abbreviations: mtmDR, more-than-mild diabetic retinopathy; NPV, negative predictive value; PPV, positive predictive value; vtDR, vision-threatening diabetic retinopathy.
^a EyeArt is an artificial intelligence system for autonomous detection of mtmDR and vtDR.
^b Enrichment-corrected estimates are adjusted for prevalence.
^c The 95% CIs were estimated using clustered bootstrap to account for the correlation between eyes.
^d The undilated protocol included only undilated images and the dilate-if-needed protocol included images obtained following dilation for a small fraction of cases.
Table 3. EyeArt Performance for Detecting vtDR Using Undilated and Dilate-if-Needed Protocols^a,b
vtDR^c,d | Undilated protocol: observed, % (95% CI) [No./total No.] | Undilated protocol: enrichment corrected, % (95% CI) | Dilate-if-needed protocol: observed, % (95% CI) [No./total No.] | Dilate-if-needed protocol: enrichment corrected, % (95% CI) |
---|---|---|---|---|
Sensitivity | 95.1 (90.1-100) [58/61] | 96.9 (91.2-100) | 95.2 (90.4-100) [60/63] | 97.0 (91.2-100) |
Specificity | 89.0 (87.0-91.1) [1288/1447] | 90.0 (89.2-91.5) | 89.5 (87.6-91.4) [1444/1614] | 90.1 (89.4-91.5) |
Imageability | 87.6 (85.4-89.8) [1508/1721] | 87.8 (85.0-89.3) | 97.4 (96.4-98.5) [1677/1721] | 97.7 (96.4-98.3) |
PPV | 26.7 (19.5-33.0) [58/217] | 29.6 (24.4-29.9) | 26.1 (19.6-32.6) [60/230] | 29.9 (24.7-30.1) |
NPV | 99.8 (99.5-100) [1288/1291] | 99.8 (99.6-100) | 99.8 (99.5-100) [1444/1447] | 99.9 (99.6-100) |
Abbreviations: mtmDR, more-than-mild diabetic retinopathy; NPV, negative predictive value; PPV, positive predictive value; vtDR, vision-threatening diabetic retinopathy.
^a EyeArt is an artificial intelligence system for autonomous detection of mtmDR and vtDR.
^b Enrichment-corrected estimates are adjusted for prevalence.
^c The 95% CIs were estimated using clustered bootstrap to account for the correlation between eyes.
^d The undilated protocol included only undilated images and the dilate-if-needed protocol included images obtained following dilation for a small fraction of cases.
Of participants with gradable images under the dilate-if-needed protocol, mtmDR was detected by the AI system in 296 of 310 eyes identified by the FPRC, for a sensitivity of 95.5% (95% CI, 92.6%-98.4%), indicating only 14 false-negative results (all of which had mild nonproliferative DR). The AI system correctly identified 1186 of 1391 eyes found to be negative for mtmDR by the FPRC, for a specificity of 85.3% (95% CI, 83.0%-87.5%) (Table 2). Of the 205 eyes with false-positive results, 141 eyes (68.8%) were graded by the FPRC as having mild nonproliferative DR or other non-DR conditions; 31.3% (546 of 1746) of the eyes were referred if ungradable eyes were put together with the disease-positive eyes.
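The 546 referred eyes can be reconstructed from counts already given for mtmDR under the dilate-if-needed protocol. The short calculation below is a worked check, assuming (as the sentence above implies) that every AI-positive eye and every eye the AI system could not grade counts as a referral; variable names are illustrative.

```python
true_positive_eyes = 296
false_positive_eyes = 205
reference_gradable_eyes = 1746
ai_graded_eyes = 1701

ai_positive_eyes = true_positive_eyes + false_positive_eyes     # 501 eyes flagged by the AI system
ai_ungradable_eyes = reference_gradable_eyes - ai_graded_eyes   # 45 eyes without an AI result
referred_eyes = ai_positive_eyes + ai_ungradable_eyes
referral_rate_percent = round(100 * referred_eyes / reference_gradable_eyes, 1)
```

The sum matches the reported 546 of 1746 eyes, and the remaining eyes (about two-thirds) received a negative AI result and would not need referral on the basis of mtmDR.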
Sensitivity and specificity of the AI system vs the FPRC were similar for vtDR. The AI system detected vtDR in 60 of 63 eyes positively identified by the FPRC, for a sensitivity of 95.2% (95% CI, 90.4%-100%), and correctly identified 1444 of 1614 eyes graded as negative for vtDR by the FPRC, for a specificity of 89.5% (95% CI, 87.6%-91.4%) (Table 3). Of the 3 false-negative vtDR identifications, 2 were identified as positive for mtmDR by the AI system and would have received a referral per protocol regardless. Of 170 false-positive eyes, 131 eyes (77.1%) were graded by the FPRC as having mild nonproliferative DR or other non-DR conditions.
Enrichment correction analysis was conducted to correct for enrichment that was allowed during the enrichment-permitted period. After this enrichment correction, for mtmDR detection the sensitivity was 95.5% (95% CI, 92.9%-97.7%) and specificity was 87.8% (95% CI, 86.3%-89.5%), and for vtDR detection the sensitivity was 97.0% (95% CI, 91.2%-100%) and specificity was 90.1% (95% CI, 89.4%-91.5%). Enrichment-corrected imageability outcomes were similar for mtmDR (97.7%; 95% CI, 96.4%-98.3%) and vtDR (97.7%; 95% CI, 96.4%-98.3%) (Tables 2 and 3).
Positive predictive values, indicating the percentage of eyes with true mtmDR or vtDR per FPRC among those with a positive AI result, were 62.8% (95% CI, 58.1%-64.7%) for mtmDR and 29.9% (95% CI, 24.7%-30.1%) for vtDR in the enrichment-corrected population. Negative predictive values, indicating the percentage of eyes without FPRC mtmDR or vtDR among those with a negative AI result, were 98.9% (95% CI, 98.4%-99.5%) for mtmDR and 99.9% (95% CI, 99.6%-100%) for vtDR (Tables 2 and 3).
Imageability
Of the 1746 eyes whose images were rated as gradable for mtmDR by the FPRC, the AI system successfully graded 1701 for an imageability of 97.4% (95% CI, 96.4%-98.5%) (Table 2) under the dilate-if-needed protocol. A total of 1526 of 1746 eyes (87.4%; 95% CI, 85.2%-89.6%) received an mtmDR detection result for the AI system using 2-field CFP without dilation (Table 2). All eyes with an ETDRS level greater than or equal to 43 were correctly identified as having mtmDR by the AI system.
Similarly, imageability was high in the analysis of vtDR under the dilate-if-needed protocol. Of 1721 eyes rated as gradable by the FPRC, 1677 were gradable by the AI system for an imageability of 97.4% (95% CI, 96.4%-98.5%). Of eyes that received 2-field retinal imaging for AI analysis, 1508 of 1721 (87.6%; 95% CI, 85.4%-89.8%) did not require dilation to obtain a vtDR detection result (Table 3).
Further Analyses
No notable differences were observed between disease prevalence (eTables 2 and 3 in Supplement 1) and the AI system performance when comparing primary care with eye care sites. Both site types demonstrated similar sensitivity, specificity, and imageability for mtmDR and vtDR (eTable 6 in Supplement 1).
Imputation analysis classified all eyes with images ungradable by the AI system as the opposite of the reference standard (worst-case imputation) or the same as the reference standard (best-case imputation). Outcomes for both mtmDR and vtDR are included in eTable 7 in Supplement 1. In brief, under worst-case imputation, sensitivity was 90.8% (95% CI, 86.7%-94.9%) and specificity was 83.5% (95% CI, 81.2%-85.9%) for mtmDR, and sensitivity was 84.5% (95% CI, 74.7%-94.3%) and specificity was 87.5% (95% CI, 85.5%-89.6%) for vtDR. No notable differences were observed between the imputed outcomes and those of the per-protocol analysis. No adverse events were reported during this study.
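The imputation rule can be expressed in a few lines. In the sketch below, eyes ungradable by the AI system are added to the denominator and counted as all wrong (worst case) or all correct (best case). The mtmDR counts of correct and analyzable eyes come from the dilate-if-needed results; the split of the 45 AI-ungradable eyes into 16 reference-positive and 29 reference-negative eyes is an assumption made only for illustration, because that split is reported in the supplement rather than in this section.

```python
def worst_case(correct: int, analyzable: int, ungradable: int) -> float:
    """Ungradable eyes are imputed as disagreeing with the reference standard."""
    return correct / (analyzable + ungradable)


def best_case(correct: int, analyzable: int, ungradable: int) -> float:
    """Ungradable eyes are imputed as agreeing with the reference standard."""
    return (correct + ungradable) / (analyzable + ungradable)


# Assumed split of the 45 AI-ungradable eyes (illustrative, not from the text).
ungradable_positive, ungradable_negative = 16, 29

worst_sensitivity = worst_case(correct=296, analyzable=310, ungradable=ungradable_positive)
worst_specificity = worst_case(correct=1186, analyzable=1391, ungradable=ungradable_negative)
best_sensitivity = best_case(correct=296, analyzable=310, ungradable=ungradable_positive)
```

With this assumed split, the worst-case values round to the mtmDR figures quoted above, but that agreement should not be read as confirmation of the actual split.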
Baseline characteristics of the FDA-specified analysis subgroup of 655 participants (eFigure 1 in Supplement 1) are included in eTable 1 in Supplement 1. Full results from the FDA-specified analyses of the subgroup population are included in eTables 2-5 in Supplement 1.
Briefly, in participants from the FDA-specified analysis (eAppendix in Supplement 1), the sensitivity of the AI system in detecting mtmDR was 96.0% (95% CI, 89.4%-100%) and the specificity was 87.7% (95% CI, 83.9%-91.2%). Similarly, sensitivity of the AI system in detecting vtDR was 92.3% (95% CI, 70.0%-100%) and the specificity was 94.4% (95% CI, 91.7%-97.0%). At primary care sites, the sensitivity of the AI system for mtmDR was 100% (95% CI, 74.1%-100%) and the specificity was 92.0% (95% CI, 85.1%-97.5%). Comparable results were observed for vtDR, with a sensitivity of 100% (95% CI, 51.0%-100%) and specificity of 97.5% (95% CI, 93.4%-100%) (eTables 2 and 3 in Supplement 1). Imageability using the AI system at primary care sites where most operators had no prior ophthalmic imaging experience was 97% under the dilate-if-needed protocol and 90% in the first attempt without dilation. For comparison, the AI system’s imageability at eye care sites was 98% (eTables 4 and 5 in Supplement 1).
Discussion
In this prospective multicenter cross-sectional diagnostic study, use of the EyeArt Automated DR Detection System in both primary care and eye care centers compared favorably with the reference standard of 4-wide-field stereoscopic images and reading center assessment in detecting both mtmDR and vtDR in patients aged 18 years or older with diabetes. The AI system consistently met the predetermined sensitivity and specificity end points for detection of mtmDR and vtDR using 2-field CFP imaging.
Overall imageability of the AI system was high under the dilate-if-needed protocol, yet also high without dilation. The rate of cases classified as ungradable by the AI system was 12.5% without dilation and 2.7% under the dilate-if-needed protocol, which is consistent with the rates of human graders (10%-15%).15,29 When the AI system’s images are considered ungradable, physician referral is indicated for further examination to minimize missed diagnoses, reducing any risk of undiagnosed retinal findings. Both the per-protocol and FDA-specified analysis cohorts showed high levels of sensitivity, specificity, and imageability across study site types and enrollment categories, providing added validity for the entire cohort and its findings.
Ease-of-use in primary care is important for application of an autonomous AI system because the intended user population includes technicians and staff with no prior retinal imaging experience. The findings of this study show this ease-of-use; the AI system’s performance at primary care sites, which included this user base, was good and comparable to the overall study. The high imageability for these operators suggests that, with standardized training, reliable disease detection results can be obtained by staff without prior retinal imaging experience. In addition, the need for dilation in only a few participants allows examinations to be more easily performed in non–eye care centers and/or in patients who refuse dilation.
Use of point-of-care DR screening with the AI system is especially helpful for triage of 2 types of patients: those not requiring specialist referrals and those with vtDR. First, the referral rate in this study was 31.3% (546 of 1746) when referrals for disease were combined with ungradable eyes; therefore, most patients do not require referrals, which reduces the diagnostic burden on eye care specialists and time costs for patients. Second, the unique clearance by the FDA to identify vtDR allows for accelerated referrals to more rapidly confirm and treat potentially vision-threatening disease. This is important in triaging those patients to eye care specialists within the guideline-recommended referral time for urgent treatment.9,29
The automated AI screening system eliminates other patient-reported barriers inhibiting the routine completion of an eye examination, including high cost and limited access to eye care specialists.4 In this study, comparable efficacy was demonstrated by the AI system across primary care and eye care facilities. Therefore, patients can receive prompt, accurate, and consistent detection of mtmDR or vtDR at their facility of choice without specialist involvement. Furthermore, this prompt detection at primary care may help eliminate the disparity in care for patients who live far from eye care specialists. The rapid on-site eye-level DR detection by the AI system enables prompt diagnosis allowing for same-day referral requests for follow-up care, improving the chances of preventing vision loss. In addition, the ability for patients to receive an accurate diagnosis at nonspecialist sites can lower the cost to the patient and health system. Elimination of these patient-identified barriers can improve overall adherence to annual screenings and may result in decreased vision loss through earlier identification of vtDR. The study required a small number of eyes to be dilated to facilitate disease detection results, whereas a large proportion of participants (87.4%) did not require dilation.
To our knowledge, only one other automated DR detection system has been examined in a prospective, multicenter study.12 The mtmDR findings reported on the IDx-DR system were 87.2% for sensitivity, 90.7% for specificity, and 96.1% for imageability rate after enrichment correction in a cohort of 819 participants. Using the EyeArt system, we found sensitivity of 95.5%, specificity of 87.8%, and imageability rate of 97.7% for mtmDR after enrichment correction in a cohort of 893 patients. Both studies used the same reference standard: ETDRS grading of 4-wide field stereoscopic dilated fundus photographs by FPRC graders. The IDx-DR system is indicated only for detection of mtmDR, whereas the EyeArt system is indicated for detection of both mtmDR and vtDR.
Strengths and Limitations
Study strengths include the moderately sized, diverse analysis population and the inclusion of study centers representative of the intended use population. In addition, the analysis of both undilated and dilated images shows the applicability of the EyeArt system for locations or situations in which dilation is not possible or desired.
This study has limitations. Optical coherence tomography was not used as an alternative reference standard to determine CSME. However, stereoscopic measurement from CFP, as used in this study, is known to be an accurate, sufficient, and widely accepted clinical reference standard, including by the FDA.27
Conclusions
This prospective multicenter cross-sectional diagnostic study observed safe and accurate clinical performance of the EyeArt Automated DR Detection System in detecting both mtmDR and vtDR without physician assistance. This AI system may broadly improve DR screening and monitoring in people with diabetes by enabling non–eye care professionals to safely and reliably detect referable DR in clinical practice.
References
- 1. Ogurtsova K, da Rocha Fernandes JD, Huang Y, et al. IDF Diabetes Atlas: global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res Clin Pract. 2017;128:40-50. doi: 10.1016/j.diabres.2017.03.024
- 2. Yau JW, Rogers SL, Kawasaki R, et al.; Meta-analysis for Eye Disease (META-EYE) Study Group. Global prevalence and major risk factors of diabetic retinopathy. Diabetes Care. 2012;35(3):556-564. doi: 10.2337/dc11-1909
- 3. Flaxel CJAR, Bailey ST, Fawzi A, Lim JI, Vemulakonda GA. Diabetic retinopathy preferred practice pattern. Ophthalmology. 2020;127(1):66-P145. doi: 10.1016/j.ophtha.2019.09.025
- 4. International Federation on Ageing; International Agency for the Prevention of Blindness; International Diabetes Federation. The Diabetic Retinopathy Barometer Report Global Findings. Accessed October 1, 2021. https://www.iapb.org/wp-content/uploads/DR-Global-Report-1.pdf
- 5. Centers for Disease Control and Prevention. Diabetes Report Card 2017. US Dept of Health and Human Services; 2018.
- 6. Shi Q, Zhao Y, Fonseca V, Krousel-Wood M, Shi L. Racial disparity of eye examinations among the U.S. working-age population with diabetes: 2002-2009. Diabetes Care. 2014;37(5):1321-1328. doi: 10.2337/dc13-1038
- 7. Lin S, Ramulu P, Lamoureux EL, Sabanayagam C. Addressing risk factors, screening, and preventative treatment for diabetic retinopathy in developing countries: a review. Clin Exp Ophthalmol. 2016;44(4):300-320. doi: 10.1111/ceo.12745
- 8. Mehranbod C, Genter P, Serpas L, et al. Automated reminders improve retinal screening rates in low income, minority patients with diabetes and correct the African American disparity. J Med Syst. 2019;44(1):17. doi: 10.1007/s10916-019-1510-3
- 9. Daskivich LP, Vasquez C, Martinez C Jr, Tseng CH, Mangione CM. Implementation and evaluation of a large-scale teleretinal diabetic retinopathy screening program in the Los Angeles County Department of Health Services. JAMA Intern Med. 2017;177(5):642-649. doi: 10.1001/jamainternmed.2017.0204
- 10. Garg S, Davis RM. Diabetic retinopathy screening update. Clin Diabetes. 2009;27(4):140-145. doi: 10.2337/diaclin.27.4.140
- 11. Gargeya R, Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology. 2017;124(7):962-969. doi: 10.1016/j.ophtha.2017.02.008
- 12. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39. doi: 10.1038/s41746-018-0040-6
- 13. Agurto C, Barriga ES, Murray V, et al. Automatic detection of diabetic retinopathy and age-related macular degeneration in digital fundus images. Invest Ophthalmol Vis Sci. 2011;52(8):5862-5871. doi: 10.1167/iovs.10-7075
- 14. Bhaskaranand M, Ramachandra C, Bhat S, et al. The value of automated diabetic retinopathy screening with the EyeArt system: a study of more than 100,000 consecutive encounters from people with diabetes. Diabetes Technol Ther. 2019;21(11):635-643. doi: 10.1089/dia.2019.0164
- 15. Fleming AD, Goatman KA, Philip S, Prescott GJ, Sharp PF, Olson JA. Automated grading for diabetic retinopathy: a large-scale audit using arbitration by clinical experts. Br J Ophthalmol. 2010;94(12):1606-1610. doi: 10.1136/bjo.2009.176784
- 16. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410. doi: 10.1001/jama.2016.17216
- 17. Tufail A, Rudisill C, Egan C, et al. Automated diabetic retinopathy image assessment software: diagnostic accuracy and cost-effectiveness compared with human graders. Ophthalmology. 2017;124(3):343-351. doi: 10.1016/j.ophtha.2016.11.014
- 18. Davis RM, Fowler S, Bellis K, Pockl J, Al Pakalnis V, Woldorf A. Telemedicine improves eye examination rates in individuals with diabetes: a model for eye-care delivery in underserved communities. Diabetes Care. 2003;26(8):2476. doi: 10.2337/diacare.26.8.2476
- 19. Bhaskaranand M, Ramachandra C, Bhat S, et al. Automated diabetic retinopathy screening and monitoring using retinal fundus image analysis. J Diabetes Sci Technol. 2016;10(2):254-261. doi: 10.1177/1932296816628546
- 20. Ting DSW, Cheung CY, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318(22):2211-2223. doi: 10.1001/jama.2017.18152
- 21. Lee AY, Yanagihara RT, Lee CS, et al. Multicenter, head-to-head, real-world validation study of seven automated artificial intelligence diabetic retinopathy screening systems. Diabetes Care. 2021;44(5):1168-1175. doi: 10.2337/dc20-1877
- 22. Wong TY, Sun J, Kawasaki R, et al. Guidelines on Diabetic Eye Care: The International Council of Ophthalmology recommendations for screening, follow-up, referral, and treatment based on resource settings. Ophthalmology. 2018;125(10):1608-1622. doi: 10.1016/j.ophtha.2018.04.007
- 23. International Council for Harmonisation. Guideline for good clinical practice E6. Accessed November 19, 2019. https://www.ich.org/products/guidelines/efficacy/efficacy-single/article/good-clinical-practice.html
- 24. World Medical Association. World Medical Association Declaration of Helsinki—ethical principles for medical research involving human subjects. Accessed November 21, 2019. https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/
- 25. The Wisconsin Reading Center. Department of Ophthalmology and Visual Sciences, University of Wisconsin School of Medicine and Public Health. Accessed October 20, 2021. https://www.ophth.wisc.edu/research/wrc/
- 26. Gangaputra S, Almukhtar T, Glassman AR, et al.; Diabetic Retinopathy Clinical Research Network. Comparison of film and digital fundus photographs in eyes of individuals with diabetes mellitus. Invest Ophthalmol Vis Sci. 2011;52(9):6168-6173. doi: 10.1167/iovs.11-7321
- 27. Early Treatment Diabetic Retinopathy Study Research Group. Grading diabetic retinopathy from stereoscopic color fundus photographs—an extension of the modified Airlie House classification: ETDRS report number 10. Ophthalmology. 1991;98(5)(suppl):786-806. doi: 10.1016/S0161-6420(13)38012-9
- 28. Jung SH, Kang SH, Ahn C. Sample size calculations for clustered binary data. Stat Med. 2001;20(13):1971-1982. doi: 10.1002/sim.846
- 29. Kim HM, Lowery JC, Kurtz R. Accuracy of digital images for assessing diabetic retinopathy. J Diabetes Sci Technol. 2007;1(4):531-539. doi: 10.1177/193229680700100411