This diagnostic study investigates whether a precision-modeling approach can reduce the number of eye examinations required to screen for retinopathy of prematurity among infants in low- and middle-income countries.
Key Points
Question
Can the number of eye examinations required to screen for retinopathy of prematurity (ROP) in a population be reduced using a precision-modeling approach based on demographics and artificial intelligence (AI) in low- and middle-income countries (LMICs)?
Findings
In this diagnostic study of 762 infants in India, 330 infants in Nepal, and 317 infants in Mongolia, an AI model based on gestational age and vascular severity identified all infants who would eventually develop treatment-requiring ROP, weeks before clinical diagnosis and with fewer examinations.
Meaning
Implementation of this model could reduce the number of examinations required and limit the risk of late treatment in an ROP-screening population in LMICs.
Abstract
Importance
Retinopathy of prematurity (ROP) is a leading cause of preventable blindness that disproportionately affects children born in low- and middle-income countries (LMICs). In-person and telemedical screening examinations can reduce this risk but are challenging to implement in LMICs owing to the multitude of at-risk infants and lack of trained ophthalmologists.
Objective
To implement an ROP risk model using retinal images from a single baseline examination to identify infants who will develop treatment-requiring (TR)–ROP in LMIC telemedicine programs.
Design, Setting, and Participants
In this diagnostic study conducted from February 1, 2019, to June 30, 2021, retinal fundus images were collected from infants as part of an Indian ROP telemedicine screening program. An artificial intelligence (AI)–derived vascular severity score (VSS) was obtained from images from the first examination after 30 weeks’ postmenstrual age. Using 5-fold cross-validation, logistic regression models were trained on 2 variables (gestational age and VSS) for prediction of TR-ROP. The model was externally validated on test data sets from India, Nepal, and Mongolia. Data were analyzed from October 20, 2021, to April 20, 2022.
Main Outcomes and Measures
Primary outcome measures included sensitivity, specificity, positive predictive value, and negative predictive value for predictions of future occurrences of TR-ROP; the number of weeks before clinical diagnosis when a prediction was made; and the potential reduction in number of examinations required.
Results
A total of 3760 infants (median [IQR] postmenstrual age, 37 [5] weeks; 1950 male infants [51.9%]) were included in the study. The diagnostic model had a sensitivity and specificity, respectively, for each of the data sets as follows: India, 100.0% (95% CI, 87.2%-100.0%) and 63.3% (95% CI, 59.7%-66.8%); Nepal, 100.0% (95% CI, 54.1%-100.0%) and 77.8% (95% CI, 72.9%-82.2%); and Mongolia, 100.0% (95% CI, 93.3%-100.0%) and 45.8% (95% CI, 39.7%-52.1%). With the AI model, infants with TR-ROP were identified a median (IQR) of 2.0 (0-11) weeks before TR-ROP diagnosis in India, 0.5 (0-2.0) weeks before TR-ROP diagnosis in Nepal, and 0 (0-5.0) weeks before TR-ROP diagnosis in Mongolia. If low-risk infants were never screened again, the population could be effectively screened with 45.0% (India, 664/1476), 38.4% (Nepal, 151/393), and 51.3% (Mongolia, 266/519) fewer examinations required.
Conclusions and Relevance
Results of this diagnostic study suggest that there were 2 advantages to implementation of this risk model: (1) the number of examinations for low-risk infants could be reduced without missing cases of TR-ROP, and (2) high-risk infants could be identified and closely monitored before development of TR-ROP.
Introduction
Although visual loss from retinopathy of prematurity (ROP) is preventable, ROP remains a leading cause of childhood blindness and is increasing in low- and middle-income countries (LMICs).1,2,3,4,5 In the US, owing to the exceedingly low incidence of ROP in less premature babies, infants born after 30 weeks’ gestational age (GA) and with birth weight (BW) greater than 1500 g do not require ROP screening.3 In LMICs, however, the situation is more challenging. As neonatal care units (NCUs) are increasing in number and capability, neonatal mortality is decreasing; however, primary prevention of ROP through strict oxygen monitoring and titration can be challenging owing to limitations in human and material resources. In addition, there is a higher incidence of preterm birth in LMICs. These factors result in both a higher incidence and a larger population at risk for ROP in regions where the number of ophthalmologists per capita is lower.6,7,8,9
India has one of the highest rates of premature births in the world and has experienced an increased ROP incidence over the last 2 decades.7,10,11 Indian screening guidelines recommend screening infants born with GA less than 37 weeks or BW less than 2000 g.10,11 In parts of the country, especially remote locations, this has resulted in an imbalance between the number of infants who need ROP screening and the availability of proximate, trained ophthalmologists to screen those infants. This problem is not unique to India but is disproportionately common across LMICs and has been referred to as the third ROP epidemic, with roughly 50 000 infants becoming severely visually impaired or blind annually.6,7,10
Telemedicine has emerged as a force multiplier to extend the geographic reach and efficiency of ophthalmologists who screen for ROP, with several successful high-volume ROP telescreening programs in India, Nepal, and Mongolia.12,13,14,15,16,17 However, there remain challenges. Telemedicine programs typically operate with weekly screening examinations to ensure that no disease is missed.10,11,12,15,16 Thus, the already increased number of at-risk infants in LMICs must also be screened more frequently, leading to a relatively larger number of examinations in regions with fewer available resources (ie, trained photographers and portable camera systems). Moreover, ROP screening guidelines err on the side of high sensitivity and low specificity, meaning that roughly 90% of those screened (weekly) never require treatment.1,2,7,9,10,11 Unnecessary examinations use resources inefficiently and subject premature infants to unnecessary physiological stress.18,19,20,21
We previously developed a risk model for a population of US patients to better identify infants at risk of developing treatment-requiring (TR)–ROP.22 This model incorporated 2 variables: GA at birth and an artificial intelligence (AI)−based vascular severity score (VSS) derived from a fundus image at 32 to 33 weeks’ postmenstrual age (PMA). This time point was based on prior work suggesting that infants who would eventually develop TR-ROP could be identified as early as 4 to 6 weeks before diagnosis.23,24 In 2 US populations, we found 100.0% sensitivity and 48.9% to 80.8% specificity for TR-ROP, with identification of at-risk infants occurring an average of 4 weeks before diagnosis.22 However, it was presumed that this model would not generalize well to LMICs, where the potential effect could be much greater, as the epidemiology is substantially different.22 To address the challenges of ROP screening in LMICs, we retrained and evaluated this risk model using previously published methods, but tailored it to the LMIC setting using a data set acquired by an Indian ROP telemedicine screening program and externally validated it on 3 separate ROP screening populations from India, Mongolia, and Nepal.
Methods
This study followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guidelines and was approved both by the institutional review boards at Oregon Health & Science University, Portland, and at each ROP screening program included in this study. Analysis of the Retinopathy of Prematurity–Save Our Sight (ROPE-SOS) data was performed under a waiver of informed consent for retrospective evaluation of imaging data obtained during routine clinical care. The waiver was provided by the Institutional Human Ethics Committee PSG Institute of Medical Sciences and Research, Coimbatore, Tamil Nadu, India. All institutions abided by the tenets of the Declaration of Helsinki.
Data Sets
Between February 1, 2019, and December 31, 2020, the multicenter ROPE-SOS cohort study in India longitudinally screened infants with BW of 2000 g or less or GA less than 37 weeks for ROP at the Aravind Eye Hospitals (eFigure in the Supplement). Trained technicians captured multiple views of each retina using a RetCam Shuttle (Natus). ROP diagnoses were determined via telemedical review by Indian clinicians using a web-based interface (iTELEGEN). Clinical comorbidities and demographics were recorded for each infant. This data set was used for model training and validation. A follow-up data set of infants was similarly collected between January 1, 2021, and June 30, 2021; it was used as a test data set.
Between December 1, 2015, and January 31, 2017, infants from Mongolia who were born with BW of 2000 g or less or GA less than 37 weeks were screened for ROP at the National Center for Maternal and Child Health in Ulaanbaatar, Mongolia.25 Screenings were conducted similarly to the ROPE-SOS cohort. Additionally, infants from Nepal were screened for ROP between October 1, 2016, and August 31, 2018, at 1 of 4 urban hospitals: Patan Hospital, Kanti Children’s Hospital, Paropakar Maternity and Women’s Hospital, and Tilganga Institute of Ophthalmology; all hospitals were located in Kathmandu, Nepal. In this cohort, infants were screened if they were born with a BW of 1700 g or less or GA less than 36 weeks. These 2 data sets were used as external test data sets.
For each eye examination, RetCam Shuttle (Natus, India), RetCam Portable (Natus, Mongolia), or Forus 3nethra neo (Forus Health, Nepal) images centered on the posterior retina (approximate field of view equal to zone I) were identified and selected using a previously trained optic disc detection algorithm.14 These images were analyzed by i-ROP DL, a deep learning algorithm that assigns a VSS from 1.0 to 9.0 based on the relative probability of a diagnosis of preplus or plus disease (manifestations of moderate to severe ROP).26,27 The mean VSS of all images collected from an eye was calculated.
Because the goal was to develop a predictive model—that is, a model that predicts whether an infant would develop TR-ROP in the future (rather than one that identifies disease in the present)—infants in the training data set who developed TR-ROP within 7 days of the first examination that occurred within the imaging window were excluded. The validation and test data sets contained all infants eligible for ROP screening, regardless of if or when they developed TR-ROP. The data set from February 1, 2019, to December 31, 2020, was partitioned 4:1 into training and validation data sets by participant identification number and stratified by TR-ROP diagnosis. The training data set was further partitioned (by participant identification number) into 5 equally sized partitions for 5-fold cross-validation. Race and ethnicity data were not gathered, as these studies were conducted in regions primarily composed of either Indian, Nepalese, or Mongolian babies.
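The partitioning described above can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R): the function names, the fixed random seed, and the use of plain shuffling within each outcome group are assumptions made for illustration. Only the 4:1 ratio, stratification by TR-ROP diagnosis, and splitting by participant identifier come from the text.

```python
import random
from collections import defaultdict


def stratified_split(outcome_by_id, train_fraction=0.8, seed=0):
    """Split participant IDs 4:1, stratified by outcome (hypothetical helper).

    outcome_by_id maps a participant ID to True if the infant developed TR-ROP.
    """
    rng = random.Random(seed)
    ids_by_outcome = defaultdict(list)
    for pid, treated in outcome_by_id.items():
        ids_by_outcome[treated].append(pid)

    train, validation = [], []
    for ids in ids_by_outcome.values():
        ids.sort()          # deterministic order before shuffling
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.extend(ids[:cut])
        validation.extend(ids[cut:])
    return train, validation


def make_folds(participant_ids, k=5, seed=0):
    """Assign participants (not eyes) to k roughly equal folds."""
    rng = random.Random(seed)
    shuffled = sorted(participant_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]
```

Splitting on the participant identifier keeps both eyes of an infant in the same partition, so correlated eyes never straddle training and evaluation data.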
Model Development
To identify the opening of the imaging window (the earliest PMA at which images could be acquired and used in this model), the VSS of infants in the data set from February 1, 2019, to December 31, 2020, who eventually developed TR-ROP was compared with that of infants who did not develop TR-ROP across a range of PMAs using box plots. Examinations at which TR-ROP was diagnosed and/or treatment was performed were excluded from this analysis. Because ROP is diagnosed at the eye level (2 eyes in the same baby may not have the same degree of ROP), but babies (not eyes) are referred for treatment, we trained our model at the eye level but evaluated performance at the baby level. That is, the model was trained to generate eye-level predictions using the mean VSS of all images collected from an eye. However, during evaluation, the worse of the 2 predictions was used to create an infant-level prediction, both to remove intereye correlations and to output a more clinically interpretable result. For training, 5-fold cross-validation was used to train 5 logistic regression models to predict future occurrences of TR-ROP using VSS and GA. Receiver operating characteristic (ROC) curves were computed for each model and, to find an optimal operating point, the Youden J statistic was calculated at each operating point in the range 0 to 1 (interval, 0.005) along each curve28:
J = Sensitivity + Specificity − 1.
The highest operating point that maximized the J statistic (high sensitivity) was returned for each curve and the mean operating point of all 5 models was calculated.
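As a rough illustration of this selection step, the Python sketch below scans thresholds from 0 to 1 in steps of 0.005, computes J at each, and keeps the highest threshold among those that tie for the maximum. The function name and the convention that a score at or above the threshold counts as a positive prediction are assumptions; the source does not state how ties or boundary values were handled.

```python
def youden_operating_point(y_true, y_score, step=0.005):
    """Return (threshold, J) for the highest threshold that maximizes Youden J.

    y_true holds 1 for eyes that later developed TR-ROP and 0 otherwise;
    y_score holds the predicted probabilities from one cross-validation model.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_threshold, best_j = 0.0, -1.0
    for i in range(round(1 / step) + 1):
        threshold = i * step
        true_pos = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        true_neg = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j >= best_j:  # ">=" keeps the highest threshold among ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


def mean_operating_point(thresholds):
    """Average the per-fold operating points, as described in the text."""
    return sum(thresholds) / len(thresholds)
```

In this reading, the function would be called once per cross-validation model and the 5 returned thresholds averaged with `mean_operating_point`.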
A new logistic regression model was trained on all available training data (ie, all 5 training data partitions), and the operating point was set equal to the previously calculated mean operating point. This model and operating point were evaluated on the validation data set. Evaluations were conducted at the infant level (ie, if 1 or both eyes were predicted to develop TR-ROP, the infant was labeled as such). The same model and operating point were then evaluated on the 3 external test data sets.
Post Hoc Modeling of Impact on Population-Level Screening
The impact the model would have on the total number of examinations was calculated via the external Indian, Nepalese, and Mongolian test data sets. We assessed how many examinations could have been avoided if this model were implemented to the extreme (low-risk infants not screened after initially screening negative).
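The calculation described here can be sketched in a few lines of Python. The per-infant examination counts below are hypothetical, and the sketch assumes that counts start at the baseline examination and that every examination after the first is avoided for an infant predicted to be low risk; the exact accounting used in the study is not spelled out in the text.

```python
def examination_reduction(exam_counts, low_risk_ids):
    """Return (avoided, total, percent) if low-risk infants get one examination.

    exam_counts maps a participant ID to the number of examinations received;
    low_risk_ids is the set of IDs the model predicted to be low risk.
    """
    total = sum(exam_counts.values())
    avoided = sum(
        max(count - 1, 0)
        for pid, count in exam_counts.items()
        if pid in low_risk_ids
    )
    return avoided, total, 100 * avoided / total


# Hypothetical cohort of 4 infants, 2 of them predicted to be low risk.
avoided, total, percent = examination_reduction(
    {"a": 4, "b": 3, "c": 1, "d": 2}, {"b", "c"}
)
```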
Statistical Analysis
Infant- or eye-level differences in demographics between treated and untreated infants were evaluated via assessment of 95% CIs through Welch 2-sample t test. Significant separation between box plots for imaging window identification was determined using notches, which extend
$(1.58 \times \mathrm{IQR})/\sqrt{n}$
and roughly provide a 95% CI for comparing medians.29 For sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), 95% CIs were computed using the conservative Clopper-Pearson method.30 All tests, where applicable, were 2-sided and a cutoff for significance was set at .05. Analyses were conducted between October 20, 2021, and April 20, 2022, using the R language for statistical computing, version 4.1.0.
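The notch comparison can be illustrated with a short Python sketch. It assumes the notch is centered on the median and extends by the half-width given above on each side, and it uses the "inclusive" quartile method from the standard library, which may differ slightly from the hinge-based quartiles that R box plots use. The function names are hypothetical.

```python
import math
import statistics


def notch_interval(values):
    """Approximate 95% interval around the median used for box plot notches."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    median = statistics.median(values)
    return median - half_width, median + half_width


def notches_separate(values_a, values_b):
    """True when the two notch intervals do not overlap."""
    low_a, high_a = notch_interval(values_a)
    low_b, high_b = notch_interval(values_b)
    return high_a < low_b or high_b < low_a
```

Non-overlapping notches are read as rough evidence that the two medians differ, which is how the opening of the imaging window was identified.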
Results
Data Sets
A total of 3760 infants (median [IQR] postmenstrual age, 37 [5] weeks; 1950 male infants [51.9%]; 1810 female infants [48.1%]) were included in the study. Table 1 summarizes demographics and PMA for infants who did and did not develop TR-ROP, as well as the VSS at the first examination after 30 weeks’ PMA in the groups for the training, validation, and test data sets. Unsurprisingly, infants who developed TR-ROP had lower BW and GA at birth. The mean (SD) PMA at time of treatment was 34.0 (3.5) weeks in Indian data sets, 34.8 (2.6) weeks in Nepal data sets, and 35.5 (2.2) weeks in Mongolian test data sets. The mean (SD) number of usable images collected per eye per examination, for each data set, was 3.6 (2.5) in India, 5.8 (2.9) in Nepal, and 4.8 (1.6) in Mongolia. Notably, TR-ROP infants’ VSS at the first examination was higher in all 3 test data sets (though Nepal did not reach statistical significance), even though the diagnosis of TR-ROP often did not occur until much later. With the AI model, infants with TR-ROP were identified a median (IQR) of 2.0 (0-11) weeks before TR-ROP diagnosis in India, 0.5 (0-2.0) weeks in Nepal, and 0 (0-5.0) weeks in Mongolia.
Table 1. Data Set Demographics per Clinical Outcome.
Patient characteristic | Not treated (n = 3633)a, mean (SD) | Treated (n = 127)a, mean (SD) | Difference (95% CI) |
---|---|---|---|
Training data set | |||
Birth weight, g | 1734.2 (441.3) | 1276.3 (252.6) | 457.9 (356.1 to 559.6) |
Gestational age, wk | 33.5 (2.8) | 29.7 (2.2) | 3.8 (2.9 to 4.7) |
Vascular severity score | 2.4 (1.2) | 4.4 (2.5) | −2.0 (−3.1 to −1.1) |
Postmenstrual age, wk | 38.4 (5.9) | 33.9 (2.2) | 4.5 (3.7 to 5.5) |
Weeks of life | 4.9 (5.3) | 4.1 (1.8) | 0.8 (0 to 1.5) |
Eyes, No. (%) | 3697 (98.7) | 47 (1.3) | NA |
Infants, No. (%) | 1848 (98.6) | 27 (1.4) | NA |
Validation data set | |||
Birth weight, g | 1716.0 (424.4) | 1213.6 (249.1) | 502.4 (404.5 to 601.0) |
Gestational age, wk | 33.5 (2.8) | 29.6 (1.8) | 4.0 (3.3 to 4.7) |
Vascular severity score | 2.3 (1.3) | 4.2 (1.8) | −1.9 (−3.0 to −1.5) |
Postmenstrual age, wk | 38.5 (5.3) | 33.1 (2.4) | 5.4 (4.0 to 6.9) |
Weeks of life | 5.0 (4.9) | 3.5 (1.7) | 1.5 (0.4 to 2.5) |
Eyes, No. (%) | 924 (97.2) | 27 (2.8) | NA |
Infants, No. (%) | 462 (97.1) | 14 (2.9) | NA |
Test data set (India) | |||
Birth weight, g | 1734.4 (466.7) | 1259.6 (319.4) | 474.8 (344.4 to 605.1) |
Gestational age, wk | 33.5 (2.9) | 29.0 (2.4) | 4.5 (3.5 to 5.5) |
Vascular severity score | 3.4 (1.8) | 6.9 (2.0) | −3.5 (−4.3 to −2.7) |
Postmenstrual age, wk | 37.2 (3.6) | 34.0 (3.5) | 3.2 (1.7 to 4.6) |
Weeks of life | 3.7 (2.4) | 5.1 (3.5) | −1.4 (−2.7 to 0) |
Eyes, No. (%) | 1472 (96.6) | 52 (3.4) | NA |
Infants, No. (%) | 735 (96.5) | 27 (3.5) | NA |
Test data set (Mongolia) | |||
Birth weight, g | 1544.9 (391.2) | 1374.9 (343.2) | 170.0 (64.7 to 275.3) |
Gestational age, wk | 30.6 (2.0) | 29.7 (2.1) | 0.9 (0.3 to 1.5) |
Vascular severity score | 2.5 (1.7) | 7.0 (1.8) | −4.5 (−5.1 to −4.0) |
Postmenstrual age, wk | 35.2 (2.1) | 35.5 (2.2) | −0.3 (−1.0 to 0.3) |
Weeks of life | 4.6 (2.0) | 5.9 (1.7) | −1.3 (−1.8 to −0.8) |
Eyes, No. (%) | 528 (83.7) | 103 (16.3) | NA |
Infants, No. (%) | 264 (83.3) | 53 (16.7) | NA |
Test data set (Nepal) | |||
Birth weight, g | 1971.5 (553.6) | 1229.2 (327.7) | 742.3 (399.6 to 1085.0) |
Gestational age, wk | 33.4 (2.5) | 28.7 (2.1) | 4.7 (2.6 to 6.9) |
Vascular severity score | 2.3 (0.9) | 4.9 (1.5) | −2.6 (−4.5 to −1.4) |
Postmenstrual age, wk | 38.4 (3.3) | 34.8 (2.6) | 3.6 (0.8 to 6.4) |
Weeks of life | 5.0 (2.2) | 6.2 (2.0) | −1.2 (−3.3 to 1.0) |
Eyes, No. (%) | 639 (98.5) | 10 (1.5) | NA |
Infants, No. (%) | 324 (98.2) | 6 (1.8) | NA |
Abbreviation: NA, not applicable.
a Indicates number of participants, not eyes.
Model Development
The notches between box plots of the VSSs of infants who eventually developed TR-ROP vs those who did not ceased to overlap at 30 weeks’ PMA (Figure), suggesting their medians were significantly different. Thus, we used the VSS at the first examination on or after 30 weeks’ PMA. This time point is also clinically relevant, as ROP screening in the US does not occur before 30 weeks’ PMA and aggressive ROP (A-ROP) can occur before 4 weeks of life.
Figure. Vascular Severity Score vs Postmenstrual Age.
Beginning at approximately 30 weeks’ postmenstrual age, eyes from the India training data set that eventually developed treatment-requiring retinopathy of prematurity and were treated had vascular severity scores that began to increase and diverge from those of eyes that did not.
The mean (SD) of the area under the ROC curves (AUROCs) computed during 5-fold cross-validation was 0.919 (0.034). A J score of 1.0 (perfect sensitivity) could be achieved in all models at a mean operating point of 0.01. This point was chosen to minimize the risk of false negatives (missed cases of TR-ROP). The eye-level AUROC on the validation data set was 0.944; thus, we felt confident that we could perform infant-level evaluations on the external test data sets because taking the worse of the 2 eyes can only increase sensitivity (albeit at the cost of specificity). In the test data sets, the diagnostic model had a sensitivity and specificity, respectively, as follows: India, 100.0% (95% CI, 87.2%-100.0%) and 63.3% (95% CI, 59.7%-66.8%); Nepal, 100.0% (95% CI, 54.1%-100.0%) and 77.8% (95% CI, 72.9%-82.2%); and Mongolia, 100.0% (95% CI, 93.3%-100.0%) and 45.8% (95% CI, 39.7%-52.1%); results for the validation data set are reported alongside these in Tables 2 and 3. The estimate (SE) for the intercept was 6.694 (1.894) (P = 4.1 × 10^−4), for the GA regression coefficient was −0.405 (0.060) (P = 1.2 × 10^−11), and for the VSS regression coefficient was 0.548 (0.078) (P = 2.5 × 10^−12); all features were significant predictors. In comparison, for the US model, the estimate (SE) for the intercept was 10.956 (2.764) (P = 7.3 × 10^−6), for the GA regression coefficient was −0.552 (0.107) (P = 7.7 × 10^−14), and for the VSS regression coefficient was 0.410 (0.096) (P = 3.4 × 10^−12).22
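To make the reported coefficients concrete, the following Python sketch applies them to a single infant. It assumes a standard logistic (logit) link, that an eye is flagged when its predicted probability is at or above the 0.01 operating point, and that the infant-level result takes the worse (higher-risk) eye, as described in the Methods. The function names and example inputs are illustrative only.

```python
import math

# Estimates reported in the text for the LMIC model.
INTERCEPT = 6.694
COEF_GA = -0.405    # per week of gestational age
COEF_VSS = 0.548    # per unit of vascular severity score
OPERATING_POINT = 0.01


def eye_risk(ga_weeks, mean_vss):
    """Predicted probability of future TR-ROP for one eye (assumed logit link)."""
    logit = INTERCEPT + COEF_GA * ga_weeks + COEF_VSS * mean_vss
    return 1 / (1 + math.exp(-logit))


def infant_is_high_risk(ga_weeks, eye_vss_values):
    """Infant-level call: the worse eye decides (assumed >= comparison)."""
    worst = max(eye_risk(ga_weeks, vss) for vss in eye_vss_values)
    return worst >= OPERATING_POINT
```

For example, an infant with GA of 33.5 weeks and VSS of 2.4 and 2.3 in the two eyes falls below the operating point under these assumptions, whereas an infant with GA of 29.7 weeks and a VSS of 4.4 in one eye is flagged as high risk.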
Table 2. Confusion Matrix of Model Predictions on Each Test Data Set.
Model prediction (rows) vs clinical outcome (columns) | Validation data set, not TR | Validation data set, TR | India test data set, not TR | India test data set, TR | Mongolia test data set, not TR | Mongolia test data set, TR | Nepal test data set, not TR | Nepal test data set, TR |
---|---|---|---|---|---|---|---|---|
Not TR | 354 | 0 | 465 | 0 | 121 | 0 | 252 | 0 |
TR | 108 | 14 | 270 | 27 | 143 | 53 | 72 | 6 |
Abbreviation: TR, treatment requiring.
Table 3. Summary Statistics of Model Predictions on Each Test Data Set.
Statistic, % (95% CI) | Validation data set | India test data set | Mongolia test data set | Nepal test data set |
---|---|---|---|---|
Sensitivity | 100.0 (76.8-100.0) | 100.0 (87.2-100.0) | 100.0 (93.3-100.0) | 100.0 (54.1-100.0) |
Specificity | 76.6 (72.5-80.4) | 63.3 (59.7-66.8) | 45.8 (39.7-52.1) | 77.8 (72.9-82.2) |
PPV | 11.5 (6.4-18.5) | 9.1 (6.1-13.0) | 27.0 (21.0-33.8) | 7.7 (2.9-16.0) |
NPV | 100.0 (99.0-100.0) | 100.0 (99.2-100.0) | 100.0 (97.0-100.0) | 100.0 (98.5-100.0) |
Abbreviations: NPV, negative predictive value; PPV, positive predictive value.
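The point estimates in Table 3 follow directly from the counts in Table 2, and the intervals can be approximated with an exact (Clopper-Pearson) binomial calculation. The Python sketch below uses only the standard library and finds each bound by bisection on the binomial distribution; it is an illustrative reimplementation under those assumptions, not the authors' R code, and the helper names are hypothetical.

```python
import math

# Counts transcribed from Table 2 as (TN, FN, FP, TP).
TABLE_2 = {
    "India": (465, 0, 270, 27),
    "Mongolia": (121, 0, 143, 53),
    "Nepal": (252, 0, 72, 6),
}


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def _bisect(increasing_fn, target):
    """Find p in (0, 1) where an increasing function of p reaches target."""
    low, high = 0.0, 1.0
    for _ in range(60):
        mid = (low + high) / 2
        if increasing_fn(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial interval for x successes out of n."""
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _bisect(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def summarize(tn, fn, fp, tp):
    """Return {statistic: (estimate, lower, upper)} as proportions."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (x / n, *clopper_pearson(x, n)) for name, (x, n) in pairs.items()}
```

Calling `summarize(*TABLE_2["India"])` gives proportions that can be multiplied by 100 and compared with the corresponding column of Table 3.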
Post Hoc Modeling of Association With Population-Level Screening
The mean (SD) number of examinations for infants in the Indian, Nepalese, and Mongolian test data sets are reported in Table 4. Implementing this model with a single examination for infants predicted to be low risk would have reduced the overall number of examinations by 45.0% (664/1476) in India, 38.4% (151/393) in Nepal, and 51.3% (266/519) in Mongolia.
Table 4. Examination Reduction Analysis.
Measure | India | Nepal | Mongolia |
---|---|---|---|
Examinations received, No., mean (SD) | |||
Treated | 4.3 (2.4) | 1.5 (0.6) | 3.6 (1.8) |
Not treated | 3.6 (2.2) | 1.4 (0.6) | 2.2 (2.3) |
High risk | 4.4 (2.3) | 1.6 (0.8) | 3.1 (2.5) |
Low risk | 2.8 (1.8) | 1.3 (0.6) | 1.3 (1.0) |
Potential examination reduction, % | 45.0 | 38.4 | 51.3 |
Discussion
In this diagnostic study, we evaluated whether an image-based risk model could be optimized for populations of premature infants from LMICs.22 We found that this model, using just GA and VSS, could identify all at-risk infants up to 11 weeks before clinical diagnosis in 3 separate ROP screening cohorts in Asia. Further validation and implementation could lead to earlier diagnoses of severe ROP and reduce the overall screening burden in LMICs.
In LMICs, the epidemiology of ROP is different than that in high-income countries.10,13,14 Whereas infants born with BW greater than 1500 g or after 30 weeks of gestation are not screened in the US, heavier and older infants remain at risk for ROP in LMICs.10,11,13,31,32 This also raises the risk of developing A-ROP, which is more common in LMICs and can occur within 2 to 3 weeks of life.4,33 As a result, ROP screenings in LMICs start within this range, whereas examinations in the US are not indicated until at least 4 weeks of life.3,11 For all these reasons, we retrained an AI-based risk model on a large data set of images and clinical information from India, and chose an imaging window that was earlier than that of the US population.
Risk models have the potential to both reduce the screening burden and improve the early diagnosis of severe ROP if implemented effectively. There have been several attempts at developing risk models using a variety of clinical variables; however, even the most promising models have generalizability issues when evaluated on larger, more diverse data sets, especially in the LMIC setting.22,34,35 One advantage of this approach is that it does not require multiple data points or measurements and, in the telemedical setting where digital images are part of clinical care, can be implemented and evaluated without any additional resources. Although this model demonstrated 100% sensitivity in all 3 test data sets, expectations should be tempered as the calculated 95% CIs were wide owing to relatively small numbers (Table 3). As suggested by Binenbaum et al,34 95% CIs should ideally be within 1.0% of one another (ie, 99.0%-100.0%) to be assured of model performance; however, this is challenging to achieve for low-prevalence diseases. Nonetheless, any model that changes the practice of ROP screening should be implemented and evaluated carefully to ensure that sensitivity, specificity, and potential examination reduction have not been overestimated or underestimated, especially because misestimation could result in blindness.
One advantage of ruling in all infants at risk of TR-ROP in advance is that this process facilitates early recognition of disease progression and timely treatment. In both this study and the prior one,22 we found that 100% of infants who eventually required treatment were screened positive by the models, weeks before requiring treatment. By focusing resources on those who screen positive, it is less likely TR-ROP will be missed or treated late, which is a common cause of adverse outcomes in ROP care. This is especially true for A-ROP, which progresses earlier and faster than typical TR-ROP.4,24,33
An advantage of ruling out more than 50% of infants is that low-risk infants could have fewer examinations, which can potentially reduce the screening burden for ophthalmologists and the physiological stress caused to infants during examination.18,19,20,21 Implementation of such a model ought to begin with reduced screening frequency in low-risk infants, rather than no further screening, to ensure that no cases of TR-ROP are missed. However, the specificity of this model requires further evaluation in various risk groups and may be even higher in lower-risk populations.22 Moreover, by using the VSS to monitor disease progression over time (Figure), it is likely that false positives could be identified (by their lack of disease progression) and screened less often as well.22
Limitations
There are several limitations that slow implementation of this concept. First, the i-ROP DL system was developed and has been predominantly validated for the RetCam camera.26 Although preliminary data suggest that the algorithm may also work on images from the Forus system, future regulatory approval will require specific validation of the effectiveness on each device.36 Second, access to digital fundus photography and accurate measures of GA are required. Thus, in telemedical settings, the AI model should be easy to implement, but the majority of ROP screening occurs in person and without photography.4,10,13,14 Similarly, it may occasionally be challenging to estimate GA; therefore, BW may be a viable alternative given the tight correlation between GA and BW (although GA outperformed BW in terms of predictive power in the US, where both are measured accurately).22 Third, image quality can limit performance of AI algorithms. To address this potential limitation, in this study, we did not exclude poor quality images, but we averaged the VSS from multiple images per session so that results would be as generalizable as possible. If images are good enough for a clinician to perform telemedicine, then this model is applicable. We also previously developed an automated method for image quality detection for ROP diagnosis that could be applied here.37,38 Finally, in the same way the model needed to be adapted between the US and India, further validation will be important for other populations.
Conclusions
In conclusion, blindness from ROP is largely preventable with timely ROP screenings; however, they can be challenging to implement in many parts of the world. Telemedicine has emerged as a solution to extend the workforce of ophthalmologists beyond practical geographic barriers and has facilitated efficient, high-volume ROP screenings in India, Nepal, and Mongolia. In this diagnostic study, we have demonstrated that incorporation of an AI-based risk model into existing telemedicine infrastructure has the potential to reduce the screening burden for ophthalmologists, reduce the number of examinations for low-risk infants, and enable early diagnosis and timely treatment of infants with TR-ROP—particularly A-ROP—possibly leading to improved visual outcomes for prematurely born infants around the world.
eFigure. Selection of Study Population
References
- 1.Good WV, Hardy RJ, Dobson V, et al. ; Early Treatment for Retinopathy of Prematurity Cooperative Group . The incidence and course of retinopathy of prematurity: findings from the early treatment for retinopathy of prematurity study. Pediatrics. 2005;116(1):15-23. doi: 10.1542/peds.2004-1413 [DOI] [PubMed] [Google Scholar]
- 2.Early Treatment for Retinopathy Of Prematurity Cooperative Group . Revised indications for the treatment of retinopathy of prematurity: results of the early treatment for retinopathy of prematurity randomized trial. Arch Ophthalmol. 2003;121(12):1684-1694. doi: 10.1001/archopht.121.12.1684 [DOI] [PubMed] [Google Scholar]
- 3.Fierson WM; American Academy of Pediatrics Section on Ophthalmology; American Academy of Ophthalmology; American Association for Pediatric Ophthalmology and Strabismus; American Association of Certified Orthoptists . Screening examination of premature infants for retinopathy of prematurity. Pediatrics. 2018;142(6):e20183061. doi: 10.1542/peds.2018-3061 [DOI] [PubMed] [Google Scholar]
- 4.Chiang MF, Quinn GE, Fielder AR, et al. International classification of retinopathy of prematurity, third edition. Ophthalmology. 2021;128(10):e51-e68. doi: 10.1016/j.ophtha.2021.05.031 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Lawn JE, Davidge R, Paul VK, et al. Born too soon: care for the preterm baby. Reprod Health. 2013;10(suppl 1):S5. doi: 10.1186/1742-4755-10-S1-S5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Blencowe H, Cousens S, Chou D, et al. ; Born Too Soon Preterm Birth Action Group . Born too soon: the global epidemiology of 15 million preterm births. Reprod Health. 2013;10(suppl 1):S2. doi: 10.1186/1742-4755-10-S1-S2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Blencowe H, Lawn JE, Vazquez T, Fielder A, Gilbert C. Preterm-associated visual impairment and estimates of retinopathy of prematurity at regional and global levels for 2010. Pediatr Res. 2013;74(suppl 1):35-49. doi: 10.1038/pr.2013.205 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Quinn GE. Retinopathy of prematurity blindness worldwide: phenotypes in the third epidemic. Eye Brain. 2016;8:31-36. doi: 10.2147/EB.S94436 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Gilbert C, Fielder A, Gordillo L, et al. ; International NO-ROP Group . Characteristics of infants with severe retinopathy of prematurity in countries with low, moderate, and high levels of development: implications for screening programs. Pediatrics. 2005;115(5):e518-e525. doi: 10.1542/peds.2004-1180 [DOI] [PubMed] [Google Scholar]
- 10.Bowe T, Nyamai L, Ademola-Popoola D, et al. The current state of retinopathy of prematurity in India, Kenya, Mexico, Nigeria, Philippines, Romania, Thailand, and Venezuela. Digit J Ophthalmol. 2019;25(4):49-58. doi: 10.5693/djo.01.2019.08.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Shukla R, Murthy GVS, Gilbert C, Vidyadhar B, Mukpalkar S. Operational guidelines for ROP in India: a summary. Indian J Ophthalmol. 2020;68(suppl 1):S108-S114. doi: 10.4103/ijo.IJO_1827_19
- 12. Shah PK, Ramya A, Narendran V. Telemedicine for ROP. Asia Pac J Ophthalmol (Phila). 2018;7(1):52-55. doi: 10.22608/APO.2017478
- 13. Vinekar A, Gilbert C, Dogra M, et al. The KIDROP model of combining strategies for providing retinopathy of prematurity screening in underserved areas in India using wide-field imaging, telemedicine, nonphysician graders, and smart phone reporting. Indian J Ophthalmol. 2014;62(1):41-49. doi: 10.4103/0301-4738.126178
- 14. Campbell JP, Singh P, Redd TK, et al. Applications of artificial intelligence for retinopathy of prematurity screening. Pediatrics. 2021;147(3):e2020016618. doi: 10.1542/peds.2020-016618
- 15. Richter GM, Williams SL, Starren J, Flynn JT, Chiang MF. Telemedicine for retinopathy of prematurity diagnosis: evaluation and challenges. Surv Ophthalmol. 2009;54(6):671-685. doi: 10.1016/j.survophthal.2009.02.020
- 16. Biten H, Redd TK, Moleta C, et al.; Imaging & Informatics in Retinopathy of Prematurity (ROP) Research Consortium. Diagnostic accuracy of ophthalmoscopy vs telemedicine in examinations for retinopathy of prematurity. JAMA Ophthalmol. 2018;136(5):498-504. doi: 10.1001/jamaophthalmol.2018.0649
- 17. Greenwald MF, Danford ID, Shahrawat M, et al. Evaluation of artificial intelligence-based telemedicine screening for retinopathy of prematurity. J AAPOS. 2020;24(3):160-162. doi: 10.1016/j.jaapos.2020.01.014
- 18. Mangalesh S, Sarin N, McGeehan B, et al.; BabySTEPS Group. Preterm infant stress during handheld optical coherence tomography vs binocular indirect ophthalmoscopy examination for retinopathy of prematurity. JAMA Ophthalmol. 2021;139(5):567-574. doi: 10.1001/jamaophthalmol.2021.0377
- 19. Anand KJ; International Evidence-Based Group for Neonatal Pain. Consensus statement for the prevention and management of pain in the newborn. Arch Pediatr Adolesc Med. 2001;155(2):173-180. doi: 10.1001/archpedi.155.2.173
- 20. Rush R, Rush S, Ighani F, Anderson B, Irwin M, Naqvi M. The effects of comfort care on the pain response in preterm infants undergoing screening for retinopathy of prematurity. Retina. 2005;25(1):59-62. doi: 10.1097/00006982-200501000-00008
- 21. Mitchell AJ, Green A, Jeffs DA, Roberson PK. Physiologic effects of retinopathy of prematurity screening examinations. Adv Neonatal Care. 2011;11(4):291-297. doi: 10.1097/ANC.0b013e318225a332
- 22. Coyner AS, Chen JS, Singh P, et al. Single-examination risk prediction of severe retinopathy of prematurity. Pediatrics. 2021;148(6):e2021051772. doi: 10.1542/peds.2021-051772
- 23. Taylor S, Brown JM, Gupta K, et al.; Imaging and Informatics in Retinopathy of Prematurity Consortium. Monitoring disease progression with a quantitative severity scale for retinopathy of prematurity using deep learning. JAMA Ophthalmol. 2019;137(9):1022. doi: 10.1001/jamaophthalmol.2019.2433
- 24. Bellsmith KN, Brown J, Kim SJ, et al. Aggressive posterior retinopathy of prematurity: clinical and quantitative imaging features in a large North American cohort. Ophthalmology. 2020;127(8):1105-1112. doi: 10.1016/j.ophtha.2020.01.052
- 25. Olson SL, Chuluunbat T, Cole ED, et al. Development of screening criteria for retinopathy of prematurity in Ulaanbaatar, Mongolia, using a web-based data management system. J Pediatr Ophthalmol Strabismus. 2020;57(5):333-339. doi: 10.3928/01913913-20200804-01
- 26. Brown JM, Campbell JP, Beers A, et al.; Imaging and Informatics in Retinopathy of Prematurity (i-ROP) Research Consortium. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018;136(7):803-810. doi: 10.1001/jamaophthalmol.2018.1934
- 27. Campbell JP, Kim SJ, Brown JM, et al.; Imaging and Informatics in Retinopathy of Prematurity Consortium. Evaluation of a deep learning-derived quantitative retinopathy of prematurity severity scale. Ophthalmology. 2021;128(7):1070-1076. Published online October 27, 2020. doi: 10.1016/j.ophtha.2020.10.025
- 28. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32-35.
- 29. McGill R, Tukey JW, Larsen WA. Variations of box plots. Am Stat. 1978;32(1):12-16. doi: 10.2307/2683468
- 30. Ying GS, Maguire MG, Glynn RJ, Rosner B. Calculating sensitivity, specificity, and predictive values for correlated eye data. Invest Ophthalmol Vis Sci. 2020;61(11):29. doi: 10.1167/iovs.61.11.29
- 31. Charan R, Dogra MR, Gupta A, Narang A. The incidence of retinopathy of prematurity in a neonatal care unit. Indian J Ophthalmol. 1995;43(3):123-126.
- 32. Chattopadhyay MP, Pradhan A, Singh R, Datta S. Incidence and risk factors for retinopathy of prematurity in neonates. Indian Pediatr. 2015;52(2):157-158. doi: 10.1007/s13312-015-0594-1
- 33. Shah PK, Subramanian P, Venkatapathy N, Chan RVP, Chiang MF, Campbell JP. Aggressive posterior retinopathy of prematurity in 2 cohorts of patients in South India: implications for primary, secondary, and tertiary prevention. J AAPOS. 2019;23(5):264.e1-264.e4. doi: 10.1016/j.jaapos.2019.05.014
- 34. Binenbaum G, Ying GS, Quinn GE, et al. The CHOP postnatal weight gain, birth weight, and gestational age retinopathy of prematurity risk model. Arch Ophthalmol. 2012;130(12):1560-1565. doi: 10.1001/archophthalmol.2012.2524
- 35. Binenbaum G, Ying GS, Tomlinson LA; Postnatal Growth and Retinopathy of Prematurity (G-ROP) Study Group. Validation of the Children’s Hospital of Philadelphia Retinopathy of Prematurity (CHOP ROP) model. JAMA Ophthalmol. 2017;135(8):871-877. doi: 10.1001/jamaophthalmol.2017.2295
- 36. Cole E, Valikodath NG, Al-Khaled T, et al. Evaluation of an artificial intelligence system for retinopathy of prematurity screening in Nepal and Mongolia. Ophthalmol Sci. Published online April 25, 2022. doi: 10.1016/j.xops.2022.100165
- 37. Coyner AS, Swan R, Campbell JP, et al.; Imaging and Informatics in Retinopathy of Prematurity Research Consortium. Automated fundus image quality assessment in retinopathy of prematurity using deep convolutional neural networks. Ophthalmol Retina. 2019;3(5):444-450. doi: 10.1016/j.oret.2019.01.015
- 38. Coyner AS, Swan R, Brown JM, et al. Deep learning for image quality assessment of fundus images in retinopathy of prematurity. AMIA Annu Symp Proc. 2018;2018:1224-1232.
Supplementary Materials
eFigure. Selection of Study Population