Single-Exam Risk Prediction of Severe Retinopathy of Prematurity

Aaron S Coyner; Jimmy S Chen; Praveer Singh; Robert L Schelonka; Brian K Jordan; Cindy T McEvoy; Jamie E Anderson; RV Paul Chan; Kemal Sonmez; Deniz Erdogmus; Michael F Chiang; Jayashree Kalpathy-Cramer; J Peter Campbell

doi:10.1542/peds.2021-051772

Author manuscript; available in PMC: 2022 Dec 1.

Published in final edited form as: Pediatrics. 2021 Dec 1;148(6):e2021051772. doi: 10.1542/peds.2021-051772

On behalf of the Imaging and Informatics in Retinopathy of Prematurity Consortium

PMCID: PMC8919718 NIHMSID: NIHMS1786874 PMID: 34814160

Abstract

Background and Objectives

Retinopathy of prematurity (ROP) is a leading cause of childhood blindness. Screening and treatment reduce this risk, but require multiple examinations of babies, most of whom will not develop severe disease. Previous work has suggested that artificial intelligence (AI) may be able to detect incident severe (treatment-requiring [TR]) ROP prior to clinical diagnosis. We aimed to build a risk model that combined AI with clinical demographics to reduce the number of examinations without missing cases of TR-ROP.

Methods

Infants undergoing routine ROP screening examinations (1579 total eyes, 190 with TR-ROP) were recruited from eight North American study centers. A vascular severity score (VSS) was derived from retinal fundus images obtained at 32–33 weeks postmenstrual age. Seven ElasticNet logistic regression models were trained on all combinations of birthweight (BW), gestational age (GA), and VSS. Area under the precision-recall curve (AUPR) identified the highest performing model.

Results

The GA + VSS model had the highest performance (mean ± SD AUPR: 0.35 ± 0.11). On two different test datasets (n = 444 subjects and n = 132 subjects), sensitivity was 100% (positive predictive value: 28.1% and 22.6%) and specificity was 48.9% and 80.8% (negative predictive value: 100.0%).

Conclusions

Using a single exam, this model identified all infants who developed TR-ROP, on average nearly one month (3.4 to 3.7 weeks) prior to diagnosis, with moderate to high specificity. This approach could lead to earlier identification of incident severe ROP, reducing late diagnosis and treatment, while simultaneously reducing the number of ROP exams and unnecessary physiological stress for low-risk babies.

Table of Contents Summary:

A risk model based on gestational age and an artificial intelligence-based assessment of disease severity predicts treatment-requiring retinopathy of prematurity one month prior to treatment.

INTRODUCTION

Retinopathy of prematurity (ROP) is a leading cause of childhood blindness, even though visual impairment can be prevented with appropriate screening and treatment.^1–4 In the context of prematurely born infants, the epidemiology of ROP is directly related to two primary factors: neonatal mortality and exposure to supra-physiological oxygen for resuscitation.^1,5 Primary prevention of ROP, through careful oxygen titration, effectively reduces the incidence of treatment-requiring (TR-) ROP; however, there exists a delicate balance: a lower fraction of inspired oxygen (FiO₂) reduces the probability of developing ROP, but consequently increases the probability of mortality, and vice-versa.⁵ To err on the side of caution, higher FiO₂ is supplied and neonatal intensive care units (NICUs) are responsible for ensuring that secondary prevention, through timely ROP screenings, occurs for all at-risk neonates.^1,4,5 The risk of blindness can be reduced, but not eliminated, with optimal primary and secondary prevention; however, because adverse outcomes are at times preventable, ROP is a leading cause of medico-legal liability in ophthalmology.^6,7

ROP screenings help identify eyes progressing to TR-ROP, so that timely treatments may be provided. However, screening guidelines must balance the risk of missing cases of TR-ROP with the risks of discomfort and potentially life-threatening events from the screenings themselves.^3–5 In the United States, screenings are recommended based on demographic criteria (gestational age [GA] < 31 weeks or birthweight [BW] < 1501 grams).⁴ Exams begin at either four weeks of chronological age or 31 weeks postmenstrual age (PMA; whichever is later), and are repeated every one to two weeks until the retina is fully developed or ROP requires treatment.^2,4 On average, babies who meet screening criteria receive 3–8 examinations, yet fewer than 10% develop TR-ROP. Thus, current screening guidelines, while highly sensitive, are not specific and subject low-risk infants to exams that would not be necessary if high-risk babies could be better identified.^1–3,8,9 Numerous risk models have attempted to add specificity by incorporating comorbidities, but many of them are rare or are confounded by BW and GA.^10,11 The best performing models have demonstrated promise, but thus far have not generalized well to larger, more diverse populations.^10,12,13 Ultimately, these models have not gained traction, as they have either failed to ensure 100% sensitivity or been clinically impractical to implement.^10–13

Herein, we explore whether the specificity of risk models can be improved by including biometric information. Deep learning (DL) has demonstrated promise for objective diagnosis of ROP and may be useful for screening.^14–19 Previous work, using the Imaging and Informatics in ROP (i-ROP) DL algorithm, has suggested that a DL-derived vascular severity score (VSS) may identify babies progressing to TR-ROP weeks before treatment.^16,17 To address this gap in knowledge, we incorporated the output of the i-ROP DL algorithm in a predictive risk model for incident TR-ROP. We hypothesize that adding biometric information relevant to ROP may add specificity to risk models based only on demographic variables, without sacrificing TR-ROP detection sensitivity.

METHODS

i-ROP Study Details

This study was approved by the Institutional Review Boards at the coordinating center (Oregon Health & Science University) and at each of seven study centers (Columbia University, University of Illinois at Chicago, William Beaumont Hospital, Children’s Hospital Los Angeles, Cedars-Sinai Medical Center, University of Miami, Weill Cornell Medical Center) and was conducted in accordance with the Declaration of Helsinki. Written, informed consent was obtained from parents of all enrolled infants.

As part of the multi-center i-ROP cohort study, 842 unique subjects (BW < 1501 grams or GA < 31 weeks) were screened multiple times for ROP between January 2012 and July 2020. During each exam, retinal fundus images were captured via a RetCam (Natus; Pleasanton, CA). Subjects were clinically examined at the bedside, but also received image-based ROP diagnoses, which were determined by a consensus of three ROP experts using the full International Classification of ROP (ICROP) criteria.⁴ Subjects’ retinal images were required to have expert consensus agreement that their quality was acceptable for diagnosis; thirty-three images did not meet this criterion. Clinical comorbidities and demographics were recorded for all subjects’ exams (Table 1, Supplemental Table 1). Statistical significance, where applicable, was determined using Welch’s Two Sample t-test and was defined at a cutoff of p ≤ 0.05.

Table 1.

i-ROP and Salem Dataset Demographics and Clinical Outcomes.

Study Patient Characteristics	Not Treated	Treated	p-value
i-ROP Training Dataset
Birth Weight (grams, mean ± SD)	944.5 ± 248.3	673.0 ± 206.3	< 0.001
Gestational Age (weeks, mean ± SD)	26.7 ± 1.7	24.7 ± 1.4	< 0.001
Vascular Severity Score (mean ± SD)	1.4 ± 0.9	2.9 ± 1.9	< 0.001
Total Patients n (%)	345 (91.8)	31 (8.2)	—
Total Eyes n (%)	660 (91.9)	58 (8.1)	—
i-ROP Test Dataset
Birth Weight (grams, mean ± SD)	930.6 ± 275.8	632.6 ± 136.1	< 0.001
Gestational Age (weeks, mean ± SD)	26.9 ± 2.1	24.3 ± 1.1	< 0.001
Vascular Severity Score (mean ± SD)	1.8 ± 1.4	3.9 ± 2.6	< 0.001
Total Patients n (%)	377 (84.9)	67 (15.1)	—
Total Eyes n (%)	729 (84.7)	132 (15.3)	—
Salem Dataset
Birth Weight (grams, mean ± SD)	1265.4 ± 281.5	823.0 ± 200.9	0.052
Gestational Age (weeks, mean ± SD)	29.2 ± 2.2	25.0 ± 0.7	< 0.001
Vascular Severity Score (mean ± SD)	1.6 ± 0.5	2.3 ± 0.9	0.029
Total Patients n (%)	125 (94.7)	7 (5.3)	—
Total Eyes n (%)	248 (94.7)	14 (5.3)	—

Vascular Severity Score and Dataset Preparation

Each eye examination was represented by a single RetCam image centered on the macula, which is approximately the field of view of zone I. Images were analyzed by i-ROP DL, an algorithm developed to detect plus disease (a manifestation of severe ROP).¹⁴ i-ROP DL provided a softmax probability of each image having normal, pre-plus, or plus disease vasculature (i.e., it approximated the probability of each class, where values range between 0.0 and 1.0, but must sum to 1.0 across all classes). From these values, a VSS, ranging from 1.0 to 9.0, was developed:

Vascular Severity Score = 1 × P(normal) + 5 × P(pre-plus) + 9 × P(plus)
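The weighted sum above can be sketched in a few lines of Python. This is an illustration only, not the i-ROP DL implementation: the function name `vascular_severity_score` and the tolerance `tol` are assumptions, and the inputs are taken to be the three softmax outputs described in the text.

```python
def vascular_severity_score(p_normal, p_pre_plus, p_plus, tol=1e-6):
    """Map three softmax class probabilities to a score between 1.0 and 9.0.

    Hypothetical helper: name and input validation are illustrative choices.
    """
    probs = (p_normal, p_pre_plus, p_plus)
    if any(p < 0.0 or p > 1.0 for p in probs):
        raise ValueError("each probability must lie in [0, 1]")
    if abs(sum(probs) - 1.0) > tol:
        raise ValueError("probabilities must sum to 1")
    # Weights 1, 5, and 9 follow the formula given in the text.
    return 1.0 * p_normal + 5.0 * p_pre_plus + 9.0 * p_plus


# An image classified as certainly normal scores 1.0; certainly plus scores 9.0.
print(vascular_severity_score(1.0, 0.0, 0.0))
print(vascular_severity_score(0.0, 0.0, 1.0))
# A mixed prediction lands in between.
print(vascular_severity_score(0.25, 0.5, 0.25))
```

Because the probabilities sum to 1.0, the score is bounded by the smallest and largest weights, which is why the VSS ranges from 1.0 to 9.0.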

The VSS has been shown to independently correlate with more posterior disease (zone), higher stage, and higher extent of stage 3 ROP in addition to plus disease (all the components of ICROP).^15–19 Based on prior work, the 32–33 week PMA imaging window was identified as potentially predictive of TR-ROP.^16,17 Thus, the first eye examination in this window was used for each subject. Because the goal was to develop a predictive (rather than diagnostic) model, infants who were diagnosed with TR-ROP within this window were excluded from the training dataset — specifically, if they developed TR-ROP within seven days of the first exam to occur within the 32–33 week PMA window. The held-out test dataset (a subset of exams from the i-ROP dataset that were only used for model evaluation) contained all infants eligible for ROP screening, regardless of if/when they developed TR-ROP. Subjects were mutually exclusive to the training (n = 376 subjects) and test (n = 444 subjects) datasets. The training dataset contained 58 eyes that eventually developed TR-ROP and 660 eyes that did not.
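The exam selection and training exclusion rules described above might be expressed as follows. This is a sketch under stated assumptions: the `Exam` record, the function names, the interpretation of the 32–33 week window as 32.0 ≤ PMA < 34.0, and the treatment of a gap of exactly seven days as eligible are all illustrative choices that the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exam:
    pma_weeks: float  # postmenstrual age at the exam, in weeks
    vss: float        # vascular severity score for the eye


def first_exam_in_window(exams: List[Exam], start=32.0, end=34.0) -> Optional[Exam]:
    """Return the earliest exam with start <= PMA < end, or None if there is none."""
    in_window = [e for e in exams if start <= e.pma_weeks < end]
    return min(in_window, key=lambda e: e.pma_weeks) if in_window else None


def eligible_for_training(index_exam: Optional[Exam],
                          tr_rop_pma_weeks: Optional[float],
                          min_gap_days: float = 7.0) -> bool:
    """Exclude eyes whose TR-ROP diagnosis falls within min_gap_days of the index exam."""
    if index_exam is None:
        return False  # no exam in the window, so nothing to train on
    if tr_rop_pma_weeks is None:
        return True   # never developed TR-ROP
    gap_days = (tr_rop_pma_weeks - index_exam.pma_weeks) * 7
    return gap_days >= min_gap_days


# Hypothetical eye with three exams; the 32.25-week exam is the index exam.
exams = [Exam(31.5, 1.2), Exam(33.0, 2.0), Exam(32.25, 1.8)]
index_exam = first_exam_in_window(exams)
print(index_exam, eligible_for_training(index_exam, tr_rop_pma_weeks=36.0))
```

For the held-out test dataset, only the first function would apply, since all screening-eligible infants were retained regardless of when TR-ROP developed.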

Risk Model Development

BW, GA, and VSS were evaluated via recursive feature elimination using multiple ElasticNet models trained using scikit-learn in Python.²¹ ElasticNet is a type of logistic regression that uses a mixture of L1 and L2 regularization.²⁰ L1 regularization is useful for feature selection, L2 regularization is useful when collinear or codependent features are included in a model, and both help to improve model generalizability. The ElasticNet mixing parameter was tuned via five-fold cross-validation using 11 evenly distributed values from 0.0 to 1.0, where 1.0 corresponds to pure L1 regularization and 0.0 to pure L2 regularization. Because of the class imbalance (i.e., far fewer eyes eventually developed TR-ROP than did not), area under the precision-recall curve (AUPR) was the primary measure of model performance, rather than area under the receiver operating characteristics curve (AUROC), as AUROC may be too optimistic. That is, a random classifier theoretically has an AUROC of 0.5, but an AUPR equal only to the proportion of positive cases (the number of positive cases divided by the total number of cases).
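A minimal scikit-learn sketch of this tuning procedure is shown below. It is not the study code: the data are synthetic stand-ins, and the feature standardization, the default regularization strength, the `saga` solver, the shuffled stratified folds, and the use of `average_precision` as a step-wise estimate of AUPR are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT study data): columns are BW, GA, VSS.
rng = np.random.default_rng(0)
n = 400
y = (rng.random(n) < 0.15).astype(int)
X = np.column_stack([
    rng.normal(950 - 250 * y, 230),   # birth weight, grams
    rng.normal(26.8 - 2.0 * y, 1.6),  # gestational age, weeks
    rng.normal(1.5 + 1.5 * y, 1.0),   # vascular severity score
])
names = ["BW", "GA", "VSS"]

# All non-empty subsets of the three predictors give seven candidate models.
feature_sets = [c for k in (1, 2, 3) for c in combinations(range(3), k)]
l1_grid = np.linspace(0.0, 1.0, 11)  # 0.0 = pure L2, 1.0 = pure L1
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for cols in feature_sets:
    model = make_pipeline(
        StandardScaler(),  # assumption: the text does not state any scaling
        LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, max_iter=5000),
    )
    search = GridSearchCV(
        model,
        {"logisticregression__l1_ratio": l1_grid},
        scoring="average_precision",  # step-wise estimate of AUPR
        cv=cv,
    )
    search.fit(X[:, list(cols)], y)
    label = " + ".join(names[i] for i in cols)
    best_l1 = float(search.best_params_["logisticregression__l1_ratio"])
    results[label] = (float(search.best_score_), best_l1)

for label, (aupr, l1) in results.items():
    print(f"{label:15s} mean AUPR={aupr:.2f}  l1_ratio={l1:.1f}")
```

The printed values depend entirely on the synthetic data and should not be compared with Table 2; the sketch only illustrates the search structure of seven feature sets, 11 mixing values, and five folds.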

Operating Point Selection

The performance of the model with the highest AUPR was assessed via the F_β score using five-fold cross-validation across 101 evenly distributed operating points from 0.00 to 1.00. Whereas the F₁ score (β = 1) weights false negatives and false positives equally, increasing β (e.g., F₂, F₃, etc.) prioritizes minimizing false negatives over minimizing false positives. The F₂ score is commonly used to slightly prioritize minimization of false negatives. To prioritize this more strongly, β was set to 4. The mean operating point that maximized the F₄ score across folds, minus one standard deviation (SD), was selected and used to evaluate both test datasets.
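The operating point search can be illustrated with a short, dependency-free sketch. The function names and the toy labels and probabilities are hypothetical; only the F_β definition, the 101-point grid, and β = 4 come from the text.

```python
def f_beta(tp, fp, fn, beta):
    """F-beta from confusion counts; beta > 1 penalizes false negatives more."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_operating_point(y_true, y_prob, beta=4.0, n_points=101):
    """Return (threshold, score) maximizing F-beta over an even grid on [0, 1]."""
    best_t, best_score = 0.0, -1.0
    for i in range(n_points):
        t = i / (n_points - 1)
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= t)
        fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < t)
        score = f_beta(tp, fp, fn, beta)
        if score > best_score:  # strict comparison keeps the lowest tied threshold
            best_t, best_score = t, score
    return best_t, best_score


# Toy predictions (hypothetical, not study data): two positives, six negatives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.35, 0.80]
threshold, score = best_operating_point(y_true, y_prob)
print(f"threshold={threshold:.2f}, F4={score:.3f}")

# Lowering a mean operating point by one SD, using the values reported in Results.
lowered = round(0.33 - 0.08, 2)
print(lowered)
```

With β = 4, a missed positive costs 16 times as much as a false alarm in the denominator, so the selected threshold sits just below the lowest-scoring positive case rather than at the point that balances the two error types.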

Model Evaluation

This model was then evaluated on the held-out i-ROP test dataset and on an independent dataset that was collected between September 2015 and June 2018 from 132 unique subjects born at a hospital in Salem, OR (Table 1). Data collection and exclusion criteria were similar to the i-ROP dataset. Retrospective evaluation of these data was performed under a waiver of consent from the Oregon Health & Science University Institutional Review Board. Because patients are referred for treatment (not individual eyes), test dataset evaluations were conducted at the patient-level (i.e., if one or both eyes were predicted to develop TR-ROP, the patient was labeled as such). The i-ROP test dataset contained 74 patients (132 eyes) that eventually developed TR-ROP and 370 patients (729 eyes) that did not. The Salem dataset contained seven patients (14 eyes) that developed TR-ROP and 125 patients (248 eyes) that did not. The main outcome measures were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and their respective 95% confidence intervals (CI), evaluated independently using the conservative Clopper-Pearson method, as suggested by Ying et al.²²
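For readers unfamiliar with the Clopper-Pearson interval, the following sketch computes it by bisection on the binomial distribution using only the Python standard library, and also shows the patient-level aggregation rule. It is an illustration rather than the analysis code used in the study; the function names are hypothetical, and the direct summation is only suitable for modest sample sizes such as those reported here.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(increasing_fn, iters=100):
    """Bisection root of a function that rises from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if increasing_fn(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Lower bound solves P(X >= successes | p) = alpha / 2.
    lower = 0.0 if successes == 0 else _root(
        lambda p: (1 - binom_cdf(successes - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= successes | p) = alpha / 2.
    upper = 1.0 if successes == n else _root(
        lambda p: alpha / 2 - binom_cdf(successes, n, p))
    return lower, upper


def patient_level(eye_predictions):
    """A patient screens positive if either eye is predicted positive."""
    return {pid: any(eyes) for pid, eyes in eye_predictions.items()}


# Sensitivity example: 74 of 74 patients who developed TR-ROP were flagged.
lo, hi = clopper_pearson(74, 74)
print(f"sensitivity 95% CI: [{100 * lo:.1f}%, {100 * hi:.1f}%]")

# Hypothetical per-eye predictions for two patients.
print(patient_level({"A": [False, True], "B": [False, False]}))
```

When every case is detected, the lower bound reduces to (α/2)^(1/n), which explains why the interval for seven of seven Salem patients is far wider than the interval for 74 of 74 i-ROP patients despite identical point estimates.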

Secondary Analysis of Positive Cases

In a secondary analysis, the maximum VSS between eyes for subjects in the i-ROP test dataset who screened positive was followed over time. Based on prior work, the VSS has shown potential for use as a monitoring tool to detect disease progression. The change in VSS over time for subjects who screened positive and eventually developed TR-ROP was compared to that of subjects who screened positive but did not develop TR-ROP. Statistical significance was set at a cutoff of p ≤ 0.05 and was determined using an analysis of variance (ANOVA) and a Welch’s Two Sample t-test.

RESULTS

Datasets

Table 1 displays the relevant demographics and VSS at 32–33 weeks PMA, and clinical outcomes in the two datasets used in this study. In both datasets, eyes that developed TR-ROP tended to have higher VSS at 32–33 weeks and babies that required treatment in one or both eyes tended to have lower BW and GA.

Risk Model Development

ElasticNet was tuned via five-fold cross-validation for all combinations of BW, GA, and VSS. An ElasticNet model with an L1 ratio of 0.4 using the predictors GA and VSS had the highest AUPR (0.35 ± 0.11, Table 2, Figure 1). Note that a random classifier would have an AUPR approximately equal to 0.08 (the proportion of TR-ROP cases in the training dataset).
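The random-classifier baseline quoted above can be checked with a short simulation. This is a sketch, not part of the original analysis: the eye counts come from the training dataset description, while the `average_precision` helper, the seed, and the number of trials are illustrative assumptions.

```python
import random


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive case."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Training dataset: 58 eyes developed TR-ROP and 660 did not.
positives, negatives = 58, 660
prevalence = positives / (positives + negatives)
print(f"prevalence = {prevalence:.3f}")

# Score every eye at random and average the resulting AUPR estimates.
random.seed(0)
y_true = [1] * positives + [0] * negatives
trials = [
    average_precision(y_true, [random.random() for _ in y_true])
    for _ in range(200)
]
mean_ap = sum(trials) / len(trials)
print(f"mean random-classifier AUPR estimate = {mean_ap:.3f}")
```

The simulated value should fall close to the prevalence of roughly 0.08, so the best cross-validated AUPR of 0.35 is several times the chance level even though it looks low in absolute terms.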

Table 2:

Five-fold cross-validation results for every combination of birth weight, gestational age, and vascular severity score.

Variables	AUPR^a	AUROC^a	L1 Ratio
BW	0.21 ± 0.14	0.77 ± 0.12	0.0
GA	0.23 ± 0.20	0.79 ± 0.09	1.0
VSS	0.29 ± 0.05	0.76 ± 0.03	0.0
BW + GA	0.23 ± 0.20	0.78 ± 0.10	0.0
BW + VSS	0.32 ± 0.13	0.82 ± 0.11	0.0
GA + VSS	0.35 ± 0.11	0.82 ± 0.07	0.4
BW + GA + VSS	0.31 ± 0.11	0.81 ± 0.11	0.0

^a Mean ± standard deviation results from five-fold cross-validation.

Abbreviations: AUPR = area under the precision-recall curve, AUROC = area under the receiver operating characteristic curve, BW = birthweight, GA = gestational age, VSS = vascular severity score at 32–33 weeks postmenstrual age. L1 Ratio: weighting of L1 versus L2 regularization in ElasticNet.

Figure 1: The mean ± standard deviation of the (A) AUPR and (B) AUROC were 0.35 ± 0.11 and 0.82 ± 0.07, respectively, on five-fold cross-validation.

The operating point was tuned for increased sensitivity — so that all cases of TR-ROP would be identified — prior to evaluating performance on the test datasets. The maximum F₄ score ± SD (0.74 ± 0.12) occurred at an operating point of 0.33 ± 0.08. To further increase sensitivity, this operating point was lowered by one SD to 0.25.

Model Evaluation

The model was then evaluated on the held-out test dataset from the i-ROP database (Table 3). It identified all babies that eventually required treatment (sensitivity [CI]: 100.0% [95.1%, 100.0%], PPV [CI]: 28.1% [22.8%, 34.0%]) while correctly identifying nearly half of the babies that never would (specificity [CI]: 48.9% [43.7%, 54.1%], NPV [CI]: 100.0% [98.0%, 100.0%]). For infants who developed TR-ROP, the average number of weeks ± SD to TR-ROP diagnosis was 3.7 ± 2.7 weeks (range: 0.1–11.0 weeks) following prediction.

Table 3:

Confusion matrix of the model compared with the ground truth in two test datasets.

		True Label
		i-ROP Test Dataset		Salem Test Dataset
		Not Treated	Treated	Not Treated	Treated
Model Predictions	Predicted Not Treated	181 (TN)	0 (FN)	101 (TN)	0 (FN)
Model Predictions	Predicted Treated	189 (FP)	74 (TP)	24 (FP)	7 (TP)

Abbreviations: TN = True Negative, TP = True Positive, FP = False Positive, FN = False Negative

It was also evaluated on an independent test dataset collected from a hospital located in Salem, OR (Table 3). Again, it correctly identified all babies that eventually required treatment (sensitivity [CI]: 100.0% [59.0%, 100.0%], PPV [CI]: 22.6% [9.6%, 41.1%]), and specificity [CI] increased to 80.8% [72.8%, 87.3%] (NPV [CI]: 100.0% [96.4%, 100.0%]). The average time ± SD to TR-ROP diagnosis, following prediction, was 3.4 ± 2.1 weeks (range: 0.1–5.0 weeks).
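The point estimates reported for both test datasets follow directly from the counts in Table 3, as the short sketch below shows. The function name `screening_metrics` is hypothetical; the counts are copied from Table 3.

```python
def screening_metrics(tn, fp, fn, tp):
    """Sensitivity, specificity, PPV, and NPV from confusion matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Patient-level counts from Table 3.
table3 = {
    "i-ROP test": dict(tn=181, fp=189, fn=0, tp=74),
    "Salem": dict(tn=101, fp=24, fn=0, tp=7),
}

for name, counts in table3.items():
    metrics = screening_metrics(**counts)
    formatted = ", ".join(f"{k}={100 * v:.1f}%" for k, v in metrics.items())
    print(f"{name}: {formatted}")
```

Because there were no false negatives in either dataset, sensitivity and NPV are both 100.0%, and the difference between the two populations appears only in specificity and PPV.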

Secondary Analysis of Positive Cases

Among positive predictions in the i-ROP test dataset (Table 3), the average VSS was followed over time. Subjects that developed TR-ROP appeared to have a greater change in average VSS as compared to those who screened positive but never required treatment (p ≤ 0.05), suggesting that specificity could be further improved by analyzing change in VSS over time (Figure 2).

Figure 2: Among patients who screened positive by the optimal model, those who developed TR-ROP had higher maximum inter-eye vascular severity scores at every subsequent follow-up (p ≤ 0.05).

DISCUSSION

We tested whether incorporation of an artificial intelligence (AI)-based assessment of vascular severity could improve the performance of ROP risk prediction models. We found that using just GA and VSS, obtained during a single eye examination at 32–33 weeks PMA, can identify all infants who are at risk of developing TR-ROP nearly one month prior to diagnosis and simultaneously rule out roughly half (48.9%) to four-fifths (80.8%) of the low-risk population. With further validation, implementation of this model could reduce the number of ROP examinations and associated physiological stress for low-risk babies. Finally, quantitative monitoring of vascular severity may lead to earlier and more consistent diagnosis of TR-ROP in infants who are at the highest risk, thus minimizing the overall risk of adverse outcomes.

This hypothesis was based on prior work that demonstrated that a DL-derived VSS may identify high-risk eyes as early as one month prior to TR-ROP diagnosis.^16,17 This proved to be accurate, as the AUPR of the VSS-only model (0.29) was 0.06 to 0.08 higher than that of the BW or GA univariate models, or the combination thereof (Table 2). This suggested that predictive performance might be higher if a combination of VSS and GA and/or BW were to be used in a risk model. After optimizing the operating point of the highest performing model (GA + VSS) for increased sensitivity (to avoid missing cases of TR-ROP), the model correctly identified 100% of babies who developed TR-ROP in two separate populations.

The intended use population and the potential impact of the PPV and NPV in each target population must also be considered. In the i-ROP dataset, consistent with a population of babies from academic medical centers (who may be higher risk than the average NICU), the specificity of the model was 48.9% as compared to 80.8% in the Salem, OR hospital, where the incidence of TR-ROP is lower. Even in the higher-risk population (i-ROP), these results suggest that, by 32–33 weeks PMA, half the population could be accurately identified as low risk and no longer require frequent examinations. The Salem, OR population suggests that this proportion may be substantially higher in community ROP screening programs.

We also found that using VSS to monitor disease progression may further enhance early detection of incident TR-ROP in babies that screen positive (Figure 2). This is consistent with prior work demonstrating that quantitative monitoring of vascular severity may be useful not just for screening, but for quantitative diagnosis and determining whether the disease is stable, progressing, or regressing.^14–19 This could lay the framework for a new model of ROP screening, where low-risk babies receive fewer exams, and high-risk babies receive earlier and more precise diagnoses. To this point, it may be worth investigating the role of oxygen exposure, intraventricular hemorrhages, sepsis, necrotizing enterocolitis, thrombocytopenia, and other previously associated risk factors to further increase specificity, although they may complicate this model and/or introduce confounding effects.

This model may also be easier to implement than prior ROP risk models. The performance of the GA + VSS model is comparable to the initial performance measurements of the Children’s Hospital of Philadelphia (CHOP) ROP model, which used a combination of BW + GA + weight gain to predict future occurrences of Type-II and TR-ROP.¹² Both models achieved 100.0% sensitivity in predicting TR-ROP, and had similar specificities. However, when the CHOP ROP model was applied to an external validation cohort of infants admitted to 30 hospitals across North America, the operating point had to be lowered to achieve 100.0% sensitivity, consequently reducing specificity to just 6.8%, which is too low to have a substantive impact on screening protocols.¹³ Another advantage of the proposed model is that it only requires data from a single examination. In general, GA is known with high precision, except in low- and middle-income countries (LMICs) where dating pregnancies may be less reliable. In these settings, it may be worth exploring a model that uses BW + VSS instead, since Table 2 suggests almost comparable performance (AUPR of 0.32 versus 0.35). However, a retinal fundus photo obtained at 32–33 weeks PMA is also required, and herein lies the main barrier to implementation at this time. Images are not part of the standard of care and digital fundus cameras can be expensive, so images are not often obtained.^2,4 As cameras drop in price and smartphone-based cameras become viable alternatives, it may be that future studies validating this concept demonstrate that the clinical benefit of earlier detection of high-risk babies, along with the reduced screening burden, outweighs the cost of implementing routine imaging.^23–25 Nonetheless, this remains a barrier to implementation and the main disadvantage of this method.

Additionally, this model is not likely to initially generalize well to populations different from the North American screening population. In many LMICs, the epidemiology and demographic risk factors are quite different, and the model would need to be re-tuned based on local disease epidemiology.^9,26,27 For example, high-risk babies could be less premature and a time point other than 32–33 weeks PMA may be more predictive. There is, however, evidence that the i-ROP DL system accurately diagnoses TR-ROP in an Indian ROP telemedicine program, suggesting that the technology is effective in that context, and thus may be translatable.¹⁹

Regardless, this model has potential to create a paradigm shift: transitioning from ophthalmology-led to neonatology-led ROP screenings, since the only required inputs are gestational age and a fundus photograph (not a complete ophthalmoscopic examination). Such a paradigm could, in addition to reducing the number of exams needed for low-risk infants, dramatically reduce the number of exams for which an ophthalmologist is needed. This could lead to better use of scarce resources, especially in rural regions and LMICs, where this is a significant issue.^26,27

Conclusion

In conclusion, we have trained and optimized an interpretable, parsimonious model for the prediction of TR-ROP. In two separate validation cohorts, we demonstrated that a single exam at 32–33 weeks PMA identified all babies who eventually developed TR-ROP and correctly ruled out 48.9% and 80.8% of those who did not. Implementation of this model could lead to significantly fewer ROP exams for low-risk babies, better utilization of ROP screening resources, and earlier recognition of TR-ROP disease progression. Future work will validate this concept in LMICs, where the potential added value may be even greater given the increasing prevalence of disease and scarcity of resources, with the goal of reducing or eliminating blindness from ROP.

Supplementary Material

NIHMS1786874-supplement-1.pdf (119.5 KB, pdf)

What’s known on this subject:

Retinopathy of prematurity screenings are an essential service in NICUs; however, current risk models subject infants to multiple physiologically stressful exams. Previous work has demonstrated that an artificial intelligence-derived vascular severity score may prove useful for identifying severe disease.

What this study adds:

We developed an image-based risk model that, using a single retinal photograph, accurately detects severe ROP one month prior to diagnosis. Implementation of this screening approach could result in a paradigm shift toward neonatology-led ROP screenings.

Conflict of Interest Disclosures:

R.V. Paul Chan is on the Scientific Advisory Board for Phoenix Technology Group (Pleasanton, CA), a Consultant for Novartis (Basel, Switzerland), and a Consultant for Alcon (Ft. Worth, TX). Michael F. Chiang was previously a consultant for Novartis (Basel, Switzerland), and is an equity owner of Inteleretina (Honolulu, HI). Michael F. Chiang, J. Peter Campbell, R.V. Paul Chan, and Jayashree Kalpathy-Cramer receive research support from Genentech. R.V. Paul Chan receives research support from Regeneron. The i-ROP DL system has been licensed to Boston AI labs by Oregon Health & Science University, Massachusetts General Hospital, Northeastern University, and the University of Illinois, Chicago, which may result in royalties to Drs. Chan, Campbell, Kalpathy-Cramer in the future. Dr. Campbell is an unpaid advisor to Boston AI.

Funding/Support:

This work was supported by grants T15 LM007088, R01 EY19474, R01 EY031331, R21 EY031883, and P30 EY10572 from the National Institutes of Health (Bethesda, MD), and by unrestricted departmental funding and a Career Development Award (JPC) from Research to Prevent Blindness (New York, NY).

Abbreviations:

ANOVA: analysis of variance
AUPR: area under the precision-recall curve
AUROC: area under the receiver operating characteristics curve
BW: birthweight
CHOP: Children’s Hospital of Philadelphia
DL: deep learning
FN: false negative
FP: false positive
FiO₂: fraction of inspired oxygen
GA: gestational age
i-ROP: Imaging and Informatics in Retinopathy of Prematurity
i-ROP DL: Imaging and Informatics in Retinopathy of Prematurity Deep Learning
ICROP: International Classification of Retinopathy of Prematurity
LMIC: low- and middle-income countries
NPV: negative predictive value
PPV: positive predictive value
PMA: postmenstrual age
ROP: retinopathy of prematurity
SD: standard deviation
TR-ROP: treatment-requiring retinopathy of prematurity
TN: true negative
TP: true positive
VSS: vascular severity score

REFERENCES

1.Good WV, Hardy RJ, Dobson V, et al. The incidence and course of retinopathy of prematurity: findings from the early treatment for retinopathy of prematurity study. Pediatrics. 2005;116(1):15–23. [DOI] [PubMed] [Google Scholar]
2.Fierson WM; American academy of pediatrics section on ophthalmology; American academy of ophthalmology; American association for pediatric ophthalmology and strabismus; American association of certified orthoptists. Screening examination of premature infants for retinopathy of prematurity. Pediatrics. 2018;142(6):E20183061. Pediatrics. 2019;143(3):e20183810. [DOI] [PubMed] [Google Scholar]
3.Early Treatment For Retinopathy Of Prematurity Cooperative Group. Revised indications for the treatment of retinopathy of prematurity: results of the early treatment for retinopathy of prematurity randomized trial. Arch Ophthalmol. 2003;121(12):1684–1694. [DOI] [PubMed] [Google Scholar]
4.International Committee for the Classification of Retinopathy of Prematurity. The International Classification of Retinopathy of Prematurity revisited. Arch Ophthalmol. 2005;123(7):991–999. [DOI] [PubMed] [Google Scholar]
5.Lawn JE, Davidge R, Paul VK, et al. Born Too Soon: Care for the preterm baby. Reprod Health. 2013;10(S1):S5. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Braverman RS, Enzenauer RW. Socioeconomics of retinopathy of prematurity in-hospital care. Arch Ophthalmol. 2010;128(8):1055–1058. [DOI] [PubMed] [Google Scholar]
7.Braverman RS, Enzenauer RW. Socioeconomics of retinopathy of prematurity care in the United States. Am Orthopt J. 2013;63(1):92–96. [DOI] [PubMed] [Google Scholar]
8.Blencowe H, Lawn JE, Vazquez T, Fielder A, Gilbert C. Preterm-associated visual impairment and estimates of retinopathy of prematurity at regional and global levels for 2010. Pediatr Res. 2013;74 Suppl 1(S1):35–49. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Quinn GE. Retinopathy of prematurity blindness worldwide: phenotypes in the third epidemic. Eye Brain. 2016;8:31–36. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Kim SJ, Port AD, Swan R, Campbell JP, Chan RVP, Chiang MF. Retinopathy of prematurity: a review of risk factors and their clinical significance. Surv Ophthalmol. 2018;63(5):618–637. [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Hutchinson AK, Melia M, Yang MB, VanderVeen DK, Wilson LB, Lambert SR. Clinical models and algorithms for the prediction of retinopathy of prematurity: A report by the American academy of ophthalmology. Ophthalmology. 2016;123(4):804–816. [DOI] [PubMed] [Google Scholar]
12.Binenbaum G, Ying G-S, Quinn GE, et al. The CHOP postnatal weight gain, birth weight, and gestational age retinopathy of prematurity risk model. Arch Ophthalmol. 2012;130(12):1560–1565. [DOI] [PubMed] [Google Scholar]
13.Binenbaum G, Ying G-S, Tomlinson LA, Postnatal Growth and Retinopathy of Prematurity (G-ROP) Study Group. Validation of the Children’s Hospital of Philadelphia Retinopathy of prematurity (CHOP ROP) model. JAMA Ophthalmol. 2017;135(8):871–877. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Brown JM, Campbell JP, Beers A, et al. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018;136(7):803. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Campbell JP, Kim SJ, Brown JM, et al. Evaluation of a Deep Learning-Derived Quantitative Retinopathy of Prematurity Severity Scale. Ophthalmology. Published online October 27, 2020. doi: 10.1016/j.ophtha.2020.10.025 [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Taylor S, Brown JM, Gupta K, et al. Monitoring disease progression with a quantitative severity scale for retinopathy of prematurity using deep learning. JAMA Ophthalmol. 2019;137(9):1022. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Bellsmith KN, Brown J, Kim SJ, et al. Aggressive posterior retinopathy of prematurity: Clinical and quantitative Imaging features in a large North American cohort. Ophthalmology. 2020;127(8):1105–1112. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Greenwald MF, Danford ID, Shahrawat M, et al. Evaluation of artificial intelligence-based telemedicine screening for retinopathy of prematurity. J AAPOS. 2020;24(3):160–162. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Campbell JP, Singh P, Redd TK, et al. Applications of artificial intelligence for retinopathy of prematurity screening. Pediatrics. 2021;147(3):e2020016618. [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. 2nd ed. Springer; 2017. [Google Scholar]
21.Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12(85):2825–2830. [Google Scholar]
22.Ying G-S, Maguire MG, Glynn RJ, Rosner B. Calculating sensitivity, specificity, and predictive values for correlated eye data. Invest Ophthalmol Vis Sci. 2020;61(11):29. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Wang S, Jin K, Lu H, Cheng C, Ye J, Qian D. Human visual system-based fundus image quality assessment of portable fundus camera photographs. IEEE Trans Med Imaging. 2016;35(4):1046–1055. [DOI] [PubMed] [Google Scholar]
24.Raju B, Raju NSD, Akkara JD, Pathengay A. Do it yourself smartphone fundus camera - DIYretCAM. Indian J Ophthalmol. 2016;64(9):663–667. [DOI] [PMC free article] [PubMed] [Google Scholar]
25.Nazari Khanamiri H, Nakatsuka A, El-Annan J. Smartphone Fundus Photography. J Vis Exp. 2017;(125). doi: 10.3791/55958 [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Gilbert C, Fielder A, Gordillo L, et al. Characteristics of infants with severe retinopathy of prematurity in countries with low, moderate, and high levels of development: implications for screening programs. Pediatrics. 2005;115(5):e518–25. [DOI] [PubMed] [Google Scholar]
27.Gilbert C. Retinopathy of prematurity: a global perspective of the epidemics, population of babies at risk and implications for control. Early Hum Dev. 2008;84(2):77–82. [DOI] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

NIHMS1786874-supplement-1.pdf (119.5 KB, pdf)

