Author manuscript; available in PMC: 2016 Jun 16.
Published in final edited form as: JAMA Ophthalmol. 2015 Jun;133(6):675–682. doi: 10.1001/jamaophthalmol.2015.0460

A Validated System for Centralized Grading of Retinopathy of Prematurity: Telemedicine Approaches to Evaluating Acute Phase ROP (e-ROP) Study

Ebenezer Daniel 1, Graham E Quinn 1,2, P Lloyd Hildebrand 3, Anna Ells 4, Baker Hubbard 5, Antonio Capone 6, Revell W Martin 1, Candace P Ostroff 1, Eli Smith 1, Maxwell Pistilli 1, Gui-Shuang Ying 1, For the e-ROP Cooperative Group
PMCID: PMC4910817  NIHMSID: NIHMS781033  PMID: 25811772

Abstract

Importance

Measurable competence derived from comprehensive and advanced training in grading digital images is critical in studies using a reading center to evaluate retinal fundus images from infants at risk for retinopathy of prematurity (ROP). Details of certification for non-physician Trained Readers (TRs) have not yet been described.

Objective

To describe a centralized system for grading ROP digital images by TRs in the Telemedicine Approaches to Evaluating Acute-Phase Retinopathy of Prematurity (e-ROP) Study.

Design

Multi-center observational cohort study.

Setting

TRs were trained by experienced ROP specialists and certified to detect ROP morphology in digital retinal images under the supervision of an ophthalmologist Reading Center Director (RCD). An ROP Reading Center was developed with standard hardware, secure Internet access, and customized image-viewing software with an electronic grading form. A detailed grading protocol was developed. Based on the TR gradings, a computerized algorithm determined whether Referral-Warranted ROP (RW-ROP, defined as the presence of plus disease, zone I ROP, or stage 3 or worse ROP) was present. Each image set was independently double graded by TRs, with discrepant fields adjudicated by the RCD.

Participants

Infants with birth weights <1251g.

Interventions/Exposures

Digital retinal images.

Main Outcome Measure(s)

Intra- and inter-grader variability, and monitoring for temporal drift.

Results

Four TRs underwent rigorous training and certification. A total of 5,520 image sets were double graded, with 25% requiring adjudication for at least one component of RW-ROP. The weighted kappas for inter-grader agreement (n = 80 image sets) were 0.72 (95% CI, 0.52–0.93) for RW-ROP, 0.57 (0.37–0.77) for plus disease, 0.43 (0.24–0.63) for zone I ROP, and 0.67 (0.47–0.88) for stage 3 or worse ROP. The weighted kappas for grade-regrade agreement were 0.77 (0.57–0.97) for RW-ROP, 0.87 (0.67–1.00) for plus disease, 0.70 (0.51–0.90) for zone I ROP, and 0.77 (0.57–0.97) for stage 3 or worse ROP.

Conclusions

These data suggest that the e-ROP system for training and certifying non-physicians to grade ROP images under the supervision of a Reading Center Director reliably detects potentially serious ROP, with good intra- and inter-grader consistency and minimal temporal drift.

Introduction

Worldwide, there is limited availability of ophthalmologists experienced in the detection of severe retinopathy of prematurity (ROP).1,2 A telemedicine system that can accurately identify infants with potentially severe ROP can maximize the likelihood of detecting eyes with referral-warranted ROP (RW-ROP), i.e., morphologic features associated with severe ROP, such as plus disease, zone I ROP, or stage 3 or worse ROP, that may indicate a need for intervention.3 The Telemedicine Approaches to Evaluating Acute Phase ROP (e-ROP) Study was a large, multi-center, National Eye Institute-funded clinical study undertaken to evaluate the validity of an ROP telemedicine system for detecting eyes with RW-ROP.4 It compared the results of digital image grading with the findings of binocular indirect ophthalmoscopy examinations performed by study-certified, ROP-experienced ophthalmologists.5 Similar to telemedicine approaches in diabetic retinopathy, the e-ROP Study established a reading center (RC) where trained, certified non-physician readers, supervised by an ophthalmologist RC Director, graded standardized image sets of eyes at risk for ROP from infants with birth weights <1251 g. We describe the training, certification, operational workflow, and quality assurance in an ROP RC that supported the e-ROP Study.

Methods

Reading Center Infrastructure

Standardized independent workstations with secure Internet access were provided to all Trained Readers (TRs) in the e-ROP Study. The workstations included similarly configured computers with monitors that were calibrated every 2 weeks to maintain consistency in brightness and hue. Software was developed for displaying ROP images and manipulating their contrast, brightness, and magnification, and grading data were captured using web-based forms (Figure 1).

Figure 1. Screen shot of e-ROP image display and web-based forms for grading.

After obtaining approval from the institutional review boards at each of the participating clinical centers, standard six-image sets were acquired for each eye, sorted by field of view with respect to the ideal field, focus, and clarity, and then uploaded to the Inoveon Data Center (IDC) by non-physician imagers. The image sets were assigned to reading queues: either a general queue from which sets were randomly assigned to readers, or a queue for a specific reader. Trained Readers (TRs) used a structured grading protocol to document morphologic ROP features on electronic grading forms. All image sets were graded independently by two TRs, with discrepancies adjudicated by the RC Director. All images and grading data were stored in the e-ROP IDC and then exported to the Data Coordination Center (DCC) for review and statistical analysis.

Training and Certification of Trained Readers

The Ophthalmology Department of the University of Pennsylvania had an existing RC with 3 TRs who had extensive experience in reading color and fluorescein digital retinal images in studies of age-related macular degeneration and diabetic retinopathy, but no previous experience in grading ROP images. Requirements for TRs included demonstrated ability to grade digital images systematically and to adhere strictly to the protocol. The TRs had diverse undergraduate backgrounds. They underwent three-phase training, a precertification process, and a final certification process for e-ROP.

Phase 1 training included didactic lectures, interactive sessions, and assigned readings that covered the classification of ROP,6 the e-ROP study protocol, telemedicine principles, the grading protocol, and current ROP treatments. A broad spectrum of ROP clinical images was shown and discussed during interactive training sessions, conducted through face-to-face meetings and webinars with participation by expert graders, the RC Director, and the Study Chair. TRs visited the neonatal intensive care unit (NICU) at The Children’s Hospital of Philadelphia to observe the imaging of premature babies. To complete Phase 1, the trainees were required to pass a knowledge assessment test.

In Phase 2 training, TRs independently viewed and graded training image sets with known ROP grading from a previous ROP study database.3 Prior to their use in training, a group of experts generated consensus final grading results for each image set that were used as the answer keys to assess TR grading performance. Using a paper grading form, the TRs graded each training image set for the presence or absence of plus disease, Zone I ROP and stage 3 or worse ROP. The percent agreement of each grading variable from each reader was determined by comparing TR grading results with expert consensus results. Each training session included independent grading of an average of 15 image sets by each TR, followed by the review of training image sets and discussion of their grading results. The Study biostatistician (GSY) reviewed the analysis of the grader training results with the RC Director and Study Chair to identify areas that warranted additional training.

In Phase 3 training, TRs graded additional ROP RetCam image sets using the electronic form and grading protocol. De-identified training image sets that included classic ROP morphology, various artifacts and different aspects of quality related to focus, clarity and field were provided. The TRs graded and reviewed 100 ROP training image sets. TRs graded image sets individually and met once a week with the Study Chair, RC Director and a clinical expert (AE) through teleconference to compare findings and discuss discrepancies with the image sets displayed on shared monitors.

During the pre-certification process, TRs were required to demonstrate good agreement with the consensus grading of training image sets; ten image sets from the e-ROP pilot study were then provided for independent grading. Grading results for RW-ROP and its components were compared with the consensus grading for each image set. An agreement of 85% or higher was judged satisfactory, and re-training was required if a TR did not achieve a satisfactory score.

Final certification was conducted once 85% agreement on the pre-certification image sets was reached. An additional fifteen image sets derived from e-ROP pilot submissions were queued for the final certification process. If <80% agreement was achieved on these image sets, a week of retraining was performed before another 15 image sets were queued; the process was repeated until at least 80% agreement with the consensus grading was reached, resulting in TR certification.
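The certification criterion above amounts to a percent-agreement check against the consensus grading, repeated on fresh queues until the threshold is met. A minimal sketch of that check (the function names, data layout, and code are illustrative assumptions, not the study's actual software):

```python
def percent_agreement(reader_grades, consensus_grades):
    """Percent of image sets on which the reader matches the expert consensus."""
    assert len(reader_grades) == len(consensus_grades) and consensus_grades
    matches = sum(r == c for r, c in zip(reader_grades, consensus_grades))
    return 100.0 * matches / len(consensus_grades)


def passes_certification(reader_grades, consensus_grades, threshold=80.0):
    """Final certification required at least 80% agreement on a queue of
    15 image sets; below that, the reader was retrained and a fresh
    queue was graded."""
    return percent_agreement(reader_grades, consensus_grades) >= threshold
```

For example, a reader matching the consensus on 13 of 15 certification sets (86.7%) would pass, while 11 of 15 (73.3%) would trigger retraining.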

Grading Workflow

The RC workflow (eFigure 1) was developed around the defined roles of a Data Manager, TRs, the RC Director, and the Study Chair. TRs graded all images from infants who developed RW-ROP based on the diagnostic examination. Approximately 80% of infants were not expected to develop RW-ROP; therefore, we decided a priori to select, for the primary outcome paper, a random sample of ~60% of infants who never developed RW-ROP. All image sets from this selected subsample of infants were graded by TRs. The Data Manager at the DCC selected and assigned image sets for grading. Two TRs independently graded each image set. The RC Director oversaw the operations of the RC and adjudicated discrepancies arising from the TR double grading that exceeded a pre-determined threshold (eTable 1). On rare occasions, the Study Chair adjudicated grading disparities referred by the RC Director. More details of the grading process are given in eMethods 1.
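The double-grading decision logic can be sketched as follows. This is an illustrative assumption about how the RW-ROP rule (any of plus disease, zone I ROP, or stage 3 or worse) and the adjudication trigger might be coded; the field names are hypothetical, and the study's actual discrepancy thresholds are in eTable 1:

```python
def rw_rop(grading):
    """RW-ROP is present when any component is present: plus disease,
    zone I ROP, or stage 3 or worse ROP."""
    return (grading["plus_disease"]
            or grading["zone_1_rop"]
            or grading["stage"] >= 3)


def needs_adjudication(g1, g2):
    """Flag an image set for the RC Director when the two independent
    gradings disagree on any RW-ROP component."""
    return (g1["plus_disease"] != g2["plus_disease"]
            or g1["zone_1_rop"] != g2["zone_1_rop"]
            or (g1["stage"] >= 3) != (g2["stage"] >= 3))
```

Under this rule, two readers who agree on every component produce a final grade directly; any component-level disagreement routes the set to the RC Director.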

Grading Protocol

The e-ROP study grading protocol required evaluation of both image quality and the key morphologic features of ROP. In developing the final e-ROP grading protocol, two earlier iterations were produced; the third and final version was used to grade all of the study image sets. Details of the grading of image quality and morphologic features (posterior pole vessels, zone, and stage of ROP) are described in eMethods 2.

Quality Assurance

To ensure the integrity and completeness of the image evaluation, the following procedures were used. The TRs were completely masked to all infant demographic information (including birth weight and gestational age), to clinical data on ROP findings from the diagnostic eye examination, to grading results from image sets of previous visits, and to image sets from the fellow eye. In addition, real-time consistency checks were performed, and automatic edit queries were generated once TRs finalized the evaluation of an image set. Finally, the RC dual monitors used in grading were calibrated every week.

To monitor the reproducibility of grading, a random sample of 20 image sets (the contemporaneous variability sample) was selected quarterly for regrading by each TR to determine intra- and inter-grader variability. To monitor the consistency of grading over time, a random sample of 25 image sets (the temporal drift sample) was selected, and this same sample was regraded quarterly by each TR to assess agreement between the regrading and the initial grading. All image sets selected for quality assurance were randomly intermingled within the regular grading queue so that the TRs were unaware of which image sets were being regraded.
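The agreement statistics reported in the Results are weighted kappas, which discount chance agreement and, for ordered categories, penalize larger disagreements more heavily. A self-contained sketch of a linearly weighted Cohen's kappa (illustrative only; the study's statistical software is not specified, and this assumes at least two categories):

```python
def weighted_kappa(ratings1, ratings2, categories):
    """Cohen's kappa with linear weights over ordered categories.
    Returns 1.0 for perfect agreement and 0.0 for chance-level agreement."""
    n = len(ratings1)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    # Linear disagreement weights: 0 on the diagonal, growing with distance.
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    # Observed joint proportions of (rater 1 category, rater 2 category).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings1, ratings2):
        obs[idx[a]][idx[b]] += 1.0 / n
    # Expected proportions under independent marginals.
    m1 = [ratings1.count(c) / n for c in categories]
    m2 = [ratings2.count(c) / n for c in categories]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * m1[i] * m2[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp
```

For binary variables such as presence of plus disease, the linear weights reduce to the ordinary (unweighted) kappa.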

Results

Four TRs were trained and certified for the Study in two groups. After one member of the initial group of three TRs left for another job, a replacement TR was recruited and trained. TR certification required at least 85% agreement with consensus grading in interpreting RW-ROP. As shown in eTable 2, agreements were high for each TR in pre-certification (80% to 100%) and final certification (93% to 100%).

For the primary outcome paper, a total of 5,520 image sets were double graded by TRs with adjudication of discrepancies. Approximately 56% of image set gradings had discrepancies in at least one grading field (either image quality fields or ROP morphology fields) that required adjudication by the RC Director. A small number of discrepancies (246 image sets, 4.5%) were also reviewed by the Study Chair at the beginning of the study to ensure that the non-physician readers and the RC Director were following the grading protocol. Overall, a quarter of the image sets required adjudication for a feature that determined whether or not RW-ROP was present. For individual RW-ROP components, the adjudication rate was 4% for plus disease, 12% for zone I ROP, and 17% for stage 3 or higher ROP (Table 1).

Table 1.

Adjudication for components of RW-ROP in the e-ROP Study

Total RC gradings: 5520; any adjudication: 3115 (56.4%)

RC final grading        n      Adjudicated, n (%)
RW-ROP
 No                     3911   505 (12.9%)
 Yes                    1495   787 (52.6%)
 CD                     114    60 (52.6%)
 Total                  5520   1,352 (24.5%)
Plus disease
 No                     5315   122 (2.3%)
 Yes                    130    71 (54.6%)
 CD                     75     20 (26.7%)
 Total                  5520   213 (3.86%)
Zone 1 disease
 No                     5018   421 (8.4%)
 Yes                    402    228 (56.7%)
 CD                     100    36 (36.0%)
 Total                  5520   685 (12.4%)
Stage 3 or higher
 No                     4067   469 (11.5%)
 Yes                    1359   425 (31.3%)
 CD                     94     38 (40.4%)
 Total                  5520   932 (16.9%)

RC=Reading Center; CD=Cannot Determine; RW-ROP=Referral-Warranted Retinopathy of Prematurity

The temporal drift for each of the TRs is shown in Table 2. The temporal drift sample of 25 image sets graded during November 2012 was regraded 3 times by each of the TRs. The grade-regrade agreement for RW-ROP was high, with weighted kappa ranging from 0.57 to 0.94 across the TRs. A total of 80 image sets from 4 samples of the contemporaneous variability sample were regraded by each TR during the grading period. The weighted kappa for inter-grader agreement from these 80 image sets is shown in Table 3. The weighted kappa was 0.72 (95% CI, 0.52–0.93) for RW-ROP, 0.57 (0.37–0.77) for plus disease, 0.43 (0.24–0.63) for zone I ROP, and 0.67 (0.47–0.88) for stage 3 or worse ROP. Any ROP had a weighted kappa of 0.89 (0.68–1.00). The weighted kappa for grade-regrade agreement of final consensus grading from these 80 image sets is given in Table 4. The weighted kappa was 0.77 (0.57–0.97) for RW-ROP, 0.87 (0.67–1.00) for plus disease, 0.70 (0.51–0.90) for zone I ROP, and 0.77 (0.57–0.97) for stage 3 or worse ROP. Any ROP had perfect agreement.

Table 2.

Temporal Drift among Trained Readers in the e-ROP Study.

Original sample graded 11/30/2012. Re-grade period 1: 02/28/2013–03/01/2013; period 2: 06/17/2013–06/18/2013; period 3: 10/24/2013–10/25/2013.
Grader Variable — then, for each of periods 1, 2, and 3 in order: Percent Agreement, Weighted Kappa (95% CI)
1 Any ROP 100% 1.00 (1.00, 1.00) 96% 0.94 (0.59, 1.00) 100% 1.00 (1.00, 1.00)
Ridge 92% 0.84 (0.48, 1.00) 92% 0.87 (0.51, 1.00) 92% 0.84 (0.48, 1.00)
Flat Pre-retinal Neovascular Proliferation 100% 1.00 (1.00, 1.00) 96% 0.65 (0.28, 1.00) 100% 1.00 (1.00, 1.00)
Plus Disease 92% 0.00 (n/a) 96% 0.00 (n/a) 100% 0.00 (n/a)
Zone 1 ROP 100% 1.00 (1.00, 1.00) 96% 0.65 (0.28, 1.00) 100% 1.00 (1.00, 1.00)
Stage 3 or Worse 88% 0.77 (0.42, 1.00) 84% 0.71 (0.36, 1.00) 84% 0.70 (0.35, 1.00)
Referral-Warranted ROP 88% 0.77 (0.42, 1.00) 84% 0.71 (0.36, 1.00) 84% 0.70 (0.35, 1.00)
2 Any ROP 100% 1.00 (1.00, 1.00) 96% 0.94 (0.59, 1.00) 100% 1.00 (1.00, 1.00)
Ridge 100% 1.00 (1.00, 1.00) 96% 0.94 (0.59, 1.00) 96% 0.92 (0.56, 1.00)
Flat Pre-retinal Neovascular Proliferation 100% 1.00 (1.00, 1.00) 96% 0.65 (0.28, 1.00) 100% 1.00 (1.00, 1.00)
Plus Disease 100% 1.00 (1.00, 1.00) 92% −0.02 (−0.24, 0.20) 96% 0.00 (n/a)
Zone 1 ROP 88% 0.31 (0.05, 0.57) 88% 0.31 (0.05, 0.57) 96% 0.76 (0.45, 1.00)
Stage 3 or Worse 96% 0.94 (0.59, 1.00) 76% 0.57 (0.24, 0.90) 80% 0.62 (0.29, 0.96)
Referral-Warranted ROP 96% 0.94 (0.59, 1.00) 76% 0.57 (0.24, 0.90) 80% 0.62 (0.29, 0.96)
3 Any ROP 96% 0.91 (0.55, 1.00) 96% 0.91 (0.55, 1.00) 96% 0.91 (0.55, 1.00)
Ridge 96% 0.91 (0.55, 1.00) 96% 0.91 (0.55, 1.00) 96% 0.91 (0.55, 1.00)
Flat Pre-retinal Neovascular Proliferation 100% 1.00 (1.00, 1.00) 100% 1.00 (1.00, 1.00) 100% 1.00 (1.00, 1.00)
Plus Disease 100% 1.00 (1.00, 1.00) 96% 0.00 (n/a) 100% 1.00 (1.00, 1.00)
Zone 1 ROP 100% 1.00 (1.00, 1.00) 96% 0.86 (0.52, 1.00) 92% 0.75 (0.41, 1.00)
Stage 3 or Worse 92% 0.84 (0.48, 1.00) 88% 0.76 (0.40, 1.00) 92% 0.85 (0.49, 1.00)
Referral-Warranted ROP 92% 0.84 (0.48, 1.00) 92% 0.84 (0.48, 1.00) 92% 0.85 (0.49, 1.00)

Table 3.

Inter-grader agreement in the e-ROP Study

Inter-Grader Variability for the Samples of Contemporaneous Variability. All Samples Combined (n=80)
Variable Percent Agreement Weighted Kappa
Pupil Quality 94% 0.73 (0.56, 0.91)
Disc Center Quality 66% 0.47 (0.30, 0.65)
Disc Up Quality 74% 0.56 (0.43, 0.69)
Disc Down Quality 78% 0.40 (0.25, 0.56)
Disc Temporal Quality 81% 0.66 (0.51, 0.81)
Disc Nasal Quality 74% 0.49 (0.34, 0.64)
Posterior Pole Vessels 80% 0.60 (0.39, 0.82)
Preplus/plus in each quadrant
 ST Quadrant 79% 0.69 (0.52, 0.86)
 IT Quadrant 71% 0.59 (0.42, 0.76)
 IN Quadrant 71% 0.49 (0.32, 0.67)
 SN Quadrant 78% 0.59 (0.42, 0.76)
Total Quadrants of Plus/Preplus 65% 0.68 (0.51, 0.84)
Total Quadrants of Plus 88% 0.50 (0.32, 0.68)
Dominant Feature 71% 0.58 (0.42, 0.75)
Any ROP 95% 0.89 (0.68, 1.00)
Demarcation Line 99% 0.74 (0.56, 0.92)
Ridge 83% 0.65 (0.45, 0.86)
Extra Retinal Fibrovascular Proliferation 83% 0.67 (0.47, 0.88)
Flat Pre-retinal Neovascular Proliferation 100% 1.00 (1.00, 1.00)
Retinal Detachment 100% 1.00 (1.00, 1.00)
Highest Stage 81% 0.85 (0.67, 1.00)
Lowest Zone 80% 0.81 (0.63, 0.99)
Plus Disease 90% 0.57 (0.37, 0.77)
Zone 1 ROP 83% 0.43 (0.24, 0.63)
Stage 3 or Worse 83% 0.67 (0.47, 0.88)
Referral-Warranted ROP 85% 0.72 (0.52, 0.93)

Table 4.

Grade-regrade variability among Trained Readers in the e-ROP study

Intra-Grader Variability
Variable Percent Agreement Weighted Kappa
Pupil Quality 95% 0.76 (0.59, 0.93)
Disc Center Quality 85% 0.67 (0.46, 0.88)
Disc Up Quality 91% 0.78 (0.63, 0.93)
Disc Down Quality 96% 0.81 (0.63, 0.99)
Disc Temporal Quality 90% 0.76 (0.61, 0.91)
Disc Nasal Quality 89% 0.64 (0.50, 0.78)
Posterior Pole Vessels 86% 0.73 (0.51, 0.94)
Preplus/plus in each quadrant
ST Quadrant 88% 0.81 (0.64, 0.98)
IT Quadrant 81% 0.73 (0.56, 0.90)
IN Quadrant 89% 0.81 (0.64, 0.99)
SN Quadrant 88% 0.80 (0.62, 0.97)
Total Quadrants of Plus/Preplus 74% 0.75 (0.57, 0.94)
Total Quadrants of Plus 98% 0.87 (0.69, 1.00)
Dominant Feature of vessels 69% 0.65 (0.48, 0.82)
Any ROP 100% 1.00 (1.00, 1.00)
Demarcation Line 100% 1.00 (1.00, 1.00)
Ridge 98% 0.95 (0.74, 1.00)
Extra Retinal Fibro-vascular Proliferation 88% 0.77 (0.57, 0.97)
Flat Pre-retinal Neovascular Proliferation 100% 1.00 (1.00, 1.00)
Retinal Detachment 100% 1.00 (1.00, 1.00)
Highest Stage 89% 0.95 (0.77, 1.00)
Lowest Zone 93% 0.95 (0.77, 1.00)
Plus Disease 98% 0.87 (0.67, 1.00)
Zone 1 ROP 90% 0.70 (0.51, 0.90)
Stage 3 or Worse 88% 0.77 (0.57, 0.97)
Referral-Warranted ROP 88% 0.77 (0.57, 0.97)

Discussion

Measurable competence developed during comprehensive and advanced training in grading digital images is critical in studies that use a centralized RC system to evaluate retinal fundus images from infants at risk for ROP. Previously published reports on telemedicine approaches to ROP do not detail the processes involved in certifying non-physician TRs, except for one that describes a brief training session for non-expert graders, consisting mostly of medical students and ophthalmology residents.7 Previous studies of ROP telemedicine have used different types of readers: a single ROP-experienced ophthalmologist as an unmasked reader;3 single masked ROP-experienced ophthalmologists;8,9 more than 2 masked ROP-experienced ophthalmologists;8,9 ROP-experienced ophthalmologists who performed clinical examinations and evaluated retinal images from the same infants a few months later;12 and non-physician imagers who also read the images they had taken.13 Previous clinical studies have not evaluated a system for assessing the competency of remote non-physician graders.

The e-ROP RC developed an ROP curriculum, training and certification for non-physician readers (TRs) and developed and implemented a standardized grading protocol using electronic data capture. The e-ROP study protocol required two TRs to grade image sets independently and significant discrepancies were adjudicated by the RC Director and, if needed, by the Study Chair. By implementing an extensive quality management system that included quality assessment of images, inter- and intra-reader variability, temporal and contemporaneous drift, the e-ROP study maintained a consistent and repeatable grading system throughout the study period.

It was important to have a robust certification system that kept TR agreement levels high, particularly for accurately identifying the morphologic features of plus disease, zone I ROP, and stage 3 or worse ROP. An important step in developing the TR certification system was establishing a reference standard, not only for the different morphologies of RW-ROP eyes, but also for any ROP and pre-plus disease. This was done by integrating the gradings of three expert readers, the RC Director, and the Study Chair, and using this consensus grading as the standard against which the TR gradings were compared. The excellent agreement between TRs reflects the extensive and rigorous training and certification process.

Intra-grader agreement was good both in grading the quality of images and in grading each of the morphologic features of ROP. Among quality assessments, the disc-center image quality had the lowest regrade agreement; this may reflect the variability in judging quality within an area of 3 disc radii surrounding the optic disc without a standard template. In regrading the presence of zone I ROP, intra-grader agreement was lower than for the other RW-ROP components. This may be attributable to the difficulty of accurately identifying the foveal center in the images, a difficulty consistent with a previous report of variability when ROP-specialized ophthalmologists identified the macular center in printed digital images.15 The reliability of identifying the foveal center, and thus of delineating zone I consistently in digital images, could be increased by using a standard zone I template for digital images.

Inter-grader variability also measured differences in judging the quality of the retinal images. The weighted kappa of agreement ranged from 0.40 to 0.66 across the five retinal fields; the nasal and superior fields had the lowest image-quality agreement. Image quality becomes especially important when the TR perceives morphologic changes in a poor-quality digital image but cannot determine RW-ROP. However, uncertainty in grading RW-ROP pathology occurred in less than 2% of all image sets graded, and less than 1% required adjudication of such uncertainty by the RC Director (Table 3). Enhancing the appearance of ROP morphology and attenuating background noise in poor-quality images by manipulating contrast, brightness, magnification, and grey tone brings more certainty to detecting ROP pathology in digital images. While some studies have refrained from using image enhancement to avoid confounding by differences in readers' enhancement skills,16 we allowed the TRs in our study to use these capabilities of the e-ROP grading software. Because all TRs used these functionalities when grading, we are unable to assess what impact the enhancements had on reducing undecided readings, or the extent to which they increased or decreased agreement between readers; we propose investigating this in future studies.

Identifying plus and pre-plus disease showed an inter-grader weighted kappa of 0.57. Using ICROP images6 as standards for the tortuosity and dilation in the four quadrants did not appear to adequately minimize inter-grader variability. This was not surprising, as identification of plus disease in ROP appears highly variable even among experts across several studies.16,17,18 Four experienced ROP ophthalmologists grading high-quality RetCam images disagreed 10% of the time on the presence of plus disease, and the disagreement increased almost three-fold when grading was confined to images having only plus and pre-plus disease.16 Among 22 experienced ROP experts who interpreted wide-angle images for the presence of plus disease, only 27% had mean kappa scores over 0.80 (substantial agreement), while 18% had scores below 0.41 (slight or fair agreement).18 Unlike our study, in which TRs were allowed to look at the four peripheral retinal images before identifying plus disease in the disc-center image, those studies had the expert readers looking only at the disc-center image. In the e-ROP Study, TR disagreement on the presence of plus disease was 55%. These persistent disagreements in identifying plus disease in telemedicine ROP studies call for more rigorous refinement of the definition of plus disease and of quantitative methods for detecting it in digital images.

Study strengths included having the TRs and the RCD completely masked to all infant details and having them grade the image sets of each eye independently of the morphologic changes that might be present in the other eye. Paradoxically, this could also have been a limitation: allowing access to the infant's gestational age and birth weight, together with the findings in the other eye, might have improved the sensitivity and specificity of the study. This hypothesis is being tested in a future study. Another limitation of this study, shared with similar past studies, is the absence of a gold standard for assessing TR competency in identifying morphologic features in the retinal images; the consensus opinion of a few ROP experts, which is itself subject to error, was used as the standard for comparison in training and certification of TRs.

These results suggest that reliable, comprehensive, systematic training and certification of non-physician readers of digital image sets from premature infants at risk for ROP is feasible. To our knowledge, this is the first study to demonstrate consistently good agreement between and among trained non-physician readers grading ROP from digital images in a centralized reading facility.

Supplementary Material

eFigure
eMethods 1
eMethods 2
eTable 1
eTable 2

Acknowledgments

Funding/Support: This study was funded by a cooperative agreement grant (U10 EY017014) from the National Eye Institute of the National Institutes of Health, Department of Health and Human Services.

Footnotes

Dr Gui-shuang Ying and Dr Graham Quinn had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.

Conflict of Interest Disclosures: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest: Anna Ells is a member of the scientific advisory board for Clarity Systems and Lloyd Hildebrand is Chairman of Inoveon board. Dr. Hubbard reports personal fees from the University of Pennsylvania, during the conduct of the study; personal fees from VisionQuest Biomedical, LLC, outside the submitted work.

The other authors have no conflict of interest with regard to the material presented in the manuscript.

Author Contributions:

Study concept and design: Ebenezer Daniel, Graham E Quinn.

Acquisition, analysis, or interpretation of data: Ebenezer Daniel, Graham E Quinn, P. Lloyd Hildebrand, Anna Ells, Baker Hubbard, Antonio Capone, Revell W. Martin, Candace P. Ostroff, Eli Smith, Maxwell Pistilli, Gui-Shuang Ying.

Drafting of the manuscript: Ebenezer Daniel, Graham E Quinn

Critical revision of the manuscript for important intellectual content: Ebenezer Daniel, Graham E Quinn, P. Lloyd Hildebrand, Anna Ells, Baker Hubbard, Antonio Capone, Revell W. Martin, Candace P. Ostroff, Eli Smith, Maxwell Pistilli, Gui-Shuang Ying,

Statistical analysis: Gui-shuang Ying, Graham Quinn, Max Pistilli, Ebenezer Daniel,

Obtained funding: Graham E Quinn.

Administrative, technical, or material support: Ebenezer Daniel, Graham E Quinn, P. Lloyd Hildebrand, Anna Ells, Baker Hubbard, Antonio Capone, Revell W. Martin, Candace P. Ostroff, Eli Smith, Maxwell Pistilli, Gui-Shuang Ying.

Study supervision: Ebenezer Daniel, Graham Quinn,

Role of the Sponsor: The funding source had no role in the design, and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, and approval of the manuscript; and decision to submit the manuscript for publication.

References

1. Kong L, Fry M, Al-Samarraie M, et al. An update on progress and the changing epidemiology of causes of childhood blindness worldwide. J AAPOS. 2012;16:501-507. doi:10.1016/j.jaapos.2012.09.004
2. Gilbert C, Foster A. Childhood blindness in the context of VISION 2020—the right to sight. Bull World Health Organ. 2001;79:227-232.
3. Ells A, Holmes JM, Astle W, et al. Telemedicine approach to screening for severe retinopathy of prematurity: a pilot study. Ophthalmology. 2003;110:2113-2117. doi:10.1016/S0161-6420(03)00831-5
4. Quinn GE; e-ROP Cooperative Group. Telemedicine approaches to evaluating acute-phase retinopathy of prematurity: study design. Ophthalmic Epidemiol. 2014;21:256-267. doi:10.3109/09286586.2014.926940
5. Quinn GE, Ying GS, Daniel E, et al; e-ROP Cooperative Group. Validity of a telemedicine system for the evaluation of acute-phase retinopathy of prematurity. JAMA Ophthalmol. 2014;132:1178-1184. doi:10.1001/jamaophthalmol.2014.1604
6. International Committee for the Classification of Retinopathy of Prematurity. The International Classification of Retinopathy of Prematurity revisited. Arch Ophthalmol. 2005;123:991-999. doi:10.1001/archopht.123.7.991
7. Williams SL, Wang L, Kane SA, et al. Telemedical diagnosis of retinopathy of prematurity: accuracy of expert versus non-expert graders. Br J Ophthalmol. 2010;94:351-356. doi:10.1136/bjo.2009.166348
8. Fijalkowski N, Zheng LL, Henderson MT, et al. Stanford University Network for Diagnosis of Retinopathy of Prematurity (SUNDROP): five years of screening with telemedicine. Ophthalmic Surg Lasers Imaging Retina. 2014;45:106-113. doi:10.3928/23258160-20140122-01
9. Murthy KR, Murthy PR, Shah DA, Nandan MR, SNH, Benakappa N. Comparison of profile of retinopathy of prematurity in semiurban/rural and urban NICUs in Karnataka, India. Br J Ophthalmol. 2013;97:687-689. doi:10.1136/bjophthalmol-2012-302801
10. Chiang MF, Keenan JD, Starren J, et al. Accuracy and reliability of remote retinopathy of prematurity diagnosis. Arch Ophthalmol. 2006;124:322-327. doi:10.1001/archopht.124.3.322
11. Skalet AH, Quinn GE, Ying GS, et al. Telemedicine screening for retinopathy of prematurity in developing countries using digital retinal images: a feasibility project. J AAPOS. 2008;12:252-258. doi:10.1016/j.jaapos.2007.11.009
12. Scott KE, Kim DY, Wang L, et al. Telemedical diagnosis of retinopathy of prematurity: intraphysician agreement between ophthalmoscopic examination and image-based interpretation. Ophthalmology. 2008;115:1222-1228. doi:10.1016/j.ophtha.2007.09.006
13. Vinekar A, Gilbert C, Dogra M, et al. The KIDROP model of combining strategies for providing retinopathy of prematurity screening in underserved areas in India using wide-field imaging, tele-medicine, non-physician graders and smart phone reporting. Indian J Ophthalmol. 2014;62:41-49. doi:10.4103/0301-4738.126178
14. Chiang MF, Thyparampil PJ, Rabinowitz D. Interexpert agreement in the identification of macular location in infants at risk for retinopathy of prematurity. Arch Ophthalmol. 2010;128:1153-1159. doi:10.1001/archophthalmol.2010.199
15. Chiang MF, Wang L, Busuioc M, et al. Telemedical retinopathy of prematurity diagnosis: accuracy, reliability, and image quality. Arch Ophthalmol. 2007;125:1531-1538. doi:10.1001/archopht.125.11.1531
16. Wallace DK, Quinn GE, Freedman SF, Chiang MF. Agreement among pediatric ophthalmologists in diagnosing plus and pre-plus disease in retinopathy of prematurity. J AAPOS. 2008;12:352-356. doi:10.1016/j.jaapos.2007.11.022
17. Hewing NJ, Kaufman DR, Chan RV, Chiang MF. Plus disease in retinopathy of prematurity: qualitative analysis of diagnostic process by experts. JAMA Ophthalmol. 2013;131:1026-1032. doi:10.1001/jamaophthalmol.2013.135
18. Chiang MF, Gelman R, Jiang L, Martinez-Perez ME, Du YE, Flynn JT. Plus disease in retinopathy of prematurity: an analysis of diagnostic performance. Trans Am Ophthalmol Soc. 2007;105:73-84.
