Abstract
Background
Retinopathy of prematurity (ROP) is a vasoproliferative disease responsible for blindness in more than 30 000 children worldwide. Its diagnosis and treatment are challenging due to the lack of specialists, divergent diagnostic concordance and variation in classification standards. While artificial intelligence (AI) can address the shortage of professionals and provide more cost-effective management, its development needs fairness, generalisability and bias controls prior to deployment to avoid producing harmful, unpredictable results. This review aims to compare the characteristics and the fairness and generalisability efforts of AI and ROP studies.
Methods
Our review yielded 220 articles, of which 18 were included after full-text assessment. The articles were classified into ROP severity grading, plus disease detection, detection of treatment-requiring ROP, ROP prediction and detection of retinal zones.
Results
All of the articles' authors and included patients are from middle-income and high-income countries, with no representation of low-income countries or of South America, Australia or Africa.
Code is available for two articles, and for one on request, while data are not available for any article. 88.9% of the studies use the same retinal camera. In two articles, patients' sex was described, but none applied bias control in their models.
Conclusion
The reviewed articles included 180 228 images and reported good metrics, but fairness, generalisability and bias control remained limited. Reproducibility is also a critical limitation, with few articles sharing codes and none sharing data. Fair and generalisable ROP and AI studies are needed that include diverse datasets, data and code sharing, collaborative research, and bias control to avoid unpredictable and harmful deployments.
Keywords: Imaging, Retina
WHAT IS ALREADY KNOWN ON THIS TOPIC
Retinopathy of prematurity is the most common avoidable cause of childhood blindness, a burden borne mainly in less-developed countries due to inadequate preterm and neonatal care and a lack of specialists to diagnose and treat patients.
Teleophthalmology has been applied in retinopathy of prematurity screening, creating an opportunity for algorithm development and artificial intelligence applications.
WHAT THIS STUDY ADDS
This study shows that although studies apply artificial intelligence to retinopathy of prematurity, fairness and generalisability efforts remain limited in this field.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
Fairness, generalisability and bias controls are fundamental for adequate artificial intelligence implementation. Artificial intelligence for retinopathy of prematurity is promising, but more effort is needed to avoid unpredictable and harmful results.
Background
Prematurity is defined as birth before 37 weeks of pregnancy, classified as either extreme preterm (<28 weeks), very preterm (28–32 weeks) or moderate to late preterm (32–37 weeks).1 There are an estimated 15 million live preterm deliveries each year, primarily in low-income and middle-income countries (LMICs). While the number of preterm births is increasing each year globally, so too are survival rates into adulthood, in large part due to improvements in neonatal intensive care facilities and technology (especially in LMICs). However, with many regions still lacking access to such advancements, maternal and fetal sequelae of prematurity (including retinopathy of prematurity (ROP)) are becoming more and more consequential.1–3
ROP is a proliferative retinal vasculopathy and one of the most common avoidable causes of childhood blindness. Low gestational age, low birth weight and supplemental oxygen at birth are major risk factors for ROP, which leads to more than 30 000 children losing vision annually.4 5 Recognition of pertinent screening periods and timely diagnosis and management are challenging due to the lack of available paediatric ophthalmic specialists. Even when specialists are available, variation in classification standards, equipment, examination technique and treatment thresholds leads to divergent diagnostic concordance, even among experts.6 7
Artificial intelligence (AI) algorithms use inputted data to mathematically generate clinical predictions, and the extensive use of ancillary imaging in ophthalmology makes them especially pertinent for aiding diagnosis and informing management of conditions such as ROP.8 9 AI has to date been widely applied in the detection of ophthalmological conditions, including diabetic retinopathy, age-related macular degeneration, glaucoma and ROP, and has been shown to perform on par with or better than human clinicians.10–12 Through minimising the bias inherent to AI, and through external validation of algorithms to optimise for consistency and replicability of predictions, AI has powerful potential to mitigate the consequences of relative specialist scarcity and provide cost-effective diagnosis and decision-making. However, there are currently no commercial AI tools clinically approved for ROP screening.13
The application of teleophthalmology in ROP diagnosis and management in particular (applied to address underserved remote areas) has created an opportunity to collect rich volumes of ophthalmic imaging data, which can be used secondarily to further AI development in this field.7
Prioritising generalisability, fairness and reproducibility when developing AI algorithms is essential to promote nondiscriminatory models. By way of definition, ‘generalisability’ is the ability to provide accurate predictions in a new sample of patients not included in the original training population,14 ‘fairness’ is the assurance that AI systems are not biased in their predictions for subpopulations,15 and ‘reproducibility’ is the system’s capacity to replicate the accuracy in patients not included in the development.14 Code and data sharing are crucial components to facilitate generalisable and reproducible research and validation studies and enable an understanding of how models can be adapted and applied to heterogeneous patient populations globally who stand most to benefit from advancement in AI in ophthalmology.11 12 16
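As a concrete illustration of the bias control this review looks for, the sketch below computes sensitivity and specificity separately for each demographic subgroup — the kind of disaggregated reporting that supports fairness analyses. It is a minimal, self-contained example; the data, subgroup labels and function names are hypothetical and not drawn from any reviewed study.

```python
# Illustrative sketch of a "bias control" check: per-subgroup
# sensitivity and specificity. All names and data are hypothetical.

def sensitivity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

def specificity(y_true, y_pred):
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp) if (tn + fp) else float("nan")

def disaggregated_metrics(y_true, y_pred, groups):
    """Return {group: (sensitivity, specificity)} for each subgroup."""
    out = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        out[g] = (sensitivity(yt, yp), specificity(yt, yp))
    return out

# Hypothetical example: the model errs only on subgroup "B",
# a disparity that aggregate metrics would hide.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(disaggregated_metrics(y_true, y_pred, groups))
# subgroup A: (1.0, 1.0); subgroup B: (0.5, 0.5)
```

A gap such as the one between subgroups A and B above is exactly what a fairness analysis is designed to surface before deployment.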
The risk of biased algorithms is a prominent concern in the development and implementation of safe AI and must be addressed to avoid perpetuating existing healthcare disparities. Because of the nature of the patient population in ROP screening and management, medicolegal aspects are also especially crucial before AI can be implemented safely in this space.17
Here, we review ROP studies that implement AI techniques, compare datasets and algorithms characteristics, and analyse efforts to ensure fairness, generalisability and reproducibility of findings.
Methods
A literature search was conducted using the PubMed, EMBASE and MEDLINE databases. The search strategy used a combination of key terms for ROP and AI (search strategy detailed in online supplemental file 1).
bmjophth-2022-001216supp001.pdf (22KB, pdf)
Two authors (LFN and LZR) assessed the articles found in the above search. First, we screened all articles and excluded non-human studies and those written in a language other than English, Portuguese or Spanish. Next, a second screening evaluated article titles and excluded non-relevant articles (see below for a definition of 'relevant'), reviews, clinical cases and comments. Finally, a third screening consisted of full-text analysis and excluded non-relevant articles and those not available online.
Articles were deemed relevant if they mentioned AI and applied computer vision algorithms to ROP. Across the final cohort, we compared the following variables: the articles' objectives, retinal camera, model preprocessing techniques, applied neural network and performance, data and code availability, authors' nationality, and cohort demographics and nationality.
Results
The search strategy initially identified 220 articles, with 29 deemed eligible for full-text analysis (figure 1). After the full-text analysis, 18 articles were included in the final review (online supplemental file 2).
Figure 1.
Article selection flow chart.
bmjophth-2022-001216supp002.pdf (19.1KB, pdf)
General characteristics
All articles were published between 2016 and 2022, with the annual number of publications increasing (figure 2). According to the model objective, the articles were classified into ROP severity grading algorithms (7 articles—38.9%), plus disease detection algorithms (6 articles—33.3%), detection of patients needing treatment (3 articles—16.7%), ROP prediction (1 article—5.5%) and automated detection of retinal zones (1 article—5.5%).
Figure 2.
Number of publications per year.
Regarding authors' representation, the authors of eight articles were from China (44.4%), six from the USA (33.3%) and four from India (22.2%). In 13 articles (72.2%), all authors were from a single country, and in 5 (27.8%) they came from an international collaboration group (figure 3).
Figure 3.
Map with authors’ distribution.
In 2 of the 18 articles (11.1%), the code is publicly available.18 19 A further single article (5.5%) made the code available on request.20 The datasets used for the development and validation processes are not available for any of the reviewed articles. None of the articles reports a bias control analysis, such as comparing metrics between different demographic groups and datasets.
Images and Camera
The most commonly used imaging systems in the ROP and AI datasets were the RetCam II, III and Shuttle cameras (Natus Medical, Pleasanton, California, USA), in 16 of the 18 articles (88.9%). One article (5.5%) also used the 3nethra Neo camera (Forus Health, Bangalore, India), and two (11.1%) did not specify the retinal imaging hardware.20 21 In three articles (16.7%), only good-quality images were included,22–24 one article (5.5%) included both good- and poor-quality images,20 and in the others no quality control was described.
Dataset
A total of 180 228 images were included in the 18 reviewed articles. The number of individual patients included was not described. In three articles (16.7%),22–24 the sex of the included patients was described, but race/ethnicity information was not available in any article. The countries most represented in the data were China (seven articles) and India (four articles). There was no representation of countries in Africa, South America or Australia (figure 4).
Figure 4.
Map with study’s population distribution.
Preprocessing
In computer vision algorithms, preprocessing is a fundamental preparatory step in data harmonisation before final model development. Of the articles reviewed, the image preprocessing stages consisted of applying a mask over the retinal image, image resizing, colour normalisation, vessel segmentation, image enhancement, illumination adjustment and image augmentation techniques (flipping and rotation). In seven articles (38.9%), details of the preprocessing techniques applied were not described.23–28
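As an illustration of the preprocessing steps listed above (resizing, colour normalisation and augmentation), the sketch below applies simplified versions of them to a synthetic image. It is a minimal example using only NumPy; real pipelines typically rely on libraries such as OpenCV or PIL, and the exact steps vary between studies.

```python
import numpy as np

def resize_nearest(img, h, w):
    """Nearest-neighbour resize of an (H, W, C) image to (h, w, C)."""
    rows = np.arange(h) * img.shape[0] // h
    cols = np.arange(w) * img.shape[1] // w
    return img[rows][:, cols]

def normalise_channels(img):
    """Per-channel zero-mean, unit-variance colour normalisation."""
    img = img.astype(np.float32)
    mean = img.mean(axis=(0, 1), keepdims=True)
    std = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mean) / std

def augment(img):
    """Simple augmentation: original, horizontal flip, 90-degree rotation."""
    return [img, np.flip(img, axis=1), np.rot90(img)]

# Synthetic stand-in for a retinal fundus photograph.
img = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
small = resize_nearest(img, 224, 224)   # to a typical CNN input size
norm = normalise_channels(small)        # each channel now ~zero mean
views = augment(norm)                   # three training views
print(small.shape, len(views))          # (224, 224, 3) 3
```

Masking, vessel segmentation, enhancement and illumination adjustment, also mentioned above, require image-specific processing beyond the scope of this sketch.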
ROP severity grading algorithms
In seven articles (38.9%), the main objective was to automate the grading of ROP severity from ultra-wide colour retinal fundus photos (RetCam and a non-specified camera).19 22 23 25–27 29 The grading algorithms included 93 383 images and used the following convolutional neural networks (CNNs) and classifiers: Visual Geometry Group, Inception, Residual Network, Support Vector Machine and DenseNet. The reported metrics consisted of accuracy ranging from 91.9% to 99%, sensitivity from 88.4% to 96.6%, specificity from 92.3% to 99.3% and area under the curve (AUC) from 0.92 to 0.99.
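The AUC reported above is a standard threshold-free classification metric. As a self-contained sketch with hypothetical labels and scores, it can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (the Mann–Whitney interpretation):

```python
def auc(y_true, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: one positive (0.4) is outranked by one
# negative (0.6), so 8 of the 9 positive-negative pairs are ordered
# correctly and the AUC is 8/9.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(y_true, scores))  # 0.888...
```

Unlike accuracy, sensitivity and specificity, the AUC does not depend on a particular decision threshold, which is why it is commonly reported alongside them.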
Plus and preplus disease detection algorithms
Plus disease is one of the most critical features of ROP, indicating severe treatment-requiring ROP. Plus disease in ROP is characterised by arteriolar tortuosity and venous dilation in ≥2 quadrants of the posterior pole, and ‘preplus’ disease describes aforementioned posterior pole vascular anomalies not fulfilling plus criteria.30 31 In six articles, the main objective was to detect plus or preplus disease in retinal fundus photos through extracted vessel analysis from ultra-wide retinal fundus photos (RetCam and non-specified camera).30 32–36
The plus detection algorithms included 17 176 images and applied a U-Net and a modified U-Net CNN, with reported metrics of accuracy ranging from 72.3% to 94%, sensitivity from 92.4% to 95%, specificity from 92.4% to 94% and AUC from 0.88 to 0.98.
Detecting patients that need treatment
In three articles, the main objective was to detect ROP patients requiring treatment. Two articles included RetCam colour fundus photos and one included RetCam fluorescein angiography examinations.18 28 37 The articles included 59 636 images from 3254 patients and applied a Gridding Residual Network and a U-Net model, with reported AUC ranging from 0.91 to 0.99. The Campbell et al article reported 100% sensitivity and 78% specificity,37 and in the others sensitivity/specificity was not recorded.
ROP prediction
In one article, the objective was to predict the occurrence and severity of ROP using deep learning in a prospective dataset.24 The article included 7033 RetCam images from 725 patients and applied a ResNet CNN, with a reported accuracy of 68%, sensitivity of 100%, specificity of 46.6% and AUC of 0.87.
Automated detection of retinal zones
According to the International Classification of ROP, the retina is divided into three anatomical zones based on distances from the optic disc, macula and ora serrata.31 Zone I is the circle centred on the optic disc with a radius of twice the distance from the optic disc to the centre of the macula; zone II is the area from the edge of zone I out to a circle whose radius is the distance from the optic disc to the nasal ora serrata; and zone III is the residual temporal area outside zone II.38
Determining the zones is important to classify the ROP stage, determine follow-up frequency and estimate the risk of ROP sequelae.
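The zone definitions above reduce to simple distance comparisons in image coordinates. The sketch below is an illustrative simplification with hypothetical coordinates: it treats zone III as everything beyond the zone II circle, whereas the clinical definition restricts zone III to the residual temporal crescent.

```python
import math

def rop_zone(point, disc, macula, nasal_ora):
    """Classify a retinal location into zone I, II or III (simplified).

    Zone I:   within a circle centred on the optic disc whose radius is
              twice the disc-to-macula distance.
    Zone II:  from the edge of zone I out to a circle whose radius is
              the disc-to-nasal-ora-serrata distance.
    Zone III: beyond the zone II circle (simplification: the clinical
              definition limits zone III to the temporal side).
    """
    dist = math.dist(point, disc)
    r1 = 2 * math.dist(disc, macula)
    r2 = math.dist(disc, nasal_ora)
    if dist <= r1:
        return "I"
    if dist <= r2:
        return "II"
    return "III"

# Hypothetical 2D coordinates (arbitrary units).
disc, macula, nasal_ora = (0.0, 0.0), (1.0, 0.0), (-3.0, 0.0)
print(rop_zone((0.5, 0.5), disc, macula, nasal_ora))  # I
print(rop_zone((2.5, 0.0), disc, macula, nasal_ora))  # II
print(rop_zone((4.0, 0.0), disc, macula, nasal_ora))  # III
```

An automated zone detector must first localise the optic disc and macula (and estimate the ora serrata, which is usually outside the photographed field), which is why the reviewed study used a segmentation CNN rather than explicit geometry.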
In one article, the objective was to automate the detection of retinal zones in retinal fundus photographs; the study included 3000 images (RetCam and 3nethra) and applied a U-Net CNN, with a reported accuracy of 98% in detecting retinal zones.20 However, this article does not report sensitivity, specificity or AUC metrics.
Discussion
The social morbidity of ROP is becoming increasingly relevant as global preterm birth and survival to adulthood continue to rise. However, consistent ROP diagnosis and treatment are fraught with challenges. This is in part due to a lack of expert availability, and in part because of the variation in examination technique, findings and treatment thresholds intrinsic to the process of human clinicians making imaging-based clinical diagnoses and decisions.5 AI and deep learning algorithms have the potential to ameliorate these challenges in ROP screening, detection and management, especially in remote areas and LMICs.6
Here, we find that most ROP articles employ AI techniques to grade ROP severity, detect plus disease, predict future ROP and identify patients requiring treatment. While the reported metrics indicate promising results, we found that generalisability and fairness efforts are extremely limited in all ROP and AI articles. To ensure representativeness, broader ethnicity/race and country representation is needed in model development. Assessment of algorithmic bias is necessary to promote fairness, and it is missing in all of the models.
Representativeness is needed in AI research: Coyner et al demonstrated worse ROP screening algorithm metrics when models were applied to a distinct population.21 Among the articles included in this review, the study populations came from 13 countries, with most participants from China and India. There was no representation from South America, Africa or Australia/New Zealand.
The National Institutes of Health encourages the description of sex and race/ethnicity in clinical studies to assess diverse representation in biomedical research.39 In the reviewed articles, race and ethnicity labels are absent, and the patients' sex is available in only two articles. None of the articles reported performance metrics disaggregated by demographics, and none performed a bias control assessment. Reporting race and ethnicity enables the assessment of biases in model development and fairness analyses.
Aggressive posterior ROP is a severe form of ROP with prominent plus disease and worse treatment outcomes. None of the included studies focused on aggressive posterior ROP as the target condition or as an included group.
In most studies, the retinal fundus photographs came from the ultrawide RetCam camera, which costs approximately US$100 000 and is rarely available in LMICs.40 Images from the 3nethra Neo, a more affordable ultrawide retinal camera,41 were included in only one study. More affordable cameras, such as smartphone-based cameras, are already applied in ROP screening but not in AI models.40 42 43 Better data collection and image quality assessment frameworks, in addition to prospective and validation studies, are essential to enable AI-assisted screening programmes in LMICs.
Of the reviewed articles, two shared codes and queries,18 19 but none has data readily available to share. Publicly available datasets and code repository sharing are important to promote reproducibility in AI research.
Model generalisability in machine learning research is ideal but likely not feasible because of dataset shifts across place and time.44 AI models should not perpetuate or magnify existing biases in diagnosis and treatment. In this review of ROP and AI articles, limited representation, biased datasets and the lack of bias control assessments are poised to upend successful implementation.
More diverse, representative and fair datasets, generalisable models, prospective studies, and collaborative efforts are needed before real-world deployment. These are particularly challenging in LMICs. If AI readiness is gauged from the published literature, the feasibility of algorithm deployment in a clinical setting remains a promise at this time.
Conclusion
Distinct modelling approaches have been applied in ROP and AI research to grade ROP severity, detect plus disease, identify treatment-warranted cases, predict outcomes and delimit retinal zones. Although 180 228 images were included in the reviewed studies, most studies use the same ultra-wide retinal camera and lack demographic information and bias control.
The articles showed good reported metrics, but fairness and generalisability remained limited in all AI and ROP articles. Reproducibility is also a critical limitation, with few articles sharing code and none making images or data publicly available. To avoid perpetuating global healthcare inequalities and to ensure access to such technologies for those who stand most to benefit from them, fair and generalisable studies are needed that include diverse datasets, data and code sharing, collaborative research, and bias control to avoid unpredictable and harmful deployments.
Acknowledgments
LFN is a researcher supported by Lemann Foundation, Instituto da Visão-IPEPO.
Footnotes
Twitter: @WPhanphruk, @kidseyes88
Contributors: LFN: conceptualisation, data curation, investigation, content guarantor, WGM: data curation, writing—original draft, writing—review and editing, LZR: writing—original draft, writing—review and editing, RGD: writing—original draft, writing—review and editing, WP: data curation, writing—review and editing, LAC: conceptualisation, supervision, data curation, writing—review and editing, KK: data curation, writing—review and editing, APDS: data curation, writing—review and editing, CVSR: data curation, writing—review and editing, NSBM: data curation, writing—review and editing, supervision.
Funding: The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Disclaimer: MIT Library contributed to the article processing charges (APC).
Map disclaimer: The depiction of boundaries on this map does not imply the expression of any opinion whatsoever on the part of BMJ (or any member of its group) concerning the legal status of any country, territory, jurisdiction or area or of its authorities. This map is provided without any warranty of any kind, either express or implied.
Competing interests: None declared.
Provenance and peer review: Not commissioned; externally peer reviewed.
Supplemental material: This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.
Data availability statement
Data are available on reasonable request. All retrieved data, bar plot and map plot are available at: https://github.com/luisnakayama/rop_review.
Ethics statements
Patient consent for publication
Not applicable.
References
- 1.Preterm birth. Available: https://www.who.int/en/news-room/fact-sheets/detail/preterm-birth [Accessed 05 Sep 2022].
- 2.Dance A. Survival of the littlest: the long-term impacts of being born extremely early. Nature 2020;582:20–3. 10.1038/d41586-020-01517-z [DOI] [PubMed] [Google Scholar]
- 3.Harrison MS, Goldenberg RL. Global burden of prematurity. Semin Fetal Neonatal Med 2016;21:74–9. 10.1016/j.siny.2015.12.007 [DOI] [PubMed] [Google Scholar]
- 4.Solebo AL, Teoh L, Rahi J. Epidemiology of blindness in children. Arch Dis Child 2017;102:853–7. 10.1136/archdischild-2016-310532 [DOI] [PubMed] [Google Scholar]
- 5.Hellström A, Smith LEH, Dammann O. Retinopathy of prematurity. Lancet 2013;382:1445–57. 10.1016/S0140-6736(13)60178-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Kim SJ, Campbell JP, Kalpathy-Cramer J, et al. Accuracy and reliability of eye-based vs quadrant-based diagnosis of plus disease in retinopathy of prematurity. JAMA Ophthalmol 2018;136:648–55. 10.1001/jamaophthalmol.2018.1195 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Li J-P, Liu H, Ting DSJ, et al. Digital technology, TELE-medicine and artificial intelligence in ophthalmology: a global perspective. Prog Retin Eye Res 2021;82:100900. 10.1016/j.preteyeres.2020.100900 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Muthukrishnan N, Maleki F, Ovens K, et al. Brief history of artificial intelligence. Neuroimaging Clin N Am 2020;30:393–9. 10.1016/j.nic.2020.07.004 [DOI] [PubMed] [Google Scholar]
- 9.Tong Y, Lu W, Yu Y, et al. Application of machine learning in ophthalmic imaging modalities. Eye Vis (Lond) 2020;7:22. 10.1186/s40662-020-00183-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Ting DSW, Pasquale LR, Peng L, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol 2019;103:167–75. 10.1136/bjophthalmol-2018-313173 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Mitchell WG, Dee EC, Celi LA. Generalisability through local validation: overcoming barriers due to data disparity in healthcare. BMC Ophthalmol 2021;21:228. 10.1186/s12886-021-01992-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Celi LA, Cellini J, Charpignon M-L, et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities—A global review. PLOS Digit Health 2022;1:e0000022. 10.1371/journal.pdig.0000022 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Abràmoff MD, Lavin PT, Birch M, et al. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med 2018;1:39. 10.1038/s41746-018-0040-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med 1999;130:515–24. 10.7326/0003-4819-130-6-199903160-00016 [DOI] [PubMed] [Google Scholar]
- 15.Ricci Lara MA, Echeveste R, Ferrante E. Addressing fairness in artificial intelligence for medical imaging. Nat Commun 2022;13:4581. 10.1038/s41467-022-32186-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Seastedt KP, Schwab P, O’Brien Z, et al. Global healthcare fairness: we should be sharing more, not less, data. PLOS Digit Health 2022;1:e0000102. 10.1371/journal.pdig.0000102 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Wawira Gichoya J, McCoy LG, Celi LA, et al. Equity in essence: a call for operationalising fairness in machine learning for healthcare. BMJ Health Care Inform 2021;28:e100289. 10.1136/bmjhci-2020-100289 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Lepore D, Ji MH, Pagliara MM, et al. Convolutional neural network based on fluorescein angiography images for retinopathy of prematurity management. Transl Vis Sci Technol 2020;9:37. 10.1167/tvst.9.2.37 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Wang J, Ju R, Chen Y, et al. Automated retinopathy of prematurity screening using deep neural networks. EBioMedicine 2018;35:361–8. 10.1016/j.ebiom.2018.08.033 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Agrawal R, Kulkarni S, Walambe R, et al. Assistive framework for automatic detection of all the zones in retinopathy of prematurity using deep learning. J Digit Imaging 2021;34:932–47. 10.1007/s10278-021-00477-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Coyner AS, Oh MA, Shah PK, et al. External validation of a retinopathy of prematurity screening model using artificial intelligence in 3 Low- and middle-income populations. JAMA Ophthalmol 2022;140:791. 10.1001/jamaophthalmol.2022.2135 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Huang Y-P, Basanta H, Kang EY-C, et al. Automated detection of early-stage ROP using a deep convolutional neural network. Br J Ophthalmol 2021;105:1099–103. 10.1136/bjophthalmol-2020-316526 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Li P, Liu J. Early diagnosis and quantitative analysis of stages in retinopathy of prematurity based on deep convolutional neural networks. Trans Vis Sci Tech 2022;11:17. 10.1167/tvst.11.5.17 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Wu Q, Hu Y, Mo Z, et al. Development and validation of a deep learning model to predict the occurrence and severity of retinopathy of prematurity. JAMA Netw Open 2022;5:e2217447. 10.1001/jamanetworkopen.2022.17447 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Wang J, Ji J, Zhang M, et al. Automated explainable multidimensional deep learning platform of retinal images for retinopathy of prematurity screening. JAMA Netw Open 2021;4:e218758. 10.1001/jamanetworkopen.2021.8758 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Hu J, Chen Y, Zhong J, et al. Automated analysis for retinopathy of prematurity by deep neural networks. IEEE Trans Med Imaging 2019;38:269–79. 10.1109/TMI.2018.2863562 [DOI] [PubMed] [Google Scholar]
- 27.Vijayalakshmi C, Sakthivel P, Vinekar A. Automated detection and classification of telemedical retinopathy of prematurity images. Telemed J E Health 2020;26:354–8. 10.1089/tmj.2019.0004 [DOI] [PubMed] [Google Scholar]
- 28.Li J, Huang K, Ju R, et al. Evaluation of artificial intelligence-based quantitative analysis to identify clinically significant severe retinopathy of prematurity. Retina 2022;42:195–203. 10.1097/IAE.0000000000003284 [DOI] [PubMed] [Google Scholar]
- 29.Peng Y, Zhu W, Chen Z, et al. Automatic staging for retinopathy of prematurity with deep feature fusion and ordinal classification strategy. IEEE Trans Med Imaging 2021;40:1750–62. 10.1109/TMI.2021.3065753 [DOI] [PubMed] [Google Scholar]
- 30.Brown JM, Campbell JP, Beers A, et al. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol 2018;136:803–10. 10.1001/jamaophthalmol.2018.1934 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Chiang MF, Quinn GE, Fielder AR, et al. International classification of retinopathy of prematurity, third edition. Ophthalmology 2021;128:e51–68. 10.1016/j.ophtha.2021.05.031 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Campbell JP, Ataer-Cansizoglu E, Bolon-Canedo V, et al. Expert diagnosis of plus disease in retinopathy of prematurity from computer-based image analysis. JAMA Ophthalmol 2016;134:651–7. 10.1001/jamaophthalmol.2016.0611 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Pour EK, Pourreza H, Zamani KA, et al. Retinopathy of prematurity-assist: novel software for detecting plus disease. Korean J Ophthalmol 2017;31:524–32. 10.3341/kjo.2015.0143 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34.Nisha KL, G S, Sathidevi PS, et al. A computer-aided diagnosis system for plus disease in retinopathy of prematurity with structure adaptive segmentation and vessel based features. Comput Med Imaging Graph 2019;74:72–94. 10.1016/j.compmedimag.2019.04.003 [DOI] [PubMed] [Google Scholar]
- 35.Mao J, Luo Y, Liu L, et al. Automated diagnosis and quantitative analysis of plus disease in retinopathy of prematurity based on deep convolutional neural networks. Acta Ophthalmol 2020;98:e339–45. 10.1111/aos.14264 [DOI] [PubMed] [Google Scholar]
- 36.Yildiz VM, Tian P, Yildiz I, et al. Plus disease in retinopathy of prematurity: convolutional neural network performance using a combined neural network and feature extraction approach. Transl Vis Sci Technol 2020;9:10. 10.1167/tvst.9.2.10 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Campbell JP, Singh P, Redd TK, et al. Applications of artificial intelligence for retinopathy of prematurity screening. Pediatrics 2021;147:e2020016618. 10.1542/peds.2020-016618 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.International Committee for the Classification of Retinopathy of Prematurity . The international classification of retinopathy of prematurity revisited. Arch Ophthalmol 2005;123:991–9. 10.1001/archopht.123.7.991 [DOI] [PubMed] [Google Scholar]
- 39.NIH policy and guidelines on the inclusion of women and minorities as subjects in clinical research. Available: https://grants.nih.gov/policy/inclusion/women-and-minorities/guidelines.htm [Accessed 21 Jun 2022].
- 40.Lin J-Y, Kang EY-C, Banker AS, et al. Comparison of retcam and smartphone-based photography for retinopathy of prematurity screening. Diagnostics (Basel) 2022;12:945. 10.3390/diagnostics12040945 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Vinekar A, Rao SV, Murthy S, et al. A novel, low-cost, wide-field, infant retinal camera, ‘Neo’: technical and safety report for the use on premature infants. Trans Vis Sci Tech 2019;8:2. 10.1167/tvst.8.2.2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Lekha T, Ramesh S, Sharma A, et al. MII retcam assisted smartphone based fundus imaging for retinopathy of prematurity. Indian J Ophthalmol 2019;67:834–9. 10.4103/ijo.IJO_268_19 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Sharma A, Goyal A, Bilong Y, et al. Comparison of a smartphone-based photography method with indirect ophthalmoscopic assessment in referable retinopathy of prematurity: a smart retinopathy of prematurity model pilot study. Ophthalmol Retina 2019;3:911–2. 10.1016/j.oret.2019.06.006 [DOI] [PubMed] [Google Scholar]
- 44.Futoma J, Simons M, Panch T, et al. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit Health 2020;2:e489–92. 10.1016/S2589-7500(20)30186-2 [DOI] [PMC free article] [PubMed] [Google Scholar]