Eye. 2023 Oct 17;38(4):730–736. doi: 10.1038/s41433-023-02765-9

Clinical evaluation of deep learning systems for assisting in the diagnosis of the epiretinal membrane grade in general ophthalmologists

Yan Yan 1, Xiaoling Huang 1, Xiaoyu Jiang 2, Zhiyuan Gao 1, Xindi Liu 1, Kai Jin 1, Juan Ye 1
PMCID: PMC10920879  PMID: 37848677

Abstract

Background

Epiretinal membrane (ERM) is a common age-related retinal disease detected by optical coherence tomography (OCT), with a prevalence of 34.1% among people over 60 years old. This study aims to develop artificial intelligence (AI) systems to assist in the diagnosis of ERM grade using OCT images and to clinically evaluate the potential benefits and risks of our AI systems with a comparative experiment.

Methods

A segmentation deep learning (DL) model that segments retinal features associated with ERM severity and a classification DL model that grades the severity of ERM were developed based on an OCT dataset obtained from three hospitals. A comparative experiment was conducted to compare the performance of four general ophthalmologists with and without assistance from the AI in diagnosing ERM severity.

Results

The segmentation network had a pixel accuracy (PA) of 0.980 and a mean intersection over union (MIoU) of 0.873, while the six-classification network had a total accuracy of 81.3%. The diagnostic accuracy scores of the four ophthalmologists increased with AI assistance from 81.7%, 80.7%, 78.0%, and 80.7% to 87.7%, 86.7%, 89.0%, and 91.3%, respectively, while the corresponding time expenditures were reduced. The specific results of the study as well as the misinterpretations of the AI systems were analysed.

Conclusion

Through our comparative experiment, the AI systems proved to be valuable references for medical diagnosis and demonstrated the potential to accelerate clinical workflows. Systematic efforts are needed to ensure the safe and rapid integration of AI systems into ophthalmic practice.

Subject terms: Outcomes research, Tomography

Introduction

Epiretinal membrane (ERM) is a pathological proliferative membrane that develops on the inner surface of the retina and has a prevalence of 34.1% among people over 60 years old [1, 2]. Progression of the ERM can result in retinal traction and distortion, which cause various visual symptoms, including decreased visual acuity [3]. The clinical diagnosis of ERM relies on optical coherence tomography (OCT), which not only detects the presence of ERM with high sensitivity but also supports a qualitative, description-based severity grading scheme [4–7].

Given that interpreting numerous OCT images to precisely diagnose an ERM remains a difficult and time-consuming task in clinical practice, several studies have used deep learning (DL) systems based on OCT images to diagnose ERM automatically, with performance comparable to that of human specialists [8–10]. However, several problems remain. First, these studies only detected the presence of ERM without grading its severity, which is what guides referral and treatment. Second, their DL systems lacked a clinical evaluation demonstrating their practicability and safety in clinical practice, which is of great concern to clinicians.

In recent years, an increasing number of artificial intelligence (AI) algorithms have been introduced in medical image processing, particularly in ophthalmology [11–14]. As ophthalmologists, we will be the ones most likely to be asked to evaluate the clinical value and scientific evidence of AI systems [15]. In the real world, any misdiagnosis could cause significant harm to the patient; there is therefore a risk in patients receiving medical diagnoses or decisions generated by AI systems without physician supervision. Despite the advanced performance of many AI systems, their most practical application at this stage is as an auxiliary tool that helps improve diagnostic accuracy and reduce the clinical workload [16]. Clinical evaluation should be performed to assess the potential benefits, as well as the limitations and risks, that AI systems bring to medical diagnosis; this serves as a more valuable reference than simply comparing machine and human diagnostic accuracy.

In this study, we developed two DL systems based on OCT images, one to segment retinal features associated with ERM severity, and the other to grade the severity of the ERM. In our experiment, the predicted grade and segmentation map generated by the AI systems were given to ophthalmologists to help them diagnose the ERM. The benefits and underlying issues of AI-assisted medical diagnosis were analyzed. This study could help bridge the critical gap between the development and integration of AI systems in ophthalmic practice.

Methods and materials

Dataset

This was a multicenter, retrospective, cross-sectional study.

Our dataset was collected from idiopathic ERM patients who underwent OCT at three different hospitals, namely, the Second Affiliated Hospital of Zhejiang University School of Medicine, the Second Affiliated Hospital of Xi’an Jiaotong University School of Medicine, and Jinhua Municipal Central Hospital. Normal retinal images were collected from the healthy fellow eyes of ERM patients. The OCT images were all obtained with horizontal B-scans across the fovea performed with the same OCT device (Heidelberg, Germany). The patient exclusion criteria were diabetes, overt media opacity, or retinal diseases other than ERM.

This study is a subanalysis of data from a clinical study (A New Technique for Retinal Disease Treatment, ClinicalTrials.gov identifier: NCT04718532). The Ethics Committee of the Second Affiliated Hospital of Zhejiang University School of Medicine (No. Y2020-1027) granted ethical approval for the study. The study followed the principles of the Declaration of Helsinki and the Health Insurance Portability and Accountability Act.

Annotation

According to the classification systems proposed by Govetto et al. in 2017 [7], our dataset was divided into six categories: one corresponding to a normal retina and five to the severity stages of ERM. In stage 1, the foveal pit is present, as are well-defined retinal layers, whereas, in stage 2, the foveal pit is absent. In stage 3, in addition to an absent foveal pit, a continuous ectopic inner foveal layer (EIFL) is present, but the retinal layers are still well defined, whereas in stage 4, the retinal layers are disrupted due to traction. Stage 5 is defined as ERM-induced tractional lamellar macular holes or macular pseudoholes.
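The staging criteria above can be expressed as a simple decision rule. The sketch below is purely illustrative: the boolean feature flags are hypothetical names for findings an examiner (or a segmentation model) would extract from the OCT image, and are not part of the paper's method.

```python
def erm_stage(foveal_pit_present: bool, eifl_present: bool,
              layers_disrupted: bool, macular_hole_or_pseudohole: bool,
              erm_present: bool = True) -> int:
    """Map OCT findings to the Govetto et al. ERM stage (0 = normal retina).

    Hypothetical feature flags; the branch order follows the staging scheme:
    stage 5 (tractional lamellar hole / pseudohole) takes precedence, then
    layer disruption (4), continuous EIFL (3), absent pit (2), preserved pit (1).
    """
    if not erm_present:
        return 0  # normal retina
    if macular_hole_or_pseudohole:
        return 5  # ERM-induced lamellar macular hole or pseudohole
    if layers_disrupted:
        return 4  # retinal layers disrupted by traction
    if eifl_present:
        return 3  # continuous EIFL, retinal layers still well defined
    if foveal_pit_present:
        return 1  # foveal pit present, well-defined layers
    return 2      # foveal pit absent, no EIFL
```

For example, an eye with an absent foveal pit and a continuous EIFL but well-defined layers maps to stage 3 under this rule.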

It was previously concluded that the morphology of the fovea is the distinguishing feature of stage 1, stage 2, and stage 5 ERM, whereas the morphological changes in the inner nuclear layer are crucial for distinguishing stages 2 to 4. Hence, we divided the retina on the OCT images into five layers (as shown in Fig. 1). The morphological curve of the upper boundary of the retina was segmented as the first layer, and the inner nuclear layer was segmented as the third layer. The other three layers were segmented to demonstrate retinal layer disruption.

Fig. 1. Performance of our artificial intelligence (AI) systems.


a Original optical coherence tomography (OCT) image, (b) manual annotations from retinal specialists, (c) segmentation map generated by the deep learning (DL) model, (d) merging of the manual annotation and the segmentation map, (e) receiver operating characteristic (ROC) curve of the classification DL model, and (f) confusion matrices of the classification DL model.

Four experienced retinal specialists, each with over ten years of clinical experience, were recruited to label the ERM stage independently based on the OCT images, and labels that achieved consensus were taken as the ground truth. Images to which three or four specialists assigned the same label were included directly. The labels of any questionable OCT images were verified and corrected according to clinical records and other retinal examinations. Table S1 shows the distribution of our dataset. Two specialists annotated the boundaries of the five segmentation layers on 300 OCT images (50 images per category).
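The consensus rule described above (at least three of the four specialists agreeing) can be sketched as a small voting function. This is an illustrative sketch, not the authors' code; the function name is hypothetical.

```python
from collections import Counter

def consensus_label(labels):
    """Return (stage, accepted) for one image given four specialists' labels.

    A label is accepted outright when at least three of the four specialists
    agree; otherwise the image is flagged for adjudication against clinical
    records and other retinal examinations, as described in the paper.
    """
    stage, votes = Counter(labels).most_common(1)[0]
    if votes >= 3:
        return stage, True
    return stage, False  # plurality label only; needs verification
```

A 3–1 split such as `[3, 3, 3, 2]` is accepted as stage 3, while a 2–2 split is flagged for review.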

Development of the DL system

A SegNet was trained for the retinal segmentation task [17]. The data were divided into a training set and a testing set at a 9:1 ratio. The learning rate was set to 0.0003, the batch size to 7, and the maximum number of epochs to 200. The rectified linear unit (ReLU) was used as the activation function. Additionally, we trained a ResNet for the six-class classification task [18, 19]. The sizes of the training and testing sets were 3653 (92%) and 300 (8%) images, respectively. The learning rate was set to 0.0001, the batch size to 128, and the maximum number of epochs to 200. The ReLU was again used as the activation function.

Experiment

We set up a comparative experiment to evaluate our DL systems clinically. Figure 2 shows a brief overview of the experiment. For Test set A, 300 OCT images were randomly selected from the whole dataset obtained from the three hospitals. To generate Test set B, the order of the images in Test set A was randomly rearranged. Four general ophthalmologists were recruited and asked to identify the specific stage of ERM on the original OCT images of Test set A, and the corresponding duration of the analysis was recorded. A general ophthalmologist was defined as a junior ophthalmologist with one to three years of clinical experience and no specialty in retinal diseases, rather than a trainee. After one month, the same four ophthalmologists graded the OCT images in Test set B based on the original OCT images as well as the auxiliary information generated by the AI systems, namely the predicted grade and the segmentation map. As before, the corresponding duration of the analysis was recorded.

Fig. 2. Illustration and results of our comparative experiment.


a Illustration of our comparative experiment, (b) improvement in diagnostic accuracy and reduction in diagnosis duration with artificial intelligence (AI) assistance, (cf) differences between the four ophthalmologists’ confusion matrices obtained with and without AI assistance, revealing changes in the numbers of correct diagnoses in each category.

Statistical analysis

To evaluate the performance of the segmentation task, the pixel accuracy (PA), mean pixel accuracy (MPA), and mean intersection over union (MIoU) were calculated relative to the ground truth. For the six-class classification task and the four ophthalmologists in the comparative experiment, we generated a receiver operating characteristic (ROC) curve by plotting the true positive rate against the false positive rate and calculated the precision, recall, F1-score, area under the ROC curve, and total accuracy. All statistical analyses were performed using SPSS (v26.0, IBM). The confidence level was set at 95%, and p < 0.05 was considered statistically significant.
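The three segmentation metrics follow directly from a class-by-class pixel confusion matrix. A minimal pure-Python sketch (the function name is ours, not the paper's):

```python
def segmentation_metrics(conf):
    """Compute PA, MPA, and MIoU from a pixel confusion matrix.

    conf[i][j] = number of pixels of true class i predicted as class j.
    PA   = correct pixels / all pixels
    MPA  = mean over classes of per-class pixel accuracy (recall)
    MIoU = mean over classes of TP / (TP + FP + FN)
    """
    n = len(conf)
    tp = [conf[i][i] for i in range(n)]
    row_sums = [sum(conf[i]) for i in range(n)]                       # TP + FN
    col_sums = [sum(conf[i][j] for i in range(n)) for j in range(n)]  # TP + FP
    pa = sum(tp) / sum(row_sums)
    mpa = sum(tp[i] / row_sums[i] for i in range(n)) / n
    miou = sum(tp[i] / (row_sums[i] + col_sums[i] - tp[i]) for i in range(n)) / n
    return pa, mpa, miou
```

For a toy two-class matrix `[[8, 2], [1, 9]]`, this yields PA = 0.85, MPA = 0.85, and MIoU ≈ 0.739.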

Results

In this study, our dataset consisted of 3953 OCT images (2071 with ERM and 1882 normal). A total of 3557 images were collected from 1593 eyes (883 with ERM, and 710 normal) of 1046 subjects in the ZJU dataset, 270 images were collected from 224 eyes (138 with ERM, and 86 normal) of 178 subjects in the XAH dataset, and 126 images were collected from 104 eyes (84 with ERM, and 20 normal) of 85 subjects in the JHH dataset. The demographic characteristics of our datasets are shown in Table S2.

The PA and MPA of our segmentation network were 0.980 and 0.950, respectively, while the MIoU was 0.873, representing a high level of concordance with manual retinal specialist assessment. Our classification model had a total accuracy of 81.3%, with accuracy values of 98% and 84% in diagnosing normal retinas and those with stage 3 ERM, respectively, but it did not perform as well for the other categories of ERM, particularly stage 4. Figure 1 shows the corresponding ROC curve and confusion matrix. Table S3 shows the detailed results of the six-class classification task.

For the assessment of Test set A, the four ophthalmologists achieved total accuracy scores of 81.7%, 80.7%, 78.0%, and 80.7%, respectively. The mean total accuracy score was 80.3%, comparable to that of the DL model (80.7%). For the specific categories, the mean accuracy scores were 90%, 68%, 73%, 61%, 89%, and 92%. The ophthalmologists were excellent at discriminating normal retinas and those with stage 4 and stage 5 ERM but struggled to distinguish among stage 1, stage 2, and stage 3 ERM. There was a notable difference between the ophthalmologists (61%) and the DL model (85%) in diagnosing stage 3 ERM. The confusion matrices of each ophthalmologist are shown in Fig. S1. The corresponding time costs of the four ophthalmologists for diagnosing ERM were 35, 30, 28, and 25 min.

In the assessment of Test set B, the total accuracy scores of the ophthalmologists with AI assistance were improved to 87.7%, 86.7%, 89.0%, and 91.3%. The mean total accuracy score was 88.7%, which was higher than that of the DL model alone (80.7%) or of the ophthalmologists without AI assistance (80.3%). The improvement in diagnostic accuracy is shown in Table 1 and Fig. 2. For each specific category, the mean accuracy scores were 98%, 77%, 70%, 87%, 75%, and 91%. The accuracy in discriminating stage 3 ERM was increased by 26 percentage points when compared to that of ophthalmologists who did not use AI, and the accuracy in discriminating normal retinas was increased to 98%. However, the accuracy in discriminating stage 4 ERM decreased from 89% to 75%, while that in discriminating stage 2 and stage 5 ERM slightly decreased. We subtracted the confusion matrices of Test set A from those of Test set B for the four ophthalmologists to determine the changes in the number of correct diagnoses in each category (as shown in Fig. 2). The corresponding time costs of the four ophthalmologists for performing AI-assisted diagnoses were reduced to 27 min, 23 min, 24 min, and 22 min. Figure 2 illustrates the reductions in the diagnosis time costs.
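The per-category change analysis described above (subtracting the Test set A confusion matrices from the Test set B matrices) amounts to an element-wise difference whose diagonal gives the change in correct diagnoses per category. A small sketch; the function names are illustrative, not from the paper:

```python
def confusion_delta(with_ai, without_ai):
    """Element-wise difference of two same-shape confusion matrices.

    The diagonal of the result is the change in correct diagnoses per
    category after AI assistance; positive values indicate improvement.
    """
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(without_ai, with_ai)]

def correct_diagnosis_changes(with_ai, without_ai):
    """Extract only the per-category changes in correct diagnoses."""
    delta = confusion_delta(with_ai, without_ai)
    return [delta[i][i] for i in range(len(delta))]
```

For toy two-class matrices, moving from `[[45, 5], [10, 40]]` without AI to `[[48, 2], [6, 44]]` with AI gives per-category changes of +3 and +4 correct diagnoses.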

Table 1.

The total accuracy scores of four ophthalmologists and the deep learning (DL) model in the comparative experiment are presented with 95% CIs.

         Accuracy without AI      Accuracy with AI         P-value
Oph 1    0.817 (0.773–0.861)      0.877 (0.839–0.914)      0.041
Oph 2    0.807 (0.762–0.852)      0.867 (0.828–0.905)      0.041
Oph 3    0.780 (0.733–0.827)      0.890 (0.854–0.926)      <0.001
Oph 4    0.807 (0.762–0.852)      0.913 (0.881–0.945)      <0.001
Model    0.807 (0.762–0.852)                               *

Oph ophthalmologist.

*No P-value is applicable for the model alone.
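The paper does not state which statistical test produced the p-values above (only that SPSS was used). For paired comparisons of two raters' correctness on the same 300 images, an exact McNemar test is one standard choice; the sketch below implements that and is not necessarily the authors' method.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts.

    b = images diagnosed correctly without AI but incorrectly with AI,
    c = images diagnosed incorrectly without AI but correctly with AI.
    Under H0 each discordant image is equally likely to fall either way,
    so the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)  # cap: the two tails overlap when b == c
```

For instance, if one ophthalmologist flipped 12 images from wrong to right and only 1 from right to wrong, `mcnemar_exact(1, 12)` falls well below 0.05.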

Discussion

In this study, ophthalmologists were given the segmentation map generated by a segmentation DL model and the predicted ERM grade generated by a classification DL model to help them diagnose the severity of ERM. With AI assistance, the diagnostic accuracy scores of the ophthalmologists were improved, while the corresponding time costs were reduced. Through our comparative experiment, the AI systems were shown to be valuable references for medical diagnosis and for accelerating clinical workflows, but several underlying issues remain. The following is an analysis of the specific study results.

This ERM classification system, based on SD-OCT findings of an absent foveal pit, the presence of EIFLs, and disorganization of the retinal layers, can serve as a reference for ophthalmologists' decision-making [3]. In previous studies, a higher ERM stage under this classification system correlated with poorer visual acuity, and EIFL may be a good indicator of metamorphopsia in patients with advanced ERMs [20, 21]. The decision to intervene surgically in patients with ERM usually depends on the severity of the patient's symptoms, especially the impact on their activities of daily living [22]. For patients with higher ERM stages, more frequent follow-up or early surgery is recommended, taking their symptoms into account.

The fine classification of ERM progression is difficult because the differences between categories are subtle, making the grades visually similar, and because differing degrees of traction produce diverse appearances within a single category. A fundamental limitation of DL models is that their performance is data-driven [23, 24]. Advanced results require a large-scale training dataset, which is not available for many conditions. This is why our DL model diagnosed normal retinas and stage 3 ERM well but performed less well for categories with relatively few training images. From an ophthalmologist's perspective, however, the key features of stage 4 and stage 5 ERM are relatively easy to recognize, whereas stages 1 to 3 are more difficult to distinguish. Hence, the ophthalmologists and the DL model differed in their performance on each specific category, particularly stage 3 ERM.

The improvement in diagnostic accuracy could be attributed to two factors. First, the AI systems compensated for the ophthalmologists' deficiencies in diagnosing the normal and stage 3 categories. The predicted labels had a higher accuracy of 85% for the stage 3 category, providing a valuable reference for the ophthalmologists, whose mean accuracy was 61%. Although the four ophthalmologists achieved a mean accuracy of 90% for the normal category on their own, highly reflective tissue on the inner surface of the retina was misinterpreted as ERM in some cases. Taking case 1 and case 2 in Fig. 3 as examples, three ophthalmologists mistook a normal retina for stage 1 ERM or stage 3 ERM for stage 4, and all of them corrected their diagnoses with AI assistance. Second, the DL model provided objective and stable segmentation of key features on the OCT images, reducing human observational error. Two further factors may explain the reduction in diagnosis time. First, the judgments of the ophthalmologists and the DL model were consistent for most images; the ophthalmologists could minimize their hesitancy, and thus the duration of diagnosis, by confirming their answers against the predicted labels. Second, the gray values of pixels in adjacent retinal layers on OCT images are similar, making it difficult to delineate the entire inner nuclear layer from other retinal tissue with the naked eye, a task to which the DL model is better suited.

Fig. 3. Typical cases and results of our comparative experiment.


ad Typical cases in our comparative experiment; e total diagnostic accuracy scores of the artificial intelligence (AI) systems, ophthalmologists without AI assistance, and ophthalmologists with AI assistance; f precision and recall of the four ophthalmologists with and without AI assistance.

Nevertheless, AI assistance did not increase the accuracy scores for all categories, indicating underlying issues that hinder the clinical integration of AI-assisted diagnosis. The diagnostic accuracy for stage 4 dropped from 89% to 75% after AI system intervention. In one case, the ophthalmologists over-relied on the predictions of the AI systems rather than exercising their own critical judgment, which negatively affected diagnostic accuracy and could put patient safety at risk [25]. For example, in case 3 in Fig. 3, all four ophthalmologists correctly judged the ERM stage on their own, but three of them were misled by a falsely predicted label and returned that incorrect result. In another case, the ophthalmologists were skeptical of the AI's correct results: after referring to the correct predicted label and the corresponding segmentation map for case 4 (Fig. 3), two ophthalmologists still gave the incorrect answer. Clinicians' limited understanding of AI and the inherent inexplicability of DL decrease its acceptability [26, 27]. Systematic efforts are required to avoid the misinterpretation of AI results and the misuse of AI systems [28]. First, AI systems should be further improved to produce the best possible results; second, ophthalmologists should learn more about AI systems, including their design, development, advantages, and drawbacks.

The limitations of this study should be considered. First, the number of images in several categories was relatively low; a larger-scale dataset with a balanced number of images per category is needed to improve the performance of the DL model. Second, although our dataset was collected from three hospitals, only the Heidelberg OCT device was used to obtain the images. Third, given that junior ophthalmologists are the most likely users of AI systems, the comparative experiment recruited only four junior ophthalmologists for the clinical evaluation; more ophthalmologists of various seniorities will be recruited for further analysis in a future study.

Finally, a comparative experiment was used to examine the clinical benefits and potential risks of AI systems. The integration of AI causes a shift in many aspects of clinical workflow and will have a large impact on patients. Clinicians and other relevant practitioners must maintain their efforts to ensure the safe and rapid integration of AI systems into ophthalmic practice.

Supplemental material is available at Eye’s website.

Summary

What was known before

  • Several studies have used deep learning systems based on OCT images to diagnose ERM automatically with human specialist-comparable performance.

What this study adds

  • The benefits and underlying issues of AI-assisted medical diagnosis were analysed.

Supplementary information

Supplemental Material (388.8KB, docx)

Acknowledgements

This work was financially supported by the National Natural Science Foundation Regional Innovation and Development Joint Fund (U20A20386), the National Key Research and Development Program of China (grant number 2019YFC0118400), Zhejiang Provincial Key Research and Development Plan (grant number 2019C03020), Natural Science Foundation of Zhejiang Province (grant number LQ21H120002), the Natural Science Foundation of China (grant number 81670888), and the Clinical Medical Research Center for Eye Diseases of Zhejiang Province (2021E50007).

Author contributions

YY, XH, KJ and JY conceived and designed the experiments. YY, ZG, XL and KJ collected and provided the data. XJ preprocessed the data and developed the deep learning systems. YY and XH analysed the results. All authors reviewed the manuscript.

Data availability

The de-identified individual participant data can be requested from the corresponding author, who will evaluate such requests on a case-by-case basis.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Kai Jin, Email: jinkai@zju.edu.cn.

Juan Ye, Email: yejuan@zju.edu.cn.

Supplementary information

The online version contains supplementary material available at 10.1038/s41433-023-02765-9.

References

  • 1.Bu S-C, Kuijer R, Li X-R, Hooymans JMM, Los LI. Idiopathic epiretinal membrane. Retina. 2014;34:2317–35. doi: 10.1097/IAE.0000000000000349. [DOI] [PubMed] [Google Scholar]
  • 2.Meuer SM, Myers CE, Klein BEK, Swift MK, Huang Y, Gangaputra S, et al. The epidemiology of vitreoretinal interface abnormalities as detected by spectral-domain optical coherence tomography: the beaver dam eye study. Ophthalmology. 2015;122:787–95. doi: 10.1016/j.ophtha.2014.10.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Fung AT, Galvin J, Tran T. Epiretinal membrane: a review. Clin Exp Ophthalmol. 2021;49:289–308. doi: 10.1111/ceo.13914. [DOI] [PubMed] [Google Scholar]
  • 4.Delyfer M-N, Legout P, Le Goff M, Blaizeau M, Rougier M-B, Schweitzer C, et al. Prevalence of epiretinal membranes in the ageing population using retinal colour images and SD-OCT: the Alienor study. Acta Ophthalmol. 2020;98:e830–8. doi: 10.1111/aos.14422. [DOI] [PubMed] [Google Scholar]
  • 5.Hwang J-U, Sohn J, Moon BG, Joe SG, Lee JY, Kim J-G, et al. Assessment of macular function for idiopathic epiretinal membranes classified by spectral-domain optical coherence tomography. Invest Ophthalmol Vis Sci. 2012;53:3562–9. doi: 10.1167/iovs.12-9762. [DOI] [PubMed] [Google Scholar]
  • 6.Stevenson W, Prospero Ponce CM, Agarwal DR, Gelman R, Christoforidis JB. Epiretinal membrane: optical coherence tomography-based diagnosis and classification. Clin Ophthalmol. 2016;10:527–34. doi: 10.2147/OPTH.S97722. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Govetto A, Lalane RA, Sarraf D, Figueroa MS, Hubschman JP. Insights into epiretinal membranes: presence of ectopic inner foveal layers and a new optical coherence tomography staging scheme. Am J Ophthalmol. 2017;175:99–113. [DOI] [PubMed]
  • 8.Lu W, Tong Y, Yu Y, Xing Y, Chen C, Shen Y. Deep learning-based automated classification of multi-categorical abnormalities from optical coherence tomography images. Transl Vis Sci Technol. 2018;7:41. doi: 10.1167/tvst.7.6.41. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Sonobe T, Tabuchi H, Ohsugi H, Masumoto H, Ishitobi N, Morita S, et al. Comparison between support vector machine and deep learning, machine-learning technologies for detecting epiretinal membrane using 3D-OCT. Int Ophthalmol. 2019;39:1871–7. doi: 10.1007/s10792-018-1016-x. [DOI] [PubMed] [Google Scholar]
  • 10.Lo Y-C, Lin K-H, Bair H, Sheu WH-H, Chang C-S, Shen Y-C, et al. Epiretinal membrane detection at the ophthalmologist level using deep learning of optical coherence tomography. Sci Rep. 2020;10:8424. doi: 10.1038/s41598-020-65405-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Burlina PM, Joshi N, Pekala M, Pacheco KD, Freund DE, Bressler NM. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 2017;135:1170–6. doi: 10.1001/jamaophthalmol.2017.3782. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Kermany DS, Goldbaum M, Cai W, Valentim CCS, Liang H, Baxter SL, et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell. 2018;172:1122–31. [DOI] [PubMed]
  • 13.De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24:1342–50. doi: 10.1038/s41591-018-0107-6. [DOI] [PubMed] [Google Scholar]
  • 14.Ting DSW, Liu Y, Burlina P, Xu X, Bressler NM, Wong TY. AI for medical imaging goes deep. Nat Med. 2018;24:539–40. doi: 10.1038/s41591-018-0029-3. [DOI] [PubMed] [Google Scholar]
  • 15.Abràmoff MD, Tobey D, Char DS. Lessons learned about autonomous AI: finding a safe, efficacious, and ethical path through the development process. Am J Ophthalmol. 2020;214:134–42. doi: 10.1016/j.ajo.2020.02.022. [DOI] [PubMed] [Google Scholar]
  • 16.Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56. doi: 10.1038/s41591-018-0300-7. [DOI] [PubMed] [Google Scholar]
  • 17.Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell. 2017;39:2481–95. doi: 10.1109/TPAMI.2016.2644615. [DOI] [PubMed] [Google Scholar]
  • 18.He KM, Zhang XY, Ren SQ, Sun J. Deep residual learning for image recognition. In: Proc IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 770–8.
  • 19.Jiang XY, Ge ZQ. Data augmentation classifier for imbalanced fault classification. IEEE Trans Autom Sci Eng. 2021;18:1206–17. doi: 10.1109/TASE.2020.2998467. [DOI] [Google Scholar]
  • 20.Doguizi S, Sekeroglu MA, Ozkoyuncu D, Omay AE, Yilmazbas P. Clinical significance of ectopic inner foveal layers in patients with idiopathic epiretinal membranes. Eye (Lond) 2018;32:1652–60. doi: 10.1038/s41433-018-0153-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Alkabes M, Fogagnolo P, Vujosevic S, Rossetti L, Casini G, De Cillà S. Correlation between new OCT parameters and metamorphopsia in advanced stages of epiretinal membranes. Acta Ophthalmol. 2020;98:780–6. doi: 10.1111/aos.14336. [DOI] [PubMed] [Google Scholar]
  • 22.Flaxel CJ, Adelman RA, Bailey ST, Fawzi A, Lim JI, Vemulakonda GA, et al. Idiopathic epiretinal membrane and vitreomacular traction preferred practice pattern®. Ophthalmology. 2020;127:P145–83. doi: 10.1016/j.ophtha.2019.09.022. [DOI] [PubMed] [Google Scholar]
  • 23.Burlina P, Paul W, Mathew P, Joshi N, Pacheco KD, Bressler NM. Low-shot deep learning of diabetic retinopathy with potential applications to address artificial intelligence bias in retinal diagnostics and rare ophthalmic diseases. JAMA Ophthalmol. 2020;138:1070–7. doi: 10.1001/jamaophthalmol.2020.3269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Burlina P, Paul W, Liu TYA, Bressler NM. Detecting anomalies in retinal diseases using generative, discriminative, and self-supervised deep learning. JAMA Ophthalmol. 2022;140:185–9. doi: 10.1001/jamaophthalmol.2021.5557. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Lehman CD, Wellman RD, Buist DSM, Kerlikowske K, Tosteson ANA, Miglioretti DL. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med. 2015;175:1828–37. doi: 10.1001/jamainternmed.2015.5231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Haibe-Kains B, Adam GA, Hosny A, Khodakarami F, Waldron L, Wang B, et al. Transparency and reproducibility in artificial intelligence. Nature. 2020;586:E14–6. doi: 10.1038/s41586-020-2766-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Daneshjou R, Smith MP, Sun MD, Rotemberg V, Zou J. Lack of transparency and potential bias in artificial intelligence data sets and algorithms: a scoping review. JAMA Dermatol. 2021;157:1362–9. doi: 10.1001/jamadermatol.2021.3129. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.González-Gonzalo C, Thee EF, Klaver CCW, Lee AY, Schlingemann RO, Tufail A, et al. Trustworthy AI: closing the gap between development and integration of AI systems in ophthalmic practice. Prog Retin Eye Res. 2021;90:101034. [DOI] [PMC free article] [PubMed]
