JAMA Ophthalmol. 2021 Dec 30;140(2):185–189. doi: 10.1001/jamaophthalmol.2021.5557

Detecting Anomalies in Retinal Diseases Using Generative, Discriminative, and Self-supervised Deep Learning

Philippe Burlina 1,2,3, William Paul 1, T Y Alvin Liu 2,3, Neil M Bressler 3,4
PMCID: PMC8719271  PMID: 34967890

This cross-sectional study examines the application of anomaly detection to retinal diseases.

Key Points

Question

Can artificial intelligence systems trained on normal data recognize anomalies in retinal images?

Findings

In this cross-sectional study of 88 692 high-resolution retinal images of 44 346 individuals with varying severity of diabetic retinopathy, novel anomaly detectors were developed and tested on a surrogate problem using EyePACS data, wherein detectors were trained only on normal retinas (nonreferable diabetic retinopathy) and subsequently tasked to detect abnormality (referable diabetic retinopathy). The detectors had a relatively high area under the receiver operating characteristic curve.

Meaning

This surrogate example suggests that anomaly detectors not trained with images of diseased retinas can detect diabetic retinopathy; such detectors, trained only on normal retinas, might play a role in detecting retinal anomalies or novel retinal disease presentations.

Abstract

Importance

Anomaly detectors could be pursued for retinal diagnoses based on artificial intelligence systems that may not have access to training examples for all retinal diseases in all phenotypic presentations. Possible applications could include screening of population for any retinal disease rather than a specific disease such as diabetic retinopathy, detection of novel retinal diseases or novel presentations of common retinal diseases, and detection of rare diseases with little or no data available for training.

Objective

To study the application of anomaly detection to retinal diseases.

Design, Setting, and Participants

High-resolution retinal images were obtained from the publicly available EyePACS data set, in which each fundus image carries a corresponding label ranging from 0 to 4 representing different severities of diabetic retinopathy. Sixteen variants of anomaly detectors were designed. For evaluation, a surrogate problem was constructed using diabetic retinopathy images, in which only retinas with nonreferable diabetic retinopathy (ie, no diabetic macular edema and either no diabetic retinopathy or mild to moderate nonproliferative diabetic retinopathy) were used for training an artificial intelligence system, but both nonreferable and referable diabetic retinopathy (including diabetic macular edema or proliferative diabetic retinopathy) were used to test the system for detecting retinal disease.

Main Outcomes and Measures

Anomaly detectors were evaluated by commonly accepted performance metrics, including area under the receiver operating characteristic curve, F1 score, and accuracy.

Results

A total of 88 692 high-resolution retinal images of 44 346 individuals with varying severity of diabetic retinopathy were analyzed. The best performing across all anomaly detectors had an area under the receiver operating characteristic of 0.808 (95% CI, 0.789-0.827) and was obtained using an embedding method that involved a self-supervised network.

Conclusions and Relevance

This study suggests that when abnormal (diseased) data, ie, referable diabetic retinopathy in this study, were not available for training retinal diagnostic systems, and only nonreferable diabetic retinopathy was used for training, anomaly detection techniques were useful in distinguishing images with and without referable diabetic retinopathy. This suggests that anomaly detectors may be used to detect retinal diseases in more generalized settings and potentially could play a role in screening of populations for retinal diseases or in identifying novel diseases, phenotyping, or detecting unusual presentations of common retinal diseases.

Introduction

Training of deep learning systems1 (DLSs), a type of artificial intelligence (AI) system, may be limited for rare presentations of common retinal diseases (eg, diabetic macular edema in a 95-year-old individual) or for rare ophthalmic diseases (eg, serpiginous choroidopathy) by the small number of annotated retinal images available for training. One potential solution is low-shot learning, wherein algorithms learn from a relatively low number of images as training data.2 However, ophthalmologists may be concerned with AI systems with only narrow capabilities, such as discriminating between diabetic retinopathy, age-related macular degeneration, and normal retinas, but failing to detect a relatively rarer abnormality (eg, a macular hole or choroidal melanoma) that was not included in the training of the DLS. Therefore, expanding on prior work2 to situations where no prior data are available for DLS training for certain types of diseases, we explored another approach called anomaly detection,3 which could be used to identify common or rare ophthalmic diseases, potentially facilitating referral of affected individuals.

Unlike conventional ophthalmic diagnostics scenarios where all disease classes are known, training examples are available for every disease, and fully supervised methods are used,1,4 anomaly detection can be trained using exclusively images of normal retinas to flag novel or abnormal presentations suggestive of disease or new disease phenotypes with no prior assumptions on what these abnormal images should look like.

While there have been many developments in anomaly detection algorithms using various machine learning and AI techniques,1 these approaches have only recently been applied to ophthalmology. Recent literature includes examples of medical applications of anomaly detection to optical coherence tomography segmentation,5 fundus image drusen delineation,6 brain magnetic resonance imaging segmentation,7 myopathy detection by ultrasonography,8 and, recently, retinal fundus screening.9

Anomaly detection may be relevant when AI is used for retinal diagnostic purposes, but access to annotated training data for all diseases is not available. Sample applications include screening of the general population for any retinal disease rather than a specific disease such as diabetic retinopathy or detection of novel retinal diseases, novel presentations of common retinal diseases, and rare diseases with little or no data available for training. In this investigation, we aimed to determine whether anomaly detection could be used to address these challenges.

Methods

This study investigated 16 different variants of anomaly detectors, including 12 generative and 4 discriminative detectors. The taxonomy of these detectors, from the perspective of an ophthalmologist, is described next. DLSs learn to generate alternate representations (called embeddings) of input images, representing different features. Embeddings can be used directly with additional machine learning algorithms, as done in this investigation, or in upstream layers of a DLS to accomplish tasks like classification or segmentation.

There are 2 types of DLSs. (1) Discriminative networks perform discriminative tasks such as classification1,4 and segmentation.5,7 Anomaly detection is fundamentally a discriminative task. (2) Generative networks generate realistic synthetic data (eg, fundi showing diabetic retinopathy).10 Expanding on previous work,11 our anomaly detection algorithms used either discriminative embeddings or generative embeddings, the latter obtained by repurposing the embeddings of a network originally trained for a generative task.11

Finally, AI research has recently tackled unsupervised problems, where training data have no labels, via self-supervised techniques,12 eg, networks trained to predict one image patch from another patch within the same image. Embeddings obtained via a pretrained self-supervised network were used in one of our anomaly detection methods.

All anomaly detection methods in this study followed a 2-step embed-and-detect approach: first, data were transformed via one of the DLS types described above to obtain a new embedding; next, the new embedding was used to detect anomalies using a classical anomaly detection machine learning method, either using a 1-class support vector machine13 or local outlier factor.13
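The embed-and-detect pipeline can be illustrated with a minimal scikit-learn sketch. The randomly generated vectors below are hypothetical stand-ins for the DLS embeddings described above (they are not the study's actual features); the second step uses the same two classical detectors the study names, a 1-class support vector machine and local outlier factor.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Hypothetical stand-in embeddings: "normal" training vectors clustered
# near the origin, and a test set mixing normal and shifted (anomalous)
# vectors. In the study these would come from a DLS embedding layer.
train_emb = rng.normal(0.0, 1.0, size=(400, 64))
test_emb = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 64)),   # unseen normal samples
    rng.normal(3.0, 1.0, size=(50, 64)),   # unseen anomalous samples
])

# Detector 1: 1-class SVM, fit only on normal embeddings.
ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(train_emb)
svm_pred = ocsvm.predict(test_emb)          # +1 = inlier, -1 = anomaly

# Detector 2: local outlier factor in novelty-detection mode
# (novelty=True is required to score new, unseen samples).
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(train_emb)
lof_pred = lof.predict(test_emb)            # +1 = inlier, -1 = anomaly
```

Note that neither detector ever sees an anomalous example during fitting, mirroring the study's training protocol of using only nonreferable retinas.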

Discriminative embedding systems used either a fully supervised network (InceptionV313) or a self-supervised network (Deep InfoMax12). Both networks were pretrained on ImageNet. For InceptionV3, the output of the global average pooling layer was used, and for Deep InfoMax, the global feature was used.12 Generative embedding systems used either InfoStyleGAN13 or StyleGAN.14 InfoStyleGAN13 is a variation of StyleGAN that finds representations corresponding to semantic variations of images (eg, for retinal images, fundus orientation, or choroidal pigmentation). For generative adversarial networks’ (GANs) embeddings, the outputs of 3 different network layers were considered, denoted as Q, Conv, and Dense (see architecture in the Figure and additional explanations13).

Figure. Architecture for the GANs Used, StyleGAN and InfoStyleGAN, and the Resulting Embeddings.


Architecture for generative adversarial networks (GANs), StyleGAN and InfoStyleGAN,13 used as generative embeddings, in some of the anomaly detection methods used in this study. GANs, by learning to generate high-fidelity synthetic data, learn representations that might be useful for anomaly detection. Both generative adversarial networks share common components (dark blue) that map a latent space variable (including noise and semantic variables on the left) to generate synthetic data. Real and synthetic fundi are passed on to additional discriminative layers on the right. InfoStyleGAN adds a new semantic property to StyleGAN: it includes a penalty in the loss function and network component (denoted Q and shown in orange) that seeks to align the latent space representation (called semantic variables and shown on the left) with semantic features of the images (for retina, this could be orientation or choroidal pigmentation). Once trained, 3 alternative embeddings, shown as dotted lines and corresponding to different layers of the network, were evaluated as input to the anomaly detection process. The anomaly detection itself is implemented using 1 of 2 anomaly detection machine learning approaches: 1-class support vector machines (OCSVM) or local outlier factor methods (LOF) (dark gray).

This study used EyePACS (eAppendix in the Supplement), a public domain data set with 88 692 fundi of 44 346 individuals, originally designed to be balanced across races and sex. Each image comes with a corresponding label ranging from 0 to 4, representing different severities of diabetic retinopathy. All anomaly detection algorithms were trained only with normal (nonreferable) retinas and then were challenged to identify previously unseen diseased retinas (referable diabetic retinopathy). Testing involved an equal number of normal and abnormal samples, with data partitioned on a patient level. Generative DLSs also were exclusively trained with normal retinas (Table 1). Analysis took place from September 2019 to September 2020.
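Partitioning on a patient level, so that no individual contributes images to both training and testing, can be sketched with scikit-learn's grouped splitter. The patient IDs and counts below are illustrative assumptions, not the EyePACS partition itself.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical example: 10 patients, 2 fundus images each (left/right eye).
patient_ids = np.repeat(np.arange(10), 2)   # groups: 0,0,1,1,...,9,9
images = np.arange(20)                      # stand-in for image indices

# GroupShuffleSplit assigns whole patients to one side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(images, groups=patient_ids))

# No patient contributes images to both partitions.
assert set(patient_ids[train_idx]).isdisjoint(patient_ids[test_idx])
```

Splitting at the image level instead would let both eyes of one patient straddle the partition and inflate test performance, which is why the study partitions by patient.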

Table 1. Characteristics for All Methods for Training and Testing Imagesa.

Method | Training images | Testing images
InfoStyleGAN and StyleGAN | 30 000 | NA
OCSVM and LOF on top of generative embeddings (StyleGAN→Q, StyleGAN→Conv, StyleGAN→Dense, InfoStyleGAN→Q, InfoStyleGAN→Conv, InfoStyleGAN→Dense) or discriminative embeddings (InceptionV3, Deep InfoMax) | 4000 (a subset of the 30 000 above) | 1000 normal and 1000 abnormal (DR referable)

Abbreviations: DR, diabetic retinopathy; LOF, local outlier factor; NA, not applicable; OCSVM, 1-class support vector machine.

a

Training the generative adversarial networks used 30 000 randomly selected normal images, of which a randomly selected subset of 4000 was used to train the anomaly detection methods. Testing used a randomly selected set, disjoint from training, of 1000 normal (healthy) and 1000 abnormal retinas (unhealthy retinas with diabetic retinopathy). No validation set was needed, either for training the generative networks (InfoStyleGAN and StyleGAN) or for training the anomaly detection methods, LOF and OCSVM.

Results

Table 2 reports the performance, with corresponding 95% CIs, of the different combinations of embedding and anomaly detection machine learning methods evaluated. Among discriminative systems, the combination of Deep InfoMax for embedding and a 1-class support vector machine for anomaly detection achieved the best performance across all criteria, with an area under the receiver operating characteristic of 0.808 (95% CI, 0.789-0.827), accuracy of 73.90% (95% CI, 71.98%-75.82%), and F1 score of 0.755. Among generative anomaly detection methods, InfoStyleGAN embedding achieved the best performance across all metrics (area under the receiver operating characteristic, 0.667 [95% CI, 0.644-0.691]). Overall, discriminative embedding systems outperformed generative ones.
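The three metrics reported in Table 2 can be computed with scikit-learn. The labels and anomaly scores below are small made-up values for illustration, not the study's data; AUROC is threshold-free, whereas accuracy and F1 require binarizing the scores at a fixed threshold.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score

# Hypothetical ground truth (1 = referable DR) and detector anomaly scores.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.2, 0.6, 0.7, 0.8, 0.4, 0.9])

auroc = roc_auc_score(y_true, scores)   # ranking quality across thresholds
y_pred = (scores >= 0.5).astype(int)    # fixed threshold for accuracy/F1
acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(auroc, acc, f1)                   # → 0.9375 0.75 0.75
```

Because the study's test set was balanced (1000 normal, 1000 abnormal), accuracy is not dominated by one class; on imbalanced data F1 or AUROC would be the more informative of the three.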

Table 2. Anomaly Detection Performance (AUROC, Accuracy, and F1 Score) for 16 Variations of Proposed Anomaly Detection Methodsa.

Network Embedding Anomaly method AUROC (95% CI) Accuracy, % (95% CI) F1 score
Anomaly detection methods using generative embeddings b
InfoStyleGAN Q OCSVM 0.548 (0.522-0.573) 54.10 (51.92-56.28) 0.503
InfoStyleGAN Q LOF 0.538 (0.513-0.564) 51.10 (48.91-53.29) 0.206
StyleGAN Q OCSVM 0.639 (0.615-0.663) 59.95 (57.80-62.10) 0.598
StyleGAN Q LOF 0.648 (0.624-0.672) 57.00 (54.83-59.17) 0.347
InfoStyleGAN Conv OCSVM 0.631 (0.607-0.656) 58.85 (56.69-61.01) 0.623
InfoStyleGAN Conv LOF 0.627 (0.602-0.651) 55.00 (52.82-57.18) 0.398
StyleGAN Conv OCSVM 0.652 (0.628-0.676) 61.20 (59.06-63.34) 0.639
StyleGAN Conv LOF 0.657 (0.634-0.681) 56.80 (54.63-58.97) 0.389
InfoStyleGANc Densec OCSVMc 0.658 (0.635-0.682) 61.65 (59.52-63.78)c 0.648c
InfoStyleGANc Densec LOFc 0.667 (0.644-0.691)c 59.80 (57.65-61.95) 0.472
StyleGAN Dense OCSVM 0.651 (0.627-0.674) 60.75 (58.61-62.89) 0.645
StyleGAN Dense LOF 0.656 (0.632-0.679) 59.15 (57.00-61.30) 0.436
Anomaly detection methods using discriminative embeddings d
InceptionV3 NA OCSVM 0.767 (0.746-0.787) 69.05 (67.02-71.08) 0.719
InceptionV3 NA LOF 0.784 (0.764-0.804) 66.25 (64.18-68.32) 0.542
Deep InfoMaxc NA OCSVMc 0.808 (0.789-0.827)c 73.90 (71.98-75.82)c 0.755c
Deep InfoMax NA LOF 0.802 (0.783-0.821) 65.15 (63.06-67.24) 0.528

Abbreviations: AUROC, area under the receiver operating characteristic curve; LOF, local outlier factor; NA, not applicable; OCSVM, 1-class support vector machine.

a

Anomaly detection performance, in terms of AUROC, overall accuracy, and F1 score, for 16 variations of proposed anomaly detection methods. Results indicate that among generative systems, the top performance is obtained for embeddings using InfoStyleGAN.13 Best overall results are obtained with discriminative self-supervised embedding.

b

Performance for anomaly detection systems using generative methods for representation learning: We denote these methods via the triplet of N, X, Y, where N indicates the generative network used (either StyleGAN14 or InfoStyleGAN13), X indicates the type of embedding layer used for representation3 (either the layer Q, Conv, or Dense; Figure), and Y denotes the anomaly detector algorithm used (OCSVM or LOF).

c

Best performance within a given family type.

d

Performance for anomaly detection systems using discriminative representation learning embeddings via networks pretrained on ImageNet: a pretrained supervised network (InceptionV3) and a pretrained network using self-supervision (Deep InfoMax). We denote them as the tuple X, Y, where X indicates the type of network used, and Y denotes the anomaly detector used.

Discussion

In AI-based retinal diagnostics, unavailable annotations or lack of training examples for certain diseases (eg, rare diseases, novel phenotypes, or rare variants of existing diseases) create a challenge for DLSs. To address this challenge, anomaly detection only requires training with normal images and can flag anomalies or previously unknown presentations of diseases.

This study compared 16 variations of anomaly detectors, either generative or discriminative, with different types of network embeddings. The study used a surrogate problem from the EyePACS data set, which included nonreferable and referable diabetic retinopathy, using only nonreferable diabetic retinopathy images for training. Self-supervised discriminative embeddings performed best and resulted in good performance (area under the receiver operating characteristic = 0.808 [95% CI, 0.789-0.827]). Among generative embedding systems, the semantic InfoStyleGAN had the best performance (area under the receiver operating characteristic = 0.667 [95% CI, 0.644-0.691]). Of note, a recently published study9 tackled a similar problem with broader data sets but was restricted to only 1 type of detector. Comparisons between that study and this one are difficult because the data sets and partitions used were not identical. However, both studies support the promise of anomaly detection for training systems with readily available and annotated normal retinal images to recognize, subsequently, anomalies in retinal images, including rare presentations of common retinal diseases or rare retinal diseases.

Limitations

Limitations of this study included the use of a surrogate problem limited to nonreferable vs referable diabetic retinopathy only, and within that, to EyePACS clinical settings only. Therefore, caution seems warranted before making broad conclusions on the applicability of anomaly detectors in the presence of other retinal diseases. Future work might investigate the potential use of anomaly detection for other uses in clinical settings involving retinal images. It also may include investigating using generative methods herein for other applications, eg, addressing retinal AI fairness or privacy,15 important considerations for furthering the deployment of ophthalmic AI.

Conclusions

This investigation suggests anomaly detection could be useful for screening for retinal diseases such as diabetic retinopathy, and its use could be pursued in broader applications, eg, detecting rare diseases or new presentations of common retinal diseases.

Supplement.

eAppendix. About the EyePACS dataset

References

  • 1.Ting DSW, Liu Y, Burlina P, Xu X, Bressler NM, Wong TY. AI for medical imaging goes deep. Nat Med. 2018;24(5):539-540. doi: 10.1038/s41591-018-0029-3 [DOI] [PubMed] [Google Scholar]
  • 2.Burlina PM, Paul W, Mathew P, Joshi N, Pacheco KD, Bressler NM. Low-shot deep learning of diabetic retinopathy with potential applications to address artificial intelligence bias in retinal diagnostics and rare ophthalmic diseases. JAMA Ophthalmol. 2020;138(10):1070-1077. doi: 10.1001/jamaophthalmol.2020.3269 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Chalapathy R, Chawla S. Deep learning for anomaly detection: a survey. arXiv. Preprint revised January 23, 2019.
  • 4.Burlina P, Joshi N, Pacheco KD, Freund DE, Kong J, Bressler NM. Utility of deep learning methods for referability classification of age-related macular degeneration. JAMA Ophthalmol. 2018;136(11):1305-1307. doi: 10.1001/jamaophthalmol.2018.3799 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Seeböck P, Orlando JI, Schlegl T, et al. Exploiting epistemic uncertainty of anatomy segmentation for anomaly detection in retinal OCT. IEEE Trans Med Imaging. 2020;39(1):87-98. doi: 10.1109/TMI.2019.2919951 [DOI] [PubMed] [Google Scholar]
  • 6.Burlina P, Freund DE, Dupas B, Bressler N. Automatic screening of age-related macular degeneration and retinal abnormalities. Annu Int Conf IEEE Eng Med Biol Soc. 2011;2011:3962-3966. doi: 10.1109/IEMBS.2011.6090984 [DOI] [PubMed] [Google Scholar]
  • 7.Baur C, Wiestler B, Albarqouni S, Navab N. Deep autoencoding models for unsupervised anomaly segmentation in brain MR images. International MICCAI Brain Lesion Workshop. 2018. [Google Scholar]
  • 8.Burlina P, Joshi N, Billings S, Wang IJ, Albayda J. Unsupervised deep novelty detection: application to muscle ultrasound and myositis screening. IEEE 16th International Symposium on Biomedical Imaging, 2019. doi: 10.1109/ISBI.2019.8759565 [DOI] [Google Scholar]
  • 9.Han Y, Li W, Liu M, et al. Application of an anomaly detection model to screen for ocular diseases using color retinal fundus images: design and evaluation study. J Med Internet Res. 2021;23(7):e27822. doi: 10.2196/27822 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Burlina PM, Joshi N, Pacheco KD, Liu TYA, Bressler NM. Assessment of deep generative models for high-resolution synthetic retinal image generation of age-related macular degeneration. JAMA Ophthalmol. 2019;137(3):258-264. doi: 10.1001/jamaophthalmol.2018.6156 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Burlina P, Joshi N, Wang IJ. Where's Wally now? deep generative and discriminative embeddings for novelty detection. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. doi: 10.1109/CVPR.2019.01177 [DOI] [Google Scholar]
  • 12.Bachman P, Hjelm RD, Buchwalter W. Learning representations by maximizing mutual information across views. arXiv. Preprint revised July 8, 2019.
  • 13.Paul W, Wang IJ, Alajaji F, Burlina P. Unsupervised discovery, control, and disentanglement of semantic attributes with applications to anomaly detection. Neural Comput. 2021;33(3):802-826. doi: 10.1162/neco_a_01359 [DOI] [PubMed] [Google Scholar]
  • 14.Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. doi: 10.1109/CVPR.2019.00453 [DOI] [PubMed] [Google Scholar]
  • 15.Paul W, Cao Y, Zhang M, Burlina P. Defending medical image diagnostics against privacy attacks using generative methods. arXiv. Preprint revised August 20, 2021.



Articles from JAMA Ophthalmology are provided here courtesy of American Medical Association
