Key Points
Question
Can self-supervised learning improve automated macular telangiectasia type 2 (MacTel) classification on optical coherence tomography (OCT) images in the setting of limited labeled training data?
Findings
This comparative effectiveness research study including 5200 scans from 2680 patients compared self-supervised models trained on unlabeled data and fine-tuned on labeled data to traditional supervised models trained on the labeled data. Self-supervised models demonstrated the highest performance and better agreement with the more experienced human expert graders.
Meaning
The findings suggest that self-supervised learning improved the accuracy of MacTel classification on OCT images; however, further studies are needed to determine whether this approach is applicable to other rare diseases in which the lack of labeled training data is a challenge.
This comparative effectiveness research study describes a self-supervised learning approach designed to improve optical coherence tomography detection of macular telangiectasia type 2 using limited labeled data.
Abstract
Importance
Deep learning image analysis often depends on large, labeled datasets, which are difficult to obtain for rare diseases.
Objective
To develop a self-supervised approach for automated classification of macular telangiectasia type 2 (MacTel) on optical coherence tomography (OCT) with limited labeled data.
Design, Setting, and Participants
This was a retrospective comparative study. OCT images were collected by the Lowy Medical Research Institute, La Jolla, California, from May 2014 to May 2019, and by the University of Washington, Seattle, from January 2016 to October 2022. Clinical diagnoses of patients with and without MacTel were confirmed by retina specialists. Data were analyzed from January to September 2023.
Exposures
Two convolutional neural networks were pretrained using the Bootstrap Your Own Latent algorithm on unlabeled training data and fine-tuned with labeled training data to predict MacTel (self-supervised method). ResNet18 and ResNet50 models were also trained using all labeled data (supervised method).
Main Outcomes and Measures
The ground truth MacTel vs non-MacTel diagnosis was determined by retina specialists based on spectral-domain OCT. The models’ predictions were compared against human graders using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), area under the precision recall curve (AUPRC), and area under the receiver operating characteristic curve (AUROC). Uniform manifold approximation and projection was performed for dimension reduction, and Grad-CAM visualizations were generated for the supervised and self-supervised methods.
Results
A total of 2636 OCT scans from 780 patients with MacTel and 131 patients without MacTel were included from the MacTel Project (mean [SD] age, 60.8 [11.7] years; 63.8% female), and another 2564 scans from 1769 patients without MacTel were included from the University of Washington (mean [SD] age, 61.2 [18.1] years; 53.4% female). The self-supervised approach fine-tuned on 100% of the labeled training data with ResNet50 as the feature extractor performed best, achieving an AUPRC of 0.971 (95% CI, 0.969-0.972), an AUROC of 0.970 (95% CI, 0.970-0.973), accuracy of 0.926, sensitivity of 0.898, specificity of 0.949, PPV of 0.935, and NPV of 0.919. With only 419 OCT volumes (10% of the labeled training dataset, including 185 patients with MacTel), the ResNet18 self-supervised model achieved comparable performance, with an AUPRC of 0.958 (95% CI, 0.957-0.960), an AUROC of 0.966 (95% CI, 0.964-0.967), and accuracy, sensitivity, specificity, PPV, and NPV of 0.902, 0.884, 0.916, 0.896, and 0.906, respectively. The self-supervised models showed better agreement with the more experienced human expert graders.
Conclusions and Relevance
The findings suggest that self-supervised learning may improve the accuracy of automated MacTel vs non-MacTel binary classification on OCT with limited labeled training data, and these approaches may be applicable to other rare diseases, although further research is warranted.
Introduction
Macular telangiectasia type 2 (MacTel) is a slowly progressing age-related neurodegenerative disease of the macula. In advanced stages, hyperplasia of the retinal pigment epithelium and subretinal neovascularization can result in vision impairment.1 Although the current standard for MacTel diagnosis relies on multimodal imaging (fundus photography, fluorescein angiography, fundus autofluorescence, and optical coherence tomography [OCT] imaging),2 a variety of findings are apparent on OCT imaging at all stages of disease.3,4
Machine learning–based image analysis provides an ideal method for assessing structural changes on OCT for many retinal diseases,5,6 including as a noninvasive approach for monitoring MacTel disease progression. Previous studies have used deep learning approaches to segment and quantify retinal cavitation7 and the ellipsoid zone defect area on OCT images of patients with MacTel,8 as well as to estimate visual functioning (retinal sensitivity on microperimetry testing) from OCT structural images.9
One limitation with applying deep learning OCT image analysis in this setting is that MacTel is rare, with an estimated prevalence as low as 0.005%.10 Obtaining a large, labeled imaging dataset with MacTel findings is a considerable challenge. Self-supervised learning (SSL) is a machine learning approach that allows models to learn from unlabeled data without the need for explicit annotations or labels and can be helpful when labeled datasets are limited or difficult to obtain.11 By pretraining models using SSL, researchers can leverage large amounts of unlabeled data to improve the performance of downstream tasks that require labeled data, such as predicting whether an OCT scan shows MacTel disease.
In this study, we explored the efficacy of 1 SSL approach to improve MacTel classification by pretraining neural networks using the Bootstrap Your Own Latent (BYOL)12 algorithm on unlabeled data. BYOL is an SSL technique that uses a pair of differently augmented views of the same image to learn the image features. We aimed to demonstrate that pretraining on unlabeled data can reduce reliance on large, labeled datasets, thereby making it easier for researchers to build accurate disease detection models for rare diseases with limited labeled data.
Methods
Data Collection
MacTel Project Registry Study
Participants with and without MacTel were enrolled in the MacTel Project Registry study13 at participating clinical sites. The study was approved by central or local institutional review boards and is in adherence with the tenets of the Declaration of Helsinki. All participants provided written informed consent. Participants underwent dilated retinal examinations and standard ophthalmic imaging. Clinical diagnoses were confirmed by retinal specialists at the Moorfields Eye Hospital Reading Centre, London, England, and Queen’s University Belfast Reading Centre, Belfast, Northern Ireland, who were not aware of previous diagnoses. MacTel vs non-MacTel diagnoses were made based on multimodal imaging: fundus photography, fundus autofluorescence, fluorescein angiography, and OCT. Data were collected from May 2014 to May 2019.
University of Washington
OCT images were extracted from the SPECTRALIS imaging database at the University of Washington, Seattle, using an automated extraction tool. The study was approved by the University of Washington institutional review board and was conducted in accordance with the tenets of the Declaration of Helsinki and the Health Insurance Portability and Accountability Act. The need for informed consent was waived by the institutional review board owing to the retrospective nature of the dataset. Clinical variables were extracted simultaneously with the OCT images from the Epic electronic health records database. Data were collected from January 2016 to October 2022.
Datasets
The OCT dataset used in this study contained images from 2 institutions: the MacTel Project14 dataset, which included 2636 OCT scans from 780 patients with MacTel and 131 patients without MacTel (overall mean [SD] age, 60.8 [11.7] years; 63.8% female), and the University of Washington, which included 2564 scans from 1769 patients without MacTel (mean [SD] age, 61.2 [18.1] years; 53.4% female). The OCT images from both institutions were acquired using a SPECTRALIS OCT device (Heidelberg Engineering) and were combined to create the study dataset. See eFigure 1 in Supplement 1 for OCT image examples.
Data Preparation
The study dataset contained OCT volumes with varying width and height dimensions. To standardize the samples’ dimensions, we resampled all volumes to a fixed size of 496 × 768 × 196 B-scans using linear interpolation. To enhance computational efficiency and focus on clinically relevant areas, we selected the middle third of B-scans from each volume. This selection reflects the anatomical significance of this region for MacTel diagnosis, as it often captures critical features indicative of the disease. We further refined our dataset by resampling the chosen B-scans into 3 slices (Figure 1). These B-scans were stacked to form a 3-channel RGB image, where each channel corresponds to a single B-scan. This approach allowed us to use the contextual information provided by adjacent B-scans in 2-dimensional neural network architectures.15 The resulting dataset consisted of 5200 volumes. We augmented the data using random horizontal flips and center crops to increase the robustness of the model. To address computational constraints and streamline experimentation, we opted for a conventional split over 5-fold cross-validation, ensuring a balance between computational feasibility and robust evaluation (eDiscussion in Supplement 1). Thus, the study dataset was randomly split into training, validation, and test sets at a ratio of 80:10:10 at the patient level. The training set consisted of 2348 positive and 1852 negative samples, the validation set of 262 positive and 238 negative samples, and the test set of 225 positive and 275 negative samples. We used the training and validation sets for model training and hyperparameter tuning and reserved the test set for the final evaluation of the model’s performance.
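As a rough illustration of this pipeline, the sketch below resamples a volume, extracts the middle third of B-scans, collapses it to a 3-channel image, and performs a patient-level split; the function and variable names (to_rgb_stack, scan_paths, patient_ids) are illustrative placeholders, not taken from the study’s code.

```python
import numpy as np
from scipy.ndimage import zoom
from sklearn.model_selection import GroupShuffleSplit

TARGET_SHAPE = (496, 768, 196)  # height x width x number of B-scans

def to_rgb_stack(volume: np.ndarray) -> np.ndarray:
    """Resample an OCT volume and collapse it to a 3-channel image."""
    factors = [t / s for t, s in zip(TARGET_SHAPE, volume.shape)]
    vol = zoom(volume, factors, order=1)         # linear interpolation
    third = vol.shape[2] // 3
    central = vol[:, :, third:2 * third]         # middle third of B-scans
    picks = np.linspace(0, central.shape[2] - 1, 3).round().astype(int)
    return np.stack([central[:, :, i] for i in picks])  # (3, 496, 768)

# Patient-level 80:10:10 split: no patient appears in more than one subset.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, holdout_idx = next(gss.split(scan_paths, groups=patient_ids))
gss2 = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
val_rel, test_rel = next(gss2.split(
    [scan_paths[i] for i in holdout_idx],
    groups=[patient_ids[i] for i in holdout_idx],
))
```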
Figure 1. Flow Diagram for 2 Model Training Approaches.
The figure shows the traditional supervised learning method (TSL) and self-supervised learning method (SSL). Pretraining on unlabeled optical coherence tomography (OCT) images was done using Bootstrap Your Own Latent.
Model Development
We adopted 2 distinct approaches to train models: traditional supervised learning (TSL) and SSL using BYOL. These approaches allowed us to investigate the feasibility of improving model accuracy for rare diseases with limited labeled data when the model is pretrained with self-supervised learning, without any labels (Figure 1).
BYOL SSL Models
We pretrained 2 different convolutional neural network architectures, ResNet18 and ResNet50, using the BYOL algorithm on the 3-channel RGB images from the training set prepared without using any labels. We then fine-tuned the networks in a supervised learning manner using various amounts of labeled data (10%, 25%, 50%, and 100% of the training dataset).
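One way to realize this procedure is sketched below using the open-source byol-pytorch package; the article does not name its BYOL implementation, so the package, image size, optimizer, learning rate, and loader names here are assumptions for illustration only.

```python
import torch
from byol_pytorch import BYOL  # assumed implementation; not specified in the article
from torchvision import models

encoder = models.resnet18(weights=None)
learner = BYOL(encoder, image_size=256, hidden_layer="avgpool")  # illustrative size
optimizer = torch.optim.Adam(learner.parameters(), lr=3e-4)      # assumed settings

for images, _ in unlabeled_loader:   # hypothetical DataLoader; labels are ignored
    loss = learner(images)           # BYOL loss between two augmented views
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    learner.update_moving_average()  # EMA update of the target network

# Fine-tuning: attach a binary head to the pretrained encoder and train it
# on a fraction (10%, 25%, 50%, or 100%) of the labeled data.
encoder.fc = torch.nn.Linear(encoder.fc.in_features, 1)
```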
Supervised Models
We also trained ResNet18 and ResNet50 using all labeled training data to predict whether each patient had MacTel. We used PyTorch implementations for these models and trained them using the stochastic gradient descent optimizer with a learning rate of 0.001 and a batch size of 32 (eDiscussion in Supplement 1). We initialized the models’ weights using those of a model pretrained on ImageNet, a large-scale dataset of natural images, and further fine-tuned the weights on our OCT dataset for 100 epochs, with early stopping based on the binary cross entropy loss on the validation set.
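A minimal sketch of this training loop follows, under the stated settings (SGD, learning rate 0.001, batch size 32, 100 epochs, early stopping on validation binary cross entropy); the patience value and loader names are assumptions not given in the article.

```python
import torch
from torch import nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 1)    # binary MacTel head

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

best_val, patience, bad_epochs = float("inf"), 10, 0  # patience is assumed
for epoch in range(100):
    model.train()
    for images, labels in train_loader:           # hypothetical loader, batch size 32
        optimizer.zero_grad()
        loss = criterion(model(images).squeeze(1), labels.float())
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(
            criterion(model(x).squeeze(1), y.float()).item() for x, y in val_loader
        ) / len(val_loader)
    if val_loss < best_val:                       # early stopping on validation BCE
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best_model.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break
```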
Model Evaluation
We assessed the trained models using AUROC, AUPRC, accuracy, sensitivity, specificity, PPV, and NPV on the dedicated test set for final evaluation. We tested the ability of the self-supervised BYOL models to learn useful features from the unlabeled data using K-nearest neighbor for binary classification. We then compared the performance of the BYOL self-supervised models with the supervised learning models after fine-tuning all models on varying amounts of labeled data.
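These evaluations map onto standard scikit-learn calls, as in the sketch below; the feature and label arrays are placeholders, and k = 7 is taken from the validation results reported in the Results section.

```python
from sklearn.metrics import roc_auc_score, average_precision_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# Probe the frozen BYOL features with K-nearest neighbor (k = 7; see Results).
knn = KNeighborsClassifier(n_neighbors=7).fit(train_features, train_labels)
y_score = knn.predict_proba(test_features)[:, 1]
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(test_labels, y_pred).ravel()
metrics = {
    "AUROC": roc_auc_score(test_labels, y_score),
    "AUPRC": average_precision_score(test_labels, y_score),
    "accuracy": (tp + tn) / (tp + tn + fp + fn),
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "PPV": tp / (tp + fp),
    "NPV": tn / (tn + fn),
}
```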
Feature Visualization
We analyzed the self-supervised BYOL models using uniform manifold approximation and projection for dimension reduction (UMAP)16 to evaluate whether the features learned by the models were helpful in classifying MacTel. UMAP is a dimensionality reduction technique used for visualizing high-dimensional data in lower dimensions while preserving the pairwise distances between data points. We used the Euclidean metric for the logit layer results and the correlation metric for the last convolutional layer results. We set a minimum distance of 0.1 and 30 neighbors for visualizations. We used unsupervised manifold learning with UMAP to visualize and cluster the features learned by the models.
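The stated settings translate directly to umap-learn, as sketched below; the feature arrays stand in for the extracted activations.

```python
import umap

# Last convolutional layer activations: correlation metric, as described above.
conv_embedding = umap.UMAP(
    n_neighbors=30, min_dist=0.1, metric="correlation"
).fit_transform(conv_features)

# Logit layer results: Euclidean metric.
logit_embedding = umap.UMAP(
    n_neighbors=30, min_dist=0.1, metric="euclidean"
).fit_transform(logit_features)
```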
Explainability
To further interpret the models and gain insight into their decision-making processes, we selected several correctly classified OCT images of patients with MacTel from the test set. We then generated heatmaps using Grad-CAM17,18 and guided backpropagation techniques19 to visualize the regions of the OCT images that the models attended to when making predictions.
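A minimal hook-based Grad-CAM sketch is shown below; the article does not state its implementation, the target layer is an assumption, and guided backpropagation would be composed analogously.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50()
model.fc = torch.nn.Linear(model.fc.in_features, 1)
model.eval()

store = {}
target_layer = model.layer4[-1]  # assumed: last convolutional block
target_layer.register_forward_hook(lambda m, i, o: store.update(act=o.detach()))
target_layer.register_full_backward_hook(
    lambda m, gi, go: store.update(grad=go[0].detach())
)

x = torch.randn(1, 3, 496, 768)  # placeholder for a prepared 3-channel B-scan stack
model(x).backward()              # gradient of the MacTel logit

weights = store["grad"].mean(dim=(2, 3), keepdim=True)    # pool gradients per channel
cam = F.relu((weights * store["act"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize heatmap to [0, 1]
```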
Human Graders Comparison
We compared the performance of the models to that of human graders by asking experts at Queen’s University Belfast Reading Centre, Moorfields Eye Hospital, and Jules Gonin Eye Hospital, University of Lausanne, Lausanne, Switzerland, to independently grade a subset of the test set images. The graders rated the images for scan quality, presence or absence of MacTel, and confidence in their MacTel diagnosis. We used the binary class assignments (MacTel vs non-MacTel) to compute metrics against ground truth for each grader and for the graders ensemble (average of the 4 graders; when at least 2 graders reported MacTel, a diagnosis of MacTel was recorded). The Cohen κ coefficient was used to measure intergrader agreement and agreement between the graders and the ground truth labels, and the sensitivity and specificity of the human graders were computed using the same metrics as the models.
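In scikit-learn terms, the grader analysis reduces to the following sketch; the grades matrix and ground_truth array are hypothetical placeholders for the collected annotations.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, roc_auc_score, average_precision_score

# grades: (n_images, 4) array of binary MacTel calls, one column per grader.
ensemble_score = grades.mean(axis=1)                     # averaged diagnosis
ensemble_label = (grades.sum(axis=1) >= 2).astype(int)   # >=2 of 4 graders positive

kappa_between = cohen_kappa_score(grades[:, 0], grades[:, 1])  # intergrader agreement
kappa_truth = cohen_kappa_score(grades[:, 0], ground_truth)    # grader vs ground truth
ensemble_auroc = roc_auc_score(ground_truth, ensemble_score)
ensemble_auprc = average_precision_score(ground_truth, ensemble_score)
```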
Statistical Analysis
Data analysis was executed using Python version 3.8 (Python Software Foundation), along with the Torchvision 0.13.0, umap-learn 0.5.3, Pillow 9.2.0, and Pandas 1.4.3 libraries. The analysis spanned January to September 2023, during which we implemented standard evaluation procedures aligned with computer vision deep learning frameworks. This included training, validation, and out-of-sample testing against both data unseen during training and the test set regraded by human experts. Statistical measures appropriate for classification models, such as accuracy, sensitivity, specificity, AUROC, and AUPRC, were applied to assess the significance of our findings and the performance of the trained models. To provide comprehensive insights, extensive visualizations of results were generated, incorporating techniques such as UMAP and Grad-CAM. Computations were conducted on a Standard ND40rs v2 Azure virtual machine equipped with 8 NVIDIA Tesla V100 NVLINK-connected GPUs, each offering 32 GB of GPU memory, as well as 40 non-HyperThreaded Intel Xeon Platinum 8168 (Skylake) cores and 672 GiB of system memory. This integration of software and hardware resources, combined with standard validation practices, including human regrading of the reserved test set, supports the reliability and thoroughness of our statistical analyses.
Results
Validation Accuracy
BYOL SSL Models
For the BYOL approach using the ResNet18 model, the highest accuracy using K-nearest neighbor as the classifier was achieved after 30 epochs, and for ResNet50 the highest accuracy was achieved after 60 epochs, compared with the accuracy at epoch 0. In both cases, k = 7 neighbors resulted in the highest validation accuracy. These results indicate that pretraining with SSL can improve downstream task performance, even when using a classical machine learning model such as K-nearest neighbor (eFigure 2 in Supplement 1).
BYOL vs ImageNet Pretrained Models
When the models pretrained on ImageNet were compared to the self-supervised BYOL pretrained models after fine-tuning with 10% of the labeled training data, the results demonstrated a larger gap in performance on the training vs validation sets for models pretrained based on ImageNet, indicating overfitting to the training set. Models pretrained using BYOL showed a smaller gap between the performance on training and validation sets, indicating better generalization (eFigure 3 in Supplement 1).
The models’ performance on the test set was compared after fine-tuning with increasing amounts of labeled data, and results demonstrated that pretraining based on BYOL SSL boosted the TSL results (Figure 2). Specifically, when the labeled data were scarce, the BYOL model achieved higher accuracy and AUROC score than the model pretrained on ImageNet. When only 10% of the labeled training data were used, the accuracy and AUROC score of the BYOL SSL pretrained with ResNet18 were 92% and 0.966 (95% CI, 0.964-0.967), respectively, compared to 84% and 0.857 (95% CI, 0.853-0.860) for the ImageNet pretrained model.
Figure 2. Model Accuracy in the Test Set.
Comparison of models trained based on the traditional supervised learning (TSL) method vs self-supervised learning (SSL) after supervised fine-tuning on varying percentages of the labeled training data. AUROC indicates area under the receiver operating characteristic curve.
BYOL Feature Extractor Neural Network Choices: ResNet50 vs ResNet18
When all models were fine-tuned on 100% of the labeled training data, ResNet50 pretrained on ImageNet or using BYOL outperformed its ResNet18 counterpart on almost all metrics (Table). However, when ResNet50 and ResNet18 pretrained using BYOL were compared across varying percentages of labeled data, ResNet18 benefited more clearly from self-supervised pretraining when less labeled data were available. The benefit was less clear with ResNet50, a bigger, more complex model that may require additional labeled data to fully unlock its potential and exhibit significant improvements (Figure 2).
Table. Comparison of Supervised Learning (100% of Labeled Data) Performance and Graders Results Evaluated Against Ground Truth^a

| Rater | AUROC (95% CI) | AUPRC (95% CI) | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|
| ResNet50 (SSL) | 0.970 (0.969-0.972) | 0.971 (0.970-0.973) | 0.926 | 0.898 | 0.949 | 0.935 | 0.919 |
| ResNet50 (TSL) | 0.947 (0.946-0.949) | 0.935 (0.932-0.938) | 0.898 | 0.898 | 0.898 | 0.878 | 0.915 |
| ResNet18 (SSL) | 0.972 (0.971-0.973) | 0.966 (0.964-0.968) | 0.898 | 0.951 | 0.855 | 0.843 | 0.955 |
| ResNet18 (TSL) | 0.965 (0.963-0.966) | 0.963 (0.961-0.964) | 0.890 | 0.876 | 0.902 | 0.879 | 0.899 |
| Grader 1 | NA | NA | 0.950 | 0.902 | 0.989 | 0.985 | 0.925 |
| Grader 2 | NA | NA | 0.950 | 0.893 | 0.996 | 0.995 | 0.919 |
| Grader 3 | NA | NA | 0.910 | 0.800 | 1.000 | 1.000 | 0.859 |
| Grader 4 | NA | NA | 0.880 | 0.747 | 0.989 | 0.982 | 0.827 |
| Graders ensemble^b | 0.976 (0.976-0.977) | 0.986 (0.986-0.987) | 0.968 | 0.929 | 1.000 | 1.000 | 0.945 |

Abbreviations: AUPRC, area under the precision recall curve; AUROC, area under the receiver operating characteristic curve; NA, not applicable; NPV, negative predictive value; PPV, positive predictive value; SSL, self-supervised learning; TSL, traditional supervised learning.

^a Comparison of MacTel binary detection for all model architectures and pretraining methods for 500 patients (225 with MacTel and 275 without MacTel) in the test set. We evaluated each rater against ground truth.

^b Graders ensemble is a vote among 4 graders; when at least 2 graders reported positive, we recorded a positive diagnosis. To compute AUROC and AUPRC for the graders ensemble, we averaged the diagnosis among the 4 graders.
UMAP Feature Visualization
Visualization of features learned by the model pretrained with BYOL vs ImageNet using UMAP suggested that BYOL was effective in learning meaningful features from the OCT images. The UMAP visualization of features learned after fine-tuning the models on 10%, 25%, 50%, and 100% of the labeled training data showed that pretraining based on SSL using BYOL led to better separability between the classes for both the last convolutional layer (Figure 3; eFigure 4 in Supplement 1) and logit results (eFigure 5 in Supplement 1), even when only 10% of the labeled training data were used. In particular, the UMAP plots showed a clearer distinction between the positive and negative samples when the model was pretrained using BYOL compared to pretraining based on ImageNet.
Figure 3. Uniform Manifold Approximation and Projection for Dimension Reduction Results.
Features learned from supervised learning based on 100% of the labeled data. SSL indicates self-supervised learning; TSL, traditional supervised learning.
Explainability
Our trained deep learning models inherently learn intricate patterns and autonomously identify discriminative features relevant to the classification of MacTel without explicit guidance. In this context, the model has the potential to discern subtle morphological characteristics, such as specific retinal layer thickness variations, presence of characteristic lesions, or other distinctive anatomical markers associated with disease. These features are likely learned through the convolutional layers, capturing both low-level and high-level representations. The interpretability techniques, Grad-CAM and guided backpropagation, provide post hoc insights into the decision-making by highlighting regions of input images that influenced the classification. For example, specific areas of the retina may consistently draw the model’s attention, indicating importance in the diagnostic process. However, it is crucial to note that these techniques offer correlation rather than causation. Our visualizations demonstrate that the model is relying on clinically relevant areas of the retina to classify MacTel (Figure 4; eFigures 6 and 7 in Supplement 1).
Figure 4. Artificial Intelligence Explainability.
Results for a correctly classified optical coherence tomography (OCT) image of a patient with MacTel for the ResNet50 architecture trained with the self-supervised learning approach using the entire training set. A, RGB images created from the original OCT scans given as input to the ResNet models. B, Grad-CAM results highlighting the regions of the image that were most relevant for the classification decision. C, Guided backpropagation results indicating which pixels of the image had the highest contribution to the classification decision. D, Combination of Grad-CAM and guided backpropagation.
Comparison to Human Graders
For accuracy and sensitivity, the self-supervised ResNet50 model achieved higher scores compared to 2 of the graders (Table). The Cohen κ matrix comparing agreement between each deep learning model and individual graders is shown in eFigure 8 in Supplement 1. There was some discrepancy between graders resulting from varying experience with grading MacTel (graders 1 and 2 were more experienced). The self-supervised models showed better agreement with the more experienced graders.
The overall best performing model was the ResNet50 pretrained using the SSL approach. Using 100% of the labeled training data, this model achieved an AUPRC of 0.971 (95% CI, 0.969-0.972) and an AUROC of 0.970 (95% CI, 0.970-0.973). With an accuracy of 92.6%, this model outperformed human graders 3 and 4 (accuracy of 91% and 88%, respectively). It also performed nearly as well as the best 2 human graders with respect to sensitivity, specificity, and NPV, and its performance was comparable to that of the ensemble of human experts, which achieved an AUPRC of 0.986 (95% CI, 0.986-0.987) and an AUROC of 0.976 (95% CI, 0.976-0.977).
Discussion
The findings in this comparative effectiveness research study demonstrate the potential of SSL approaches, like BYOL, for improving the accuracy of automated classification of OCT images, even when access to large, labeled datasets is limited. Our results show that pretraining on unlabeled data can considerably boost the performance of downstream supervised learning tasks, particularly when only a small amount of labeled data is available, as is often the case with rare diseases, such as MacTel. The self-supervised model was able to learn relevant information from unlabeled OCT images that enabled it to accurately classify MacTel after fine-tuning with a small amount of labeled data, thus reducing the need for expert annotation. In addition, the BYOL model performed better than the TSL model in the setting of fewer labeled OCT images. In fact, with only 419 OCT volumes (including 185 patients with MacTel) in the 10% labeled training dataset, the self-supervised method with ResNet18 achieved performance comparable to the best model in our study.
Previous studies have found that models pretrained using SSL often outperform traditional deep learning approaches when labeled data are limited.20,21 Burlina et al11 compared TSL (ResNet50) to SSL approaches (Deep InfoMax) for classifying diabetic retinopathy needing referral to an ophthalmologist vs nonreferable diabetic retinopathy. They found that when the models were trained with many examples (5120 per class), both methods performed comparably, but in the setting of few examples (n = 160), the self-supervised model outperformed the traditional model (AUC of 0.747 vs 0.659, respectively). BYOL uses online and target networks for SSL, through which the model generates latent representations from the unlabeled data and then compares them to one another, gradually learning information about the data through this process and improving the model to function as a feature extractor for future tasks, such as classification.
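The target network in BYOL is not trained by gradient descent; it tracks the online network through an exponential moving average, as in this conceptual fragment (the encoder names are placeholders and the decay rate is illustrative; BYOL increases it toward 1.0 over training).

```python
import torch

tau = 0.99  # illustrative EMA decay rate
with torch.no_grad():
    for online_p, target_p in zip(
        online_encoder.parameters(), target_encoder.parameters()
    ):
        # target weights drift slowly toward the online weights
        target_p.mul_(tau).add_((1 - tau) * online_p)
```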
We performed UMAP visualization to assess the features learned after fine-tuning the models on 10%, 25%, 50%, and 100% of the labeled data and found that SSL using BYOL led to better separability between the classes (MacTel vs non-MacTel). This further suggests that pretraining using SSL with BYOL can improve the discriminative power of the learned features even when labeled data are limited. Grad-CAM and guided backpropagation, used to assess which areas of the OCT image the models relied on to classify an image as MacTel, showed that areas of hyporeflective cavities and loss of retinal architecture were correctly identified as relevant regions on the OCT B-scan images.
With regard to model performance vs human graders, the self-supervised models generally showed more agreement with the human expert graders. The 2 graders with more expertise in MacTel grading demonstrated more consistent agreement with the ground truth labels, though all of the graders noted that they normally rely on more than 1 imaging modality to diagnose MacTel. Nevertheless, the ResNet50 self-supervised model outperformed the 2 graders with less MacTel expertise in accuracy and sensitivity on the test set, and the self-supervised models showed better agreement with the most experienced graders than the supervised models did.
Limitations
This study has limitations. Our dataset was limited to patients from certain geographies and to this particular use case of MacTel classification; further external validation is needed on larger and more diverse datasets. While our results are promising, the transition to uncontrolled clinical conditions necessitates caution. Future trials and validation efforts are needed to enhance the generalizability of our models and to fortify the reliability and applicability of our approach in broader clinical settings.
It is crucial to recognize that the immediate impact on current clinical practice may be nuanced. The integration of SSL methodologies into routine clinical workflows necessitates further validation, collaboration, and refinement. Presently, our study serves as a foundation, demonstrating the feasibility and potential benefits of leveraging SSL in the realm of MacTel diagnosis. To effect substantial changes in clinical practice, additional multicenter studies with larger and more diverse datasets incorporating insights from ophthalmic practitioners are imperative. Additionally, addressing challenges related to model interpretability, ethical considerations, and regulatory standards is essential for garnering trust and widespread adoption within the clinical community.
Future work could also include exploring multimodal approaches that combine OCT images with other data sources, such as genetic information, to further improve detection accuracy. In addition, we simulated not having a labeled dataset by pretraining on the full training set with BYOL rather than on a second, truly unlabeled dataset. While this may not reflect the true use case for others in the field, it allowed us to study the effect of different amounts of labeled training data.
Conclusions
The findings in this study suggest that self-supervised learning may improve the accuracy of automated MacTel vs non-MacTel binary classification on OCT with limited labeled training data. These approaches may be applicable to other rare diseases, although further research is warranted.
References
1. Kedarisetti KC, Narayanan R, Stewart MW, Reddy Gurram N, Khanani AM. Macular telangiectasia type 2: a comprehensive review. Clin Ophthalmol. 2022;16:3297-3309. doi:10.2147/OPTH.S373538
2. Chew EY, Peto T, Clemons TE, et al. Macular telangiectasia type 2: a classification system using MultiModal Imaging MacTel Project Report Number 10. Ophthalmol Sci. 2022;3(2):100261. doi:10.1016/j.xops.2022.100261
3. Venkatesh R, Reddy NG, Mishra P, et al. Spectral domain OCT features in type 2 macular telangiectasia (type 2 MacTel): its relevance with clinical staging and visual acuity. Int J Retina Vitreous. 2022;8(1):26. doi:10.1186/s40942-022-00378-0
4. Charbel Issa P, Gillies MC, Chew EY, et al. Macular telangiectasia type 2. Prog Retin Eye Res. 2013;34:49-77. doi:10.1016/j.preteyeres.2012.11.002
5. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24(9):1342-1350. doi:10.1038/s41591-018-0107-6
6. Ting DSW, Pasquale LR, Peng L, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol. 2019;103(2):167-175. doi:10.1136/bjophthalmol-2018-313173
7. Loo J, Cai CX, Choong J, et al. Deep learning-based classification and segmentation of retinal cavitations on optical coherence tomography images of macular telangiectasia type 2. Br J Ophthalmol. 2022;106(3):396-402. doi:10.1136/bjophthalmol-2020-317131
8. Loo J, Clemons TE, Chew EY, Friedlander M, Jaffe GJ, Farsiu S. Beyond performance metrics: automatic deep learning retinal OCT analysis reproduces clinical trial outcome. Ophthalmology. 2020;127(6):793-801. doi:10.1016/j.ophtha.2019.12.015
9. Kihara Y, Heeren TFC, Lee CS, et al. Estimating retinal sensitivity using optical coherence tomography with deep-learning algorithms in macular telangiectasia type 2. JAMA Netw Open. 2019;2(2):e188029. doi:10.1001/jamanetworkopen.2018.8029
10. Aung KZ, Wickremasinghe SS, Makeyeva G, Robman L, Guymer RH. The prevalence estimates of macular telangiectasia type 2: the Melbourne Collaborative Cohort Study. Retina. 2010;30(3):473-478. doi:10.1097/IAE.0b013e3181bd2c71
11. Burlina P, Paul W, Mathew P, Joshi N, Pacheco KD, Bressler NM. Low-shot deep learning of diabetic retinopathy with potential applications to address artificial intelligence bias in retinal diagnostics and rare ophthalmic diseases. JAMA Ophthalmol. 2020;138(10):1070-1077. doi:10.1001/jamaophthalmol.2020.3269
12. Grill JB, Strub F, Altché F, et al. Bootstrap your own latent: a new approach to self-supervised learning. Adv Neural Inf Process Syst. 2020;33:21271-21284.
13. Clemons TE, Gillies MC, Chew EY, et al; MacTel Research Group. Baseline characteristics of participants in the natural history study of macular telangiectasia (MacTel): MacTel Project Report No. 2. Ophthalmic Epidemiol. 2010;17(1):66-73. doi:10.3109/09286580903450361
14. The MacTel Project. The Lowy Medical Research Institute. Accessed May 19, 2023. https://www.lmri.net/mactel/the-mactel-project/
15. Wu Y, Egan C, Olvera-Barrios A, et al. Developing a continuous severity scale for macular telangiectasia type 2 using deep learning and implications for disease grading. Ophthalmology. 2023:S0161-6420(23)00675-9. doi:10.1016/j.ophtha.2023.09.016
16. McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv. Posted online February 9, 2018. https://arxiv.org/abs/1802.03426
17. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: 2017 IEEE International Conference on Computer Vision (ICCV). 2017. doi:10.1109/ICCV.2017.74
18. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016:2921-2929. doi:10.1109/CVPR.2016.319
19. Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M. Striving for simplicity: the all convolutional net. arXiv. Posted online December 21, 2014. https://arxiv.org/abs/1412.6806
20. Huang SC, Pareek A, Jensen M, Lungren MP, Yeung S, Chaudhari AS. Self-supervised learning for medical image classification: a systematic review and implementation guidelines. NPJ Digit Med. 2023;6(1):74. doi:10.1038/s41746-023-00811-0
21. Holmberg OG, Köhler ND, Martins T, et al. Self-supervised retinal thickness prediction enables deep learning from unlabelled data to boost classification of diabetic retinopathy. Nat Mach Intell. 2020;2(11):719-726. doi:10.1038/s42256-020-00247-1
Supplementary Materials
eDiscussion
eFigure 1. Example optical coherence tomography (OCT) slices
eFigure 2. MacTel detection accuracy on the validation set
eFigure 3. Learning behavior for ResNet18 models
eFigure 4. UMAP results based on last layer of neural net models
eFigure 5. UMAP results based on logit layer of neural net models
eFigure 6. AI Explainability results for several examples
eFigure 7. AI Explainability results evaluation for two examples
eFigure 8. Cohen’s Kappa Matrix reflects on the inter-rater agreements
The MacTel Research Group
Data sharing statement