Translational Vision Science & Technology. 2025 Sep 11;14(9):16. doi: 10.1167/tvst.14.9.16

Code-Free Machine Learning for the Detection of Common Ophthalmic Diseases

Trevor Lin 1, Theodore Leng 2
PMCID: PMC12439502  PMID: 40932447

Abstract

Purpose

We explore a code-free method enabling physicians without programming experience to develop machine learning (ML) models for detecting diabetic retinopathy (DR), age-related macular degeneration (AMD), and glaucoma from fundus photographs.

Methods

Two classification models were developed using Google Vertex AI's no-code AutoML Vision platform: a binary model detecting any pathology and a multi-class model classifying specific diseases. The development dataset consisted of 800 fundus photography images (200 each of DR, AMD, glaucoma, and normal) from the publicly available Fundus Image dataset for Vessel Segmentation. Ten percent of the dataset was saved for testing and 10% for internal validation. External validation was performed using the Eye Disease Diagnosis and Fundus Synthesis dataset, from which 100 single-diagnosis images per class were randomly selected (total N = 400). Model performances were evaluated using area under the precision-recall curve (AUPRC), precision, recall, accuracy, F1 score, and confidence score analysis.

Results

Internally, the binary model yielded an AUPRC of 0.967, with 95.0% precision and recall. The multi-class model had an AUPRC of 0.906, with 91.0% precision and 90.0% recall. On external validation, the binary model reached 92.3% accuracy, whereas the multi-class model achieved 90% overall accuracy.

Conclusions

Code-free ML approaches can enable physicians to create ML models for retinal disease detection without requiring programming expertise, supporting early detection of eye diseases.

Translational Relevance

This work bridges the gap between AI research and clinical deployment by demonstrating that physicians can independently build ML models using accessible, no-code tools.

Keywords: fundus photography, ocular disease classification, automated machine learning, artificial intelligence, diabetic retinopathy

Introduction

Diabetic retinopathy (DR), age-related macular degeneration (AMD), and glaucoma are among the leading causes of vision loss worldwide.1 Early detection and intervention can dramatically reduce the morbidity associated with these conditions, but systematic screening is resource-intensive. Recent advances in artificial intelligence (AI) have shown promise in automating detection across medical specialties, particularly in ophthalmology, where noninvasive imaging modalities such as fundus photography and optical coherence tomography (OCT) are routinely used for diagnosis and monitoring. The widespread use of these imaging techniques has generated large datasets suitable for the development of machine learning (ML) algorithms that can screen, diagnose, and manage eye diseases. Notably, in 2018, IDx-DR became the first AI diagnostic system cleared by the Food and Drug Administration in any field of medicine for DR detection.2 In addition, deep learning models have been shown to be able to identify DR lesions,3 quantify AMD drusen,4,5 and detect glaucomatous optic nerve changes,6 with some systems matching or surpassing ophthalmologists in tasks such as DR grading.7 These successes highlight AI's potential to expand ophthalmic screening capacity, improve accessibility, and enhance the early detection of eye diseases.

However, few AI tools have been fully integrated into clinical practice.8 Developing robust AI models typically requires large, annotated datasets. Traditional deep learning pipelines, including convolutional neural networks (CNNs), demand expertise in data science and significant coding knowledge, which is beyond the training of most physicians. Conversely, computer scientists may not be familiar with the nuances of clinical medicine.9 Building and deploying these systems incurs substantial computational and financial costs. In practice, most ophthalmic AI research is performed by data scientists rather than clinicians, contributing to the disparities in AI development and implementation, leaving physicians without the means to develop customized ML algorithms for their own practice.

Automated machine learning (AutoML) and no-code AI platforms have emerged as a potential solution. AutoML platforms automate complex tasks such as model selection, hyperparameter tuning, and validation, reducing the need for programming skills. Commercial console tools (e.g., Google Cloud AutoML/Vertex AI, Amazon SageMaker Autopilot, Microsoft Azure AutoML) offer user-friendly interfaces that enable users to upload labeled data and automatically train deep learning models in the cloud, eliminating the need for installation and maintenance.10 In medicine and ophthalmology, there have been limited but encouraging demonstrations of no-code AI, illustrating that clinicians can leverage these ML tools to create practical models.11–18

Recent studies have documented impressive AI-enabled detection of DR, AMD, and glaucoma using retinal images. The rapid expansion of telehealth has catalyzed further opportunities to integrate ocular screening into primary care settings, with promising evidence that disease detection is possible outside traditional eye care environments.19–21 These innovations are particularly relevant in underserved and low-resource areas, where screening programs and primary care settings often are the first point of contact for patients at risk of vision loss.22 However, most AI implementations still require technical expertise, limiting their use by clinicians. Although data science expertise remains crucial for guiding responsible application, the emergence of AutoML platforms offers the potential to broaden access to AI development, empowering physicians to build individualized models without programming or substantial resource investment. Nevertheless, only a handful of studies have assessed AutoML's feasibility and performance while adhering to rigorous reporting standards designed to minimize bias and establish genuine clinical utility, and none—to our knowledge—has examined its ability to classify multiple pathologies.23

This study addresses this gap by testing a code-free ML approach—using Google's Vertex AI AutoML—to detect DR, AMD, and glaucoma in fundus photographs. We developed both a binary classifier (disease vs. normal) and a multi-class classifier and validated their performance on an external, independent dataset. Our objective was to demonstrate that physicians without coding experience can create effective classification and screening models, particularly for primary care settings, thereby expanding access to AI tools in ophthalmology.

Methods

This study was conducted in accordance with the principles of the Declaration of Helsinki and used publicly available, deidentified datasets. Institutional review board approval and informed consent were not necessary. Our methodology and reporting align with the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) statement.24

Study Design and Data Sources

We conducted a model development and evaluation study using fundus photography datasets. The development dataset comprised 800 high-quality images from the Fundus Image dataset for Vessel Segmentation (FIVES), equally divided among DR, AMD, glaucoma, and normal eyes.25 The external validation dataset was drawn from the Eye Disease Diagnosis and Fundus Synthesis (EDDFS) dataset, which contained 28,877 fundus images with a wide range of pathologies.26 For external validation, a balanced sample of 100 images per category was selected, excluding images with multiple diagnoses. All image file names corresponding to the desired pathologies were systematically gathered from the dataset and compiled into a spreadsheet. A primary limiting factor was the relatively small number of single-diagnosis images available. For example, only 253 glaucoma images were labeled without coexisting pathologies. As such, although the EDDFS dataset offers a large image pool, we restricted our validation set to a random sample of 100 images for each category to ensure balanced class representation and minimize sampling bias. This controlled sampling approach ensured a clean and interpretable external validation while allowing fair performance comparisons across all categories.
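For illustration, the balanced, single-diagnosis sampling step described above could be reproduced with a few lines of Python. This is a minimal sketch, not the exact script used in the study: the spreadsheet name, the layout with one binary column per pathology, and the random seed are all hypothetical.

```python
# Sketch of balanced sampling of single-diagnosis EDDFS images (illustrative only).
# Assumes a hypothetical "eddfs_labels.csv" with a "filename" column and one
# binary (0/1) column per pathology label.
import pandas as pd

labels = pd.read_csv("eddfs_labels.csv")        # hypothetical file name and layout
classes = ["AMD", "DR", "Glaucoma", "Normal"]

# Keep only single-diagnosis images: exactly one positive label per row.
single_dx = labels[labels[classes].sum(axis=1) == 1]

samples = []
for cls in classes:
    pool = single_dx[single_dx[cls] == 1]
    # Randomly draw 100 images per class with a fixed seed for reproducibility.
    samples.append(pool.sample(n=100, random_state=42))

validation_set = pd.concat(samples)
validation_set.to_csv("eddfs_external_validation.csv", index=False)
print(validation_set[classes].sum())            # expect 100 per class (N = 400)
```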

Development Dataset

Images in the FIVES dataset were collected between 2016 and 2021 at the Ophthalmology Centre of the Second Affiliated Hospital of Zhejiang University, using the Topcon TRC-NW8 non-mydriatic fundus camera (45° field of view). Fundus photographs had a confirmed diagnosis of DR, AMD, or glaucoma based on American Academy of Ophthalmology Preferred Practice Pattern guidelines, incorporating multimodal clinical information including patient symptoms, intraocular pressure measurements, visual field testing, and OCT where applicable, in addition to fundus photography. All images were saved in PNG format with no compression or augmentation and at their original resolution of 2048 × 2048 pixels. Quality control checks showed strong annotation consistency, with a Dice coefficient of 0.9679 for intra-annotator agreement and 0.9608 and 0.9241 for inter-annotator agreement between different- and same-level annotators, respectively.25 No images were excluded for missing labels, and all development images had complete label information.

External Validation Dataset

The EDDFS dataset was collected at The First Hospital of Nanchang and was composed of 28,877 color retinal fundus images. There was no information available regarding the dates of data collection. Images were acquired using five different high-resolution digital fundus cameras with the following resolutions: 768  ×  576, 1956  ×  1934, 2976  ×  3158, and 3264  ×  2448 pixels, with most images captured at 3264  ×  2448 pixels. No image resizing was performed. This variation in imaging hardware and resolution was considered advantageous for external validation as it would pressure-test the robustness of the trained ML model. Annotations in EDDFS were based solely on expert interpretation of fundus photographs without the use of multimodal clinical testing such as OCT or visual field analysis. Images were annotated for seven retinal diseases by ophthalmology experts, with multi-label annotation. Only images of sufficient quality and clear diagnosis were retained. The “Good” rate, reflecting the proportion of high-quality images, was highest in this dataset compared to five other public, well-known fundus photography datasets for both all samples (79.0%) and disease samples (71.2%), and second highest for healthy samples (84.5%), indicating robust image quality and diversity.

Preprocessing of Images

To mimic a no-code workflow, all images underwent preprocessing using Adobe Photoshop Image Processor: the “Auto” function in Adjustments → Brightness/Contrast and Adjustments → Levels, followed by Smart Sharpen with default settings to normalize luminance and contrast across images. In some cases, development images contained background artifacts surrounding the fundus photograph, which were masked with a solid black fill, matching the original background while retaining original dimensions, quality, and fundus content (Fig. 1). When applied, these steps were executed in batch mode to ensure consistent and uniform processing across all images. The average preprocessing time was 1.84 seconds per image, and all steps were completed by a single annotator (first author) using a standardized protocol. While we did not implement automated contrast normalization methods such as CLAHE, we visually compared the outputs of our Photoshop-based preprocessing pipeline to CLAHE-adjusted images and found that the resulting histograms were broadly similar.
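As a point of reference for the CLAHE comparison mentioned above, a minimal sketch of an automated alternative is shown below; it does not reproduce the Photoshop-based pipeline used in this study, and the file names are hypothetical. The sketch applies CLAHE to the luminance channel with OpenCV and compares grayscale histograms between the two outputs.

```python
# Sketch of the CLAHE comparison described above (illustrative only; the study's
# preprocessing itself was done in Adobe Photoshop). File names are hypothetical.
import cv2
import numpy as np

def clahe_luminance(path, clip_limit=2.0, tile_grid=(8, 8)):
    """Apply CLAHE to the L channel of a fundus photo and return the BGR result."""
    bgr = cv2.imread(path)
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    lab_eq = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab_eq, cv2.COLOR_LAB2BGR)

def gray_histogram(img):
    """256-bin grayscale histogram, normalized to sum to 1 for comparison."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    return hist / hist.sum()

photoshop_img = cv2.imread("fives_0001_photoshop.png")   # Photoshop-processed image
clahe_img = clahe_luminance("fives_0001_original.png")   # CLAHE-adjusted original

# Histogram intersection as a rough similarity check between the two pipelines.
h1, h2 = gray_histogram(photoshop_img), gray_histogram(clahe_img)
print("Histogram intersection:", float(np.minimum(h1, h2).sum()))
```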

Figure 1.

(A) Original FIVES age-related macular degeneration fundus photo. (B) Preprocessed FIVES age-related macular degeneration fundus photo. (C) Original EDDFS diabetic retinopathy fundus photo. (D) Preprocessed EDDFS diabetic retinopathy fundus photo. (E) Original EDDFS glaucoma fundus photo. (F) Preprocessed EDDFS glaucoma fundus photo. (G) Original FIVES healthy fundus photo. (H) Preprocessed FIVES healthy fundus photo.

Model Development and Performance Evaluation

We used Google Vertex AI's AutoML Vision platform (Alphabet Inc.) to train two separate models: a binary classifier (disease vs. normal), and a multi-class classifier (DR vs. AMD vs. glaucoma vs. normal). Data was uploaded to Google Cloud Storage using a CSV file linking image paths with class labels, all handled through the Vertex AI graphical interface. No separate data use agreement beyond Google Cloud's standard terms was required. As outlined in Google's Cloud Data Processing Addendum, all data is processed within a customer-controlled environment, and Google does not access or repurpose customer data outside of the approved project scope.27 Vertex AI automatically performed an 80/10/10 stratified split and Bayesian hyperparameter tuning, and we chose to allocate eight node-hours per model (latency target 200–300 ms), all without custom coding. The average monetary cost for training a single model was $27.72 USD, based on standard Google Cloud pricing as of February 2025. These figures may vary depending on cloud region, billing tier, and user-specific discounts/credit. Cross-validation is not currently supported within the platform. Instead, the model undergoes multiple internal training trials using the training and validation sets to identify an optimized model architecture, which is then evaluated on the held-out test set.28 Google does not disclose the final architecture, methodology of its image normalization process, model size, or hyperparameter search bounds, as the internal AutoML pipeline is proprietary. These limitations reflect the trade-offs of current no-code machine learning platforms, where ease of use comes at the expense of full technical transparency.
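For readers who wish to replicate the upload step, a minimal sketch of generating the Vertex AI dataset import CSV (one Cloud Storage URI and class label per row) is shown below. The bucket and folder names are placeholders, and the 80/10/10 split itself was performed automatically by the platform rather than encoded in the file.

```python
# Sketch of building the dataset import CSV for Vertex AI AutoML image
# classification: one row per image, containing its gs:// URI and its label.
# Bucket and folder names are hypothetical placeholders.
import csv
from pathlib import Path

BUCKET = "gs://my-fundus-bucket/fives"          # hypothetical bucket
CLASSES = ["AMD", "DR", "Glaucoma", "Normal"]   # one local folder per class

with open("vertex_import.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for cls in CLASSES:
        for img in sorted(Path(cls).glob("*.png")):
            # Each row: Cloud Storage path of the image, then its class label.
            writer.writerow([f"{BUCKET}/{cls}/{img.name}", cls])
```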

Internal performance was evaluated using the area under the precision-recall curve (AUPRC), along with precision, recall, sensitivity, specificity, positive and negative predictive values, accuracy, and F1 score, all calculated from the full set of model outputs by assigning the predicted label as the class with the highest confidence score. To assess internal stability, we trained two additional models for each classifier and observed consistent performance. External validation involved batch predictions, with the highest confidence score determining the predicted class for each image. Identical metrics were computed while examining confidence distributions (Fig. 2). The AUPRC value was calculated for the binary model using the confidence scores for the “Disease” class. For the multi-class model, AUPRC was estimated using a one-versus-rest approach for each class, based on predicted confidence scores for the top class only.
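The same metrics can be recomputed outside the platform from exported prediction results. The sketch below assumes a hypothetical export with the true label and one confidence column per class for each image (the multi-class AUPRC in this study was estimated from top-class scores only) and uses scikit-learn for the calculations.

```python
# Sketch of recomputing the reported metrics from exported confidence scores
# (file and column names are hypothetical; scikit-learn supplies the metrics).
import pandas as pd
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             average_precision_score)

preds = pd.read_csv("batch_predictions.csv")   # columns: true_label, plus one
classes = ["AMD", "DR", "Glaucoma", "Normal"]  # confidence column per class

# Predicted label = class with the highest confidence score.
preds["pred_label"] = preds[classes].idxmax(axis=1)

print("Accuracy:", accuracy_score(preds["true_label"], preds["pred_label"]))
prec, rec, f1, _ = precision_recall_fscore_support(
    preds["true_label"], preds["pred_label"], labels=classes)

for i, cls in enumerate(classes):
    # One-vs-rest AUPRC estimated from the per-class confidence scores.
    auprc = average_precision_score((preds["true_label"] == cls).astype(int),
                                    preds[cls])
    print(f"{cls}: precision={prec[i]:.3f} recall={rec[i]:.3f} "
          f"F1={f1[i]:.3f} AUPRC={auprc:.3f}")
```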

Figure 2.

Flow diagram illustrates the end-to-end process for training and evaluating both binary and multi-class classification models using Google Vertex AI AutoML. The internal dataset (FIVES) consisted of 800 labeled fundus images (200 each of AMD, DR, glaucoma, and normal), which were preprocessed using Adobe Photoshop and split into training (80%), validation (10%), and testing (10%) sets. Two classification tasks were created: a binary model (disease vs. normal) and a multi-class model (AMD, DR, glaucoma, normal). Both models were trained and validated internally. An external dataset (EDDFS), comprising 400 additional images (100 per class), was processed identically and used to independently test both models.

Data security was ensured using Google-managed encryption keys. The analytical code was not available because of the proprietary no-code nature of Vertex AI. Data analysis and visualization for external validation were performed using Python (v3.10).

Results

All confusion matrices for the internal and external performances of both models are available in Supplementary Tables S1 and S2. The binary classification model achieved an AUPRC of 0.967, precision and recall of 95.0%, sensitivity of 95.0%, specificity of 85.0%, accuracy of 92.5%, and F1 score of 0.95, with three false negatives and three false positives (Table 1). The multi-class model reached an AUPRC of 0.906, precision of 91.0%, recall of 90.0%, and accuracy of 87.5%; DR accounted for four misclassifications (three as normal, one as AMD), whereas glaucoma and AMD each had one false-negative result (Table 2). Performance metrics across repeated model training runs varied by an AUPRC value of ±0.003, suggesting stable outcomes despite the relatively small dataset.

Table 1.

Binary Classification Model Internal Performance Metrics

Label Precision (%) Recall/Sensitivity (%) Specificity (%) Positive Predictive Value (%) Negative Predictive Value (%) Accuracy* F1 Score AUPRC
Disease 95.0 95.0 85.0 95.0 85.0 0.950 0.950 0.981
Normal 0.850 0.910

AUPRC, Area Under the Precision-Recall Curve.

* Accuracy values are expressed as counts and percentages out of the total number of samples per class: Disease (n = 60), Normal (n = 20).

Table 2.

Multi-Class Classification Model Internal Performance Parameters

Label Precision (%) Recall/Sensitivity (%) Specificity (%) Positive Predictive Value (%) Negative Predictive Value (%) Accuracy* F1 Score AUPRC
AMD 82.2 90.0 96.7 85.7 93.5 0.900 0.900 0.896
DR 84.2 80.0 95.0 84.2 93.7 0.800 0.821 0.885
Glaucoma 85.7 90.0 98.3 85.7 95.2 0.900 0.923 0.981
Normal 90.0 90.0 93.3 81.8 96.7 0.900 0.857 0.888
* Accuracy values are expressed as counts and percentages out of the total number of samples per class (n = 20).

External Validation

Externally, the binary model's accuracy was 92.3%, with disease precision of 95.9% and recall of 93.7%, normal precision of 82.2% and recall of 88.0%, and an overall AUPRC of 0.978 (Table 3); correct predictions had mean confidences of 0.886 (disease) and 0.866 (normal), whereas false positives averaged 0.793 and false negatives 0.639, including four high-confidence (>0.90) errors (1.0% of images) (Table 4; Fig. 3A). Images with a “normal” prediction had a wider confidence spread than those with a “disease” prediction (Fig. 3B).

Table 3.

Binary Classification Model External Performance Metrics

Label Precision (%) Recall/Sensitivity (%) Specificity (%) Positive Predictive Value (%) Negative Predictive Value (%) Accuracy* F1 Score AUPRC
Disease 95.9 93.7 88.0 95.9 82.2 0.937 0.948 0.978
Normal 0.880 0.921
* Accuracy values are expressed as counts and percentages out of the total number of samples per class: Disease (n = 300), Normal (n = 100).

Table 4.

Binary Classification Model External Evaluation Confidence and Error Rates by Class

Label Correct Average Confidence (%) Number of False-Positives N (%)* False-Positive Average Confidence (%) Number of False-Negatives N (%)* False-Negative Avg. Confidence (%)
Disease 0.886 12 (0.12) 0.793 19 (0.06) 0.639
Normal 0.866
* Percentages indicate the false-positive and false-negative rates, respectively.

Figure 3.

(A) Average confidence scores heatmap for each combination of true and predicted labels in the binary classification model. (B) Box-and-whisker plot of the binary classification model confidence scores for each predicted label on external validation.

The multi-class model's overall accuracy was 90.0%, with glaucoma exhibiting the highest recall (98.0%) and AMD the lowest (82.0%), often misclassified as glaucoma (Table 5). Precision, recall, F1 scores, and estimated AUPRC values for each class (calculated using the one-vs.-rest methodology) are summarized in Table 5. All classes demonstrated an F1 score > 0.87. Mean confidences for correct predictions were 0.836 for normal, 0.804 for glaucoma, 0.731 for DR, and 0.594 for AMD, with five high-confidence errors (>0.90; 1.3% of images) (Table 6, Fig. 4A). AMD predicted labels had the widest variation in confidence, whereas normal, glaucoma, and DR label variation was similar (Fig. 4B).

Table 5.

Multi-Class Classification Model External Performance Metrics

Label Precision (%) Recall/Sensitivity (%) Specificity (%) Positive Predictive Value (%) Negative Predictive Value (%) Accuracy* F1 Score AUPRC
AMD 93.2 82.0 97.9 93.2 94.2 0.820 0.872 0.845
DR 98.9 88.0 99.6 98.9 96.1 0.880 0.931 0.908
Glaucoma 79.7 98.0 91.3 79.7 99.3 0.980 0.879 0.848
Normal 92.0 92.0 97.1 92.0 97.3 0.920 0.920 0.923
* Accuracy values are expressed as counts and percentages out of the total number of samples per class (n = 100).

Table 6.

Multi-Class Classification Model Confidence and Error Rates by Class

Label Correct Average Confidence (%) Number of False-Positives N (%)* False-Positive Average Confidence (%) Number of False-Negatives N (%)* False-Negative Avg. Confidence (%)
AMD 0.594 6 (0.02) 0.467 18 (0.18) 0.642
DR 0.731 1 (0.003) 0.641 12 (0.12) 0.709
Glaucoma 0.804 25 (0.03) 0.668 2 (0.02) 0.593
Normal 0.836 8 (0.03) 0.583 8 (0.08) 0.566
* Percentages indicate the false-positive and false-negative rates, respectively.

Figure 4.

(A) Average confidence scores heatmap for each combination of true and predicted labels in the multi-class classification model. (B) Box-and-whisker plot of the multi-class classification model confidence scores for each predicted label on external validation.

Discussion

Our findings demonstrate that no-code AutoML can empower clinicians to build high-performing models for classifying multiple ocular diseases without programming. The binary and multi-class Vertex AI models achieved AUPRCs of 0.967 and 0.906, respectively, comparable to expert-built CNNs: AMD classifiers (0.96–0.987),29,30 glaucoma CNNs (AUROC 0.92–0.97),31,32 DR models (0.94),33 and models for the classification of multiple diseases (AUC 0.93).34,35 Unlike traditional pipelines, our approach reduced the need for programming or specialized technical expertise, potentially democratizing AI development for clinicians. In addition, the AutoML platform handled architecture selection, hyperparameter tuning, and resource allocation automatically, reducing the technical burden typically associated with traditional CNNs. For instance, training and developing both our binary and multi-class models required just 105 minutes on average. The models showed only a modest performance decline on external validation, suggesting robustness to imaging variations from different hardware and image sizes.

Alternate Google Vertex AI AutoML models examining singular pathologies have reported AUPRCs and AUROCs of 0.717–0.99 for DR,11,14,15,17,36 0.849 for AMD,18 and 0.988 for glaucoma.37 Our binary classifier's AUPRC of 0.967 and sensitivity of 95.0% for detecting multiple pathologies highlight its potential use for broad, multi-disease screening. However, although it demonstrated strong comparable performance, our multi-class model, like other multi-label CNNs, showed some class-specific weaknesses, reflecting the well-known difficulty of simultaneously distinguishing multiple ophthalmic diseases.

The confidence pattern analysis revealed interesting trends, particularly that the models were most confident in normal eye and glaucoma classifications, while showing less certainty for AMD and DR cases (Table 6). For AMD, correct cases averaged confidence of 0.594, but false-negative results were higher at 0.642. DR correct cases averaged a confidence of 0.731, with false-negative results at 0.709. These high-confidence errors underscore the need for human review, calibration, and pathology-specific confidence thresholds. In the binary setting, correct “disease” predictions carried very high confidence (0.886), as did correct “normal” calls (0.866), whereas false-negative results (missed disease) had substantially lower confidence (0.639). This gap suggests that low-confidence disease predictions are more likely to be errors and could be flagged for human review (e.g., reviewing any confidence ≤ 0.80).
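A minimal sketch of such a review rule, assuming a single illustrative confidence cutoff of 0.80 (any threshold would need calibration on local data), is shown below.

```python
# Sketch of the low-confidence review rule suggested above (illustrative only;
# the 0.80 cutoff is an example and would require local calibration).
def triage(pred_label: str, confidence: float, cutoff: float = 0.80) -> str:
    """Route any low-confidence prediction to human review; otherwise auto-report."""
    if confidence <= cutoff:
        return "human review"
    return f"auto-report: {pred_label}"

print(triage("disease", 0.62))  # -> human review
print(triage("normal", 0.91))   # -> auto-report: normal
```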

Clinical Usability and Operator Considerations

Our study addressed a significant gap in AI implementation in ophthalmology by demonstrating that physicians can create effective classification models using AutoML platforms, with potential application in screening. This democratization of AI development could help reduce technical and economic barriers, facilitate deployment in primary care, outreach, and specialist settings, and encourage more clinicians to engage in the growth of AI use in medical care. The literature supports the feasibility and impact of autonomous AI screening; for example, Beals et al. deployed 198 fundus cameras with IDx-DR across five health systems, screening nearly 20,000 diabetic patients and referring over 3450 cases of DR.38 Other studies report high patient satisfaction and successful operation of fundus cameras by non-specialist staff after brief training.39 AI-integrated handheld fundus cameras are gaining popularity, achieving high detection metrics without racial or gender bias, and further supporting the viability of AI screening in low-resource settings.7,40 AutoML's flexibility, supporting both multi-class and binary screening, means it can be adapted to specific clinical needs, such as a DR-only tool in general practice or an all-disease system in screening settings.

One example of how the model could be used in practice is through a simple web-based interface, where a technician uploads a fundus image and receives a prediction and confidence score in real time. Google Vertex AI also allows models to be deployed as cloud-hosted REST API endpoints—a common and user-friendly format that enables other applications (e.g., a clinic's software or mobile app) to send images and receive predictions without needing to manage the model directly.
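As an illustration, a prediction request to such a deployed endpoint could look like the sketch below. The project, region, and endpoint identifiers are placeholders, and the payload shape follows Vertex AI's documented format for AutoML image classification endpoints; authentication details are simplified.

```python
# Sketch of calling a deployed Vertex AI endpoint over REST, as described above.
# Project, region, and endpoint IDs are placeholders.
import base64
import requests

PROJECT, REGION, ENDPOINT_ID = "my-project", "us-central1", "1234567890"  # placeholders
URL = (f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
       f"/locations/{REGION}/endpoints/{ENDPOINT_ID}:predict")

with open("fundus_photo.png", "rb") as f:
    payload = {
        "instances": [{"content": base64.b64encode(f.read()).decode("utf-8")}],
        "parameters": {"confidenceThreshold": 0.0, "maxPredictions": 4},
    }

# An OAuth 2.0 access token is required (e.g., from `gcloud auth print-access-token`).
resp = requests.post(URL, json=payload,
                     headers={"Authorization": "Bearer ACCESS_TOKEN"})
print(resp.json())  # predicted class names and confidence scores
```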

It is important to note, however, that although AI can enhance clinic productivity and consistency, its successful implementation requires careful integration into clinical workflows, patient and provider education, and clear follow-up protocols and raises important ethical considerations. Over-reliance on AI, particularly in cases of high-confidence misclassification, could lead to missed diagnoses or unnecessary anxiety. Dual-review systems for positive findings, regular model updates, and continuous monitoring could help mitigate these risks. Ultimately, responsibility for clinical decisions must remain with the physician, with AI serving as a supportive tool.

Our preprocessing approach using widely available software demonstrates that clinicians can prepare images for AI analysis without specialized technical training. In practice, operators only need to be trained to capture fundus images reliably, and existing automated image-quality checks within modern cameras can further reduce errors. Future work should explore the minimum training required for different implementation scenarios and develop standardized protocols to ensure consistent image quality across operators. With that said, AutoML presents the novel opportunity for clinicians to rapidly create and deploy ML models tailored to their own practice, patient population, and clinical objectives.

Limitations

This study has several limitations. Our datasets were primarily composed of Asian (Chinese) fundus images, which may limit generalizability, because fundus characteristics can differ across races.41 Future research should prioritize diverse, multi-ethnic datasets for model development and validation. Only three diseases and normal eyes were included, omitting other important retinal pathologies. Future work should expand the model's scope to include additional diseases or flag abnormal findings for further review. Although the sample size was smaller than in some other ML studies, this limitation stemmed from the scarcity of high-quality datasets with sufficiently represented pathologies. However, the strong model performance and tight confidence score distributions suggest that the sample was adequate to demonstrate generalizability across varied imaging conditions. Future research could entail training this same dataset using a deep learning model for comparison. Furthermore, this study excluded images with multiple coexisting diagnoses to ensure clean ground truth labeling and balanced class distribution. However, multi-label classification reflects real-world clinical complexity, and future work should explore the development of AutoML models that can simultaneously identify multiple pathologies within a single image. The use of AutoML inherently limits customization and interpretability, as the algorithm's details are proprietary.

Conclusions

In summary, our code-free AutoML model demonstrated strong performance in classifying diabetic retinopathy, age-related macular degeneration, and glaucoma. Moreover, this study highlights how clinician-scientists can conceptualize and build custom AI tools without programming expertise, using platforms like AutoML. While not a substitute for clinical implementation studies, these models hold promise as foundational tools for future screening workflows—especially in primary care or underserved settings—when paired with appropriate validation, ethical safeguards, and operator training.

Supplementary Material

Supplement 1
tvst-14-9-16_s001.docx (17KB, docx)
Supplement 2
tvst-14-9-16_s002.docx (15.5KB, docx)

Acknowledgments

Supported by Research to Prevent Blindness (TL) and NIH grant P30-EY026877 (TL).

Data Availability Statements: The public datasets are available: The FIVES dataset at https://figshare.com/articles/figure/FIVES_A_Fundus_Image_Dataset_for_AI-based_Vessel_Segmentation/19688169?file=34969398 (accessed on January 16, 2025). The EDDFS dataset at https://github.com/xia-xx-cv/EDDFS_dataset/tree/main (accessed April 5, 2025).

Disclosure: T. Lin, None; T. Leng, Luxa Biotechnology (R), Astellas (R, C), Boehringer Ingelheim (C), Roche/Genentech (C), Toku (C), Topcon (C), Virtual Field (C)

References

1. Steinmetz JD, Bourne RRA, Briant PS, et al. Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. Lancet Global Health. 2021; 9(2): e144–e160.
2. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018; 1(1): 39.
3. Gargeya R, Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology. 2017; 124: 962–969.
4. Schlegl T, Waldstein SM, Bogunovic H, et al. Fully automated detection and quantification of macular fluid in OCT using deep learning. Ophthalmology. 2018; 125: 549–558.
5. Yoo TK, Choi JY, Seo JG, Ramasubramanian B, Selvaperumal S, Kim DW. The possibility of the combination of OCT and fundus images for improving the diagnostic accuracy of deep learning for age-related macular degeneration: a preliminary experiment. Med Biol Eng Comput. 2019; 57: 677–687.
6. Diaz-Pinto A, Morales S, Naranjo V, Köhler T, Mossi JM, Navea A. CNNs for automatic glaucoma assessment using fundus images: an extensive validation. Biomed Eng Online. 2019; 18: 1–19.
7. Abràmoff MD, Lavin PT, Jakubowski JR, et al. Mitigation of AI adoption bias through an improved autonomous AI system for diabetic retinal disease. NPJ Digit Med. 2024; 7: 369.
8. Li Z, Wang L, Wu X, et al. Artificial intelligence in ophthalmology: the path to the real-world clinic. Cell Rep Med. 2023; 4(7): 101095.
9. Jennings MR, Turner C, Bond RR, et al. Code-free cloud computing service to facilitate rapid biomedical digital signal processing and algorithm development. Comput Methods Programs Biomed. 2021; 211: 106398.
10. Sundberg L, Holmström J. Democratizing artificial intelligence: how no-code AI can leverage machine learning operations. Business Horizons. 2023; 66: 777–788.
11. Mohammadi SS, Nguyen QD. A user-friendly approach for the diagnosis of diabetic retinopathy using ChatGPT and automated machine learning. Ophthalmol Sci. 2024; 4(4): 100495.
12. Faes L, Wagner SK, Fu DJ, et al. Automated deep learning design for medical image classification by health-care professionals with no coding experience: a feasibility study. Lancet Digit Health. 2019; 1(5): e232–e242.
13. Antaki F, Kahwati G, Sebag J, et al. Predictive modeling of proliferative vitreoretinopathy using automated machine learning by ophthalmologists without coding experience. Sci Rep. 2020; 10(1): 19528.
14. Jacoba CMP, Doan D, Salongcay RP, et al. Performance of automated machine learning for diabetic retinopathy image classification from multi-field handheld retinal images. Ophthalmol Retina. 2023; 7: 703–712.
15. Korot E, Guan Z, Ferraz D, et al. Code-free deep learning for multi-modality medical image classification. Nat Mach Intell. 2021; 3: 288–298.
16. Kim IK, Lee K, Park JH, Baek J, Lee WK. Classification of pachychoroid disease on ultrawide-field indocyanine green angiography using auto-machine learning platform. Br J Ophthalmol. 2021; 105: 856–861.
17. Silva PS, Zhang D, Jacoba CMP, et al. Automated machine learning for predicting diabetic retinopathy progression from ultra-widefield retinal images. JAMA Ophthalmol. 2024; 142: 171–178.
18. Abbas A, O'Byrne C, Fu DJ, et al. Evaluating an automated machine learning model that predicts visual acuity outcomes in patients with neovascular age-related macular degeneration. Graefes Arch Clin Exp Ophthalmol. 2022; 260: 2461–2473.
19. Liu J, Gibson E, Ramchal S, et al. Diabetic retinopathy screening with automated retinal image analysis in a primary care setting improves adherence to ophthalmic care. Ophthalmol Retina. 2021; 5: 71–77.
20. Srihatrai P, Hlowchitsieng T. The diagnostic accuracy of single- and five-field fundus photography in diabetic retinopathy screening by primary care physicians. Indian J Ophthalmol. 2018; 66: 94–97.
21. Davidson S, de Souza WR Jr, Eggleton K. Use of a smartphone-based, non-mydriatic fundus camera for patients with red flag ophthalmic presentations in a rural general practice. J Primary Health Care. 2024.
22. Solomon SD, Shoge RY, Ervin AM, et al. Improving access to eye care: a systematic review of the literature. Ophthalmology. 2022; 129(10): e114–e126.
23. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020; 368: m689.
24. Collins GS, Moons KGM, Dhiman P, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024; 385: e078378.
25. Jin K, Huang X, Zhou J, et al. FIVES: a fundus image dataset for artificial intelligence based vessel segmentation. Sci Data. 2022; 9(1): 475.
26. Xia X, Li Y, Xiao G, et al. Benchmarking deep models on retinal fundus disease diagnosis and a large-scale dataset. Signal Process Image Commun. 2024; 127: 117151.
27. Google Cloud. Cloud Data Processing Addendum (Customers). Available at: https://cloud.google.com/terms/data-processing-addendum. Accessed July 19, 2025.
28. Google Cloud Blog. Hyperparameter tuning in Cloud Machine Learning Engine using Bayesian optimization. Available at: https://cloud.google.com/blog/products/ai-machine-learning/hyperparameter-tuning-cloud-machine-learning-engine-using-bayesian-optimization. Accessed April 4, 2025.
29. Hsu T-K, Lai IP, Tsai M-J, et al. A deep learning approach for the screening of referable age-related macular degeneration: model development and external validation. J Formosan Med Assoc. Preprint posted online December 15, 2024. doi: 10.1016/j.jfma.2024.12.008.
30. Dominguez C, Heras J, Mata E, Pascual V, Royo D, Zapata MÁ. Binary and multi-class automated detection of age-related macular degeneration using convolutional- and transformer-based architectures. Comput Methods Programs Biomed. 2023; 229: 107302.
31. Shoukat A, Akbar S, Hassan SA, Iqbal S, Mehmood A, Ilyas QM. Automatic diagnosis of glaucoma from retinal images using deep learning approach. Diagnostics. 2023; 13: 1738.
32. Ajitha S, Akkara JD, Judy MV. Identification of glaucoma from fundus images using deep learning techniques. Indian J Ophthalmol. 2021; 69: 2702–2709.
33. Tăbăcaru G, Moldovanu S, Răducan E, Barbu M. A robust machine learning model for diabetic retinopathy classification. J Imaging. 2023; 10(1): 8.
34. He J, Li C, Ye J, Wang S, Qiao Y, Gu L. Classification of ocular diseases employing attention-based unilateral and bilateral feature weighting and fusion. Piscataway, NJ: IEEE; 2020: 1258–1261.
35. Li C, Ye J, He J, Wang S, Qiao Y, Gu L. Dense correlation network for automated multi-label ocular disease detection with paired color fundus photographs. Piscataway, NJ: IEEE; 2020: 1–4.
36. Zago Ribeiro L, Nakayama LF, Malerbi FK, Regatieri CVS. Automated machine learning model for fundus image classification by health-care professionals with no coding experience. Sci Rep. 2024; 14(1): 10395.
37. Milad D, Antaki F, Mikhail D, et al. Code-free deep learning glaucoma detection on color fundus images. Ophthalmol Sci. 2025; 5(4): 100721.
38. Beals D, Simon L, Rogers F, Pogroszewski S. Revolutionizing diabetic retinopathy screening: integrating AI-based retinal imaging in primary care. J CME. 2025; 14(1): 2437294.
39. Bhambhwani V, Whitestone N, Patnaik JL, Ojeda A, Scali J, Cherwek DH. Feasibility and patient experience of a pilot artificial intelligence-based diabetic retinopathy screening program in Northern Ontario. Ophthalmic Epidemiol. 2024: 1–7.
40. Rao DP, Savoy FM, Sivaraman A, et al. Evaluation of an AI algorithm trained on an ethnically diverse dataset to screen a previously unseen population for diabetic retinopathy. Indian J Ophthalmol. 2024; 72: 1162–1167.
41. Burlina P, Joshi N, Paul W, Pacheco KD, Bressler NM. Addressing artificial intelligence bias in retinal diagnostics. Transl Vis Sci Technol. 2021; 10(2): 13.

