Diabetes affects over half a billion people worldwide, with an estimated >43% of individuals with diabetes remaining undiagnosed (1). There is a socioeconomic divide, with four in five adults with diabetes living in low- and middle-income countries (LMIC) with limited access to resources (2,3). Diabetic retinopathy (DR) is a major complication of diabetes, as nearly half of individuals with diabetes are likely to develop DR, and it is the leading cause of blindness among the working-age population (4,5). Current American Diabetes Association and International Diabetes Federation guidelines call for biennial screening for all people with diabetes and annual screening for those at higher risk (6,7). Using more conservative estimates, ∼22% of people with diabetes already have DR (needing annual exams) and the remaining ∼78% require biennial exams, which together implies ∼360 million exams per year worldwide (8,9). This demand is untenable for current health systems, whether in high-income countries (HIC) with better technology but high manpower costs or in LMIC with lower labor costs but limited accessible technology. There are only four Food and Drug Administration–approved devices for automated deep learning (DL)-based DR detection, three of which are tabletop and only one of which is portable, and their reach is unable to match the demand. Glucagon-like peptide 1 receptor agonists help with many other diabetes complications, but DR is one area where they have not improved outcomes (10).
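The ∼360 million figure can be reproduced with a back-of-the-envelope calculation: people with DR contribute one exam per year, and everyone else contributes half an exam per year. The short Python sketch below illustrates that arithmetic only. The function name and the population value of 589 million are assumptions made here (the text states only "over half a billion"), and the 22%/78% split is taken from the paragraph above.

```python
def annual_exam_demand(population, dr_share=0.22):
    """Exams per year if people with DR are screened annually and the rest every 2 years."""
    annual_group = population * dr_share          # one exam per year
    biennial_group = population * (1 - dr_share)  # one exam every two years
    return annual_group * 1.0 + biennial_group * 0.5


# ASSUMPTION: 589 million people with diabetes; the text says only "over half a billion".
ASSUMED_POPULATION = 589_000_000

demand = annual_exam_demand(ASSUMED_POPULATION)
print(f"{demand / 1e6:.0f} million exams per year")  # about 359 million under these assumptions
```

Under these assumptions, each person with diabetes generates on average 0.61 exams per year, so the total scales linearly with whichever prevalence estimate is used.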
In this issue of Diabetes Care, Ran et al. (11) present a systematic review of real-world prospective validation and economic evaluation of DL-based DR detection from fundus photography. The authors are to be commended for shedding light on this understudied problem. Fundus photography is a noninvasive diagnostic tool that allows careful inspection of the retina and optic nerves to determine the presence and severity of DR (12). DL-based DR detection holds promise for expanding access to clinical care and preventing vision loss (13). Given the pace of technological progress, DL-based DR detection is likely to become an inevitable partner in clinical care, and the only remaining question is how these tools can be integrated safely and equitably. The authors reviewed 47 prospective validation studies and 15 economic evaluations across 20 countries and six continents. They report >95% discriminative accuracy with DL-based DR detection but highlight that contextual factors such as mydriasis, care pathway (primary vs. specialist settings), DL models, and hardware configurations significantly affect detection performance. At the same time, their article lays bare how sparse the evidence from LMIC is, with only 5 studies from LMIC versus 35 from HIC. In summary, this review highlights how uneven the evidence for DL-based DR detection systems is, with many studies lacking appropriate technical detail and evidence of economic feasibility.
On the methods side, too many papers prize headline performance metrics (precision-recall) and reward newer models for outperforming existing models on private data sets rather than following a reproducible “methods spine” that includes transparent data curation and a detailed model development process (14). This invokes the old maxim “garbage in, garbage out”: low-quality, biased inputs and poorly curated labels produce confident but unreliable predictions, often dubbed “hallucinations,” regardless of how comprehensive the technical descriptions appear (15). Reporting details such as data processing, model training, and testing on holdout sets are often skipped as too technical for clinical audiences, even though they are fundamental for real-world implementation. Despite multiple reporting standards (TRIPOD+AI, MI-CLAIM-GEN) setting out expectations (16,17), papers often drift toward p-hacking by cherry-picking data and exploiting loopholes (18). For example, explaining models through generic saliency heat maps and variable-importance plots is scientifically motivated, but these tools become unreliable when the data are collinear, which is often not reported. A transparent scientific description of how accurately the technology detects true positives, how many false alarms it generates (a major deterrent for uptake), and the underlying population of the study is of critical importance (19). For real-world implementation of DL-based DR detection systems, the path forward must include rigorously curated data sets, full disclosure of the data and modeling pipeline in line with existing guidelines, and a deliberate preference for fewer but higher quality studies over a growing volume of poorly described models.
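To make concrete why true positives, false alarms, and the study population need to be reported together, the following Python sketch converts a sensitivity, a specificity, and a prevalence into expected counts for a screened cohort. It is an illustration only: the function name and the sensitivity and specificity values are hypothetical and are not taken from the review, and the 22% prevalence simply reuses the conservative DR estimate quoted earlier.

```python
def screening_outcomes(n_screened, prevalence, sensitivity, specificity):
    """Expected screening results for a cohort, from standard diagnostic-accuracy definitions."""
    with_dr = n_screened * prevalence
    without_dr = n_screened - with_dr

    true_pos = with_dr * sensitivity      # cases correctly flagged
    false_neg = with_dr - true_pos        # cases missed
    true_neg = without_dr * specificity   # healthy eyes correctly cleared
    false_pos = without_dr - true_neg     # false alarms sent for referral

    referred = true_pos + false_pos
    return {
        "true_positives": true_pos,
        "missed_cases": false_neg,
        "false_alarms": false_pos,
        # Share of referrals that truly have DR (positive predictive value).
        "ppv": true_pos / referred if referred else float("nan"),
    }


# HYPOTHETICAL inputs: 1,000 people screened, 22% prevalence,
# 95% sensitivity, and 90% specificity.
result = screening_outcomes(1000, 0.22, 0.95, 0.90)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Rerunning the sketch with a lower prevalence shows the positive predictive value falling even though sensitivity and specificity are unchanged, which is one reason the underlying population matters when judging the false-alarm burden.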
On the economic front, the review found that artificial intelligence (AI) plus human interaction was cost-effective relative to humans alone, but the reporting of initial investments and potential health benefits varied substantially. Cost-effectiveness estimates seem to rest on two premises: 1) earlier, more accurate detection than usual practice should enable timely and targeted treatment, and 2) improved detection should reduce missed diagnoses and late, resource-intensive care, creating cost offsets and improved health outcomes. However, the review provides little detail on cost increases (staff training and workflow changes) and on the drivers of cost-effectiveness (health benefits and improved triage) from AI use compared with humans alone, both of which vary between HIC and LMIC. If access to treatment does not expand alongside better detection, health benefits will not improve, and cost-effectiveness arguments weaken for resource-constrained systems. The constraints also differ by setting: HIC have stable workflows but high labor costs, whereas LMIC have a larger available workforce but limited technology and nonstandardized workflows. Estimating cost-effectiveness is therefore difficult because it must reconcile patient outcomes, access to care, and system efficiency. For example, Google has licensed DL-based DR models to hospitals in India and Thailand and reports >600,000 screenings as of 2024, yet peer-reviewed evidence beyond press releases remains scarce (20). An earlier review (21) reached similar conclusions: despite high diagnostic accuracy, real-world impact is constrained by underlying context (mydriasis, care setting), nonstandardized reporting, and a lack of robust head-to-head comparisons of available systems on benchmark data sets. Although these issues were highlighted almost 2 years ago, they remain unresolved, with several systems still failing to outperform human graders in practice.
In other words, a one-size-fits-all solution clearly does not work for DL-based DR detection systems (22).
For real-world implementation, success will depend on answering two questions: 1) where the costs fall and 2) what the technology is being compared with in usual care. In HIC, much of the digital infrastructure is already in place, whereas in LMIC the cost of establishing it can be prohibitively large. In HIC, DL-based DR technologies usually compete with established DR screening programs, so the main benefit is extra efficiency and capacity. In many LMIC, the realistic comparator is opportunistic screening or no screening at all, so even a modest, low-cost DL-based DR pathway can be a genuine win. Therefore, choosing the right comparator is key to justifying the principle of “some care is better than no care” (23). That choice shapes how AI can realistically fit into clinical pathways. For patients, benefits will come from earlier detection and faster access to treatment to prevent vision-related disability. For health systems, improved triage will lead to fewer missed cases and lower costs. At the same time, insights from explainable models can support frugal engineering by identifying which image features must be preserved so that low-cost systems can still function as a first-line option in low-resource clinics while more advanced technology is reserved for specialist centers. An analogous shift has already happened in cardiovascular care, where commercial wearable devices costing less than U.S. $10 can provide real-time heart rate and rhythm information at scale while 12-lead electrocardiogram machines serve in hospitals for advanced care (24). In that context, portable devices, including smartphone-based adapters, are likely to be central to reaching ∼360 million annual screenings for populations that currently lack DR screening.
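The influence of the comparator can be illustrated with the incremental cost-effectiveness ratio (ICER), a standard health-economics quantity that this commentary does not itself name: the extra cost of a new pathway divided by its extra effect relative to the alternative. The Python sketch below is a minimal illustration under stated assumptions. Every cost and detection figure is hypothetical and chosen only to show the mechanics; none comes from the review.

```python
def icer(cost_new, effect_new, cost_comparator, effect_comparator):
    """Incremental cost per additional unit of effect versus a chosen comparator."""
    delta_effect = effect_new - effect_comparator
    if delta_effect == 0:
        raise ValueError("ICER is undefined when both options have the same effect")
    return (cost_new - cost_comparator) / delta_effect


# HYPOTHETICAL values per 1,000 people: (total cost in U.S. $, DR cases detected).
dl_pathway = (15_000, 200)
established_program = (14_000, 190)  # assumed comparator in a high-income setting
no_screening = (0, 0)                # assumed comparator where no program exists

vs_program = icer(*dl_pathway, *established_program)
vs_nothing = icer(*dl_pathway, *no_screening)

print(f"vs established program: ${vs_program:.0f} per additional case detected")
print(f"vs no screening:        ${vs_nothing:.0f} per additional case detected")
```

With these made-up inputs the same pathway costs more per additional case when judged against an existing program than when judged against no screening, which is the sense in which the choice of comparator shapes the economic argument.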
Figure 1.
A future pathway for DL-based DR detection systems that caters not only to the demands of health care systems in HIC but also to the realities of LMIC by balancing model performance and health outcomes. This can be achieved through providing trustworthy evidence to key stakeholders on technical details and economic value.
Article Information
Acknowledgments. During the course of preparing this work, the author(s) used ChatGPT to generate a baseline outline image of a man walking a tightrope between two buildings based on the author’s detailed description. Following the use of this tool/service, the author(s) formally reviewed the content for accuracy and made detailed changes in Microsoft PowerPoint to arrive at a publication-ready image. All captions and annotations accompanying the image are the author’s original idea. The authors take full responsibility for all the content of this publication.
Duality of Interest. No potential conflicts of interest relevant to this article were reported.
Handling Editors. The journal editors responsible for overseeing the review of the manuscript were Elizabeth Selvin and Adrian Vella.
Footnotes
See accompanying article, p. 510.
References
- 1. International Diabetes Federation. Diabetic Retinopathy: A Call for Global Action. Brussels, Belgium, International Diabetes Federation, 2024. Accessed 18 November 2025. Available from https://idf.org/media/uploads/2023/04/IAPB-IDF_Diabetic-retinopathy-A-call-for-global-action_policy-brief.pdf
- 2. Seiglie JA, Marcus M-E, Ebert C, et al. Diabetes prevalence and its relationship with education, wealth, and BMI in 29 low- and middle-income countries. Diabetes Care 2020;43:767–775
- 3. Stafford LK, Gage A, Xu YY, et al. Global, regional, and national cascades of diabetes care, 2000–23: a systematic review and modelling analysis using findings from the Global Burden of Disease Study. Lancet Diabetes Endocrinol 2025;13:924–934
- 4. Sivaprasad S, Wong TY, Gardner TW, Sun JK, Bressler NM. Diabetic retinal disease. Nat Rev Dis Primers 2025;11:62
- 5. National Eye Institute. Diabetic Retinopathy. Washington, DC, National Eye Institute, 11 September 2025. Accessed 22 November 2025. Available from https://www.nei.nih.gov/learn-about-eye-health/eye-conditions-and-diseases/diabetic-retinopathy
- 6. American Diabetes Association Professional Practice Committee. 12. Retinopathy, neuropathy, and foot care: Standards of Care in Diabetes—2025. Diabetes Care 2025;48(Suppl. 1):S252–S265
- 7. Ceriello A, Colagiuri S. IDF global clinical practice recommendations for managing type 2 diabetes - 2025. Diabetes Res Clin Pract 2025;222(Suppl. 1):112152
- 8. Sun H, Saeedi P, Karuranga S, et al. IDF Diabetes Atlas: global, regional and country-level diabetes prevalence estimates for 2021 and projections for 2045. Diabetes Res Clin Pract 2022;183:109119
- 9. Teo ZL, Tham Y-C, Yu M, et al. Global prevalence of diabetic retinopathy and projection of burden through 2045: systematic review and meta-analysis. Ophthalmology 2021;128:1580–1591
- 10. Ramsey DJ, Makwana B, Dani SS, et al. GLP-1 receptor agonists and sight-threatening ophthalmic complications in patients with type 2 diabetes. JAMA Netw Open 2025;8:e2526321
- 11. Ran AR, Ding JL, Tang Z, et al. Real-world prospective validation and economic evaluation of deep learning–based diabetic retinopathy detection from fundus photographs: a systematic review and meta-analysis. Diabetes Care 2026;49:510–525
- 12. Milea D, Najjar RP, Zhubo J, et al.; BONSAI Group. Artificial intelligence to detect papilledema from ocular fundus photographs. N Engl J Med 2020;382:1687–1695
- 13. Li J, Guan Z, Wang J, et al. Integrated image-based deep learning and language models for primary diabetes care. Nat Med 2024;30:2886–2896
- 14. Jacobs PG, Herrero P, Facchinetti A, et al. Artificial intelligence and machine learning for improving glycemic control in diabetes: best practices, pitfalls, and opportunities. IEEE Rev Biomed Eng 2024;17:19–41
- 15. Teno JM. Garbage in, garbage out-words of caution on big data and machine learning in medical practice. JAMA Health Forum 2023;4:e230397
- 16. Collins GS, Moons KGM, Dhiman P, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:e078378
- 17. Miao BY, Chen IY, Williams CY, et al. The MI-CLAIM-GEN checklist for generative artificial intelligence in health. Nat Med 2025;31:1394–1398
- 18. Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The extent and consequences of p-hacking in science. PLoS Biol 2015;13:e1002106
- 19. Dave D, Erraguntla M, Lawley M, et al. Improved low-glucose predictive alerts based on sustained hypoglycemia: model development and validation study. JMIR Diabetes 2021;6:e26909
- 20. Sawhney R. How AI is making eyesight-saving care more accessible in resource-constrained settings. Google Research Asia, 2024. Accessed 22 November 2025. Available from https://blog.google/around-the-globe/google-asia/arda-diabetic-retinopathy-india-thailand/
- 21. Rajesh AE, Davidson OQ, Lee CS, Lee AY. Artificial intelligence and diabetic retinopathy: AI framework, prospective studies, head-to-head validation, and cost-effectiveness. Diabetes Care 2023;46:1728–1739
- 22. Solomon SD, Chew E, Duh EJ, et al. Diabetic retinopathy: a position statement by the American Diabetes Association. Diabetes Care 2017;40:412–418
- 23. Koch R, Meara JG, Wall AE. How should we decide whether and when some care is better than no care? AMA J Ethics 2019;21:E729–E734
- 24. Chan J, Goel M, Gollakota S, Nandakumar R. Mobile medical systems for equitable healthcare. Nat Rev Bioeng 2025;3:855–874

