Abstract
Artificial intelligence (AI) has emerged as a transformative force in ophthalmology, enabling automated, accurate, and efficient clinical reporting. This review summarizes recent advances in AI-driven report generation, emphasizing the integration of multimodal imaging and clinical data. Deep learning and natural language processing (NLP) models can synthesize information from diverse sources—including fundus photography, optical coherence tomography, fluorescein angiography, and patient records—to generate structured, interpretable, and personalized diagnostic reports. Such systems enhance diagnostic precision, streamline workflow, and reduce interobserver variability. We outline the technological foundations underlying these systems, including convolutional and transformer-based architectures, self-supervised and multimodal learning, and large language models. Representative applications in diabetic retinopathy, glaucoma, cataract, and age-related macular degeneration are discussed, highlighting their clinical value and emerging real-world deployment. Persistent challenges—including data heterogeneity, model interpretability, ethical governance, and clinical integration—are critically reviewed. Finally, we explore future directions such as real-time AI-assisted reporting, predictive and personalized analytics, and global scalability across healthcare ecosystems. Multimodal, explainable, and clinically integrated AI systems hold promise to redefine ophthalmic diagnostics and improve both clinician efficiency and patient outcomes.
Keywords: Artificial intelligence, Multimodal imaging, Report generation, Ophthalmology
Key Summary Points
| The review highlights the emerging potential of artificial intelligence (AI)-driven automated report generation as an advanced tool for ophthalmic diagnostics. |
| This article explains how multimodal imaging and clinical data can be integrated using deep learning and natural language processing. |
| It summarizes representative clinical applications across major eye diseases and illustrates how automated reporting may enhance diagnostic consistency and streamline clinical workflow. |
| It provides a critical evaluation of current limitations, including data heterogeneity, limited interpretability, and the practical barriers that impede real-world clinical implementation. |
| It offers forward-looking perspectives on real-time reporting, predictive analytics, and the scalable deployment of clinically integrated multimodal AI systems. |
Introduction
Ophthalmic diseases are diverse and complex, including diabetic retinopathy (DR), glaucoma, and age-related macular degeneration (AMD), which are among the leading causes of visual impairment worldwide [1]. Before reaching a definitive diagnosis, clinicians often must perform complex differential diagnoses. With recent technological progress, ophthalmic examinations have become more sophisticated and diverse, and their results can now be generated instantaneously, providing substantial convenience in clinical decision-making [2, 3]. Ophthalmologists rely on patients’ clinical data, imaging data (such as ultra-widefield fundus photography, optical coherence tomography [OCT], and fundus fluorescein angiography [FFA]), and laboratory and pathological data (such as blood tests, biopsies, and cultures) to make accurate diagnoses [4]. However, in real-world practice, traditional ophthalmic diagnosis still depends heavily on physician experience and subjective judgment. The growing complexity of clinical work and the shortage of specialists capable of producing diagnostic reports have created bottlenecks, leading to delays and variability in reporting [5, 6].
Recent advances in natural language processing (NLP) and multimodal learning offer a path to relieving these bottlenecks. NLP enables machines to understand and generate medical language, while multimodal artificial intelligence (AI) allows the integration of heterogeneous data sources—imaging, clinical variables, and laboratory results—into unified representations. Such systems emulate the clinician’s reasoning process by combining visual cues with contextual patient information to produce reports that are both accurate and individualized. The VisionTrack and InterpreFFA frameworks exemplify this approach, combining convolutional neural networks (CNNs), graph neural networks, and large language models (LLMs) to generate high-fidelity ophthalmic reports that align closely with expert annotations. Importantly, these models not only assist in diagnosis but also enhance report standardization, workflow efficiency, and educational value for trainees and non-specialist clinicians.
Image-driven CNNs [7–9] and text-driven NLP models [10–13] have achieved remarkable success in ophthalmology. However, such single-modality systems remain limited in capturing the multidimensional characteristics of disease and cannot emulate the way clinicians integrate multi-source information for comprehensive diagnostic reasoning [14]. Multimodal AI therefore represents a paradigm shift from unidimensional image interpretation toward context-aware, patient-centric reporting.
Multimodal approaches, by contrast, offer both automation and enhanced accuracy. Unlike earlier single-modality networks that focused solely on lesion classification, these systems integrate fundus, OCT, and angiographic data with structured clinical inputs to mirror how ophthalmologists synthesize evidence in real-world practice. The ability to couple structured outputs (e.g., disease probability scores, lesion maps) with free-text descriptions (e.g., “microaneurysms with focal leakage temporal to fovea”) transforms AI from a detection tool into a documentation companion. In this way, AI-based report generation addresses not only diagnostic performance but also communication and continuity of care, which are integral to precision ophthalmology. For example, the VisionTrack platform combines CNN-based image feature extraction, a graph neural network modeling clinical risk factors, and an LLM parsing patient medical reports to predict multiple retinal diseases, achieving near-perfect accuracy on both OCT data and fundus data [15]. Moreover, multimodal AI systems can substantially accelerate clinical workflows, improve diagnostic accuracy, and reduce human errors by integrating diverse data sources, allowing ophthalmologists to focus on complex decision-making and patient care [16].
Despite these advances, several challenges hinder clinical translation. High-quality annotated datasets are limited by labor-intensive labeling and inter-observer variability. The heterogeneity of imaging devices, acquisition protocols, and patient demographics leads to domain shift and biases in model generalization. Moreover, the interpretability of generative models and the ethical implications of automated text generation—particularly regarding accountability and clinical trust—remain active areas of debate. Addressing these barriers will require not only algorithmic innovation but also concerted efforts in data standardization, cross-institutional validation, and regulatory harmonization.
This review aims to provide a comprehensive overview of AI-driven report generation in ophthalmology, emphasizing the integration of multimodal imaging and clinical data. It first summarizes the technological foundations—spanning deep learning (DL), NLP, self-supervised learning, and transformer-based multimodal architectures—and their applications in ophthalmic diagnostics. It then examines current real-world implementations, including disease-specific systems for DR, glaucoma, cataract, and AMD, and highlights how multimodal and LLM-based frameworks are reshaping clinical workflows. Finally, the review critically discusses the challenges of data quality, interpretability, ethics, and workflow integration, and explores future directions such as real-time AI-assisted reporting, predictive and personalized analytics, and scalable global deployment. By synthesizing current progress and outlining future opportunities, this article seeks to define the emerging frontier of automated, interpretable, and patient-centered report generation in ophthalmology. This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.
Technological Foundations
AI Algorithms and Models in Ophthalmology
AI algorithms in ophthalmology can be broadly conceptualized across three tiers—perception (image analysis), cognition (multimodal reasoning), and communication (report generation)—mirroring the clinical process of observing, interpreting, and documenting findings. AI in ophthalmology relies on a range of computational techniques, primarily machine learning (ML), DL, and NLP. ML enables computers to learn from large-scale data without explicit programming by identifying patterns and building predictive models for specific tasks [17]. DL, a subset of ML, employs hierarchical representation learning through multiple nonlinear layers, transforming low-level input features into higher-level abstract representations [18]. NLP, an interdisciplinary field bridging computer science, AI, and linguistics, allows machines to understand, process, and generate human language, supporting applications such as automated medical report generation, question-answering systems, and text summarization [19]. Figure 1 illustrates the technological foundations of AI-driven report generation. Together, these algorithms form the computational core that bridges visual perception and language understanding, enabling automated yet interpretable ophthalmic reporting.
Fig. 1.
Technological foundations of AI-driven report generation
These foundational techniques have been widely applied in ophthalmology, enabling AI models to analyze medical images, extract relevant features, and assist in the automated generation of diagnostic reports [20, 21]. Building upon these approaches, state-of-the-art models such as InterpreFFA leverage memory-driven transformer architectures like R2Gen, which capture dependencies from previous generation steps to produce comprehensive reports containing essential medical terminology and meaningful cross-modal attention maps [22]. Beyond these specific models, recent advancements in AI techniques further enhance capabilities: transformer architectures, originally dominant in NLP, are increasingly applied to medical imaging to capture long-range dependencies and spatial relationships, offering advantages over conventional CNNs [23]; diffusion models generate high-quality images through iterative noise-to-denoise processes, useful for image synthesis, reconstruction, and augmentation [24]; self-supervised learning leverages unlabeled data with pretext tasks (e.g., masked reconstruction, contrastive learning) to learn generalizable features, alleviating the scarcity of annotated datasets [25]; and foundation models, pretrained on large and diverse datasets, can be adapted with minimal task-specific data, with multimodal variants capable of jointly processing images and text, such as combining ophthalmic imaging with clinical reports [26]. Collectively, these technological innovations form the backbone of modern AI systems for ophthalmic diagnostics and automated report generation, enabling more accurate, efficient, and scalable clinical workflows. Table 1 summarizes the key concepts of AI in ophthalmology.
Table 1.
Key concepts underlying AI in ophthalmology
| Term | Description | Example application |
|---|---|---|
| CNN (convolutional neural network) | Learns spatial hierarchies for detecting anatomical or pathological patterns | DR screening, glaucoma detection |
| Transformer | Captures global dependencies and contextual relationships | Vision Transformer, R2Gen for report generation |
| Diffusion model | Iteratively reconstructs or augments images from noise | Synthetic OCT data generation |
| Self-supervised learning | Learns robust representations from unlabeled data | Masked image modeling in fundus datasets |
| Foundation model | Large-scale pretrained multimodal model adaptable to downstream tasks | RETFound, CLIP (Contrastive Language-Image Pre-training)-style retinal models |
These paradigms collectively enable scalable and generalizable AI applications for diagnostic interpretation and report synthesis. Building upon these algorithmic advances, multimodal frameworks now integrate diverse ophthalmic imaging modalities to emulate clinician-like reasoning.
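As a concrete illustration of the self-supervised pretext tasks summarized above and in Table 1, the following minimal sketch shows a SimCLR-style contrastive objective that could pretrain a fundus encoder on unlabeled images, assuming PyTorch and torchvision. The backbone, augmentations, projection size, and temperature are illustrative assumptions rather than any specific published ophthalmic pipeline.

```python
# Minimal sketch of a SimCLR-style contrastive pretext task for unlabeled fundus images.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms

# Applied twice per image to create the two "views" encoded as z1 and z2 below.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.ToTensor(),
])

class ContrastiveEncoder(nn.Module):
    """ResNet backbone plus projection head, trained without any labels."""
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Identity()                 # keep the 2048-d features
        self.backbone = backbone
        self.projector = nn.Sequential(
            nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, proj_dim))

    def forward(self, x):
        return F.normalize(self.projector(self.backbone(x)), dim=1)

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss: each view's positive is the other view of the same image."""
    z = torch.cat([z1, z2], dim=0)                  # (2N, d), unit-normalized
    sim = z @ z.t() / temperature                   # cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

After pretraining, the backbone would typically be fine-tuned on a small labeled set for a downstream task such as DR grading.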
Multimodal Imaging Integration
Figure 2 illustrates the conceptual architecture of multimodal ophthalmic report generation, integrating fundus, OCT, and angiographic images with clinical and textual data through attention-based fusion mechanisms. The integration of multimodal imaging modalities, such as fundus photography, OCT, and FFA, has gained increasing attention across medicine, including applications in cognitive impairment and Alzheimer’s disease, where it offers richer diagnostic insights than single-modality approaches [27, 28]. Multimodal fusion strategies can be broadly categorized as early fusion (feature-level concatenation) and late fusion (decision-level integration). Recent transformer-based architectures increasingly employ co-attention or cross-modal transformers to align imaging and clinical representations dynamically.
Fig. 2.
Multimodal integration framework for ophthalmic report generation
Integrating imaging data with clinical information is critical for generating accurate and context-aware ophthalmic reports. A common strategy involves embedding structured clinical data, such as diagnostic keywords or lab results, as semantic priors to enrich the representation of visual features and ensure meaningful report context [29]. Subsequently, these enhanced visual representations are fused with clinical embeddings via attention-based or graph-structured multimodal interaction mechanisms, effectively capturing cross-modal relationships [30]. This computational paradigm parallels clinical reasoning, where ophthalmologists synthesize structural imaging cues with contextual data such as systemic risk factors or visual function to form comprehensive diagnostic impressions.
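To make the fusion step more tangible, the following hedged sketch uses cross-attention so that structured clinical variables query image patch features, in the spirit of the attention-based interaction mechanisms described above; the feature dimensions, the 16 clinical variables, and the five-grade output are illustrative assumptions, not a specific published architecture.

```python
# Hedged sketch of attention-based fusion of image patch features with clinical data.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, n_heads=4, n_clinical=16, n_classes=5):
        super().__init__()
        self.clinical_proj = nn.Linear(n_clinical, dim)   # e.g., age, HbA1c, duration
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, image_tokens, clinical_feats):
        # image_tokens: (B, N_patches, dim) from a CNN/ViT encoder; clinical_feats: (B, n_clinical)
        query = self.clinical_proj(clinical_feats).unsqueeze(1)        # (B, 1, dim)
        fused, attn = self.cross_attn(query, image_tokens, image_tokens)
        return self.classifier(fused.squeeze(1)), attn                 # logits + attention map

# Usage with random tensors standing in for real encoder outputs:
model = CrossModalFusion()
logits, attn = model(torch.randn(2, 196, 256), torch.randn(2, 16))
```

The returned attention weights indicate which image regions most influenced the fused prediction, which can later be surfaced alongside the generated report text.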
To further align model outputs with clinical reasoning, external medical knowledge can be incorporated, for instance through graph attention networks over multimodal knowledge graphs, which improves the reliability and interpretability of report generation [31].
Real-world application further underscores the clinical value of multimodal fusion. For example, Liu et al. developed and validated an AI-based dual-modality system integrating fundus photography and OCT for DR screening in a community hospital. Beyond achieving an area under the curve (AUC) of 0.981, the system substantially increased AI-detected referral cases (from 53 to 85), highlighting not only its diagnostic accuracy but also its robustness and practical utility in improving patient referral outcomes [32]. Moreover, self-supervised multimodal representation learning combining fundus and OCT images demonstrates robust generalizability across multiple downstream tasks in independent clinical datasets, supporting enhanced clinical predictions [33].
Collectively, these developments demonstrate that multimodal AI not only improves detection accuracy but also transforms ophthalmic imaging into an integrated diagnostic ecosystem.
Data Integration and Interoperability
Seamless integration and interoperability of clinical and imaging data are crucial for high-quality AI-assisted report generation. Studies have demonstrated that linking ophthalmic imaging data with electronic health records (EHRs) can improve both the accuracy and clinical utility of generated reports [34–36]. For example, Mbagwu et al. evaluated the feasibility of cross-vendor linkage of ophthalmic images with de-identified EHR data from the IRIS (Intelligent Research in Sight) Registry, highlighting the potential of multi-source data integration [37]. Interoperability relies on adopting open standards such as DICOM-SR for structured imaging data, HL7-FHIR for health information exchange, and OMOP-CDM for cross-domain harmonization. These frameworks enable consistent metadata mapping and longitudinal patient tracking across devices and institutions.
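As a purely illustrative example of how such standards keep generated text, codes, and imaging references together, the simplified Python structure below mimics an HL7 FHIR DiagnosticReport-style resource; the identifiers, references, and wording are hypothetical placeholders, and this is not a validated FHIR profile.

```python
# Illustrative, simplified DiagnosticReport-style structure for an AI-generated report.
ai_report = {
    "resourceType": "DiagnosticReport",
    "status": "preliminary",                               # pending clinician sign-off
    "code": {"text": "AI-assisted retinal imaging report"},
    "subject": {"reference": "Patient/example-id"},
    "effectiveDateTime": "2025-01-01T10:30:00Z",
    "performer": [{"display": "AI reporting system (decision support only)"}],
    "media": [{"comment": "Macula-centred colour fundus photograph, right eye",
               "link": {"reference": "Media/fundus-od-example"}}],
    "conclusion": ("Moderate non-proliferative diabetic retinopathy, right eye; "
                   "microaneurysms and dot haemorrhages temporal to the fovea. "
                   "Ophthalmology referral recommended."),
}
```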
However, traditional imaging reports often store images and textual findings separately, limiting their interoperability. To address this, Berkowitz et al. proposed the Interactive Multimedia Reporting (IMR) framework, which standardizes infrastructure and communication protocols to integrate images, text, and other media, enhancing both clinician decision-making and patient communication [38]. Nevertheless, practical challenges persist, including proprietary image formats, inconsistent ontology mapping across vendors, and data privacy constraints that hinder federated interoperability. Addressing these barriers requires unified governance and collaborative benchmarking across institutions.
In terms of structured annotation, the PadChest-GR dataset demonstrates how chest X-ray images can be combined with bilingual report text and spatial lesion localization to support grounded, reproducible report generation [39]. Similarly, Shao et al. proposed the InterpreFFA framework, which integrates contrastive learning with generative models to automatically generate reports from FFA images, and its effectiveness was validated on multicenter datasets [22]. Collectively, these approaches show that standardization, structured annotation, and multimodal data integration can significantly enhance the quality, reproducibility, and clinical value of AI-driven medical reports. With these technological building blocks in place, the following section explores their translation into disease-specific clinical applications.
Current Applications in Ophthalmology
Automated Diagnostic Report Generation
Automated diagnostic report generation in ophthalmology has rapidly advanced in recent years, encompassing common conditions such as DR, glaucoma, and cataract. These developments mark a shift from static, model-centric outputs to dynamic, clinician-aligned reporting systems capable of generating structured interpretations and free-text narratives. Table 2 summarizes representative AI systems developed for automated report generation in ophthalmology. Although framed around report generation, many existing systems primarily evaluate diagnostic or classification performance, with textual output serving a supportive role [21, 30]. For genuinely text-generative models, standardized NLP metrics and clinician-based linguistic assessments are still inconsistently reported, limiting systematic evaluation of report quality and clinical usability [30, 39]. These frameworks reflect diverse design philosophies, ranging from explainable lesion-based pipelines to end-to-end vision–language systems; however, this diversity also highlights the lack of standardized evaluation for generated text and its clinical usability (a minimal sketch of common text-overlap metrics follows Table 2).
Table 2.
Representative AI systems for automated report generation in ophthalmology
| Model/system | Data types | Application scenario | Dataset size | Performance metrics | Clinical value |
|---|---|---|---|---|---|
| ExplAIn | DR fundus images | DR severity classification with lesion segmentation | Public DR datasets, > 10k images | High classification accuracy (AUC > 0.95), interpretable | Improves diagnostic consistency, reduces black-box concerns |
| DeepLensNet | Lens images | Automated classification of age-related cataracts (three types) | Multicenter clinical data, > 3k patients | Accuracy higher than ophthalmologists for nuclear sclerosis and cortical opacity | Enhances cataract subtype classification efficiency and accuracy |
| Glaucoma DL model | Fundus + OCT | Glaucoma diagnosis | Single center, thousands of cases | Accuracy 92%, AUC 0.95 | Multimodal fusion outperforms single-modality models, supports decision-making |
| InterpreFFA | FFA images | Automated FFA interpretation and report generation | Multicenter, thousands of FFA images | Improved resident accuracy (85.6% → 90.3%) | Reduces reporting time, supports resident training |
| FFA-GPT | FFA + text | Automated reporting + interactive Q&A | Mid-size multimodal dataset | N/A (evaluated qualitatively) | Enables clinical interaction, enhances patient understanding |
| EyeFM | Fundus, OCT, five imaging modalities + paired clinical texts | Multimodal vision-language eye care copilot with retrospective validation, multicountry efficacy studies, and RCT (ChiCTR2500095518) evaluation | Pretrained on 14.5M ocular images + clinical texts from global, multiethnic datasets | In RCT, EyeFM copilot improved diagnostic accuracy (92.2% vs. 75.4%), referral rate (92.2% vs. 80.5%), report standardization, and patient compliance | Demonstrated efficacy as a clinical copilot across continents, improved ophthalmologist performance, patient outcomes, and post-deployment user acceptance |
| DeepDR-LLM | Fundus + clinical data + LLM (DeepDR-Transformer + LLM module) | Integrated diabetes management and DR screening support for PCPs | Retrospective validation + single-center prospective real-world deployment (n ≈ 769 patients across two arms) | Retrospective: LLM module comparable to PCPs/residents (English), superior to PCPs in Chinese; Prospective: improved PCP diagnostic accuracy (81.0% → 92.3%), better adherence to referrals and self-management (p < 0.05–0.01) | Enhances primary care by supporting PCPs in low-resource settings; improves diabetes self-management, referral adherence, and quality of clinical reports |
PCP primary care physician, RCT randomized controlled trial
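As referenced above, text-overlap metrics are one (imperfect) way to quantify generated report quality alongside clinician review. The minimal sketch below computes BLEU and ROUGE-L with the NLTK and rouge-score packages; the sentences are invented examples, and such metrics capture surface overlap rather than clinical correctness.

```python
# Minimal sketch of text-overlap metrics for a generated report sentence vs. a reference.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "microaneurysms and dot haemorrhages temporal to the fovea"
candidate = "scattered microaneurysms temporal to the fovea"

# BLEU on tokenized sentences, smoothed because the sentences are short.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-L F1 on the raw strings.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, candidate)["rougeL"].fmeasure

print(f"BLEU: {bleu:.3f}  ROUGE-L F1: {rouge_l:.3f}")
```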
For instance, the ExplAIn framework provides end-to-end, image-supervised classification of DR severity, while simultaneously segmenting and categorizing lesions, achieving high classification performance comparable to black-box AI, with enhanced interpretability [40]. DeepLensNet enables automated and quantitative classification of all three types of age-related cataract, showing accuracy significantly higher than ophthalmologists for the most common types (nuclear sclerosis and cortical lens opacity) [41]. For glaucoma, a DL model fuses fundus photography and OCT images from the same eye, reaching 92% accuracy (AUC 0.95), outperforming fundus-only (86%, AUC 0.89) and OCT-only (84%, AUC 0.87) models [42]. These advancements underscore the expanding role of AI in ophthalmology, improving diagnostic efficiency and clinical decision-making. Together, these examples highlight three critical dimensions of progress: (1) increasing diagnostic granularity, (2) higher interpretability through visual and textual alignment, and (3) measurable improvements in clinician performance and workflow efficiency.
Beyond common conditions, automated report generation has been applied to context-specific challenges such as retinopathy of prematurity (ROP), which requires specialized imaging protocols and careful interpretation. Federated learning frameworks enable privacy-preserving multicenter collaboration for ROP classification, allowing institutions to jointly train models without exchanging raw patient data. Such frameworks achieve performance comparable to centralized training and enhance robustness by incorporating diverse multicenter datasets [43, 44]. These approaches not only improve diagnostic accuracy and consistency but also support broader deployment in low-resource settings and telemedicine initiatives. By allowing participation from smaller centers and underrepresented regions, federated learning also promotes inclusivity and helps address geographical inequity in AI training datasets.
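To illustrate the federated principle described above, the following hedged sketch performs one FedAvg-style aggregation round in PyTorch: each site trains locally, and only model weights, never raw ROP images, are shared. Site names, sample counts, and the local training loop are assumptions for illustration.

```python
# Hedged sketch of one FedAvg-style aggregation round across centres.
import copy
import torch

def federated_average(global_model, local_state_dicts, local_sample_counts):
    """Sample-weighted average of locally trained weights; raw images never leave a site."""
    total = sum(local_sample_counts)
    avg_state = copy.deepcopy(local_state_dicts[0])
    for key in avg_state:
        stacked = torch.stack([sd[key].float() * (n / total)
                               for sd, n in zip(local_state_dicts, local_sample_counts)])
        avg_state[key] = stacked.sum(dim=0).to(avg_state[key].dtype)
    global_model.load_state_dict(avg_state)
    return global_model

# One communication round (local training at each hypothetical site omitted for brevity):
# global_model = federated_average(global_model,
#                                  [site_a_model.state_dict(), site_b_model.state_dict()],
#                                  local_sample_counts=[1200, 450])
```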
In addition, real-time applications of AI-based report generation have begun to show clinical utility. A key trend emerging across these systems is the fusion of diagnostic automation with interactive reporting, allowing the model to serve as both an assistant and a communication interface. For example, InterpreFFA, a generative AI model based on supervised contrastive learning, was developed for FFA interpretation. This system improved residents’ diagnostic accuracy (from 85.55% to 90.34%) and reduced reporting time, demonstrating its potential as an efficient clinical aid [22]. Similarly, FFA-GPT introduced a two-stage automated pipeline combining a multimodal transformer (Bootstrapping Language-Image Pre-training, BLIP) with an LLM (Llama 2), enabling both diagnostic reporting and interactive question-answering from FFA images [45]. By alleviating physicians’ workload and enhancing patient understanding, these systems highlight the potential of multimodal generative AI for real-time, clinically embedded applications. Such multimodal generative AI tools exemplify the convergence of perception (image interpretation), cognition (contextual reasoning), and communication (report generation)—laying the groundwork for integrated clinical copilots.
Integration of Clinical Data for Comprehensive Reporting
Integration of clinical data with diagnostic imaging enables the generation of comprehensive and individualized reports. This integration transforms ophthalmic AI from disease detection to holistic patient profiling, combining systemic, genetic, and behavioral determinants into the diagnostic narrative. Wright et al. demonstrated that ensemble ML models trained solely on clinical and functional data can achieve high accuracy in grading DR severity, highlighting the potential of combining structured clinical data with imaging [46]. Building on this, large-scale registry studies have further validated the feasibility of efficiently linking ophthalmic images with EHRs across different vendors, laying the groundwork for multimodal datasets and AI research [37]. In a recent randomized controlled trial (RCT), the EyeFM multimodal vision-language eye care copilot, pretrained on millions of ocular images and paired clinical texts, significantly improved diagnostic accuracy, referral rates, and report standardization across multiple countries, demonstrating efficacy as a clinical support tool [47]. These results reveal that AI systems trained on multimodal, multicountry data can generalize effectively across diverse clinical settings, a key step toward scalable deployment. In addition, LLMs can extract and structure key information from unstructured clinical text (such as DR reports) and enhance image classifiers through weak supervision [48]. In AMD, the ML models that integrate clinical features, genetic risk scores, and lifestyle factors have achieved excellent predictive performance, with a 5-year AUC of 0.92, while also identifying multiple key risk factors, underscoring the value of multimodal fusion for precision prediction and personalized management [49].
Patient-centered and conversational AI applications: Emerging multimodal systems integrate imaging data, biomarkers, and patient characteristics to generate personalized reports, including risk scores, follow-up recommendations, and content adapted to patient comprehension. For patient-centered questions in AMD, ChatGPT-4 performed well in terms of coherency, factuality, comprehensiveness, and safety, while also providing a reference framework for evaluating the reliability of LLMs in ophthalmic information generation [50]. These tools illustrate how AI-generated language can bridge clinical complexity and patient comprehension, a cornerstone of precision communication in digital ophthalmology.
Meanwhile, the PRObot system for DR converts patient-reported outcome measures into interactive dialogue, using personalized prompts and empathetic feedback to improve patient engagement and adherence [51]. The DeepDR-LLM system further integrates image analysis with language models to deliver individualized diabetes management recommendations in primary care, and real-world studies have validated its effectiveness for DR screening and patient self-management [52]. Collectively, these studies demonstrate that multimodal and LLM-driven tools can enhance clinical decision support while improving patient experience, offering practical insights for future clinical implementation. These multimodal and conversational frameworks are redefining the ophthalmic report as a living interface—linking patient data, clinician interpretation, and personalized education into a continuous care cycle.
Disease-Specific Report Generation
AI systems for disease-specific report generation have achieved significant progress in recent years, particularly in the clinical management of DR, AMD, and macular holes. While general-purpose AI models demonstrate broad versatility, disease-specific systems remain crucial for achieving clinical-grade precision and interpretability, particularly in conditions with well-defined imaging phenotypes. Unlike general frameworks, these systems focus on key pathological features of a single disease or imaging modality, producing structured and interpretable reports to support diagnosis and individualized management. RETFound, a self-supervised retinal foundation model, was pretrained on large-scale unlabeled images and demonstrated highly efficient transfer for downstream tasks such as DR, enabling accurate report generation under low-annotation conditions [26]. Such foundation-level architectures allow efficient fine-tuning for disease-specific applications without requiring massive retraining, enabling rapid clinical translation (a minimal fine-tuning sketch follows Fig. 3). Low-shot contrastive learning methods have also been introduced, producing reliable DR diagnostic results even with minimal training samples, thereby offering a feasible approach for disease-specific reporting in data-scarce settings [53]. For macular diseases, both rule-based and DL-based NLP models have been developed to generate diagnostic reports from fundus photography and OCT, achieving diagnostic accuracy and clinical recommendations comparable to junior ophthalmologists [54]. The Assistive Diagnosis Framework for OCT (ADF-OCT) integrates multi-frame distillation with DL to enable multi-label lesion recognition and automated report generation from macular OCT, significantly improving comprehensiveness and efficiency [55]. By automating lesion labeling and structuring OCT-based narratives, ADF-OCT aligns with real-world clinical documentation practices, enhancing interpretability for both trainees and specialists. Furthermore, generative models based on conditional variational autoencoders have been applied to predict postoperative macular anatomy in macular hole surgery, producing intuitive structural forecasts from preoperative OCT and offering new tools for prognosis and patient management [56]. Such disease-specific automation not only improves efficiency but also supports more consistent monitoring and timely intervention in resource-limited settings. Collectively, these studies highlight that disease-specific AI reporting not only enhances diagnostic accuracy and efficiency but also maintains robustness under limited data conditions, providing valuable support for precision and individualized ophthalmic care. Figure 3 summarizes the clinical applications of AI in ophthalmology. Overall, these applications demonstrate a trajectory from algorithmic image interpretation to holistic disease modeling, marking the transition of AI from assistive technology to collaborative clinical intelligence.
Fig. 3.
Clinical applications of AI in ophthalmology
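As the fine-tuning sketch referenced above, the code below freezes a pretrained image encoder and trains only a small disease-specific head for DR grading, in the spirit of the foundation-model transfer strategy discussed earlier; the ResNet backbone and five-grade label set are illustrative assumptions, not the actual RETFound pipeline.

```python
# Minimal sketch of foundation-model transfer with few labels.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)        # in practice: load pretrained retinal foundation weights
for p in backbone.parameters():
    p.requires_grad = False                     # freeze the generic representation
backbone.fc = nn.Linear(backbone.fc.in_features, 5)   # new trainable head: 5 DR severity grades

optimizer = torch.optim.AdamW(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One low-annotation fine-tuning step: only the new head is updated."""
    logits = backbone(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```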
Challenges and Limitations
Despite remarkable progress in multimodal and generative AI, several translational barriers continue to limit real-world adoption. These challenges can be grouped into four domains: data quality, interpretability, ethics and regulation, and clinical integration.
Data Quality and Availability
Figure 4 summarizes the key challenges and future directions of AI in ophthalmology. High-quality data remain the foundation of any trustworthy AI model, yet ophthalmic datasets are often fragmented across institutions and heterogeneous in acquisition standards. High-quality annotated datasets required for training AI models are often limited due to labor-intensive annotation and variability in labeling standards across institutions [57]. Incomplete medical records or missing diagnostic results further compromise model reliability [58]. Additionally, imaging variability arising from device types, acquisition protocols, illumination, and operator technique can substantially influence the accuracy and reliability of AI-generated reports [59]. Biases arising from uneven representation of disease stages, patient demographics, or geographical regions can lead to suboptimal performance in specific populations [60]. Beyond data heterogeneity, dataset characteristics such as sample size, modality balance, and labeling strategies critically influence model robustness and bias [57, 60]. In particular, multimodal and text-generative systems generally require larger and more balanced datasets than classification-only models, while upstream data governance, including consent, de-identification, and dataset stewardship, plays a key role in ensuring model validity [26, 59, 61]. Although standardized data collection and basic preprocessing may partially mitigate these issues [62, 63], data-related limitations remain significant barriers to robust AI deployment. In response, multicenter data harmonization, federated learning, and continuous data curation pipelines are emerging as feasible solutions. Table 3 outlines representative challenges and potential strategies, and a minimal preprocessing sketch follows the table.
Fig. 4.
Challenges and future directions for AI in ophthalmology
Table 3.
Summary of key data-related challenges and mitigation strategies
| Challenge | Impact | Possible mitigation |
|---|---|---|
| Inconsistent labeling and annotation bias | Reduces model generalizability | Consensus grading, AI-assisted annotation tools |
| Imaging heterogeneity | Causes domain shift across devices | Device calibration, domain adaptation algorithms |
| Underrepresentation of minority populations | Leads to biased outcomes | Balanced sampling, federated and inclusive data collection |
| Missing or incomplete clinical metadata | Limits multimodal modeling | EHR linkage, imputation, and data validation pipelines |
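As the preprocessing sketch referenced above, the following minimal example standardizes fundus photographs to reduce device- and illumination-related variability, one of the mitigations listed in Table 3; the resize target and CLAHE parameters are illustrative assumptions and would need validation on real data.

```python
# Minimal fundus preprocessing sketch using OpenCV.
import cv2
import numpy as np

def standardize_fundus(bgr_image, size=512):
    """Resize and equalize luminance so images from different cameras look more alike."""
    img = cv2.resize(bgr_image, (size, size))
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
    return img.astype(np.float32) / 255.0       # scale to [0, 1] for model input
```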
Interpretability and Transparency
The black-box nature of DL models poses a major barrier to their adoption in ophthalmology. While these models can achieve impressive accuracy, their decision-making process is largely opaque, making it difficult for clinicians to understand or verify why a particular prediction is made [64]. This interpretability gap underscores the growing need for causable AI—systems whose reasoning process can be explicitly traced and validated by clinicians.
This lack of transparency can reduce clinicians’ trust in AI outputs and limit adoption in clinical practice [65]. Interpretability tools, such as saliency maps, are widely used to address this issue; however, studies have shown that they often suffer from poor stability, susceptibility to noise, and limited correlation with clinically relevant lesions [66]. From a methodological perspective, interpretability approaches can be broadly divided into intrinsic methods embedded within model design and post hoc techniques applied after prediction [67]. While post hoc visualizations are technically elegant, intrinsically interpretable models that align explanations with clinically meaningful features tend to be more actionable in practice, and both approaches may fail when explanations appear plausible but are clinically misleading [64, 66]. These limitations highlight the fact that current methods may not always provide reliable or meaningful insights for medical decision-making. Consequently, the development of explainable AI models in clinical settings is essential. Reliable and clinically meaningful explanations are critical for building trust and ensuring safe AI deployment. Moreover, explainability design should address both technical and clinical dimensions, accurately reflecting model reasoning while remaining aligned with clinicians’ cognitive processes and workflow, thereby facilitating practical and responsible integration of AI into ophthalmic practice [67]. Recent advances such as attention-guided transformers, counterfactual explanations, and concept-activation maps show promise in visualizing how model features correspond to pathophysiological findings. Embedding such explainability tools directly within clinical interfaces may accelerate clinician trust and regulatory approval.
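For readers unfamiliar with how such post hoc explanations are produced, the hedged sketch below generates a simple gradient-based saliency map with the Captum library; the untrained model and random input are placeholders for a trained DR classifier and a real fundus image, and, as discussed above, such maps require careful validation against clinically meaningful lesions before any clinical use.

```python
# Hedged sketch of a post hoc, gradient-based saliency map.
import torch
from torchvision import models
from captum.attr import Saliency

model = models.resnet50(weights=None)                       # stand-in for a trained classifier
model.eval()

fundus = torch.randn(1, 3, 224, 224, requires_grad=True)    # placeholder preprocessed image
explainer = Saliency(model)
attribution = explainer.attribute(fundus, target=2)         # gradient w.r.t. class index 2
heatmap = attribution.abs().max(dim=1)[0]                    # (1, 224, 224) map to overlay on the image
```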
Regulatory and Ethical Concerns
The integration of AI into ophthalmology presents intertwined regulatory and ethical challenges. These challenges span the entire AI life cycle—from data collection and model training to deployment, continuous learning, and post-market surveillance. Continuous updates to AI models can impact diagnostic consistency, creating regulatory uncertainty and highlighting the need for cross-institutional validation and monitoring [61]. For instance, adaptive models that update with new data can alter diagnostic thresholds over time, raising questions about version control and reproducibility in regulatory filings. AI systems should function as decision-support tools rather than autonomous diagnosticians, and interactive feedback mechanisms can help clinicians interpret AI outputs in a contextually appropriate and clinically meaningful way [68]. Moreover, AI deployment raises ethical concerns related to patient consent, data privacy, and accountability [65]. From a deployment perspective, ethical governance requires concrete mechanisms such as consent models for secondary data use, dataset versioning, and auditability of AI-generated text, particularly in alignment with Software as a Medical Device (SaMD) frameworks [65, 69]. Data stewardship is typically shared among AI vendors, clinicians, and healthcare institutions, with clinicians retaining final decision authority [65]. Clear allocation of responsibility is therefore essential to ensure accountability and clinical trust. The complexity of medical data, combined with potential model errors, creates uncertainty about who is responsible for AI-assisted decisions, while improper handling of sensitive data can lead to privacy breaches and undermine trust in AI systems [69]. Developing unified ethical frameworks—such as transparent audit trails, explicit human-in-the-loop protocols, and equitable data governance—will be essential for safe and accountable AI adoption. These challenges continue to pose significant barriers to the responsible, safe, and trustworthy integration of AI into ophthalmic clinical practice.
Integration into Clinical Workflows
The integration of AI tools into existing ophthalmic clinical workflows faces significant practical barriers. Even the most accurate algorithms provide little value if they cannot be seamlessly embedded into the daily routines of clinicians and technicians. These challenges arise from the complexity of routine clinical operations, variability in staff experience, and the lack of standardized procedures for incorporating AI outputs. Clinicians and support staff often require dedicated training to interpret AI-generated reports accurately and to translate these insights into actionable clinical decisions. Continuous education, simulation-based training, and inclusion of AI literacy in ophthalmology curricula are necessary to ensure effective human–AI collaboration. Inadequate understanding of AI recommendations can lead to underutilization, workflow disruption, or even misinterpretation of critical findings [70, 71]. Additionally, integrating AI systems without causing delays or added complexity to patient care remains a persistent obstacle, particularly in high-volume settings [72]. Ensuring that AI outputs are delivered in a timely, user-friendly format that aligns with existing workflows is essential for adoption. These factors collectively underscore the need to address both technological integration and human factors to achieve effective, reliable use of AI in ophthalmic practice. Addressing these challenges requires coordinated progress across data science, regulatory policy, and clinical education. The following section outlines emerging solutions and future directions that aim to overcome these barriers and translate AI innovation into sustainable clinical impact.
Future Directions and Emerging Trends
Building on the current evidence base, future research must bridge algorithmic capability with clinical impact. The next generation of ophthalmic AI will emphasize real-time responsiveness, multimodal synthesis, personalization, and equitable global scalability.
AI-Enhanced Real-Time Reporting
Addressing the workflow barriers discussed above, future systems are expected to deliver real-time, on-site reporting that integrates seamlessly into ophthalmic practice. Such advancements may allow clinicians to obtain immediate feedback during imaging sessions, thereby accelerating decision-making and enabling earlier intervention [73, 74]. These systems could function as “augmented observers,” continuously analyzing incoming imaging streams to flag anomalies or progression indicators before report finalization. Future developments are also likely to focus on embedding AI models directly into clinical workstations, providing automated alerts or visual annotations that highlight critical findings in real time [75, 76]. These innovations will move ophthalmic practice toward a more interactive and responsive care model. This transition parallels trends in other specialties, such as radiology and cardiology, where on-device inference and workflow-integrated decision support have already shortened diagnostic turnaround times.
AI and Multimodal Data Fusion
Future AI applications in ophthalmology are anticipated to shift from single-modality imaging toward multimodal data fusion [77]. This evolution represents a move from image-centric diagnostics toward “oculomics,” where ocular biomarkers reflect systemic health and disease trajectories. By integrating structural imaging, functional assessments, electronic health records, and genetic information, this approach may enhance diagnostic precision and provide deeper insights into disease mechanisms [49, 78]. In parallel, the incorporation of these diverse data sources enables AI to generate more personalized and context-sensitive reports, offering a more comprehensive understanding of ocular health [79]. When implemented clinically, such multimodal frameworks could enable cross-specialty insights that link retinal microvasculature to cardiometabolic, neurological, and inflammatory pathways, thereby embedding ophthalmology within the broader precision medicine landscape.
Personalized and Predictive Report Generation
One of the most promising future directions is the transition from descriptive reporting to predictive and personalized report generation. Future systems may synthesize longitudinal imaging, genomic risk scores, and treatment response curves into dynamic reports that evolve with each patient visit. AI-driven models will likely be able to forecast disease progression, estimate treatment responses, and generate individualized risk profiles. Such predictive analytics can support tailored treatment strategies, guiding clinicians toward more precise and patient-specific care pathways. Ultimately, this shift will redefine ophthalmic reporting from a retrospective summary to a forward-looking clinical tool [52, 80–83]. In essence, ophthalmic reports will evolve from static documentation to adaptive clinical dashboards that anticipate disease progression, support shared decision-making, and continuously learn from real-world feedback.
Global Implementation and Scalability
Expanding AI-based report generation into low-resource settings requires infrastructure-optimized approaches, including telemedicine and task-tailored, lightweight edge models that reduce dependence on specialized equipment; an economic evaluation from China shows that population-level screening combined with telemedicine and AI can be highly cost-effective in both rural and urban settings [84, 85]. However, equitable deployment requires not only technical optimization but also policy support, cross-border data governance, and capacity-building among local clinicians. Ensuring large-scale deployment also depends on standardization and interoperability. Consensus white papers and ocular-imaging interoperability reviews recommend adoption of standards (e.g., DICOM, HL7 FHIR) and interactive multimedia reporting frameworks to guarantee consistent, comparable, and EHR-integrable AI outputs across diverse healthcare platforms [38]. Collaborative consortia such as the International Ocular Imaging Interoperability Alliance and World Health Organization (WHO)-linked digital health frameworks are already shaping these standards, ensuring interoperability and data ethics in global eye health AI. Together, technology adaptation for low-resource contexts and coordinated standards adoption form the practical pathway to broaden access to AI-assisted ophthalmic diagnostics worldwide. Ultimately, the convergence of real-time analytics, multimodal integration, personalization, and global scalability will transform AI from an assistive diagnostic aid into a central pillar of precision ophthalmology. The next section summarizes how these emerging directions collectively redefine the future of automated, interpretable, and patient-centered reporting.
Conclusion
This review summarizes major advancements in AI-driven report generation in ophthalmology, highlighting the use of DL, NLP, and the effective integration of multimodal imaging with clinical data. By tracing the evolution from single-modality image classifiers to multimodal, language-enabled, and explainable systems, this review situates AI report generation within the broader paradigm of precision ophthalmology. AI systems have demonstrated considerable potential in improving diagnostic accuracy, accelerating report generation, optimizing clinical workflows, and supporting personalized patient management. These developments collectively underscore the fact that the value of AI lies not merely in automation, but in augmentation—enhancing clinical judgment, standardizing documentation, and enabling equitable access to expert-level interpretation. Future research should emphasize multimodal data fusion, explainability, and cross-disciplinary collaboration to refine AI tools for diverse clinical settings. Real-world adoption will depend on transparent algorithms, regulatory harmonization, and continuous clinician education. Collaborative frameworks—linking data scientists, clinicians, and policy experts—will be critical to ensure ethical deployment and equitable benefit across healthcare systems. As these technologies mature and are integrated into clinical infrastructure, AI-generated personalized reports are expected to provide real-time decision support, enhance efficiency and consistency, and optimize patient outcomes, advancing ophthalmic care toward more precise and intelligent services. Looking forward, the field is shifting from descriptive documentation toward predictive and personalized reporting, enabled by multimodal and foundation models that support scalability and cross-disease generalization. This transition positions AI-generated ophthalmic reports not merely as retrospective summaries, but as dynamic, patient-interactive, and forward-looking clinical decision-support tools.
Acknowledgments
Medical Writing, Editorial and Other Assistance
No medical writing or editorial assistance was received during the writing of this article.
Author Contributions
Yingjiao Shen and Xin Ye conceptualized the review and wrote the original draft. Qian Chen and Xiaoying He investigated the main subject and organized the literature. Rupesh Agrawal, Andrzej Grzybowski, Kai Jin, and Xin Ye supervised and guided this study. All authors read and approved the final manuscript.
Funding
The authors acknowledge the Project on Scientific and Technological Research of Traditional Chinese Medicine and Ethnic Medicine in Guizhou Province (QZYY-2025-225), and the Science and Technology Fund Project of Guizhou Provincial Health Commission (gzwkj2025-099). The sponsor or funding organization had no role in the design or conduct of this research. The Rapid Service Fee was funded by the authors.
Data Availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Declarations
Conflict of Interest
Yingjiao Shen, Qian Chen, Xiaoying He, Rupesh Agrawal, Andrzej Grzybowski, Kai Jin, and Xin Ye declare no conflict of interest related to this work. Rupesh Agrawal is an Editorial Board member of Ophthalmology and Therapy. Rupesh Agrawal was not involved in the selection of peer reviewers for the manuscript nor any of the subsequent editorial decisions. Andrzej Grzybowski is an Editorial Board member of Ophthalmology and Therapy. Andrzej Grzybowski was not involved in the selection of peer reviewers for the manuscript nor any of the subsequent editorial decisions.
Ethical Approval
This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors.
Contributor Information
Kai Jin, Email: jinkai@zju.edu.cn.
Xin Ye, Email: yexinsarah@163.com.
References
- 1. Burton MJ, et al. The Lancet Global Health Commission on Global Eye Health: vision beyond 2020. Lancet Glob Health. 2021;9:e489–551. 10.1016/S2214-109X(20)30488-5.
- 2. Szeto SK-H, et al. Recent advances in clinical applications of imaging in retinal diseases. Asia Pac J Ophthalmol. 2023;12:252–63. 10.1097/APO.0000000000000584.
- 3. Alexopoulos P, Madu C, Wollstein G, Schuman JS. The development and clinical application of innovative optical ophthalmic imaging techniques. Front Med. 2022;9:891369. 10.3389/fmed.2022.891369.
- 4. Zhang M, Li S, Xue M, Zhu Q. Two-stage classification strategy for breast cancer diagnosis using ultrasound-guided diffuse optical tomography and deep learning. J Biomed Opt. 2023;28:086002. 10.1117/1.Jbo.28.8.086002.
- 5. Wang SY. The rural shortage of ophthalmic subspecialists. JAMA Ophthalmol. 2025;143:125. 10.1001/jamaophthalmol.2024.5704.
- 6. Ghaffar F, Furtado NM, Ali I, Burns C. Diagnostic decision-making variability between novice and expert optometrists for glaucoma: comparative analysis to inform AI system design. JMIR Med Inform. 2025;13:e63109. 10.2196/63109.
- 7. Liu H, et al. Development and validation of a deep learning system to detect glaucomatous optic neuropathy using fundus photographs. JAMA Ophthalmol. 2019;137:1353–60. 10.1001/jamaophthalmol.2019.3501.
- 8. Dow ER, et al. A deep-learning algorithm to predict short-term progression to geographic atrophy on spectral-domain optical coherence tomography. JAMA Ophthalmol. 2023;141:1052–61. 10.1001/jamaophthalmol.2023.4659.
- 9. Asia A-O, et al. Detection of diabetic retinopathy in retinal fundus images using CNN classification models. Electronics. 2022;11:2740.
- 10. Huang AS, Hirabayashi K, Barna L, Parikh D, Pasquale LR. Assessment of a large language model’s responses to questions and cases about glaucoma and retina management. JAMA Ophthalmol. 2024;142:371–5. 10.1001/jamaophthalmol.2023.6917.
- 11. Mihalache A, Popovic MM, Muni RH. Performance of an artificial intelligence chatbot in ophthalmic knowledge assessment. JAMA Ophthalmol. 2023;141:589–97. 10.1001/jamaophthalmol.2023.1144.
- 12. Thirunavukarasu AJ, et al. Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: a head-to-head cross-sectional study. PLoS Digit Health. 2024;3:e0000341. 10.1371/journal.pdig.0000341.
- 13. Kang D, et al. Evaluating the efficacy of large language models in guiding treatment decisions for pediatric refractive error. Ophthalmol Ther. 2025;14:705–16. 10.1007/s40123-025-01105-2.
- 14. Zou K, et al. Confidence-aware multi-modality learning for eye disease screening. Med Image Anal. 2024;96:103214. 10.1016/j.media.2024.103214.
- 15. Zedadra A, Salah-Salah MY, Zedadra O, Guerrieri A. Multi-modal AI for multi-label retinal disease prediction using OCT and fundus images: a hybrid approach. Sensors. 2025;25:4492.
- 16. Muse ED, Topol EJ. Transforming the cardiometabolic disease landscape: multimodal AI-powered approaches in prevention and management. Cell Metab. 2024;36:670–83. 10.1016/j.cmet.2024.02.002.
- 17. França RP, Borges Monteiro AC, Arthur R, Iano Y. In: Piuri V, Raj S, Genovese A, Srivastava R, editors. Trends in deep learning methodologies. Academic Press; 2021. p. 63–87.
- 18. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–44. 10.1038/nature14539.
- 19. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011;18:544–51. 10.1136/amiajnl-2011-000464.
- 20. Linde G, et al. A comparative evaluation of deep learning approaches for ophthalmology. Sci Rep. 2024;14:21829. 10.1038/s41598-024-72752-x.
- 21. Zhao X, et al. Diagnostic report generation for macular diseases by natural language processing algorithms. Br J Ophthalmol. 2025;109:1036. 10.1136/bjo-2024-326064.
- 22. Shao A, et al. Generative artificial intelligence for fundus fluorescein angiography interpretation and human expert evaluation. NPJ Digit Med. 2025;8:396. 10.1038/s41746-025-01759-z.
- 23. Azad R, et al. Advances in medical image analysis with vision transformers: a comprehensive review. Med Image Anal. 2024;91:103000. 10.1016/j.media.2023.103000.
- 24. Kazerouni A, et al. Diffusion models in medical imaging: a comprehensive survey. Med Image Anal. 2023;88:102846. 10.1016/j.media.2023.102846.
- 25. Huang SC, et al. Self-supervised learning for medical image classification: a systematic review and implementation guidelines. NPJ Digit Med. 2023;6:74. 10.1038/s41746-023-00811-0.
- 26. Zhou Y, et al. A foundation model for generalizable disease detection from retinal images. Nature. 2023;622:156–63. 10.1038/s41586-023-06555-x.
- 27. Shi XH, et al. Deep learning models for the screening of cognitive impairment using multimodal fundus images. Ophthalmol Retina. 2024;8:666–77. 10.1016/j.oret.2024.01.019.
- 28. Ravichandran S, et al. Association and multimodal model of retinal and blood-based biomarkers for detection of preclinical Alzheimer’s disease. Alzheimers Res Ther. 2025;17:19. 10.1186/s13195-024-01668-5.
- 29. Yang S, et al. Radiology report generation with a learned knowledge base and multi-modal alignment. Med Image Anal. 2023;86:102798. 10.1016/j.media.2023.102798.
- 30. Mamdouh D, et al. Advancements in radiology report generation: a comprehensive analysis. Bioengineering. 2025. 10.3390/bioengineering12070693.
- 31. Gao W, et al. Enhancing ophthalmology medical record management with multi-modal knowledge graphs. Sci Rep. 2024;14:23221. 10.1038/s41598-024-73316-9.
- 32. Liu R, et al. Application of artificial intelligence-based dual-modality analysis combining fundus photography and optical coherence tomography in diabetic retinopathy screening in a community hospital. Biomed Eng Online. 2022;21:47. 10.1186/s12938-022-01018-2.
- 33. Sükei E, et al. Multi-modal representation learning in retinal imaging using self-supervised learning for enhanced clinical predictions. Sci Rep. 2024;14:26802. 10.1038/s41598-024-78515-y.
- 34. Guo Y, et al. Improve the efficiency and accuracy of ophthalmologists’ clinical decision-making based on AI technology. BMC Med Inform Decis Mak. 2024;24:192. 10.1186/s12911-024-02587-z.
- 35. Bernstein IA, Fernandez KS, Stein JD, Pershing S, Wang SY. Big data and electronic health records for glaucoma research. Taiwan J Ophthalmol. 2024;14:352–9. 10.4103/tjo.TJO-D-24-00055.
- 36. Lin WC, Chen JS, Chiang MF, Hribar MR. Applications of artificial intelligence to electronic health record data in ophthalmology. Transl Vis Sci Technol. 2020;9:13. 10.1167/tvst.9.2.13.
- 37. Mbagwu M, et al. Feasibility of cross-vendor linkage of ophthalmic images with electronic health record data: an analysis from the IRIS Registry®. JAMIA Open. 2024;7:ooae005. 10.1093/jamiaopen/ooae005.
- 38. Berkowitz SJ, et al. Interactive multimedia reporting technical considerations: HIMSS-SIIM collaborative white paper. J Digit Imaging. 2022;35:817–33. 10.1007/s10278-022-00658-z.
- 39. Castro DCd, et al. PadChest-GR: a bilingual chest x-ray dataset for grounded radiology report generation. NEJM AI. 2025;2:AIdbp2401120. 10.1056/AIdbp2401120.
- 40. Quellec G, et al. ExplAIn: explanatory artificial intelligence for diabetic retinopathy diagnosis. Med Image Anal. 2021;72:102118. 10.1016/j.media.2021.102118.
- 41. Keenan TDL, et al. DeepLensNet: deep learning automated diagnosis and quantitative classification of cataract type and severity. Ophthalmology. 2022;129:571–84. 10.1016/j.ophtha.2021.12.017.
- 42. Islam S, Deo RC, Barua PD, Soar J, Acharya UR. Novel deep learning model for glaucoma detection using fusion of fundus and optical coherence tomography images. Sensors. 2025;25:4337.
- 43. Hanif A, et al. Federated learning for multicenter collaboration in ophthalmology: implications for clinical diagnosis and disease epidemiology. Ophthalmol Retina. 2022;6:650–6. 10.1016/j.oret.2022.03.005.
- 44. Lu C, et al. Federated learning for multicenter collaboration in ophthalmology: improving classification performance in retinopathy of prematurity. Ophthalmol Retina. 2022;6:657–63. 10.1016/j.oret.2022.02.015.
- 45. Chen X, et al. FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question-answer. NPJ Digit Med. 2024;7:111. 10.1038/s41746-024-01101-z.
- 46. Wright DM, et al. Identifying the severity of diabetic retinopathy by visual function measures using both traditional statistical methods and interpretable machine learning: a cross-sectional study. Diabetologia. 2023;66:2250–60. 10.1007/s00125-023-06005-3.
- 47.Wu Y, et al. An eyecare foundation model for clinical assistance: a randomized controlled trial. Nat Med. 2025. 10.1038/s41591-025-03900-7. [DOI] [PubMed] [Google Scholar]
- 48.Jaskari J, et al. DR-GPT: a large language model for medical report analysis of diabetic retinopathy patients. PLoS ONE. 2024;19:e0297706. 10.1371/journal.pone.0297706. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Ajana S, et al. Predicting progression to advanced age-related macular degeneration from clinical, genetic, and lifestyle factors using machine learning. Ophthalmology. 2021;128:587–97. 10.1016/j.ophtha.2020.08.031. [DOI] [PubMed] [Google Scholar]
- 50.Wang H, et al. ChatGPT-4 for addressing patient-centred frequently asked questions in age-related macular degeneration clinical practice. Eye (Lond). 2025;39:2023–30. 10.1038/s41433-025-03788-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 51.Pielka M, Schneider T, Terheyden J, Sifa R [Vision Paper] PRObot: enhancing patient-reported outcome measures for diabetic retinopathy using Chatbots and Generative AI. (2024).
- 52.Li J, et al. Integrated image-based deep learning and language models for primary diabetes care. Nat Med. 2024;30:2886–96. 10.1038/s41591-024-03139-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Burlina P, et al. Low-shot deep learning of diabetic retinopathy with potential applications to address artificial intelligence bias in retinal diagnostics and rare ophthalmic diseases. JAMA Ophthalmol. 2020;138:1070–7. 10.1001/jamaophthalmol.2020.3269. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 54.Zhao X, et al. Diagnostic report generation for macular diseases by natural language processing algorithms. Br J Ophthalmol. 2025;109:1036–42. 10.1136/bjo-2024-326064. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Gao W, et al. ADF-OCT: an advanced assistive diagnosis framework for study-level macular optical coherence tomography. Inf Fusion. 2025;117:102877. 10.1016/j.inffus.2024.102877. [Google Scholar]
- 56.Kwon HJ, Heo J, Park SH, Park SW, Byon I. Accuracy of generative deep learning model for macular anatomy prediction from optical coherence tomography images in macular hole surgery. Sci Rep. 2024;14:6913. 10.1038/s41598-024-57562-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 57.Huang X, et al. Artificial intelligence in glaucoma: opportunities, challenges, and future directions. BioMed Eng OnLine. 2023;22:126. 10.1186/s12938-023-01187-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 58.Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28:31–8. 10.1038/s41591-021-01614-0. [DOI] [PubMed] [Google Scholar]
- 59.da Costa DR, Medeiros FA. Big data for imaging assessment in glaucoma. Taiwan J Ophthalmol. 2024. 10.4103/tjo.TJO-D-24-00079. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Zech JR, et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 2018;15:e1002683. 10.1371/journal.pmed.1002683. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 61.Finlayson SG, et al. The clinician and dataset shift in artificial intelligence. N Engl J Med. 2021;385:283–6. 10.1056/NEJMc2104626. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Shi X, et al. An automated data cleaning method for electronic health records by incorporating clinical knowledge. BMC Med Inform Decis Mak. 2021;21:267. 10.1186/s12911-021-01630-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 63.Cai L, Zhu Y. The challenges of data quality and data quality assessment in the big data Era. Data Sci J 2015.
- 64.Li F, et al. The AI revolution in glaucoma: bridging challenges with opportunities. Prog Retin Eye Res. 2024;103:101291. 10.1016/j.preteyeres.2024.101291. [DOI] [PubMed] [Google Scholar]
- 65.Grzybowski A, Jin K, Wu H. Challenges of artificial intelligence in medicine and dermatology. Clin Dermatol. 2024;42:210–5. 10.1016/j.clindermatol.2023.12.013. [DOI] [PubMed] [Google Scholar]
- 66.Arun N, et al. Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging. Radiology: Artificial Intelligence. 2021;3:e200267. 10.1148/ryai.2021200267. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 67.Holzinger A, Langs G, Denk H, Zatloukal K, Müller H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip Rev Data Min Knowl Discov. 2019;9:e1312. 10.1002/widm.1312. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 68.Pinsky MR, Dubrawski A, Clermont G. Intelligent Clinical Decision Support. Sensors (Basel). 2022. 10.3390/s22041408. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 69.Daneshvar N, Pandita D, Erickson S, Snyder Sulmasy L, DeCamp M. Artificial intelligence in the provision of health care: an American College of Physicians policy position paper. Ann Intern Med. 2024;177:964–7. 10.7326/m24-0146. [DOI] [PubMed] [Google Scholar]
- 70.Taribagil P, Hogg HDJ, Balaskas K, Keane PA. Integrating artificial intelligence into an ophthalmologist’s workflow: obstacles and opportunities. Expert Rev Ophthalmol. 2023;18:45–56. 10.1080/17469899.2023.2175672. [Google Scholar]
- 71.Tappeiner C. Artificial intelligence in ophthalmology: acceptance, clinical integration, and educational needs in Switzerland. J Clin Med. 2025;14:6307. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Hassan M, Kushniruk A, Borycki E. Barriers to and facilitators of artificial intelligence adoption in health care: scoping review. JMIR Hum Factors. 2024;11:e48633. 10.2196/48633. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Ruamviboonsuk P, et al. Real-time diabetic retinopathy screening by deep learning in a multisite national screening programme: a prospective interventional cohort study. Lancet Digit Health. 2022;4:e235–44. 10.1016/s2589-7500(22)00017-6. [DOI] [PubMed] [Google Scholar]
- 74.Gungor A, et al. Artificial intelligence-based detection of central retinal artery occlusion within 4.5 hours on standard fundus photographs. J Am Heart Assoc. 2025;14:e041441. 10.1161/jaha.124.041441. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 75.Otaki Y, et al. Clinical deployment of explainable artificial intelligence of SPECT for diagnosis of coronary artery disease. JACC Cardiovasc Imaging. 2022;15:1091–102. 10.1016/j.jcmg.2021.04.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 76.Karimdjee M, et al. Evaluation of a convolution neural network for baseline total tumor metabolic volume on [(18)F]FDG PET in diffuse large B cell lymphoma. Eur Radiol. 2023;33:3386–95. 10.1007/s00330-022-09375-1. [DOI] [PubMed] [Google Scholar]
- 77.Jin K, Yu T, Grzybowski A. Multimodal artificial intelligence in ophthalmology: applications, challenges, and future directions. Surv Ophthalmol. 2025. 10.1016/j.survophthal.2025.07.003. [DOI] [PubMed] [Google Scholar]
- 78.Pontikos N, et al. Next-generation phenotyping of inherited retinal diseases from multimodal imaging with Eye2Gene. Nat Mach Intell. 2025;7:967–78. 10.1038/s42256-025-01040-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 79.Shaik NS, Cherukuri TK. Gated contextual transformer network for multi-modal retinal image clinical description generation. Image Vis Comput. 2024. 10.1016/j.imavis.2024.104946. [Google Scholar]
- 80.Bora A, et al. Predicting the risk of developing diabetic retinopathy using deep learning. Lancet Digit Health. 2021;3:e10–9. 10.1016/s2589-7500(20)30250-8. [DOI] [PubMed] [Google Scholar]
- 81.Dai L, et al. A deep learning system for predicting time to progression of diabetic retinopathy. Nat Med. 2024;30:584–94. 10.1038/s41591-023-02702-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 82.Alsadoun L, et al. Artificial intelligence (AI)-Enhanced detection of diabetic retinopathy from fundus images: the current landscape and future directions. Cureus. 2024;16:e67844. 10.7759/cureus.67844. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Huang X, et al. Interpretable longitudinal glaucoma visual field estimation deep learning system from fundus images and clinical narratives. NPJ Digit Med. 2025;8:389. 10.1038/s41746-025-01750-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.Liu H, et al. Economic evaluation of combined population-based screening for multiple blindness-causing eye diseases in China: a cost-effectiveness analysis. Lancet Glob Health. 2023;11:e456–65. 10.1016/s2214-109x(22)00554-x. [DOI] [PubMed] [Google Scholar]
- 85.Oikonomou EK, Khera R. Designing medical artificial intelligence systems for global use: focus on interoperability, scalability, and accessibility. Hellenic J Cardiol. 2025;81:9–17. 10.1016/j.hjc.2024.07.003. [DOI] [PubMed] [Google Scholar]
Associated Data
Data Availability Statement
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.