Ophthalmology Science. 2025 Nov 17;6(2):101013. doi: 10.1016/j.xops.2025.101013

OCT-PRO: A Multimodal Model Integrating OCT and Clinical Traits to Predict Postoperative Outcomes in Cataract Patients

Lixue Liu 1,, Mingyuan Li 1,, Yuxuan Wu 1, Zizheng Cao 1, Yuanjun Shang 1, Lanqin Zhao 1, Zhenyu Wang 2, Junwei Tan 3, Yan Yuan 3, Wenbin Huang 4, Jinghui Wang 4, Jianqiao Li 5, Fabao Xu 5, Zhangkai Lian 1, Jianyu Pang 1, Fan Xu 6, Ningning Tang 6, Xingru He 7, Yan Xu 7, Kun Zeng 8, Lishi Luo 8, Mingwei Wang 1, Ruiwen Xu 1, Zhenjun Tu 1, Xi Chen 1, Hui Chen 1, Zhenzhen Liu 1, Jing Tao 2,∗∗∗, Xiaohang Wu 1,4,∗∗, Haotian Lin 1,4,9,
PMCID: PMC12794479  PMID: 41532137

Abstract

Purpose

To develop and validate OCT-PRO, a multimodal machine learning model integrating OCT images and clinical traits to predict postoperative visual outcomes in cataract patients.

Design

Multicenter prospective cohort study.

Participants

A total of 2225 eyes from 1911 cataract patients were enrolled, including 1304 participants from Zhongshan Ophthalmic Center for model development and 607 from 6 hospitals across China for external testing.

Methods

All participants underwent standardized preoperative examinations including macular OCT and clinical data collection, followed by phacoemulsification and intraocular lens implantation. Postoperative best-corrected visual acuity (BCVA) was assessed at 4 weeks after surgery. A multimodal model was constructed using deep learning techniques, combining image features extracted via InceptionResNetV2 and structured metadata processed by fully connected layers. Model performance was assessed using mean absolute error (MAE) and root mean square error (RMSE) and compared with traditional laser interferometry and ophthalmologist predictions. Subgroup analysis and explainability assessments were conducted to evaluate generalizability and model attention.

Main Outcome Measures

Prediction error of postoperative BCVA (logarithm of the minimum angle of resolution [logMAR]) measured by MAE and RMSE.

Results

In the internal test data set, OCT-PRO achieved improved performance, with lower MAE and RMSE (0.128 and 0.211 logMAR) compared with the OCT-only model (0.138 and 0.226 logMAR), metadata-only model (0.161 and 0.234 logMAR) and laser interferometry (0.381 and 0.554 logMAR). In the external test data set, OCT-PRO achieved an MAE of 0.168 logMAR, significantly outperforming the OCT-only (0.183 logMAR, P = 0.003) and metadata-only models (0.229 logMAR, P < 0.001). Subgroup analyses confirmed consistent advantages of OCT-PRO across different cataract subtypes and baseline preoperative BCVA groups. Model interpretability analysis highlighted the importance of preoperative BCVA, age, and macular foveal structure, with greater reliance on OCT features than clinical metadata—especially in complex or low preoperative BCVA cases. In a head-to-head comparison, the model consistently outperformed both junior and senior ophthalmologists in predictive accuracy across various clinical subtypes.

Conclusions

OCT-PRO enables accurate prediction of postoperative visual outcomes in cataract surgery, outperforming conventional methods and ophthalmologists. It holds promise as a valuable decision-support tool to assist surgical decision-making and improve health care resource allocation.

Financial Disclosure(s)

The authors have no proprietary or commercial interest in any materials discussed in this article.

Keywords: Cataract, Multimodal machine learning, OCT, Postoperative vision prognosis


Cataract remains the leading cause of blindness worldwide, accounting for approximately 40% of global blindness and posing a significant threat to visual health and quality of life.1 Cataract surgery, the primary intervention to restore vision in these patients, is now the most frequently performed surgical procedure globally, with over 20 million operations conducted annually.2 With continuous advancements in surgical techniques and technologies, expectations for postoperative visual outcomes have become more refined, and indications for cataract surgery have increasingly emphasized individualized visual needs rather than fixed acuity thresholds in many countries.3 Despite these improvements, a substantial proportion of patients—estimated at 5%–20% in China—fail to achieve functional visual improvement after surgery.4, 5, 6 Furthermore, the effective cataract surgical coverage (eCSC) (defined as postoperative best-corrected visual acuity [BCVA] >0.5) in China stands at 34.8%, which is only about half that of developed countries.7 These findings highlight the urgent need for reliable preoperative tools to predict visual outcomes, which can guide more efficient allocation of surgical resources and ultimately improve the global eCSC.

Traditionally, ophthalmologists estimate visual prognosis based on medical history and clinical examination. However, it is often difficult to accurately determine the extent to which ocular comorbidities—particularly retinal pathologies—contribute to visual impairment (VI) when cataracts coexist. Commonly used prediction tools, such as the superilluminated pinhole test, potential acuity meter, and laser interferometer, have shown limited accuracy, particularly in cases with moderate to advanced lens opacity, and are no longer considered reliable for preoperative visual potential assessment.8, 9, 10, 11 In recent years, the emergence of machine learning and artificial intelligence has enabled the development of data-driven models to address this challenge. Previous studies have explored various approaches to predict postoperative visual acuity in cataract patients, primarily using single-modality data. Early models were based on structured clinical variables and traditional statistical methods,12 such as logistic regression, whereas more recent work has shifted toward deep learning models trained on preoperative OCT images.13,14 However, these studies typically targeted specific subpopulations—such as patients with high myopic or age-related cataracts—and lacked broad phenotypic coverage. In addition, few studies have incorporated multimodal data or examined the incremental value of combining imaging and clinical features. Importantly, most models have not been prospectively evaluated in clinical settings nor have they been compared directly with ophthalmologists or existing tools such as laser interferometry, limiting their clinical applicability.

In this study, we aimed to develop and validate OCT-PRO, a multimodal regression model that integrates OCT images and clinical traits to predict postoperative outcomes in cataract patients. Using data collected from cataract patients at Zhongshan Ophthalmic Center, Sun Yat-sen University, we constructed a deep learning model incorporating multimodal fusion techniques. To assess the model’s generalizability and clinical applicability, we conducted a multicenter prospective evaluation involving 6 hospitals across China and performed subgroup analyses based on cataract subtypes and levels of preoperative VI. Furthermore, we designed a head-to-head comparison between the model and ophthalmologists of varying experience levels to assess its performance against clinical judgment. This approach may help support surgical decision-making and improve the efficiency of cataract care delivery.

Methods

Study Design and Participants

The overall study design is illustrated in Figure 1. Patients were prospectively recruited at the Zhongshan Ophthalmic Center, Sun Yat-sen University, between July 1, 2023, and April 17, 2024. The inclusion criteria were (1) age >18 years; (2) diagnosis of age-related or complicated cataract deemed suitable for phacoemulsification with intraocular lens implantation; (3) underwent uneventful cataract surgeries, defined as surgeries completed without intraoperative complications such as posterior capsule rupture, vitreous loss, or significant zonular dehiscence; and (4) availability of reliable BCVA measurements both preoperatively and at 4-week postoperative follow-up. The exclusion criteria included (1) congenital, developmental, or traumatic cataracts; (2) previous intraocular surgery (e.g., vitrectomy, glaucoma filtration surgery) in the operated eye; (3) history of amblyopia or neuro-ophthalmic disorders in the operated eye; (4) implantation of multifocal intraocular lenses; and (5) OCT images of insufficient quality preventing identification of macular structures. All enrolled patients underwent standard preoperative examinations and completed a 4-week follow-up. The study was registered on ClinicalTrials.gov (NCT05491798) and approved by the Institutional Review Board of the Zhongshan Ophthalmic Center, Sun Yat-sen University (IRB-ZOC-SYSU). Informed consent was obtained from all patients included in this study. All procedures adhered to the principles of the Declaration of Helsinki.

Figure 1.

Overall study design. A, Included patients with cataract who underwent OCT imaging, laser interferometry, history taking, and other ocular examinations before surgery, and who completed optometry 3 to 5 weeks after surgery to form a multimodal data set. B, During model development of OCT-PRO, features of OCT images and clinical metadata were extracted using InceptionResNetV2 and FC layers, respectively, and then concatenated to predict the postoperative BCVA of the operated eye. C, To further evaluate this predictive model, multimodal data were prospectively collected from another 6 hospitals across China to externally validate its performance. In addition, patients were classified according to cataract subtypes and severity of visual impairment to perform subgroup analysis. Furthermore, multimodal data were also reviewed by different groups of ophthalmologists, and their predictive results were included for comparative analysis. BCVA = best-corrected visual acuity; FC = fully connected.

Data Collection

Data collection included both clinical traits, recorded as metadata, and OCT imaging (Spectralis OCT, Heidelberg Engineering). Clinical metadata comprised the following: (1) demographic data (age, sex); (2) medical history (cataract subtype, prior ocular conditions, ophthalmic procedures, and systemic diseases); and (3) ophthalmic examination results, including preoperative BCVA and lens opacity classification system III-based cortical (C), nuclear opalescence (N), and posterior subcapsular (P) scores. For imaging data, a single horizontal B-scan through the fovea was extracted from preoperative OCT for model input. A summary of all input variables is provided in Table S1, available at www.ophthalmologyscience.org. Additionally, preoperative visual potential was evaluated using laser interferometry. Intraoperative data—including surgical procedure, operative time, surgeon identity, laterality, intraocular lens type, and intraoperative complications—were documented. Postoperative BCVA was measured 4 weeks after surgery and converted to logarithm of the minimum angle of resolution (logMAR) for use as the model’s target output.
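
BCVA is recorded in decimal notation and converted to logMAR for modeling. As a quick reference, the standard relation is logMAR = −log10(decimal acuity); the snippet below is a minimal sketch of that conversion (the function name is illustrative, not from the paper).

```python
import math

def decimal_to_logmar(decimal_acuity: float) -> float:
    """Convert decimal visual acuity to logMAR via the standard relation
    logMAR = -log10(decimal acuity); 1.0 decimal corresponds to 0.0 logMAR."""
    if decimal_acuity <= 0:
        raise ValueError("decimal acuity must be positive")
    return -math.log10(decimal_acuity)

# Example: the eCSC threshold of 0.5 decimal maps to about 0.30 logMAR.
print(round(decimal_to_logmar(0.5), 2))  # 0.3
```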

Development of Multimodal Model

Collected multimodal data were first preprocessed and partitioned for model training. To enhance model robustness and generalizability in ophthalmic imaging tasks, clinical metadata and OCT images were integrated on a per-patient basis to construct a multimodal data set. This data set was randomly divided into training, validation, and internal test sets in a 6:2:2 ratio, ensuring that data from each patient were restricted to a single subset to prevent data leakage. The fusion framework is illustrated in Figure 1B. We evaluated multiple deep learning architectures for OCT image modeling and selected InceptionResNetV2, which demonstrated the best performance by achieving the lowest mean absolute error (MAE) on the validation data set. OCT image features were extracted using InceptionResNetV2 and combined with clinical metadata that were processed through fully connected layers. The resulting feature vectors were concatenated and passed through additional fully connected layers to generate a continuous prediction of postoperative BCVA. In addition to developing a multimodal model that integrates OCT images and structured clinical data, we also constructed unimodal models for comparative analysis. Specifically, InceptionResNetV2 was utilized for modeling OCT images alone, whereas XGBoost was applied to model the clinical metadata independently.
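
To make the described pipeline concrete, the sketch below pairs a patient-grouped split (so that both eyes of one patient stay in a single subset, as the text requires) with a feature-level fusion network. This is a hypothetical PyTorch reconstruction: the paper does not release code, the layer widths and `n_metadata` parameter are assumptions, and the `timm` backbone is one readily available InceptionResNetV2 implementation.

```python
import numpy as np
import torch
import torch.nn as nn
import timm  # assumed source of the InceptionResNetV2 backbone
from sklearn.model_selection import GroupShuffleSplit

# --- Patient-level partition: both eyes of a patient stay in one subset ---
rng = np.random.default_rng(0)
n_eyes = 100
patient_ids = rng.integers(0, 80, size=n_eyes)  # some patients contribute 2 eyes
gss = GroupShuffleSplit(n_splits=1, train_size=0.6, random_state=0)
train_idx, holdout_idx = next(gss.split(np.arange(n_eyes).reshape(-1, 1),
                                        groups=patient_ids))
# A second grouped split of holdout_idx would yield the 2:2 validation/test halves.

class OctProSketch(nn.Module):
    """Feature-level fusion as described: CNN features from the foveal OCT
    B-scan concatenated with an MLP embedding of clinical metadata, then
    fully connected layers regressing postoperative BCVA (logMAR)."""

    def __init__(self, n_metadata: int):
        super().__init__()
        # num_classes=0 turns the backbone into a pooled feature extractor
        # (1536-dimensional for inception_resnet_v2 in timm).
        self.backbone = timm.create_model("inception_resnet_v2",
                                          pretrained=False, num_classes=0)
        self.meta_mlp = nn.Sequential(
            nn.Linear(n_metadata, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(1536 + 32, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, 1),  # continuous postoperative BCVA (logMAR)
        )

    def forward(self, oct_image: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        img_feat = self.backbone(oct_image)            # (B, 1536)
        meta_feat = self.meta_mlp(metadata)            # (B, 32)
        fused = torch.cat([img_feat, meta_feat], dim=1)  # concatenation fusion
        return self.head(fused).squeeze(1)

model = OctProSketch(n_metadata=12)
pred = model(torch.randn(2, 3, 299, 299), torch.randn(2, 12))  # (2,) logMAR
```

Such a head would typically be trained end to end with an L1 (MAE) or MSE loss; the paper reports selecting the backbone by validation MAE.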

Model Interpretability Analysis

To enhance model transparency and interpret the role of individual input features, we conducted explainability analyses for each data modality. For clinical metadata, a separate prediction model was trained using XGBoost, and SHAP (SHapley Additive exPlanations) values were calculated to quantify each variable's contribution to the model output.15 SHAP assigns an importance value to each feature based on cooperative game theory, with higher absolute SHAP values indicating greater influence on the prediction.
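
A minimal sketch of this metadata pipeline on synthetic data is shown below; the feature names are assumptions drawn from the data collection section, not the exact model inputs.

```python
import numpy as np
import shap
import xgboost as xgb

# Synthetic stand-in for the clinical metadata (names are assumptions).
rng = np.random.default_rng(0)
feature_names = ["preop_bcva_logmar", "age", "locs_C", "locs_N", "locs_P"]
X = rng.normal(size=(200, len(feature_names)))
y = 0.6 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = xgb.XGBRegressor(n_estimators=100, max_depth=3).fit(X, y)

explainer = shap.TreeExplainer(model)   # exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)

# Global importance: mean |SHAP| per feature, as summarized in a bar chart.
importance = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```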

Although originally developed for classification tasks, gradient-weighted class activation mapping (Grad-CAM) was adapted in this study to regression-based deep learning models to improve interpretability in OCT image analysis.16 Specifically, the gradient of the continuous output (i.e., postoperative BCVA) with respect to the feature maps of the final convolutional layer was computed and combined to produce weighted activation maps. These heatmaps, overlaid on the original OCT images, highlight the regions that contributed most to the model’s prediction, offering insights into the model’s focus and supporting clinical interpretability of regression outcomes.
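
The key adaptation is to backpropagate the scalar regression output instead of a class score. Below is a hook-based PyTorch sketch under that reading; `model` and `conv_layer` are assumed to be an image-to-scalar network and its last convolutional block, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam_regression(model, conv_layer, image):
    """Grad-CAM adapted to a scalar regression output: weight each feature map
    of the chosen conv layer by the gradient of the predicted logMAR with
    respect to it, combine, ReLU, and upsample to the input resolution."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, out):
        activations["a"] = out
    def bwd_hook(_, grad_in, grad_out):
        gradients["g"] = grad_out[0]

    h1 = conv_layer.register_forward_hook(fwd_hook)
    h2 = conv_layer.register_full_backward_hook(bwd_hook)
    try:
        model.zero_grad()
        pred = model(image)      # (1,) continuous postoperative BCVA
        pred.sum().backward()    # gradient of the scalar output
        a, g = activations["a"], gradients["g"]        # both (1, C, H, W)
        weights = g.mean(dim=(2, 3), keepdim=True)     # pooled gradients
        cam = F.relu((weights * a).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                            align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # [0, 1]
    finally:
        h1.remove(); h2.remove()
    return cam  # overlay on the OCT B-scan as a heatmap
```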

To assess the relative contribution of each modality in the prediction process of OCT-PRO, ablation experiments were conducted. Two unimodal models—one using only OCT images and the other using only clinical metadata—were trained and evaluated on the same test set. Their MAEs (denoted as MAE_OCT and MAE_metadata) were compared to the MAE of OCT-PRO. Based on the error reduction observed, the contribution of each modality to the overall model performance was quantified.17
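
The text does not give the exact formula, but one plausible reading is to normalize the error reduction the fused model achieves over each unimodal model; the sketch below encodes that interpretation, using the internal test MAEs reported in Table 2.

```python
def modality_contributions(mae_oct: float, mae_meta: float, mae_fused: float):
    """One plausible reading of the described quantification (the exact
    formula is not given in the text): the error reduction of the fused model
    over each unimodal model, normalized to sum to 1. A large error increase
    when a modality is removed implies that modality contributed more."""
    gain_over_meta = max(mae_meta - mae_fused, 0.0)  # what OCT adds to metadata
    gain_over_oct = max(mae_oct - mae_fused, 0.0)    # what metadata adds to OCT
    total = gain_over_meta + gain_over_oct
    if total == 0:
        return 0.5, 0.5
    return gain_over_meta / total, gain_over_oct / total

# Internal test MAEs from Table 2 (logMAR): OCT 0.138, metadata 0.161, fused 0.128.
oct_share, meta_share = modality_contributions(0.138, 0.161, 0.128)
print(f"OCT share = {oct_share:.2f}, metadata share = {meta_share:.2f}")
```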

Model Generalizability Evaluation and Subgroup Analysis

To evaluate model generalizability, an external test set was prospectively collected from 6 hospitals across China (Table S2, available at www.ophthalmologyscience.org). Further subgroup analyses were conducted to assess model performance in clinically relevant scenarios. Patients were stratified by cataract subtype—age-related, high myopic, type 2 diabetic, drug-induced, and others—based on clinical assessment and medical history. Preoperative BCVA was categorized according to the World Health Organization’s severity categories of VI: severe VI (<0.1 decimal), moderate VI (0.1 to <0.3), mild VI (0.3 to <0.5), and no VI (≥0.5). Model performance was then compared across these stratified subgroups.
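
These cut-offs translate directly into code; a minimal sketch (decimal notation, per the text):

```python
def vi_category(decimal_bcva: float) -> str:
    """Stratify preoperative BCVA (decimal notation) by the WHO-based
    cut-offs used for subgroup analysis."""
    if decimal_bcva < 0.1:
        return "severe VI"
    if decimal_bcva < 0.3:
        return "moderate VI"
    if decimal_bcva < 0.5:
        return "mild VI"
    return "no VI"

for v in (0.05, 0.2, 0.4, 0.8):
    print(v, "->", vi_category(v))
```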

Comparative Test with Ophthalmologists

To compare the performance of OCT-PRO with that of human experts, a test set comprising 100 representative cataract cases—including OCT images and corresponding clinical metadata—was constructed and validated by an expert panel. Participants were required to estimate postoperative BCVA (in decimal notation, ranging from 0.1 to 1.25) based solely on the provided data. Six ophthalmologists of varying seniority completed the task independently, without access to additional information. The senior ophthalmologists each had over 10 years of clinical experience in cataract surgery. The junior ophthalmologists were residents who had just completed a 3-year standardized ophthalmic training program.

Statistical Analysis

The performance of all predictive approaches—including machine learning models, laser interferometry, and ophthalmologist estimates—was evaluated by comparing predicted and actual postoperative BCVA values using MAE and root mean square error (RMSE). All BCVA values were converted to logMAR before analysis. Paired t-tests were applied to compare MAE and RMSE between different methods. Two-sided P values <0.05 were considered statistically significant. All statistical analyses were performed using R software (version 3.2.4).
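
The analyses were run in R, but the same computations are straightforward in any language; below is an equivalent Python sketch on synthetic predictions (array names are hypothetical), using `scipy.stats.ttest_rel` for the paired comparison of per-eye absolute errors.

```python
import numpy as np
from scipy import stats

def mae(pred, true):
    return float(np.mean(np.abs(pred - true)))

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Paired t-test on per-eye absolute errors of two methods (synthetic data).
rng = np.random.default_rng(0)
true_bcva = rng.uniform(0.0, 1.0, 100)         # postoperative logMAR
pred_a = true_bcva + rng.normal(0, 0.12, 100)  # e.g., a multimodal model
pred_b = true_bcva + rng.normal(0, 0.20, 100)  # e.g., a unimodal model
err_a = np.abs(pred_a - true_bcva)
err_b = np.abs(pred_b - true_bcva)
t_stat, p_value = stats.ttest_rel(err_a, err_b)  # two-sided by default
print(f"MAE A={mae(pred_a, true_bcva):.3f}, "
      f"MAE B={mae(pred_b, true_bcva):.3f}, P={p_value:.4f}")
```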

Results

Characteristics of the Data Sets

A total of 1304 and 607 cataract patients were included in the model development and external test cohorts, respectively. The clinical characteristics of patients in each data set are summarized in Table 1. No statistically significant differences were observed among the training, validation, internal test, and external test data sets in terms of patient age, sex, or preoperative and postoperative BCVA of the operated eyes (P > 0.05).

Table 1.

Baseline Clinical Characteristics

Characteristics | Training | Validation | Internal Test | External Test
Patients | 826 | 258 | 220 | 607
Age (mean ± SD, years) | 64.8 ± 12.1 | 65.8 ± 11.9 | 65.2 ± 11.2 | 63.7 ± 12.3
Female, % | 54.8 | 57.7 | 60.7 | 58.8
Eyes | 921 | 297 | 267 | 740
Left/right | 462/459 | 133/164 | 123/144 | 381/359
Preoperative BCVA (logMAR) | 0.741 ± 0.547 | 0.728 ± 0.527 | 0.721 ± 0.531 | 0.741 ± 0.568
LOCS III classification: C | 2.48 ± 1.07 | 2.50 ± 1.08 | 2.52 ± 1.09 | 2.50 ± 0.91
LOCS III classification: N | 3.43 ± 1.17 | 3.43 ± 1.12 | 3.54 ± 1.20 | 3.56 ± 1.19
LOCS III classification: P | 1.75 ± 1.14 | 1.68 ± 1.07 | 1.59 ± 1.05 | 1.71 ± 1.14
Postoperative BCVA change (logMAR) | –0.514 ± 0.537 | –0.487 ± 0.379 | –0.504 ± 0.552 | –0.513 ± 0.533

BCVA = best-corrected visual acuity; C = cortical; LOCS = lens opacity classification system; logMAR = logarithm of the minimum angle of resolution; N = nuclear opalescence; P = posterior subcapsular; SD = standard deviation.

Overall Performance of Predictive Models

Three predictive models were developed based on different data modalities: a metadata-only model, an OCT-only model, and the multimodal OCT-PRO. After a head-to-head performance comparison of common backbones, InceptionResNetV2 was adopted as the final architecture (Table S3, available at www.ophthalmologyscience.org). In the internal test data set, these models achieved MAEs of 0.161, 0.138, and 0.128 logMAR and RMSEs of 0.234, 0.226, and 0.211 logMAR, respectively (Table 2). For comparison, traditional laser interferometry yielded an MAE of 0.381 and an RMSE of 0.554 logMAR. OCT-PRO demonstrated significantly lower MAE than both the metadata model and laser interferometry. In the external test data set, OCT-PRO also achieved the lowest MAE (0.168 logMAR), significantly outperforming the metadata model (0.229, P < 0.001) and the OCT model (0.183, P = 0.003). Detailed performance metrics across data sets are provided in Table 2 and Table S4, available at www.ophthalmologyscience.org.

Table 2.

The Performance of Prediction Models and Laser Interferometer for Predicting Postoperative BCVA (logMAR) in the Internal and External Test Data Sets

Model | Internal Test MAE | Internal Test P Value (vs. OCT-PRO) | Internal Test RMSE | External Test MAE | External Test P Value (vs. OCT-PRO) | External Test RMSE
Metadata model | 0.161 (0.143–0.183) | 0.001 | 0.234 (0.192–0.282) | 0.229 (0.208–0.251) | <0.001 | 0.384 (0.340–0.424)
OCT model | 0.138 (0.119–0.161) | 0.074 | 0.226 (0.159–0.307) | 0.183 (0.174–0.192) | 0.003 | 0.224 (0.209–0.239)
OCT-PRO | 0.128 (0.110–0.150) | – | 0.211 (0.144–0.295) | 0.168 (0.159–0.178) | – | 0.213 (0.199–0.230)
Laser interferometer | 0.381 (0.326–0.443) | <0.001 | 0.554 (0.426–0.682) | – | – | –

BCVA = best-corrected visual acuity; MAE = mean absolute error; RMSE = root mean square error.

Interpretation of Model Decision

To interpret model decision-making, SHAP bar charts and heatmaps were employed to visualize the relative contributions of clinical metadata and OCT images. In the SHAP plots (Fig 2A), feature contributions are color-coded—red for a positive and blue for a negative effect on the predicted value—with bar length indicating the magnitude of influence. The top 3 most influential features were preoperative BCVA, patient age, and the P score from lens opacity classification system III. For OCT-based predictions, representative heatmaps are shown in Figure 2B, indicating that model attention was predominantly focused on the macular region across various cataract subtypes. In the multimodal framework, the relative attention weights assigned to OCT imaging versus clinical metadata are illustrated in Figure 2C. OCT imaging contributed a larger share of model attention in all 5 cataract subtypes, especially in high myopic (82.1% vs. 17.9%) and drug-induced cataracts (79.9% vs. 20.1%). In stratified analyses by VI severity, metadata features dominated model attention in patients with no or mild VI, whereas OCT images accounted for nearly all attention in severe VI cases.

Figure 2.

Model explainability for clinical metadata, OCT images, and multimodal fusion. A, The SHAP feature importance bar chart of the metadata model. B, Representative cases of Grad-CAM visualization in different cataract subtypes. C, Model attention allocated to metadata and OCT images in different groups of cataract subtypes (left) and VI (right). BCVA = best-corrected visual acuity; Grad-CAM = gradient-weighted class activation mapping; LOCS = lens opacity classification system; logMAR = logarithm of the minimum angle of resolution; SHAP = SHapley Additive exPlanations; VI = visual impairment.

Model Performance Evaluation in Different Cataract Subgroups

The MAE and RMSE of the 3 predictive models across various cataract subtypes and preoperative VI levels are summarized in Tables 3 and 4 and illustrated in Figure 3. Overall, OCT-PRO demonstrated superior performance, yielding lower MAEs across all cataract subtypes. The most pronounced difference was observed in the high myopic cataract group, in which OCT-PRO reduced the MAE by 0.100 logMAR compared with the metadata model and by 0.024 logMAR relative to the OCT model. In contrast, only modest improvements were observed in the age-related cataract group, with MAE differences of 0.037 and 0.006 logMAR, respectively.

Table 3.

The Performance of Prediction Models in Different Groups of Cataract Subtypes

Cataract Subtypes | Metadata MAE (SD) | OCT MAE (SD) | OCT-PRO MAE (SD) | MAE Difference (OCT-PRO vs. Metadata) | P Value | MAE Difference (OCT-PRO vs. OCT) | P Value
All (N = 1007) | 0.211 (0.280) | 0.171 (0.145) | 0.158 (0.143) | 0.053 | <0.001 | 0.013 | <0.001
Age-related (N = 562) | 0.184 (0.242) | 0.153 (0.110) | 0.147 (0.112) | 0.037 | <0.001 | 0.006 | 0.185
Diabetic (N = 186) | 0.228 (0.323) | 0.193 (0.204) | 0.172 (0.199) | 0.056 | <0.001 | 0.021 | 0.023
High myopic (N = 169) | 0.272 (0.335) | 0.196 (0.172) | 0.172 (0.167) | 0.100 | <0.001 | 0.024 | 0.025
Drug-induced (N = 34) | 0.272 (0.314) | 0.203 (0.092) | 0.167 (0.190) | 0.105 | 0.060 | 0.036 | 0.112
Other (N = 56) | 0.209 (0.243) | 0.187 (0.134) | 0.176 (0.132) | 0.033 | 0.344 | 0.011 | 0.511

MAE = mean absolute error; SD = standard deviation.

Table 4.

The Performance of Prediction Models in Different Groups of Visual Impairment

Visual Impairment | Metadata MAE (SD) | OCT MAE (SD) | OCT-PRO MAE (SD) | MAE Difference (OCT-PRO vs. Metadata) | P Value | MAE Difference (OCT-PRO vs. OCT) | P Value
All (N = 1007) | 0.211 (0.280) | 0.171 (0.145) | 0.158 (0.143) | 0.053 | <0.001 | 0.013 | <0.001
No VI (N = 243) | 0.138 (0.122) | 0.181 (0.114) | 0.148 (0.117) | –0.010 | 0.357 | 0.033 | <0.001
Mild VI (N = 268) | 0.135 (0.093) | 0.160 (0.100) | 0.150 (0.109) | –0.015 | 0.060 | 0.010 | 0.169
Moderate VI (N = 295) | 0.147 (0.105) | 0.148 (0.101) | 0.150 (0.112) | –0.003 | 0.751 | –0.002 | 0.734
Severe VI (N = 201) | 0.494 (0.497) | 0.208 (0.242) | 0.192 (0.223) | 0.302 | <0.001 | 0.016 | 0.080

MAE = mean absolute error; SD = standard deviation; VI = visual impairment.

Figure 3.

Model performance in different subgroups of cataract patients. A, Mean absolute error and 95% CI of the 3 predictive models (metadata, OCT, and OCT-PRO) in different groups of VI. B, Mean absolute error of the 3 predictive models in different groups of cataract subtypes. CI = confidence interval; MAE = mean absolute error; VI = visual impairment.

In the analysis stratified by preoperative VI, model performance between the multimodal and metadata models was comparable in the no VI, mild VI, and moderate VI groups. However, in cases of severe VI, OCT-PRO achieved a markedly lower MAE (0.192 logMAR) compared to the metadata model (0.494 logMAR), suggesting enhanced robustness of the multimodal approach under conditions of more advanced VI.

To further address the effects of image quality on model performance, we additionally stratified patients by lens opacity severity; OCT-PRO maintained its advantage across strata, with the largest gain in the severe lens opacity subgroup (Table S5, available at www.ophthalmologyscience.org). To facilitate clinical interpretability, we also categorized patients by the eCSC threshold (postoperative BCVA >0.5) and obtained confusion matrices, in which OCT-PRO showed higher sensitivity for the BCVA >0.5 group and better specificity for the BCVA ≤0.5 group than the OCT and metadata models (Fig S1, available at www.ophthalmologyscience.org).
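
A sketch of this dichotomization step is shown below; it assumes the eCSC threshold of 0.5 decimal is applied as its logMAR equivalent (−log10(0.5) ≈ 0.301), and the prediction arrays are synthetic placeholders.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Dichotomize continuous predictions at the eCSC threshold (decimal BCVA > 0.5,
# i.e., logMAR below about 0.301) and tabulate agreement with observed outcomes.
ECSC_LOGMAR = -np.log10(0.5)

def ecsc_confusion(pred_logmar: np.ndarray, true_logmar: np.ndarray) -> np.ndarray:
    pred_good = pred_logmar < ECSC_LOGMAR  # predicted postoperative BCVA > 0.5
    true_good = true_logmar < ECSC_LOGMAR
    # Rows: observed (>0.5, <=0.5); columns: predicted (>0.5, <=0.5).
    return confusion_matrix(true_good, pred_good, labels=[True, False])

rng = np.random.default_rng(0)
true_v = rng.uniform(0.0, 1.0, 200)
pred_v = true_v + rng.normal(0, 0.15, 200)
print(ecsc_confusion(pred_v, true_v))
```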

Human–AI Comparison

As shown in Figure 4, OCT-PRO demonstrated superior overall predictive accuracy, as measured by absolute error, compared with ophthalmologists. The distribution of absolute errors (Fig 4A) revealed a significantly higher MAE among junior ophthalmologists (0.149 ± 0.162 logMAR) relative to both seniors (0.124 ± 0.125 logMAR, P < 0.01) and OCT-PRO (0.112 ± 0.103 logMAR, P < 0.001), whereas no significant difference was observed between senior ophthalmologists and the model. When stratified by VI severity (Fig 4B), MAE increased with severity for all groups; however, the model consistently maintained lower or comparable errors across all strata, particularly in cases of severe VI. Interestingly, as shown in Fig 4C, both junior and senior ophthalmologists exhibited a higher proportion of accurate estimations than the model, highlighting the strength of experienced clinicians—especially seniors—in making precise judgments in certain cases. In contrast, the model showed a more balanced distribution of underestimation and overestimation, which may reflect reduced bias and improved consistency across a range of VI levels. Together, these findings suggest that although OCT-PRO achieves strong overall performance, experienced human raters retain distinct advantages in specific prediction scenarios.

Figure 4.

Prediction results of OCT-PRO and ophthalmologists in the comparative test data set. A, Violin plots showing the distribution of prediction errors across all test cases for juniors, seniors, and OCT-PRO. Each violin represents the density distribution of error values, with dashed lines indicating the median and interquartile range. Statistical comparisons between groups were performed using the paired t test. Significant differences were observed between juniors and seniors (P < 0.01) and between juniors and OCT-PRO (P < 0.001), whereas the difference between seniors and OCT-PRO was not statistically significant. B, Grouped error bar plot comparing MAE across 4 levels of VI: no VI, mild VI, moderate VI, and severe VI. Each point represents the MAE, and the error bars denote its 95% CI. C, Stacked bar charts showing the proportion of overestimated, accurate, and underestimated predictions by juniors, seniors, and OCT-PRO, stratified by the severity of VI. CI = confidence interval; MAE = mean absolute error; VI = visual impairment.

Discussion

In this multicenter prospective study, we developed and validated OCT-PRO, a multimodal model integrating macular OCT images and clinical metadata to predict postoperative visual acuity at 4 weeks after cataract surgery. OCT-PRO demonstrated favorable predictive performance in both internal and external test cohorts, achieving lower MAEs than traditional preoperative tools such as laser interferometry, and outperforming ophthalmologists across varying levels of experience. Furthermore, the model maintained robust accuracy across diverse cataract subtypes and baseline visual acuity strata, supporting its broad clinical applicability.

In clinical practice, estimating a cataract patient’s postoperative visual potential remains challenging, particularly in the presence of coexisting retinopathies.18, 19, 20 Traditional tools such as the potential acuity meter and laser interferometry are often hindered by technical limitations, including patient cooperation and low reproducibility under advanced lens opacities. For example, prior studies reported that the proportion of preoperative predictions within 2 lines of actual postoperative acuity using laser interferometry was only 56% in moderate cataracts, 61% in moderate cataracts with ocular comorbidities, and as low as 37% in advanced cataracts.8 These limitations have prompted growing interest in artificial intelligence–based approaches, which offer the potential for automated, objective, and individualized outcome prediction using routinely collected data. Several studies have explored this direction, but most existing models have been restricted in scope, relying on single-modality inputs or narrowly defined patient cohorts. For instance, Perea-Milla et al12 developed a logistic model using basic preoperative data to predict whether significant visual improvement could be achieved after phacoemulsification, achieving an area under the curve of 65% to 80%. Wei et al14 applied a deep learning technique to OCT images of highly myopic eyes to predict postoperative BCVA, yielding MAEs of 0.1524 and 0.1602 logMAR in internal and external data sets, respectively. Wang et al13 combined preoperative BCVA with OCT features extracted by deep learning models to predict outcomes in age-related cataracts, reporting MAEs of 0.1250 and 0.1194 logMAR. Although these models achieved reasonable performance in the targeted populations, they were largely developed within limited clinical contexts, often excluding patients with complex cataract subtypes. Furthermore, their clinical utility has rarely been benchmarked against current standards such as ophthalmologist assessments or established predictive methods.

To address the limitations of previous studies, we developed OCT-PRO, a multimodal model trained on a large, heterogeneous data set encompassing a broad spectrum of cataract types, surgical indications, and patient demographics. The model integrated 2 complementary preoperative data sources: structured clinical metadata and macular OCT images. Clinical data provide high-level prognostic cues—such as age, baseline BCVA, and comorbidities—whereas OCT images offer detailed anatomical information about macular integrity, retinal thickness, and foveal morphology. Notably, unlike some earlier models that rely on presegmented inputs or handcrafted OCT features,13,21 our architecture directly incorporates raw OCT images through convolutional layers, enabling automated and standardized feature extraction. This reduces dependence on task-specific preprocessing pipelines and enhances model scalability and applicability across diverse clinical environments. Ablation analyses revealed that OCT-PRO could dynamically allocate attention between modalities. OCT features dominated in complex cataract subtypes such as high myopic (82.1%) and drug-induced cataracts (79.9%), whereas clinical metadata played a greater role in age-related and diabetic cataracts. When stratified by preoperative VI, metadata contributed more in mild or no VI cases, whereas OCT features accounted for nearly all attention (99.7%) in severe VI. This adaptive fusion mechanism allowed the model to accommodate diverse baseline patient characteristics, supporting robust and individualized prediction.

To rigorously evaluate the clinical utility of OCT-PRO, we benchmarked its performance against both traditional predictive tools and human experts. In the internal test data set, OCT-PRO achieved an MAE of 0.128 logMAR—significantly outperforming metadata-only (0.161 logMAR) and laser interferometry–based methods (0.381 logMAR). Moreover, OCT-PRO outperformed junior (MAE 0.149 ± 0.162 logMAR) and senior ophthalmologists (0.124 ± 0.125 logMAR), with consistently lower or comparable errors (0.112 ± 0.103 logMAR) across VI strata. Notably, the model exhibited a more balanced distribution of overestimation and underestimation, suggesting improved objectivity and reduced bias.

The interpretability of deep learning models remains a critical concern in clinical artificial intelligence deployment.22 By integrating SHAP values and Grad-CAM heatmaps, we were able to visualize the model’s attention to specific clinical features and retinal regions, thereby enhancing its transparency. The SHAP analysis highlighted preoperative BCVA, age, and posterior subcapsular opacity grade as key clinical contributors, whereas Grad-CAM heatmaps consistently focused on the central foveal contour and integrity, aligning with known determinants of visual prognosis.13,14,23 These explainability analyses not only support the model’s validity but also help build clinician trust in its outputs.

This study has several limitations. First, although we used OCT as the primary imaging modality due to its high resolution and ability to capture macular microstructures, other commonly available ophthalmic imaging techniques—such as fundus photography—can also provide valuable information on macular morphology. Future studies are warranted to explore the feasibility and predictive performance of models based on alternative imaging modalities. Second, although the model demonstrated superior accuracy compared to experienced ophthalmologists in this study, its integration into clinical workflows will require further prospective validation, user interface development, and assessment of its actual impact on patient outcomes and satisfaction. Third, although the performance of OCT-PRO was generally satisfactory, more advanced multimodal data fusion strategies, such as attention mechanisms, remain to be systematically investigated in future work.

In conclusion, we developed OCT-PRO, a multimodal deep learning model capable of accurately predicting postoperative visual acuity after cataract surgery using preoperative OCT images and clinical metadata. The model demonstrated strong generalizability across multiple institutions and patient subgroups and outperformed both traditional assessment tools and ophthalmologists in comparative evaluations. By supporting ophthalmologists in making more informed surgical decisions, particularly in patients with complex conditions, this approach may contribute to more effective triaging and scheduling of cataract surgeries. Ultimately, such tools hold promise for optimizing cataract surgical resource allocation and improving the eCSC, especially in resource-limited settings.

Manuscript no. XOPS-D-25-00566.

Footnotes

Disclosure(s):

All authors have completed and submitted the ICMJE disclosures form.

The authors have no proprietary or commercial interest in any materials discussed in this article.

This project was supported by the National Natural Science Foundation of China (NSFC, grant no.: 92368205, 82441003), Guangdong Provincial Natural Science Foundation for Progressive Young Scholars (grant no.: 2023A1515030170), National Key Laboratory of Ophthalmology Young Acceleration Program (grant no.: 2025QNJS06), Research Funds of the State Key Laboratory of Ophthalmology (grant no.: 2025QNMY07), Guangdong Basic Research Center of Excellence for Major Blinding Eye Diseases Prevention and Treatment, Postdoctoral Fellowship Program of China Postdoctoral Science Foundation (CPSF, grant no.: GZC20242096), Beijing Nova Program (grant no.: 20240484601), Natural Science Foundation of Guangdong Province (grant no.: 2023A1515011102), Basic scientific research projects of Sun Yat-sen University (grant no.: 23ykcxqt002), and the Hainan Province Clinical Medical Center.

HUMAN SUBJECTS: Human subjects were included in this study. The study was registered on ClinicalTrials.gov (NCT05491798) and approved by the Institutional Review Board of the Zhongshan Ophthalmic Center, Sun Yat-sen University (IRB-ZOC-SYSU). Informed consent was obtained from all patients included in this study. All procedures adhered to the principles of the Declaration of Helsinki.

No animal subjects were included in this study.

Author Contributions:

Conception and design: L. Liu, X. Wu, Lin

Data collection: L. Liu, Wu, Cao, Shang, Z. Wang, Tan, Yuan, J. Wang, Fabao Xu, Lian, Tang, He, Y. Xu, Luo, M. Wang, R. Xu, X. Chen, H. Chen

Analysis and interpretation: L. Liu, M. Li, Zhao, Huang, J. Li, Pang, Fan Xu, Zeng, Tu, Z. Liu, Tao, X. Wu

Obtained funding: L. Liu, Z. Liu, Tao, X. Wu, Lin

Overall responsibility: L. Liu, M. Li, Y. Wu, Cao, Shang, Zhao, Z. Wang, Tan, Yuan, Huang, J. Wang, J. Li, Fabao Xu, Lian, Pang, Fan Xu, Tang, He, Y. Xu, Zeng, Luo, M. Wang, R. Xu, Tu, X. Chen, H. Chen, Z. Liu, Tao, X. Wu, Lin

Supplemental material available at www.ophthalmologyscience.org.

Contributor Information

Jing Tao, Email: taojing@mail.ccmu.edu.cn.

Xiaohang Wu, Email: wxhang@mail2.sysu.edu.cn.

Haotian Lin, Email: linht5@mail.sysu.edu.cn.

Supplementary Data

Figure S1
mmc1.pdf (119.9KB, pdf)
Table S1
mmc2.pdf (17.8KB, pdf)
Table S2
mmc3.pdf (18.8KB, pdf)
Table S3
mmc4.pdf (24.6KB, pdf)
Table S4
mmc5.pdf (29.8KB, pdf)
Table S5
mmc6.pdf (29.2KB, pdf)

References

  • 1. Lam D., Rao S.K., Ratra V., et al. Cataract. Nat Rev Dis Primers. 2015;1. doi: 10.1038/nrdp.2015.14.
  • 2. Erie J.C. Rising cataract surgery rates: demand and supply. Ophthalmology. 2014;121:2–4. doi: 10.1016/j.ophtha.2013.10.002.
  • 3. Burton M.J., Ramke J., Marques A.P., et al. The Lancet Global Health Commission on global eye health: vision beyond 2020. Lancet Glob Health. 2021;9:e489–e551. doi: 10.1016/S2214-109X(20)30488-5.
  • 4. Xu T., Wang B., Liu H., et al. Prevalence and causes of vision loss in China from 1990 to 2019: findings from the Global Burden of Disease Study 2019. Lancet Public Health. 2020;5:e682–e691. doi: 10.1016/S2468-2667(20)30254-1.
  • 5. Han X., Zhang J., Liu Z., et al. Real-world visual outcomes of cataract surgery based on population-based studies: a systematic review. Br J Ophthalmol. 2023;107:1056–1065. doi: 10.1136/bjophthalmol-2021-320997.
  • 6. Du Y.F., Liu H.R., Zhang Y., et al. Prevalence of cataract and cataract surgery in urban and rural Chinese populations over 50 years old: a systematic review and meta-analysis. Int J Ophthalmol. 2022;15:141–149. doi: 10.18240/ijo.2022.01.21.
  • 7. McCormick I., Butcher R., Evans J.R., et al. Effective cataract surgical coverage in adults aged 50 years and older: estimates from population-based surveys in 55 countries. Lancet Glob Health. 2022;10:e1744–e1753. doi: 10.1016/S2214-109X(22)00419-3.
  • 8. Vianya-Estopa M., Douthwaite W.A., Funnell C.L., Elliott D.B. Clinician versus potential acuity test predictions of visual outcome after cataract surgery. Optometry. 2009;80:447–453. doi: 10.1016/j.optm.2008.11.011.
  • 9. Vianya-Estopa M., Douthwaite W.A., Noble B.A., Elliott D.B. Capabilities of potential vision test measurements: clinical evaluation in the presence of cataract or macular disease. J Cataract Refract Surg. 2006;32:1151–1160. doi: 10.1016/j.jcrs.2006.01.111.
  • 10. Lasa M.S., Datiles M.B. III, Freidlin V. Potential vision tests in patients with cataracts. Ophthalmology. 1995;102:1007–1011. doi: 10.1016/s0161-6420(95)30921-9.
  • 11. Reid O., Maberley D.A., Hollands H. Comparison of the potential acuity meter and the visometer in cataract patients. Eye (Lond). 2007;21:195–199. doi: 10.1038/sj.eye.6702165.
  • 12. Perea-Milla E., Vidal S., Briones E., et al. Development and validation of clinical scores for visual outcomes after cataract surgery. Ophthalmology. 2011;118:9–16. doi: 10.1016/j.ophtha.2010.04.009.
  • 13. Wang J., Wang J., Chen D., et al. Prediction of postoperative visual acuity in patients with age-related cataracts using macular optical coherence tomography-based deep learning method. Front Med (Lausanne). 2023;10. doi: 10.3389/fmed.2023.1165135.
  • 14. Wei L., He W., Wang J., et al. An optical coherence tomography-based deep learning algorithm for visual acuity prediction of highly myopic eyes after cataract surgery. Front Cell Dev Biol. 2021;9. doi: 10.3389/fcell.2021.652848.
  • 15. Lundberg S.M., Lee S.-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4768–4777.
  • 16. Selvaraju R.R., Cogswell M., Das A., et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision; 2017; Venice, Italy.
  • 17. Shazeer N., Mirhoseini A., Maziarz K., et al. Outrageously large neural networks: the sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538. 2017.
  • 18. Diabetic Retinopathy Clinical Research Network Authors/Writing Committee, Bressler S.B., Baker C.W. Pilot study of individuals with diabetic macular edema undergoing cataract surgery. JAMA Ophthalmol. 2014;132:224–226. doi: 10.1001/jamaophthalmol.2013.6209.
  • 19. Casparis H., Lindsley K., Kuo I.C., et al. Surgery for cataracts in people with age-related macular degeneration. Cochrane Database Syst Rev. 2017;2. doi: 10.1002/14651858.CD006757.pub4.
  • 20. Lee D., Agron E., Keenan T., et al. Visual acuity outcomes after cataract surgery in type 2 diabetes: the Action to Control Cardiovascular Risk in Diabetes (ACCORD) study. Br J Ophthalmol. 2022;106:1496–1502. doi: 10.1136/bjophthalmol-2020-317793.
  • 21. Zhang Y., Xu F., Lin Z., et al. Prediction of visual acuity after anti-VEGF therapy in diabetic macular edema by machine learning. J Diabetes Res. 2022;2022. doi: 10.1155/2022/5779210.
  • 22. Dobson J.E. On reading and interpreting black box deep neural networks. Int J Digital Humanities. 2023;5:431–449.
  • 23. Modjtahedi B.S., Hull M.M., Adams J.L., et al. Preoperative vision and surgeon volume as predictors of visual outcomes after cataract surgery. Ophthalmology. 2019;126:355–361. doi: 10.1016/j.ophtha.2018.10.030.
