Scientific Reports. 2025 Dec 12;15:43729. doi: 10.1038/s41598-025-27848-3

Predicting myopia risk using a machine learning model based on fundus imageomics

Xiaoling Zhang 1,2,#, Zixun Wang 2,#, Jingtao Yu 2,#, Jinghui Wang 2, Desheng Song 2, Bo Zhang 2, Xuan Li 2, Bei Du 2, Ruihua Wei 2
PMCID: PMC12700998  PMID: 41387491

Abstract

The purpose of this study was to develop a machine learning-based model using quantitative color fundus photography (CFP) data to predict myopia risk in school-age children, based on the axial length/corneal curvature radius (AL/CR) ratio, and to identify key retinal features associated with myopia progression. This cross-sectional study included 2,184 CFPs from children aged 6–10 years. Retinal imageomics features were extracted from CFPs using the EVisionAI platform, alongside age and sex data, resulting in 146 variables. After feature selection using LASSO regression and expert review, predictive models were constructed using seven machine learning algorithms, including random forest (RF), XGBoost, and LightGBM. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration. The RF model showed the best predictive performance (AUC = 0.798), followed by LightGBM and XGBoost. Key predictors included age, nasal disc–foveal distance, atrophic areas, and vascular parameters. The RF model demonstrated high specificity (0.80) and moderate sensitivity (0.59), with robust calibration and decision curve analysis confirming its clinical value. This study demonstrates that quantitative CFP-derived imageomics, combined with machine learning, can effectively predict myopia risk in school-age children. The RF model, incorporating age, retinal distances, and vascular features, offers a promising tool for early myopia risk stratification.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-27848-3.

Keywords: Myopia risk prediction, Axial length/Corneal curvature ratio (AL/CR), Color fundus photography (CFP), Machine learning (ML), Deep learning imageomics

Subject terms: Computational biology and bioinformatics, Diseases, Health care, Medical research, Risk factors

Introduction

Myopia in children and adolescents is a major public health problem [1], and growth of the axial length (AL) is a crucial factor in the progression of myopia to pathologic myopia [1,2]. Although AL is an important predictor of myopia progression in children, it increases physiologically with age and varies between individuals, so the ratio of AL to corneal curvature radius (AL/CR ratio) is currently recognized as a potential metric for assessing myopia risk in children [3–5]. Color fundus photography (CFP), a crucial tool for assessing eye health in children, has also been used in recent years to monitor various diseases [6,7]. Some studies have shown a correlation between CFP and AL [8,9]. However, the relationship between CFP and AL during refractive development in children remains unclear.

Machine learning (ML), a branch of artificial intelligence (AI), can handle large volumes of high-dimensional data, analyze complex relationships, and identify optimal predictors of clinical outcomes [10]. Deep learning (DL), an important AI method for image processing, can segment images and yield clinically meaningful quantitative metrics [11]. CFP-based studies of myopia in children have identified features such as tessellation density, the optic disc, and retinal vascularization [12–14]. However, the potential predictive role of quantitative fundus indices in childhood myopia risk is still unclear.

As mentioned earlier, adult CFPs have been used to predict AL, and some attempts have been made to explain how DL reaches its predictions using Grad-CAM [15]. However, the relationship with AL does not appear to have been studied in CFPs of children without obvious pathological changes. Moreover, such image-level interpretation may not be comprehensive enough to summarize quantitatively which factors underlie the correlation between AL and CFP. High accuracy in ML models is often accompanied by opacity regarding individual variable effects, which constrains their clinical utility [16]. To improve transparency, SHapley Additive exPlanations (SHAP) combines optimal credit allocation with local explanation techniques, giving a clear visual attribution of variable importance and, consequently, interpretable predictions [17].

Therefore, the purpose of this study was to use DL to quantify CFPs from children classified into different myopia risk groups by the AL/CR ratio, to select potentially meaningful metrics, and to use ML to develop a model that predicts the presence or absence of myopia risk from these quantitative CFP metrics. Finally, the performance of the model was evaluated, and the importance of the included variables was analyzed for interpretability.

Methods

Data collection and preprocessing

The study was approved by the Ethics Committee of Tianjin Medical University Eye Hospital (No. 2024KY-67) and conducted in accordance with the principles of the Declaration of Helsinki, and all methods were performed according to the relevant guidelines and regulations. The study subjects were recruited during myopia screening in Tianjin. First, 5,790 CFPs were selected from children aged 6–10 years with a generally normal fundus in whom other ocular diseases had been excluded. Exclusion criteria were (1) no cycloplegic spherical equivalent (SE) data or an unclear ocular history; (2) missing basic refractive information such as AL, K1, and K2; and (3) abnormal values or poor image quality after quantification. Quantitative data from 2,184 eyes were finally included for model construction. The outcome was dichotomized by the AL/CR ratio: AL/CR ≥ 3 was defined as at risk of developing myopia, and AL/CR < 3 as not at risk. We extracted age, sex, K1, K2, and AL data for these children from their medical records. AL and CR were measured three times by an experienced optician, and the average value was recorded. All AL data were measured using a Lenstar LS-900 optical biometer (Haag-Streit AG, Köniz, Switzerland). All CFP images were acquired using a 45° fundus camera (Canon Inc., 9 − 1 Kanagawa, Japan). Refraction after cycloplegia was measured using an autorefractor (FKR.800, Topcon, Tokyo, Japan). Figure 1 illustrates the inclusion process of this study.
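The outcome definition above can be sketched in a few lines. This is only an illustration: it assumes that CR is the mean of the two principal corneal radii K1 and K2 (in mm), which the text does not state explicitly, and the function names and example values are hypothetical.

```python
def al_cr_ratio(al_mm: float, k1_mm: float, k2_mm: float) -> float:
    """Return AL/CR, taking CR as the mean of K1 and K2 (an assumption)."""
    cr_mm = (k1_mm + k2_mm) / 2.0
    return al_mm / cr_mm


def at_risk(al_mm: float, k1_mm: float, k2_mm: float, cutoff: float = 3.0) -> bool:
    """Label an eye as at risk when AL/CR >= cutoff (cutoff of 3 taken from the text)."""
    return al_cr_ratio(al_mm, k1_mm, k2_mm) >= cutoff


# Hypothetical eye: AL 23.28 mm, corneal radii 7.80 mm and 7.70 mm.
ratio = al_cr_ratio(23.28, 7.80, 7.70)
print(round(ratio, 3), at_risk(23.28, 7.80, 7.70))
```

Keeping the cutoff as a parameter makes it easy to examine how sensitive the class balance is to the chosen threshold.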

Fig. 1. The overall flowchart of the study.

In this study, the automated fundus image analysis software EVisionAI [18] was used for retinal image processing. EVisionAI, developed on a bioinspired visual framework, integrates computer vision and deep learning methods. The software first performs preprocessing, including region of interest (ROI) extraction, denoising, normalization, and contrast enhancement [19,20], to remove irrelevant regions such as image backgrounds and to mitigate noise and inter-image variability. A deep learning-based network [20] combined with a visual attention-driven edge detection algorithm is then applied to delineate and segment key retinal structures, including the atrophic arc, optic disc, and retinal vasculature. From the segmentation results, quantitative morphological parameters of these retinal features were extracted using the EVisionAI platform. The segmentation methods for the blood vessels and the optic disc have been described in detail elsewhere [18]. Figure 2 shows a schematic diagram of CFP segmentation. The EVisionAI platform (Version 2.0; https://evisionaibeta.yiweiimage.com/#/) employs a transparent, multi-stage pipeline with validated performance at each step. Standardized preprocessing, deep learning models trained on high-consensus ground truth, and pixel calibration support the reproducibility and reliability of the quantitative retinal vascular parameters reported in our study.

Fig. 2. Example of EVisionAI-based retinal image processing and quantitative vascular analysis. (A) Original color fundus photograph. (B) Preprocessed image with enhanced vessel contrast. (C) Automated vessel segmentation and skeletonization with measurement overlays, including vessel diameter, curvature, and branching points. (D) Binary vessel mask used for morphometric and fractal dimension calculations. (E) Peripapillary region analysis with highlighted arterial (red) and venous (blue) segments, including branching geometry measurements. (F) Macular-centered region of interest (3 mm and 5 mm concentric circles) for vessel density and diameter quantification.

Model development and validation

We obtained a total of 144 quantitative CFP indicators; adding sex and age brought the number of variables to 146. The quantitative CFP variables fell into four categories: (1) the presence of a tessellated (leopard-spot) fundus and atrophic arcs; (2) vascular parameters, including the length, area, and angle of arterioles and venules, as well as vascular density and related parameters within defined regions; (3) optic disc parameters, including morphologic quantification, relative optic disc position, and cup-to-disc ratio; and (4) distances between important fundus landmarks (macula, optic disc, and vascular arcade).

The sample data (subjects) were divided into a training set and a validation set at a ratio of 8:2. Because image quantification yielded a large number of variables, least absolute shrinkage and selection operator (LASSO) regression was used for feature selection. LASSO applies L1 regularization, which shrinks coefficients toward zero, so that features with large contributions are retained and redundant features are eliminated, reducing dimensionality. Three experienced ophthalmologists then assessed the clinical significance of the screened variables and performed a second, manual screening of variables with similar characteristics.
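To illustrate how L1 regularization removes features, here is a minimal sketch of lasso fitted by coordinate descent on synthetic data after an 8:2 split. It is not the authors' pipeline: it uses a squared-error (linear) lasso rather than whatever binary-outcome variant was actually fitted, and the data, penalty value, and function names are all hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 (no intercept; y is centred beforehand)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            col_norm = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / col_norm
    return w


# Synthetic data (hypothetical): only feature 0 truly drives the outcome.
rng = random.Random(0)
n = 200
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 * row[0] + rng.gauss(0, 0.5) for row in X]

# 8:2 split, mirroring the ratio in the text; selection uses the training part only.
order = list(range(n))
rng.shuffle(order)
n_train = n * 8 // 10
train = order[:n_train]
X_train = [X[i] for i in train]
y_mean = sum(y[i] for i in train) / n_train
y_train = [y[i] - y_mean for i in train]  # centre the outcome

weights = lasso_coordinate_descent(X_train, y_train, lam=0.2)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(selected)
```

The soft-thresholding step is what sets weak coefficients to exactly zero, which is why the lasso doubles as a feature selector rather than only shrinking coefficients.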

In this study, seven ML algorithms were used to construct the prediction model: extreme gradient boosting (XGBoost), support vector machine (SVM), random forest (RF), artificial neural network (ANN), decision tree, logistic regression (LR), and light gradient boosting machine (LightGBM). Fivefold cross-validation was used to ensure model stability. Hyperparameters for each algorithm were tuned by grid search within the training folds only, without access to the validation data, and the configuration with the highest area under the receiver operating characteristic (ROC) curve (AUC) was selected as the optimal model. The 95% confidence intervals of the AUC were estimated by bootstrapping the validation cohort (1,000 resamples).
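The bootstrap interval for the AUC can be sketched as follows. The text specifies 1,000 resamples of the validation cohort but not the interval type, so the percentile method used here is an assumption, and the labels and scores are synthetic.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes are needed to compute an AUC
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(round(alpha / 2 * n_resamples))]
    upper = stats[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return lower, upper


# Synthetic validation set (hypothetical): positives tend to score higher.
rng = random.Random(1)
labels = [i % 2 for i in range(60)]
scores = [rng.random() + 0.3 * label for label in labels]

point = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(round(point, 3), round(ci_low, 3), round(ci_high, 3))
```

The pairwise AUC is quadratic in sample size, so for a cohort of several hundred eyes a rank-based implementation would be the more practical choice.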

The predictive models were trained on the designated training dataset, and the optimal model was subsequently evaluated on the validation cohort. Model performance was quantified using the AUC, sensitivity (recall), specificity, F1 score, and overall accuracy, with the classification outcomes visualized in a confusion matrix. To further assess the clinical applicability of the models, decision curve analysis (DCA) and calibration curves were generated. DCA evaluates the net clinical benefit of a predictive model across a range of threshold probabilities, thereby demonstrating its value in guiding clinical decisions [21]. Calibration curves illustrate the agreement between predicted probabilities and observed outcomes, serving as a measure of model reliability and goodness of fit [22].
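The threshold-based metrics and the net benefit used in DCA all follow from the four confusion-matrix counts. The sketch below uses hypothetical counts rather than values from the study; the net-benefit formula is the standard one, TP/n minus FP/n weighted by the odds of the threshold probability.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the usual metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def net_benefit(tp, fp, n, threshold):
    """Net benefit at a threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical confusion matrix for 200 eyes, 100 of them at risk.
tp, fp, tn, fn = 60, 20, 80, 40
n = tp + fp + tn + fn
metrics = classification_metrics(tp, fp, tn, fn)
model_nb = net_benefit(tp, fp, n, threshold=0.5)
# "Treat all" strategy: every eye is flagged, so TP = all positives, FP = all negatives.
treat_all_nb = net_benefit(tp + fn, fp + tn, n, threshold=0.5)
print(metrics, model_nb, treat_all_nb)
```

A decision curve repeats the net-benefit calculation over a range of thresholds, recomputing the counts at each one, and compares the model against the "treat all" and "treat none" (net benefit 0) strategies.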

Model interpretation

SHAP values were computed to quantify the contribution of each feature to the final classification outcome. Features with larger absolute SHAP values were considered to exert a greater effect on the model predictions. The derived feature importance scores were presented to facilitate interpretation of the optimal predictive model [23–25]. Additionally, the Local Interpretable Model-Agnostic Explanations (LIME) approach was applied to provide supplementary, instance-level interpretability of the model [26–28].
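The credit-allocation idea behind SHAP can be shown with an exact Shapley computation on a toy model. This is a conceptual sketch only: the toy risk function and its numbers are hypothetical, and practical SHAP tooling relies on faster algorithms than enumerating every feature ordering.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    contrib = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            contrib[f] += value_fn(frozenset(present)) - before
    return {f: total / len(orders) for f, total in contrib.items()}


def toy_risk(known):
    """Hypothetical predicted risk given the subset of features that are known."""
    risk = 0.40  # baseline when nothing is known
    if "age" in known:
        risk += 0.20
    if "disc_fovea_distance" in known:
        risk += 0.10
    if "age" in known and "disc_fovea_distance" in known:
        risk += 0.04  # interaction term, shared equally between the two features
    return risk


features = ["age", "disc_fovea_distance", "sex"]
phi = shapley_values(toy_risk, features)
print(phi)
```

The attributions sum to the difference between the full prediction and the baseline, which is the additivity property that makes waterfall and force plots possible.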

Statistical analysis

Continuous variables were summarized as mean ± standard deviation (SD) with ranges or as median with interquartile ranges (IQR), depending on data distribution. Group comparisons were performed using Student’s t-test for normally distributed variables or the Wilcoxon rank-sum test for non-normally distributed data. Categorical variables were presented as counts and percentages, with group differences evaluated via the chi-square test or Fisher’s exact test, as appropriate. Statistical significance was defined as a two-tailed P-value < 0.05. All statistical analyses were carried out using SPSS version 27.0 (IBM Corp.), R version 4.4.2 (R Foundation for Statistical Computing), and Python version 3.10.4 (Python Software Foundation).
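As an illustration of the categorical comparison, the sketch below applies a Pearson chi-square test to the sex counts reported in Table 1. It assumes no continuity correction and uses the closed form for one degree of freedom; the authors' software may have used a different variant, so the result is not expected to match the reported P value (0.486) exactly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Sex counts from Table 1: training set (908 male, 840 female), validation set (218, 218).
stat, p = chi_square_2x2(908, 840, 218, 218)
print(round(stat, 3), round(p, 3))
```

Either way the conclusion is the same: the sex distribution does not differ significantly between the training and validation sets.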

Results

Baseline characteristics and feature selection

After applying the exclusion criteria, CFPs from 2,184 eyes were included for quantitative analysis and modeling. Table 1 shows the baseline data for the training and validation sets, including sex, age, cycloplegic SE, AL, and AL/CR values. The training set contained 1,748 eyes and the validation set 436 eyes. There were no statistically significant differences in baseline characteristics or AL/CR between the two sets.

Table 1. Baseline characteristics of the AL/CR data set.

| Characteristic | Total | Training set | Validation set | P value |
|---|---|---|---|---|
| Number of children (eyes) | 2184 | 1748 | 436 | |
| Sex | | | | 0.486 |
| Male, n (%) | 1126 (51.56%) | 908 (51.95%) | 218 (50.00%) | |
| Female, n (%) | 1058 (48.44%) | 840 (48.05%) | 218 (50.00%) | |
| Age, mean ± SD, y | 7.82 ± 0.97 | 7.83 ± 0.96 | 7.78 ± 0.99 | 0.275 |
| Cycloplegic SE, mean ± SD, D | 0.10 ± 1.24 | 0.08 ± 1.26 | 0.18 ± 1.15 | 0.060 |
| Axial length, mean ± SD, mm | 23.28 ± 0.82 | 23.28 ± 0.83 | 23.27 ± 0.79 | 0.098 |
| AL/CR, mean ± SD | 2.99 ± 0.09 | 2.99 ± 0.09 | 2.99 ± 0.08 | 0.316 |
| AL/CR ≤ 3.00, n (%) | 1229 (56.27%) | 984 (56.29%) | 245 (56.19%) | |
| AL/CR > 3.00, n (%) | 955 (43.73%) | 764 (43.71%) | 191 (43.81%) | |

SE: spherical equivalent; SD: standard deviation; AL/CR: axial length/corneal curvature radius.

After the CFPs were quantified, a total of 144 quantitative variables were generated (Table S1); with sex and age, this gave 146 variables. LASSO regression with tenfold cross-validation was used to screen the relevant features in the training set, and the coefficient profiles are shown in Fig. 3A-B. The LASSO model retains features with non-zero coefficients as potential predictors, which reduces multicollinearity and helps prevent overfitting. Starting from the 20 variables selected by LASSO, three experienced ophthalmologists assessed the clinical significance of the variables based on their clinical expertise. The final adjusted set comprises the following 22 variables: Sex, Age, Distance from Nasal Disc Margin to Foveal Center (µm), Presence of Atrophic Areas, Average Vessel Branching Angle, Average Arterial Diameter within 1.0–1.5 PD, Disc Area (mm²), Arterial Diameter in Nasal Peripapillary Region, Distance from Vascular Arcade to Intersection with Outer 2PD Circle (µm), Distance from Vascular Arcade to Intersection with Outer 1PD Circle (µm), Venous Diameter in Inferior Peripapillary Region, Vessel Length (µm), Vessel Area within 0.5–1.0 PD Annulus, Optic Disc Vertical/Horizontal Ratio, Presence of Tessellated Fundus, Average Vessel Diameter within 2.0–2.5 PD, Average Vessel Diameter within 5 mm Macular Fovea Area, Average Venous Curvature within 2.0–2.5 PD, Average Venous Curvature within 1.0–1.5 PD, Vessel Tortuosity in Nasal Peripapillary Region, Venous Tortuosity within 5 mm Macular Fovea Area, Vessel Tortuosity within 3 mm Macular Fovea Area, Average Vessel Density in Superior Peripapillary Region, Vessel Coverage Density within 2.0–2.5 PD, Arterial Diameter in Temporal Peripapillary Region, Average Venous Diameter within 0.5–1.0 PD, Venous Diameter within 5 mm Macular Fovea Area, Horizontal Cup Diameter (µm), Vertical Cup Diameter (µm), Vertical Cup-to-Disc Ratio, Maximum Rim Width of Optic Disc Cup, Optic Disc Long/Short Axis Ratio, Venous Vessel Length, Arterial Vessel Length, and Vessel Area within 2.0–2.5 PD Annulus. This process resulted in 22 retained features, as detailed in Supplementary Table S2.

Fig. 3. Results of the LASSO regression analysis. (A) LASSO regression model factor selection: the left dashed line represents the optimal lambda value (lambda.min), and the right dashed line marks the lambda value within one standard error of the optimum (lambda.1se = 0.041); (B) LASSO regression coefficient trajectories of the screened variables; (C) ROC curves for the machine learning models. XGBoost: extreme gradient boosting; SVM: support vector machine; ANN: artificial neural network; GBM: gradient boosting machine; ROC: receiver operating characteristic; AUC: area under the curve.
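The two dashed lines in panel A correspond to a common selection rule: lambda.min minimizes the mean cross-validated error, and lambda.1se is the largest lambda whose error stays within one standard error of that minimum. A minimal sketch of the rule follows; the lambda grid and error values are hypothetical and are not taken from the study.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validated error curves."""
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    limit = cv_mean[best] + cv_se[best]
    # Largest lambda (sparsest model) whose mean CV error is within one SE of the minimum.
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= limit)
    return lambdas[best], lambda_1se


# Hypothetical cross-validation results over a small lambda grid.
lambdas = [0.005, 0.010, 0.020, 0.041, 0.080]
cv_mean = [1.295, 1.280, 1.290, 1.300, 1.400]
cv_se = [0.030, 0.030, 0.030, 0.030, 0.030]

lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)
```

Choosing lambda.1se trades a small loss in cross-validated fit for a sparser and usually more stable set of selected features.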

Model performance comparisons

We constructed seven machine learning models to recognize AL/CR > 3 as a marker of myopia risk. Figure 3C displays the discriminative performance of the seven models as ROC curves. All seven models showed some ability to predict myopia risk, with the RF model performing best: it achieved an AUC of 0.798 [95% confidence interval (CI): 0.760–0.835]. The LightGBM model followed, with an AUC of 0.753 (95% CI: 0.712–0.793). The remaining models, in decreasing order of performance, were XGBoost (AUC = 0.750, 95% CI: 0.709–0.791), SVM (AUC = 0.718, 95% CI: 0.675–0.760), LR (AUC = 0.705, 95% CI: 0.662–0.748), ANN (AUC = 0.697, 95% CI: 0.654–0.740), and decision tree (AUC = 0.664, 95% CI: 0.620–0.709).

Table 2 shows the detailed performance metrics of the seven models. The overall performance of the RF model is strong (sensitivity: 0.59, specificity: 0.80). Figure 4A illustrates the confusion matrix for RF. Notably, XGBoost obtained the highest F1 score (0.66) and accuracy (0.72) among all evaluated models, while also having the highest precision (0.74). Figure 4B illustrates the calibration curves for all seven models, providing an important insight into their predictive reliability. Except for the decision tree and ANN models, five of the seven models showed good agreement between the predicted probabilities and observations.

Table 2. Performance of the ML models for predicting myopia risk.

| Model | AUC | 95% CI lower | 95% CI upper | Accuracy | Precision | Sensitivity | Specificity | F1 score |
|---|---|---|---|---|---|---|---|---|
| LR | 0.71 | 0.66 | 0.75 | 0.67 | 0.66 | 0.51 | 0.79 | 0.58 |
| Decision Tree | 0.66 | 0.62 | 0.71 | 0.63 | 0.62 | 0.50 | 0.74 | 0.55 |
| RF | 0.80 | 0.76 | 0.84 | 0.70 | 0.71 | 0.59 | 0.80 | 0.64 |
| XGBoost | 0.75 | 0.71 | 0.79 | 0.72 | 0.74 | 0.59 | 0.83 | 0.66 |
| LightGBM | 0.75 | 0.71 | 0.79 | 0.69 | 0.68 | 0.61 | 0.76 | 0.64 |
| SVM | 0.72 | 0.68 | 0.76 | 0.66 | 0.65 | 0.55 | 0.76 | 0.60 |
| ANN | 0.70 | 0.65 | 0.74 | 0.63 | 0.61 | 0.54 | 0.71 | 0.57 |

XGBoost: extreme gradient boosting; SVM: support vector machine; RF: random forest; ANN: artificial neural network; LR: logistic regression; LightGBM: light gradient boosting machine; AUC: area under the curve; ROC: receiver operating characteristic.
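Because the F1 score is the harmonic mean of precision and sensitivity, the columns of Table 2 can be cross-checked against one another. The short sketch below recomputes F1 from the reported (rounded) precision and sensitivity; the helper name is illustrative.

```python
# (precision, sensitivity, reported F1) for each model, as listed in Table 2.
table2 = {
    "LR": (0.66, 0.51, 0.58),
    "Decision Tree": (0.62, 0.50, 0.55),
    "RF": (0.71, 0.59, 0.64),
    "XGBoost": (0.74, 0.59, 0.66),
    "LightGBM": (0.68, 0.61, 0.64),
    "SVM": (0.65, 0.55, 0.60),
    "ANN": (0.61, 0.54, 0.57),
}


def f1_from(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


recomputed = {name: round(f1_from(p, r), 2) for name, (p, r, _) in table2.items()}
for name, (_, _, reported) in table2.items():
    print(name, recomputed[name], reported)
```

Agreement between recomputed and reported values is a quick sanity check on a metrics table, although inputs rounded to two decimals can in general shift the last digit.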

Fig. 4. Model performance. (A) Confusion matrix for RF; (B) calibration curves of the seven models on the validation set; (C) decision curve analysis of the seven models on the validation set.

In terms of clinical applicability, each model except Decision Tree and ANN showed robust net benefit over a wide range of threshold probabilities, with the RF model showing the highest net benefit and thus being selected as the best model for predicting risk of myopia (Fig. 4C).

Interpretability analysis

Figure 5 illustrates the SHAP analysis used to interpret the RF model and quantify the contribution of individual ophthalmic parameters to model predictions. Figure 5A presents the mean absolute SHAP values for the top features, indicating their overall importance in the RF model. Age, distance from the nasal disc margin to the foveal center, and presence of atrophic areas emerged as the most influential variables, followed by vessel-related parameters such as average vessel branching angle and arterial branching angle. Disc area and arterial diameter in the nasal peripapillary region also showed moderate contributions, whereas other vascular and morphometric features exerted smaller effects. Figure 5B shows the SHAP summary plot, in which each point represents an individual sample, color-coded by feature value (red for high, blue for low). The plot shows that older age and greater nasal disc–foveal distance strongly increased the model output, whereas smaller arterial diameters and reduced vessel branching angles contributed negatively. Several vascular morphometrics, including arterial and venous diameters, vessel length, and vascular density ratios, exerted modest but notable impacts.

Fig. 5. SHAP-based feature attribution for the RF model. (A) Mean absolute SHAP values ranking feature importance. (B) SHAP summary plot illustrating the direction and magnitude of each feature’s effect (red: high value, blue: low value). (C–D) SHAP waterfall plots for representative Class 1 and Class 0 predictions, showing features driving risk upward (red) or downward (blue). (E–F) SHAP force plots for two individual cases, highlighting how age, arterial branching angle, vessel density, and disc morphometrics collectively shaped prediction probabilities.

Figures 5C-D display SHAP waterfall plots for representative samples classified into Class 1 (AL/CR ≥ 3) and Class 0 (AL/CR < 3), respectively. In Class 1, younger age, larger arterial diameters, and higher vessel density reduced the predicted risk (blue bars), while greater nasal disc–foveal distance and presence of atrophic areas drove the prediction toward the positive class (red bars). Conversely, for Class 0, older age and greater disc morphometric measures increased the likelihood of classification, counterbalanced by smaller vessel diameters and lower vessel density.

Figures 5E-F show the SHAP force plots for two individual cases, demonstrating the feature-specific contributions to the final prediction probability. In both cases, structural metrics such as arterial branching angle, disc axis ratios, and fractal dimension of vessels strongly modulated the model outputs, underscoring their diagnostic relevance in conjunction with age and global morphometric features.

Discussion

The prediction of AL from CFP represents a significant advance in ophthalmic imaging, leveraging deep learning to estimate a three-dimensional biometric measurement. Wang Y et al. reported that it is feasible to use deep learning models to predict AL in patients with moderate to high myopia from ultra-widefield fundus images [29]. Dong L et al. showed that deep learning may help estimate AL from CFP [30]. Deep learning-based prediction of AL using CFP was fairly accurate and was enhanced by including age; the optic disc and temporal peripapillary area may contain crucial structural information for AL prediction in CFP [31].

Our study substantiates that ML models can predict the AL/CR of school-age children by analyzing age, sex, and 20 CFP variables with Smart Eye software from EVisionAI. Among the 20 CFP variables, all distance variables of important locations in the fundus ranked in the top 10, with atrophic arcs ranking second, and the top 10 included five vascular-related parameters. These findings suggest that distance variables, atrophic arcs, and vascular-related parameters are crucial in the study of applying CFP to predict AL/CR in children’s myopia screening. Additionally, these three types of indicators are also important parameters of myopia progression.

Meanwhile, this study selected AL/CR as the outcome indicator because it offers several advantages. First, by incorporating CR to evaluate ocular morphology more comprehensively, it reduces the effect of individual variation [32]. Second, CR remains relatively stable during ocular development, while AL elongation is the primary cause of myopia progression, so AL/CR can detect signals of “rapid AL elongation” earlier and facilitate early intervention [30]. Furthermore, many studies have demonstrated that AL/CR has significantly higher predictive accuracy for myopia progression than AL alone, particularly during rapid progression phases [33–35]. Overall, compared with AL alone, using AL/CR as the outcome indicator enhances the predictive value and provides a more precise clinical assessment in this study.

CFP has multiple uses in the management of myopia in children. Wang et al. conducted a predictive study of childhood SE using CFP, demonstrating that certain CFP parameters may be associated with myopia. However, because direct learning from CFP is a “black box”, the specific CFP details linked to myopia have not been systematically investigated [50]. The non-contact and rapid imaging capabilities of CFP make it particularly suitable for children. Compared with OCT or wide-field imaging, CFP offers lower cost and greater accessibility for community screening [30]. Numerous research teams have used CFP to predict and assess the risk of myopia progression through structural biomarkers [36–38]. However, early myopia may lack significant fundus changes, which limits the sensitivity of CFP for early detection. To address this, our study incorporates the CFP quantification parameters of EVisionAI, enabling subtle early-stage fundus alterations in myopia to be analyzed and quantified. This may make CFP the “first line of defense” in primary myopia prevention at the grassroots level.

We used CFP imageomics to quantify image data numerically and predicted myopia risk using ML algorithms. Our results suggest that RF may be the most effective ML method for constructing this predictive model. A random forest is essentially an ensemble of tree models, in which each tree can handle nonlinear relationships and higher-order interactions [39]. We consider the stronger performance of RF to be related to the characteristics of the data itself. In this study there are many feature variables, and RF effectively “ignores” noisy variables by randomly sampling a subset of features when training each tree, focusing instead on the useful variables.

In parameter selection, this study incorporated age and sex into the analysis rather than using CFP quantification parameters alone. Zhao L et al. found that myopia is associated with increasing age [40]. Mu J et al. showed that older age and female sex are risk factors for myopia, and their team also used ML with factors such as sex and age [41,42]. Xu S et al. also included sex in a model predicting axial length change in children [43]. These findings are consistent with this study, in which age ranked first and sex sixth among the predictors of myopia risk in school-age children. Together, these two factors help account for fundus changes caused by AL growth during children's growth and development, making the prediction more accurate. Recent studies have applied ML to infer anterior segment parameters from posterior segment images, such as predicting myopic regression after corneal refractive surgery and keratometry from fundus photographs [44,45]. However, few studies have directly quantified CFP-derived imageomics in children to predict AL/CR-based myopia risk. Our work expands on these approaches by integrating explainable ML, pediatric cohorts, and CFP quantitative metrics.

Most importantly, we selected 20 CFP variables, of which the more important ones were the distances between important fundus landmarks, the presence of atrophic arcs, and vascular-related parameters. This study recorded the presence of atrophic arcs as a binary variable based on the atrophic arc area (greater than 0 versus equal to 0). The presence of an atrophic arc is one of the manifestations of retinal changes caused by myopia [46]. Although the cases included in this study were school-age children, some had moderate to high myopia, which may be the main reason why the atrophic arc became an important indicator. Guo Y et al. demonstrated that the optic disc-fovea distance is closely associated with AL growth, mainly because of enlargement of the parapapillary gamma zone, whereas macular Bruch's membrane (BM) length increased only to a minor degree [47]. This is consistent with the distance from the nasal disc margin to the foveal center that was retained in this study through screening. Surprisingly, the distance from the vascular arcade to the intersection with the outer circle was also significant in our model. The importance of these distance variables may be related to the stability of CR in the AL/CR ratio and the marked increase in AL in the growing children included in this study. Many vascular-related parameters were also included in the model. Mi X et al. reported that the vascular arcade angle was associated with AL in children with myopia [48]. Gong W et al. found that the fundus vascular arcade angle was closely related to choroidal thickness [48]. Huang D et al. also found that fundus tessellated density can be used as a quantitative biomarker to estimate sub-foveal choroidal thickness in children [49]. Thus, the parameters of vascular arcade angle and vessel area reflect changes not only in the retina of myopic children but also in the choroid, which improved the accuracy of our model. We also found that vascular diameter and vascular length were associated with AL/CR, possibly for the same reason that the distance variables are meaningful.

This study has several limitations. First, we used AL/CR ≥ 3 as the threshold based on prior evidence showing its superiority over AL alone in predicting myopia onset. Nevertheless, unlike cycloplegic refraction (SE ≤ −0.50 D) or longitudinal progression, this static cutoff lacks temporal validation. Thus, our findings should be interpreted as biometric risk stratification rather than definitive refractive diagnosis. While we used an 8:2 ratio for model training and validation, the lack of corresponding samples from other regions for external verification may affect the model’s applicability. Although we have developed the model, its performance remains suboptimal. Additionally, this study is limited by its cross-sectional design, exclusion of over 60% of initial data (raising possible selection bias), reliance on a single-center, ethnically homogeneous cohort, and the inclusion of both eyes from some participants without statistical adjustment for intra-subject correlation. The 45° CFP field may also miss peripheral retinal changes.

Conclusion

This study demonstrates that a machine learning approach leveraging quantitative color fundus imageomics can effectively predict myopia risk in school-age children, as defined by AL/CR > 3. Among seven algorithms, random forest delivered the best balance of accuracy, calibration, and clinical utility. Age, fundus morphometric distances, atrophic areas, and vascular metrics emerged as key determinants of risk prediction. Given CFP’s low cost, accessibility, and noninvasive nature, this model offers promise as a scalable tool for community-based myopia screening. Future research should incorporate external validation, expanded age cohorts, and longitudinal follow-up to strengthen predictive accuracy and clinical adoption.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (26.8KB, docx)

Acknowledgements

This work was partially supported by EVision Technology (Beijing) Co., Ltd. We are grateful to Dr. Xingye Wang and Jie Wang for their technical support in image processing.

Author contributions

DB and WRH conceived and supervised the experiment. ZXL, WZX, and YJT performed the study. WJH, SDS, LX, and ZB collected the data. WJH, YJT, and WZX analyzed the data. WJH helped perform the analysis with constructive discussions. WZX, ZXL, and YJT wrote the manuscript and revisions. ZXL, WZX, and YJT contributed equally to this work and are all considered first authors. DB and WRH contributed equally to this work and are both considered corresponding authors. All authors read and approved the final manuscript.

Funding

This study received no funding.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Declarations

Ethics approval and consent to participate

This study was approved by the ethics committee of the Tianjin Medical University Eye Hospital [No. 2024KY-67]. All study procedures adhered to the tenets of the Declaration of Helsinki. Informed consent was obtained from all children included in this study and from their parents (legal guardians), and signed informed consent forms were collected.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Xiaoling Zhang, Zixun Wang and Jingtao Yu contributed equally to this work.

Contributor Information

Bei Du, Email: Dubei@tmu.edu.cn.

Ruihua Wei, Email: rwei@tmu.edu.cn.
