Abstract
Neurofibromatosis type 1 (NF-1) is a genetic disorder associated with a high risk of vision loss in children. Optical coherence tomography (OCT) provides non-invasive, high-resolution imaging of retinal and optic nerve structures, yet translating these data into predictive clinical tools remains challenging. This retrospective longitudinal cohort study analyzed 515 OCT measurements collected across multiple visits from 168 pediatric NF-1 patients (aged 3–19 years) to evaluate the ability of machine learning models to identify current vision abnormalities based on retinal and optic nerve layer thickness, rather than raw OCT images. Among the algorithms tested, the Balanced Random Forest model demonstrated the best performance (AUC = 0.82; sensitivity = 0.66). Thinning of the retinal nerve fiber layer (RNFL) and ganglion cell layer (GCL+) was identified as the strongest predictor of abnormal vision. Data-driven cut-off values for total macular and nerve layer thickness provided clear thresholds for clinical interpretation, while a cumulative “k-out-of-n” analysis showed that combining multiple OCT abnormalities enhanced risk stratification. These findings highlight the potential of explainable machine learning to transform OCT data into interpretable, clinically actionable tools for early detection and management of current vision abnormalities in pediatric NF-1. Validation in larger, multi-center cohorts is needed to confirm generalizability and support clinical adoption.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-026-37900-5.
Keywords: Neurofibromatosis type 1, Optical coherence tomography, Machine learning, Pediatric ophthalmology, Visual impairment prediction
Subject terms: Biomarkers, Computational biology and bioinformatics, Diseases, Medical research
Introduction
Neurofibromatosis type 1 (NF-1), first described by von Recklinghausen1, is a rare autosomal dominant disease affecting 1 in 2,500–3,000 live newborns2–4. The condition is characterized by a broad spectrum of clinical manifestations, and its diagnosis, as outlined by the National Institutes of Health (NIH), requires two or more specific criteria. These include six or more café-au-lait spots, two or more cutaneous neurofibromas, at least one plexiform neurofibroma, axillary or inguinal freckles, optic glioma, two or more Lisch nodules, bone lesions, and a first-degree relative with NF-15. Among ocular manifestations, Lisch nodules, reduced visual acuity (VA), and optic pathway gliomas (OPG) are the most prevalent, presenting significant diagnostic and therapeutic challenges.
Approximately 20% of pediatric NF-1 patients develop OPG6–8, often leading to marked decreases in VA around the age of 5 years9. Early detection and monitoring of OPG are essential for preserving vision, as the risk of new visual impairment decreases significantly after eight years of age8. Traditionally, VA and visual field (VF) abnormalities have required further investigation through magnetic resonance imaging (MRI) to determine tumor presence and progression, guiding treatment decisions.
Optical coherence tomography (OCT) has become a critical non-invasive tool for pediatric ophthalmology. OCT’s capacity to visualize retinal layers, particularly the retinal nerve fiber layer (RNFL), has been validated in conditions like optic neuritis in multiple sclerosis, where RNFL thinning correlates with VA and VF deterioration10,11. Avery et al. confirmed OCT’s utility in detecting RNFL thinning in NF-1 patients with OPG, providing an important diagnostic marker. Similarly, Gu et al.12 demonstrated OCT’s capability in detecting reductions in the ganglion cell layer-inner plexiform layer (GCL-IPL), further underscoring its utility in identifying visual impairment in OPG cases.
Recent advances in deep learning (DL) have enhanced OCT’s diagnostic capabilities. For instance, Xiang et al.13 developed a DL-based framework to predict long-term visual function in childhood cataract patients, demonstrating high predictive accuracy. Similarly, Leandro et al.14 introduced OCT-based deep-learning models for identifying retinal key signs, showcasing the efficacy of convolutional neural networks. Peng et al.15 extended the use of DL models to diagnose multiple retinal diseases, achieving over 95% accuracy in classifying disease subtypes using OCT images. While all three studies highlight the power of DL in retinal diagnostics, none focused on pediatric NF-1 patients, as our study does. Our machine learning (ML) models are specifically tailored to detect vision abnormalities associated with NF-1, addressing the unique retinal changes and challenges in this population. Moreover, unlike these image-based DL approaches, our study uses traditional ML models applied to quantitative OCT layer measurements, offering improved interpretability by allowing direct linkage between feature importance and anatomically meaningful retinal structures.
Building on prior advancements, our study applies machine learning techniques to identify current vision abnormalities in pediatric NF-1 patients using OCT-derived layer thickness measurements rather than OCT images. We analyzed retinal and optical nerve layer thickness to enhance diagnostic accuracy and to identify the most critical layers associated with visual impairment. Additionally, we established OCT cut-off values specifically for NF-1 patients, offering quantitative reference points for interpreting structural OCT changes in relation to visual function. This is crucial for pediatric patients, where early recognition of vision abnormalities can prevent irreversible vision loss and significantly improve outcomes.
Methods
We conducted this retrospective case-control study at Hospital Sant Joan de Déu, in Barcelona, from July 2018 to April 2023. We included healthy individuals and those diagnosed with neurofibromatosis type 1 (NF-1), with or without OPG, aged 3 to 20 years, who presented to the Ophthalmology Department. Patients who were uncooperative during OCT scans or had ocular pathologies that compromised retinal layer integrity were excluded.
We strictly followed the ethical guidelines of the Declaration of Helsinki (1964, as revised in 2013). Institutional Review Board (IRB)/Ethics Committee approval was obtained from the Ethics Committee of Fundació Sant Joan de Déu, Esplugues de Llobregat, Barcelona (approval reference: PIC-60-18, dated October 3, 2018). We obtained informed consent from all participants and their legal guardians after fully explaining the study’s nature and purpose.
We assessed visual acuity (VA) using the LogMAR scale, tailored to each age group: the Teller test for ages 6 to 36 months, the preverbal LEA test for ages 3 to 4 years, the preverbal HOTV test for ages 4–5 years, and the verbal HOTV test for participants older than 5 years. An experienced ophthalmologist examined the anterior segment to detect Lisch nodules. We performed OCT scans of the macular and optic nerve areas using the DRI OCT Triton swept-source OCT instrument (Topcon Corporation, Tokyo, Japan), with a 3D vertical scan in the macular zone and a 3D scan in the optic nerve zone.
The OCT software automatically performed retinal layer segmentation. The retinal nerve fiber layer (RNFL) was defined as the space between the inner limiting membrane (ILM) and the nerve fiber layer (NFL). The ganglion cell layer (GCL+) was segmented between the nerve fiber layer (NFL) and the inner plexiform layer (IPL), while the expanded ganglion cell layer (GCL++) was defined as the space between the inner limiting membrane (ILM) and the inner plexiform layer (IPL). In addition, automated segmentation by the device provided measurements of macular and peripapillary (optic nerve) choroidal thickness, defined as the distance between the retinal pigment epithelium (RPE) and the choroid–sclera interface within the respective scan regions.
Data collection
We collected 1,649 OCT measurements from 189 patients, documenting visit and birth dates, visit numbers, and relevant clinical features such as NF-1 status, OPG, Lisch and choroidal nodules, and the examined eye. To ensure independent observations, we analyzed data from the eye with the most visits for each patient, selecting the right eye in cases of a tie. Visual acuity and OCT assessments provided detailed measurements of macular and nerve layers, further subdivided into totals and specific zones.
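The per-patient eye-selection rule (keep the eye with the most visits, resolving ties to the right eye) can be sketched in pandas; the column names and values below are hypothetical, since the dataset schema is not published.

```python
import pandas as pd

# Hypothetical schema and toy records, for illustration only.
df = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "eye":        ["OD", "OS", "OS", "OD", "OS"],
    "logmar":     [0.1, 0.2, 0.3, 0.0, 0.1],
})

# Count visits per patient-eye; sort so ties resolve to the right eye ("OD").
counts = (df.groupby(["patient_id", "eye"]).size()
            .rename("n_visits").reset_index()
            .sort_values(["patient_id", "n_visits", "eye"],
                         ascending=[True, False, True]))

# Keep the first (most-visited, tie -> "OD") eye per patient.
chosen = counts.drop_duplicates("patient_id")[["patient_id", "eye"]]
df_one_eye = df.merge(chosen, on=["patient_id", "eye"])
```

In this toy example, patient 1 keeps the left eye (two visits) and patient 2, with one visit per eye, keeps the right eye.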
Data cleaning and preprocessing
To ensure data integrity and quality, we cleaned and pre-processed the dataset. We removed 22 redundant records from confirmation visits occurring within a month. Patients lacking LogMAR visual acuity values were excluded, as this parameter is essential for defining abnormal vision and imputing it could have biased the results. When OCT layer data for either nerve or macula were missing, we interpreted this as an unperformed exam, resulting in absent values across all zones. Because these measurements are critical for prediction, we opted not to impute missing values and instead excluded any record missing all zone data for a nerve or macular layer. After these steps, our final dataset included 515 measurements from 168 patients for analysis.
Definition of abnormal vision
We defined normal vision according to the age-specific LogMAR benchmarks16,17, as detailed in Table 1. These benchmarks were used to categorize VA measurements as either normal or abnormal.
Table 1.
Age-Specific LogMAR benchmarks for normal Vision.
| Age | LogMAR |
|---|---|
| 6–12 months | 0.95 or better |
| 12–18 months | 0.95 or better |
| 18–24 months | 0.90 or better |
| 24–30 months | 0.50 or better |
| 30–35 months | 0.50 or better |
| 36–47 months | 0.40 or better |
| 48–59 months | 0.30 or better |
| 60–72 months | 0.20 or better |
| > 72 months | 0.00 or better |
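These benchmarks translate directly into a screening rule. The helper below is our own illustrative sketch (not the authors' code), collapsing adjacent age bands that share a limit and taking age in months:

```python
def vision_is_normal(age_months: float, logmar: float) -> bool:
    """Classify a LogMAR value against the age-specific benchmarks of Table 1
    (lower LogMAR is better; 'or better' means less than or equal)."""
    benchmarks = [          # (exclusive upper age bound in months, LogMAR limit)
        (18, 0.95),         # 6-18 months
        (24, 0.90),         # 18-24 months
        (36, 0.50),         # 24-36 months
        (48, 0.40),         # 36-47 months
        (60, 0.30),         # 48-59 months
        (72, 0.20),         # 60-72 months
    ]
    for upper_bound, limit in benchmarks:
        if age_months < upper_bound:
            return logmar <= limit
    return logmar <= 0.0    # > 72 months
```

For example, a LogMAR of 0.4 is still normal at 40 months but abnormal beyond 72 months.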
Abnormal cohort definition
To investigate differences in demographic, clinical, and OCT parameters, we divided patients into two mutually exclusive groups based on visual acuity (VA) status across all visits. The “normal” group included patients who maintained normal VA at every visit, while the “abnormal” group comprised patients who exhibited abnormal VA at least once during follow-up, regardless of their status at other visits. Importantly, vision status was largely stable over time; only 16 of 168 patients (9.5%) transitioned between normal and abnormal vision across visits. This grouping approach enabled us to capture both persistent and transient abnormalities and to facilitate robust, patient-level comparisons of clinical and imaging features associated with any occurrence of abnormal vision.
Statistical analysis
We compared demographic, clinical, and OCT parameters between groups. Numerical variables are reported as mean ± standard deviation and were tested for normality using the Shapiro–Wilk test. Based on the distribution, we used either the independent t-test or Mann–Whitney U test for comparisons. For categorical variables, we applied the Chi-square or Fisher’s exact test, as appropriate. We defined statistical significance as p < 0.05. To illustrate distributional differences in OCT measurements between groups, we generated kernel density estimate (KDE) plots. All statistical analyses and visualizations were performed using Python18.
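The distribution-dependent test choice can be sketched as a small helper; this is an illustrative reimplementation of the analysis plan, not the study's actual code:

```python
from scipy import stats

def compare_numeric(x, y, alpha: float = 0.05):
    """Independent t-test if both samples pass Shapiro-Wilk normality at the
    given alpha, otherwise the two-sided Mann-Whitney U test."""
    normal = (stats.shapiro(x).pvalue > alpha) and (stats.shapiro(y).pvalue > alpha)
    if normal:
        return "t-test", stats.ttest_ind(x, y).pvalue
    return "mann-whitney", stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
```

Categorical variables would analogously route to `scipy.stats.chi2_contingency` or `scipy.stats.fisher_exact` depending on expected cell counts.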
Machine learning analysis for vision abnormality prediction
We applied and evaluated eleven supervised machine learning models to predict abnormal vision from optical coherence tomography (OCT) data: logistic regression, linear discriminant analysis (LDA), ridge classifier, random forest, balanced random forest, extra trees, gradient boosting, AdaBoost, XGBoost, K-nearest neighbors, and support vector machine (SVM). All models were trained using their default hyperparameters with random_state = 42, and no hyperparameter tuning was performed.
For feature selection, we focused on aggregate OCT measurements, specifically, total thickness values of the retinal nerve fiber layer (RNFL), ganglion cell layer (GCL+), expanded ganglion cell layer (GCL++), and choroid in both macular and nerve region. We excluded individual sublayer values to minimize feature redundancy and multicollinearity, as these totals were highly correlated with sublayer metrics and better reflect global retinal structural integrity. This approach also improved the interpretability of the resulting models. To ensure comparability across algorithms, we standardized all continuous features using z-score normalization.
To prevent data leakage associated with repeated measurements from individual patients, model training and evaluation used group-based cross-validation (GroupKFold, 5 folds), with all data from each patient assigned exclusively to either the training or the test set in each fold. This approach replicates real-world clinical application, where models are deployed on new, previously unseen patients.
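The grouped cross-validation can be sketched as follows; the data here are synthetic, and scikit-learn's RandomForestClassifier with balanced class weights stands in for imbalanced-learn's BalancedRandomForestClassifier used in the study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic stand-in data: 100 measurements from 25 patients, 4 "total layer"
# features (real features would be z-scored OCT totals).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 4))
y = np.tile([0, 1], 50)                 # toy abnormal-vision labels
groups = np.repeat(np.arange(25), 4)    # patient ID per measurement

# class_weight="balanced" stands in for the balanced random forest here.
model = RandomForestClassifier(random_state=42, class_weight="balanced")
cv = GroupKFold(n_splits=5)
# Every fold keeps all of a patient's measurements on one side of the split.
aucs = cross_val_score(model, X, y, cv=cv, groups=groups, scoring="roc_auc")
```

Passing `groups` to `cross_val_score` is what enforces the patient-level split and prevents leakage of repeated measurements across folds.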
We assessed model performance across all cross-validation folds using area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (PR-AUC), F1 score, recall (sensitivity), precision (positive predictive value), and Cohen’s kappa statistic. Overfitting was quantified as the difference between mean training and cross-validated AUC (“delta AUC”). The optimal model was selected as the one with the highest recall among those with a delta AUC less than 0.05, emphasizing the clinical need to maximize sensitivity for identifying abnormal vision.
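The selection rule itself reduces to a filter-then-argmax, sketched here with illustrative values taken from Table 4 (macular totals):

```python
import pandas as pd

# Rule: among models whose overfitting gap ("delta AUC") is below 0.05,
# pick the one with the highest cross-validated recall.
results = pd.DataFrame({
    "model":     ["Balanced Random Forest", "LDA", "XGBoost"],
    "recall_cv": [0.66, 0.60, 0.59],
    "delta_auc": [0.04, 0.02, 0.19],
})
eligible = results[results["delta_auc"] < 0.05]   # XGBoost is filtered out
best = eligible.loc[eligible["recall_cv"].idxmax(), "model"]
```

With these values the rule selects the Balanced Random Forest, consistent with the Results section.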
Model interpretability was examined by applying SHAP (SHapley Additive exPlanations) analysis to the final selected model. SHAP values were computed using the TreeExplainer, with the full feature matrix used as the background dataset. SHAP summary plots were generated to rank features by their influence on prediction, and SHAP dependence plots were used to visualize the relationship between feature values and predicted risk, enabling the identification of clinically relevant thresholds. All analyses were conducted in Python using the scikit-learn (v1.3.2), imbalanced-learn (v0.11.0), XGBoost (v2.0.2) and SHAP (v0.45.1.dev19) libraries19,20.
Results
The cohort’s mean age at first visit was 8.9 ± 3.9 years (range: 3–19 years). Patients attended an average of 3.0 ± 1.8 visits.
Demographic and clinical characteristics: normal vs. abnormal vision
Patients with abnormal vision did not differ significantly in age at first visit compared to those with normal vision (9.5 ± 4.1 vs. 8.6 ± 3.8 years, p = 0.2) but attended slightly more clinic visits (3.7 ± 1.9 vs. 2.8 ± 1.8, p = 0.003). The prevalence of OPG was notably higher in the abnormal vision group (60% vs. 23%, p < 0.0001). No significant differences were observed for Lisch nodules or choroidal nodules. Detailed demographic and clinical comparisons are provided in Table 2.
Table 2.
Cohort characteristics and comparative analysis between normal and abnormal vision groups. Numerical variables are presented as mean ± SD. For binary clinical features, both the percentage and absolute number of patients are shown. Numerical variables were compared using the independent t-test or Mann–Whitney U test, based on the Shapiro–Wilk normality test. Categorical variables were compared using the Chi-square or Fisher’s exact test, as appropriate. The test used is indicated by a superscript: † Mann–Whitney U; ‡ Chi-square; § Fisher’s exact. Significant p-values (p < 0.05) are shown in bold.
| | All patients (n = 168) | Normal (n = 115) | Abnormal (n = 53) | p-value |
|---|---|---|---|---|
| Total measurements | 515 | 321 | 194 | – |
| Age (years) at first visit (mean ± SD) | 8.9 ± 3.9 | 8.6 ± 3.8 | 9.5 ± 4.1 | 0.2† |
| Number of visits per patient (mean ± SD) | 3.0 ± 1.8 | 2.8 ± 1.8 | 3.7 ± 1.9 | **0.003**† |
| OPG (% of patients) | 35% (59) | 23% (27) | 60% (32) | **< 0.0001**‡ |
| Lisch nodules (% of patients) | 52% (87) | 51% (59) | 53% (28) | 0.98‡ |
| Choroidal nodules (% of patients) | 26% (44) | 28% (32) | 23% (12) | 0.60‡ |
OCT layer analysis: normal vs. abnormal vision
Significant reductions in thickness were observed for the totals of the retinal nerve fiber layer (RNFL), ganglion cell layer (GCL+), and expanded ganglion cell layer (GCL++) among patients with abnormal vision, compared to those with normal vision (all FDR-adjusted p < 0.0001; Table 3). While macular choroidal thickness did not differ between groups (p = 0.36), nerve choroidal thickness was significantly greater in the abnormal vision cohort (p = 0.02). Full zone-specific results for macular and nerve regions are detailed in Appendix A.1.
Table 3.
Differential macular and nerve total layer thickness (μm) in normal vs. abnormal vision cohorts. Mean thickness values (± SD) and Mann–Whitney U test results are provided for each retinal layer. Bolded results indicate statistically significant differences after adjustment for multiple comparisons (FDR correction).
| Layer | All (mean ± SD) | Normal (mean ± SD) | Abnormal (mean ± SD) | Mann–Whitney p-value (FDR adj.) |
|---|---|---|---|---|
| OCT macula: RNFL total | 32.8 ± 8.1 | 35.5 ± 4.6 | 28.3 ± 10.3 | **< 0.0001** |
| OCT macula: GCL+ total | 68.2 ± 10.7 | 72.9 ± 7.3 | 60.4 ± 11.0 | **< 0.0001** |
| OCT macula: GCL++ total | 100.9 ± 17.0 | 108.2 ± 10.3 | 88.7 ± 18.8 | **< 0.0001** |
| OCT macula: choroid total | 280.6 ± 69.8 | 283.1 ± 60.6 | 276.4 ± 82.7 | 0.36 |
| OCT nerve: RNFL total | 93.0 ± 24.6 | 104.2 ± 16.1 | 74.4 ± 25.0 | **< 0.0001** |
| OCT nerve: GCL+ total | 43.4 ± 6.5 | 45.2 ± 5.4 | 40.5 ± 7.0 | **< 0.0001** |
| OCT nerve: GCL++ total | 136.4 ± 28.7 | 149.4 ± 18.9 | 115.0 ± 29.4 | **< 0.0001** |
| OCT nerve: choroid total | 144.0 ± 53.4 | 139.3 ± 50.3 | 151.9 ± 57.3 | **0.02** |
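Assuming the standard Benjamini–Hochberg step-up procedure for the FDR adjustment (the specific method is not stated), the correction can be reproduced in a few lines of NumPy:

```python
import numpy as np

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    adj = np.empty(n)
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the
    # adjusted values: adj(i) = min over j >= i of p(j) * n / rank(j).
    for rank in range(n - 1, -1, -1):
        running_min = min(running_min, p[order[rank]] * n / (rank + 1))
        adj[order[rank]] = running_min
    return adj
```

statsmodels' `multipletests(..., method="fdr_bh")` yields the same adjusted values for a given p-value vector.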
Fig. 1.
Kernel density estimate (KDE) plots comparing the distributions of total macular (first row) and nerve (second row) layer thicknesses for normal and abnormal vision groups. The plotted layers include the retinal nerve fiber layer (RNFL); the combined ganglion cell and inner plexiform layers (GCL+); the ganglion cell complex (GCL++, defined as RNFL + GCL + IPL); and the choroid. Reduced retinal layer thickness is evident in the abnormal group, whereas choroidal thickness remains relatively unchanged.
Fig. 1 shows the clear leftward shift, indicating thinner layers, in the abnormal vision group. These visualizations underscore the clinical relevance of OCT-derived retinal thickness metrics for distinguishing patients with abnormal vision.
Machine learning analysis
The Balanced Random Forest (BRF) model was selected as the final model because it provided the highest recall (0.66) among all models with controlled overfitting (delta AUC < 0.05), a priority that aligns with the clinical need to maximize sensitivity in a screening context. Using total macular layer thicknesses as predictors, the BRF achieved a cross-validated AUC of 0.82, PR-AUC of 0.75, F1 score of 0.64, recall of 0.66, precision of 0.62, Cohen’s kappa of 0.43, and delta AUC of 0.04. Although models such as LDA and Logistic Regression demonstrated similar AUC values with slightly lower delta AUC, their lower recall made them less suitable for clinical screening, where failing to detect abnormal vision carries greater risk. Similar performance was observed for the BRF model trained on total nerve layer thicknesses (AUC 0.81, recall 0.65), though macular features conferred marginally superior discrimination. Comprehensive results for all model-feature combinations are presented in Table 4.
Table 4.
Cross-validated performance metrics for all machine learning classifiers trained on total macular (top) and total nerve (bottom) layer thicknesses. Values represent mean ± standard deviation across 5 folds. Metrics include AUC, PR-AUC, F1 score, recall, precision, and Cohen’s kappa, with ΔAUC indicating the difference between training and cross-validated AUC as a measure of overfitting.
| Model | AUC (CV) | AUC (train) | Delta AUC | PR-AUC (CV) | F1 (CV) | Recall (CV) | Precision (CV) | Kappa (CV) |
|---|---|---|---|---|---|---|---|---|
| Macula total layers | ||||||||
| Extra Trees | 0.83 ± 0.07 | 0.87 ± 0.01 | 0.04 | 0.76 ± 0.06 | 0.51 ± 0.16 | 0.40 ± 0.16 | 0.75 ± 0.11 | 0.38 ± 0.16 |
| Balanced Random Forest | 0.82 ± 0.08 | 0.86 ± 0.02 | 0.04 | 0.75 ± 0.05 | 0.64 ± 0.08 | 0.66 ± 0.08 | 0.62 ± 0.12 | 0.43 ± 0.19 |
| Random forest | 0.81 ± 0.08 | 0.89 ± 0.02 | 0.08 | 0.76 ± 0.07 | 0.61 ± 0.09 | 0.54 ± 0.10 | 0.72 ± 0.08 | 0.47 ± 0.13 |
| LDA | 0.80 ± 0.10 | 0.82 ± 0.02 | 0.02 | 0.72 ± 0.08 | 0.66 ± 0.08 | 0.60 ± 0.09 | 0.74 ± 0.06 | 0.51 ± 0.14 |
| Gradient Boosting | 0.80 ± 0.07 | 0.94 ± 0.01 | 0.14 | 0.73 ± 0.07 | 0.61 ± 0.08 | 0.56 ± 0.12 | 0.69 ± 0.09 | 0.45 ± 0.15 |
| AdaBoost | 0.80 ± 0.08 | 0.96 ± 0.01 | 0.16 | 0.74 ± 0.05 | 0.61 ± 0.07 | 0.55 ± 0.11 | 0.70 ± 0.07 | 0.45 ± 0.15 |
| Ridge Classifier | 0.80 ± 0.09 | 0.82 ± 0.02 | 0.02 | — | 0.65 ± 0.08 | 0.58 ± 0.09 | 0.75 ± 0.07 | 0.50 ± 0.15 |
| Logistic Regression | 0.80 ± 0.10 | 0.82 ± 0.02 | 0.02 | 0.74 ± 0.07 | 0.65 ± 0.08 | 0.58 ± 0.09 | 0.74 ± 0.07 | 0.50 ± 0.15 |
| XGBoost | 0.79 ± 0.07 | 0.98 ± 0.00 | 0.19 | 0.72 ± 0.07 | 0.62 ± 0.07 | 0.59 ± 0.10 | 0.66 ± 0.09 | 0.44 ± 0.13 |
| SVM | 0.77 ± 0.08 | 0.84 ± 0.02 | 0.07 | 0.71 ± 0.09 | 0.59 ± 0.15 | 0.49 ± 0.16 | 0.79 ± 0.15 | 0.46 ± 0.18 |
| K-Nearest Neighbors | 0.76 ± 0.09 | 0.92 ± 0.01 | 0.16 | 0.65 ± 0.07 | 0.52 ± 0.13 | 0.43 ± 0.13 | 0.68 ± 0.09 | 0.36 ± 0.16 |
| Nerve Total Layers | ||||||||
| Extra Trees | 0.82 ± 0.10 | 0.85 ± 0.02 | 0.03 | 0.76 ± 0.07 | 0.59 ± 0.11 | 0.48 ± 0.11 | 0.78 ± 0.13 | 0.45 ± 0.16 |
| Ridge Classifier | 0.82 ± 0.10 | 0.82 ± 0.03 | 0.00 | — | 0.61 ± 0.07 | 0.51 ± 0.06 | 0.79 ± 0.14 | 0.47 ± 0.15 |
| Logistic Regression | 0.82 ± 0.10 | 0.82 ± 0.03 | 0.01 | 0.76 ± 0.08 | 0.63 ± 0.06 | 0.52 ± 0.05 | 0.80 ± 0.13 | 0.49 ± 0.14 |
| Balanced Random Forest | 0.81 ± 0.11 | 0.84 ± 0.02 | 0.03 | 0.74 ± 0.09 | 0.64 ± 0.09 | 0.65 ± 0.07 | 0.64 ± 0.13 | 0.46 ± 0.18 |
| LDA | 0.80 ± 0.09 | 0.83 ± 0.03 | 0.02 | 0.75 ± 0.06 | 0.61 ± 0.07 | 0.51 ± 0.08 | 0.79 ± 0.14 | 0.47 ± 0.14 |
| Random Forest | 0.80 ± 0.10 | 0.87 ± 0.02 | 0.07 | 0.74 ± 0.07 | 0.62 ± 0.11 | 0.53 ± 0.10 | 0.75 ± 0.15 | 0.46 ± 0.20 |
| SVM | 0.76 ± 0.09 | 0.83 ± 0.02 | 0.07 | 0.71 ± 0.09 | 0.60 ± 0.11 | 0.50 ± 0.10 | 0.78 ± 0.13 | 0.46 ± 0.16 |
| K-Nearest Neighbors | 0.75 ± 0.08 | 0.90 ± 0.02 | 0.15 | 0.68 ± 0.06 | 0.60 ± 0.06 | 0.50 ± 0.08 | 0.77 ± 0.11 | 0.46 ± 0.12 |
| AdaBoost | 0.75 ± 0.07 | 0.97 ± 0.01 | 0.22 | 0.65 ± 0.15 | 0.51 ± 0.14 | 0.47 ± 0.16 | 0.55 ± 0.11 | 0.30 ± 0.15 |
| Gradient Boosting | 0.74 ± 0.10 | 0.94 ± 0.01 | 0.20 | 0.66 ± 0.10 | 0.56 ± 0.12 | 0.51 ± 0.13 | 0.62 ± 0.12 | 0.38 ± 0.18 |
| XGBoost | 0.71 ± 0.09 | 0.98 ± 0.00 | 0.27 | 0.63 ± 0.11 | 0.51 ± 0.12 | 0.49 ± 0.13 | 0.52 ± 0.11 | 0.28 ± 0.17 |
To better understand the determinants of model prediction, we applied SHAP (SHapley Additive exPlanations) analysis to the leading models. The SHAP summary plot (Fig. 2) demonstrated that reduced total GCL++, GCL+, and RNFL thicknesses in both macular and nerve regions were strongly associated with an increased probability of abnormal vision, highlighting their clinical significance. The accompanying SHAP bar plots further quantified the relative influence of each layer on the model’s predictions.
Fig. 2.
SHAP (Shapley Additive Explanations) summary plots showing the relative contribution and directionality of total macular (left) and nerve (right) layer thicknesses in predicting abnormal vision. Features are ordered by importance, with SHAP values indicating the strength and direction of association. Red and blue indicate high and low feature values, respectively; positive SHAP values (right) increase the predicted probability of abnormal vision, while negative values (left) decrease it. The bar plot (below) visualizes overall feature importance.
To enhance clinical interpretability, we also generated SHAP dependence plots (Fig. 3), which revealed distinct threshold values for each key feature. These plots demonstrated that the risk of abnormal vision increased sharply below specific retinal layer thicknesses, providing actionable cut-off values that could inform future clinical assessment and risk stratification.
Fig. 3.
SHAP (Shapley Additive Explanations) dependence plots depicting the relationship between individual OCT layer thickness and model prediction, with macular totals (left) and nerve totals (right). Each plot shows the feature value (x-axis) versus its SHAP value (y-axis), quantifying the feature’s influence on abnormal vision prediction. Higher SHAP values indicate greater risk, while lower values suggest normal vision. Vertical red dashed lines denote empirically determined cutoffs, marking feature thresholds where the probability of abnormal vision rises sharply. These cutoffs offer clinically actionable benchmarks for interpretation.
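One simple way to read such a cut-off from a dependence plot is to locate where the (sorted) SHAP contribution first turns non-positive; the sketch below uses a toy step-shaped feature, not the study's actual SHAP output:

```python
import numpy as np

# Toy dependence data: per-sample SHAP values for one feature (standing in for
# macular RNFL total), with risk dropping once thickness exceeds ~34 um.
thickness = np.linspace(15, 55, 200)
shap_vals = np.where(thickness < 34, 0.15, -0.05)

def shap_zero_crossing(x, s):
    """Smallest feature value at which the SHAP contribution becomes
    non-positive (assumes at least one non-positive value exists)."""
    order = np.argsort(x)
    x_sorted, s_sorted = x[order], s[order]
    idx = np.argmax(s_sorted <= 0)      # first index with SHAP <= 0
    return float(x_sorted[idx])

cutoff = shap_zero_crossing(thickness, shap_vals)
```

In practice, the crossing point of a smoothed dependence curve (rather than a hard step) would be used, but the idea is the same.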
To assess the clinical utility of the predictive thresholds identified by our models, we stratified abnormal vision prevalence according to whether OCT layer measurements fell above or below each cutoff. Table 5 presents these results, displaying the rates of abnormal and normal vision for each threshold, along with statistical significance determined by chi-square or Fisher’s exact test, as appropriate.
Table 5.
Differential prevalence of abnormal vision based on predictive feature thresholds. For each OCT feature, the proportion of abnormal- and normal-vision measurements falling below and above the optimal threshold is shown. P-values reflect the significance of group differences by chi-square or Fisher’s exact test.
| Feature | Threshold (μm) | Abnormal vision: below | Abnormal vision: above | Normal vision: below | Normal vision: above | p-value |
|---|---|---|---|---|---|---|
| OCT macula: RNFL (total) | 34 | 81% | 19% | 38% | 62% | < 0.00001* |
| OCT macula: GCL+ (total) | 61 | 60% | 40% | 8% | 92% | < 0.00001* |
| OCT macula: GCL++ (total) | 101 | 73% | 27% | 24% | 76% | < 0.00001* |
| OCT macula: choroid (total) | 240 | 33% | 67% | 23% | 77% | 0.019* |
| OCT nerve: RNFL (total) | 87 | 64% | 36% | 14% | 86% | < 0.00001* |
| OCT nerve: GCL+ (total) | 40 | 52% | 48% | 15% | 85% | < 0.00001* |
| OCT nerve: GCL++ (total) | 167 | 99% | 1% | 86% | 14% | 0.0002† |
| OCT nerve: choroid (total) | 213 | 84% | 16% | 94% | 6% | < 0.00001* |
*P-value by chi-square test; †P-value by Fisher’s exact test.
This stratified approach demonstrates a strong association between specific OCT layer thicknesses and abnormal vision risk. For example, 81% of measurements from the abnormal vision group had a total macular RNFL thickness below 34 μm, compared with only 38% of measurements from the normal vision group. Comparable risk stratification was observed for the other layers, with marked shifts in the below/above distributions occurring at the identified cutoffs. These findings confirm the clinical relevance and robustness of the selected thresholds for distinguishing patients at increased risk of vision abnormalities.
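Each row of Table 5 corresponds to a 2×2 contingency test; with counts approximated from the reported proportions for the macular RNFL cut-off (194 abnormal and 321 normal measurements), the test looks like this:

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Rows: below / above the 34 um macular RNFL cut-off.
# Columns: abnormal-vision / normal-vision measurements (approximate counts
# back-calculated from the percentages in Table 5).
table = np.array([[157, 122],
                  [ 37, 199]])
chi2, p, dof, expected = chi2_contingency(table)
# Fisher's exact test is the fallback when expected cell counts are small:
odds_ratio, p_exact = fisher_exact(table)
```

With these counts the chi-square test is strongly significant, consistent with the table's p < 0.00001.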
To further evaluate the cumulative predictive value of these OCT thresholds, we examined the proportion of abnormal and normal cases meeting or exceeding different numbers of abnormality cutoffs (“k out of 8” rule). For each k, we calculated the proportions and assessed statistical significance. Requiring at least 3 out of 8 thresholds to be crossed identified 84% of abnormal cases while flagging 44% of normals (p < 10⁻¹⁷). Increasing the stringency (e.g., ≥ 5 thresholds) enhanced specificity (flagging only 12% of normals) but modestly reduced sensitivity (detecting 66% of abnormals; p = 1.2 × 10⁻³⁶). Complete results for all k-value combinations are summarized in Appendix Table A.2, highlighting the tradeoff between sensitivity and specificity as the number of required abnormal features increases.
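The cumulative rule can be sketched as a simple threshold counter; the cut-offs are taken from Table 5, and, purely for illustration, "below the cut-off" is treated as the abnormal direction for every layer (the choroidal layers' direction may differ in practice):

```python
import numpy as np

# Cut-offs from Table 5, in the table's order (macular RNFL, GCL+, GCL++,
# choroid; nerve RNFL, GCL+, GCL++, choroid), all in micrometers.
cutoffs = np.array([34, 61, 101, 240, 87, 40, 167, 213], dtype=float)

def n_thresholds_crossed(measurements: np.ndarray) -> int:
    """Count how many of the 8 OCT totals fall below their cut-off."""
    return int((measurements < cutoffs).sum())

def flag_high_risk(measurements: np.ndarray, k: int = 3) -> bool:
    """'k-out-of-8' rule: flag when at least k layer totals cross a cut-off."""
    return n_thresholds_crossed(measurements) >= k
```

Raising `k` trades sensitivity for specificity, mirroring the k = 3 versus k = 5 comparison above.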
Discussion
This study demonstrates that optical coherence tomography (OCT) measurements, particularly total retinal and optic nerve layer thicknesses, show strong associations with vision abnormalities in pediatric patients with neurofibromatosis type 1 (NF-1). Consistent with previous research, including the work of Avery et al., we confirm that both macular and optic nerve scans are valuable for patient monitoring, with retinal and optic nerve layers acting as sensitive indicators of abnormal vision risk12.
A notable finding is that models using only aggregate OCT layer thickness values can achieve high predictive accuracy, obviating the need for an exhaustive set of features. This simplification may facilitate more efficient clinical workflows, enabling comprehensive risk assessment with fewer measurements and reducing exam burden without sacrificing diagnostic performance.
A major advancement of our work is the identification of data-driven cut-off values for total OCT layers, where the risk of abnormal vision rises sharply. These thresholds provide preliminary, data-driven reference points for clinicians, supporting more nuanced risk stratification and moving beyond binary interpretation. The observed nonlinear relationship between OCT layer thickness and abnormal vision further emphasizes the value of these cut-offs as exploratory tools for understanding potential risk patterns rather than as established clinical decision thresholds.
We extended this approach by evaluating the cumulative predictive value of OCT thresholds, showing that the likelihood of abnormal vision increases as more layers cross their respective cut-off values. Our “k-out-of-n” analysis illustrates a potential framework for adjusting sensitivity and specificity, allowing risk assessment to be tailored to individual clinical contexts and priorities. This cumulative strategy may offer insights relevant to future screening or surveillance research, particularly for high-risk populations.
The application of machine learning, specifically the Balanced Random Forest (BRF) classifier, demonstrated strong performance in distinguishing normal from abnormal vision, underscoring the potential of AI-driven tools to augment clinical decision-making. Importantly, the BRF model offered the highest sensitivity among models with controlled overfitting, a priority that aligns with the clinical need to minimize missed cases of vision impairment in pediatric NF-1. These methods may inform future approaches aimed at earlier detection, pending prospective validation, and support more proactive, individualized patient care.
In contrast to prior studies that applied deep learning directly to OCT images, our approach uses traditional machine learning models trained on quantitative OCT layer measurements. While image-based deep learning can achieve strong performance, these methods often function as “black boxes,” limiting insight into the anatomical drivers of model predictions. Using structured OCT features instead enables clearer interpretability through SHAP analyses, making it possible to understand which retinal and nerve layers contribute most to the model’s predictions. This feature-level transparency may offer practical advantages for clinical adoption, where explicable reasoning is often preferred.
By proposing specific OCT cut-off values for risk stratification, our findings offer a framework for refining current treatment protocols, which are often adjusted only after significant visual or visual field loss has occurred. These results should be interpreted as hypothesis-generating and suggest that OCT-derived metrics could inform hypotheses for future prospective studies evaluating earlier intervention approaches. Prospective, multi-center studies are needed to validate these findings and to assess whether AI-informed management can improve visual outcomes and quality of life for children with NF-1.
Despite the strengths of this study, several limitations should be acknowledged. First, the retrospective design restricts our ability to infer causality. The study population was homogeneous, limited to pediatric NF-1 patients from a single center, which reduces the generalizability of the findings to other populations. In addition, some demographic variables such as sex and ethnicity were not available in the source dataset and therefore could not be analyzed or included in the baseline characteristics. An additional limitation is that transient events, such as inflammation, which may temporarily alter or affect OCT layer thickness in patients with abnormal vision, were not explicitly accounted for. However, only 16 patients (9.5%) demonstrated any transition between normal and abnormal vision, suggesting that the impact of such events on our results is likely minimal. Nevertheless, future studies should further investigate this aspect. Furthermore, the interpretability of SHAP values is inherently model-dependent; the identified feature importance is specific to the machine learning algorithms employed and may not translate directly to other models or clinical scenarios.
Looking ahead, multi-center, prospective studies involving larger and more diverse patient cohorts are needed to validate the generalizability and predictive accuracy of the OCT cut-off values proposed here. Further research should also address the interpretability and clinical integration of machine learning models, including how AI-driven decision support can best be incorporated into ophthalmic workflows without compromising ethical standards. Finally, longitudinal studies assessing the real-world impact of AI- and OCT-informed interventions on patient outcomes are essential for understanding the long-term benefits of early intervention and for refining predictive models to maximize their clinical utility.
Supplementary Information
Below is the link to the electronic supplementary material.
Author contributions
C.F.C. and A.G. conceived and designed the study. J.P.B., H.S., A.L.C., and E.P.R. collected and curated the data. A.G. performed statistical and machine learning analyses and interpreted the results. C.F.C. and J.G.P. contributed to clinical interpretation and manuscript editing. A.G. drafted the manuscript. C.F.C. and J.G.P. supervised the project. All authors approved the final version of the manuscript.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Data availability
The dataset analysed during the current study is available from the corresponding author on reasonable request.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Lisch, K. Über Beteiligung der Augen, insbesondere das Vorkommen von Irisknötchen bei der Neurofibromatose (Recklinghausen). Z. Augenheilkd. 93, 137–143. 10.1159/000299316 (2010).
- 2. Huson, S. M., Harper, P. S. & Compston, D. A. Von Recklinghausen neurofibromatosis. A clinical and population study in south-east Wales. Brain 111 (Pt 6), 1355–1381. 10.1093/brain/111.6.1355 (1988).
- 3. Gutmann, D. H. et al. The diagnostic evaluation and multidisciplinary management of neurofibromatosis 1 and neurofibromatosis 2. JAMA 278, 51–57 (1997).
- 4. Williams, V. C. et al. Neurofibromatosis type 1 revisited. Pediatrics 123, 124–133. 10.1542/peds.2007-3204 (2009).
- 5. Neurofibromatosis. Conference statement. National Institutes of Health Consensus Development Conference. Arch. Neurol. 45, 575–578 (1988).
- 6. Listernick, R., Darling, C., Greenwald, M., Strauss, L. & Charrow, J. Optic pathway tumors in children: the effect of neurofibromatosis type 1 on clinical manifestations and natural history. J. Pediatr. 127, 718–722. 10.1016/s0022-3476(95)70159-1 (1995).
- 7. Balcer, L. J. et al. Visual loss in children with neurofibromatosis type 1 and optic pathway gliomas: relation to tumor location by magnetic resonance imaging. Am. J. Ophthalmol. 131, 442–445. 10.1016/s0002-9394(00)00852-7 (2001).
- 8. Freret, M. E. & Gutmann, D. H. Understanding vision loss from optic pathway glioma in neurofibromatosis type 1. Ann. Neurol. 61. 10.1002/ana.21107 (2007).
- 9. Listernick, R., Charrow, J., Greenwald, M. & Mets, M. Natural history of optic pathway tumors in children with neurofibromatosis type 1: a longitudinal study. J. Pediatr. 125, 63–66. 10.1016/s0022-3476(94)70122-9 (1994).
- 10. Fisher, J. B. et al. Relation of visual function to retinal nerve fiber layer thickness in multiple sclerosis. Ophthalmology 113, 324–332. 10.1016/j.ophtha.2005.10.040 (2006).
- 11. Trip, S. A. et al. Retinal nerve fiber layer axonal loss and visual dysfunction in optic neuritis. Ann. Neurol. 58. 10.1002/ana.20575 (2005).
- 12. Gu, S., Glaug, N., Cnaan, A., Packer, R. J. & Avery, R. A. Ganglion cell layer–inner plexiform layer thickness and vision loss in young children with optic pathway gliomas. Investig. Ophthalmol. Vis. Sci. 55. 10.1167/iovs.13-13119 (2014).
- 13. Xiang, Y. et al. Longtime vision function prediction in childhood cataract patients based on optical coherence tomography images. Front. Bioeng. Biotechnol. 9, 646479. 10.3389/fbioe.2021.646479 (2021).
- 14. Leandro, I. et al. OCT-based deep-learning models for the identification of retinal key signs. Sci. Rep. 13, 14628. 10.1038/s41598-023-41362-4 (2023).
- 15. Peng, Y. et al. DeepSeeNet: a deep learning model for automated classification of patient-based age-related macular degeneration severity from color fundus photographs. Ophthalmology 126, 565–575. 10.1016/j.ophtha.2018.11.015 (2019).
- 16. Pan, Y. et al. Visual acuity norms in pre-school children: the Multi-Ethnic Pediatric Eye Disease Study. Optom. Vis. Sci. 86, 607–612. 10.1097/OPX.0b013e3181a76e55 (2009).
- 17. Leone, J. F., Mitchell, P., Kifley, A., Rose, K. A. & the Sydney Childhood Eye Study. Normative visual acuity in infants and preschool-aged children in Sydney. Acta Ophthalmol. 92, e521–529. 10.1111/aos.12366 (2014).
- 18. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272. 10.1038/s41592-019-0686-2 (2020).
- 19. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. 10.48550/arXiv.1705.07874 (2017).
- 20. Pedregosa, F. et al. Scikit-learn: machine learning in Python. 10.48550/arXiv.1201.0490 (2012).