Neurology. 2022 Sep 13;99(11):e1100–e1112. doi: 10.1212/WNL.0000000000200883

The Role of Optical Coherence Tomography Criteria and Machine Learning in Multiple Sclerosis and Optic Neuritis Diagnosis

Rachel C Kenney, Mengling Liu, Lisena Hasanaj, Binu Joseph, Abdullah Abu Al-Hassan, Lisanne J Balk, Raed Behbehani, Alexander Brandt, Peter A Calabresi, Elliot Frohman, Teresa C Frohman, Joachim Havla, Bernhard Hemmer, Hong Jiang, Benjamin Knier, Thomas Korn, Letizia Leocani, Elena Hernandez Martinez-Lapiscina, Athina Papadopoulou, Friedemann Paul, Axel Petzold, Marco Pisa, Pablo Villoslada, Hanna Zimmermann, Lorna E Thorpe, Hiroshi Ishikawa, Joel S Schuman, Gadi Wollstein, Yu Chen, Shiv Saidha, Steven Galetta, Laura J Balcer
PMCID: PMC9536738  PMID: 35764402

Abstract

Background and Objectives

Recent studies have suggested that intereye differences (IEDs) in peripapillary retinal nerve fiber layer (pRNFL) or ganglion cell + inner plexiform layer (GCIPL) thickness by spectral domain optical coherence tomography (SD-OCT) may identify people with a history of unilateral optic neuritis (ON). However, this requires further validation. Machine learning classification may be useful for validating thresholds for OCT IEDs and for examining the added utility of visual function tests, such as low-contrast letter acuity (LCLA), in the diagnosis of people with multiple sclerosis (PwMS) and in identifying those with a history of unilateral ON.

Methods

Participants were from 11 sites within the International Multiple Sclerosis Visual System consortium. pRNFL and GCIPL thicknesses were measured using SD-OCT. A composite score combining OCT and visual measures was compared with individual measurements to determine the best model for distinguishing PwMS from controls. The same methods were used to distinguish those with a history of ON among PwMS. Receiver operating characteristic (ROC) curve analysis was performed on a training data set (2/3 of cohort) and then applied to a testing data set (1/3 of cohort). Support vector machine (SVM) analysis was used to assess whether machine learning models improved the diagnostic capability of OCT.

Results

Among 1,568 PwMS and 552 controls, variable selection models identified GCIPL IED, average GCIPL thickness (both eyes), and binocular 2.5% LCLA as most important for classifying PwMS vs controls. This composite score performed best, with area under the curve (AUC) = 0.89 (95% CI 0.85–0.93), sensitivity = 81%, and specificity = 80%. The composite score ROC curve performed better than any of the individual measures from the model (p < 0.0001). GCIPL IED remained the best single discriminator of unilateral ON history among PwMS (AUC = 0.77, 95% CI 0.71–0.83, sensitivity = 68%, specificity = 77%). SVM analysis performed comparably with standard logistic regression models.

Discussion

A composite score combining visual structure and function improved the capacity of SD-OCT to distinguish PwMS from controls. GCIPL IED best distinguished those with a history of unilateral ON. SVM performed as well as standard statistical models for these classifications.

Classification of Evidence

This study provides Class III evidence that SD-OCT accurately distinguishes multiple sclerosis from normal controls as compared with clinical criteria.


Recent studies have suggested that an intereye difference (IED) in peripapillary retinal nerve fiber layer (pRNFL) thickness of 5 μm and a 4-μm IED in ganglion cell + inner plexiform layer (GCIPL) thickness by spectral domain optical coherence tomography (SD-OCT) optimize identification of people with multiple sclerosis (PwMS) who have a history of unilateral optic neuritis.1–3

The diagnostic utility of IEDs by OCT to detect a history of optic neuritis (ON) requires further validation. Machine learning techniques have developed significantly over the past several years and may be useful for validating IED thresholds by OCT. Such models may examine the added utility of visual function tests, such as low-contrast letter acuity (LCLA), for confirming the history of acute ON.

Support vector machine (SVM) and other machine learning classifier (MLC) methods have yielded prediction models with high accuracy, sensitivity, and specificity. In 1 study, an MLC method outperformed general ophthalmologists in the diagnosis of glaucoma.4 These models are typically evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). ROC analysis has shown that pRNFL thickness is an effective discriminator of glaucomatous vs healthy eyes.5,6

There is a paucity of studies evaluating the utility of MLCs for distinguishing PwMS from controls or PwMS with an ON history from those without. One small study by Del Palomar et al. found that MLCs applied to swept source OCT data differentiated PwMS from controls well using pRNFL thickness, but not using GCIPL thickness.7 A recent study by Cavaliere et al. used SVM classifiers to distinguish control subjects from PwMS without ON and reported sensitivity = 89%, specificity = 92%, and AUC = 0.97 when applying machine learning techniques to measures of retinal neurodegeneration. In contrast to the study by Del Palomar et al., these authors found total macular GCIPL thickness to be one of the most discriminant variables.8

Further studies incorporating intereye thickness differences by SD-OCT are needed to investigate, in larger cohorts spanning a wider range of ages and countries, the ability of MLCs to distinguish PwMS from healthy controls and ON from non-ON eyes in PwMS. The primary research question addressed in this study was the utility of OCT in MS and ON diagnosis. The purpose of the current investigation was to (1) determine the best diagnostic predictors among SD-OCT, low-contrast visual acuity, high-contrast visual acuity, and demographics for distinguishing PwMS from controls; (2) determine the best diagnostic predictors for identifying those with a history of unilateral ON within a cohort of PwMS; and (3) examine the ability of MLCs to classify these groups using the best predictors identified. We hypothesized that adding visual parameters and demographics would improve the ability of OCT to detect MS and ON history and that MLCs would also perform well in classifying these groups.

Methods

Study Cohort

Participants for this study included a convenience sample of 546 control subjects with no history of neurologic or ophthalmologic disease and 1,568 PwMS who fulfilled the 2010 McDonald criteria for diagnosis. All participants were aged 18 years or older and had a spherical refractive error between −6 and +6 diopters. Age, race, and sex were recorded where available. Participants were part of an 11-site collaboration (International Multiple Sclerosis Visual System consortium) in the United States, Europe, and the Middle East. Data collection methods have been previously described.2 Subjects with ocular comorbidities were excluded. Patients with an episode of acute ON within 3 months before data collection and those with other demyelinating diseases, such as neuromyelitis optica spectrum disorder or myelin oligodendrocyte glycoprotein antibody disease, were excluded (Figure 1).

Figure 1. Patient Inclusion Diagram.


This figure describes subjects assessed for eligibility in this study and shows excluded participants and the numbers included for each SD-OCT and visual acuity assessment. anti-MOG = anti–myelin oligodendrocyte glycoprotein; GCIPL = ganglion cell + inner plexiform layer; HCVA = high-contrast visual acuity; LCVA = low-contrast visual acuity; NMO = neuromyelitis optica; SD-OCT = spectral domain optical coherence tomography; ON = optic neuritis; pRNFL = peripapillary retinal nerve fiber layer.

Standard Protocol Approvals, Registrations, and Patient Consents

Study protocols were approved by institutional review boards, and all participants provided written informed consent.

Visual Assessments

LCLA was measured monocularly and binocularly at 2.5% and 1.25% contrast levels using low-contrast Sloan letter charts (Precision Vision, LaSalle, IL) in a retro-illuminated light box.9 Scores were recorded as the number of letters read correctly out of 70 per chart. Early Treatment Diabetic Retinopathy Study (ETDRS) visual acuity charts or Snellen acuity charts were used to measure high-contrast visual acuity (HCVA). Scores were normalized and converted to the ETDRS scale. LCLA and HCVA were not performed at every site (Figure 1); therefore, only subjects with these measurements were included in these analyses. Refraction and visual field assessments were performed at 1 site (New York University).

Optical Coherence Tomography

Spectralis SD-OCT (Heidelberg Engineering, Heidelberg, Germany) or Cirrus HD-OCT (Carl Zeiss Meditec, Dublin, CA) scans were obtained at each site by trained technologists.2 pRNFL thickness was measured using a 3.4-mm diameter circle centered on the optic disc. Macular GCIPL thickness was also assessed. GCIPL measurements were obtained from automated macular volume cube scans, with a measurement area of 4- × 5-mm annulus surrounding the fovea10 on Cirrus OCT and from the macular volume scan encompassing 6- × 6-mm cylinder surrounding the fovea on Spectralis OCT. OCT scans from Cirrus and Spectralis devices were reviewed, and OSCAR-IB criteria for quality control were followed.11 Scans not meeting the quality control criteria, including not being centered properly, were excluded. OCT results were reported in concordance with the APOSTEL 2.0 guidelines.12,13 Because 2 different devices, Spectralis and Cirrus, were used based on site protocols, Spectralis SD-OCT measurements were converted to the Cirrus scale using equations established by concurrent work (unpublished data):

  • Cirrus pRNFL = −5.0 + 1.0 × Spectralis pRNFL

  • Cirrus GCIPL = −4.5 + 0.9 × Spectralis GCIPL

These conversion equations were developed using structural equation modeling (R² = 0.996), accounting for clustering, with control data (n = 346) from 1 site where participants were scanned on both devices on the same day.
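A minimal Python sketch of the two conversion equations above, together with their algebraic inverses (useful for moving pooled values back to the Spectralis scale). The function names are hypothetical, and inputs are assumed to be thicknesses in μm.

```python
"""Convert SD-OCT thicknesses between Spectralis and Cirrus scales.

A sketch based only on the two linear equations reported above;
function names are hypothetical.
"""


def spectralis_to_cirrus_prnfl(spectralis_um: float) -> float:
    """Cirrus pRNFL = -5.0 + 1.0 x Spectralis pRNFL (thickness in um)."""
    return -5.0 + 1.0 * spectralis_um


def spectralis_to_cirrus_gcipl(spectralis_um: float) -> float:
    """Cirrus GCIPL = -4.5 + 0.9 x Spectralis GCIPL (thickness in um)."""
    return -4.5 + 0.9 * spectralis_um


def cirrus_to_spectralis_prnfl(cirrus_um: float) -> float:
    """Algebraic inverse of the pRNFL equation."""
    return (cirrus_um + 5.0) / 1.0


def cirrus_to_spectralis_gcipl(cirrus_um: float) -> float:
    """Algebraic inverse of the GCIPL equation."""
    return (cirrus_um + 4.5) / 0.9


if __name__ == "__main__":
    # Hypothetical Spectralis GCIPL value of 85 um
    cirrus = spectralis_to_cirrus_gcipl(85.0)
    print(round(cirrus, 2))  # 72.0
    print(round(cirrus_to_spectralis_gcipl(cirrus), 2))  # 85.0
```

Because the intercept cancels when one eye is subtracted from the other, a converted IED depends only on the slope: under these equations pRNFL IEDs are unchanged, and GCIPL IEDs are multiplied by 0.9.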

Optic Neuritis

The history of acute ON was defined as an exacerbation consistent with an inflammatory demyelinating event affecting the optic nerve, self-reported by the patient and verified by medical record review. Clinical features considered in ON diagnosis were pain on eye movement, subacute visual loss, reduced visual acuity, afferent pupillary defect, and reduced color vision. The episode of ON was categorized as unilateral (n = 488) or bilateral (n = 132). Throughout, ON refers to any instance of optic neuritis and, where applicable, is classified as unilateral or bilateral. People with an uncertain history of ON (n = 74), either because the data were not available or because the patient was unsure of their history, were excluded from the analyses of IED thresholds.

Statistics

Descriptive statistics were used to report continuous variables (mean ± SD), assuming normality of distribution, and categorical variables as frequencies and percentages. pRNFL and GCIPL IEDs were calculated as the absolute value of the difference between left eye and right eye values. Average pRNFL and GCIPL thicknesses were calculated as the mean of the 2 eyes. The conversion equations described above were used to pool data from Spectralis and Cirrus OCT devices. Each potential diagnostic criterion (Table 1) was standardized to have a mean of 0 and a standard deviation of 1. A composite score combining these diagnostic criteria was developed to assess for improvements in areas under the ROC curves. The composite score was developed using logistic regression with the dichotomous outcome of MS vs control diagnosis.
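The derived variables and the standardization step can be sketched in Python as follows. This assumes per-participant right-eye and left-eye values in μm and uses the sample standard deviation for the z-scores; the source does not state which SD definition was used, and the function names are hypothetical.

```python
"""Derived OCT variables and z-score standardization (illustrative sketch)."""
from statistics import mean, stdev


def intereye_difference(right_um: float, left_um: float) -> float:
    """IED: absolute value of (left eye - right eye)."""
    return abs(left_um - right_um)


def average_both_eyes(right_um: float, left_um: float) -> float:
    """Mean thickness of the 2 eyes."""
    return (right_um + left_um) / 2.0


def standardize(values: list[float]) -> list[float]:
    """Rescale to mean 0 and (sample) standard deviation 1."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical GCIPL values (um) for one participant
    print(intereye_difference(78.0, 72.0))  # 6.0
    print(average_both_eyes(78.0, 72.0))    # 75.0
    print(standardize([1.0, 2.0, 3.0]))     # [-1.0, 0.0, 1.0]
```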

Table 1.

Potential Variables Used in Classification Models


Accuracies and AUCs of several models were compared to find the best model for distinguishing PwMS from controls. Forward and backward selection using logistic regression was used to determine which of the above variables contributed most to the models. Classification and regression tree (CART) models were used to determine variable importance. The CART model was developed on a training data set (2/3 of cohort) and evaluated on a testing data set (1/3 of cohort). We compared the following 5 models: (1) a full model with all variables, (2) a partial model excluding variables with large amounts of missing data (disease duration and binocular acuities), (3) models with variables from forward and backward selection, (4) a model with the 3 most important variables from the CART model, and (5) a model with the most clinically relevant variables. Composite scores for the 4 best models were developed and compared using ROC curves in the training data set (2/3 of cohort) and then applied to the testing data set (remaining 1/3 of cohort). Training and testing data sets were assigned randomly.
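The random 2/3 vs 1/3 split can be sketched as below. Assigning whole participants (rather than individual eyes) to one set keeps both eyes of a person together, which matters when IEDs and 2-eye averages are predictors; this participant-level unit, the seed, and the function name are assumptions, not details given in the source.

```python
"""Random participant-level training/testing split (illustrative sketch)."""
import random


def train_test_split(participant_ids, train_fraction=2 / 3, seed=2022):
    """Shuffle IDs and return (training_ids, testing_ids).

    train_fraction and seed are hypothetical defaults; a fixed seed makes
    the split reproducible.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


if __name__ == "__main__":
    train, test = train_test_split(range(482))
    print(len(train), len(test))  # 321 161
```

With 482 complete cases, rounding 2/3 of the cohort gives the same 321/161 set sizes reported for the SVM analysis in the Results.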

Pairwise comparisons of AUCs were modeled after the nonparametric approach of DeLong et al.14 and accounted for within-patient, intereye correlations. Cut-points (thresholds) for the composite score were chosen using the 1-sum of squares method. Because HCVA can be scored as the number of letters correct or in logMAR units, composite scores were developed using both scoring methods.
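The sketch below computes an empirical AUC (the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half) and selects a cut-point. It assumes that the "1-sum of squares" method refers to choosing the threshold that minimizes (1 − sensitivity)² + (1 − specificity)², that is, the point on the ROC curve closest to the top-left corner. The DeLong comparison of correlated AUCs is not reproduced here, and the function names are hypothetical.

```python
"""Empirical AUC and ROC cut-point selection (illustrative sketch)."""


def auc_mann_whitney(case_scores, control_scores):
    """P(case score > control score), counting ties as 1/2."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cut):
    """Classify score >= cut as positive (e.g., MS)."""
    sens = sum(s >= cut for s in case_scores) / len(case_scores)
    spec = sum(s < cut for s in control_scores) / len(control_scores)
    return sens, spec


def closest_to_top_left(case_scores, control_scores):
    """Cut-point minimizing (1 - sens)^2 + (1 - spec)^2 (assumed criterion)."""
    candidates = sorted(set(case_scores) | set(control_scores))

    def distance(cut):
        sens, spec = sensitivity_specificity(case_scores, control_scores, cut)
        return (1 - sens) ** 2 + (1 - spec) ** 2

    return min(candidates, key=distance)


if __name__ == "__main__":
    cases, controls = [2.0, 3.0, 4.0], [0.0, 1.0, 3.5]  # toy scores
    print(round(auc_mann_whitney(cases, controls), 3))  # 0.778
    print(closest_to_top_left(cases, controls))         # 2.0
```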

A second composite score was developed from a logistic regression model with the dichotomous outcome of presence or absence of ON history and included weights based on the beta estimates for the significant diagnostic criteria for classifying unilateral ON history in PwMS. Sensitivity and specificity were calculated and compared for overall pRNFL and GCIPL thinning, intereye and overall LCLA scores, and the composite scores. Sensitivity and specificity for pRNFL and GCIPL IEDs have been previously analyzed and published.2 Finally, a third composite score for ON history by eye was developed and compared. A binary composite score was also developed based on the optimal cut-points identified on ROC curve analysis for classification of MS, unilateral ON history by patient, and ON history by eye. Analyses were performed using Stata 16.0 and R.
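A weighted composite of this kind is the linear predictor of a logistic regression, β0 + Σ βi·xi, where each β is an estimated coefficient. The sketch below fits a small logistic regression by batch gradient descent on the log-loss (a simple stand-in for the Stata and R routines actually used) and returns the linear predictor as the composite, so a composite above 0 corresponds to a predicted probability above 0.5. The learning rate, epoch count, toy data, and function names are assumptions.

```python
"""Logistic-regression-weighted composite score (illustrative sketch)."""
import math


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on mean log-loss; returns (intercept, betas)."""
    n, d = len(X), len(X[0])
    b0, betas = 0.0, [0.0] * d
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            z = b0 + sum(bj * xj for bj, xj in zip(betas, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted p minus label
            g0 += err
            for j in range(d):
                g[j] += err * xi[j]
        b0 -= lr * g0 / n
        betas = [bj - lr * gj / n for bj, gj in zip(betas, g)]
    return b0, betas


def composite_score(b0, betas, x):
    """Linear predictor: > 0 means predicted probability > 0.5."""
    return b0 + sum(bj * xj for bj, xj in zip(betas, x))


if __name__ == "__main__":
    # Toy standardized predictor (e.g., a z-scored IED) and ON labels (1 = ON)
    X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
    y = [0, 0, 1, 0, 1, 1]
    b0, betas = fit_logistic(X, y)
    print(betas[0] > 0, composite_score(b0, betas, [2.0]) > 0)  # True True
```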

SVM analysis was used to assess the diagnostic capability of OCT to distinguish PwMS from controls and to assess potential incremental capacity, by AUC, compared with logistic regression models. SVM is a supervised machine learning method that seeks a hyperplane separating 2 classes (e.g., MS vs controls) with the maximum margin, where the margin is the distance between the hyperplane and the nearest data point of either class. Through kernel functions, SVM can map the input data into a transformed feature space, allowing nonlinear decision boundaries. Linear, radial, polynomial, and sigmoid kernels were evaluated in these analyses. Only complete cases with no missing data were included in SVM models. The best cost and gamma parameters were determined by 10-fold cross-validation. Models evaluated included a full model with all variables (Table 1) and a model with the subset of variables used to develop the composite score. SVM models were developed using a training data set (2/3 of cohort) and evaluated on an independent testing data set (1/3 of cohort). AUCs were compared to assess the best model fit. Our data complied with the OSCAR-AI (RASCO) criteria15 where applicable.
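To make the margin idea concrete, the sketch below trains a linear soft-margin SVM by full-batch subgradient descent on the primal objective ½‖w‖² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)), with labels coded −1/+1, and reports the geometric margin |w·x + b|/‖w‖ of the closest point. The optimizer, step size, epoch count, toy data, and function names are assumptions for illustration; this is not the software or solver used in the study, and it omits nonlinear kernels and the 10-fold tuning of cost and gamma.

```python
"""Linear soft-margin SVM via subgradient descent (illustrative sketch)."""
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Minimize 0.5*||w||^2 + C*sum(hinge); y must be -1 or +1."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside margin or misclassified
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 (+1 or -1)."""
    return 1 if dot(w, x) + b >= 0 else -1


def geometric_margin(w, b, X):
    """Distance from the hyperplane to the nearest point."""
    norm = math.sqrt(dot(w, w))
    return min(abs(dot(w, x) + b) for x in X) / norm


if __name__ == "__main__":
    # Toy 2-feature data: +1 = MS, -1 = control (hypothetical values)
    X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
    print(round(geometric_margin(w, b, X), 2))
```

Only the points with yᵢ(w·xᵢ + b) < 1 contribute to the hinge term, which is why the fitted hyperplane is determined by a small set of support vectors.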

Data Availability

Anonymized data may be made available by reasonable request from a qualified investigator.

Results

Study Cohort

For the 2,120 participants (1,568 PwMS, 552 controls, Figure 1), characteristics are presented in Table 2. A history of ON (unilateral or bilateral) was reported in 39.5% of PwMS (n = 620/1,568). The MS cohort had a larger proportion of women than the control group (70% vs 55%, p < 0.001). As expected, PwMS, particularly those with ON history, had lower pRNFL and GCIPL thicknesses, lower (worse) LCLA and HCVA scores, and larger IEDs for all measures.

Table 2.

Baseline Demographics for Study Cohort (2008–2017)


Composite Score for MS Disease Classification

Variable selection models using logistic regression identified GCIPL IEDs, average GCIPL thicknesses for both eyes, pRNFL IEDs, binocular 2.5% contrast LCLA scores, and binocular HCVA scores as most important measures for distinguishing PwMS vs controls. The CART model identified GCIPL IED, average GCIPL thickness for both eyes, and binocular 2.5% LCLA as the most important variables. The 2 models (logistic regression variable selection and CART) were compared, and the model with fewer variables performed equally well (p = 0.42). Therefore, GCIPL IED, average GCIPL thickness for both eyes, and binocular 2.5% LCLA were combined to form a composite score. This composite score had the best performance among all the models tested for distinguishing PwMS vs controls. AUC was 0.89 (95% CI 0.85–0.93), with sensitivity 81%, specificity 80%, and accuracy 81%. Figure 2 shows the ROC curve for this model and for each individual measure contributing to the composite score. Comparisons of ROC curves for composite score to those for each individual measure showed that the composite score was a better model (p < 0.0001 for intereye GCIPL and GCIPL average of both eyes; p = 0.0002 for binocular 2.5% LCLA). Sensitivity, specificity, and AUCs for each OCT measure and for the composite score are presented in Table 3. A model substituting HCVA for LCLA in the composite score was evaluated because HCVA measures may be more available clinically than LCLA. This model with GCIPL IED, average GCIPL thickness of both eyes, and binocular HCVA also performed very well, with AUC 0.89 (95% CI 0.85–0.93), sensitivity 86%, specificity 78%, and accuracy 84% (Table 4). Scores greater than 0 indicate higher likelihood of MS for all composite scores (Figure 3).

Figure 2. ROC Curve Analysis for MS Disease Classification Using a Composite Score.


This figure illustrates the ROC curve analysis using the composite score (113 + 4 × intereye GCIPL − GCIPL thickness average of both eyes − binocular LCLA) to classify PwMS from healthy controls. A composite score of 0 is the cut-point where sensitivity and specificity are optimized; a score greater than 0 is associated with a higher likelihood of MS disease status. The area under the ROC curve (0.89) demonstrates excellent discriminatory power. The ROC curves for the individual components of the composite score are plotted for comparison. GCIPL = ganglion cell + inner plexiform layer; LCLA = low-contrast letter acuity; PwMS = people with multiple sclerosis; ROC = receiver operating characteristic.
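The composite score printed in the caption above can be written as a one-line function. This sketch assumes the inputs are in raw units on the Cirrus scale (GCIPL IED and 2-eye average in μm, binocular 2.5% LCLA in letters correct), which the size of the constant 113 suggests; the function name and example values are hypothetical, not study data.

```python
"""MS composite score from the Figure 2 caption (illustrative sketch)."""


def ms_composite_score(gcipl_ied_um, gcipl_avg_um, binocular_lcla_letters):
    """113 + 4 x GCIPL IED - average GCIPL of both eyes - binocular LCLA.

    Assumes raw units (um on the Cirrus scale; letters correct out of 70).
    A score > 0 is associated with a higher likelihood of MS.
    """
    return 113 + 4 * gcipl_ied_um - gcipl_avg_um - binocular_lcla_letters


if __name__ == "__main__":
    # Hypothetical participants, not study data
    print(ms_composite_score(5, 72, 30))  # 31  -> above the cut-point of 0
    print(ms_composite_score(1, 82, 40))  # -5  -> below the cut-point of 0
```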

Table 3.

Performance of Classification Models


Table 4.

Composite Scores for the MS Disease Classification Model


Figure 3. Box plot of Composite Score for Classification of MS Diagnosis.


This figure illustrates a box plot of the composite score (113 + 4 × intereye GCIPL − GCIPL thickness average of both eyes − binocular LCLA) for PwMS and healthy controls. A score greater than 0 indicates a higher likelihood of MS disease status; the red reference line marks a score of 0. The lower boundary of the box indicates the 25th percentile, and the upper boundary indicates the 75th percentile. The line within the box indicates the median composite score for that group. Dots above the plot indicate outliers. GCIPL = ganglion cell + inner plexiform layer; LCLA = low-contrast letter acuity; PwMS = people with multiple sclerosis; ROC = receiver operating characteristic.

The cut-points identified for each measure, where sensitivity and specificity are optimized using ROC curve analysis, were as follows: GCIPL IED ≥3 μm, average GCIPL thickness for both eyes ≤80 μm, and binocular LCLA at 2.5% contrast ≤42 letters (20/40 Snellen). When considering these binary outcomes, if a patient met all 3 criteria, the specificity for MS diagnosis was 99.5% and the sensitivity was 11.4%. For comparison, the sensitivities and specificities for each individual binary measure alone were as follows: 70.6% and 73.4% for GCIPL IED ≥3 μm, 40.7% and 81.3% for average GCIPL thickness for both eyes ≤80 μm, and 33.8% and 83.3% for binocular LCLA at 2.5% contrast ≤42 letters (20/40 Snellen). The threshold of GCIPL IED ≥4 μm was previously published (ref) and is discussed in the following paragraph for ON classification. We substituted this threshold for GCIPL IED ≥3 μm in the binary composite score for MS classification to compare results and found sensitivity = 9.9% and specificity = 98.9%. In addition, GCIPL IED ≥4 μm was evaluated individually and found to have sensitivity = 44.9% and specificity = 96.7%, with AUC = 0.75.
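The binary composite rule described above can be sketched as follows: a participant is classified as positive only when all 3 cut-points are met, and sensitivity and specificity are then the proportions of PwMS and controls classified correctly. The function names and the toy cohort are hypothetical; the IED threshold is a parameter so that the ≥4 μm variant can be evaluated in the same way.

```python
"""Binary composite rule for MS classification (illustrative sketch)."""


def meets_all_criteria(gcipl_ied_um, gcipl_avg_um, lcla_letters, ied_cut=3.0):
    """True if GCIPL IED >= ied_cut, average GCIPL <= 80 um, and
    binocular 2.5% LCLA <= 42 letters (cut-points reported above)."""
    return (gcipl_ied_um >= ied_cut
            and gcipl_avg_um <= 80.0
            and lcla_letters <= 42)


def sensitivity_specificity(predicted, has_ms):
    """Proportions of true cases and true controls classified correctly."""
    tp = sum(p and t for p, t in zip(predicted, has_ms))
    tn = sum(not p and not t for p, t in zip(predicted, has_ms))
    n_cases = sum(has_ms)
    n_controls = len(has_ms) - n_cases
    return tp / n_cases, tn / n_controls


if __name__ == "__main__":
    # Toy cohort: (IED um, average GCIPL um, LCLA letters, has MS)
    cohort = [(5, 70, 30, True), (2, 75, 35, True),
              (1, 84, 45, False), (4, 79, 40, False)]
    pred = [meets_all_criteria(i, a, l) for i, a, l, _ in cohort]
    truth = [ms for *_, ms in cohort]
    print(sensitivity_specificity(pred, truth))  # (0.5, 0.5)
```

Requiring all 3 criteria at once is what drives specificity up and sensitivity down: every added condition can only remove positives, true or false.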

Composite Score for ON Classification

GCIPL IED, average GCIPL thickness for both eyes, and pRNFL IED were identified as best for identifying a history of unilateral ON in PwMS. The composite score combining these measures did not improve the model (AUC = 0.77, 95% CI 0.71–0.83) compared with GCIPL IED alone (AUC = 0.77, 95% CI 0.71–0.83, sensitivity = 68%, specificity = 77%, Table 3). The binary composite score (GCIPL IED ≥4 μm, average GCIPL thickness ≤75 μm, and pRNFL IED ≥5 μm) had high specificity (97.9%) and low sensitivity (7.1%).

For classifying the ON history by eye, instead of by patient, GCIPL IED, average GCIPL thickness for both eyes, overall pRNFL thickness (by eye), average pRNFL thickness of both eyes, and age performed well. When these measurements were combined into a composite score, the composite had AUC = 0.72 (95% CI 0.67–0.76) and optimized sensitivity (58%) and specificity (77%). Sensitivity, specificity, and AUCs for each separate OCT measure and for the composite scores are presented in Table 3. The composite score did not perform better than any individual OCT measure for identifying the history of ON in that eye. Sensitivity, specificity, and AUCs for the 4 best models are shown in eTable 1 (links.lww.com/WNL/C156).

Support Vector Machine

Among the 482 participants with complete data included in the SVM analysis, the training set included n = 321 and the testing set n = 161. The linear kernel, which is appropriate when data are linearly separable, performed best compared with the radial, sigmoid, and polynomial kernels. The linear model had 58 support vectors when applied to the testing data set. As expected, SVM and logistic regression both performed best when all potential variables were included in the model (Table 1). The SVM method yielded sensitivity = 86%, specificity = 79%, accuracy = 84%, and AUC = 0.95 for correctly identifying PwMS vs controls; the logistic regression model yielded sensitivity = 86%, specificity = 89%, accuracy = 88%, and AUC = 0.96. Models using the subset of variables identified for the composite score performed comparably: the SVM method yielded sensitivity = 83%, specificity = 90%, accuracy = 88%, and AUC = 0.93, and the logistic regression model yielded sensitivity = 81%, specificity = 87%, accuracy = 85%, and AUC = 0.93. Similar results were seen for classification of ON history within PwMS: the linear kernel performed best, and the AUC did not improve with SVM compared with logistic regression (AUC = 0.89 vs 0.91 using the full model with all variables).

Classification of Evidence

This study provides Class III evidence that SD-OCT accurately distinguishes multiple sclerosis from normal controls as compared with the clinical criteria.

Discussion

The results of this investigation demonstrate that a composite score combining GCIPL IED, average GCIPL thickness of both eyes, and binocular 2.5% LCLA improved on single-eye SD-OCT measurements alone for distinguishing PwMS from healthy controls. GCIPL IED was the most informative measure for identifying unilateral ON among PwMS. SVM analyses were as effective as standard statistical models. All study objectives were met, and the hypotheses were confirmed: OCT and visual function measures distinguish PwMS from controls and identify those with a history of ON.

Individual SD-OCT measures did not perform as well as the composite score in distinguishing PwMS from controls. This could be because not all PwMS have optic nerve disease or retinal thinning. In addition, even when a PwMS has an ON history, the degree of retinal thinning may leave OCT values within normal ranges. The composite score for classification of PwMS incorporates GCIPL IED, average GCIPL thickness of both eyes, and binocular 2.5% LCLA. GCIPL thickness is a measure of visual structure, while LCLA and HCVA measure function; combining 3 measures that capture both increases the sensitivity and specificity of SD-OCT. The model incorporating the same GCIPL variables combined with HCVA also performed well in distinguishing PwMS from controls. Because HCVA is more commonly measured in clinical practice, the prediction equation with HCVA may be more broadly applicable. Models incorporating monocular acuity did not perform as well. The GCIPL IED component of the composite score would argue for an asymmetric optic neuropathy, while a reduced binocular 2.5% LCLA score would imply a visual deficit; an asymmetric GCIPL could itself produce a binocular vision deficit. For clinical applications, context is important. For example, if a patient presented with a first demyelinating episode suggestive of MS and was then found to have these OCT and visual deficits, this could argue strongly for the optic nerve as an involved site. If, however, a patient had no other neurologic complaints, these findings alone would not be enough to make a diagnosis or to start MS treatment, although they could prompt further investigation. Similarly, if a patient had other symptoms or a clinical history suggestive of optic neuritis, a GCIPL IED on OCT could confirm the diagnosis. In the context of the McDonald criteria for MS, GCIPL IED could potentially satisfy requirements for dissemination in time or space as an optic nerve event.

A composite score did not improve the capacity of OCT to distinguish people (or eyes) with a unilateral ON history within the MS cohort compared with previously published results using GCIPL IED alone.2 A recent study by Coric et al.3 showed a diagnostic sensitivity of 70% for GCIPL IED (as a percentage difference), with a specificity of 97%, for distinguishing people with unilateral ON from healthy controls. The specificity in that study may be higher because healthy controls were the comparison group; in this study, PwMS without ON history were the comparison group. Our results confirm the strong capacity of GCIPL IED to distinguish eyes and people with an ON history. Therefore, to identify ON history within an MS cohort, the simplest model, using GCIPL IED with a threshold of 4 μm, is sufficient. However, caution is warranted when interpreting results in a clinical setting; given the moderate sensitivity (68%) and specificity (77%) of this measure, mimics of MS and ON should be considered before making a diagnosis.

For both identifying PwMS vs controls and distinguishing people with vs without ON history, binary composite scores (which apply the cut-point for each component measure, such as GCIPL IED, to classify it as normal or abnormal) had high specificity (99.5% and 97.9%, respectively) but low sensitivity (11.4% and 7.1%, respectively). High specificity corresponds to a low false-positive rate, and lower degrees of IED may be useful to identify those who do not have disease (MS or ON). Low sensitivity indicates a conservative test with a high likelihood of false negatives. For tests that are candidates for diagnostic criteria, high specificity is important to avoid false positives.16 When adding a test to the diagnostic criteria for MS or to identify ON history, a test with high specificity may be useful. People who test positive are very likely to have MS or an ON history; hence, coupled with other diagnostic tests, OCT results could potentially improve the overall diagnostic criteria. In situations in which both high sensitivity and high specificity are preferred, the composite score with continuous variables would be more appropriate. In clinical practice, abnormal LCLA, a large GCIPL IED, and GCIPL thinning in 1 or both eyes together increase the likelihood of an optic neuropathy, potentially MS-related, although other causes of optic nerve injury should be ruled out before confirming a diagnosis.

The linear SVM model performed best compared with nonlinear SVM models. However, logistic regression models performed just as well and may produce more interpretable results than SVM models. The comparable performance of SVM and logistic regression models could also mean that machine learning models could be useful in certain contexts, for example, when no neuro-ophthalmologist is available to interpret results. The similar performance of logistic regression and linear SVM models suggests that the relationships between the model variables and the outcome are likely to be linear, so nonlinear models may not be needed in this case. Machine learning methods work best with large sample sizes and large numbers of feature variables; the present analyses have a relatively small sample size and relatively few feature variables compared with typical machine learning cohorts. This may help explain why machine learning models did not improve diagnostic accuracy in this cohort. It is reassuring that human-driven design and interpretation of statistical models, which should be planned before data collection, can be just as effective as high-performing machine learning methods in certain instances. Further studies could explore other supervised machine learning methods, such as Random Forest or Gradient Tree Boosting, which have performed well in glaucoma diagnostics. Deep learning/artificial intelligence models using SD-OCT images may be useful in future studies of MS diagnosis and of the role of ON and optic nerve lesions as MS diagnostic criteria.

Limitations of this study include its cross-sectional design. Because longitudinal data for these investigations are not yet available, predictions of change over time, such as the capacity to predict future ON or future conversion to MS based on current diagnostic criteria, have yet to be determined. Another limitation is that GCIPL thickness and HCVA were not measured at all sites. In addition, 74 participants did not have information on ON history. These limitations were addressed in detail in our initial publication on IEDs in OCT measures.2 We would not expect the small amount of missing data to introduce bias: participants without GCIPL, HCVA, or ON history data were similar in all baseline characteristics to those with these measures. Participants without LCLA had slightly longer disease durations than those with LCLA measurements (10.2 vs 6.7 years); however, distributions of Expanded Disability Status Scale scores, disease subtypes, and pRNFL and GCIPL values were similar. It is therefore unlikely that this subgroup had more severe optic nerve disease, and we would not expect these characteristics to bias the results. The healthy control cohort was 55% female, while the PwMS cohort was 70% female; in a previous study using these data, sensitivity analyses comparing the cohorts showed that this difference in sex distribution would introduce no significant bias.2 Although there could be site-related effects, site was not included as a covariate in the statistical models because it would not be interpretable in the composite score or machine learning models. To address potential site differences, the conversion equations described above were used to normalize data. Furthermore, other factors such as age, sex, and race/ethnicity were included in the models. Given these adjustments, we do not expect that site-related effects introduced bias.

The data were normalized to the scale of Zeiss devices; however, they can be converted to the Spectralis scale using the conversion equations provided. Based on our previous study, IEDs are comparable across Zeiss and Spectralis devices and are not affected by the overall thickness of the retina or the device used.2 We would expect this to hold for all OCT devices; therefore, IED could be used as a measure regardless of the device. However, the average GCIPL measure incorporated into the composite score for MS diagnosis could be affected by device differences. If other devices are used, a conversion equation would need to be developed to convert measurements to that device's scale, or a consistent segmentation algorithm would need to be applied to both devices. Similarly, GCIPL thickness was measured using macular grid sizes of 1, 3, and 6 mm; if an alternate grid size is used, a comparison would be needed to determine the level of adjustment required.
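The contrast between IED and average GCIPL thickness can be made concrete under one simplifying assumption: that the conversion between devices is linear, with the same coefficients for both eyes. Writing this hypothetical conversion as $T^{Z} = \alpha + \beta T^{S}$ (Z for Zeiss, S for Spectralis; $\alpha$ and $\beta$ are placeholders, not the published coefficients) and taking the IED as the absolute right-left difference, the intercept cancels in the IED but not in the between-eye average:

```latex
\begin{aligned}
\mathrm{IED}^{Z} &= \left|T^{Z}_{R} - T^{Z}_{L}\right|
  = \left|(\alpha + \beta T^{S}_{R}) - (\alpha + \beta T^{S}_{L})\right|
  = |\beta|\,\mathrm{IED}^{S},\\
\bar{T}^{Z} &= \tfrac{1}{2}\left(T^{Z}_{R} + T^{Z}_{L}\right)
  = \alpha + \beta\,\bar{T}^{S}.
\end{aligned}
```

Under this assumption, when $\beta$ is close to 1 the IED is essentially device-independent, whereas the average thickness still carries the device offset $\alpha$. This is consistent with needing a conversion equation (or a shared segmentation algorithm) for average GCIPL thickness but not for IED.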

A composite score incorporating GCIPL IED, average GCIPL thickness for both eyes, and binocular 2.5% LCLA (or HCVA) improved the diagnostic accuracy of SD-OCT for differentiating PwMS from controls. A GCIPL IED of ≥4 μm was the best measure to distinguish ON history in PwMS. The results of this study provide strong preliminary data on which further studies could be designed to evaluate the utility of IEDs and of the composite score to identify optic nerve lesions as a fifth anatomic lesion site for the next revision of the McDonald criteria for MS. For the composite score to be added to the MS diagnostic criteria, a subsequent study would need to demonstrate the capacity of the score to increase the sensitivity and specificity of the current diagnostic criteria. This could be done by substituting the vision composite score into the current criteria or by developing a protocol using data from patients with clinically isolated syndromes followed over time. Such studies would incorporate the vision composite score at baseline to see whether it correctly identifies the patients who progress to clinically definite MS. GCIPL IEDs could also be considered for inclusion in the MS diagnostic criteria as evidence of a potentially MS-related optic neuropathy and as a proxy for an optic nerve lesion; a study similar to that described for the composite score could evaluate the capacity of this measure to improve MS diagnosis. Thus, IED could be used to identify a fifth lesion site, while the composite score could be most helpful to distinguish PwMS from healthy controls. As such, both could be additive to the diagnostic criteria. Machine learning models performed as well as logistic regression, suggesting that these models should be explored in further studies of MS and ON, perhaps incorporating analyses of raw SD-OCT images.
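The ≥4 μm GCIPL IED rule lends itself to a short sketch. The Python below applies that cutoff to pairs of eye measurements and scores the rule against known ON history using the standard definitions of sensitivity, TP/(TP+FN), and specificity, TN/(TN+FP). The eye values and helper names (`gcipl_ied`, `flags_on_history`, `sensitivity_specificity`, `EXAMPLE_EYES`) are hypothetical, invented for illustration rather than taken from the study; the IED is assumed to be the absolute right-left difference; and the composite score is not reproduced because its fitted coefficients are not given in this section.

```python
from __future__ import annotations

GCIPL_IED_THRESHOLD_UM = 4.0  # GCIPL IED cutoff reported in the text (>= 4 um)


def gcipl_ied(right_um: float, left_um: float) -> float:
    """Intereye difference, taken here as the absolute right-left GCIPL difference (um)."""
    return abs(right_um - left_um)


def flags_on_history(right_um: float, left_um: float,
                     threshold: float = GCIPL_IED_THRESHOLD_UM) -> bool:
    """True when the GCIPL IED meets or exceeds the threshold."""
    return gcipl_ied(right_um, left_um) >= threshold


def sensitivity_specificity(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical (right GCIPL um, left GCIPL um, known ON history) triples; not study data.
EXAMPLE_EYES = [
    (78.0, 70.0, True),   # IED 8 -> flagged, true positive
    (80.0, 79.0, False),  # IED 1 -> not flagged, true negative
    (75.0, 72.0, True),   # IED 3 -> not flagged, false negative
    (82.0, 77.0, False),  # IED 5 -> flagged, false positive
    (70.0, 64.0, True),   # IED 6 -> flagged, true positive
    (81.0, 80.0, False),  # IED 1 -> not flagged, true negative
]

if __name__ == "__main__":
    predicted = [flags_on_history(r, l) for r, l, _ in EXAMPLE_EYES]
    actual = [on for _, _, on in EXAMPLE_EYES]
    sens, spec = sensitivity_specificity(predicted, actual)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")  # prints sensitivity=0.67, specificity=0.67
```

Keeping the threshold as a parameter makes it easy to see how sensitivity and specificity trade off as the cutoff moves, which is the same trade-off summarized by an ROC curve.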

Glossary

AUC

area under the curve

CART

classification and regression tree

ETDRS

Early Treatment Diabetic Retinopathy Study

GCIPL

ganglion cell + inner plexiform layer

HCVA

high-contrast visual acuity

IED

intereye difference

LCLA

low-contrast letter acuity

MLC

machine learning classifier

ON

optic neuritis

pRNFL

peripapillary retinal nerve fiber layer

PwMS

people with multiple sclerosis

ROC

receiver operating characteristic

SD-OCT

spectral domain optical coherence tomography

SVM

support vector machine

Appendix. Authors


Footnotes

Editorial, page 453

Class of Evidence: NPub.org/coe

Study Funding

P.A. Calabresi is supported by R01NS082347. J. Havla is (partially) funded by the German Federal Ministry of Education and Research (grant numbers 01ZZ1603[A-D] and 01ZZ1804[A-H], DIFUTURE). B. Hemmer received funding for the study from the European Union's Horizon 2020 Research and Innovation Program (grant MultipleMS, EU RIA 733161) and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy within the framework of the Munich Cluster for Systems Neurology (EXC 2145 SyNergy, ID 390857198). B. Knier is funded by the Else Kröner-Fresenius-Stiftung (Else Kröner-Fresenius Exzellenzstipendium). T. Korn is funded by the DFG (SFB1054-B06, TRR128-A07, TRR128-A12, TRR274-A01, Synergy Cluster of Excellence, EXC 2145, ID 390857198) and by the ERC (CoG 647215). J.S. Schuman received funding from the NIH (Bethesda, MD; R01-EY013178). The Department of Ophthalmology, NYU Langone Health, NYU Grossman School of Medicine, New York, NY, received an unrestricted grant from Research to Prevent Blindness (New York, NY). P. Villoslada is funded by Instituto de Salud Carlos III, Spain, and Fondo Europeo de Desarrollo Regional (PI15/0061).

Disclosure

A. Brandt is named as inventor on several patents and patent applications describing multiple sclerosis serum biomarkers, retinal image analysis methods, and human pose estimation methods. He is cofounder of and holds shares in the companies Motognosis and Nocturne. P.A. Calabresi is PI on grants to Johns Hopkins from Principia and Genentech. He has consulted for Disarm, Nerveda, Biogen, and Avidea. E. Frohman has received consulting and speaker fees from Novartis, Genzyme, Biogen, Alexion, and Janssen. T.C. Frohman has received consulting fees from Alexion. S. Galetta has been a consultant for Genentech. J. Havla reports grants for OCT research from the Friedrich-Baur-Stiftung and Merck; personal fees and nonfinancial support from Celgene, Merck, Alexion, Novartis, Roche, Santhera, Biogen, Heidelberg Engineering, and Sanofi Genzyme; and nonfinancial support from the Guthy-Jackson Charitable Foundation, all outside the submitted work. B. Hemmer has served on scientific advisory boards for Novartis; he has served as a DMSC member for AllergyCare, Polpharma, and TG Therapeutics; he or his institution has received speaker honoraria from Desitin; and his institution received research grants from Regeneron for MS research. He holds part of 2 patents: 1 for the detection of antibodies against KIR4.1 in a subpopulation of patients with MS and 1 for genetic determinants of neutralizing antibodies to interferon. None of these conflicts is relevant to the topic of the study. B. Knier received travel support and a research grant from Novartis (Oppenheim research award). A. Papadopoulou has received speaker fees from Sanofi-Genzyme and travel support from Bayer AG, Teva, and Hoffmann-La Roche. Her research has been supported by the University and University Hospital of Basel, the Swiss Multiple Sclerosis Society, the Stiftung zur Förderung der gastroenterologischen und allgemeinen klinischen Forschung sowie der medizinischen Bildauswertung, and the Swiss National Science Foundation (project number P300PB_174480). F. Paul serves as an Associate Editor for Neurology® Neuroimmunology & Neuroinflammation, reports research grants and speaker honoraria from Bayer, Teva, Genzyme, Merck, Novartis, and MedImmune, and is a member of the steering committee of the OCTiMS study (Novartis). A. Petzold is supported by the National Institute for Health Research (NIHR) Biomedical Research Centre based at Moorfields Eye Hospital NHS Foundation Trust and UCL Institute of Ophthalmology. A. Petzold is part of the steering committee of the ANGI network, which is sponsored by ZEISS, and of the steering committee of the OCTiMS study, which is sponsored by Novartis, and reports speaker fees from Heidelberg Engineering. P. Villoslada holds stocks in and has received consultancy fees from Accure Therapeutics, Spiral Therapeutics, QMENTA, Attune Neurosciences, CLight, NeuroPrex, and Adhera Health. H. Zimmermann received research grants from Novartis and speaking honoraria from Bayer Healthcare. J.S. Schuman reports the following: Aerie Pharmaceuticals, Inc. (consultant/advisor, equity owner); BrightFocus Foundation (grant support); Boehringer Ingelheim (consultant/advisor); Carl Zeiss Meditec (patents/royalty, consultant/advisor); Massachusetts Eye and Ear Infirmary and Massachusetts Institute of Technology (intellectual property); National Eye Institute (grant support); New York University (intellectual property); Ocugenix (equity owner, patents/royalty); Ocular Therapeutix, Inc. (consultant/advisor, equity owner); Opticient (consultant/advisor, equity owner); Perfuse, Inc. (consultant/advisor); Regeneron, Inc. (consultant/advisor); SLACK Incorporated (consultant/advisor); Tufts University (intellectual property); University of Pittsburgh (intellectual property). S. Saidha has received consulting fees from Medical Logix for the development of CME programs in neurology and has served on scientific advisory boards for Biogen, Genzyme, Genentech Corporation, EMD Serono, and Celgene. He has consulted for Carl Zeiss Meditec. He is the PI of investigator-initiated studies funded by Genentech Corporation and Biogen Idec and has received support from the Race to Erase MS foundation. He has received equity compensation for consulting from JuneBrain LLC, a retinal imaging device developer. L.J. Balcer is editor-in-chief of the Journal of Neuro-Ophthalmology. All other authors report no relevant disclosures. Go to Neurology.org/N for full disclosures.

References

1. Nolan RC, Galetta SL, Frohman TC, et al. Optimal intereye difference thresholds in retinal nerve fiber layer thickness for predicting a unilateral optic nerve lesion in multiple sclerosis. J Neuroophthalmol. 2018;38(4):451-458.
2. Nolan-Kenney RC, Liu M, Akhand O, et al. Optimal intereye difference thresholds by optical coherence tomography in multiple sclerosis: an international study. Ann Neurol. 2019;85(5):618-629.
3. Coric D, Balk LJ, Uitdehaag BMJ, Petzold A. Diagnostic accuracy of optical coherence tomography inter-eye percentage difference for optic neuritis in multiple sclerosis. Eur J Neurol. 2017;24(12):1479-1484.
4. Shigueoka LS, Vasconcellos JPC, Schimiti RB, et al. Automated algorithms combining structure and function outperform general ophthalmologists in diagnosing glaucoma. PLoS One. 2018;13(12):e0207784.
5. Kim SJ, Cho KJ, Oh S. Development of machine learning models for diagnosis of glaucoma. PLoS One. 2017;12(5):e0177726.
6. Silva FR, Vidotti VG, Cremasco F, Dias M, Gomi ES, Costa VP. Sensitivity and specificity of machine learning classifiers for glaucoma diagnosis using Spectral Domain OCT and standard automated perimetry. Arq Bras Oftalmol. 2013;76(3):170-174.
7. Perez Del Palomar A, Cegonino J, Montolio A, et al. Swept source optical coherence tomography to early detect multiple sclerosis disease. The use of machine learning techniques. PLoS One. 2019;14(5):e0216410.
8. Cavaliere C, Vilades E, Alonso-Rodriguez MC, et al. Computer-aided diagnosis of multiple sclerosis using a support vector machine and optical coherence tomography features. Sensors (Basel). 2019;19(23):5323.
9. Balcer LJ, Raynowska J, Nolan R, et al. Validity of low-contrast letter acuity as a visual performance outcome measure for multiple sclerosis. Mult Scler. 2017;23(5):734-747.
10. Oberwahrenbrock T, Weinhold M, Mikolajczak J, et al. Reliability of intra-retinal layer thickness estimates. PLoS One. 2015;10(9):e0137316.
11. Tewarie P, Balk L, Costello F, et al. The OSCAR-IB consensus criteria for retinal OCT quality assessment. PLoS One. 2012;7(4):e34823.
12. Cruz-Herranz A, Balk LJ, Oberwahrenbrock T, et al. The APOSTEL recommendations for reporting quantitative optical coherence tomography studies. Neurology. 2016;86(24):2303-2309.
13. Aytulun A, Cruz-Herranz A, Aktas O, et al. APOSTEL 2.0 recommendations for reporting quantitative optical coherence tomography studies. Neurology. 2021;97(2):68-79.
14. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837-845.
15. Petzold A, Albrecht P, Balcer L, et al. Artificial intelligence extension of the OSCAR-IB criteria. Ann Clin Transl Neurol. 2021;8(7):1528-1542.
16. Gordis L. Epidemiology. 5th ed. Elsevier/Saunders; 2014.

Associated Data


Data Availability Statement

Anonymized data may be made available by reasonable request from a qualified investigator.


Articles from Neurology are provided here courtesy of American Academy of Neurology
