Author manuscript; available in PMC: 2024 Jun 1.
Published in final edited form as: Ophthalmology. 2023 Feb 6;130(6):631–639. doi: 10.1016/j.ophtha.2023.01.021

Comparing the accuracy of peripapillary OCT scans and visual fields to detect glaucoma worsening

Chris Bradley 1, Patrick Herbert 2, Kaihua Hou 2, Mathias Unberath 2, Pradeep Ramulu 1, Jithin Yohannan 1,2
PMCID: PMC10200740  NIHMSID: NIHMS1872337  PMID: 36754173

Abstract

Objective:

Compare the accuracy of detecting moderate and rapid rates of glaucoma worsening over a 2-year period with different numbers of OCT scans and VFs in a large sample of glaucoma and glaucoma-suspect eyes.

Design:

Descriptive and simulation study

Participants:

Our OCT sample consisted of 12,150 eyes from 7,392 adult patients with glaucoma or glaucoma-suspect status followed at the Wilmer Eye Institute from 2013-2021. Our VF sample consisted of 20,583 eyes from 10,958 adult patients from the same database. All eyes had at least 5 measurements over follow-up, from Zeiss Cirrus OCT or from the Humphrey Field Analyzer.

Methods:

Within-eye rates of change in RNFL thickness and MD were measured using linear regression. For each measured rate, simulated measurements of RNFL thickness and MD were generated using the distributions of residuals. Simulated rates of change for different numbers of OCT scans and VFs over a 2-year period were used to estimate the accuracy of detecting moderate (75th percentile) and rapid (90th percentile) worsening for OCT and VF. Accuracy was defined as the percentage of simulated eyes in which the true rate of worsening (the rate without measurement error) was at or less than a criterion rate (e.g., 75th or 90th percentile).

Main Outcome Measures:

The accuracy of diagnosing moderate and rapid rates of glaucoma worsening for different numbers of OCT scans and VFs over a 2-year period.

Results:

Accuracy was less than 50% for both OCT and VF when diagnosing worsening after a 2-year period. OCT accuracy was 5-10 percentage points higher than VF accuracy at detecting moderate worsening and 10-15 percentage points higher for rapid worsening. Accuracy increased by over 17 percentage points when using both OCT and VF to detect worsening, i.e., when relying on either OCT or VF to be accurate.

Conclusions:

More frequent OCT scans and VFs are needed to improve the accuracy of diagnosing glaucoma worsening. Accuracy greatly increases when relying on both OCT and VF to detect worsening.

Introduction

Two of the most widely used metrics for detecting glaucoma worsening are the rate of peripapillary retinal nerve fiber layer (RNFL) thickness loss measured by optical coherence tomography (OCT)1-3 and the rate of mean deviation (MD) worsening in visual fields (VF) measured by static automated perimetry4-9. In a previous study3, we estimated the accuracy of using OCT to detect moderate (i.e., 75th percentile) and rapid (i.e., 90th percentile) RNFL worsening as a function of the number and frequency of OCT RNFL measurements over a 2-year period. We defined accuracy as the percentage of eyes in which the true rate of OCT RNFL worsening — the rate without measurement error — was at or less than a criterion rate (e.g., 75th or 90th percentile rate of worsening). Using simulations based on OCT data from over 12,000 eyes with glaucoma or glaucoma suspect status, we found that accuracy was less than 50% for detecting moderate and rapid RNFL worsening over a 2-year period given the OCT examination frequencies (once every 390 ± 186 days) in our sample.

In this study we perform the comparable analysis for VF using over 20,000 eyes with glaucoma or glaucoma suspect status. One of our goals is to compare the accuracies of OCT and VF for diagnosing glaucoma worsening. It is possible that OCT has higher accuracy because OCT is not a subjective test. It is also possible that VF is more accurate since MD is (essentially) the mean of a large number of total deviation values, which reduces the standard error of the estimate (as in a sampling distribution) and increases reliability. However, OCT measurements of RNFL thickness are also averages, and it is an empirical question which is more accurate.

A second goal of this study is to determine how much accuracy increases when using both OCT and VF to diagnose worsening. Specifically, how much does accuracy increase when relying on at least one of OCT or VF to be accurate? Probability theory — the disjunction rule — tells us that using both OCT and VF in such a manner must increase overall accuracy. How much accuracy increases depends on the individual accuracies for OCT and VF. We estimate accuracy for detecting moderate (75th percentile) and rapid (90th percentile) worsening for different numbers of OCT scans and VFs over a 2-year period. We also provide a website where clinicians can specify their preferred criterion rates of RNFL thickness and MD worsening, enter the number of OCT scans and VFs taken over a 2-year period, and obtain estimates of diagnostic accuracy for OCT alone, VF alone, and for OCT and VF combined.

Methods

This study was approved by the Johns Hopkins University School of Medicine Institutional Review Board and adhered to the tenets of the Declaration of Helsinki. As this was a retrospective study, the need for informed consent was waived. OCT (Cirrus, Carl Zeiss Meditec, Dublin, CA, USA) and VF (Humphrey Field Analyzer) results were obtained from the same database of glaucoma and glaucoma suspect patients 18 years or older under care at the Wilmer Eye Institute from 2013 to 2021. Our OCT sample consisted of eyes with at least 5 measurements of RNFL thickness over follow-up with a minimum superior/inferior quadrant thickness of 50 μm and a minimum signal strength of 6. Measurements below these thresholds were likely to be below the RNFL thickness floor due to artifact, segmentation error or poor image quality10,11. Our VF sample consisted of eyes with at least five 24-2 SITA Standard VFs over follow-up.

Glaucoma severity was defined using previously published criteria: MD > −6 dB for mild; −6 dB ≥ MD ≥ −12 dB for moderate; −12 dB > MD for severe9. Glaucoma suspect was defined as MD > −6 dB with a glaucoma hemifield test (GHT) “within normal limits”. Glaucoma severity for our OCT sample was defined based on the VF closest to the first OCT scan, while glaucoma severity for our VF sample was defined using baseline VF.

To estimate accuracy, we used the same simulation method as in our prior OCT study3. For each eye in our VF sample, we used linear regression to measure the rate of change in MD. The measured rates of MD change served as our best estimate of the true (i.e., without measurement error) rates of MD change in glaucoma and glaucoma suspect patients. For each eye, 100 sets of simulated MD data were generated. A much larger number of simulations was not performed due to computational time constraints; however, we report in the Results section the mean absolute error (MAE) between two sets of 100 simulations. The $i$th simulated MD data point for the $k$th eye had the form $y_{k,i} = \beta_k + \varepsilon_{k,i}$, where $\beta_k$ represents the true rate of MD change and $\varepsilon_{k,i}$ represents the $i$th residual for the $k$th eye. For each $\beta_k$, the $i$th residual was randomly chosen from the combined distribution of residuals for all eyes whose measured rates of MD change were within 0.25 dB/yr of $\beta_k$. Choosing residuals in this manner implies that measurement noise depends on $\beta_k$, but is independent of time, glaucoma severity and patient characteristics (e.g., age, gender, race). While these assumptions may not be completely accurate, they enabled us to simulate different measurement strategies (different time intervals between measurements) and generate as many simulated measurements per eye as desired, including more than were actually measured.
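The resampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' R code: the function names, the residual pool and the visit times are hypothetical, and the sketch assumes that the true rate enters each simulated value as rate multiplied by elapsed time, with the intercept set to zero because it does not affect the fitted slope.

```python
import random


def ols_slope(times, values):
    """Ordinary least-squares slope of values regressed on times."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def simulate_rates(true_rate, residual_pool, times, n_sims=100, rng=None):
    """Return n_sims simulated rates of change for one eye.

    Each simulated value is true_rate * t (an assumed form) plus a residual
    drawn at random, with replacement, from residual_pool.
    """
    rng = rng or random.Random()
    rates = []
    for _ in range(n_sims):
        series = [true_rate * t + rng.choice(residual_pool) for t in times]
        rates.append(ols_slope(times, series))
    return rates


# Hypothetical pool (dB) standing in for the combined residuals of all eyes
# whose measured rates lie within 0.25 dB/yr of the true rate.
pool = [-1.2, -0.6, -0.3, 0.0, 0.2, 0.5, 0.9, 1.1]
times = [0.0, 0.5, 1.0, 1.5, 2.0]  # years from baseline
rates = simulate_rates(-0.5, pool, times, n_sims=100, rng=random.Random(0))
```

With a residual pool containing only zeros, every simulated rate equals the true rate, which is a quick sanity check on the regression helper.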

Figure 1 shows a schematic of how simulated MD data were generated for two different measurement strategies: “evenly spaced” (center) and “clustered” (right). In the evenly spaced condition, consecutive measurements were separated by a constant time interval. In the clustered condition, approximately half the simulated measurements were at the beginning of the 2-year period and the rest at the end, with the number being exactly half if the total number of measurements M was even. If M was odd, (M + 1)/2 simulated measurements were at the beginning and (M − 1)/2 at the end, or vice versa (randomly determined for each simulated dataset). Accuracy was estimated for evenly spaced and clustered measurement strategies because the accuracies for these two measurement strategies lie near the theoretical lower and upper bounds, respectively, when using linear regression to estimate a rate of change12-14.
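As a rough illustration of the two measurement strategies, the sketch below builds the visit times for each and compares the factor 1 / Σ(t − t̄)², which is the variance of an ordinary least-squares slope divided by the noise variance. The function names are hypothetical, and placing clustered measurements exactly at 0 and 2 years is an assumption made for simplicity.

```python
import random


def evenly_spaced(m, period=2.0):
    """m >= 2 visit times separated by a constant interval over the period."""
    return [period * i / (m - 1) for i in range(m)]


def clustered(m, period=2.0, rng=None):
    """Half the visits at the start and half at the end of the period.

    For odd m the extra visit goes to the start or the end at random.
    """
    rng = rng or random.Random()
    n_start = m // 2
    if m % 2 == 1 and rng.random() < 0.5:
        n_start += 1
    return [0.0] * n_start + [period] * (m - n_start)


def slope_variance_factor(times):
    """1 / sum((t - mean)^2): OLS slope variance per unit noise variance."""
    t_mean = sum(times) / len(times)
    return 1.0 / sum((t - t_mean) ** 2 for t in times)


even_factor = slope_variance_factor(evenly_spaced(6))
clustered_factor = slope_variance_factor(clustered(6))
```

For six measurements the clustered design gives the smaller factor, which is consistent with the clustered strategy serving as the upper bound on accuracy.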

Figure 1.

Schematic of how simulated MD data (red dots) were generated for one eye. The true rate of MD change (black line) in this example is set to $\beta_k = -0.5$ dB/yr. Residuals are randomly chosen from the combined distribution of residuals (top left) for rates of MD change within 0.25 dB/yr of $\beta_k$. The simulated rate (blue line) is the result of linear regression on the simulated MD data. Simulations are repeated 100 times for each eye included in our VF sample. Accuracy was computed for different numbers of measurements with evenly spaced (center) and clustered (right) measurement strategies.

For each set of simulated MD data, we used linear regression to calculate the simulated rate of MD change (blue line in Figure 1). The distribution of simulated rates was then used to estimate the accuracy of diagnosing glaucoma worsening — defined as the percentage of cases where the true rate of MD change (black line) was at or less than the criterion rate (e.g., 75th or 90th percentile rate of worsening) when the simulated rate of MD change was also at or below that criterion. All analysis was done in R (https://www.R-project.org/).
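The accuracy definition above is a conditional percentage: among cases whose simulated rate is at or below the criterion, how many also have a true rate at or below it. A minimal sketch follows; the function name and the example rates are hypothetical, and only the criterion of −0.39 dB/yr is taken from the text.

```python
def accuracy(true_rates, simulated_rates, criterion):
    """Percent of flagged cases whose true rate is also at or below criterion.

    A case is flagged when its simulated rate is at or below the criterion.
    Returns NaN when no case is flagged.
    """
    flagged = [t for t, s in zip(true_rates, simulated_rates) if s <= criterion]
    if not flagged:
        return float("nan")
    correct = sum(1 for t in flagged if t <= criterion)
    return 100.0 * correct / len(flagged)


# Hypothetical paired rates (dB/yr): one true rate per simulated rate.
true_rates = [-1.0, -0.2, -0.6, 0.1, -0.3]
simulated_rates = [-0.8, -0.5, -0.1, -0.45, 0.0]
percent_correct = accuracy(true_rates, simulated_rates, criterion=-0.39)
```

In this toy example three cases are flagged and only one of them has a true rate at or below the criterion, so percent correct is one third.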

VF accuracy was compared to OCT accuracy from our previous study3. We also calculated how much accuracy increases when clinicians use both OCT and VF to diagnose glaucoma worsening. Specifically, we asked how often at least one of OCT or VF is accurate in detecting worsening. Percent correct (accuracy) in the combined case can be calculated using the disjunction rule in probability theory: p(OCT ∪ VF) = p(OCT) + p(VF) − p(OCT ∩ VF), where p(OCT) represents the probability that OCT is accurate, p(VF) represents the probability that VF is accurate, p(OCT ∪ VF) represents the probability that at least one of OCT or VF is accurate, and p(OCT ∩ VF) represents the probability that both OCT and VF are accurate. Percent correct (accuracy) in the combined case is PC = 100 × p(OCT ∪ VF).
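The disjunction rule can be worked through numerically. The sketch below assumes, as the Discussion states, that OCT and VF are statistically independent, so that p(OCT ∩ VF) = p(OCT) × p(VF); the function name is hypothetical. The inputs are the 48.5% and 42.0% entries from the first row of Table 4.

```python
def combined_accuracy(p_oct, p_vf):
    """Probability that at least one of OCT or VF is accurate.

    Assumes independence, so the joint term is the product p_oct * p_vf.
    """
    return p_oct + p_vf - p_oct * p_vf


# 0.485 + 0.420 - 0.485 * 0.420 = 0.7013
percent_correct = 100 * combined_accuracy(0.485, 0.420)
print(round(percent_correct, 1))  # approximately 70.1
```

The result agrees with the combined value reported in the same row of Table 4.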

Since clinicians may differ in their choice of clinically significant (i.e., criterion) rates of worsening, we provide a website (https://wilmer-glaucoma-ml.github.io/projects/oct-vs-vf-accuracy-calculation/) where clinicians can specify their preferred criterion rates of RNFL worsening and MD worsening, enter the number of OCT scans and VFs over a 2-year period, and obtain estimates of diagnostic accuracy when using OCT alone, VF alone, and when using both combined. Accuracy estimates for the web-based tool are provided only for the evenly spaced condition since it is rare in clinical practice to cluster half the measurements at each of the endpoints of a 2-year time period.

Results

Table 1 lists key demographic characteristics of the two sets of eyes we used in this study: 12,150 eyes from 7,392 patients for OCT, and 20,583 eyes from 10,958 patients for VF. The main difference between the two samples is their distributions of glaucoma severity. Because severity was defined in our OCT sample based on the VF closest to the first OCT scan, 28.95% of eyes in the OCT sample did not have a defined severity due to lack of an associated VF test. However, once these eyes are removed, the distributions of severity among all eyes with defined severity are similar between the two samples.

Table 1.

OCT and VF demographics

Characteristic | OCT | VF
Sample size
Patients | 7,392 | 10,958
Eyes | 12,150 | 20,583
Age
Mean (SD) | 67.6 (14.8) | 63.0 (12.3)
Median | 70.2 | 64.0
Range | 18.9, 102.3 | 18.1, 99.5
Gender, n (%)
Male | 2,901 (39.25%) | 4,702 (42.91%)
Female | 4,488 (60.71%) | 6,255 (57.08%)
Other | 3 (0.04%) | 1 (0.01%)
Race, n (%)
White | 4,100 (55.46%) | 6,757 (61.66%)
Black | 2,077 (28.10%) | 3,297 (30.09%)
Asian | 444 (6.01%) | 464 (4.23%)
Other | 543 (7.35%) | 402 (3.67%)
N/A | 228 (3.08%) | 38 (0.35%)
Severity, n (%)
Suspect | 3,677 (30.26%) | 8,819 (42.85%)
Mild | 3,602 (29.65%) | 8,053 (39.12%)
Moderate | 875 (7.20%) | 2,259 (10.98%)
Severe | 479 (3.94%) | 1,452 (7.05%)
N/A | 3,517 (28.95%) |
Baseline, median (IQR) | RNFL thickness in μm | MD in dB
All | 83.6 (74.7, 92.7) | −1.75 (−4.49, −0.15)
Suspect | 85.8 (78.1, 93.6) | −0.28 (−1.34, 0.58)
Mild | 80.6 (72.0, 89.6) | −2.39 (−3.94, −0.99)
Moderate | 74.7 (65.8, 86.2) | −8.21 (−9.83, −6.99)
Severe | 68.7 (61.2, 81.6) | −15.78 (−19.05, −13.67)

OCT = optical coherence tomography; VF = visual field; RNFL = retinal nerve fiber layer; MD = mean deviation; IQR = interquartile range.

Table 2 (top half) shows mean, median, 75th and 90th percentile rates of worsening for OCT RNFL thickness (in μm/yr) and for MD (in dB/yr) as a function of disease severity. One major difference between the results for OCT and VF is the difference between their means and medians. The mean rates of worsening for OCT RNFL thickness lie well below the 90th percentile, while the mean rates for MD lie closer to the median. These OCT results were not due to outliers. However, they could be due to small sample size. There were on average 6.3 OCT RNFL thickness measurements per eye and 8.8 MD measurements per eye. MD is also (essentially) the mean of 54 independently measured total deviation values for the 24-2 pattern, which reduces variance. OCT RNFL thickness is also an average, but it is unclear how much variance can be reduced given errors in the OCT segmentation algorithms.

Table 2.

Distribution of rates of RNFL thickness and MD worsening

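Modality (units) | Severity | Mean | Median | 75th Percentile | 90th Percentile
OCT (μm/yr) | Average | −3.08 | −0.39 | −1.09 | −2.35
OCT (μm/yr) | Suspect | −1.57 | −0.37 | −1.01 | −1.96
OCT (μm/yr) | Mild | −1.72 | −0.39 | −1.05 | −2.07
OCT (μm/yr) | Moderate | −6.21 | −0.43 | −1.13 | −2.75
OCT (μm/yr) | Severe | −8.49 | −0.39 | −1.37 | −5.16
VF (dB/yr) | Average | −0.23 | −0.14 | −0.39 | −0.79
VF (dB/yr) | Suspect | −0.17 | −0.11 | −0.28 | −0.53
VF (dB/yr) | Mild | −0.27 | −0.16 | −0.43 | −0.90
VF (dB/yr) | Moderate | −0.31 | −0.22 | −0.67 | −1.24
VF (dB/yr) | Severe | −0.15 | −0.21 | −0.59 | −1.07

Percentage of eyes with a measured rate at or less than the criterion rate in the column header (μm/yr for OCT, dB/yr for VF)
Modality (units) | Severity | 0 | −0.5 | −1 | −1.5 | −2 | −2.5 | −3
OCT (μm/yr) | Average | 68.9% | 45.0% | 27.0% | 17.5% | 12.1% | 9.3% | 7.6%
OCT (μm/yr) | Suspect | 69.2% | 43.6% | 25.2% | 15.0% | 9.6% | 6.9% | 5.6%
OCT (μm/yr) | Mild | 69.6% | 44.7% | 25.8% | 15.9% | 10.5% | 7.8% | 6.2%
OCT (μm/yr) | Moderate | 71.2% | 46.4% | 28.5% | 19.4% | 13.0% | 10.8% | 8.9%
OCT (μm/yr) | Severe | 65.1% | 47.3% | 31.8% | 23.9% | 20.0% | 16.6% | 13.5%
VF (dB/yr) | Average | 71.6% | 18.8% | 7.0% | 3.3% | 1.7% | 0.8% | 0.5%
VF (dB/yr) | Suspect | 73.0% | 11.0% | 3.1% | 1.5% | 0.7% | 0.3% | 0.2%
VF (dB/yr) | Mild | 71.7% | 21.6% | 8.5% | 3.9% | 2.0% | 1.1% | 0.7%
VF (dB/yr) | Moderate | 67.7% | 32.1% | 14.7% | 7.1% | 3.8% | 2.0% | 1.1%
VF (dB/yr) | Severe | 69.1% | 29.5% | 11.1% | 4.8% | 2.3% | 0.8% | 0.3%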

OCT = optical coherence tomography; VF = visual field; RNFL = retinal nerve fiber layer; MD = mean deviation.

Table 2 (bottom half) shows the percentage of measured rates of worsening at or less than select criterion rates for both OCT and VF. Note that the 75th and 90th percentile rates for OCT decrease with increasing severity (top half of Table 2) while the percentage of eyes below these percentiles increases (bottom half of Table 2). In contrast, the 75th and 90th percentile rates for VF are lowest for moderate glaucoma instead of severe, while the percentage of rates below the 75th and 90th percentiles is highest for moderate, not severe. These observations will be important for intuiting results presented later on that stratify accuracy of OCT and VF by disease severity.

Figure 2A shows the percentage of simulated eyes (y-axes) in the evenly spaced condition (i.e., percent correct) where both the true and measured rates of average RNFL thickness worsening were at or less than different criterion rates of worsening (x-axes). Results are from our prior OCT study3 and plotted as a function of different numbers of measurements (color-coded, see legend). Figure 2B shows the analogous results for VFs, while Figures 2C and 2D show the results when using a clustered measurement strategy. In all conditions, OCT had higher accuracy than VF to detect the 75th and 90th percentile rates of worsening. The 75th percentile is represented by the vertical dashed lines (−1.09 μm/yr for average RNFL thickness and −0.39 dB/yr for MD) and the 90th percentile is represented by the vertical dotted lines (−2.35 μm/yr for average RNFL thickness and −0.79 dB/yr for MD). As expected, accuracy was higher when using a clustered measurement strategy compared to evenly spaced.

Figure 2.

Accuracy of OCT (A, C) and VF (B, D) to detect glaucoma worsening over a 2-year period for different numbers of simulated measurements (color coded, see legend) and different measurement strategies: evenly spaced (A, B) and clustered (C, D). Moderate worsening (i.e., 75th percentile) is represented by the vertical dashed line, while rapid worsening (i.e., 90th percentile) is represented by the vertical dotted line.

Figure 3 compares percent correct for OCT (blue curves) and VF (green curves) as a function of the total number of simulated measurements to detect moderate (A, B) and rapid (C, D) rates of worsening in evenly spaced (A, C) and clustered (B, D) conditions. Data points (black triangles and squares) represent conditions where percent correct was actually estimated, while the equations for the fitted functions (blue and green curves) are provided in Supplemental Table S3 (available at www.aaojournal.org). Accuracy is higher for OCT than VF in all conditions tested, with the difference being greater for rapid worsening than for moderate worsening. The precision of the estimates in Figure 3 is relatively high: the mean absolute error (MAE) between two sets of 100 simulations for OCT was approximately 0.15 percentage points while it was approximately 0.58 percentage points for VF.

Figure 3.

Accuracy to detect moderate (A, B) and rapid (C, D) glaucoma worsening using OCT (blue curve) and VF (green curve) for different numbers of simulated measurements and for evenly spaced (A, C) and clustered (B, D) measurement strategies. OCT is more accurate than VF in all cases.

Figure 4 plots percent correct for OCT and VF as a function of glaucoma severity (color-coded) and the total number of simulated measurements over a 2-year period. For OCT, accuracy increases with increasing severity, while for VF accuracy is highest for moderate instead of severe. These results can be intuited by looking at the bottom half of Table 2, which shows the percentage of eyes that are at or below different criterion rates of worsening. To a first approximation relative accuracy is predicted by which eyes have the highest percentage of measured rates well below the criterion. For OCT, the percentage of eyes well below the criterion becomes higher with increasing severity, while for VF the highest percentage is achieved with moderate severity. Accuracy increases with a larger percentage of eyes well below the criterion because any measured rate well below the criterion is almost certain to come from a true rate below the criterion, increasing accuracy.

Figure 4.

Accuracy of OCT and VF for detecting moderate and rapid glaucoma worsening as a function of glaucoma severity (color-coded; see legends), the total number of simulated measurements and the measurement strategy used (evenly spaced vs. clustered). For both OCT and VF, accuracy is higher for moderate and severe glaucoma than for mild and suspect.

One factor that decreases accuracy is a large mean absolute residual from linear regression. The larger the mean absolute residual, the more likely any measured rate at or just below the criterion comes from a true rate right above it. In general, greater severity implies larger mean absolute residuals for OCT and VF. For OCT mean absolute residuals were 2.47 μm for suspect, 2.73 μm for mild, 3.90 μm for moderate and 4.74 μm for severe. For VF mean absolute residuals were 0.76 dB for suspect, 0.97 dB for mild, 1.28 dB for moderate and 1.29 dB for severe. Figure 4 shows that the effect of having larger mean absolute residuals with increasing severity is smaller than the effect of having a larger percentage of eyes well below the criterion.
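For reference, the mean absolute residual for a single eye can be computed as sketched below: fit a line by ordinary least squares, then average the absolute differences between each measurement and the fitted line. The function name and the example series are hypothetical, and the per-severity values quoted above would additionally require pooling across eyes, which is not shown.

```python
def mean_absolute_residual(times, values):
    """Mean absolute residual from an OLS line fitted to (times, values)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    residuals = [v - (intercept + slope * t) for t, v in zip(times, values)]
    return sum(abs(r) for r in residuals) / n


# Hypothetical MD series (dB) for one eye at yearly visits.
visit_years = [0.0, 1.0, 2.0, 3.0, 4.0]
md_values = [-2.0, -2.9, -2.4, -3.6, -3.5]
mar = mean_absolute_residual(visit_years, md_values)
```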

Table 4 shows examples of how accuracy increases in the evenly spaced condition (the measurement strategy generally used in clinical practice) when splitting the total number of measurements over a 2-year period equally between OCT and VF, and asking whether at least one of OCT or VF is accurate. Accuracy increases by over 17 percentage points for moderate (75th percentile) worsening and over 20 percentage points for rapid (90th percentile) worsening for the selected cases in Table 4. Note that all cases in Table 4 assume the same percentile criterion rate of worsening is used for OCT and VF. The more general case where different criterion rates are used for OCT and VF, and the OCT/VF split is not 50/50, can be investigated by our web-based tool (https://wilmer-glaucoma-ml.github.io/projects/oct-vs-vf-accuracy-calculation/).

Table 4.

Accuracy of OCT, VF and both combined

Tests per year | Percentile worsening | OCT only | VF only | Half OCT + half VF
2 | 75th | 48.5% | 42.0% | 70.1%
2 | 90th | 42.3% | 28.9% | 60.0%
4 | 75th | 52.0% | 44.9% | 73.6%
4 | 90th | 47.9% | 33.6% | 65.4%
6 | 75th | 55.1% | 47.6% | 76.5%
6 | 90th | 52.7% | 38.1% | 70.7%
12 | 75th | 62.6% | 54.1% | 82.8%
12 | 90th | 64.2% | 48.7% | 81.6%

OCT = optical coherence tomography; VF = visual field.

Discussion

In this study, we compared the accuracy of diagnosing glaucoma worsening over a 2-year period using OCT, VF and both combined. OCT accuracy was 5-10 percentage points higher than VF accuracy at detecting moderate worsening and 10-15 percentage points higher at detecting rapid worsening, given a fixed number of measurements. When using both OCT and VF — asking whether the diagnosis of worsening was accurate for at least one of the two — accuracy increased by over 17 percentage points compared to OCT or VF alone, given the same number of total measurements and using the same percentile criterion rate of worsening for both OCT and VF.

This strategy of relying on at least one of OCT or VF to be accurate is clinically relevant because of how low accuracy is when relying on OCT or VF alone to detect worsening. OCT accuracy was 47% at detecting moderate worsening and 40% at detecting rapid worsening over a 2-year period given the average OCT examination frequencies in our sample (once every 390 ± 186 days). VF accuracy was even lower: 40% for moderate worsening and 27% for rapid worsening given VF measurement frequencies in our sample (once every 466 ± 232 days). However, these numbers rise to 68% for moderate worsening and 56% for rapid worsening when both OCT and VF are used.

In general, OCT and VF accuracy increased as glaucoma severity increased. However, the effects of severity were generally greater for VF than OCT, especially for moderate worsening. VF accuracy as a function of severity varied by up to ~35 percentage points for detecting moderate worsening while OCT accuracy varied only up to ~15 percentage points. For rapid worsening the effects of severity were more similar: a range of ~35 percentage points for VF compared to ~30 percentage points for OCT. The effect of severity was also different in another respect: for OCT there was a substantial difference between severe versus all the others, while for VF there were substantial differences between suspect, mild and moderate. Precisely why this is the case is not clear. However, as discussed earlier, these results are primarily due to the percentage of eyes whose measured rates of worsening lie well below the criterion rate for different levels of severity (see discussion of Figure 4 in Results).

The effect of measurement strategy (evenly spaced versus clustered) on accuracy was as expected: clustered outperformed evenly spaced, even with relatively few measurements (see Figure 4). This is because the variance of a predicted value from linear regression is higher for evenly spaced measurements than for clustered measurements12. In other words, it is more likely that the true rate of worsening lies farther away from the measured rate with evenly spaced measurements than with clustered measurements. This property of evenly spaced measurements decreases accuracy when the measured rate is at or just below the criterion rate. Hybrid measurement strategies that cluster at multiple time points (not just at the end points) are expected to exhibit accuracy levels between those of the evenly spaced and clustered conditions12-14. Therefore, multiple OCT and/or VF measurements should be obtained during the same visit whenever possible, even at the expense of longer time intervals between visits.

Our approach for estimating the number of OCT scans and/or VFs needed to detect moderate and rapid worsening differs from previous approaches. Prior studies used statistical power as the basis for their recommendations7,8. For example, one study recommended measuring 3 VFs per year to detect a −2 dB change in MD over 2 years (−1 dB/yr) at 80% power7. The problem with statistical power is that it tells us the probability of correctly detecting a given effect size (i.e., a given rate of worsening) for a certain number of measurements, but it does not tell us whether the diagnosis was accurate. For example, suppose we measure 3 VFs per year as recommended and the measured rate of MD change is −1 dB/yr. Does this mean the true rate of change is −1 dB/yr? No. Does this mean there is an 80% probability that the true rate of change is −1 dB/yr or worse? No. From a clinical standpoint, it is more relevant to estimate the number of OCT scans and/or VFs needed to accurately diagnose glaucoma worsening.

Our approach for estimating diagnostic accuracy is applicable in more general settings when three conditions are satisfied: 1) the rate of worsening is defined by intervals on a continuous axis (e.g., our definitions for moderate and rapid worsening correspond to intervals on a μm/yr and dB/yr axis); 2) the distribution of the rates of worsening is known for a large and representative sample of patients; 3) a model of measurement error enables simulated rates of worsening to be generated (e.g., the distribution of combined residuals post-linear regression enabled us to generate simulated rates of worsening in μm/yr and dB/yr). We note that condition 2 is essential. Having the ability to generate simulated data alone (condition 3) is insufficient to estimate accuracy if the probability of each measurement occurring in the sample of patients (condition 2) is unknown.
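The reason condition 2 matters can be illustrated with a toy simulation: the same measurement-error model gives different accuracies when the underlying distribution of true rates differs. Everything in this sketch is hypothetical, including the two normally distributed populations of true rates, the Gaussian noise applied directly to the rate, and the criterion of −0.5; it is not the residual-resampling model used in the study.

```python
import random


def estimate_accuracy(true_rates, noise_sd, criterion, rng):
    """Percent of flagged eyes (measured <= criterion) with true <= criterion."""
    flagged = 0
    correct = 0
    for true in true_rates:
        measured = true + rng.gauss(0.0, noise_sd)  # assumed noise model
        if measured <= criterion:
            flagged += 1
            if true <= criterion:
                correct += 1
    return 100.0 * correct / flagged if flagged else float("nan")


rng = random.Random(42)
# Two hypothetical populations of true rates with the same spread.
slow_population = [rng.gauss(-0.1, 0.3) for _ in range(50000)]
fast_population = [rng.gauss(-0.6, 0.3) for _ in range(50000)]

acc_slow = estimate_accuracy(slow_population, 0.5, -0.5, rng)
acc_fast = estimate_accuracy(fast_population, 0.5, -0.5, rng)
```

Under these assumptions accuracy is markedly higher in the population with faster true rates, even though the noise model is identical, so the error model alone does not determine accuracy.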

A major strength of this study is that we estimated accuracy from a large dataset of glaucoma and glaucoma-suspect patients followed in a clinical population. However, there are limitations to this study. Our accuracy estimates apply only to Zeiss Cirrus OCT and to the Humphrey Field Analyzer. Accuracy was also calculated only for a 2-year period. We also assumed that OCT and VF samples are statistically independent when estimating accuracy for OCT and VF combined. It is possible that correlations at the patient level could change the accuracy estimates — we could not investigate this due to the deidentified datasets being extracted at different times for the OCT and VF studies. Residuals for clustered measurements may also not be similar to residuals for evenly spaced measurements since clustering is conceptually closer to test-retest variability than trend-based analysis. Our definition of accuracy is also not the only viable one — accuracy could be defined as the probability the true rate of change is within some interval rather than at or less than the criterion rate. Finally, our accuracy estimates for the combined case (using both OCT and VF) only apply in cases where the eye is suitable for measurement with both OCT and VF. This requirement may not be satisfied if OCT RNFL thickness is below the measurement floor, or if a reliable VF cannot be measured (e.g., due to the patient’s inability to fixate properly).

Despite these limitations, our results show that more OCT scans and/or VFs are needed to accurately diagnose glaucoma worsening than is currently the norm in clinical practice. Accuracy can be greatly increased if both OCT and VFs are used with the clinician relying on at least one of OCT or VF to be accurate.

Supplemental

Table S3 shows that the curves in Figure 3 fit the data well when the total number of measurements M is transformed to a log axis. For the clustered condition the relation between probability correct and M is linear on a log axis, while for the evenly spaced condition the relation is quadratic. The equations in Table S3 apply to 2 ≤ M ≤ 25 and only for a 2-year period; they do not apply to arbitrarily large values of M because there is no upper asymptote at 100 percent correct.
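The two functional forms described above can be written as a small evaluator. This is only a sketch of the shape of the fitted curves: the actual coefficients are in Table S3 and are not reproduced here, so the coefficients passed in the example are placeholders, and the use of the natural logarithm is an assumption because the base of the log axis is not stated.

```python
import math


def fitted_percent_correct(m, coeffs):
    """Evaluate a polynomial in log(M) for a total number of measurements M.

    coeffs = (a, b)    gives a + b*log(M), the linear (clustered) form.
    coeffs = (a, b, c) gives a + b*log(M) + c*log(M)**2, the quadratic
    (evenly spaced) form.
    """
    if not 2 <= m <= 25:
        raise ValueError("fitted curves are stated to apply only for 2 <= M <= 25")
    x = math.log(m)  # natural log is an assumption
    return sum(c * x ** power for power, c in enumerate(coeffs))


# Placeholder coefficients for illustration only; not values from Table S3.
example_linear = fitted_percent_correct(10, (40.0, 10.0))
example_quadratic = fitted_percent_correct(10, (40.0, 5.0, 1.0))
```

The range check mirrors the stated domain of validity, since the fitted forms have no upper asymptote at 100 percent correct.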

Supplementary Material

The accuracy of detecting moderate and rapid rates of glaucoma worsening is compared for optical coherence tomography and visual fields using over 12,000 glaucomatous eyes. More frequent measurements are needed to improve accuracy.

Financial support:

5 K23 EY032204-02; Unrestricted grant from Research to Prevent Blindness

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflict of Interest: All authors have completed and submitted the ICMJE disclosures form. Authors with financial interests or relationships to disclose are listed prior to the references.

References

1. Medeiros FA, Zangwill LM, Alencar LM, et al. Detection of glaucoma progression with Stratus OCT retinal nerve fiber layer, optic nerve head, and macular thickness measurements. Invest Ophthalmol Vis Sci. 2009;50:5741–5748.
2. Miki A, Medeiros FA, Weinreb RN, et al. Rates of Retinal Nerve Fiber layer Thickness in Glaucoma Suspect Eyes. Ophthalmology. 2014;121:1350–1358.
3. Bradley C, Hou K, Herbert P, et al. Evidence based guidelines for the number of peripapillary OCT scans needed to detect glaucoma worsening. Ophthalmology. 2022.
4. Heijl A, Leske MC, Bengtsson B, Bengtsson B, Hussein M; Early Manifest Glaucoma Trial Group. Measuring visual field progression in the Early Manifest Glaucoma Trial. Acta Ophthalmol Scand. 2003;81(3):286–293.
5. Chauhan BC, Garway-Heath DF, Goni FJ, et al. Practical recommendations for measuring visual field change in glaucoma. Br J Ophthalmol. 2008;92:569–573.
6. Heijl A, Buchholz P, Norrgren G, Bengtsson B. Rates of visual field progression in clinical glaucoma care. Acta Ophthalmol. 2013;91(5):406–412.
7. Chauhan BC, Malik R, Shuba LM, Rafuse PE, Nicolela MT, Artes PH. Rates of glaucomatous visual field change in a large clinical population. Invest Ophthalmol Vis Sci. 2014;55(7):4135–4143.
8. Anderson AJ. Significant Glaucomatous Visual Field Progression in the First Two Years: What Does It Mean? Trans Vis Sci Tech. 2016;5(6):1.
9. Yohannan J, Wang J, Brown J, et al. Evidence-based Criteria for Assessment of Visual Field Reliability. Ophthalmology. 2017;124(11):1612–1620.
10. Mwanza JC, Kim HY, Budenz DL, et al. Residual and dynamic range of retinal nerve fiber layer thickness in glaucoma: comparison of three OCT platforms. Invest Ophthalmol Vis Sci. 2015;56:6344–6351.
11. Sung MS, Heo H, Park SW. Structure-function Relationship in Advanced Glaucoma After Reaching the RNFL floor. J Glaucoma. 2019;28:1006–1011.
12. Gaylor DW, Sweeney HC. Design for Optimal Prediction in Simple Linear Regression. J Am Stat Assoc. 1965;60:205–216.
13. Crabb DP, Garway-Heath DF. Intervals Between Visual Field Tests When Monitoring the Glaucomatous Patient: Wait-and-See Approach. Invest Ophthalmol Vis Sci. 2012;53:2770–2776.
14. Wu Z, Medeiros FA. Impact of Different Visual Field Testing Paradigms on Sample Size Requirements for Glaucoma Clinical Trials. Sci Rep. 2018;8:4889.
