Abstract
Purpose
To determine and compare the diagnostic performance of spectral-domain optical coherence tomography (SD-OCT), stereoscopic disc photographs, and automated perimetry as assessed by a group of glaucoma specialists in differentiating individuals with early glaucoma from suspects.
Methods
Forty-six eyes (46 patients) with suspicious optic nerves had previously undergone SD-OCT scans, 24-2 visual fields (VFs), and optic disc photographs. The average VF mean deviation was −1.97 ± 2.09 (SD) dB. Four glaucoma specialists examined the 138 individual diagnostic tests and classified each eye as likely glaucomatous or nonglaucomatous based on the result of a single test. The diagnostic performance of each of the three tests was compared with a previously determined reference standard, based on the consensus of a separate panel of four glaucoma specialists who examined all three tests together.
Results
Among the four specialists, interobserver agreement was poor for VF and photos (κ = 0.13 and 0.16, respectively) and fair for OCT (κ = 0.40). Using panel consensus as the reference standard, OCT had the highest discriminative ability, with an area under the curve (AUC) of 0.99 (95% CI, 0.96–1.00), compared with 0.85 (95% CI, 0.73–0.96) for photographs and 0.86 (95% CI, 0.76–0.96) for VF, suggesting that OCT-based judgments most closely matched the consensus of a group of glaucoma specialists.
Conclusions
Compared with VF and disc photography, SD-OCT, when used alone, had better interobserver agreement as well as better agreement with the consensus of clinicians using all available data. Future studies should evaluate best practices for SD-OCT interpretation.
Keywords: glaucoma, optical coherence tomography, detection, discrimination
Glaucoma is a progressive, chronic optic neuropathy that is often associated with visual field (VF) loss and can occur even in the absence of elevated intraocular pressure. The current clinical standard for diagnosing glaucoma is a baseline dilated fundus examination by an ophthalmologist to assess the optic nerve head. The examination may be performed in conjunction with VF testing and optic nerve imaging, typically acquired via either spectral-domain optical coherence tomography (SD-OCT) or stereoscopic disc photographs.1 This evaluation is particularly important in early glaucoma, since early detection and treatment may help prevent glaucoma-related disability, and minimizing false positives reduces unnecessary testing and care. However, detecting glaucoma in its early stages can be challenging, since there is significant overlap between normal variability and early disease. In a recent report by Hood and colleagues,2 three fellowship-trained glaucoma specialists presented with stereo disc photographs, Swedish Interactive Threshold Algorithm (SITA) Standard VFs, and SD-OCT retinal nerve fiber layer (RNFL) thickness measures were unable to reach a consensus in almost 40% of cases. These results suggest that diagnostic agreement is limited and often depends on the clinician's interpretation,3 and that interpretation may be particularly difficult in early glaucoma. In addition, structural or functional damage in early glaucoma is often subtle and may not reach statistical significance relative to normative databases; it is therefore not flagged as “abnormal” by the ancillary diagnostic tests. Moreover, interindividual variations in disc size and shape, coexisting eye disease, and variation in test-taking ability, among other covariates, may blur the distinction between normal variants and early disease.
In this study, we aimed to investigate the interobserver agreement of fellowship-trained glaucoma specialists in detecting early glaucoma using either SD-OCT, stereo disc photographs, or automated perimetry, and to compare the diagnostic performance of each test.
Methods
This study is part of an ongoing prospective cohort study and was approved by the institutional review board of Columbia University Medical Center. Informed consent was obtained from all participants prior to enrollment. This study followed the tenets of the Declaration of Helsinki and was performed in compliance with the Health Insurance Portability and Accountability Act (HIPAA).
Forty-six eyes of 46 patients with open-angle glaucoma or suspected glaucoma were included. All eyes had been considered abnormal or suspicious by the referring glaucoma specialist, based on a stereoscopic photographic appearance of the optic disc suggestive of glaucomatous optic neuropathy.2 All eyes had open anterior chamber angles, a spherical equivalent refractive error of less than 6 diopters, and optic disc stereo photographs, 24-2 VF tests (mean deviation [MD] better than −6 dB), and SD-OCT scans obtained within 6 months. Eyes were excluded if the cataract grade on slit-lamp examination was equal to or worse than NO2, NC2, C2, and P2 on the Lens Opacities Classification System III (LOCS III), or if the eye had other conditions likely to affect the VF results (e.g., corneal opacity or neuro-ophthalmologic or retinal disease).
Diagnostic Testing
The 24-2 VFs were obtained using SITA Standard Automated Perimetry (Humphrey Field Analyzer; Carl Zeiss Meditec, Inc., Dublin, CA, USA). The VF examination closest in date to the OCT test was used. All VFs were required to have ≤33% false-negative responses and fixation losses and ≤15% false positives. Simultaneous stereo photographs of the optic disc were obtained with a Nidek 3-Dx mydriatic fundus camera (Nidek, Inc., Gamagori, Japan). Images were selected based on subjective assessment of image clarity, stereo effect, minimal artifacts, and overall image quality. Stereo photographs were viewed on a computer screen with the aid of a stereoviewer. A peripapillary circle scan (1.7-mm radius, 1024 A-scans with at least 16 overlapping averages) was obtained with a Topcon SD-OCT (3D-OCT 2000; Topcon Corp., Paramus, NJ, USA). The RNFL was segmented by the device's algorithm without any operator correction. All scans had proper alignment and focus and acceptable quality scores. Scans with poor fixation or blink artifacts were rejected.
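The eligibility and reliability rules above are simple threshold checks. The Python sketch below encodes them for a single eye and is illustrative only. The `EyeRecord` fields and function names are hypothetical, and three readings are assumptions: the refraction limit applies to the absolute spherical equivalent, the MD criterion means better than −6 dB (consistent with the cohort's mean MD of −1.97 dB), and "within 6 months" means all three tests fall inside one window of about 183 days. The LOCS III and comorbidity exclusions are clinical judgments and are not modeled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EyeRecord:
    """Hypothetical per-eye record; field names are illustrative only."""
    vf_false_neg: float       # false-negative rate as a fraction (0-1)
    vf_fixation_loss: float   # fixation-loss rate as a fraction (0-1)
    vf_false_pos: float       # false-positive rate as a fraction (0-1)
    vf_md_db: float           # 24-2 mean deviation, dB
    spherical_equiv_d: float  # spherical equivalent refraction, diopters
    vf_date: date
    oct_date: date
    photo_date: date


def vf_reliable(r: EyeRecord) -> bool:
    # Reliability limits stated in the Methods: <=33% FN and FL, <=15% FP.
    return (r.vf_false_neg <= 0.33
            and r.vf_fixation_loss <= 0.33
            and r.vf_false_pos <= 0.15)


def eligible(r: EyeRecord, window_days: int = 183) -> bool:
    # Assumptions: limit on |SE|, MD better than -6 dB, and all three
    # tests inside one ~183-day window.
    dates = (r.vf_date, r.oct_date, r.photo_date)
    within_window = (max(dates) - min(dates)).days <= window_days
    return (vf_reliable(r)
            and r.vf_md_db > -6.0
            and abs(r.spherical_equiv_d) < 6.0
            and within_window)


ok = EyeRecord(0.05, 0.10, 0.03, -1.97, -2.5,
               date(2014, 1, 10), date(2014, 3, 1), date(2014, 2, 1))
print(eligible(ok))
```

In this example the three test dates span 50 days and all thresholds are met, so the eye passes.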
Definition of reference standard: There is currently little consensus on how to differentiate early glaucoma from suspects or preperimetric disease. For the purpose of this study, a reference standard was needed in order to investigate and compare the diagnostic performance of each technique individually. For that purpose, a previous consensus review by four glaucoma specialists was used to define glaucoma versus suspect. Details of the methodology used to define the reference standard are described elsewhere.2 In brief, three ophthalmologists (glaucoma specialists), masked to all nonstudy patient data, evaluated stereo photographs, 24-2 VFs, and commercial OCT reports at the same time. The fourth glaucoma specialist was the referring physician, who was not masked to clinical data. Of the 50 eyes included in the previous study,2 31 eyes were deemed abnormal by consensus, defined as at least three of the four glaucoma specialists judging the eye abnormal. The 15 eyes deemed normal by consensus are hereafter called suspects; the experts did not reach consensus on four eyes, and the data from these eyes were therefore excluded.
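The panel rule above can be written as a small function. This is a sketch, not the original study's code: with four raters and binary votes, an eye is either called abnormal by at least three raters, called normal by at least three, or split 2–2, and we assume the 2–2 split is what "no consensus" denotes. Eye identifiers and votes are hypothetical.

```python
from typing import Dict, List, Optional


def consensus_label(votes: List[int], min_agree: int = 3) -> Optional[str]:
    """Map binary panel votes (1 = abnormal, 0 = normal) to a reference label.

    'glaucoma' if at least `min_agree` raters judged the eye abnormal,
    'suspect' if at least `min_agree` judged it normal, otherwise None
    (no consensus; such eyes are excluded).
    """
    abnormal = sum(votes)
    normal = len(votes) - abnormal
    if abnormal >= min_agree:
        return "glaucoma"
    if normal >= min_agree:
        return "suspect"
    return None


# Hypothetical votes from a four-member panel.
panel: Dict[str, List[int]] = {
    "eye01": [1, 1, 1, 0],
    "eye02": [0, 0, 1, 0],
    "eye03": [1, 0, 1, 0],  # 2-2 split -> no consensus
}
labels = {eye: consensus_label(v) for eye, v in panel.items()}
analyzed = {eye: lab for eye, lab in labels.items() if lab is not None}
print(labels)
print(sorted(analyzed))
```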
For the purposes of this study, 46 sets of VFs, disc photos, and SD-OCTs from 46 eyes, totaling 138 individual diagnostic tests, were presented to four glaucoma specialists (different from the specialists who defined the reference standard). Each test was placed on its own slide in a PDF file, so that each slide contained a single test result: a VF printout, a stereo disc photo, or an OCT report. Paired stereo photographs and a standard single VF analysis report were presented for each patient. To maintain masking and avoid redundancy between the stereo photographs and the optic disc measurements, only the RNFL thickness measurements were displayed. All test results were presented in the commercially available default format except for the peripapillary RNFL, which was presented as a nasal-superior-temporal-inferior-nasal (NSTIN) rather than TSNIT plot. (An NSTIN plot is one in which the scan starts and finishes in the nasal rather than temporal portion.) The 138 diagnostic tests were presented in a random order so that the three tests corresponding to a single patient were not consecutive. Figure 1 shows a representative sample of the testing. Patients had a mean age of 57.9 ± 15.5 years. The mean MD ± SD on the 24-2 VF test was −1.97 ± 2.09 dB, and the best-corrected visual acuity ranged from 20/20 to 20/30 (Table).
Table.
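The text states that the 138 slides were shown in random order so that a patient's three tests were not consecutive, but it does not say how that order was generated. One straightforward way, shown here as an assumption, is rejection sampling: shuffle all slides and reshuffle until no two adjacent slides belong to the same patient (a slightly stricter condition than the text requires). Patient IDs and the seed are hypothetical. Because a random shuffle of 138 slides contains only about two same-patient adjacencies on average, valid orders turn up after a modest number of attempts.

```python
import random
from typing import List, Sequence, Tuple

Slide = Tuple[str, str]  # (patient_id, test_type)


def no_adjacent_same_patient(order: Sequence[Slide]) -> bool:
    # True if no two consecutive slides come from the same patient.
    return all(a[0] != b[0] for a, b in zip(order, order[1:]))


def randomize_slides(patient_ids: Sequence[str],
                     tests: Sequence[str] = ("VF", "photo", "OCT"),
                     seed: int = 0, max_tries: int = 10_000) -> List[Slide]:
    """Rejection sampling: reshuffle until no same-patient slides are adjacent."""
    rng = random.Random(seed)
    slides: List[Slide] = [(p, t) for p in patient_ids for t in tests]
    for _ in range(max_tries):
        rng.shuffle(slides)
        if no_adjacent_same_patient(slides):
            return list(slides)
    raise RuntimeError("no valid order found; increase max_tries")


# 46 hypothetical patient IDs -> 138 slides.
order = randomize_slides([f"P{i:02d}" for i in range(1, 47)])
print(len(order), order[:3])
```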
The four fellowship-trained glaucoma specialists were asked to classify each single diagnostic test into a binary outcome: “likely glaucoma” (1) or “likely not glaucoma” (0). How they analyzed the VF and OCT results was left to their discretion, as it would be in clinical practice. Therefore, the interpretation of the output statistics relative to normative databases did not have to follow any predefined set of criteria. The glaucoma specialists were not told which three tests corresponded to a single patient. The specialists were not given any time limitations and did not communicate with each other while evaluating the slides.
Statistical Analysis
Interobserver agreement was assessed using the Fleiss' kappa (κ) statistic. Agreement was classified as poor when κ was 0.20 or less, fair when between 0.21 and 0.40, moderate when between 0.41 and 0.60, good when between 0.61 and 0.80, and very good when higher than 0.80.4 A bootstrap resampling procedure (n = 1000 resamples) was used to derive the 95% confidence intervals. The classifications based on each of the three diagnostic tests were compared with the reference standard, and diagnostic performance was calculated in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy, defined as (true positives [TP] + true negatives [TN])/total, all with 95% confidence intervals (CIs).
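To make the agreement statistics concrete, the following Python sketch computes Fleiss' kappa from an eyes × raters table of binary calls, maps it onto the cut-offs listed above, and derives a percentile bootstrap CI by resampling eyes. It is a minimal illustration under stated assumptions (percentile intervals, resampling at the eye level); the text does not specify which bootstrap variant or software was used, and the example ratings are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fleiss_kappa(ratings: Sequence[Sequence[int]], categories=(0, 1)) -> float:
    """Fleiss' kappa for a subjects x raters table of category labels."""
    n_subjects, n_raters = len(ratings), len(ratings[0])
    # counts[i][j]: number of raters who put subject i in category j
    counts = [[list(row).count(c) for c in categories] for row in ratings]
    # Per-subject agreement P_i, then its mean P_bar
    p_i = [(sum(x * x for x in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement P_e from the overall category proportions
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:  # every rating in one category: kappa undefined
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def agreement_label(kappa: float) -> str:
    # Cut-offs as stated in the Statistical Analysis section.
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


def bootstrap_kappa_ci(ratings: Sequence[Sequence[int]], n_boot: int = 1000,
                       alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling eyes (rows) with replacement."""
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        sample = [ratings[rng.randrange(len(ratings))] for _ in range(len(ratings))]
        k = fleiss_kappa(sample)
        if k == k:  # drop NaN resamples
            stats.append(k)
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lo, hi


example = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
k = fleiss_kappa(example)
print(round(k, 3), agreement_label(k))  # 0.407 moderate
```

The four-eye example gives κ = 11/27 ≈ 0.407, which this scale labels moderate; a κ of exactly 0.40 falls in the fair band.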
To evaluate the ability of the tests to differentiate between cases and controls, areas under the receiver operating characteristic (ROC) curves (area under the curve [AUC]) were calculated. An AUC equal to 1 represents perfect discrimination, whereas an AUC of 0.5 represents chance discrimination.
The discriminatory accuracy of a diagnostic test is measured by its ability to correctly classify normal and abnormal individuals relative to the reference standard. The result of a diagnostic test may be binary, ordinal, or continuous. The global performance of a diagnostic test is commonly summarized by the area under the ROC curve (AUC). This area can be interpreted as the probability that the test result of a randomly selected abnormal subject will be greater than that of a randomly selected normal subject, with ties counted as one half. The greater the AUC, the better the global performance of the diagnostic test. We performed a nonparametric ROC analysis using the reference standard described above. This method is robust because it makes no distributional assumptions about the diagnostic test measurements. Predictors (or classifiers) were the binary classification scores (0 = suspect, 1 = glaucoma) assigned by each rater using each test modality.
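The probability interpretation of the AUC translates directly into a pairwise count (the Mann–Whitney form of the nonparametric AUC). The sketch below is a plain-Python illustration with hypothetical single-rater data, not the study's data or software. For a single binary rater, this AUC reduces to (sensitivity + specificity)/2, which the example confirms numerically.

```python
from typing import Sequence


def auc_nonparametric(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Empirical AUC: P(case score > control score) + 0.5 * P(tie)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical single-rater binary calls (1 = glaucoma) for 31 reference-
# standard glaucoma eyes and 15 suspects; not the study's data.
cases = [1] * 25 + [0] * 6
controls = [0] * 13 + [1] * 2
auc = auc_nonparametric(cases, controls)
sens, spec = 25 / 31, 13 / 15
print(round(auc, 4), round((sens + spec) / 2, 4))
```

With only two possible scores, most case-control pairs are ties, which is why binary calls from one rater yield a coarse ROC curve; summing across raters, as described next, gives a finer ordinal predictor.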
For the analysis of the performance of each test modality, the scores from the four raters were summed to a composite, ordinal measure. Therefore, for each test modality (e.g., VF), the ROC predictor ranged from 0 (none of the raters deemed the eye glaucomatous) to 4 (all raters found the eye glaucomatous). The AUCs of these composite scores for VF, photo, and OCT were then calculated and compared using the method described by DeLong et al.5
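The composite scoring and the correlated-AUC comparison can be sketched as follows. `composite` sums the raters' binary calls per eye into the 0 to 4 ordinal predictor; `delong_paired` implements the pairwise DeLong z-test for two AUCs measured on the same eyes. This is our own illustrative implementation, not the authors' software, and it covers only a two-modality comparison; DeLong et al. also describe a chi-square test for comparing more than two AUCs jointly, and the text does not state which form produced the reported P value. All scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Sequence, Tuple


def composite(calls_by_rater: Sequence[Sequence[int]]) -> List[int]:
    """Sum binary calls across raters per eye: 0 (no rater) .. R (all raters)."""
    return [sum(eye_calls) for eye_calls in zip(*calls_by_rater)]


def _psi(x: float, y: float) -> float:
    # Mann-Whitney kernel: 1 if the case scores higher, 0.5 for a tie, else 0.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def delong_paired(a: Sequence[float], b: Sequence[float],
                  truth: Sequence[int]) -> Tuple[float, float, float]:
    """Two-sided DeLong z-test for two correlated AUCs scored on the same eyes.

    Returns (auc_a, auc_b, p_value). truth: 1 = glaucoma, 0 = suspect.
    """
    pos = [i for i, t in enumerate(truth) if t == 1]
    neg = [i for i, t in enumerate(truth) if t == 0]
    m, n = len(pos), len(neg)
    v10, v01, aucs = [], [], []
    for s in (a, b):
        # Structural components: per-case and per-control placement values.
        v10.append([sum(_psi(s[i], s[j]) for j in neg) / n for i in pos])
        v01.append([sum(_psi(s[i], s[j]) for i in pos) / m for j in neg])
        aucs.append(sum(v10[-1]) / m)

    def cov(u: List[float], w: List[float], mu_u: float, mu_w: float) -> float:
        return sum((x - mu_u) * (y - mu_w) for x, y in zip(u, w)) / (len(u) - 1)

    s10 = [[cov(v10[r], v10[q], aucs[r], aucs[q]) for q in range(2)] for r in range(2)]
    s01 = [[cov(v01[r], v01[q], aucs[r], aucs[q]) for q in range(2)] for r in range(2)]
    var_diff = ((s10[0][0] + s10[1][1] - 2 * s10[0][1]) / m
                + (s01[0][0] + s01[1][1] - 2 * s01[0][1]) / n)
    if var_diff <= 0:
        raise ValueError("zero variance of AUC difference; test undefined")
    z = (aucs[0] - aucs[1]) / sqrt(var_diff)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return aucs[0], aucs[1], p


# Hypothetical composite scores (0-4) for 10 eyes; not the study's data.
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
oct_scores = [4, 4, 3, 4, 2, 0, 1, 0, 0, 1]
vf_scores = [3, 1, 4, 2, 0, 2, 0, 1, 3, 0]
print(delong_paired(oct_scores, vf_scores, truth))
```

Because both AUCs are computed on the same eyes, the covariance terms matter: ignoring them would treat the two AUCs as independent and misstate the variance of their difference.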
Results
The interobserver agreement among the four specialists yielded κ values of 0.13 (95% CI, 0.01–0.31) for VF, 0.16 (95% CI, 0.04–0.32) for photos, and 0.40 (95% CI, 0.29–0.61) for OCT. Among the specialists, there was a large difference in the percentage of eyes rated “likely not glaucoma”: 39% to 78% based on VFs and 33% to 61% based on stereo photos. The variation was less for SD-OCT, ranging from 57% to 61%.
According to the reference standard, 31 of the 46 eyes were glaucomatous and 15 were nonglaucomatous (suspects). Thus, the analysis included a total of 138 individual tests (46 eyes × 3 tests); based on the reference standard, 93 tests (31 eyes) were considered glaucomatous and 45 tests (15 eyes) nonglaucomatous (suspect). When compared with the reference standard, OCT yielded the highest average sensitivity across the glaucoma specialists, 0.81 (95% CI, 0.68–0.95), while the sensitivities for both VF and photos were approximately 0.64 (95% CI, 0.47–0.81). The average specificity for OCT was also highest at 0.87 (95% CI, 0.69–1.00), while the specificities for both VF and photos were approximately 0.73 (95% CI, 0.51–0.96). The mean PPV and NPV were both higher for OCT than for VF or photos, although the differences were not significant. The average PPV was 0.93 (95% CI, 0.83–1.00) for OCT, 0.87 (95% CI, 0.73–1.00) for VF, and 0.84 (95% CI, 0.69–0.99) for photos. The average NPV was 0.69 (95% CI, 0.48–0.90) for OCT, 0.45 (95% CI, 0.25–0.66) for VF, and 0.51 (95% CI, 0.30–0.71) for photos. In terms of diagnostic accuracy, defined as correct diagnoses over total diagnoses, OCT had the highest average value, 0.83 (95% CI, 0.72–0.94), compared with 0.67 (95% CI, 0.53–0.80) for both VF and photos.
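Each rater's metrics reported above derive from a 2 × 2 table against the reference standard. The sketch below computes them from hypothetical counts for one rater (31 glaucoma eyes and 15 suspects, as in this sample); the counts are illustrative and are not the study's data, and the reported values are averages across four raters, which a single table does not reproduce. PPV and NPV also depend on the proportion of glaucomatous eyes in the sample (31/46, about 67%), so they would shift in a population with a different prevalence.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2 x 2 counts for one rater and one test modality vs. the reference."""
    tp: int
    fp: int
    tn: int
    fn: int

    def metrics(self) -> dict:
        total = self.tp + self.fp + self.tn + self.fn
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            "accuracy": (self.tp + self.tn) / total,  # (TP + TN) / total
        }


# Hypothetical single-rater counts (illustrative, not study data):
# 31 reference-standard glaucoma eyes, 15 suspects.
example = Confusion(tp=25, fp=2, tn=13, fn=6)
for name, value in example.metrics().items():
    print(f"{name}: {value:.2f}")
```

With these counts, sensitivity is 25/31 ≈ 0.81 and accuracy is (25 + 13)/46 ≈ 0.83.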
The ROC curves for OCT, VF, and photos for the highest-performing physician, defined as the specialist with the greatest total AUC, are shown in Figure 2A. The summed total AUC was 0.97 (95% CI, 0.93–1.00), whereas the OCT AUC was 0.95 (95% CI, 0.90–1.00), the photo AUC was 0.85 (95% CI, 0.74–0.96), and the VF AUC was 0.76 (95% CI, 0.65–0.87). Figure 2B shows the group's mean ROC curves for OCT, VF, and photos. The OCT AUC was 0.99 (95% CI, 0.96–1.00), the photo AUC was 0.85 (95% CI, 0.73–0.96), and the VF AUC was 0.86 (95% CI, 0.76–0.96). Comparison of the composite AUCs showed better performance of OCT than of disc photos and VFs (P = 0.0071).
Discussion
Despite its medical relevance, the diagnosis of early glaucoma in clinical practice can be challenging, as there is significant overlap between normal variants and early disease on standard diagnostic tests. Typically, multiple diagnostic tests are used to detect glaucoma. Yet there are limited data comparing the relative diagnostic ability of the three main ancillary diagnostic modalities (i.e., photos, VFs, and OCT) in differentiating early glaucoma from glaucoma suspects, a dilemma that clinicians frequently face.6 One of the main reasons is the lack of a gold standard to define early disease. Another is that different combinations of these same tests are often used to define the reference standard, which makes it inappropriate to evaluate any one of them individually against a standard it helped define. Furthermore, while VF testing and SD-OCT have undergone considerable technological advances, relatively few studies have considered the subjective interpretation of these tests in classifying patients as glaucomatous or disease free. Most studies have used the objective, numeric output of these devices to calculate their diagnostic performance. In clinical practice, however, clinicians often rely mostly on the devices' classification relative to normative databases. Regardless of the reasons, since diagnostic tests may give contradictory or inconclusive information in the early stage, direct head-to-head comparisons of the three tests are important to assist clinical decision making.
In this study, we investigated the diagnostic performance of a group of glaucoma specialists in differentiating subjects with early glaucoma from glaucoma suspects using either an SD-OCT report of peripapillary RNFL thickness, stereo disc photographs, or SITA Standard VFs, leaving to their discretion how to weigh each technique's output. We found that both interobserver agreement and diagnostic ability among glaucoma specialists were highest with SD-OCT, suggesting that SD-OCT may be the single most useful tool for eye care providers in the early diagnosis of suspicious optic nerves when other modalities are not available. Such early detection is critical for preserving visual function and preventing blindness, as glaucoma detected at later stages is more difficult to manage, is associated with greater costs, and results in decreased quality of life.7
Interobserver Agreement
Despite the specialists' expertise in glaucoma, interobserver variation for each test was substantial. Spectral-domain OCT had the highest level of agreement, with a κ value indicating fair agreement. In contrast, the κ values for VF and stereo photos indicated poor agreement. This indicates that even among “experts” using an objective technology such as OCT, eye care providers each have their own subjective approach to glaucoma detection. This finding is consistent with the glaucoma literature, which has reported considerable interobserver variability in the assessment of disc photos8–11 and VFs.12 It is also consistent with our earlier study,2 which found marked disagreement among glaucoma specialists even when they assessed the SD-OCT, VF, and photo test results used in this study simultaneously. It is noteworthy that even though SD-OCT and other imaging techniques were developed to overcome subjectivity among clinicians, there is still meaningful disagreement among clinicians on how to interpret their output.
Diagnostic Accuracy
Our second purpose was to evaluate the diagnostic accuracy of the three commonly used tests. To our knowledge, this is the first study to assess the diagnostic performance of 24-2 VFs, disc photos, and SD-OCT, each evaluated independently, in a single population of patients with early glaucoma. Two recent reports evaluated the relative diagnostic performance of the three tests.13,14 However, in both studies, the tests were presented sequentially, with the raters evaluating the stereo photographs and VFs before the OCT results. Even with this ordering, both studies found that the OCT information was helpful in detecting early-stage disease.
Although stereo photos and VFs can reflect structural and functional changes related to glaucoma, the diagnostic utility of either as a single test for early glaucoma detection in our study was only modest. This indicates that the diagnostic ability of VF or stereo photographs as a stand-alone test is limited. However, our study demonstrates that SD-OCT can greatly assist clinicians in differentiating early glaucoma from glaucoma suspects in the same population. We believe there are several reasons why SD-OCT outperformed the other tests. First, SD-OCT is more objective and less dependent on patient reliability than VF testing. Even though we included only “reliable” VF tests, the VF is inherently limited by its subjectivity. Second, the 24-2 VF may have missed early defects located between its test points, which are spaced 6° apart, as reported by Hood and colleagues.15 Third, SD-OCT changes may have developed before VF changes, as recently reported by Zhang et al.16 Furthermore, the subjectivity of optic nerve head interpretation,8–11 particularly with atypical optic nerve characteristics, makes stereo photographs a limited stand-alone test. We have found that SD-OCT is particularly helpful when both RNFL and macular scans are included. In particular, we developed and evaluated a one-page report that included information from both macular and disc OCT scans, as well as 24-2 VFs.2,17 In the first phase of the evaluation, two report specialists viewed a version of the one-page report that contained the RNFL and retinal ganglion cell (RGC) data but not the VF data. With macular and RNFL OCT scans, the specialists agreed on 49 of the 50 eyes. In fact, these two individuals performed nearly as well without the 24-2 VF information on the report.2
Limitations
Our study has a number of limitations that should be taken into account. First, the study design may be subject to selection bias, since patients were identified for study inclusion based on suspicious optic nerve head appearance. In this case, it may not be surprising that stereo photographs detailing the optic nerve head appearance would not provide sufficient additional diagnostic information to detect glaucoma. Likewise, Lisboa et al.18 reported that SD-OCT RNFL parameters had greater diagnostic ability than optic disc topographic measurements in detecting glaucoma when patients were selected based on suspicious optic disc appearance. The optic nerves included in this study frequently had atypical characteristics, such as tilted discs, poorly defined cup or disc margins, small optic nerves, myopic crescents, and generalized cupping rather than focal notching. One patient additionally had peripapillary choroidal neovascularization. Despite this potential for bias, we chose this study design because we felt it most closely reflected the most challenging patients in clinical practice. Second, since there is no single gold standard in glaucoma detection, we classified patients based on consensus among experts. In contrast, Medeiros et al.19 have previously suggested using long-term follow-up as a means to distinguish between normal and early glaucomatous eyes, although it is unclear whether one method is superior to the other. Future studies may consider comparing the ability of the three tests to identify progressive glaucomatous damage. Third, the glaucoma specialists did not have access to clinical information such as family history, intraocular pressure, or corneal thickness, which might have influenced their judgments in clinical practice. Lastly, it is important to emphasize that our study is intended to reflect the classification of early glaucoma among glaucoma suspects and does not represent screening results or testing in a more general population.
Summary
The results of this study suggest that SD-OCT RNFL measurements may provide more useful clinical information than VF or stereo photographs when used alone in differentiating early glaucoma from suspects. Moreover, the higher interobserver agreement for SD-OCT is an additional advantage over the other techniques. Future studies should evaluate best practices for SD-OCT interpretation, particularly in detecting early glaucomatous damage and damage in patients with atypical optic nerve characteristics.
Acknowledgments
Disclosure: D.M. Blumberg, None; C.G. De Moraes, None; J.M. Liebmann, Carl Zeiss Meditec, Inc. (C, F), Topcon (F); R. Garg, None; C. Chen, None; A. Theventhiran, None; D.C. Hood, Heidelberg (C, F), Topcon (C)
References
- 1. American Academy of Ophthalmology Glaucoma Panel. Preferred Practice Pattern® Guidelines. Primary Open-Angle Glaucoma Suspect. 2010. Available at: www.aao.org/ppp. Accessed November 15, 2015.
- 2. Hood DC, Raza AS, De Moraes CG, et al. Evaluation of a one-page report to aid in detecting glaucomatous damage. Transl Vis Sci Technol. 2014;3(6):8.
- 3. Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004;140:189–202.
- 4. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
- 5. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
- 6. Vessani RM, Moritz R, Batis L, Zagui RB, Bernardoni S, Susanna R. Comparison of quantitative imaging devices and subjective optic nerve head assessment by general ophthalmologists to differentiate normal from glaucomatous eyes. J Glaucoma. 2009;18:253–261.
- 7. De Moraes CG, Liebmann JM, Medeiros FA, Weinreb RN. Management of advanced glaucoma: characterization and monitoring [published online ahead of print March 24, 2016]. Surv Ophthalmol. doi:10.1016/j.survophthal.2016.03.006.
- 8. Coleman AL, Sommer A, Enger C, Knopf HL, Stamper RL, Minckler DS. Interobserver and intraobserver variability in the detection of glaucomatous progression of the optic disc. J Glaucoma. 1996;5:384–389.
- 9. Parrish RK, Schiffman JC, Feuer WJ, et al. Test-retest reproducibility of optic disk deterioration detected from stereophotographs by masked graders. Am J Ophthalmol. 2005;140:762–764.
- 10. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma. Ophthalmology. 1992;99:215–221.
- 11. Abrams LS, Scott IU, Spaeth GL, Quigley HA, Varma R. Agreement among optometrists, ophthalmologists, and residents in evaluating the optic disc for glaucoma. Ophthalmology. 1994;101:1662–1667.
- 12. Viswanathan AC, Crabb DP, Westcott MC, McNaught AI, Fitzke FW, Hitchings RA. Inter-observer agreement on visual field progression in glaucoma: a comparison of methods. Br J Ophthalmol. 2003;87:726–730.
- 13. Kim KE, Kim SH, Oh S, et al. Additive diagnostic role of imaging in glaucoma: optical coherence tomography and retinal nerve fiber layer photography. Invest Ophthalmol Vis Sci. 2014;55:8024–8030.
- 14. Bae HW, Lee KH, Lee N, Hong S, Seong GJ, Kim CY. Visual fields and OCT role in diagnosis of glaucoma. Optom Vis Sci. 2014;91:1312–1319.
- 15. Hood DC, Raza AS, de Moraes CG, Liebmann JM, Ritch R. Glaucomatous damage of the macula. Prog Retin Eye Res. 2013;32:1–21.
- 16. Zhang X, Loewen N, Tan O, et al.; Advanced Imaging for Glaucoma Study Group. Predicting development of glaucomatous visual field conversion using baseline Fourier-domain optical coherence tomography. Am J Ophthalmol. 2016;163:29–37.
- 17. Hood DC, Raza AS. On improving the use of OCT imaging for detecting glaucomatous damage. Br J Ophthalmol. 2014;98(suppl 2):ii1–ii9.
- 18. Lisboa R, Paranhos A, Weinreb RN, Zangwill LM, Leite MT, Medeiros FA. Comparison of different spectral domain OCT scanning protocols for diagnosing preperimetric glaucoma. Invest Ophthalmol Vis Sci. 2013;54:3417–3425.
- 19. Medeiros FA, Alencar LM, Zangwill LM, Bowd C, Sample PA, Weinreb RN. Prediction of functional loss in glaucoma from progressive optic disc damage. Arch Ophthalmol. 2009;127:1250–1256.