Author manuscript; available in PMC 2023 Jul 1.
Published in final edited form as: Clin Exp Optom. 2021 Jul 27;105(5):494–499. doi: 10.1080/08164622.2021.1945406

A comparison of subjective and objective conjunctival hyperemia grading with AOS® Anterior software

Maria K Walker, Erin S Tomiyama, Kelsea V Skidmore, Justina R Assaad, Anita Ticak, Kathryn Richdale
PMCID: PMC8792102  NIHMSID: NIHMS1722313  PMID: 34315357

Abstract

Clinical relevance:

This study evaluates a commercially available conjunctival hyperemia grading system, providing validation of an important tool for ocular surface research and clinical trials.

Background:

Bulbar conjunctival hyperemia is a sign of ocular surface inflammation, and proper measurement is essential to clinical care and trials. The aim of this study was to assess the validity and repeatability of an objective grading system in comparison with subjective grading.

Methods:

This study was a retrospective, randomised analysis of 300 bulbar conjunctival images collected at an academic institution. The images were de-identified and had been captured with the Keratograph K5 and a Haag-Streit slit lamp. Six investigators graded the images on the 0-4 Efron grading scale using either 0.1- or 0.5-unit steps. Three of the investigators also imported the images into the AOS® Anterior software and graded them objectively. All measurement techniques were assessed for repeatability and compared with one another.

Results:

Mean hyperemia with the objective system (1.1 ± 0.7) was significantly lower than with subjective grading (2.0 ± 0.8) (P < 0.001). Both inter- and intra-observer repeatability of the objective system (0.15) were better than those of the subjective methods (1.70).

Conclusion:

The results showed excellent repeatability of the AOS® Anterior objective conjunctival hyperemia grading software, although objective scores were not interchangeable with subjective scores. This system has value in monitoring levels of hyperemia in contact lens wearers and patients in clinical care and research trials.

Keywords: Hyperemia, conjunctiva, objective grading, subjective grading

INTRODUCTION

The conjunctival tissue, containing blood vessels and lymphatics, is particularly responsive to ocular surface insult or injury. Overlying the opaque sclera and rich with vasculature, the conjunctiva is often studied to assess the immune and inflammatory response of the ocular surface. Measurement of bulbar conjunctival hyperemia is particularly common to study the level of ocular inflammation and discomfort associated with contact lens wear or ocular disease. With increasing rates of ocular surface disease, advancing use of contact lenses, and ever-emerging ocular surface therapeutics, the need for quantitative and reliable methods of measuring inflammation and ocular discomfort is certain.

Historically, conjunctival hyperemia has been measured by subjective comparison to a published grading scale. Multiple illustrated grading scales have been used in both clinical and research settings. The Cornea and Contact Lens Research Unit scale, more recently referred to as the Institute for Eye Research or Brien Holden Vision Institute (BHVI) scale, includes photographic depictions of anterior segment conditions over a range of severity.1 The Efron scale, composed of computer-generated illustrations, is also popular.2 However, all subjective techniques are limited by observer bias, may require lengthy observer training, and can be time consuming and expensive to conduct, especially for large trials.1-3 Furthermore, the pictorial grading scales have been found to be more sensitive for lower scores but less sensitive for higher scores.3 In recent years, objective grading software has emerged, and several automated programs have been used to assess conjunctival hyperemia;4-10 however, few are available commercially. These systems have the potential to provide consistent, objective grading that could be useful in clinical research and allow better evaluation of ocular surface health, such as in disease assessment or monitoring the efficacy of a therapeutic intervention.

AOS® Anterior (Sparca Corp, UK) is a Windows-based computer software program designed to measure ocular surface images for characteristics such as conjunctival hyperemia. This system has been tested in a recent study by Huntjens et al.,11 who graded 30 slit-lamp images and found reasonable comparability of this platform to subjective grading by two individuals. The aim of this study was to provide a robust assessment of the validity and repeatability of the AOS® Anterior system.

METHODS

This study analysed conjunctival hyperemia in a series of images captured with two different instruments, an Oculus Keratograph K5 (Oculus, Inc., Arlington, WA, USA) and a Haag-Streit slit lamp camera (Haag-Streit, Koniz, Switzerland). All images were de-identified and available from previous studies, so the study did not meet the definition of human subjects research under 45 CFR 46.101(a)(d). All data analysis was done at The Ocular Surface Institute at the University of Houston College of Optometry. A total of 300 images were available for analysis: 140 captured with the slit lamp camera and 160 captured with the Keratograph. The images from the two instruments were taken during two different studies, with no overlap of subjects, and each image was from a different subject.

Six examiners graded all images using the 0-4 Efron scale.12 Their experience grading conjunctival hyperemia in clinical care and research ranged from <1 to 18 years (Table 1). Three examiners graded the images using a 0.5-step scale (graders 1-3), while the other three graded the images using a 0.1-step scale (graders 4-6). Each examiner graded each image three times, with at least three days between grading sessions; the images were presented in a different order at each session, and grades were recorded on a separate spreadsheet. The three trials were averaged for the final hyperemia score.

Table 1.

Experience level for each grader, shown in approximate years.

Grader   Years of experience
1        6
2        3
3        18
4        3
5        <1
6        10

For the objective analysis, each image was uploaded to the AOS® Anterior program (Figure 1). The investigators manually selected a region of interest (ROI) – the conjunctiva – using as many points as possible and avoiding areas with shadows, artifacts, or adjacent structures (e.g. cornea, eyelids, lashes). Once the ROI was selected, the software automatically measured a “conjunctival total redness score,” a value ranging from 0-4 in 0.1-unit steps. The software also provided a per cent vessel score indicating the per cent of redness in the image, but since there is currently no clinical correlate to this score, this output was not utilised in this study. The objective analysis was done by graders 1-3, who each graded every image three times using the objective software, with at least three days between sessions. The three scores were again averaged for the final grade.
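The redness algorithm inside AOS® Anterior is proprietary and is not described in this paper. Purely as an illustration of how an ROI-based redness metric can be computed, the sketch below derives a naive 0-4 score from the relative strength of the red channel inside a manually defined polygon; the function name, the rescaling constant, and the use of OpenCV/NumPy are assumptions for this example and do not reflect the AOS® implementation.

```python
import cv2
import numpy as np

def naive_redness_score(image_bgr, roi_points, max_score=4.0):
    """Illustrative only: crude relative-redness score (0 to max_score) inside a
    polygonal region of interest. This is NOT the AOS(R) Anterior algorithm."""
    # Rasterise the manually selected ROI polygon into a binary mask.
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [np.asarray(roi_points, dtype=np.int32)], 255)

    img = image_bgr.astype(np.float32)
    b, g, r = img[..., 0], img[..., 1], img[..., 2]

    # Relative redness: how far the red channel exceeds the green/blue average.
    relative_red = np.clip((r - (g + b) / 2.0) / 255.0, 0.0, 1.0)

    roi_values = relative_red[mask == 255]
    if roi_values.size == 0:
        raise ValueError("Empty region of interest")

    # Hypothetical linear rescaling of mean relative redness onto a 0-4 scale.
    return float(np.clip(roi_values.mean() * 10.0, 0.0, max_score))
```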

Figure 1. The AOS® Anterior software interface.


Images were uploaded into the AOS® Anterior software and the examiner subjectively selected the region of interest (A). The vasculature was automatically traced by the software, and outputs of conjunctival redness and per cent vessel score are shown (B).

Statistical analysis

Mean, variance, and repeatability were calculated for the objective and subjective grading systems, the latter two as measures of the variation in the repeated measurements taken on the test images. Variance is defined as the square of the standard deviation (SD), and repeatability is calculated as 2.77 × SD, with lower values indicating a more repeatable measure;13 each was calculated for all examiners. In addition, validity was assessed by comparing the objective measurements with the subjective grades to determine whether the software provides a hyperemia score comparable to what would be graded subjectively.
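As a minimal sketch of the repeatability calculation described above, assuming each examiner's repeated grades are stored as a NumPy array of shape (images × trials); the pooling of the within-image SD across images is an assumption, since the paper does not state how the SD was aggregated:

```python
import numpy as np

def repeatability_stats(grades):
    """grades: array of shape (n_images, n_trials) holding one examiner's
    repeated grades of the same images with one method."""
    # Within-image standard deviation across the repeated trials.
    within_sd = grades.std(axis=1, ddof=1)
    # Pool across images (assumption: root-mean-square of per-image SDs).
    pooled_sd = np.sqrt(np.mean(within_sd ** 2))
    variance = pooled_sd ** 2                 # variance = SD^2
    repeatability = 2.77 * pooled_sd          # Bland & Altman repeatability (ref 13)
    return variance, repeatability
```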

Normality of the data was tested using the D’Agostino & Pearson normality test. To compare subjective and objective trials, non-parametric analysis of variance (Kruskal-Wallis) was used. Post-hoc analysis was done using paired (same examiner) and unpaired (different examiners) t-tests or their non-parametric alternatives (Wilcoxon and Mann-Whitney tests for paired and unpaired data, respectively). Bland-Altman plots were used to assess agreement between scoring methods. In post-hoc testing, variance and repeatability were calculated separately for the Keratograph and the slit lamp to determine whether there were differences between the two instruments.
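The following sketch shows how the paired comparison and Bland-Altman agreement statistics described above could be computed with SciPy and NumPy, assuming per-image mean grades from one examiner for the two methods; the variable and function names are illustrative and the exact analysis pipeline used in the study is not reproduced here.

```python
import numpy as np
from scipy import stats

def compare_paired_methods(subjective, objective):
    """subjective, objective: paired per-image mean grades from one examiner."""
    subj = np.asarray(subjective, dtype=float)
    obj = np.asarray(objective, dtype=float)

    # Paired non-parametric comparison (Wilcoxon signed-rank test).
    _, p_value = stats.wilcoxon(subj, obj)

    # Bland-Altman bias and 95% limits of agreement.
    diff = subj - obj
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return {"wilcoxon_p": p_value,
            "bias": bias,
            "loa": (bias - half_width, bias + half_width)}
```

Unpaired comparisons between different examiners would use stats.mannwhitneyu, and the omnibus comparison across graders stats.kruskal, in the same manner.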

RESULTS

A total of 160 images from the Keratograph and 103 images from the Haag-Streit slit lamp were used in the final analysis. Thirty-seven images taken with the slit lamp were excluded due to low luminance. These images had a yellow-tinted background hue from an auxiliary light source, which interfered with objective analysis and led to excessively high redness grades (Figure 2a). All Keratograph images were usable; examples are shown with mean grades from each of the three scoring methods (Figure 2b).

Figure 2. Examples of Slit Lamp and Keratograph images with hyperemia grading.


Images taken with the slit lamp (A) and Keratograph (B) are shown for 0.1 subjective, 0.5 subjective, and 0.1 objective grading (columns). The images shown represent grading of approximately 1.0, 2.0 and 3.0 (rows). Exact mean grades of all examiners are shown in the lower right corner of each image. Note that for slit lamp imaging using the AOS® software, no images were scored greater than 3.0; the image shown was taken using a low-luminance auxiliary light that caused falsely high readings with the objective system, and this image was among those excluded from the final analysis.

The median hyperemia scores and the range of scores for each of the three grading methods are shown for each examiner (Figure 3). Subjective grading was significantly different between graders (Kruskal-Wallis, P < 0.001), and the mean of all subjective scores (2.0 ± 0.8) was about one unit higher than the mean of objective scores (1.1 ± 0.7) (P < 0.001). Post-hoc analysis showed significant differences between subjective 0.5 and subjective 0.1 scoring (Mann-Whitney, P < 0.001), between 0.5 subjective and 0.1 objective scoring (Wilcoxon, P < 0.001), and between 0.1 subjective and 0.1 objective scoring (Mann-Whitney, P < 0.001). No differences were found between any objective trials for any of the graders (Kruskal-Wallis, P = 0.96). Bland-Altman plots show poor agreement between objective and subjective methods for each of the three examiners (Figure 4, top row). Specifically, biases of 1.4 ± 0.6, 1.3 ± 0.6, and 0.8 ± 0.8 were observed for graders 1, 2 and 3, respectively, indicating higher grading with the subjective method. Furthermore, poor agreement was observed between the means of the three subjective 0.5 examiners and the three subjective 0.1 examiners when compared with each other, and when compared with the mean of the three objective graders (Figure 4, bottom row). The objective grading showed good agreement between all three examiners (Figure 4, middle row). Only the first three graders completed the objective assessment.

Figure 3.


Box and whisker plots for each grader using the subjective and objective systems (n=263 images). Boxes show median hyperemia and interquartile ranges with the whiskers indicating the minimum and maximum values. Large ranges are seen due to the variability of redness levels of the images graded. Graders are identified by “G.”

Figure 4.


Bland-Altman plots showing differences versus means for various grading methods. Top row: within examiner comparison of subjective and objective grading for Graders 1, 2 and 3 (G1, G2, G3). Middle row: between examiner comparison of objective grading. Bottom row: comparison of all subjective and objective scales for all examiners. Dotted lines represent 95% confidence intervals. Bias and limits of agreement (LOA) are shown for each plot.

Variance and repeatability were calculated for each grader using each technique (Table 2), with lower values indicating a more repeatable outcome. Objective grading showed the lowest variance and best repeatability for all graders, followed by subjective grading using the 0.1-unit scale and then subjective grading using the 0.5-unit scale. The variance and repeatability of the subjective grading techniques were better for slit lamp images than for Keratograph images (Table 2). On average, hyperemia graded from the slit lamp images was lower with the objective system but similar between the two instruments when graded subjectively.

Table 2.

Mean, variance and repeatability for each type of hyperemia scoring, shown for Keratograph images only (top), slit lamp images only (middle), and for all images combined (bottom).

                 Subjective                                         Objective
Grader           G4     G5     G6     G1     G2     G3     All      G1     G2     G3     All
Keratograph
  Mean           2.5    2.4    1.9    1.2    1.9    2.4    2.0      1.5    1.5    1.5    1.5
  Variance       0.09   0.02   0.05   0.16   0.16   0.09   0.14     0.01   0.01   0.02   0.02
  Repeatability  0.74   0.29   0.49   0.94   0.99   0.72   0.88     0.18   0.21   0.21   0.20
Slit lamp
  Mean           2.6    2.5    2.0    0.9    1.8    2.1    2.0      0.7    0.7    0.6    0.7
  Variance       0.12   0.02   0.03   0.06   0.23   0.17   0.16     0.002  0.002  0.004  0.003
  Repeatability  0.88   0.28   0.41   0.49   1.26   1.08   0.94     0.07   0.05   0.08   0.07
All images
  Mean           2.5    2.4    1.9    1.1    1.9    1.3    2.0      1.1    1.2    1.1    1.1
  Variance       0.10   0.02   0.04   0.12   0.19   0.13   0.40     0.01   0.01   0.02   0.01
  Repeatability  0.80   0.29   0.46   0.76   1.10   0.86   1.70     0.14   0.15   0.16   0.15

Variance is calculated as SD²; the repeatability interval is calculated as 2.77 × SD.

DISCUSSION

There are several benefits to using an objective automated grading method for measuring conjunctival hyperemia. The AOS® Anterior computerised grading system is highly repeatable both within and between examiners, showing better repeatability and reliability than the Efron subjective grading scale. The software would be superior to subjective grading methods in clinical research applications where measurements are repeated (for example, monitoring progression of ocular disease before and after treatment) or where data would otherwise need to be analysed by different investigators (for example, multi-site studies). However, it must be noted that while the system showed high repeatability for images graded below 3, scores above 3 were often artefactual, arising from low-luminance images whose yellow hue caused falsely high readings. The objective system did not appear to genuinely grade any image above 3, consistent with objective grades averaging about 1 unit lower than subjective scores.

Subjective techniques are currently the standard of care for grading conjunctival hyperemia; however, there is often intra- and inter-observer variability.10 When comparing all subjective with all objective measurements, the objective software graded approximately 1 unit lower, indicating that the validity of the measurements in this study was potentially poor, with grading lower than the true values. However, repeatability is a much more important indicator of the usability of the instrument, as the consistency of the measurement matters more than the relatively arbitrary “true” values currently used in subjective scales. Within the subjective grading, the 0.5-unit and 0.1-unit scaling methods were used in this study because both are utilised in clinical and research settings. There were significant biases between the two scales, as well as between either of them and the objective system. In this study, the 0.5-unit subjective grades may have been lower than the 0.1-unit grades due to an observer bias toward the lower value when between two 0.5 scale units, since higher scores are instinctively associated with negative outcomes. Examiners using the 0.1 scale were able to be more discriminating in their hyperemia grading without committing to such clinically significant steps. This study is in agreement with the Bailey et al. study, which suggested that use of a finer grading scale can increase the repeatability of subjective measurement.14 However, this has been challenged by a recent and robust study that found 0.5 grading to be more precise than 0.1 grading in a cohort of optometry students and optometrists.15 Those authors suggested that the finer scales are too precise for the levels of observed variation in hyperemia. Using this objective software, the 0.1 scale appears adequately precise, although it was not tested against an objective 0.5 scale measurement.

Interestingly, the level of clinical experience of the examiners did not appear to affect subjective grading, with no clear relationship between experience level and grading, again emphasising the variable nature of subjective scales. Other studies have shown that while training can improve the reliability of an individual examiner, a non-experienced observer is still able to measure hyperemia comparably.14,16 One study found that non-optometrists were able to grade hyperemia in images comparably to optometrists, although slightly less reliably, presumably due to a lack of clinical knowledge,16 and another found that optometry students graded with more precision than experienced examiners. In this study, all examiners were optometrists with different levels of experience, and no differences in repeatability were found based on experience level. It ultimately appears that experience is not a strong factor in the precision of subjective measurements. Beyond the level of training and the bias of the unit-scales used, there remained significant differences between graders, such as between graders 1 and 4, who showed a 1.4-unit difference in average grading despite having 6 and 3 years of clinical research experience, respectively. This supports the hypothesis that subjective grading is inherently biased, regardless of the training and scales used. Overall, the subjective methods showed large variability compared to the objective system and were more subject to individual bias.

The repeatability of the objective grading scores was about two times better with the slit lamp images than with the Keratograph images. This could be due to the presence of the Placido disc reflections on the Keratograph images, or to differences in pixel intensity or other image features between the two systems. Despite this, the repeatability statistics of the AOS® Anterior objective system are still approximately five times better than either of the subjective methods, even in the least repeatable Keratograph analyses. Further, the three trials for each instrument were highly consistent with one another. This indicates that the software is versatile and can be used on images taken from different instruments, although caution should be taken when directly comparing images acquired on different types of instrument. While the repeatability of the objective analysis on the slit lamp images was the best of all methods compared in this study, there were still 37 low-luminance images that could not be accurately assessed by the AOS® Anterior software. These images were erroneously graded as very high (scores ranging from 3.5-4.0) because the overall yellow hue was interpreted as redness. This occurred with the slit lamp and not with the Keratograph images because the lighting was subjectively adjusted (in various ways, including adjustment of an external light source) for the slit lamp photo acquisition, versus constant high luminance for the Keratograph images. An important limitation of the AOS® Anterior software is therefore the need for images captured with sufficiently high luminance for accurate grading. Before using this system, imaging devices should be calibrated and test images assessed to ensure that sufficient lighting is used for images acquired for AOS® Anterior analysis. Future studies are indicated to determine the optimal luminance and instrument calibration necessary to use this software most effectively.
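A pre-screening step such as the hypothetical sketch below could flag low-luminance or yellow-cast images before objective analysis; the thresholds and the function name are illustrative assumptions, not values validated in this study.

```python
import numpy as np

def passes_luminance_check(image_bgr, min_mean_luma=60.0, max_yellow_bias=0.15):
    """Illustrative pre-screen for images prior to objective redness grading.
    Thresholds are hypothetical placeholders, not values from the study."""
    img = np.asarray(image_bgr, dtype=np.float32)
    b, g, r = img[..., 0], img[..., 1], img[..., 2]

    # Approximate perceptual luminance (Rec. 601 weights), 0-255 scale.
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    if luma.mean() < min_mean_luma:
        return False  # too dark for reliable redness scoring

    # A strong yellow cast (high red and green relative to blue) can mimic redness.
    yellow_bias = ((r + g) / 2.0 - b).mean() / 255.0
    return yellow_bias <= max_yellow_bias
```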

CONCLUSION

This study showed reliable use of the AOS® Anterior grading software on a robust collection of images from different optometric instruments, with the software most sensitive for scores below 3. When compared with 0.1- and 0.5-unit subjective grading techniques, the objective AOS® Anterior software was more repeatable both within and between examiners. While the software's scores were lower than the average subjective scores, its repeatability can improve the ability to detect small differences in clinical trials and reduce the subjective variability of conjunctival hyperemia grading.

ACKNOWLEDGEMENTS

The authors would like to acknowledge Sparca Corp, UK for providing the AOS® Anterior software for this analysis.

REFERENCES

1. Terry R, Schnider C, Holden B, et al. CCLRU standards for success of daily and extended wear contact lenses. Optom Vis Sci. 1993;70:234–43.
2. Efron N, Pritchard N, Brandon K, et al. A survey of the use of grading scales for contact lens complications in optometric practice. Clin Exp Optom. 2011;94:193–9.
3. Wolffsohn JS. Incremental nature of anterior eye grading scales determined by objective image analysis. Br J Ophthalmol. 2004;88:1434–8.
4. Kurita J, Shoji J, Inada N, et al. Clinical severity classification using automated conjunctival hyperemia analysis software in patients with superior limbic keratoconjunctivitis. Curr Eye Res. 2018;43:679–82.
5. Rodriguez JD, Johnston PR, Ousler GW, Smith LM, Abelson MB. Automated grading system for evaluation of ocular redness associated with dry eye. Clin Ophthalmol. 2013;3:1197–204.
6. Fieguth P, Simpson T. Automated measurement of bulbar redness. Invest Ophthalmol Vis Sci. 2002;43:340–7.
7. Yoneda T, Sumi T, Hoshikawa Y, Kobayashi M, Fukushima A. Hyperemia analysis software for assessment of conjunctival hyperemia severity. Curr Eye Res. 2019;44:376–80.
8. Romano V, Steger B, Brunner M, et al. Detecting change in conjunctival hyperemia using a pixel densitometry index. Ocul Immunol Inflamm. 2019;27:276–81.
9. Al-Hayouti H, Daniel M, Hingorani M, Calder V, Dahlmann-Noor A. Automated ocular surface image analysis and health-related quality of life utility tool to measure blepharokeratoconjunctivitis activity in children. Cornea. 2019;38:1418–23.
10. Hwang JM, Park IK, Chun YS, Kim KG, Yang HK. New clinical grading scales and objective measurement for conjunctival injection. Invest Ophthalmol Vis Sci. 2013;54:5249–57.
11. Huntjens B, Basi M, Nagra M. Evaluating a new objective grading software for conjunctival hyperaemia. Contact Lens Anterior Eye. 2020;43:137–43.
12. Efron N, Morgan PB, Katsara SS. Validation of grading scales for contact lens complications. Ophthalmic Physiol Opt. 2001;21:17–29.
13. Bland J, Altman DG. Measurement error. BMJ. 1996;312(7047):1654.
14. Bailey IL, Bullimore MA, Raasch TW, Taylor HR. Clinical grading and the effects of scaling. Invest Ophthalmol Vis Sci. 1991;32:422–32.
15. Vianya-Estopa M, Nagra M, Cochrane A, et al. Optimising subjective anterior eye grading precision. Contact Lens Anterior Eye. 2020;43:489–92.
16. Efron N, Morgan PB, Jagpal R. The combined influence of knowledge, training and experience when grading contact lens complications. Ophthalmic Physiol Opt. 2003;23:79–85.
