Author manuscript; available in PMC: 2022 Aug 1.
Published in final edited form as: Am J Ophthalmol. 2021 Apr 20;228:96–105. doi: 10.1016/j.ajo.2021.03.061

Development of classification criteria for the uveitides

The Standardization of Uveitis Nomenclature (SUN) Working Group*,1,2

Abstract

Purpose:

To develop classification criteria for 25 of the most common uveitides.

Design:

Machine learning using 5766 cases of 25 uveitides.

Methods:

Cases were collected in an informatics-designed preliminary database. Using formal consensus techniques, a final database was constructed of 4046 cases achieving supermajority agreement on the diagnosis. Cases were analyzed within uveitic class and were split into a training set and a validation set. Machine learning used multinomial logistic regression with lasso regularization on the training set to determine a parsimonious set of criteria for each disease and to minimize misclassification rates. The resulting criteria were evaluated in the validation set. Accuracy of the rules developed to express the machine learning criteria was evaluated by a masked observer in a 10% random sample of cases.

Results:

Overall accuracy estimates by uveitic class in the validation set were: anterior uveitides 96.7% (95% confidence interval [CI] 92.4, 98.6); intermediate uveitides 99.3% (95% CI 96.1, 99.9); posterior uveitides 98.0% (95% CI 94.3, 99.3); panuveitides 94.0% (95% CI 89.0, 96.8); and infectious posterior/panuveitides 93.3% (95% CI 89.1, 96.3). Accuracies of the masked evaluation of the “rules” were: anterior uveitides 96.5% (95% CI 91.4, 98.6); intermediate uveitides 98.4% (95% CI 91.5, 99.7); posterior uveitides 99.2% (95% CI 95.4, 99.9); panuveitides 98.9% (95% CI 94.3, 99.8); and infectious posterior/panuveitides 98.8% (95% CI 93.4, 99.9).

Conclusions:

The classification criteria for these 25 uveitides had high overall accuracy (i.e. low misclassification rates) and appeared to perform well enough for use in clinical and translational research.

PRECIS

Using a formalized approach to developing classification criteria, including informatics-based case collection, consensus-technique-based case selection, and machine learning, classification criteria for 25 of the most common uveitides were developed. The resulting criteria had overall uveitic class accuracies >90% in both the training and validation sets, suggesting potential usefulness in clinical and translational research.


The uveitides are a collection of over 30 diseases characterized by intraocular inflammation.1 They can be organized as a matrix of diseases classified by the uveitic anatomic class and by whether they are: 1) infectious, 2) associated with a systemic auto-inflammatory or auto-immune disease, or 3) eye-limited and presumed to be immune-mediated (Table 1).1 The uveitic classes are defined anatomically by the primary site in which inflammation is detected clinically and consist of: anterior uveitis (primary site in the anterior chamber), intermediate uveitis (primary site in the vitreous), posterior uveitis (primary site in the retina or choroid), and panuveitis (anterior chamber, vitreous, and retina/choroid all typically involved without predominance in any one site). Posterior uveitides may involve primarily the retina, in which case they typically are infectious, or primarily the choroid and/or retinal pigment epithelium, in which case they often are non-infectious but also may be infectious.2

Table 1.

Uveitic Diseases Addressed by the SUN Developing Classification Criteria for the Uveitides Project

| Anatomic class | Infectious* | Systemic Disease Associated | Eye-limited |
| --- | --- | --- | --- |
| Anterior | Cytomegalovirus anterior uveitis; Herpes simplex virus anterior uveitis; Varicella zoster virus anterior uveitis; Syphilitic anterior uveitis | Juvenile idiopathic arthritis-associated anterior uveitis; Spondyloarthritis/HLA-B27-associated anterior uveitis; Tubulointerstitial nephritis with uveitis; Sarcoidosis-associated anterior uveitis | Fuchs uveitis syndrome |
| Intermediate | Syphilitic intermediate uveitis | Multiple sclerosis-associated intermediate uveitis; Sarcoidosis-associated intermediate uveitis | Pars planitis; Intermediate uveitis, non-pars planitis type |
| Posterior | Acute retinal necrosis; Cytomegalovirus retinitis; Syphilitic posterior uveitis; Toxoplasmic retinitis; Tuberculous posterior uveitis | Sarcoidosis-associated posterior uveitis | Acute posterior multifocal placoid pigment epitheliopathy; Birdshot chorioretinitis; Multiple evanescent white dot syndrome; Multifocal choroiditis with panuveitis; Punctate inner choroiditis; Serpiginous choroiditis |
| Panuveitis | Syphilitic panuveitis; Tuberculous panuveitis | Behçet disease uveitis; Sarcoidosis-associated panuveitis; Vogt-Koyanagi-Harada disease (early-stage and late-stage) | Sympathetic ophthalmia |
* Infectious uveitides refer to those with evidence of active infection. They do not include auto-inflammatory or auto-immune diseases triggered by a prior infection (e.g. reactive arthritis-associated uveitis).

Classification criteria are employed to diagnose individual diseases for research purposes.3 Classification criteria differ from clinical diagnostic criteria, in that although both seek to minimize misclassification, when a trade-off is needed, diagnostic criteria emphasize sensitivity, whereas classification criteria emphasize specificity. The goal of classification criteria is to define a homogeneous group of patients for inclusion in research studies and to optimize the likelihood that all participants in the study will be generally accepted to have the disease.3

Classification criteria are needed for the field of uveitis. Although diagnostic criteria have been proposed for several diseases, there is no validated systematic approach to classifying the uveitides, and the agreement among uveitis experts on the diagnosis of a specific case is moderate at best (κ=0.39). Furthermore, there are pairs of experts for whom the observed level of diagnostic agreement could have occurred by chance alone (κ~0.0).4 As such, there is a lack of uniformity in reporting in the literature and uncertainty about the comparability of different case series and clinical studies of patients with uveitis. Adoption of generally accepted and widely used classification criteria for reporting the uveitides in the literature should help address this uncertainty.

The Standardization of Uveitis Nomenclature (SUN) Working Group is an international collaboration dedicated to improving research in the field of uveitis.2 The goal of the “SUN Developing Classification Criteria for the Uveitides” project was to develop classification criteria for 25 of the most common uveitides using a formal approach to development and classification.2,4–6

Methods

The SUN Developing Classification Criteria for the Uveitides project proceeded in four phases: 1) informatics, 2) case collection, 3) case selection, and 4) machine learning.4–6

Informatics.

As previously described, the informatics phase was conducted from 2009 to 2010 and developed a standardized vocabulary and set of dimensions for describing uveitic cases and diseases.5,6 It enabled the development of a standardized, menu-driven, hierarchical case report for case collection, which sought to maximize discrete data collection and minimize free text.

Case collection.

In case collection, information on 5766 cases of 25 of the most common uveitides was collected retrospectively between 2010 and 2016 using the standardized forms developed during the informatics phase.4 Information was entered into the SUN preliminary database by the 76 contributing investigators. Case information was de-identified, and investigators entered cases retrospectively from existing case records. Investigators were instructed to enter data from the presentation visit or, in the unusual situation where there was disease evolution, the visit at which the diagnosis became known.4 The target for case collection was 150–250 cases of each of the 25 diseases; once ~250 cases were collected, case collection for a specific disease was closed. Because they enter into the differential diagnosis of several classes of the uveitides, more than 250 cases of sarcoidosis-associated uveitis (383 cases) and tuberculous uveitis (358 cases) were collected. Because of their very different features, cases of early-stage Vogt-Koyanagi-Harada disease and late-stage Vogt-Koyanagi-Harada disease were collected separately. Because the goal of the project was to develop criteria to distinguish among the uveitides, only cases with uveitis were entered into the preliminary database.

Investigators were instructed to submit images relevant to the diagnosis (e.g. fundus photographs for infectious and non-infectious posterior and panuveitides, fluorescein angiograms, and optical coherence tomograms as appropriate) into the database. These images were used by the case selection committees during case selection and were graded independently by a Reading Center at the Department of Ophthalmology, University of Wisconsin–Madison School of Medicine and Public Health. Reading Center grades included information on lesion number, location, size, and character as appropriate. Reading Center data were used preferentially in the machine learning for posterior and panuveitides (including the infectious subset) for features including lesion (or spot) number, distribution, and size. Image results relevant to criteria were reviewed, and discrepancies were adjudicated by a clinician (PM) dedicated to image management. The images themselves were not used in the machine learning.

Case selection.

Because there is no “gold standard” for case definition, because agreement among uveitis experts is modest,4 and because classification criteria need to define a homogeneous group of patients, it was decided to “select” as the final database for the machine learning phase those cases from the preliminary database that achieved supermajority agreement on the diagnosis. Case selection occurred during 2016 and 2017. Cases in the preliminary database were reviewed by committees of 9 investigators for inclusion in the final database (case “selection”).4 Committees were dispersed geographically and by “school of thought”. Case selection proceeded in two steps: online voting followed by consensus conference calls.4 During online voting, committee members reviewed the cases and individually voted, based on their clinical judgment and without reference to any specific criteria, on whether the data supported the diagnosis. A “forced choice” was required on whether the investigator thought that the case should be included in the final database. Cases obtaining a supermajority (>75%) of “yes” votes were included, and those with a supermajority of “no” votes were excluded; cases with no supermajority either way were tabled for the consensus conference calls. The consensus conference calls were conducted using nominal group techniques, a formal consensus approach that minimizes “dominant personality” effects.7 A round of formal, uninterrupted individual comments was followed by anonymous voting with supermajority requirements for acceptance or rejection. If the case was neither accepted nor rejected after the first round, a second round was conducted; if it was neither accepted nor rejected after the second round, it was permanently tabled and not included in the final database. Five committees, organized by uveitic class, worked in parallel, with the infectious posterior and panuveitides handled by a separate committee. The core committee membership was the same for all of the diseases within a class, but there was some variability between specific diseases in committee membership based on investigator availability.

Machine learning.

Machine learning was conducted during 2018 and 2019. The final database then was randomly separated into a training set (~85% of the cases) and a validation set (~15% of the cases) for each uveitic class. Data from “check all that apply” questions in the database were converted to a series of binary “yes/no” or “present/absent” items. Because of the retrospective nature of data collection and the selective Bayesian approach to testing in clinical care now advocated (in which tests are selected to rule in or out a diagnosis, rather than a standard set of tests used on all cases),1 not all laboratory data were available on each case. Therefore, an “evidence for” approach was adopted in which data supporting the diagnosis were needed to make the diagnosis and missing data were treated as negative data. This approach mimics clinical care in which it is presumed that tests not performed would be negative or irrelevant if they had been performed. However, relatively more complete data were available for the two typically exclusionary diseases that can present clinically in any of the uveitic classes, syphilis and sarcoidosis.
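
For concreteness, the following minimal R sketch (with hypothetical variable names; not the project's code) illustrates the “evidence for” treatment of missing data, in which a test that was not performed is coded identically to a negative result:

```r
# "Evidence for" approach: a test that was not performed (NA) is treated
# as if it had been negative.
to_evidence <- function(x) {
  x[is.na(x)] <- 0     # missing data treated as negative data
  as.integer(x == 1)   # 1 = supporting evidence present, 0 = absent/untested
}

pcr_cmv <- c(1, NA, 0, NA, 1)  # hypothetical aqueous PCR results for 5 cases
to_evidence(pcr_cmv)           # returns 1 0 0 0 1
```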

Because the uveitic disease diagnosis is a patient diagnosis, eye-specific information was coalesced into patient-specific information, typically representing the “worse eye”. If the feature was present in either eye, it was treated as present for the individual, and if there were multiple options for a feature (e.g. predominant lesion size), it was taken as the larger of the two ranks.
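
The coalescing of eye-level data can be illustrated with a minimal R sketch (hypothetical column names; not the project's code):

```r
# A feature present in either eye is treated as present for the patient;
# ordinal features take the worse (higher) of the two eyes' ranks.
eyes <- data.frame(
  snowballs_od   = c(TRUE, FALSE),  # hypothetical binary feature, right eye
  snowballs_os   = c(FALSE, FALSE), # left eye
  lesion_size_od = c(1, 2),         # hypothetical ordinal ranks (1 < 2 < 3)
  lesion_size_os = c(3, 2)
)
patient <- data.frame(
  snowballs   = eyes$snowballs_od | eyes$snowballs_os,
  lesion_size = pmax(eyes$lesion_size_od, eyes$lesion_size_os)
)
patient  # case 1: snowballs TRUE, lesion_size 3; case 2: FALSE, 2
```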

Machine learning was used on the training set to determine criteria that minimized misclassification. Because diagnostic confusion typically occurs within, not between, anatomic classes of uveitis, machine learning was performed separately within class for 5 groups of diseases: anterior uveitides, intermediate uveitides, posterior uveitides, panuveitides, and infectious posterior or panuveitides. Cases from subsets of diseases that cross class (e.g. syphilitic uveitis, sarcoidosis-associated uveitis, and tuberculous uveitis) were included in the relevant class. Because of the low ratio of cases to diagnoses, it was elected to sequester ~150 cases into each of the 5 validation sets, which would provide a point-wise confidence interval no greater than ±0.08 when expressing accuracy as the fraction correct in the validation set. Four classification methods were considered, listed here with their tuning parameters and R package names (in parentheses): classification and regression trees (CART), with cost-complexity pruning and cp=0.01 (rpart);8 random forests (RF), with default tuning parameters (randomForest); multinomial logistic regression with lasso regularization, with the 1-standard-error (SE) value chosen for lambda (glmnet); and support vector machines (SVM), with radial kernel and tuning performed on a grid of cost and gamma values (e1071). The classification methods were compared with respect to accuracy and confusion matrices, Obuchowski’s index,9 Van Calster’s polytomous discrimination index,10,11 and discrimination plots. For the polytomous discrimination index, currently available packages could not handle the 9-level categorical variables required, and a new algorithm was developed (Oden N, personal communication; R code available upon request).
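
The ±0.08 bound follows from the normal approximation for a proportion, and the selected model corresponds to the standard glmnet workflow. The sketch below uses simulated data and is illustrative only, not the project's code:

```r
library(glmnet)

# Worst-case half-width of a 95% pointwise confidence interval for accuracy
# estimated as a proportion in ~150 validation cases (p = 0.5 maximizes SE):
1.96 * sqrt(0.5 * 0.5 / 150)   # ~0.08

# Multinomial logistic regression with lasso regularization on simulated
# binary features; lambda chosen by the 1-standard-error rule.
set.seed(1)
x <- matrix(rbinom(300 * 12, 1, 0.3), nrow = 300,
            dimnames = list(NULL, paste0("feature", 1:12)))
y <- factor(sample(c("disease_A", "disease_B", "disease_C"), 300, replace = TRUE))
train <- 1:255                          # ~85% training, ~15% validation
cv_fit <- cv.glmnet(x[train, ], y[train], family = "multinomial")
pred <- predict(cv_fit, newx = x[-train, ], s = "lambda.1se", type = "class")
mean(as.vector(pred) == as.character(y[-train]))  # accuracy = 1 - misclassification rate
```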

In order to strive for parsimony in the feature set for each disease and to avoid overfitting, an approach based on the Boruta algorithm was used (R package Boruta).12 Boruta is an “all-relevant” feature-selection wrapper algorithm that by default uses random forests and compares the importance of attributes with that of “shadow” attributes, created in each iteration by shuffling the original ones. Attributes whose importance is significantly worse than that of the shadow attributes are successively dropped, whereas attributes significantly better than the shadows are confirmed. Candidate features of a given uveitic class were arranged in order of descending Boruta importance. Each candidate classification method then was asked to construct classifications 1, 2, 3, etc., where classification 1 is based only on the most important feature, classification 2 on the 2 most important features, and so on. Graphs showing accuracy and unweighted kappa for the methods versus the number of included features, based on 5-fold cross-validation, were used to choose a final set of features that would generate a classification that was both parsimonious and accurate.
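
A minimal sketch of this step with the Boruta package follows (simulated data; the actual feature sets and cross-validation graphs are as described above):

```r
library(Boruta)

# Boruta judges each attribute against randomized "shadow" copies of the
# features; attributes significantly worse than the shadows are dropped,
# and attributes significantly better are confirmed.
set.seed(1)
train_df <- data.frame(matrix(rbinom(300 * 12, 1, 0.3), nrow = 300))
train_df$diagnosis <- factor(sample(c("A", "B", "C"), 300, replace = TRUE))

bor <- Boruta(diagnosis ~ ., data = train_df)
imp <- attStats(bor)                  # per-attribute importance statistics
ranked <- rownames(imp)[order(imp$meanImp, decreasing = TRUE)]  # descending importance
```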

Multinomial logistic regression, RF, and SVM all provided similar results, but CART provided slightly worse performance (data not shown). Multinomial logistic regression with lasso regularization was chosen. This approach typically presents classification rules as linear combinations of features, which were restated as equivalent Boolean classification rules. This restatement was possible because all SUN features were treated as categorical; the few continuous features (e.g. age, intraocular pressure) were stratified as categorical variables. Thus, once a logistic model was constructed for a uveitic class, the model could be asked to predict its outcome for every unique combination of final features in the training set. The resulting metadata were submitted to the Quine-McCluskey algorithm as extended by Dusa and Thiem,13 implemented as the eQMC function in the R QCApro package. This algorithm constructs a minimal set of Boolean expressions, one for each different type of predicted output (uveitic disease) in the class. The collection of Boolean expressions is a set of classification rules that exactly re-creates the decisions of the logistic regression in the training set.
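
Schematically, and continuing the hypothetical cv_fit from the sketch above (the eQMC call is outlined in a comment only, as its exact arguments depend on the QCApro version):

```r
# Enumerate every unique combination of features seen in the training set,
# obtain the fitted model's predicted disease for each combination, and
# submit the resulting truth table to Boolean minimization.
combos <- unique(as.data.frame(x[train, ]))
combos$predicted <- as.vector(
  predict(cv_fit, newx = as.matrix(combos[, colnames(x)]),
          s = "lambda.1se", type = "class"))

# library(QCApro)
# eQMC(combos, outcome = "predicted", ...)  # schematic: yields one minimal
#                                           # Boolean expression per disease
```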

In order to optimize performance of the criteria, an iterative approach was taken to feature engineering using the training set, in which clinically relevant “OR” variables were combined into a single “evidence of” variable; for example, chest radiographic results were combined with chest computed tomography results to produce a variable identifying bilateral hilar adenopathy on chest imaging (i.e. chest radiography or chest computed tomography), and this variable then was combined with a tissue biopsy demonstrating non-caseating granulomata to produce an “evidence of sarcoidosis” variable. All such “OR” variable creation was performed only on the training set and without reference to the diagnosis, and the performance of those variables selected for the final model was evaluated in the validation set. When the Quine-McCluskey algorithm produced more than one equivalent set of criteria, the set that best fit with the other methods and with clinical care was chosen.
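
This “OR” variable construction amounts to simple Boolean combination, as in the following sketch (hypothetical column names; not the project's code):

```r
# Component findings are combined, without reference to the diagnosis,
# into a single "evidence of" variable.
cases <- data.frame(
  bhl_chest_xray     = c(TRUE, FALSE, FALSE),  # bilateral hilar adenopathy on chest radiograph
  bhl_chest_ct       = c(FALSE, TRUE, FALSE),  # bilateral hilar adenopathy on chest CT
  biopsy_granulomata = c(FALSE, FALSE, TRUE)   # biopsy with non-caseating granulomata
)
cases$bhl_on_imaging          <- cases$bhl_chest_xray | cases$bhl_chest_ct
cases$evidence_of_sarcoidosis <- cases$bhl_on_imaging | cases$biopsy_granulomata
```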

After criteria for each disease were developed using the training set, they were evaluated on the validation set, and the misclassification rate was calculated for both the training and the validation sets. The misclassification rate was the proportion of cases classified incorrectly by the machine learning algorithm when compared to the consensus diagnosis. As a check on the accuracy measure, the balanced accuracy (which is unaffected by the relative numbers of cases of each diagnosis in the set) also was calculated on the validation set.
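
For concreteness, both measures can be computed from a confusion matrix, as in this small sketch on toy data:

```r
# Confusion matrix of consensus (rows) versus predicted (columns) diagnoses.
# Balanced accuracy is the unweighted mean of the per-disease sensitivities,
# so it is unaffected by how many cases of each disease the set contains.
cm <- table(truth = c("A", "A", "A", "B", "B", "C"),
            pred  = c("A", "A", "B", "B", "B", "C"))
accuracy          <- sum(diag(cm)) / sum(cm)  # fraction classified correctly
sensitivities     <- diag(cm) / rowSums(cm)   # per-disease sensitivity (recall)
balanced_accuracy <- mean(sensitivities)
```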

The final classification rules, which were expressed formally as Boolean expressions involving variables in the training and validation sets, were restated in English as the criteria (“final rules”) for each individual disease. In order to test the accuracy of the criteria, a ~10% sample of each disease in the final database was randomly selected, and the original case data (without engineered variables) were evaluated within uveitic class by a single observer masked as to diagnosis (JET) in order to estimate the class accuracy of the final rules. The masked observer’s results were compared to the machine learning results (to determine how well they reflected the conversion of the machine learning variables to “rules” expressed in English) and to the consensus diagnoses (to determine how well they performed).

The final classification rules and the disease-specific manuscripts presenting them were subject to multiple levels of review and approval, including the individual manuscripts’ writing committees, the Executive Committee, the Steering Committee, and the SUN Working Group. The SUN Working Group held a meeting in December 2019 to review the manuscripts and criteria, resulting in over 80 separate suggestions for the criteria and manuscripts. Additional analyses and sensitivity analyses suggested by that meeting were conducted in the first quarter of 2020, leading to additional revisions.

Results

Of the 5766 cases collected, 4046 (70%) were selected in the case selection phase and used in the machine learning phase. Supermajority agreement that a case should be included or excluded was achieved for 99% of cases overall (i.e. only 1% were tabled due to failure to reach agreement).4 The numbers of cases selected and their regions of origin by uveitic class are listed in Table 2. Cases of sarcoidosis-associated, syphilitic, and tubercular uveitis were analyzed within the relevant uveitic class (e.g. sarcoid anterior uveitis with the anterior uveitides, sarcoid intermediate uveitis with the intermediate uveitides, etc), and cases in the differential diagnosis of both the non-infectious posterior uveitides or panuveitides and the infectious posterior or panuveitides were used in both sets for machine learning. Because of this use of some cases in more than one class and the use of subsets of sarcoidosis, syphilis, and tuberculosis in different classes, the numbers of cases used in the machine learning phase were: anterior uveitides, 1083; intermediate uveitides, 589; posterior uveitides, 1068; panuveitides, 1012; and infectious posterior and panuveitides, 803.

Table 2.

Regional Origin of Cases in Final Database after Case Selection

| Uveitic Class | Anterior Uveitides | Intermediate Uveitides | Posterior Uveitides | Panuveitides | Infectious Posterior & Panuveitides |
| --- | --- | --- | --- | --- | --- |
| Total cases selected* | 947 | 452 | 735 | 846 | 1066 |
| Asia (% of class) | 14 | 2 | 9 | 34 | 27 |
| Australia (% of class) | 7 | 3 | 2 | 4 | 5 |
| Europe (% of class) | 44 | 49 | 31 | 26 | 25 |
| North America (% of class) | 34 | 39 | 57 | 31 | 31 |
| South America (% of class) | 1 | 7 | 1 | 5 | 12 |

Regional origin is expressed as the percentage of the cases selected within each uveitic class.

* The number of cases used in the machine learning sets may differ from the number of cases selected, as some cases (e.g. syphilis, sarcoidosis, and tuberculosis), which cross class, may be used for machine learning in more than one class.

Detailed reasons for non-selection of cases for the final database were not collected, but the consensus of the selection committees was that inadequate information (e.g. missing data) and wrong diagnosis (e.g. undifferentiated anterior uveitis presumptively diagnosed as herpetic uveitis) predominated.

The characteristics of the individual disease data sets, the classification criteria selected, and the individual disease misclassification rates are reported in the accompanying articles on each disease.14–38 Overall accuracies by uveitic class in the training set were: anterior uveitides 97.5%, intermediate uveitides 99.8%, posterior uveitides 92.7%, panuveitides 96.3%, and infectious posterior/panuveitides 92.1%. Overall accuracies by uveitic class (Table 3) in the validation set were: anterior uveitides 96.7% (95% confidence interval [CI] 92.4, 98.6); intermediate uveitides 99.3% (95% CI 96.1, 99.9); posterior uveitides 98.0% (95% CI 94.3, 99.3); panuveitides 94.0% (95% CI 89.0, 96.8); and infectious posterior/panuveitides 93.3% (95% CI 89.1, 96.3). The balanced accuracies (Table 3) gave qualitatively and quantitatively similar results as the simple accuracies.

Table 3.

Accuracy of Validation Set Classification Criteria for the Uveitides

| Uveitic Class | Number of diseases | Accuracy (%)* | 95% Confidence Interval | Balanced Accuracy (%)† | 95% Confidence Interval |
| --- | --- | --- | --- | --- | --- |
| Anterior uveitides | 9 | 96.7 | 92.4, 98.6 | 97.1 | 94.6, 99.5 |
| Intermediate uveitides | 5 | 99.3 | 96.1, 99.9 | 99.6 | 99.0, 100 |
| Posterior uveitides | 9 | 98.0 | 94.3, 99.3 | 98.7 | 97.3, 100 |
| Panuveitides | 7 | 94.0 | 89.0, 96.8 | 94.8 | 91.4, 98.1 |
| Infectious posterior and panuveitides | 5 | 93.3 | 89.1, 96.3 | 93.8 | 90.1, 97.6 |

* Using multinomial logistic regression with lasso regularization.

† Defined as the unweighted average of the sensitivities.

The numbers of cases for which the final rules were evaluated by the masked examiner were: anterior uveitides, 112; intermediate uveitides, 65; posterior uveitides, 122; panuveitides, 96; and infectious posterior and panuveitides, 85. The results of the masked evaluation of the final sets of “rules” are listed in Table 4. The estimates of the accuracies by uveitic class for the masked examiner versus the machine learning results were: anterior uveitides 94.8% (95% CI 89.1, 97.6); intermediate uveitides 100% (95% CI 94.2, 100); posterior uveitides 91.6% (95% CI 85.2, 95.4); panuveitides 93.7% (95% CI 86.9, 97.1); and infectious posterior/panuveitides 91.5% (95% CI 83.4, 95.8). The estimates of the accuracies by uveitic class for the masked examiner versus the consensus diagnosis were: anterior uveitides 96.5% (95% CI 91.4, 98.6); intermediate uveitides 98.4% (95% CI 91.5, 99.7); posterior uveitides 99.2% (95% CI 95.4, 99.9); panuveitides 98.9% (95% CI 94.3, 99.8); and infectious posterior and panuveitides 98.8% (95% CI 93.4, 99.9).

Table 4.

Accuracy of “Final Rules” in Random Sample*

| Uveitic class | Accuracy vs Machine Learning Result (%) | 95% CI | Accuracy vs Consensus Diagnosis (%) | 95% CI |
| --- | --- | --- | --- | --- |
| Anterior uveitides | 94.8 | 89.1, 97.6 | 96.5 | 91.4, 98.6 |
| Intermediate uveitides | 100 | 94.2, 100 | 98.4 | 91.5, 99.7 |
| Posterior uveitides | 91.6 | 85.2, 95.4 | 99.2 | 95.4, 99.9 |
| Panuveitides | 93.7 | 86.9, 97.1 | 98.9 | 94.3, 99.8 |
| Infectious posterior/panuveitides | 91.5 | 83.4, 95.8 | 98.8 | 93.4, 99.9 |

* ~10% random sample of cases within each uveitic class, evaluated by an examiner masked as to the machine learning results and the consensus diagnoses.

95% CI = 95% confidence interval.

Discussion

Classification criteria may take several formats, such as a set of minimal criteria (i.e. a minimal number of criteria from a list), a point system, or a set of required criteria and exclusions. The Systemic Lupus International Collaborating Clinics criteria for systemic lupus erythematosus,39 like the American College of Rheumatology (ACR) criteria before them,40 use a minimal number from a list of criteria; the 2010 ACR/European League Against Rheumatism (EULAR) criteria for rheumatoid arthritis41 and the 2019 EULAR/ACR criteria for systemic lupus erythematosus42 use a point system. The International League of Associations for Rheumatology (ILAR) criteria for juvenile idiopathic arthritis (JIA) use a list of required criteria and exclusions.43 The SUN classification criteria for the uveitides appeared to fit best with an approach similar to that of the ILAR criteria for JIA, and that format was adopted. As evidenced by the accuracy results on the validation set, the SUN criteria appear to perform reasonably well in distinguishing the diseases included in this study.

Given the number of cases, it was decided to analyze the diseases within uveitic class. This approach simplified the analyses without sacrificing accuracy. For example, anterior uveitides are characterized by inflammation detected primarily in the anterior chamber and an absence of chorioretinal inflammatory lesions, whereas posterior uveitides are characterized by lesions in the choroid and/or retina; hence the presence of chorioretinal inflammatory lesions (i.e. uveitic class) distinguishes the two disease sets. Some diseases, such as syphilitic uveitis,36 sarcoidosis-associated uveitis,31 and, in selected situations, tubercular uveitis,38 may present as a disease in more than one uveitic class. For these diseases, the cases from the relevant anatomic class were analyzed with the other diseases in that uveitic class. Infectious retinitides were analyzed separately from the non-infectious posterior uveitides because the latter primarily affect the choroid and/or retinal pigment epithelium,1 so machine learning would separate primarily retinal from primarily choroidal diseases on the basis of anatomy alone. However, relevant cases from selected infectious diseases (e.g. serpiginous-like tubercular choroiditis) also were analyzed with the posterior uveitides, as they are in the differential diagnosis.

In general, traditional names were used for most of these diseases, and diseases were analyzed within the most appropriate uveitic class. For example, the term acute posterior multifocal placoid pigment epitheliopathy was used even though data suggest that it may be primarily a disease of the choriocapillaris,24 and multifocal choroiditis with panuveitis was analyzed as a posterior uveitis (despite its name) because the primary site of inflammation is the choroid; the anterior chamber and vitreous inflammation are variable and not always present.27

There are limitations to this work. The primary one is the retrospective nature of data collection. Because of the selective Bayesian approach to laboratory testing now advocated for clinical care,1 not all data on all tests were available for all cases, and results needed to be imputed. In selected diseases, this limitation may have led to overestimation of the value of the test in question. Sensitivity analyses, in which the population frequency of the test was imputed randomly to the control diseases, were performed to determine whether the test’s value was qualitatively similar; these analyses are discussed with the appropriate diseases.25,37 The database transformation and feature engineering needed to enable machine learning have the potential to introduce error into the system, as does the conversion of the variables used in machine learning to clinically usable rules. The masked evaluation of the final rules suggests that the conversion of variables to rules did not introduce substantial error into the process, as there was >90% agreement between the masked evaluation of the rules and the machine learning results and >95% agreement between the masked evaluation of the rules and the consensus diagnoses.

Uveitis experts generally are associated with academic medical centers in more resource-abundant countries, as is true of the SUN Working Group. Although there are differences between academic medical centers and community ophthalmologists in the distribution of uveitis cases seen (a higher proportion of the cases seen by community ophthalmologists are anterior uveitis), the clinical features of the diagnosed uveitic diseases appear to be similar.45 Because the incidence of the individual diseases varies regionally, cases were collected from multiple uveitis centers on 5 continents in order to obtain adequate numbers of cases, and an effort was made to collect cases of each disease from several centers and all regions. As such, the criteria should have reasonable generalizability, and for selected diseases (e.g. cytomegalovirus anterior uveitis, Behçet disease uveitis, tubercular uveitis) a comparison of cases from different regions was made to investigate the generalizability.14,30,38

Several diseases have criteria that include the level of inflammation detected in either the anterior chamber or the vitreous.25,26,28,29,34,35 Although operationally in the machine learning this translated into a threshold cut-off on a semi-quantitative scale (e.g. ≤½+ or ≥2+), the criteria use terms such as “absent”, “minimal”, “mild”, and “moderate”. This decision was made for two reasons. First, given the retrospective nature of the data, it was highly likely that not all cases (or even the majority) used the SUN grading schema for anterior chamber and vitreous inflammation,2 and different scales have different numbers of steps, making comparability more difficult, even though qualitative meanings are attached to the numerical grades.2 Second, even though the SUN grading schemas have substantial to almost perfect agreement within one grade, the exact agreement is moderate,46 so that a threshold of ½+ for one examiner might be 1+ for another, even when both agree on the qualitative nature of the inflammation. Nevertheless, in general, the qualitative terms map to the semi-quantitative SUN grades as follows: absent, grade 0; minimal, grade ½+; mild, grade 1+; and moderate, grade 2+. More severe grades (i.e. 3+ and 4+) were subsumed under the phrase “moderate or greater”. For some of the posterior uveitides, the topographic location of the posterior segment lesions is important and is described as posterior pole, mid-periphery, or periphery.27,28 In general, these areas correspond to the area within the arcades or adjacent to the optic nerve (posterior pole), the area from the posterior pole to the equator (mid-periphery), and the area anterior to the equator (periphery).

The exclusions in the individual disease criteria list findings that, if present, exclude the disease in question. In many cases an exclusion criterion will suggest an alternate diagnosis, but it also may merely cast sufficient uncertainty on the diagnosis that including the case in a research study would be deemed inappropriate. In prospective studies, it is anticipated that these findings, imaging studies, and tests will be actively sought or performed. However, it is recognized that in retrospective studies they may not always have been performed, and that in low-prevalence situations they appropriately may not be performed (e.g. Lyme disease testing in low-prevalence regions).21–23 Hence, in some situations not performing a test does not exclude the disease for reporting purposes, but a result consistent with an exclusion criterion does. For example, a patient with cytomegalovirus anterior uveitis diagnosed on a polymerase chain reaction (PCR)-based assay of an aqueous specimen would be classified as having cytomegalovirus anterior uveitis even if a syphilis serology had not been performed.

Polymerase chain reaction assays for the presence of pathogen nucleic acids performed on intraocular fluids (either aqueous or vitreous) are a useful adjunct for the infectious uveitides, and a positive result on a PCR assay is a criterion supporting the diagnosis of several conditions, such as the viral anterior uveitides, the viral retinitides, and toxoplasmic retinitis.14–16,34,35,37 Polymerase chain reaction assays should be performed using validated assays, either with an established sensitivity cutoff or with quantitation (qPCR). Appropriate positive and negative controls, including controls for sensitivity and for the presence of inhibitors in ocular fluids, should be run with each sample by a certified clinical laboratory. Interpretation of PCR results depends on the clinical context: the possibility of detection of latent or dead organisms or of carry-over contamination should be considered for positive results, and the possibility of a low yield of infectious organisms from an ocular biopsy should be considered for negative results. In the future, whole genome sequencing techniques may provide equivalent or superior information, but this technology has not yet entered routine clinical use.

Classification criteria are employed to diagnose individual cases for research purposes and attempt to define a homogeneous phenotype for research.3 Classification criteria differ from clinical diagnostic criteria in that, although both attempt to optimize sensitivity and specificity, when a trade-off is needed, classification criteria emphasize specificity.3 The machine learning process employed did not explicitly use sensitivity and specificity; instead, it minimized the misclassification rate (i.e. maximized accuracy). A potential limitation of accuracy is that the results may be dominated by diseases with large prevalences in the sample. Therefore, we also calculated the balanced accuracy for each disease class. Whereas accuracy in our data may be thought of as a weighted average of the disease sensitivities, with weights given by the prevalences of the individual diseases in the sample, balanced accuracy is the unweighted average of the sensitivities. Accuracy and balanced accuracy gave qualitatively and quantitatively similar results.

Two other measures used to judge the performance of a classifier are precision (positive predictive value) and recall (sensitivity). Unlike accuracy, which is calculated from the confusion matrix as a whole, these two measures are calculated at the level of the individual disease (and therefore on a smaller sample size). We were interested not only in the performance for each disease but also in the performance within the class (e.g. how well the criteria performed for all anterior uveitides). Because of the relatively smaller size of the validation set confusion matrices, we calculated precision and recall on the training set data as a sensitivity analysis of the performance of these criteria (see supplemental table, available online at ajo.com). In general, these measures are in line with the accuracies of the uveitic classes and the misclassification rates of the individual diseases.14–38

In the absence of a “gold standard” for disease diagnosis, consensus techniques were used to include cases in the final database, which was important for developing accurate criteria, as uveitis expert agreement on diagnosis is moderate at best, with substantial individual variability.4 In machine learning, accuracy depends on the completeness of the training set data. The exclusion of cases during the case selection phase may have limited the disease variability to less than that seen in nature. As a consequence, there may be cases in clinical care that will not “meet criteria” but for which the clinician diagnoses the disease in question, which is appropriate for clinical care. Nevertheless, given the uncertainty of the diagnosis and the limited agreement of clinicians on clinical diagnosis,4 such cases would not be included in research studies, where phenotype homogeneity is important.

Finally, classification criteria evolve over time. For example, the criteria for rheumatoid arthritis, systemic lupus erythematosus, and juvenile idiopathic arthritis all have been revised over time.39–44 For some of the diseases, the SUN criteria are similar to previously proposed criteria but update them with newer information, including virologic, immunogenetic, and imaging criteria.15,17,20,25,31,33,34 These differences are discussed in the individual disease manuscripts. The SUN criteria likely will be updated in the future as new information becomes available; however, such updates should use rigorous scientific approaches and document the resulting improvements in accuracy.

In sum, the classification criteria developed by the SUN Working Group appear to perform sufficiently well for use in clinical and translational research.

Supplementary Material


Grant support:

Supported by grant R01 EY026593 from the National Eye Institute, the National Institutes of Health, Bethesda, MD, USA; the David Brown Fund, New York, NY, USA; the Jillian M. and Lawrence A. Neubauer Foundation, New York, NY, USA; and the New York Eye and Ear Foundation, New York, NY, USA.

Footnotes


Conflict of Interest: Douglas A. Jabs: none; Peter McCluskey: none; Neal Oden: none; Alan G. Palestine: none; Jan Peterson: none; Sophia Saleem: none; Jennifer E. Thorne: Dr. Thorne engaged in part of this research as a consultant and was compensated for the consulting service; Brett E. Trusko: none.

Publisher's Disclaimer: This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

REFERENCES

1. Jabs DA, Busingye J. Approach to the diagnosis of the uveitides. Am J Ophthalmol 2013;156:228–36.
2. Jabs DA, Rosenbaum JT, Nussenblatt RB, the Standardization of Uveitis Nomenclature (SUN) Working Group. Standardization of uveitis nomenclature for reporting clinical data. Report of the first international workshop. Am J Ophthalmol 2005;140:509–16.
3. Aggarwal R, Ringold S, Khanna D, et al. Distinctions between diagnostic and classification criteria. Arthritis Care Res 2015;67:891–7.
4. Jabs DA, Dick A, Doucette JT, Gupta A, Lightman S, McCluskey P, Okada AA, Palestine AG, Rosenbaum JT, Saleem SM, Thorne J, Trusko B, for the Standardization of Uveitis Nomenclature Working Group. Interobserver agreement among uveitis experts on uveitic diagnoses: the Standardization of Uveitis Nomenclature experience. Am J Ophthalmol 2018;186:19–24.
5. Trusko B, Thorne J, Jabs D, et al., Standardization of Uveitis Nomenclature Working Group. The SUN Project. Development of a clinical evidence base utilizing informatics tools and techniques. Methods Inf Med 2013;52:259–65.
6. Okada AA, Jabs DA. The SUN Project. The future is here. Arch Ophthalmol 2013;131:787–9.
7. Delbecq AL, Van de Ven AH, Gustafson DH. Group Techniques for Program Planning. A Guide to Nominal Group and Delphi Processes. Glenview: Scott Foresman & Co.; 1975.
8. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. New York: Chapman and Hall/CRC; 1984.
9. Obuchowski NA. Estimating and comparing diagnostic tests’ accuracy when the gold standard is not binary. Acad Radiol 2005;12:1198–1204.
10. Van Calster B, Vergouwe Y, Looman CWN, Van Belle V, Timmerman D, Steyerberg EW. Assessing the discriminative ability of risk models for more than two outcome categories. Eur J Epidemiol 2012;27:761–70.
11. Van Calster B, Van Belle V, Vergouwe Y, et al. Extending the c-statistic to nominal polytomous outcomes: the Polytomous Discrimination Index. Stat Med 2012;31(23):2610–26.
12. Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw 2010;36:1–13.
13. Dusa A, Thiem A. Enhancing the minimization of Boolean and multivalue output functions with eQMC. J Math Sociol 2015;39:92–108.
14. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for cytomegalovirus anterior uveitis. Am J Ophthalmol 2020;volume:pp.
15. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for herpes simplex anterior uveitis. Am J Ophthalmol 2020;volume:pp.
16. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for varicella zoster anterior uveitis. Am J Ophthalmol 2020;volume:pp.
17. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for Fuchs uveitis syndrome. Am J Ophthalmol 2020;volume:pp.
18. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for juvenile idiopathic arthritis-associated anterior uveitis. Am J Ophthalmol 2020;volume:pp.
19. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for spondyloarthritis/HLA-B27-associated anterior uveitis. Am J Ophthalmol 2020;volume:pp.
20. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for tubulointerstitial nephritis with uveitis. Am J Ophthalmol 2020;volume:pp.
21. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for pars planitis. Am J Ophthalmol 2020;volume:pp.
22. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for intermediate uveitis, non-pars planitis type. Am J Ophthalmol 2020;volume:pp.
23. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for multiple sclerosis-associated intermediate uveitis. Am J Ophthalmol 2020;volume:pp.
24. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for acute posterior multifocal placoid pigment epitheliopathy. Am J Ophthalmol 2020;volume:pp.
25. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for birdshot chorioretinitis. Am J Ophthalmol 2020;volume:pp.
26. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for multiple evanescent white dot syndrome. Am J Ophthalmol 2020;volume:pp.
27. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for multifocal choroiditis with panuveitis. Am J Ophthalmol 2020;volume:pp.
28. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for punctate inner choroiditis. Am J Ophthalmol 2020;volume:pp.
29. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for serpiginous choroiditis. Am J Ophthalmol 2020;volume:pp.
30. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for Behçet disease uveitis. Am J Ophthalmol 2020;volume:pp.
31. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for sarcoidosis-associated uveitis. Am J Ophthalmol 2020;volume:pp.
32. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for sympathetic ophthalmia. Am J Ophthalmol 2020;volume:pp.
33. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for Vogt-Koyanagi-Harada disease. Am J Ophthalmol 2020;volume:pp.
34. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for acute retinal necrosis. Am J Ophthalmol 2020;volume:pp.
35. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for cytomegalovirus retinitis. Am J Ophthalmol 2020;volume:pp.
36. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for syphilitic uveitis. Am J Ophthalmol 2020;volume:pp.
37. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for toxoplasmic retinitis. Am J Ophthalmol 2020;volume:pp.
38. The Standardization of Uveitis Nomenclature (SUN) Working Group. Classification criteria for tubercular uveitis. Am J Ophthalmol 2020;volume:pp.
39. Petri M, Orbai A-M, Alarcon GS, et al. Derivation and validation of the Systemic Lupus International Collaborating Clinics classification criteria for systemic lupus erythematosus. Arthritis Rheum 2012;64:2677–86.
40. Tan EM, Cohen AS, Fries JF, et al. The 1982 revised criteria for the classification of systemic lupus erythematosus. Arthritis Rheum 1982;25:1271–7.
41. Aletaha D, Neogi T, Silman AJ, et al. 2010 rheumatoid arthritis classification criteria. An American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum 2010;62:2569–81.
42. Aringer M, Costenbader K, Daikh D, et al. 2019 European League Against Rheumatism/American College of Rheumatology classification criteria for systemic lupus erythematosus. Arthritis Rheumatol 2019;71:1400–12.
43. Petty RE, Southwood TR, Manners P, et al. International League of Associations for Rheumatology classification of juvenile idiopathic arthritis: second revision, Edmonton 2001. J Rheumatol 2004;31:390–2.
44. Arnett FC, Edworthy SM, Bloch DA, et al. The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. Arthritis Rheum 1988;31:315–24.
45. McCannel CA, Holland GN, Helm CJ, Cornell PJ, Winston JV, Rimmer TG. Causes of uveitis in the general practice of ophthalmology. UCLA Community-Based Uveitis Study Group. Am J Ophthalmol 1996;121:35–46.
46. Kempen JH, Ganesh SK, Sangwan VS, Rathinam SR. Interobserver agreement in grading activity and site of inflammation in eyes of patients with uveitis. Am J Ophthalmol 2008;146:813–8.
