Neurology. 2021 Jul 27;97(4):e369–e377. doi: 10.1212/WNL.0000000000012226

Accuracy of a Deep Learning System for Classification of Papilledema Severity on Ocular Fundus Photographs

Caroline Vasseneix 1, Raymond P Najjar 1, Xinxing Xu 1, Zhiqun Tang 1, Jing Liang Loo 1, Shweta Singhal 1, Sharon Tow 1, Leonard Milea 1, Daniel Shu Wei Ting 1, Yong Liu 1, Tien Y Wong 1, Nancy J Newman 1, Valerie Biousse 1, Dan Milea 1; on behalf of the BONSAI Group
PMCID: PMC8362357  PMID: 34011570

Abstract

Objective

To evaluate the performance of a deep learning system (DLS) in classifying the severity of papilledema associated with increased intracranial pressure on standard retinal fundus photographs.

Methods

A DLS was trained to automatically classify papilledema severity in 965 patients (2,103 mydriatic fundus photographs), representing a multiethnic cohort of patients with confirmed elevated intracranial pressure. Training was performed on 1,052 photographs with mild/moderate papilledema (MP) and 1,051 photographs with severe papilledema (SP) classified by a panel of experts. The performance of the DLS and that of 3 independent neuro-ophthalmologists were tested in 111 patients (214 photographs, 92 with MP and 122 with SP) by calculating the area under the receiver operating characteristics curve (AUC), accuracy, sensitivity, and specificity. Kappa agreement scores between the DLS and each of the 3 graders and among the 3 graders were calculated.

Results

The DLS successfully discriminated between photographs of MP and SP, with an AUC of 0.93 (95% confidence interval [CI] 0.89–0.96) and an accuracy, sensitivity, and specificity of 87.9%, 91.8%, and 82.6%, respectively. This performance was comparable with that of the 3 neuro-ophthalmologists (84.1%, 91.8%, and 73.9%; p = 0.19, p = 1, and p = 0.09, respectively). Misclassification by the DLS was mainly observed for moderate papilledema (Frisén grade 3). The agreement score between the DLS and the neuro-ophthalmologists' majority evaluation was 0.62 (95% CI 0.57–0.68), whereas the intergrader agreement among the 3 neuro-ophthalmologists was 0.54 (95% CI 0.47–0.62).

Conclusions

Our DLS accurately classified the severity of papilledema on an independent set of mydriatic fundus photographs, achieving performance comparable to that of independent neuro-ophthalmologists.

Classification of Evidence

This study provides Class II evidence that a DLS using mydriatic retinal fundus photographs accurately classified the severity of papilledema in patients with a diagnosis of increased intracranial pressure.


Papilledema, defined as optic nerve head swelling associated with any cause of intracranial hypertension, can result in permanent vision loss.1,2 Papilledema severity at presentation is the most important prognostic factor for subsequent visual outcomes.3-9 Patients with severe papilledema may have progressive vision loss and visual field constriction due to retinal nerve fiber loss, thus requiring closer monitoring and more invasive treatment, whereas those with mild papilledema and no optic atrophy usually have good visual outcomes.5,8,9 However, the evaluation of papilledema severity, based on the 5-grade modified Frisén scale classification mainly used in clinical trials (with 1 being very mild papilledema and 5 very severe papilledema),10 is difficult to apply and subject to high variability.10-14 Hence, neurologists, especially those not confident in performing ophthalmoscopy,15 usually rely on ophthalmologists to determine the severity of papilledema.16,17 Fundus photography is now increasingly used in various clinical settings for screening purposes,18,19 and may be augmented with artificial intelligence deep learning techniques for automated image interpretation.20,21 Recently, the Brain and Optic Nerve Study with Artificial Intelligence (BONSAI) deep learning system (DLS)22 was shown to accurately discriminate papilledema from normal and other abnormal optic discs on fundus photographs, with a performance comparable to that of expert neuro-ophthalmologists.23

The aim of the current study was to develop, train, and test a new DLS to automatically classify the severity of papilledema and to compare the performance of this DLS with the classification performance of 3 neuro-ophthalmologists.

Methods

This study, performed by the BONSAI Consortium,22 included investigators and patients from 14 countries.

Our primary research questions, aiming to provide a Class II level of evidence, were the following: (1) Is a DLS capable of discriminating mild to moderate from severe papilledema on mydriatic fundus photographs? (2) Is the DLS's performance in classifying the severity of papilledema on fundus photographs comparable to that of neuro-ophthalmologists?

Standard Protocol Approvals, Registrations, and Patient Consents

The study was approved by the Centralized Institutional Review Board of SingHealth, Singapore, and each contributing institution for any experiments using human subjects, and was conducted in accordance with the Declaration of Helsinki. Informed consent was exempted given the retrospective nature of the study and the use of de-identified ocular fundus photographs.

Inclusion/Exclusion Criteria

The study included unaltered, de-identified digital ocular fundus photographs obtained in patients with confirmed intracranial hypertension and papilledema from 19 neuro-ophthalmology centers participating in the BONSAI consortium.22 The fundus photographs, taken at various fields of view (20°–45°) including the optic disc, were obtained after pupillary dilation, using 15 different cameras, mydriatic or nonmydriatic, depending on the center (table 1).

Table 1.

Cameras Used in the Participating Centers


Experts from each participating center provided photographs of patients with confirmed papilledema (criteria previously published22). Intracranial hypertension was confirmed in every patient by brain imaging (e.g., showing an intracranial mass or venous sinus thrombosis) or elevated CSF opening pressure and follow-up visits. Papilledema was diagnosed only if the optic disc swelling was related to confirmed raised intracranial pressure (ICP). Patients with a diagnosis of idiopathic intracranial hypertension met the modified Dandy criteria.24

The fundus photographs were divided into 2 datasets representing a mix of consecutive and convenience samples. The training dataset, used to train the DLS to classify the severity of papilledema, was composed of all papilledema images previously included in the training cohort of our first BONSAI study.22 The testing dataset was obtained by randomly choosing 222 images from 4 participating centers of the same study, independent of the training dataset, in order to test the performance of the DLS after training and to test 3 independent neuro-ophthalmologists for comparison with the DLS.

Two experts (V.B., N.J.N.) independently reviewed all fundus photographs of the training (2,524 photographs) and testing (222 photographs) datasets and classified papilledema severity according to a simple 2-grade classification (see below). Classification by the experts was performed under standard conditions on a computer screen (LG-34WK650, 100% brightness, 80% contrast) using semi-automated software developed by L.M. and R.P.N.25; the severity scores were automatically recorded in an Excel spreadsheet. In the case of discordance between the 2 experts, the classification was adjudicated by 2 additional neuro-ophthalmologists (D.M., C.V.), and a consensus, used as the reference standard, was obtained for all images. Only patients with active papilledema were included; optic discs with atrophic papilledema (defined as definite atrophy with no active swelling) were excluded from the study (394/2,524 [15.6%] photographs excluded in the training dataset and 8/222 [3.6%] in the testing dataset). Fundus photographs of insufficient quality were also excluded (27/2,524 [1.1%] in the training dataset, none in the testing dataset). A total of 2,103 and 214 fundus photographs were included in the training and testing datasets, respectively (figure 1).

Figure 1. Flowchart of Inclusion and Exclusion of Fundus Photographs.


Process of inclusion and exclusion of fundus photographs in the training and testing datasets.

Papilledema Severity Classification

We created a simple 2-grade papilledema severity classification (figure 2): (1) mild to moderate papilledema, corresponding to Frisén grades 1–3, defined as disc edema with no obscuration of major blood vessels (arteries and veins) on the disc; (2) severe papilledema, that is, Frisén grades 4 and 5, defined as disc edema associated with any obscuration of major blood vessels (arteries and/or veins) on the disc. The presence of hemorrhages and exudates did not influence the classification, and optic discs with atrophic papilledema were excluded from the study.
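For illustration only, this reference-standard rule can be expressed as a simple mapping from the modified Frisén grade; the minimal sketch below uses a function name and labels of our choosing, not part of the study software.

```python
# Minimal sketch of the 2-grade reference standard: Frisén grades 1-3 are labeled
# mild to moderate and grades 4-5 severe; atrophic papilledema was excluded.
# The function name and string labels are illustrative assumptions.
def severity_class(frisen_grade: int) -> str:
    if frisen_grade not in (1, 2, 3, 4, 5):
        raise ValueError("Frisén grade must be 1-5; atrophic papilledema was excluded")
    return "severe" if frisen_grade >= 4 else "mild_to_moderate"
```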

Figure 2. Papilledema Severity Classification.


This figure represents the 2-grade papilledema severity classification system used in our study, separating mild to moderate papilledema (Frisén grade 1 to 3) and severe papilledema (Frisén grade 4 and 5). Mild to moderate papilledema was defined as disc edema with no obscuration of major blood vessels (arteries and veins) on the disc and severe papilledema as disc edema with any obscuration of major blood vessels on the disc.

Study Population

Training Dataset

A total of 2,103 fundus photographs from 965 patients (1,052 photographs of mild to moderate papilledema and 1,051 of severe papilledema) were included in the training dataset, collected from 16 participating centers of BONSAI (table 2).22 A total of 685 patients had photographs taken in both eyes and 146 in 1 eye.

Table 2.

List of Participating Centers and Number of Photographs Included per Center


Testing Dataset

The testing dataset included 214 photographs (92 MP, 122 SP) from 111 patients (103 patients with both eyes imaged and 8 with 1 eye), randomly collected from 4 participating centers (Bangkok, Thailand; Freiburg, Germany; Tehran, Iran; Angers, France) of the BONSAI study (table 2).22

Deep Learning System

Deep learning is a machine learning technique in which multiple layers of convolutional neural networks learn image features and classify images without using hand-crafted features.20 DLSs need to be trained on large datasets, and their classification performance is subsequently evaluated on a held-out part of the training dataset (validation) or on an independent external dataset (testing dataset).26

In our study, the DLS was composed of a segmentation network and a classification network. First, the optic disc was automatically located on the fundus photograph by the segmentation network, as previously described for the BONSAI-DLS.22 The segmentation network is based on U-Net,27 which is widely used for biomedical image segmentation tasks. The U-Net was trained to localize the optic disc region on a total of 6,370 fundus images with masks annotated at the pixel level. The trained U-Net was then applied to each full fundus image to generate the optic disc region automatically, which was used as the input to the classification network.
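To illustrate how the segmentation output can feed the classification network, the sketch below crops a square region around the optic disc from a predicted binary mask; the margin, the 224 × 224 output size, and the function name are assumptions for illustration, not the published pipeline settings.

```python
# Minimal sketch: crop the optic disc region from a fundus photograph using a
# binary mask predicted by a trained segmentation network (e.g., a U-Net).
# The margin, the 224x224 output size, and the function name are illustrative.
import numpy as np
from PIL import Image

def crop_disc(fundus: Image.Image, disc_mask: np.ndarray, margin: float = 0.25) -> Image.Image:
    """Return a square crop centered on the optic disc, given a binary disc mask."""
    ys, xs = np.nonzero(disc_mask)                # pixel coordinates labeled as disc
    if xs.size == 0:
        return fundus.resize((224, 224))          # fall back to the full image if no disc is found
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    side = max(x1 - x0, y1 - y0)
    half = side // 2 + int(side * margin)         # add some peripapillary context
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    box = (int(max(cx - half, 0)), int(max(cy - half, 0)),
           int(min(cx + half, fundus.width)), int(min(cy + half, fundus.height)))
    return fundus.crop(box).resize((224, 224))    # fixed-size input for the classification network
```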

The classification network (VGGNet28) then classified the optic disc into 1 of 2 classes: mild to moderate or severe papilledema. The classification network is based on convolutional neural networks; at the last convolutional layer of the VGGNet, 2 dense layers were added, followed by a SoftMax layer, to obtain the 2-class output. The classification network was initialized with weights pretrained on ImageNet29 and fine-tuned in an end-to-end manner to achieve optimal performance. It was trained on the 2,103 fundus photographs of the training dataset to automatically classify papilledema severity into the 2 classes defined by the reference standard (mild to moderate and severe papilledema). The network weights were updated iteratively with the backpropagation algorithm,30 based on the difference between the DLS outputs and the clinical reference standard severity levels. The trained segmentation and classification networks were then tested on the 214 images of the external testing dataset to obtain the predictions. To report the performance characteristics of this model, the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were calculated. Classification was provided at the eye level for each included image.
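A minimal sketch of this kind of fine-tuning is shown below, assuming PyTorch and a recent torchvision; the hidden-layer size, optimizer, and learning rate are illustrative choices, not the authors' published settings.

```python
# Minimal sketch of a VGG-based 2-class severity classifier initialized with
# ImageNet-pretrained weights and fine-tuned end-to-end with backpropagation.
# Hidden-layer size, optimizer, and learning rate are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)  # ImageNet initialization
model.classifier = nn.Sequential(            # 2 dense layers on top of the convolutional features
    nn.Linear(512 * 7 * 7, 256),
    nn.ReLU(inplace=True),
    nn.Linear(256, 2),                       # logits for mild/moderate vs severe
)

criterion = nn.CrossEntropyLoss()            # softmax is applied internally to the logits
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One backpropagation update from a batch of disc crops and reference-standard labels."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)  # difference between DLS output and reference standard
    loss.backward()                          # backpropagation of the error signal
    optimizer.step()
    return loss.item()

def predict(images: torch.Tensor) -> torch.Tensor:
    """Softmax probabilities for the 2 severity classes at test time."""
    with torch.no_grad():
        return torch.softmax(model(images), dim=1)
```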

Testing of 3 Independent Neuro-ophthalmologists for Comparison With the DLS

In order to compare the performance of the DLS with the classification performance of neuro-ophthalmologists, we tested 3 independent neuro-ophthalmologists (S.S., J.L.L., S.T.) who were asked to independently classify the papilledema severity into one of the 2 previously described classes, after a brief training session. The testing was performed on the same 214 images from the testing dataset used for the DLS. For this purpose, images were presented to the 3 neuro-ophthalmologists on the same individual computer screen (LG-34WK650, 100% brightness, 80% contrast), using the semiautomated software described above.25 The neuro-ophthalmologists were masked to patients' clinical information and to the classification assigned by the other evaluators and by the DLS. For comparisons and statistical analysis purposes, we used a majority agreement grade, defined as the severity classification reported by at least 2 of the 3 participating neuro-ophthalmologists, for each image.
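For clarity, here is a minimal sketch of the majority-agreement grade computation, assuming severity labels encoded as 0 = mild to moderate and 1 = severe (an encoding of our choosing):

```python
# Minimal sketch of the majority-agreement grade: the class reported by at least
# 2 of the 3 neuro-ophthalmologists for each image (0 = mild/moderate, 1 = severe).
import numpy as np

def majority_grade(grader1, grader2, grader3) -> np.ndarray:
    votes = np.stack([grader1, grader2, grader3])   # shape (3, n_images)
    return (votes.sum(axis=0) >= 2).astype(int)     # severe if at least 2 graders say severe

# Example with 4 images: the second and third images are graded severe by majority.
print(majority_grade([0, 1, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1]))  # -> [0 1 1 0]
```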

Statistical Analysis

The performance characteristics of the DLS and of the 3 neuro-ophthalmologists were evaluated by calculating the AUC, sensitivity, specificity, and accuracy in the external testing dataset.
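As an illustration, these metrics could be computed as in the sketch below, assuming arrays holding the reference-standard labels (1 = severe), the DLS probability of severe papilledema, and the thresholded predictions; the variable names are ours.

```python
# Minimal sketch of the reported performance metrics for the 2-class problem,
# assuming 1 = severe papilledema is the positive class (variable names illustrative).
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

def performance(y_true, y_prob, y_pred):
    auc = roc_auc_score(y_true, y_prob)                  # area under the ROC curve
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)                         # severe papilledema correctly identified
    specificity = tn / (tn + fp)                         # mild/moderate correctly identified
    return auc, accuracy_score(y_true, y_pred), sensitivity, specificity
```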

A pairwise McNemar test with Bonferroni correction was used to compare the accuracies, sensitivities, and specificities between the DLS and each neuro-ophthalmologist and between the DLS and the majority agreement among the 3 neuro-ophthalmologists.
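A minimal sketch of one such pairwise comparison is shown below, assuming boolean arrays that record whether the DLS and a given grader classified each image correctly against the reference standard (variable names are ours).

```python
# Minimal sketch of a pairwise McNemar test between the DLS and one grader on the
# same images; dls_correct and grader_correct are illustrative boolean arrays.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_pvalue(dls_correct, grader_correct) -> float:
    dls_correct = np.asarray(dls_correct, dtype=bool)
    grader_correct = np.asarray(grader_correct, dtype=bool)
    table = [[np.sum(dls_correct & grader_correct), np.sum(dls_correct & ~grader_correct)],
             [np.sum(~dls_correct & grader_correct), np.sum(~dls_correct & ~grader_correct)]]
    return mcnemar(table, exact=True).pvalue

# Bonferroni correction: multiply each p value by the number of pairwise comparisons
# (or compare against 0.05 divided by that number) before declaring significance.
```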

Cohen kappa agreement scores were used for comparisons between the DLS and each of the 3 neuro-ophthalmologists and between the DLS and the majority agreement among the 3 neuro-ophthalmologists, and the Fleiss kappa score31 was used for the intergrader agreement analysis among the 3 neuro-ophthalmologists. Kappa agreement scores were interpreted according to a previously published scale (0–0.20: no agreement; 0.21–0.39: minimal; 0.40–0.59: weak; 0.60–0.79: moderate; 0.80–0.90: strong; >0.90: almost perfect).32
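A minimal sketch of both agreement statistics follows, assuming per-image severity labels encoded as 0/1; the toy arrays are illustrative, not study data.

```python
# Minimal sketch of the agreement analyses: Cohen kappa between the DLS and one
# grader, and Fleiss kappa across the 3 graders (toy 0/1 labels for illustration).
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

dls     = np.array([0, 1, 1, 0, 1])
grader1 = np.array([0, 1, 0, 0, 1])
grader2 = np.array([0, 1, 1, 0, 1])
grader3 = np.array([1, 1, 0, 0, 1])

print(cohen_kappa_score(dls, grader1))                   # pairwise, chance-corrected agreement

ratings = np.column_stack([grader1, grader2, grader3])   # shape (n_images, 3 graders)
counts, _ = aggregate_raters(ratings)                    # per-image counts per category
print(fleiss_kappa(counts))                              # intergrader agreement among the 3 graders
```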

A p value less than 0.05 was considered statistically significant.

Data Availability

Anonymized data will be shared upon justified request from any qualified investigator.

Results

Patient and Image Characteristics

The final training dataset included 965 patients (representing 1,777 eyes and 2,103 images); 397 patients (731 eyes) had mild to moderate papilledema, 431 (772 eyes) had severe papilledema, and 137 (274 eyes) had mild to moderate papilledema in one eye and severe papilledema in the other eye. Among those 965 patients, 822 (85%) had papilledema due to idiopathic intracranial hypertension (IIH) and 143 (15%) presented with secondary causes of intracranial hypertension such as cerebral venous sinus thrombosis, meningitis, or brain tumor. Patient demographics and image characteristics are described in table 3.

Table 3.

Demographics and Image Characteristics in the Training and Testing Datasets


The testing dataset included 111 patients (representing 214 eyes or images), of whom 40 (77 eyes or images) had mild to moderate papilledema, 56 (107 eyes or images) had severe papilledema, and 15 (30 eyes or images) had mild to moderate papilledema in one eye and severe papilledema in the other eye. Seventy-five patients (68%) had papilledema from IIH and the remainder had papilledema from secondary causes.

Papilledema Severity Classification by the DLS and Neuro-ophthalmologists

In the testing dataset, the DLS successfully discriminated severe from mild to moderate papilledema with an AUC of 0.93 (95% confidence interval [CI] 0.89–0.96), an accuracy of 87.9% (95% CI 82.7%–91.9%), a sensitivity of 91.8% (95% CI 86.9%–96.7%), and a specificity of 82.6% (95% CI 74.9%–90.4%) (figures 3 and 4). Among the 26 images misclassified by the DLS, 16 photographs of mild to moderate papilledema were misclassified as severe, of which 14 (87.5%) were Frisén grade 3 and 2 were Frisén grade 2; 10 images of severe papilledema were misclassified as mild to moderate, all Frisén grade 4 with minimal vessel obscuration on the disc or with hemorrhages and cotton-wool spots on the disc (eFigures 1 and 2, data available from Dryad, doi.org/10.5061/dryad.66t1g1k1x). The ability of the 3 independent neuro-ophthalmologists to discriminate severe from mild to moderate papilledema was comparable to that of the DLS, with an accuracy of the majority agreement among neuro-ophthalmologists of 84.1% (95% CI 78.5%–88.7%, p = 0.19), a sensitivity of 91.8% (95% CI 86.9%–96.7%, p = 1), and a specificity of 73.9% (95% CI 64.9%–82.9%, p = 0.09) (figures 3 and 4).

Figure 3. Performance (Area Under the Receiver Operating Characteristic [ROC] Curve) of the Deep Learning System (DLS) and the 3 Neuro-ophthalmologists in Classifying Papilledema Severity.


ROC curve representing the performance of the DLS, 3 neuro-ophthalmology graders, and the majority agreement among the 3 neuro-ophthalmologists (obtained when at least 2 graders agreed on the grade) in discriminating severe papilledema (SP) from mild to moderate papilledema (MP) on 214 fundus photographs (92 MP and 122 SP). The optimal performance of the DLS (blue dot) was calculated with the Youden index for best cutoff between sensitivity and specificity.
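For reference, here is a minimal sketch of how a Youden-index operating point can be selected on an ROC curve; the variable names are ours, and this is a generic illustration rather than the study code.

```python
# Minimal sketch: choose the probability cutoff maximizing the Youden index
# J = sensitivity + specificity - 1 along the ROC curve.
import numpy as np
from sklearn.metrics import roc_curve

def youden_cutoff(y_true, y_prob) -> float:
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    j = tpr - fpr                        # Youden index at each candidate threshold
    return float(thresholds[np.argmax(j)])
```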

Figure 4. Performance (Accuracy, Sensitivity, Specificity) of the Deep Learning System (DLS) and the 3 Neuro-ophthalmologists in Classifying Papilledema Severity.


Comparison of the performance (accuracy, sensitivity, specificity) of the DLS to (A) the detailed performance of each grader and (B) the performance of the majority agreement among the 3 graders in discriminating severe papilledema (SP) from mild to moderate papilledema (MP) on 214 fundus photographs (92 MP and 122 SP). Statistical significance of DLS vs human graders *p < 0.05; **p < 0.01; ***p < 0.001. Data are represented as mean (95% confidence interval [CI]). 95% CIs were calculated using the asymptotic method.
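One common reading of the "asymptotic method" is the Wald interval for a proportion; the sketch below is based on that assumption and, for example, reproduces the reported 86.9%–96.7% interval for the DLS sensitivity (112 of 122 severe images correctly classified).

```python
# Minimal sketch of an asymptotic (Wald) 95% CI for a proportion such as accuracy,
# sensitivity, or specificity; assuming this is the intended "asymptotic method".
import math

def wald_ci(successes: int, n: int, z: float = 1.96):
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

print(wald_ci(112, 122))   # ~ (0.869, 0.967), matching the reported DLS sensitivity CI
```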

Agreement scores between the DLS and neuro-ophthalmologists 1, 2, and 3 were 0.72 (95% CI 0.67–0.76), 0.43 (95% CI 0.37–0.49), and 0.60 (95% CI 0.55–0.65), respectively, and the agreement between the DLS and the majority agreement of the neuro-ophthalmologists was 0.62 (95% CI 0.57–0.68). A weak intergrader agreement of 0.54 (95% CI 0.47–0.62) was found among the 3 neuro-ophthalmologists. Disagreement among the neuro-ophthalmologists was observed for 67 photographs, of which 46 (68.6%) were photographs of moderate papilledema (Frisén grade 3) and the other 21 were photographs of severe papilledema (Frisén grade 4).

Discussion

In this study, a DLS trained on 2,103 ocular fundus photographs to classify the severity of papilledema from intracranial hypertension discriminated between mild to moderate and severe papilledema on an independent testing dataset of 214 fundus photographs, with a comparable performance to that of 3 neuro-ophthalmologists.

Papilledema33,34 is the only objective clinical sign of intracranial hypertension.35,36 Because the degree of papilledema at presentation is a reliable indicator of subsequent visual outcomes from secondary optic atrophy,3-9 the severity of papilledema influences the management and the frequency of visual monitoring during follow-up of patients with elevated ICP.37,38 Our simple binary classification of papilledema severity aimed at identifying patients with a lower risk (mild to moderate papilledema) or higher risk (severe papilledema) of visual loss.5,6,8,9 Hence, a patient with mild to moderate papilledema from IIH might be managed with weight loss and oral medications as an outpatient, whereas a patient with severe papilledema should have closer follow-up, sometimes inpatient, and benefit from timely surgical intervention, especially in cases with visual loss.38 A DLS capable of automatically classifying the severity of papilledema on fundus photographs could assist nonexperts in establishing a more accurate prognosis, choosing a treatment strategy, and assessing the effect of treatment, serving as a diagnostic or prognostic tool alongside the clinical examination and additional tests such as computer-assisted perimetry or optical coherence tomography (OCT).

It could be argued that ophthalmology consultation would obviate the need for automated fundus photographic interpretation by a DLS. However, despite the simplification of the papilledema severity classification in our study, the intergrader agreement among the 3 neuro-ophthalmologists was relatively weak (0.54), confirming the variability of human evaluation of papilledema severity, particularly for moderate papilledema (Frisén grade 3).13 Nevertheless, our DLS could accurately distinguish between mild to moderate and severe papilledema on fundus photographs. The performance of the DLS was similar to that of the 3 independent neuro-ophthalmologists, with comparable accuracy and sensitivity and with a nonsignificantly higher specificity (82.6% for the DLS vs 73.9% for the majority agreement among the 3 neuro-ophthalmologists, p = 0.09). The agreement score between the DLS and the majority agreement among the 3 neuro-ophthalmologists (κ = 0.62) was higher than the intergrader agreement among the 3 neuro-ophthalmologists (κ = 0.54). Moderate intergrader agreement scores have also been observed in studies involving glaucoma or retinal diseases, for example, for glaucomatous damage assessment of the optic disc under stereoscopic conditions by 6 glaucoma experts39 (κ = 0.50) or for diagnosis of plus disease in retinopathy of prematurity by 9 experts40 (κ = 0.59 to 0.92). Similar results were previously described with a machine learning technique in a small study,41 which showed that a computer-aided image analysis, used to analyze features of papilledema on fundus photographs, could automatically grade papilledema with substantial agreement with one expert's grading (κ = 0.71).

Our results might have implications for the management of raised ICP and papilledema in the future. However, further prospective validation studies are needed, preferably in nonophthalmologic clinical settings (i.e., neurology clinics, neurosurgery clinics, or emergency departments) and ideally using nonmydriatic digital cameras.15,42 If those studies confirm its applicability, a DLS connected to a camera on site21 or remotely could be used for the assessment of papilledema severity.

Our study has inherent limitations. Because of the retrospective data collection, visual function data were not available, and objective data such as OCT retinal nerve fiber layer thickness or macular ganglion cell complex (GCC) analysis were not systematically collected. Moreover, atrophic papilledema was excluded from this study, as we only trained the DLS to grade the severity of papilledema, not to identify associated atrophy, a difficult assessment even for the most experienced ophthalmologists.13 Hence, some of the patients with longstanding raised ICP who had both atrophy and residual papilledema at first presentation were excluded. In a future project, GCC-OCT, which has proven useful in the detection of optic atrophy associated with papilledema,43,44 could also be incorporated into the DLS strategy, as already done in some glaucoma studies.45 We used a simplified classification of papilledema into 2 grades, instead of the 5 grades used in the Frisén scale. The Frisén scale is notoriously difficult to use, especially when attempting to differentiate grades 3 and 4.13,14 The few cases of misclassification by the DLS were mainly observed for moderate papilledema (Frisén grade 3) or for Frisén grade 4 papilledema with mild vessel obscuration on the optic disc or associated with hemorrhages or cotton-wool spots. Those images were also challenging for the experts to classify. The lack of reproducibility and the inability of the Frisén scale to discern optic disc changes over time were demonstrated by Sinclair et al.,13 who proposed an alternative ranking of optic disc appearance related to papilledema severity that incorporates the development of secondary optic atrophy, improving complete agreement among reviewers from 1.6% of photographs with the Frisén scale to 44.6% with their optic disc ranking scheme. In the prospective Intracranial Hypertension Treatment Trial, experts initially agreed on the Frisén scale classification for only 42% of images, and intragrader agreement rates varied from 55% to 73%.14 Our simplified binary classification was designed to signal the presence of severe papilledema, a finding that should influence the acute management of these patients by nonophthalmologists.

We developed, trained, and tested a DLS that accurately discriminated mild to moderate from severe papilledema on mydriatic fundus photographs. In a subsequent comparison, the DLS had performance comparable to that of 3 independent neuro-ophthalmologists. The automated recognition of severe papilledema by a DLS could be helpful in neurology, neurosurgery, and emergency settings for the management of patients with raised ICP. Additional prospective studies are needed to confirm the applicability of this DLS in real-life clinical settings.

Glossary

AUC

area under the receiver operating characteristic curve

BONSAI

Brain and Optic Nerve Study with Artificial Intelligence

CI

confidence interval

DLS

deep learning system

GCC

ganglion cell complex

ICP

intracranial pressure

IIH

idiopathic intracranial hypertension

OCT

optical coherence tomography

Appendix 1. Authors


Appendix 2. Coinvestigators


Footnotes

Class of Evidence: NPub.org/coe

Contributor Information

Collaborators: on behalf of the BONSAI Group, Philippe Gohier, Neil Miller, Tanyatuth Padungkiatsagul, Anuchit Poonyathalang, Yanin Suwan, Kavin Vanikieti, Giulia Amore, Piero Barboni, Michele Carbonelli, Valerio Carelli, Chiara La Morgia, Martina Romagnoli, Marie-Bénédicte Rougier, Selvakumar Ambika, Komma Swetha, Pedro Fonseca, Miguel Raimundo, Steffen Hamann, Isabelle Karlesand, Lars Fuhrmann, Sebastian Küchlin, Wolf Alexander Lagrèze, Nicolae Sanda, Gabriele Thumann, Florent Aptel, Christophe Chiquet, Kaiqun Liu, Hui Yang, Carmen KM Chan, Noel CY Chan, Carol Y Cheung, Tran Thi Ha Chau, James Acheson, Maged S Habib, Neringa Jurkute, Patrick Yu-Wai-Man, Richard Kho, Jost B Jonas, John J. Chen, Nouran Sabbagh, Catherine Vignal-Clermont, Rabih Hage, Raoul Kanav Khanna, Jeong-Min Hwang, Dong Hyun Kim, Hee Kyung Yang, Tin Aung, Ching-Yu Cheng, Ecosse Lamoureux, Leopold Schmetterer, Zhubo Jiang, Clare L Fraser, Luis J. Mejico, and Masoud Aghsaei Fard

Study Funding

Singapore National Medical Research Council (Clinician Scientist Individual Research grant CIRG18Nov-0013), Duke-NUS Medical School Ophthalmology and Visual Sciences Academic Clinical Program departmental grant (05/FY2019/P2/06-A60), NIH/NEI core grant P30-EY06360 (Department of Ophthalmology, Emory University School of Medicine, Atlanta, GA), and NIH/NINDS (RO1NSO89694).

Disclosure

C. Vasseneix, R.P. Najjar, X. Xu, Z. Tang, J.L. Loo, S. Singhal, S. Tow, L. Milea, D. Ting, Y. Liu, and T.Y. Wong report no disclosures relevant to the manuscript. N.J. Newman is a consultant for GenSight Biologics and Neurophoenix, Santhera Pharmaceuticals/Chiesi, and Stealth BioTherapeutics. V. Biousse is a consultant for GenSight Biologics and Neurophoenix. D. Milea reports no disclosures relevant to the manuscript. Go to Neurology.org/N for full disclosures.

References

1. Friedman DI. The pseudotumor cerebri syndrome. Neurol Clin. 2014;32(2):363-396.
2. Corbett JJ. Visual loss in pseudotumor cerebri: follow-up of 57 patients from five to 41 years and a profile of 14 patients with permanent severe visual loss. Arch Neurol. 1982;39(8):461-474.
3. Orcutt JC, Page NGR, Sanders MD. Factors affecting visual loss in benign intracranial hypertension. Ophthalmology. 1984;91(11):1303-1312.
4. Liu KC, Bhatti MT, Chen JJ, et al. Presentation and progression of papilledema in cerebral venous sinus thrombosis. Am J Ophthalmol. 2020;213:1-8.
5. Chen JJ, Thurtell MJ, Longmuir RA, et al. Causes and prognosis of visual acuity loss at the time of initial presentation in idiopathic intracranial hypertension. Invest Ophthalmol Vis Sci. 2015;56(6):3850-3859.
6. Gospe SM, Bhatti MT, El-Dairi MA. Anatomic and visual function outcomes in paediatric idiopathic intracranial hypertension. Br J Ophthalmol. 2016;100(4):505-509.
7. Takkar A, Goyal MK, Bansal R, Lal V. Clinical and neuro-ophthalmologic predictors of visual outcome in idiopathic intracranial hypertension. Neuroophthalmology. 2018;42(4):201-208.
8. Wall M, Falardeau J, Fletcher WA, et al. Risk factors for poor visual outcome in patients with idiopathic intracranial hypertension. Neurology. 2015;85(9):799-805.
9. Micieli JA, Bruce BB, Vasseneix C, et al. Optic nerve appearance as a predictor of visual outcome in patients with idiopathic intracranial hypertension. Br J Ophthalmol. 2019;103(10):1429-1435.
10. Scott CJ. Diagnosis and grading of papilledema in patients with raised intracranial pressure using optical coherence tomography vs clinical expert assessment using a clinical staging scale. Arch Ophthalmol. 2010;128(6):705-711.
11. Frisen L. Swelling of the optic nerve head: a staging scheme. J Neurol Neurosurg Psychiatry. 1982;45(1):13-18.
12. Kardon R. Optical coherence tomography in papilledema: what am I missing? J Neuroophthalmol. 2014;34(suppl):S10-S17.
13. Sinclair AJ, Burdon MA, Nightingale PG, et al. Rating papilloedema: an evaluation of the Frisén classification in idiopathic intracranial hypertension. J Neurol. 2012;259(7):1406-1412.
14. Fischer WS, Wall M, McDermott MP, Kupersmith MJ, Feldon SE. Photographic reading center of the idiopathic intracranial hypertension treatment trial (IIHTT): methods and baseline results. Invest Ophthalmol Vis Sci. 2015;56(5):3292-3303.
15. Bruce BB, Lamirel C, Wright DW, et al. Nonmydriatic ocular fundus photography in the emergency department. N Engl J Med. 2011;364(4):387-389.
16. Friesner D, Rosenman R, Lobb BM, Tanne E. Idiopathic intracranial hypertension in the USA: the role of obesity in establishing prevalence and healthcare costs. Obes Rev. 2011;12(5):e372-e380.
17. Mollan SP, Aguiar M, Evison F, Frew E, Sinclair AJ. The expanding burden of idiopathic intracranial hypertension. Eye. 2019;33(3):478-485.
18. Bruce BB, Thulasi P, Fraser CL, et al. Diagnostic accuracy and use of nonmydriatic ocular fundus photography by emergency physicians: phase II of the FOTO-ED study. Ann Emerg Med. 2013;62(1):28-33.
19. Ivan Y, Ramgopal S, Cardenas-Villa M, et al. Feasibility of the digital retinography system camera in the pediatric emergency department. Pediatr Emerg Care. 2018;34(7):488-491.
20. Ting DSW, Pasquale LR, Peng L, et al. Artificial intelligence and deep learning in ophthalmology. Br J Ophthalmol. 2019;103(2):167-175.
21. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39.
22. Milea D, Najjar RP, Zhubo J, et al. Artificial intelligence to detect papilledema from ocular fundus photographs. N Engl J Med. 2020;382(18):1687-1695.
23. Biousse V, Newman NJ, Najjar RP, et al. Optic disc classification by deep learning versus expert neuro-ophthalmologists. Ann Neurol. 2020;88(4):785-795.
24. Friedman DI, Liu GT, Digre KB. Revised diagnostic criteria for the pseudotumor cerebri syndrome in adults and children. Neurology. 2013;81(13):1159-1165.
25. Milea L, Najjar RP. Classif-Eye: A Semi-automated Image Classification Application; 2020. GitHub repository. github.com/milealeonard/Classif-Eye/
26. Milea D, Singhal S, Najjar RP. Artificial intelligence for detection of optic disc abnormalities. Curr Opin Neurol. 2020;33(1):106-110.
27. Falk T, Mai D, Bensch R, et al. U-Net: deep learning for cell counting, detection, and morphometry. Nat Methods. 2019;16(1):67-70.
28. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Presented at the International Conference on Learning Representations (ICLR); April 10, 2015; San Diego, CA.
29. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: a large-scale hierarchical image database. Presented at the Conference on Computer Vision and Pattern Recognition (CVPR); June 20–25, 2009; Miami, FL.
30. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436-444.
31. Vanbelle S. Asymptotic variability of (multilevel) multirater kappa coefficients. Stat Methods Med Res. 2019;28(10-11):3012-3026.
32. McHugh ML. Interrater reliability: the kappa statistic. Biochem Med. 2012;22(3):276-282.
33. Friedman DI, Quiros PA, Subramanian PS, et al. Headache in idiopathic intracranial hypertension: findings from the Idiopathic Intracranial Hypertension Treatment Trial. Headache. 2017;57(8):1195-1205.
34. Fisayo A, Bruce BB, Newman NJ, Biousse V. Overdiagnosis of idiopathic intracranial hypertension. Neurology. 2016;86(4):341-350.
35. Radojicic A, Vukovic-Cvetkovic V, Pekmezovic T, Trajkovic G, Zidverc-Trajkovic J, Jensen RH. Predictive role of presenting symptoms and clinical findings in idiopathic intracranial hypertension. J Neurol Sci. 2019;399:89-93.
36. Crum OM, Kilgore KP, Sharma R, et al. Etiology of papilledema in patients in the eye clinic setting. JAMA Netw Open. 2020;3(6):e206625.
37. Onyia CU, Ogunbameru IO, Dada OA, et al. Idiopathic intracranial hypertension: proposal of a stratification strategy for monitoring risk of disease progression. Clin Neurol Neurosurg. 2019;179:35-41.
38. Mollan SP, Davies B, Silver NC, et al. Idiopathic intracranial hypertension: consensus guidelines on management. J Neurol Neurosurg Psychiatry. 2018;89(10):1088-1100.
39. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma. Ophthalmology. 1992;99(2):215-221.
40. Brown JM, Campbell JP, Beers A, et al.; Imaging and Informatics in Retinopathy of Prematurity (i-ROP) Research Consortium. Automated diagnosis of plus disease in retinopathy of prematurity using deep convolutional neural networks. JAMA Ophthalmol. 2018;136(7):803-810.
41. Echegaray S, Zamora G, Yu H, Luo W, Soliz P, Kardon R. Automated analysis of optic nerve images for detection and staging of papilledema. Invest Ophthalmol Vis Sci. 2011;52(10):7470-7478.
42. Irani NK, Bidot S, Peragallo JH, Esper GJ, Newman NJ, Biousse V. Feasibility of a nonmydriatic ocular fundus camera in an outpatient neurology clinic. Neurologist. 2020;25(2):19-23.
43. Moreno-Ajona D, McHugh JA, Hoffmann J. An update on imaging in idiopathic intracranial hypertension. Front Neurol. 2020;11:453.
44. Athappilly G, García-Basterra I, Machado-Miller F, Hedges TR, Mendoza-Santiesteban C, Vuong L. Ganglion cell complex analysis as a potential indicator of early neuronal loss in idiopathic intracranial hypertension. Neuroophthalmology. 2018;43(1):10-17.
45. Medeiros FA, Jammal AA, Thompson AC. From machine to machine. Ophthalmology. 2019;126(4):513-521.
