Use of Deep Learning Models in the Diagnosis of Proptosis Through Orbital Magnetic Resonance Imaging

Uğur Kesimal; Habip Eser Akkaya; Önder Polat; Murat Sağlam

doi:10.12659/MSM.951157

2026 Apr 18;32:e951157. doi: 10.12659/MSM.951157

PMCID: PMC13101562 PMID: 41999029

Abstract

Background

Proptosis is a common manifestation of orbital disease; however, current diagnostic tools, such as the Hertel exophthalmometer and manual radiological measurements, have limited reproducibility and are observer-dependent. More objective, automated approaches are needed.

Material/Methods

In this single-center retrospective study, orbital magnetic resonance imaging (MRI) examinations from 521 participants (261 with proptosis, 260 controls) were analyzed. Proptosis was defined on MRI using interzygomatic line–based distance criteria. Three-dimensional convolutional neural network models based on DenseNet121, DenseNet169, DenseNet264, and ResNet50 architectures were trained on volumetric orbital MRI data. Data were divided into training, validation, and test sets, and 5-fold cross-validation with early stopping was used to optimize and validate model performance. Diagnostic performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.

Results

DenseNet121 achieved the best overall performance, with mean accuracy of 95.0%, AUC of 0.986, sensitivity of 92.7%, and specificity of 96.9% across 5-fold cross-validation.

Conclusions

To the best of our knowledge, prior artificial intelligence studies in orbital imaging have primarily focused on CT-based measurements, radiomics approaches, or thyroid-associated orbitopathy assessment rather than end-to-end 3-dimensional deep learning analysis of orbital MRI volumes. In this context, the present study explores a volumetric MRI-based deep learning framework for automated proptosis detection, emphasizing patient-level classification and explainable model interpretation.

Keywords: Deep Learning; Magnetic Resonance Imaging; Neural Networks, Computer; Radionuclide Imaging; Prostaglandins G; Artificial Intelligence; Cross-Sectional Studies; Radiotherapy; Professional-Patient Relations

Introduction

Proptosis, the forward protrusion of the eyeball, is a clinically noticeable sign of orbital disease [1–3]. In adults, the most common cause is thyroid ophthalmopathy (Graves’ ophthalmopathy), which is most often associated with Graves’ disease. However, a variety of other etiological factors can contribute to the development of proptosis. For example, inflammatory diseases, such as sarcoidosis and orbital pseudotumor, can induce periorbital edema and inflammation, resulting in proptosis. Neoplasms are also a significant cause, with tumors such as glioma, lymphoma, and metastases affecting the ocular or orbital tissues. Other notable causes include trauma, infections, lesions of the lacrimal gland, and pathologies affecting the orbital bones [4–6]. Given this diversity, the accurate diagnosis and grading of proptosis require a detailed evaluation.

Proptosis is most commonly assessed with the Hertel exophthalmometer [7–9]. This instrument quantifies the degree of proptosis by measuring the distance from the lateral orbital rim to the corneal surface along a plane perpendicular to the frontal plane. However, its reproducibility is low both between observers and across repeated measurements by the same observer, which limits the reliability of its results [1,10]. Radiological imaging therefore plays a crucial role in the diagnosis and grading of proptosis, with computed tomography (CT) and magnetic resonance imaging (MRI) being the preferred modalities [6]. CT is advantageous for evaluating bony structures, whereas MRI offers superior soft tissue detail, making it particularly useful for diagnosing inflammatory conditions and tumors. Although various methods for measuring and grading proptosis on imaging have been described, they differ considerably in interobserver and intraobserver reliability. Manual measurement of proptosis on CT or MRI is time-consuming and depends on the radiologist’s subjective interpretation, resulting in low reproducibility across measurements [11–14].

Accurate assessment of proptosis requires not only determining its severity but also analyzing its vector, evaluating associated clinical findings, and establishing whether the condition is unilateral or bilateral. These factors are essential for identifying the potential etiological causes of proptosis and for developing an appropriate treatment plan.

Deep learning, an advanced machine learning approach, has substantially improved medical image analysis, including detection, segmentation, and grading tasks in various organs. However, to date [13], no published studies have applied 3-dimensional (3D) convolutional neural networks (CNNs) to the automated detection of proptosis on orbital MRI. Existing work on thyroid-associated orbitopathy has primarily focused on conventional measurements or radiomics-based approaches on CT and MRI, which still require manual interaction and may be less scalable.

Therefore, the aim of this study was to develop and evaluate 3D deep learning models based on DenseNet and ResNet architectures for automated detection of proptosis using orbital MRI data. We hypothesized that such models would provide high diagnostic performance while offering objective and reproducible assessment, with the potential for integration into routine radiology workflows and screening of patients with suspected thyroid-associated ophthalmopathy.

Material and Methods

Ethics and Data Governance

This retrospective study was approved by the Institutional Ethics Committee of Ankara Training and Research Hospital (approval number: 06.09.23/no: 1362) and was conducted in accordance with the principles of the Declaration of Helsinki. The requirement for written informed consent was waived by the ethics committee owing to the retrospective nature of the study and the use of fully anonymized imaging data.

All MRI examinations were anonymized prior to analysis by removing personal identifiers from DICOM headers and associated clinical metadata. Labeling and data processing were performed on secure institutional workstations within the hospital network environment. Imaging data were not transferred outside the originating institution, and no external cloud-based processing or third-party data sharing was performed during model development or analysis.

Ethics approval was granted for a broader retrospective imaging research project involving orbital MRI analysis and artificial intelligence (AI)–based evaluation of orbital pathologies. The present study represents a predefined sub-analysis within this approved research scope, focusing specifically on automated detection of proptosis using 3D deep learning models. All study procedures were conducted in accordance with the approved protocol.

MRI Technique

Orbital MRI scans for all patients were performed on a 1.5 T Siemens Aera scanner (Erlangen, Germany) in our department. The standard orbital MRI protocol included coronal T1-weighted (T1W), coronal short tau inversion recovery, sagittal T2-weighted (T2W), axial T1W, axial fat-suppressed T2W, and diffusion-weighted imaging sequences. When pathology was suspected during routine imaging, an intravenous gadolinium contrast agent was administered (0.1 mmol/kg), followed by additional transverse and coronal T1W sequences. All examinations were assessed retrospectively on Picture Archiving and Communication System (PACS) monitors together with the available clinical information, and the findings were documented. Diffusion-weighted images were not used as primary model input, to minimize the effect of diffusion-related artifacts on model training and evaluation.

For the evaluation of proptosis, the scan plane was planned to be parallel to the optic nerve head and lens. Patients were instructed to keep their eyes open and look straight ahead without any eye movement. The interzygomatic line, which connects the anterior portions of the zygomatic bones, was used as the reference for measuring proptosis. The normal distance from this line to the anterior surface of the globe is up to 23 mm, and any measurement above this value is considered indicative of proptosis. Similarly, the lower limit of normal for the distance from the interzygomatic line to the posterior surface of the globe is 5.9 mm, and measurements below this threshold also suggest proptosis.

Data Collection and Preprocessing

Inclusion Criteria

The inclusion criteria were as follows: (1) patients who underwent orbital MRI between January 2018 and August 2023; (2) availability of complete MRI sequences required for orbital evaluation; (3) for the proptosis group: presence of proptosis confirmed by interzygomatic line–based distance criteria on MRI together with compatible clinical or radiological documentation; and (4) for the control group: absence of clinical and radiological evidence of orbital pathology.

Exclusion Criteria

The exclusion criteria were as follows: (1) incomplete imaging protocols; (2) significant motion artifacts degrading image quality; (3) history of prior orbital surgery; (4) severe craniofacial anatomical deformities affecting orbital measurements; and (5) follow-up MRI examinations from the same patient.

Control Selection and Verification

Control participants were selected from individuals who underwent MRI for non-orbital indications. The absence of orbital pathology was confirmed through both imaging review by an experienced neuroradiologist and evaluation of electronic medical records, including clinical notes and radiology reports.

This single-center retrospective study covered orbital MRI examinations performed at our institution between January 2018 and August 2023. A total of 684 consecutive orbital MRI examinations were screened, and 163 were excluded because of incomplete protocols, motion artifacts, prior orbital surgery, or severe craniofacial deformities. The final cohort consisted of 521 participants: 261 patients with proptosis and 260 controls. The cohort accounting process, including screening, exclusions, and final group allocation, is illustrated in Figure 1.

Patients were eligible for the proptosis group if clinical records and MRI reports documented proptosis and the interzygomatic line–based distance criteria were met. Controls had no clinical or radiological evidence of orbital pathology, and the control group was intentionally matched in size to the proptosis group to reduce class-imbalance bias. Each participant was represented by a single, unique MRI examination; no follow-up scans were included. Unilateral and bilateral cases were eligible; when both eyes of a participant fulfilled the criteria, they were analyzed jointly within the same patient-level data instance. All splits were performed strictly at the patient level to prevent data leakage. No imaging data were missing; therefore, no imputation was performed. All images were anonymized before analysis.

Figure 1.

Flow diagram of cohort construction and participant selection.

To ensure consistent and suitable input for the deep learning models, all MRI scans underwent a standardized preprocessing pipeline before training. First, voxel intensities were normalized to the range [0.0, 1.0] using min–max scaling. Next, to establish a uniform spatial orientation, all volumes were transformed to the right–anterior–superior orientation.

To reduce computational overhead and focus on the relevant anatomy, foreground cropping based on the image intensity profile was applied to remove excess background. The volumes were then resampled and resized to a fixed size of 120×120×22 voxels. Finally, divisible padding (k=32) was applied to ensure compatibility with the downsampling layers of the DenseNet and ResNet architectures. No explicit image registration or multi-scanner harmonization was performed, to preserve the original anatomical proportions of the orbits. No explicit geometric data augmentation (such as random rotations, flips, or scaling) was applied in this initial study, in order to preserve the anatomical geometry of the orbital region; the implications of this choice and plans to explore augmentation in future work are discussed with the study limitations.

All MRI examinations were reviewed and labeled on the PACS workstation by an experienced neuroradiologist using the predefined interzygomatic line–based criteria for the presence or absence of proptosis. The unit of analysis was defined at the patient level: an MRI volume was treated as a positive case if at least 1 eye met the interzygomatic line–based distance criteria for proptosis, whether the finding was unilateral or bilateral. This ensured that the model learned to detect the presence of the pathology regardless of whether it affected 1 or both orbits.
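The intensity scaling and divisible padding steps of the preprocessing pipeline can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: the function names `min_max_scale`, `divisible_pad_shape`, and `pad_widths` are hypothetical, and splitting the padding evenly between both sides of each axis is an assumption. Under these assumptions, padding a 120×120×22 volume to multiples of k=32 yields a 128×128×32 volume.

```python
import math

def min_max_scale(values):
    """Rescale a flat list of voxel intensities to the range [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant image: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def divisible_pad_shape(shape, k=32):
    """Smallest shape that is >= `shape` and has every side divisible by k."""
    return tuple(math.ceil(s / k) * k for s in shape)

def pad_widths(shape, k=32):
    """Per-axis (before, after) padding, split as evenly as possible (assumption)."""
    widths = []
    for size, target in zip(shape, divisible_pad_shape(shape, k)):
        total = target - size
        widths.append((total // 2, total - total // 2))
    return widths

print(min_max_scale([0, 50, 200]))          # [0.0, 0.25, 1.0]
print(divisible_pad_shape((120, 120, 22)))  # (128, 128, 32)
print(pad_widths((120, 120, 22)))           # [(4, 4), (4, 4), (5, 5)]
```

Padding to a multiple of 32 means the volume can be halved 5 times along each axis without producing fractional feature-map sizes, which is why such padding is commonly paired with networks that downsample repeatedly.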

Labeling Protocol

Labeling was performed by a neuroradiologist with over 12 years of experience, blinded to clinical diagnosis and AI outputs. Formal interobserver agreement analysis was not conducted and is acknowledged as a limitation.

All labels were assigned retrospectively on the PACS workstation by an experienced neuroradiologist using a standardized measurement protocol based on the interzygomatic line. For each MRI examination, axial images demonstrating the clearest visualization of both the anterior zygomatic bones and the maximal anteroposterior extent of the globes were selected for measurement, and the same approach was applied consistently across all cases. The interzygomatic line was defined as a straight line connecting the anterior cortices of the right and left zygomatic bones on the selected axial slice and was used as a fixed bony reference. The anterior globe margin was identified as the most anterior point of the globe contour, corresponding to the corneal apex or anterior scleral surface, whereas the posterior globe margin was defined as the most posterior point of the globe contour along the same anteroposterior axis. Using PACS measurement tools, perpendicular distances from the interzygomatic line to both the anterior and posterior globe margins were obtained. An MRI examination was labeled as proptosis-positive if at least 1 eye met the predefined criteria of an anterior distance greater than 23 mm and/or a posterior distance less than 5.9 mm; otherwise, the examination was labeled as proptosis-negative. Both orbits were evaluated within the same examination, and patient-level labeling was applied, such that unilateral or bilateral findings resulted in a positive label for the entire MRI volume.

Proptosis classification was based on predefined interzygomatic line–based distance thresholds applied at the patient level. An MRI examination was classified as proptosis-positive if at least 1 eye met either of the following criteria: an anterior globe distance greater than 23 mm or a posterior globe distance less than 5.9 mm. Both measurements were obtained independently, and no requirement was imposed for simultaneous fulfillment of both thresholds. In cases in which the anterior and posterior measurements were inconsistent, the presence of either abnormal measurement was considered sufficient for assigning a positive label. No additional adjudication rule or secondary consensus review was applied, as labeling followed a predefined rule-based protocol. If neither criterion was met in both eyes, the examination was labeled as proptosis-negative.
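The rule-based labeling protocol described above can be expressed as a short Python sketch. The thresholds (23 mm and 5.9 mm) come from the text; the function and constant names are hypothetical, and the measurements in the usage lines are invented values for illustration only.

```python
ANTERIOR_MAX_MM = 23.0   # anterior globe distance above this is abnormal
POSTERIOR_MIN_MM = 5.9   # posterior globe distance below this is abnormal

def eye_is_proptotic(anterior_mm, posterior_mm):
    """Either abnormal measurement alone is sufficient for a positive eye."""
    return anterior_mm > ANTERIOR_MAX_MM or posterior_mm < POSTERIOR_MIN_MM

def label_examination(right_eye, left_eye):
    """Patient-level label: 1 if at least one eye is positive, else 0.

    Each eye is a tuple (anterior_mm, posterior_mm) measured perpendicular
    to the interzygomatic line.
    """
    return int(eye_is_proptotic(*right_eye) or eye_is_proptotic(*left_eye))

# Invented example measurements (not study data):
print(label_examination((21.0, 8.0), (20.5, 7.5)))  # 0: both eyes within limits
print(label_examination((24.1, 8.0), (20.5, 7.5)))  # 1: unilateral, anterior criterion
print(label_examination((22.0, 5.0), (22.0, 7.0)))  # 1: unilateral, posterior criterion
print(label_examination((23.0, 5.9), (23.0, 5.9)))  # 0: values at the thresholds are not abnormal
```

Because the comparison operators are strict, measurements exactly equal to a threshold are labeled negative, which matches the "greater than" and "less than" wording of the criteria.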

Model Architectures

Four deep learning models were trained and evaluated: DenseNet121, DenseNet169, DenseNet264, and ResNet50, with the numbers at the end of each model name indicating the number of layers. All models are based on CNN architectures, which are highly effective for processing image data and are widely used in tasks involving feature extraction and classification. Convolutional layers, which are the main components of CNNs, enable the capture of local features within images, allowing deeper layers to learn more complex structures. The DenseNet and ResNet architectures used in this study are particularly noteworthy for their distinct connection strategies that enhance performance in deep network architectures. These models were selected owing to their proven effectiveness, especially in the field of medical imaging. DenseNet increases the reuse of learned features by establishing dense connections between layers, while ResNet addresses the vanishing gradient problem in deep networks through the use of residual connections.
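The effect of dense connectivity on feature reuse can be made concrete by tracking how the number of feature channels grows through a DenseNet. The sketch below assumes the standard published DenseNet settings (64 initial feature maps, a growth rate of 32, and transition layers that halve the channel count) and the usual block configurations for each variant; whether the 3D models in this study used exactly these settings is not stated in the text, so this should be read as an illustration rather than a description of the trained networks.

```python
def densenet_feature_channels(block_config, growth_rate=32, init_features=64):
    """Number of feature channels entering the classifier of a DenseNet."""
    channels = init_features
    for i, num_layers in enumerate(block_config):
        # Dense block: each layer concatenates `growth_rate` new feature maps
        # onto the outputs of all earlier layers in the block.
        channels += num_layers * growth_rate
        if i != len(block_config) - 1:
            channels //= 2  # transition layer halves the channel count
    return channels

# Standard block configurations (assumed to carry over to the 3D versions)
CONFIGS = {
    "DenseNet121": (6, 12, 24, 16),
    "DenseNet169": (6, 12, 32, 32),
    "DenseNet264": (6, 12, 64, 48),
}

for name, config in CONFIGS.items():
    print(name, densenet_feature_channels(config))
# DenseNet121 1024
# DenseNet169 1664
# DenseNet264 2688
```

Under these assumptions, the deeper variants accumulate substantially more channels, and therefore more parameters, than DenseNet121.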

Owing to the 3D nature of orbital MRI data, 3D CNNs were used. These networks take the volumetric structure of the data into account and extract spatial features along all 3 axes. As a result, the model can learn from the entire MRI volume rather than from individual 2D slices, which provides richer information. In this study, both the DenseNet and ResNet architectures were implemented in their 3D versions, so that each model’s convolutional layers process structural information along all 3 axes using 3D filters. The model processed the entire orbital volume as a single input instance, allowing the simultaneous extraction of spatial features from both orbits and their surrounding structures. This volumetric approach eliminated the need for separate eye-level channel encoding, as the network could learn anatomical relationships across the axial, coronal, and sagittal planes.

Model Training and Validation

To optimize model performance, the dataset was split at the patient level into training, validation, and test sets so that all MRI data from a given individual were assigned to the same subset and never appeared in both training and test folds. Five-fold cross-validation was used: in each iteration, 1 fold was held out for testing and the remaining 4 folds were used for model development, and this procedure was repeated 5 times. Data splitting was based on unique patient identifiers to strictly enforce patient-level exclusivity and prevent data leakage. Within each iteration, the 4 development folds were divided into training and validation sets at an 80:20 ratio, so that 64% of the data was used for training, 16% for validation, and 20% as holdout test data.
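A patient-level 5-fold split with an inner 80:20 training/validation division can be sketched as follows. This is a simplified illustration, not the study's code: the function name `patient_level_folds`, the random seed, and the use of integer patient identifiers are assumptions, and the sketch does not stratify by class, which a real pipeline might do.

```python
import random

def patient_level_folds(patient_ids, n_folds=5, val_fraction=0.2, seed=0):
    """Split unique patient IDs into train/val/test sets for each fold."""
    ids = sorted(set(patient_ids))       # one entry per patient
    random.Random(seed).shuffle(ids)     # reproducible shuffle
    folds = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]                  # held-out fold (about 20%)
        rest = [p for j, fold in enumerate(folds) if j != k for p in fold]
        n_val = round(len(rest) * val_fraction)  # 20% of the other 4 folds
        splits.append({"train": rest[n_val:], "val": rest[:n_val], "test": test})
    return splits

# 521 hypothetical patient IDs, matching the cohort size in the text
for split in patient_level_folds(range(521)):
    print(len(split["train"]), len(split["val"]), len(split["test"]))
```

With 521 patients, each test fold holds 104 or 105 patients, and the remaining patients are divided into roughly 333 for training and 83 for validation, which corresponds to the 64%/16%/20% proportions described above. Because the split is made on patient identifiers, no patient can appear in more than one subset within a fold.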

Early stopping based on validation loss was applied to prevent overfitting. The main hyperparameters (learning rate, batch size, and number of epochs) were selected using a simple grid search on the training and validation data only, without using test data for model selection. The final reported performance metrics represent the mean values across the 5 test folds.
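Early stopping on validation loss can be sketched with a small helper class. The text does not report the stopping criterion in detail, so the `patience` and `min_delta` parameters, the class name, and the loss values below are all assumptions chosen for illustration.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # assumed value, not reported in the study
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # a real loop would checkpoint weights here
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Invented validation losses for illustration
val_losses = [0.70, 0.52, 0.41, 0.43, 0.44, 0.42, 0.45]
stopper = EarlyStopping(patience=3)
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        break
print(epoch, stopper.best_epoch, stopper.best_loss)  # 5 2 0.41
```

In this example, training stops at epoch 5 after 3 consecutive epochs without improvement, and the weights from epoch 2 would be the ones evaluated on the test fold.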

Model performance was evaluated using the following 4 metrics: (1) accuracy: the proportion of MRI examinations classified correctly; (2) area under the receiver operating characteristic (ROC) curve (AUC): a measure of the model’s ability to distinguish between the 2 classes across all decision thresholds; (3) sensitivity: the proportion of patients with proptosis correctly identified by the model; and (4) specificity: the proportion of patients without proptosis correctly identified by the model.
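The three threshold-based metrics follow directly from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The sketch below uses a hypothetical function name and invented counts for a single test fold; it is meant only to show the arithmetic, not to reproduce the study's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # correct / all examinations
        "sensitivity": tp / (tp + fn),   # detected / all with proptosis
        "specificity": tn / (tn + fp),   # cleared / all without proptosis
    }

# Invented counts for one test fold (not study data)
metrics = classification_metrics(tp=49, tn=50, fp=2, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.943
# sensitivity: 0.925
# specificity: 0.962
```

When metrics are averaged over cross-validation folds, the mean of the per-fold ratios can differ slightly from the ratio computed on the averaged counts.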

To assess the statistical precision of our findings, 95% confidence intervals (CIs) for the AUC were calculated using a non-parametric bootstrapping method with 1000 resamples. This method was selected to provide a robust, distribution-free estimate of the performance variability across the 5-fold cross-validation process.
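One common way to obtain a non-parametric bootstrap CI for the AUC is to resample cases with replacement and take percentiles of the resampled AUC values. The text does not specify exactly what was resampled, so the sketch below is an assumption-laden illustration: the function names, the percentile method, the seed, and the toy labels and scores are all hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes are needed to define the AUC
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_resamples)]
    upper = stats[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Toy data for illustration only
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]
print(auc(labels, scores))  # 0.9375
print(bootstrap_auc_ci(labels, scores))
```

The pairwise AUC computation is quadratic in the number of cases, which is acceptable for test folds of about 100 examinations.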

Results

In this study, a total of 521 participants were examined, divided into 2 main groups: the patient group and the control group. The patient group included 261 cases, with 51.3% female (134 individuals) and 48.7% male (127 individuals), and a mean age of 46.41±15.57 years, ranging from 7 to 85 years. The control group included 260 cases, 65.8% female (171 individuals) and 34.2% male (89 individuals), with a mean age of 43.96±19.49 years, ranging from 5 to 88 years.

We evaluated the performance of all 4 deep learning models on unseen MRI examinations using 5-fold cross-validation. Table 1 summarizes the mean performance metrics with their standard deviations across folds. Absolute classification counts for each model are presented in Table 2 as the mean numbers of true positives, true negatives, false positives, and false negatives per test fold. These values are not integers because they are averages of the per-fold counts over the 5 cross-validation folds (total sample size, N=521). Among the evaluated architectures, DenseNet121 achieved the best overall performance, with a mean accuracy of 95.0%, mean AUC of 0.986, sensitivity of 92.7%, and specificity of 96.9% across folds.

Table 1.

Mean diagnostic performance metrics with standard deviations across 5-fold cross-validation.

	DenseNet121	DenseNet169	DenseNet264	ResNet50
Accuracy (%)	95.01±1.27	93.10±2.29	91.96±3.59	89.26±4.80
AUC (%)	98.64±0.43	98.28±0.87	98.00±1.66	96.16±2.28
Sensitivity (%)	92.70±2.59	90.83±2.87	93.14±3.69	88.13±8.99
Specificity (%)	96.92±1.05	95.38±2.92	90.77±4.38	90.39±7.32

All values indicate percentage. AUC – area under the receiver operating characteristic curve.

Table 2.

Aggregate mean classification counts of true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP) per test fold.

	DenseNet121	DenseNet169	DenseNet264	ResNet50
TP	48.8	47.6	48.8	46.2
TN	50.4	49.6	47.2	47.0
FN	3.6	4.8	3.6	5.0
FP	1.6	2.4	4.8	6.2

The low standard deviations observed for DenseNet121 indicate stable performance across cross-validation folds within this single-center dataset. The per-fold AUC values and their corresponding 95% CIs are detailed in Table 3. For our primary model, DenseNet121, the AUC remained consistent across folds (range: 98.11–99.24%), with a narrow 95% CI of 98.33–99.00%. In contrast, the deeper DenseNet variants and ResNet50 showed lower mean accuracy and AUC and higher variability, although all models reached accuracy and AUC values compatible with potential clinical utility.

Table 3.

Per-fold AUC values and 95% CIs estimated via non-parametric bootstrapping.

	DenseNet121	DenseNet169	DenseNet264	ResNet50
Fold 1 (%)	98.44	98.26	95.90	97.79
Fold 2 (%)	99.24	98.95	98.66	99.02
Fold 3 (%)	98.85	98.93	99.45	94.21
Fold 4 (%)	98.56	96.82	96.56	96.12
Fold 5 (%)	98.11	98.45	99.41	93.68
95% CI	98.33–99.00	97.57–98.84	96.60–99.26	94.38–97.95

All values indicate percentage. AUC – area under the receiver operating characteristic curve.

The diagnostic performance and stability of the best-performing DenseNet121 model are further illustrated by the average confusion matrix (Figure 2) and the average ROC curve (Figure 3) across the 5-fold cross-validation. Explainability was evaluated through regional sensitivity analysis (occlusion sensitivity), which identified the anatomical areas, such as the retrobulbar and orbital regions, most critical to the model’s classification decisions (Figure 4). To verify the reliability of these explainability maps, a model parameter randomization sanity check was performed (Figure 5). The occlusion maps from the randomized model showed only noisy patterns, indicating that the saliency focus of the trained model depends on its learned diagnostic features rather than on underlying image gradients or the occlusion algorithm itself.
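The idea behind occlusion sensitivity can be shown on a toy example: mask one region of the volume at a time and record how much the model's output drops. The sketch below is a simplified stand-in, not the analysis used in the study. The function names, the non-overlapping 2×2×2 patches, the zero fill value, and the `toy_score` function (which replaces a trained network) are all assumptions.

```python
def occlusion_sensitivity(volume, score_fn, patch=2, fill=0.0):
    """Return {patch_origin: score drop} for non-overlapping cubic patches."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    baseline = score_fn(volume)
    drops = {}
    for z0 in range(0, depth, patch):
        for y0 in range(0, height, patch):
            for x0 in range(0, width, patch):
                # Copy the volume and blank out one patch
                occluded = [[row[:] for row in plane] for plane in volume]
                for z in range(z0, min(z0 + patch, depth)):
                    for y in range(y0, min(y0 + patch, height)):
                        for x in range(x0, min(x0 + patch, width)):
                            occluded[z][y][x] = fill
                # A large drop means the model relied on this region
                drops[(z0, y0, x0)] = baseline - score_fn(occluded)
    return drops

def toy_score(volume):
    """Hypothetical 'model': mean intensity of one 2x2x2 corner of the volume."""
    vals = [volume[z][y][x] for z in range(2) for y in range(2) for x in range(2)]
    return sum(vals) / len(vals)

volume = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
drops = occlusion_sensitivity(volume, toy_score)
most_important = max(drops, key=drops.get)
print(most_important, drops[most_important])  # (0, 0, 0) 1.0
```

Only the patch that the toy model actually reads produces a score drop, which is the behavior an occlusion map is meant to reveal. Repeating the procedure with a model whose weights have been randomized, as in the sanity check described above, should remove this structure from the map.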

Figure 2.

Confusion matrix for the DenseNet121 model, showing the average values across 5-fold cross-validation.

Figure 3.

Receiver operating characteristic curve for the DenseNet121 model, displaying the average values across 5-fold cross-validation.

Figure 4.

Regional sensitivity analysis of various MRI images. Color-coded regions (blue, green, and yellow) represent the areas most focused on by the model during decision-making, in order of importance. Red regions indicate areas with the least influence on the model’s decision.

Figure 5.

Model parameter randomization sanity check. This figure displays the occlusion sensitivity maps generated using a model with completely randomized weights.

Discussion

In this study, we developed and evaluated 3D deep learning models based on DenseNet and ResNet architectures for the automated detection of proptosis on orbital MRI. All models achieved high diagnostic performance, supporting the feasibility of fully automated MRI-based assessment of globe protrusion. These findings complement previous work on conventional measurements and radiomics in thyroid-associated orbitopathy and orbital imaging by demonstrating that end-to-end 3D convolutional neural networks can accurately differentiate patients with and without proptosis using routine MRI data. Unlike prior radiomics-based approaches, this study applies a fully end-to-end 3D MRI-based deep learning pipeline for proptosis detection. Standard architectures were intentionally selected to prioritize transparency, reproducibility, and clinical implementability over architectural novelty.

Explainable AI is crucial in medical applications affecting clinical decisions [15]. The present study emphasized the importance of using explainable AI techniques, such as regional sensitivity analysis, to bring transparency to the model’s decision-making process. This is particularly important in diagnosing proptosis, as clinicians must understand how and why the model arrives at a specific conclusion before integrating such systems into clinical workflows. By showing which regions of the MRI image are most important in the model’s predictions, we ensured that the model was not only accurate but also interpretable, increasing its reliability for use in clinical settings [16,17].

DenseNet121 outperformed the deeper DenseNet variants and ResNet50 in terms of mean accuracy, AUC, and specificity; its mean sensitivity (92.70%) was marginally lower than that of DenseNet264 (93.14%) but higher than that of the other 2 models. This may be related to its favorable balance between depth and parameter efficiency, as well as dense skip connections that promote feature reuse and stable gradient flow without excessively increasing model complexity. Such an architecture appears well suited to the small, homogeneous orbital region and the moderate dataset size used in this study.

From a clinical perspective, automated MRI-based detection of proptosis could support radiologists and ophthalmologists in several ways. First, it may provide more objective and reproducible measurements than the Hertel exophthalmometer, particularly in patients with borderline or asymmetric findings. Second, integrating a deep learning–based alert into the radiology workflow could facilitate screening and longitudinal follow-up in patients with thyroid-associated ophthalmopathy or other orbital pathologies, helping to detect progression earlier and to standardize reporting. Finally, explainable AI techniques, such as the regional sensitivity analysis used in this study, can highlight the orbital regions that drive the model’s predictions and thereby increase clinical trust and acceptance.

This study has several limitations. First, it was conducted in a single center with a limited sample size, which may restrict generalizability to other scanners, protocols, and patient populations. Second, although we used 5-fold cross-validation and early stopping, we did not include an external test set, and some degree of overfitting cannot be excluded. Third, all labels were generated retrospectively by a single experienced neuroradiologist using a predefined rule-based protocol, and neither intraobserver nor interobserver agreement was formally assessed. Although the deterministic measurement thresholds reduce subjective interpretation, the absence of repeated measurements limits the ability to quantify measurement consistency; claims regarding labeling objectivity and reproducibility should therefore be interpreted with caution. Fourth, we did not apply geometric data augmentation, which might further improve robustness to acquisition variability. Fifth, no demographic statistical analysis was performed, and patient etiologies were not collected. Future work should address these limitations by including multi-center data, performing external validation, evaluating intra- and interobserver agreement with repeated measurements and multiple readers, and exploring augmentation strategies and ensemble models.

Conclusions

In this single-center retrospective study, 3D deep learning models based on DenseNet and ResNet architectures were able to automatically detect proptosis on orbital MRI with high accuracy. Among the evaluated architectures, DenseNet121 achieved the best and most consistent performance across cross-validation folds. By providing objective and reproducible assessment of globe protrusion, such models have the potential to complement conventional clinical tools and support radiologists and clinicians in the diagnosis and follow-up of proptosis, particularly in thyroid-associated ophthalmopathy. Future studies should validate these findings in larger, multi-center cohorts, investigate the impact of scanner and protocol variability, and evaluate integration of the model into real-world clinical workflows, including prospective testing and external validation.

Footnotes

Financial support: None declared

Conflict of interest: None declared

Publisher’s note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Ethics Approval: This study was approved by our institutional ethics review board.

Informed Consent: Informed consent was waived by the ethics committee due to the retrospective nature of the study.

Declaration of Figures’ Authenticity: All figures submitted have been created by the authors who confirm that the images are original with no duplication and have not been previously published in whole or in part.

References

1. Ameri H, Fenton S. Comparison of unilateral and simultaneous bilateral measurement of the globe position, using the Hertel exophthalmometer. Ophthalmic Plast Reconstr Surg. 2004;20(6):448–51. doi: 10.1097/01.iop.0000143712.42344.8c.
2. Segni M, Bartley GB, Garrity JA, et al. Comparability of proptosis measurements by different techniques. Am J Ophthalmol. 2002;133(6):813–18. doi: 10.1016/s0002-9394(02)01429-0.
3. Chang AA, Bank A, Francis IC, Kappagoda MB. Clinical exophthalmometry: A comparative study of the Luedde and Hertel exophthalmometers. Aust N Z J Ophthalmol. 1995;23(4):315–18. doi: 10.1111/j.1442-9071.1995.tb00182.x.
4. Bartalena L, Pinchera A, Marcocci C. Management of Graves’ ophthalmopathy: Reality and perspectives. Endocr Rev. 2000;21(2):168–99. doi: 10.1210/edrv.21.2.0393.
5. Topilow NJ, Tran AQ, Koo EB, Alabiad CR. Etiologies of proptosis: A review. Intern Med Rev (Wash D C). 2020;6(3):imr.v6i3.852. doi: 10.18103/imr.v6i3.852.
6. Lloyd GA. The radiological investigation of proptosis. Br J Radiol. 1970;43(505):1–18. doi: 10.1259/0007-1285-43-505-1.
7. Dunsky IL. Normative data for Hertel exophthalmometry in a normal adult black population. Optom Vis Sci. 1992;69(7):562–64. doi: 10.1097/00006324-199207000-00009.
8. Migliori ME, Gladstone GJ. Determination of the normal range of exophthalmometric values for black and white adults. Am J Ophthalmol. 1984;98(4):438–42. doi: 10.1016/0002-9394(84)90127-2.
9. Bolaños Gil de Montes F, Pérez Resinas FM, Rodríguez García M, González Ortiz M. Exophthalmometry in Mexican adults. Rev Invest Clin. 1999;51(6):341–43.
10. Sleep TJ, Manners RM. Interinstrument variability in Hertel-type exophthalmometers. Ophthalmic Plast Reconstr Surg. 2002;18(4):254–57. doi: 10.1097/00002341-200207000-00004.
11. Park NR, Moon JH, Lee JK. Hertel exophthalmometer versus computed tomography scan in proptosis estimation in thyroid-associated orbitopathy. Clin Ophthalmol. 2019;13:1461–67. doi: 10.2147/OPTH.S216838.
12. Huh J, Park SJ, Lee JK. Measurement of proptosis using computed tomography based three-dimensional reconstruction software in patients with Graves’ orbitopathy. Sci Rep. 2020;10(1):14554. doi: 10.1038/s41598-020-71098-4.
13. Huang G, Liu Z, Weinberger KQ. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016; pp. 2261–69.
14. Targ S, Almeida D, Lyman D. Resnet in resnet: Generalizing residual architectures. 2016 arXiv preprint arXiv:1603.08029.
15. Ghosh S, Abushukair HM, Ganesan A, Pan C, et al. Harnessing explainable artificial intelligence for patient-to-clinical-trial matching: A proof-of-concept pilot study using phase I oncology trials. PLoS One. 2024;19(10):e0311510. doi: 10.1371/journal.pone.0311510.
16. Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. Cham: Springer International Publishing; 2014.
17. Fan X, Li H, Liu L, et al. Alzheimer’s Disease Neuroimaging Initiative. Early diagnosing and transformation prediction of Alzheimer’s disease using multi-scaled self-attention network on structural MRI images with occlusion sensitivity analysis. J Alzheimers Dis. 2024;97(2):909–26. doi: 10.3233/JAD-230705.
