Highlights
• SMM-YOLOv8n model achieves 88.6% mAP@50 for detecting second-molar lesions.
• Deep learning reduces diagnostic time by 8.79 minutes per 60 images.
• Model improves junior dentists’ sensitivity by 17.1% and PPV by 9.7%.
• First AI system classifying third-molar-related caries, ERR, and combined lesions.
Key words: Deep learning, Panoramic radiography, Third molar, Root resorption, Caries detection, Diagnostic imaging
Abstract
Objectives
This study aimed to develop an automated deep learning (DL) system to detect and classify second molar (M2) pathologies associated with impacted third molars (ITMs) on panoramic radiographs. We sought to enhance diagnostic accuracy and support informed clinical decision-making.
Methods
We constructed a dataset of 1,170 panoramic radiographs that show second molars (M2s) adjacent to impacted third molars (ITMs). The cases were retrospectively divided into 4 groups: (1) no lesions; (2) caries (tooth decay); (3) external root resorption (ERR; loss of tooth root structure from external factors); or (4) both pathologies. Three oral surgeons, each with extensive experience, annotated the images using standardized criteria, resolving any disagreements by consensus. Our enhanced SMM-YOLOv8n model is based on You Only Look Once version 8 (YOLOv8) and features Slim-Neck optimization and multidimensional attention mechanisms. We trained the model with transfer learning (applying knowledge from pre-trained models to new tasks). We evaluated it using 5-fold cross-validation (dividing the data into 5 parts and rotating the validation sets).
Results
SMM-YOLOv8n achieved an mAP@50 of 0.886 on the internal test set. The macro-averaged precision, recall, and F1-score were 0.894, 0.960, and 0.926, respectively (calculated at an IoU threshold of 0.5). These results represent a clear improvement over the baseline YOLOv8n. For 60 images, the clinicians' mean sensitivity increased by 0.171, and the average interpretation time decreased by 8.79 minutes.
Conclusions
SMM-YOLOv8n offers accurate and efficient detection of ITM-related M2 pathologies on panoramic radiographs. This approach may enhance early diagnosis, aid in treatment planning, and reduce the need for cone-beam computed tomography in initial assessments.
Clinical significance
This DL-based diagnostic tool may serve as a valuable decision-support system, particularly in clinical settings with limited access to specialized dental expertise.
Introduction
The impaction of mandibular third molars (ITMs) is a common clinical issue. This occurs because evolutionary changes limit mandibular arch space, preventing full dental eruption.1 Extracting ITMs is routine, but the procedure carries risks, including swelling, bleeding, nerve injury, and delayed healing (such as alveolitis).2 Clinicians disagree on the best management of asymptomatic ITMs. Traditionally, conservative monitoring is preferred over immediate extraction to avoid surgical risks.3 However, new evidence suggests that retained asymptomatic ITMs may be associated with odontogenic problems. These problems include recurrent pericoronitis, cysts, and damage to second molars (M2s), such as external root resorption (ERR) and cervical caries.4 Studies report a 0.3% to 24.2% rate of M2 ERR near ITMs, and increased caries risk when teeth are in close contact.5, 6, 7, 8, 9 These findings underscore the importance of early detection and intervention in preserving M2 function.
Recent AI advancements are transforming dental diagnostics. Deep learning (DL) algorithms now excel at analyzing orthodontic measurements, detecting caries in radiographs, classifying periodontal disease, and screening for osteoporosis.10, 11, 12, 13, 14, 15 These studies use various DL architectures to solve clinical tasks. For example, some networks can classify the type of impaction with high diagnostic consistency.16 Deep learning models have achieved F1-scores of 68.3% to 94.3% for caries classification and 42.8% to 95.4% for detection or segmentation, matching or even exceeding clinician accuracy in some cases.17, 18, 19, 20 New work also explores multi-task frameworks that can predict third molar classification and the relationship with the inferior alveolar nerve, aiding preoperative planning.21 These models report accuracies of 78.65% to 93.9%.22, 23, 24, 25 Comparisons show these DL systems often perform as well as experienced oral radiologists in certain tasks.25,26 These findings demonstrate the feasibility and growing power of DL for ITM analysis.
Current diagnostic workflows are not efficient enough. We chose panoramic radiography for this study because its field of view is larger than that of bitewing radiographs, capturing the entire dentition and jaws.27,28 Visual assessment of panoramic radiographs is subjective, and CBCT use is selective and resource-intensive. This highlights the need for a standardized, objective screening tool.
In this study, we build an optimized deep learning (DL) framework based on YOLOv8. This framework detects ITM-associated M2 abnormalities using panoramic radiographs. Its goal is to reduce the financial and radiation burden associated with CBCT and to improve diagnostic speed. This, in turn, supports earlier treatment and preservation of teeth.
Material and methods
Study design
This diagnostic study explored whether SMM-YOLOv8n can automatically detect and classify lesions on second molars next to impacted third molars. Figure 1 shows the study design for model development and validation. Since we were primarily developing and testing the algorithm, we did not conduct a formal sample size calculation. The Institutional Ethics Committee of the Affiliated Hospital of Stomatology at Nanjing Medical University approved the study (Approval No. J2023-181-001). Reporting followed the CLAIM checklist (Appendix 1) for artificial intelligence in medical imaging, as well as other topic-specific guidelines.29
Fig. 1.
The flowchart of the study. The flowchart illustrates the study design for developing the detection system. In this flowchart, ‘a’ indicates the number of patients, ‘m’ the number of images, and ‘n’ the number of teeth.
Data acquisition
We randomly selected panoramic radiographs from 700 subjects, acquired between November 2023 and September 2024, from the Department of Oral and Maxillofacial Surgery database at Jiangsu Provincial Stomatological Hospital. The images met the following criteria: (1) a resolution of at least 300 dpi; (2) complete coverage of the ITMs and adjacent M2s. We excluded cases with operational errors, artifacts, subtle lesion masking, or high noise. After exclusions, 684 subjects were included. All images were captured using DEXIS OC200D diagnostic X-ray equipment.
Annotation procedures
Three oral and maxillofacial surgeons, each with over 5 years of experience, reviewed and annotated all panoramic radiographs on standard laptops. To maintain consistency and accuracy, all raters completed a training program led by a senior expert with more than 15 years of experience. The training covered how to identify and classify second molar pathologies using the International Caries Detection and Assessment System (ICDAS) and Feiglin's criteria for external root resorption. Both were adapted for panoramic images.30,31
After training, examiners completed a calibration exercise using 50 pre-annotated images showing a full range of pathologies. These images were based on clinically confirmed cases where available. The same senior dentist who led the training provided the ground truth for these images. The raters used the same 50 images repeatedly during calibration. They repeated the calibration until both intra- and inter-examiner agreement exceeded a weighted kappa value (κw) of 0.8, confirming excellent consistency.
To establish ground truth, each image was assessed by all 3 raters using LabelImg (v1.8.4, https://github.com/tzutalin/labelImg). Each second molar near an impacted third molar was placed in 1 of 4 groups: (1) no lesions; (2) caries; (3) external root resorption (ERR); or (4) both pathologies (Appendix 2). Based on ICDAS and Feiglin's criteria, radiographic features differentiate root caries from external root resorption. On panoramic images, external cervical resorption appears as a well-defined radiolucency with sharp borders below the contact point. Root caries appears as an ill-defined, saucer-shaped radiolucency.32 The 'background' refers to image regions that do not contain the target pathologies for the purpose of calculating specificity and other background-related metrics. True Negative (TN) was defined as a model-predicted anchor or region proposal that correctly remained unassigned to any pathology class. False Positive (FP) was counted when the model generated a bounding box prediction in an area where no ground-truth class existed. False Negatives (FN) are not defined for the background class itself, as they pertain to missed detections of the positive pathology classes.
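Concretely, the TP/FP/FN bookkeeping described above can be sketched as a greedy IoU matching between predicted and ground-truth boxes. The sketch below is an illustrative simplification of standard detection evaluation, not the study's exact evaluation code; in a full evaluation, predictions would first be sorted by confidence.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def match_detections(preds, truths, iou_thr=0.5):
    """Greedily match predictions to ground truths of the same class.
    Returns (TP, FP, FN): an unmatched prediction counts as FP (a box
    where no ground-truth lesion exists), an unmatched ground truth as
    FN (a missed lesion). `preds` and `truths` are (class, box) pairs.
    """
    matched = set()
    tp = 0
    for p_cls, p_box in preds:
        best, best_iou = None, iou_thr
        for i, (t_cls, t_box) in enumerate(truths):
            if i in matched or t_cls != p_cls:
                continue
            iou = box_iou(p_box, t_box)
            if iou >= best_iou:
                best, best_iou = i, iou
        if best is not None:
            matched.add(best)
            tp += 1
    return tp, len(preds) - tp, len(truths) - tp
```

For example, a correct caries box plus a spurious ERR box against one annotated caries lesion and one missed caries lesion yields TP = 1, FP = 1, FN = 1.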
For object detection tasks, the raters delineated the precise boundaries of pathological regions using bounding boxes and assigned corresponding class labels to each. Any discrepancies in classification or object detection among the 3 raters were identified and resolved through consensus meetings chaired by the senior expert.
Dataset characteristics
Before training, the full set of 1,170 teeth (965 no lesions, 118 caries, 76 ERR, 11 both pathologies) was randomly allocated, in a 6:2:2 ratio, into a training set (591/66/37/4), a validation set (191/24/19/2), and a test set (183/28/20/5), with counts listed as no lesions/caries/ERR/both pathologies.
It is worth noting that data were assigned at the patient level: all images (including those carrying 2 or more labels) from the same patient were grouped into the same subset rather than split randomly. This prevents data leakage between subsets and preserves the correlation information within each patient's targeted areas.
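A patient-level split of this kind can be sketched in a few lines of standard-library Python. The record layout (`(patient_id, image)` tuples) and function name are illustrative, not the study's actual data schema:

```python
import random
from collections import defaultdict

def patient_level_split(records, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split annotated records into train/val/test so that every image
    from one patient lands in the same subset (prevents data leakage)."""
    by_patient = defaultdict(list)
    for patient_id, image in records:
        by_patient[patient_id].append(image)

    patients = sorted(by_patient)               # deterministic order before shuffling
    random.Random(seed).shuffle(patients)

    n = len(patients)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    splits = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    # Expand each subset's patient list back into its image list.
    return {name: [img for p in ps for img in by_patient[p]]
            for name, ps in splits.items()}
```

Because the shuffle operates on patient IDs rather than individual images, no patient can straddle two subsets.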
Data pre-processing
Image augmentation was employed to expand the dataset and thereby enhance the model’s robustness. In our study, existing samples were subtly modified using methods such as random cropping, rotation, cutout, and padding.
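As an illustration of one of the augmentations named above, a minimal cutout operation (masking a random square of the image) can be written with the standard library alone. Real pipelines would typically use an augmentation library; this function and its arguments are illustrative only:

```python
import random

def cutout(image, size, seed=None):
    """Cutout augmentation on a grayscale image stored as a list of
    pixel rows: zero out a random `size` x `size` square region."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    top = rng.randrange(max(1, h - size + 1))
    left = rng.randrange(max(1, w - size + 1))
    out = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(top, min(top + size, h)):
        for c in range(left, min(left + size, w)):
            out[r][c] = 0
    return out
```

Applying `cutout` to an 8 × 8 image with `size=3` removes exactly 9 pixels while leaving the original image intact.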
You Only Look Once (YOLOv8n) is utilized as a base framework in this study. It consists of 3 sections: Backbone, Neck, and Head. The Backbone is responsible for extracting and transforming features from input images into rich information. The Neck fuses these features, and the Head detects and classifies targets based on the Neck's output.
The network structure diagram of the SMM-YOLOv8n detection model (Fig. 2) shows that the improvements were made to the Neck of the original YOLOv8n model. To support real-time detection on edge devices in hospital settings while maintaining accuracy, the Slim-Neck optimization solution (SNs) was introduced into our model.33 Specifically, it is a simple and efficient combination of GSConv, which replaces some conventional convolutions, and the efficient cross-stage partial block (VoV-GSCSP). Because modeling attention across multiple dimensions can impose a heavy computational burden, the lightweight Multidimensional Collaborative Attention (MCA) mechanism was added to the end of the backbone.34 Finally, because most existing bounding-box regression losses cannot be optimized when the predicted box has the same aspect ratio as the ground-truth box but differs in width and height, the MPDIoU loss was adopted for bounding box regression (Fig. 2).35
Fig. 2.
SMM-YOLOv8n network structure. Detailed structure of the proposed SMM-YOLOv8n model comprises 3 core components: backbone, neck, and head. In the backbone, C2f replaces C3 module. Spatial Pyramid Pooling Fast (SPPF) module is input for multi-scale feature extraction via iterative max-pooling. In the neck, VoV-GSCSP integrates grouped spatial-channel-wise convolutions to replace traditional CSP and Concat combines upsampled deep features with shallow features for multi-scale detection. In the head, MCALayer dynamically weights features to improve small lesion detection and MPDIoU Loss optimizes bounding box regression for dense dental targets.
The network architecture was implemented in the PyTorch machine learning framework (version 1.11). Training was performed on an Intel Core i7-11700 CPU (Intel) and a GeForce RTX 4080 GPU (NVIDIA) with 24 GB of GDDR6X memory, running Ubuntu Linux 20.04 with CUDA 11.3, CuDNN 8.0.4, and OpenCV 4.6.0.6; the programming language was Python. For training, the number of epochs was set to 100, the batch size to 32, and the momentum to 0.937. A dynamic learning rate was used, with lr0 = 0.01 and weight_decay = 0.0005, and SGD was employed as the optimizer.
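The hyperparameters above can be collected into a single configuration. The sketch below shows one hedged way they would be passed to the Ultralytics YOLO training API; the `dataset.yaml` path and checkpoint name are placeholders, and the `train` call is left as a comment so the fragment stands alone:

```python
# Training hyperparameters as reported in the text.
hyperparameters = {
    "epochs": 100,
    "batch": 32,
    "momentum": 0.937,
    "lr0": 0.01,            # initial learning rate (decayed dynamically)
    "weight_decay": 0.0005,
    "optimizer": "SGD",
}

# With the Ultralytics package installed, training would look roughly like:
#   from ultralytics import YOLO
#   model = YOLO("yolov8n.pt")           # baseline weights for transfer learning
#   model.train(data="dataset.yaml", **hyperparameters)
```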
External validation dataset
To rigorously assess the generalizability of the proposed SMM-YOLOv8n model, an external validation was conducted using retrospective data collected from 5 independent stomatology hospitals. The inclusion of data from different imaging devices and clinical settings was intentional to better represent real-world heterogeneity in a multi-center validation.
The external validation set comprised 40 panoramic radiographs (Changzhou Hospital of Traditional Chinese Medicine, n = 1; Xuyi County People's Hospital, n = 1; Nanjing Lishui District Hospital of Traditional Chinese Medicine, n = 1; Sihong County People's Hospital, n = 1; Nanjing Stomatological Hospital, n = 36), featuring 67 second molars adjacent to impacted third molars. The specific manufacturers and models of the panoramic X-ray equipment used across the 5 participating hospitals were not retrievable retrospectively.
Inclusion and exclusion criteria for the external set were consistent with those applied to the internal training set (as detailed in Section 2.2), ensuring comparability.
Ground-truth annotations for the external validation set were established by a separate group of 3 dentists, each with over 5 years of experience, who were not involved in annotating the internal training dataset. These raters were blinded to the annotations from the internal set and to each other's initial assessments. They followed the same standardized criteria.
Model evaluation metrics
The performance of the proposed SMM-YOLOv8n model was comprehensively assessed using a set of well-established metrics for object detection. These metrics were computed using standard Python libraries, including TorchMetrics and OpenCV, to ensure reproducibility and alignment with common field practices. Model performance was primarily evaluated through the following metrics:
1. Precision: This metric quantifies the percentage of correctly identified positive predictions among all instances classified as positive by the model:
Precision = True Positives / (True Positives + False Positives).
2. Recall (Sensitivity): Recall measures the model’s ability to correctly detect all actual positive instances:
Recall = True Positives / (True Positives + False Negatives).
3. Specificity: Specificity evaluates the model’s performance in correctly rejecting negative instances:
Specificity = True Negatives / (True Negatives + False Positives).
4. Intersection over Union (IoU): This metric measures the spatial overlap between the predicted bounding box and the ground truth bounding box. It is defined as the area of intersection divided by the area of the union of the 2 bounding boxes. IoU values range from 0 to 1, with higher values indicating better localization performance.
5. Mean Average Precision (mAP@50): The average precision (AP) was computed at an Intersection over Union (IoU) threshold of 0.5, and the mean across all classes yielded mAP@50, providing an overall measure of detection accuracy across various lesion types.
6. F1-score: The harmonic mean of precision and recall, providing a balanced evaluation of the model’s accuracy and robustness:
F1-score = 2 × (Precision × Recall) / (Precision + Recall).
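The count-based metrics and IoU defined above can be computed in a few lines; the function names are illustrative, not from the study's codebase:

```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def specificity(tn, fp):
    return tn / (tn + fp) if tn + fp else 0.0

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0
```

For example, with TP = 32, TN = 463, FP = 11, and FN = 19, these functions give precision ≈ 0.744, recall ≈ 0.627, and specificity ≈ 0.977.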
Statistical analysis
The data were analyzed using statistical software (SPSS, version 29.0.1.0, IBM). Tests were chosen according to variable type: continuous variables (eg, measurement values) were compared using independent-samples or paired t tests as appropriate, and categorical variables (eg, pathology types) were compared using chi-square tests. A significance level of P < .05 was used for all statistical tests.
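As an illustration of the paired comparison used later for the reader study (eg, interpretation time with vs without AI assistance), the paired t statistic can be computed from per-reader differences with the standard library alone. The numbers in the usage example are made up for illustration, not study data:

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)), where d are
    the per-subject differences. Returns (t, degrees of freedom)."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return statistics.mean(diffs) / (sd / math.sqrt(n)), n - 1
```

For hypothetical reading times `[20, 22, 19, 25, 24]` without and `[14, 15, 13, 16, 15]` with assistance, the function returns t ≈ 10.91 with 4 degrees of freedom; the corresponding P value is read from the t distribution (SPSS reports it directly).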
Results
Patient characteristics
Patients' characteristics are described in Table 1, where we count patients rather than teeth. This table presents the demographic and clinical characteristics of the study participants for analysis, with females accounting for a larger percentage. Regarding age distribution, the majority of participants were young adults, with the 20 to 29 years age group being the largest. The type of tooth impaction was classified using Winter’s classification of impacted third molars. Mesioangular was the most common type across both genders.
Table 1.
Baseline characteristics of the study participants
| Characteristic | Male (n = 228) | Female (n = 456) | Total (n = 684) |
|---|---|---|---|
| Age (years) | |||
| <20 | 10 (4.4) | 22 (4.8) | 32 (4.7) |
| 20-29 | 134 (58.8) | 310 (68.0) | 444 (64.9) |
| 30-39 | 62 (27.2) | 104 (22.8) | 166 (24.3) |
| 40-49 | 16 (7.0) | 15 (3.3) | 31 (4.5) |
| 50-59 | 5 (2.2) | 5 (1.1) | 10 (1.5) |
| ≥60 | 1 (0.4) | 0 (0.0) | 1 (0.1) |
| Impaction type | |||
| Mesioangular | 154 (67.5) | 286 (62.7) | 440 (64.3) |
| Distoangular | 6 (2.6) | 24 (5.3) | 30 (4.4) |
| Horizontal | 7 (3.1) | 10 (2.2) | 17 (2.5) |
| Vertical | 57 (25.0) | 117 (25.7) | 174 (25.4) |
| Buccal | 1 (0.4) | 2 (0.4) | 3 (0.4) |
| Lingual | 3 (1.3) | 13 (2.9) | 16 (2.3) |
| Inverted | 0 (0.0) | 4 (0.9) | 4 (0.6) |
Data are presented as n (%).
Label reliability
To assess the consistency and reliability of the annotations provided by the 3 maxillofacial surgeon raters, we calculated intraclass correlation coefficients (ICCs). The inter-rater reliability analysis yielded ICCs of 0.456 (95% CI, 0.421-0.490; P < .001) for single measurements and 0.715 (95% CI, 0.686-0.742; P < .001) for average measurements, indicating moderate agreement for single ratings and good agreement for averaged ratings. The higher ICC for average measurements supports the use of consensus-based ground-truth labels for model training and evaluation, and the statistical significance confirms the reliability of the annotated dataset for the subsequent deep learning tasks.
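For readers reproducing the reliability analysis, a two-way ICC can be computed directly from the subjects × raters rating matrix. The sketch below implements the single-measure, absolute-agreement form, ICC(2,1), under the standard Shrout-Fleiss definitions; it is an illustrative stdlib reimplementation, not the SPSS routine used in the study:

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is a list of subjects, each a list of the k raters' scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

The average-measures form, ICC(2,k), uses the same mean squares with a smaller denominator, which is why averaged (consensus) ratings yield a higher coefficient than single ratings.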
Model performance
Performance of SMM-YOLOv8n
The enhanced model demonstrated robust diagnostic capabilities, achieving an overall classification accuracy of 93.75% across the entire image dataset (Table 2). Stratified performance analysis revealed hierarchical accuracy among lesion categories: both pathologies (100%), ERR (95.2%), caries (94.3%), and no lesions (85.5%). The "background" class in the evaluation refers to the non-pathological image areas, which are a standard component in object detection model evaluation. Notably, the "both pathologies" category achieved perfect or near-perfect metrics. However, it is crucial to interpret these results with caution due to the very limited sample size for this category in our test set. The framework maintained a high recall-precision balance, evidenced by its class-agnostic mean average precision at 50% IoU threshold (mAP@50: 0.886), confirming consistent detection reliability across heterogeneous lesion presentations (Fig. 3).
Table 2.
Evaluation of the performance of the SMM-YOLOv8n
| Class | TP, n (%) | TN, n (%) | FP, n (%) | FN, n (%) | ACC | SE | SP | NPV | PPV |
|---|---|---|---|---|---|---|---|---|---|
| No lesions | 352 (67.0) | 97 (18.5) | 52 (9.9) | 24 (4.6) | 85.5 | 93.6 | 65.1 | 80.2 | 87.1 |
| Caries | 32 (6.1) | 463 (88.2) | 11 (2.1) | 19 (3.6) | 94.3 | 62.7 | 97.7 | 96.1 | 74.4 |
| ERR | 20 (3.8) | 480 (91.4) | 15 (2.9) | 10 (1.9) | 95.2 | 66.7 | 97.0 | 98.0 | 57.1 |
| Both | 29 (5.5) | 496 (94.5) | 0 (0.0) | 0 (0.0) | 100 | 100 | 100 | 100 | 100 |
| Background | 0 (0.0) | 472 (89.9) | 14 (2.7) | 39 (7.4) | 89.9 | 0 | 97.1 | 92.4 | 0 |
ACC, accuracy; SE, sensitivity; SP, specificity; NPV, negative predictive value; PPV, positive predictive value (all in %).
Fig. 3.
Improved YOLOv8n model’s precision-recall (PR) curves. PR curves demonstrate the detection performance of the improved YOLOv8n model across different dental lesion categories. Sky blue ("no"): highest overall performance (mAP@0.5 = 0.961), maintaining precision >0.9 even at high recall levels. Red ("both"): near-perfect precision (0.995) but limited to lower recall ranges, suggesting high confidence for positive cases. Orange ("caries") and green ("ERR"): balanced performance (mAP@0.5 = 0.804 and 0.783, respectively), reflecting reliable detection of caries and external root resorption (ERR). Blue ("all classes"): mAP@0.5 of 0.886, indicating robust multi-class generalization.
Improvements compared to the original model
The SMM-YOLOv8n model outperformed the baseline YOLOv8n algorithm, achieving a 0.4% improvement in F1-score and a 2.6% increase in mAP (Appendix 3). At the same time, the numbers of parameters and GFLOPs were significantly reduced (0.18 and 2.4, respectively; Appendix 3); note that GFLOPs denotes billions of floating-point operations required per forward pass, not per second. These optimizations enhance clinical deployability by minimizing hardware resource demands while maintaining diagnostic fidelity, particularly in settings with limited computational infrastructure.
Robustness on the external dataset
To assess the cross-institutional robustness of SMM-YOLOv8n, an external validation was conducted using datasets from 5 independent hospitals. The model achieved a sensitivity of 0.711, a positive predictive value (PPV) of 0.865, and an F1-score of 0.781. These results demonstrate consistent diagnostic capability across heterogeneous clinical environments, suggesting good generalizability for multi-center dental applications.
Model explainability
The comparative diagnostic performance of SMM-YOLOv8n and the original YOLOv8n model in M2 lesion detection and classification is illustrated in Figure 4. The original YOLOv8n model exhibited false-positive predictions in "no lesions" cases, highlighting its reduced specificity in distinguishing healthy anatomy (Fig. 4A). For “caries” and “ERR” detection, SMM-YOLOv8n demonstrated higher prediction confidence, with average confidence scores of 0.9 and 0.855, respectively (Figs. 4B and C). SMM-YOLOv8n also showed higher predictive confidence (0.89) for “both pathologies” than the original YOLOv8n model (0.83) (Fig. 4D). The evaluation metrics for the original model are presented in Table 3; compared with the results in Table 2, they further demonstrate that our model outperformed the original.
Fig. 4.
Comparison of the original YOLOv8n, and SMM-YOLOv8n predictions. (A) YOLOv8n and SMM-YOLOv8n correctly predict "No lesions" with higher confidence (0.9) for the improved model. (B) SMM-YOLOv8n matches the expert diagnosis with high confidence (0.9), whereas YOLOv8n misclassifies one lesion as "No lesions" (low confidence: 0.3). (C) Both models detect ERR but with varying confidence (SMM-YOLOv8n: 0.86/0.85 vs. YOLOv8n: 0.88/0.58). (D) SMM-YOLOv8n improves confidence (0.92/0.89) over YOLOv8n (0.89/0.83).
Table 3.
Evaluation of the performance of the original YOLOv8 model
| Class | TP, n (%) | TN, n (%) | FP, n (%) | FN, n (%) | ACC | SE | SP | NPV | PPV |
|---|---|---|---|---|---|---|---|---|---|
| No lesions | 357 (68.0) | 95 (18.1) | 54 (10.3) | 19 (3.6) | 86.1 | 94.9 | 63.8 | 83.3 | 86.9 |
| Caries | 32 (6.1) | 458 (87.2) | 16 (3.0) | 19 (3.6) | 93.3 | 62.7 | 96.6 | 96.0 | 66.7 |
| ERR | 21 (4.0) | 476 (90.7) | 19 (3.6) | 9 (1.7) | 94.7 | 70.0 | 96.2 | 98.1 | 52.5 |
| Both | 29 (5.5) | 496 (94.5) | 0 (0.0) | 0 (0.0) | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Background | 0 (0.0) | 457 (87.0) | 13 (2.5) | 55 (10.5) | 87.0 | 0.0 | 97.2 | 89.3 | 0.0 |
ACC, accuracy; SE, sensitivity; SP, specificity; NPV, negative predictive value; PPV, positive predictive value (all in %).
Robust model convergence and validation
The progressive reduction and eventual stabilization of composite loss values during training epochs indicated robust model convergence and strong generalization without overfitting (Fig. 5). Over 100 epochs, both training and validation losses decreased rapidly and stabilized, demonstrating effective learning with minimal overfitting. The model achieved high precision (0.92) and recall (0.88), with a mean Average Precision (mAP) at 0.5 of 0.89, confirming strong detection accuracy. A moderate mAP@0.5:0.95 (0.65) suggests potential for improved localization precision at higher IoU thresholds. These findings collectively validated the framework's computational stability, diagnostic reproducibility, and clinical-grade classification accuracy.
Fig. 5.
Training dynamics and model performance metrics across 100 epochs. The progressive reduction and eventual stabilization of composite loss values during training epochs demonstrated robust model convergence, indicating strong generalization capability without overfitting.
Dental clinician comparison
With the assistance of SMM-YOLOv8n, junior dentists assessed 60 images on average 8.79 minutes faster than without the system (Fig. 6). Beyond the efficiency gain, the model significantly enhanced diagnostic performance across all 5 participating dentists. The mean sensitivity improved by 0.171, positive predictive value (PPV) by 0.097, and F1-score by 0.170, indicating consistently elevated accuracy across multiple metrics and underscoring the model's utility as an effective clinical decision support tool. Interobserver reliability, assessed using Fleiss’ κ among the 5 junior dentists and one senior dentist, yielded a coefficient of 0.686 (Appendix 4). Furthermore, Cohen’s κ analysis confirmed satisfactory intraobserver agreement for each evaluator, indicating high self-consistency in repeated assessments.
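The Fleiss' κ reported above can be reproduced from a subjects × categories count table. The sketch below is a minimal standard-library implementation, offered as an illustration rather than the study's analysis code:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table `counts[i][j]` = number of raters who
    assigned subject i to category j; every subject must be rated by the
    same number of raters."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    total = n_subjects * n_raters

    # Per-category proportions across all assignments.
    p_j = [sum(row[j] for row in counts) / total
           for j in range(len(counts[0]))]
    expected = sum(p * p for p in p_j)          # chance agreement

    # Mean per-subject observed agreement.
    observed = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects

    return (observed - expected) / (1 - expected)
```

A κ of 1 indicates perfect agreement; values between 0.61 and 0.80, such as the 0.686 observed here, are conventionally read as substantial agreement.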
Fig. 6.
The sensitivity, positive predictive value, F1-score, and diagnostic time of junior dentists before and after SMM-YOLOv8n assistance. Asterisks represent statistical significance: *P < .05, **P < .01, ***P < .001; ns indicates non-significant differences. (A) Sensitivity: no significant difference (ns) between groups. (B) Positive predictive value (PPV): significant improvement with AI (**P < .01). (C) F1-score: balanced accuracy improved (*P < .05). (D) Diagnostic time: AI reduced time per case by 35% (*P < .05).
Discussion
In this diagnostic study, we proposed an automated deep learning approach to localize and classify M2 pathologies associated with ITMs in panoramic radiographs. To achieve this goal, an improved YOLOv8n model, SMM-YOLOv8n, was developed, achieving an accuracy of at least 90%. This model holds promise for future clinical applications, as it enables the detection and assessment of ITM-related M2 pathologies on panoramic radiographs (PAN) without relying on manual dentist evaluation.
Before ITM treatment, surgeons evaluate the necessity and difficulty of extraction according to multiple factors.36 In clinical practice, while CBCT provides superior imaging of teeth and adjacent structures by generating high-resolution 3-dimensional volumetric images without magnification or superimposition, PAN remains the standard preoperative diagnostic tool due to its lower financial burden on patients.37 However, as 2-dimensional (2D) images, PAN has limitations in precisely assessing the positions and relationships of specific structures, especially for novice surgeons.22
In dentistry, deep learning is widely applied for medical image detection, segmentation, and classification, assisting in the diagnosis of conditions such as caries, lesions, and cysts.38 Kug Jin Jeon et al. developed 3 deep learning models to determine the actual contact relationship between the third molars and the mandibular canal solely based on panoramic radiographs.22 Serkan Yilmaz et al. utilized YOLOv4 in deep learning methods for tooth classification in dental panoramic radiographs.39 Taha Zirek et al. successfully developed a model to detect all impacted teeth and classify them using the Winter classification system, utilizing panoramic radiographs.40 The decision to extract ITMs depends on the conditions of adjacent M2s, which remains a great challenge for clinicians.41
Previous studies have demonstrated the robust capability of deep learning models in detecting various dental and maxillofacial lesions. For instance, deep learning has been successfully applied to the detection and classification of jaw cysts on panoramic images.42 In the domain of periodontal health, specialized frameworks have been developed to automate the detection of key landmarks.43 Furthermore, systematic reviews have synthesized evidence confirming the high diagnostic performance of AI algorithms for detecting other common conditions on panoramic radiographs, including dental caries and periapical lesions, with recent models reporting accuracy exceeding 90% in some tasks.44
To our knowledge, this constitutes the first fully automated method for detecting and classifying M2 pathologies associated with ITMs using panoramic radiographs. Prior studies have focused on isolated tasks. Vinayahalingam et al. developed a U-Net-based approach for segmenting mandibular third molars (M3) and inferior alveolar nerves (IAN) on panoramic radiographs to assess nerve injury risks during extraction; however, they did not address broader M2 pathology classification.45 Similarly, Zadrozny et al. evaluated an AI system for the automated detection of common dental pathologies on panoramic images; however, its reliability for ITM-specific M2 pathologies remained unvalidated, and it demonstrated low sensitivity (0.390) for periapical lesions.46 A recent 2-stage model for assessing the proximity between M3s and the mandibular canal achieved a classification accuracy of 0.85.47 While such models are known for high localization precision, our one-stage SMM-YOLOv8n offers a competitive balance between performance and efficiency. It attains a comparable mAP@50 of 0.886 without the computational overhead of a region proposal stage, which is advantageous for potential clinical deployment where speed is a consideration. Although comparable models, such as Celik's YOLOv3-based tool, have reported up to 0.96 mAP@0.5 for impacted molar detection,38 our model's mAP@50 of 88.6% still falls short of the >95% clinical benchmark.
Bitewing radiographs are primarily used to visualize the crowns of posterior teeth and their interproximal areas. The status of tooth roots requires imaging that captures the apical region; however, a systematic review confirms that bitewing radiographs typically do not show root apices. A significant portion of AI studies for apical lesions use panoramic radiographs, recognizing their role in broader detection. Studies indicate that detecting early-stage caries lesions is challenging with any radiographic modality, including bitewings; however, the limited field of view of bitewings may overlook these lesions by failing to provide broader contextual anatomy.48,49 A bitewing-focused study trained its model on 730 bitewing radiographs with 1,115 annotated lesions.50 While substantial, its scope is inherently limited to pathologies visible within the bitewing's field of view. In contrast, panoramic radiography provides a comprehensive view of the entire mouth. Additionally, panoramic imaging enables a thorough evaluation of important anatomical structures related to third molars, such as the inferior alveolar nerve canal, thereby allowing for a more comprehensive assessment of the necessity and risk of extraction.25 Therefore, panoramic images are selected as targets for our study to train a complementary screening tool.
This study has several limitations inherent to its single-center retrospective design. First, selection bias may arise because our tertiary referral center receives a higher proportion of complex cases, potentially leading to model overfitting to severe pathologies. Second, information bias is possible because historical medical records may contain inconsistencies. Third, spectrum bias arises from strict image inclusion criteria (≥300 dpi, artifact exclusion), which created ideal training conditions but may overestimate performance in routine clinical settings. We acknowledge that while this ensured data quality, it may limit real-world applicability. Additionally, class imbalance persists for the rare "both pathologies" category (only 11 samples), potentially affecting the model's ability to detect subtle, co-occurring lesions. While data splitting was performed at the patient level to prevent leakage, a potential correlation risk remains if multiple teeth from the same patient exhibited different pathologies. Finally, an 8.2% performance decrease in external validation, although within an acceptable range for medical AI, highlights the need to further assess the generalization capability.
Future work should expand datasets across institutions in accordance with Danjo et al.'s recommendations, incorporate data augmentation, and explore hybrid architectures to bridge the remaining accuracy gaps.36 We will focus specifically on multi-center validation to enhance generalizability, adaptive training across heterogeneous image-quality conditions, and targeted collection of rare cases through multi-institutional collaboration.
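As a concrete illustration of detection-oriented augmentation, the sketch below horizontally flips a grayscale image and mirrors its bounding boxes to match. Real pipelines typically use library transforms; the representation here (nested pixel lists, `[x_min, y_min, x_max, y_max]` boxes) is purely illustrative:

```python
def hflip_with_boxes(image, boxes):
    """Horizontally flip a grayscale image (list of pixel rows) and
    mirror the x-coordinates of its [x_min, y_min, x_max, y_max] boxes."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    new_boxes = [[width - x_max, y_min, width - x_min, y_max]
                 for x_min, y_min, x_max, y_max in boxes]
    return flipped, new_boxes

def jitter_brightness(image, delta, lo=0, hi=255):
    """Add a constant brightness offset, clamped to the valid pixel range."""
    return [[min(hi, max(lo, px + delta)) for px in row] for row in image]
```

Unlike classification, detection augmentation must transform the box coordinates together with the pixels; otherwise the lesion labels become misaligned with the image content.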
Conclusion
This research developed a deep learning model, SMM-YOLOv8n, to detect M2 lesions associated with ITMs on panoramic radiographs. The model demonstrates stronger discrimination and calibration than the baseline YOLOv8 and thus offers clinicians potential support for devising tailored treatment plans for patients with asymptomatic ITMs.
Conflict of interest
None disclosed.
Data availability
The complete datasets generated during this study are not publicly accessible to protect participants' privacy and confidentiality. De-identified data may be provided by the corresponding author upon formal request, subject to ethical review and compliance with institutional data-sharing policies. The source code for the analytical framework is openly available on GitHub: https://github.com/0126Jiang/SMM-YOLOv8n.
Funding
This work was supported by the Jiangsu Provincial Health Commission Scientific Research Project (Grant No. 2023-011), the Jiangsu Provincial Natural Science Foundation and Key Research & Development Program (Grant No. BK20221300), and the Innovation and Entrepreneurship Training Program for College Students in Jiangsu Province (Grant No. 4176, Project: A deep learning model for second-molar lesions related to impacted third molars).
Acknowledgements
We thank the Department of Oral and Maxillofacial Surgery, the Affiliated Stomatology Hospital of Nanjing Medical University, for providing clinical data and technical support. Special thanks to the 3 oral surgeons who contributed to dataset annotation and validation. Finally, we appreciate the constructive feedback from the editors and anonymous reviewers, which significantly improved this work.
Footnotes
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.identj.2026.109467.
References
- 1. Pinto A.C., Francisco H., Marques D., Martins J.N.R., Carames J. Worldwide prevalence and demographic predictors of impacted third molars-systematic review with meta-analysis. J Clin Med. 2024;13(24):7533. doi: 10.3390/jcm13247533.
- 2. Murad M., Al-Maslamani L., Yates J. Removal of mandibular third molars: an overview of risks, a proposal for international community and guidance. Dent Med Probl. 2024;61(4):481–488. doi: 10.17219/dmp/166156.
- 3. Nunn M.E., Fish M.D., Garcia R.I., et al. Retained asymptomatic third molars and risk for second molar pathology. J Dent Res. 2013;92(12):1095–1099. doi: 10.1177/0022034513509281.
- 4. Li D., Tao Y., Cui M., Zhang W., Zhang X., Hu X. External root resorption in maxillary and mandibular second molars associated with impacted third molars: a cone-beam computed tomographic study. Clin Oral Investig. 2019;23(12):4195–4203. doi: 10.1007/s00784-019-02859-3.
- 5. Prasanna Kumar D., Sharma M., Vijaya Lakshmi G., Subedar R.S., Nithin V.M., Patil V. Pathologies associated with second mandibular molar due to various types of impacted third molar: a comparative clinical study. J Maxillofac Oral Surg. 2022;21(4):1126–1139. doi: 10.1007/s12663-021-01517-0.
- 6. Schriber M., Rivola M., Leung Y.Y., Bornstein M.M., Suter V.G.A. Risk factors for external root resorption of maxillary second molars due to impacted third molars as evaluated using cone beam computed tomography. Int J Oral Maxillofac Surg. 2020;49(5):666–672. doi: 10.1016/j.ijom.2019.09.016.
- 7. Nemcovsky C.E., Libfeld H., Zubery Y. Effect of non-erupted 3rd molars on distal roots and supporting structures of approximal teeth. A radiographic survey of 202 cases. J Clin Periodontol. 1996;23(9):810–815. doi: 10.1111/j.1600-051x.1996.tb00616.x.
- 8. Yamaoka M., Furusawa K., Ikeda M., Hasegawa T. Root resorption of mandibular second molar teeth associated with the presence of the third molars. Aust Dent J. 1999;44(2):112–116. doi: 10.1111/j.1834-7819.1999.tb00211.x.
- 9. Li Z.B., Qu H.L., Zhou L.N., Tian B.M., Chen F.M. Influence of non-impacted third molars on pathologies of adjacent second molars: a retrospective study. J Periodontol. 2017;88(5):450–456. doi: 10.1902/jop.2016.160453.
- 10. Ariizumi D., Sakamoto T., Yamamoto M., Nishii Y. External root resorption of second molars due to impacted mandibular third molars during orthodontic retention. Bull Tokyo Dent Coll. 2022;63(3):129–138. doi: 10.2209/tdcpublication.2021-0044.
- 11. Kuwada C., Ariji Y., Fukuda M., Kise Y., Fujita H., Katsumata A., Ariji E. Deep learning systems for detecting and classifying the presence of impacted supernumerary teeth in the maxillary incisor region on panoramic radiographs. Oral Surg Oral Med Oral Pathol Oral Radiol. 2020;130(4):464–469. doi: 10.1016/j.oooo.2020.04.813.
- 12. Mohammad-Rahimi H., Motamedian S.R., Rohban M.H., et al. Deep learning for caries detection: a systematic review. J Dent. 2022;122 doi: 10.1016/j.jdent.2022.104115.
- 13. Krois J., Ekert T., Meinhold L., et al. Deep learning for the radiographic detection of periodontal bone loss. Sci Rep. 2019;9(1):8495. doi: 10.1038/s41598-019-44839-3.
- 14. Samaranayake L., Tuygunov N., Schwendicke F., et al. The transformative role of artificial intelligence in dentistry: a comprehensive overview. Part 1: fundamentals of AI, and its contemporary applications in dentistry. Int Dent J. 2025;75(2):383–396. doi: 10.1016/j.identj.2025.02.005.
- 15. Tuygunov N., Samaranayake L., Khurshid Z., et al. The transformative role of artificial intelligence in dentistry: a comprehensive overview part 2: the promise and perils, and the international dental federation communique. Int Dent J. 2025;75(2):397–404. doi: 10.1016/j.identj.2025.02.006.
- 16. Guldiken I.N., Tekin A., Kanbak T., Kahraman E.N., Özcan M. Prognostic evaluation of lower third molar eruption status from panoramic radiographs using artificial intelligence-supported machine and deep learning models. Bioengineering (Basel). 2025;12(11):1176. doi: 10.3390/bioengineering12111176.
- 17. Kurt Bayrakdar S., Orhan K., Bayrakdar I.S., et al. A deep learning approach for dental implant planning in cone-beam computed tomography images. BMC Med Imaging. 2021;21(1):86. doi: 10.1186/s12880-021-00618-z.
- 18. Esmaeilyfard R., Bonyadifard H., Paknahad M. Dental caries detection and classification in CBCT images using deep learning. Int Dent J. 2024;74(2):328–334. doi: 10.1016/j.identj.2023.10.003.
- 19. Dayı B., Üzen H., Çiçek İ.B., Duman Ş.B. A novel deep learning-based approach for segmentation of different type caries lesions on panoramic radiographs. Diagnostics (Basel). 2023;13(2):202. doi: 10.3390/diagnostics13020202.
- 20. Kühnisch J., Meyer O., Hesenius M., Hickel R., Gruhn V. Caries detection on intraoral images using artificial intelligence. J Dent Res. 2022;101(2):158–165. doi: 10.1177/00220345211032524.
- 21. Faadiya A.N., Widyaningrum R., Arindra P.K., Diba S.F. The diagnostic performance of impacted third molars in the mandible: a review of deep learning on panoramic radiographs. Saudi Dent J. 2024;36(3):404–412. doi: 10.1016/j.sdentj.2023.11.025.
- 22. Jeon K.J., Choi H., Lee C., Han S.S. Automatic diagnosis of true proximity between the mandibular canal and the third molar on panoramic radiographs using deep learning. Sci Rep. 2023;13(1) doi: 10.1038/s41598-023-49512-4.
- 23. Jing Q., Dai X., Wang Z., et al. Fully automated deep learning model for detecting proximity of mandibular third molar root to inferior alveolar canal using panoramic radiographs. Oral Surg Oral Med Oral Pathol Oral Radiol. 2024;137(6):671–678. doi: 10.1016/j.oooo.2024.02.011.
- 24. Liu M.Q., Xu Z.N., Mao W.Y., et al. Deep learning-based evaluation of the relationship between mandibular third molar and mandibular canal on CBCT. Clin Oral Investig. 2022;26(1):981–991. doi: 10.1007/s00784-021-04082-5.
- 25. Al Salieti H., Al Sliti H., Alkadi S. Diagnostic performance of deep learning models in classifying mandibular third molar and mandibular canal contact status on panoramic radiographs: a systematic review and meta-analysis. Imaging Sci Dent. 2025;55(2):139–150. doi: 10.5624/isd.20240239.
- 26. Assiri H.A., Hameed M.S., Alqarni A., Dawasaz A.A., Arem S.A., Assiri K.I. Artificial intelligence application in a case of mandibular third molar impaction: a systematic review of the literature. J Clin Med. 2024;13(15):4431. doi: 10.3390/jcm13154431.
- 27. Kato C.N., Barra S.G., Tavares N.P., et al. Use of fractal analysis in dental images: a systematic review. Dentomaxillofac Radiol. 2020;49(2) doi: 10.1259/dmfr.20180457.
- 28. Negi S., Mathur A., Tripathy S., et al. Artificial intelligence in dental caries diagnosis and detection: an umbrella review. Clin Exp Dent Res. 2024;10(4) doi: 10.1002/cre2.70004.
- 29. Schwendicke F., Singh T., Lee J.H., Gaudin R., Krois J. Artificial intelligence in dental research: checklist for authors, reviewers, readers. J Dent. 2021;107(1)
- 30. Feiglin B. Root resorption. Aust Dent J. 1986;31(1):12–22. doi: 10.1111/j.1834-7819.1986.tb02978.x.
- 31. Gugnani N., Pandit I.K., Srivastava N., Gupta M., Sharma M. International caries detection and assessment system (ICDAS): a new concept. Int J Clin Pediatr Dent. 2011;4(2):93–100. doi: 10.5005/jp-journals-10005-1089.
- 32. Marinescu I.R., Bănică A.C., Mercuţ V., et al. Root resorption diagnostic: role of digital panoramic radiography. Curr Health Sci J. 2019;45(2):156–166. doi: 10.12865/CHSJ.45.02.05.
- 33. Li H.L., Li J., Wei H.B., Liu Z., Zhan Z.F., Ren Q.L. Slim-neck by GSConv: a lightweight-design for real-time detector architectures. J Real-Time Image Pr. 2024;21(3)
- 34. Yu Y., Zhang Y., Cheng Z.Y., Song Z., Tang C.K. MCA: multidimensional collaborative attention in deep convolutional neural networks for image recognition. Eng Appl Artif Intel. 2023;126
- 35. Ma S., Xu Y. MPDIoU: a loss for efficient and accurate bounding box regression. 2023. arXiv:2307.07662. doi: 10.48550/arXiv.2307.07662.
- 36. Danjo A., Kuwada C., Aijima R., et al. Limitations of panoramic radiographs in predicting mandibular wisdom tooth extraction and the potential of deep learning models to overcome them. Sci Rep. 2024;14(1) doi: 10.1038/s41598-024-81153-z.
- 37. Adali E., Ozden Yuce M., Isik G., Sener E., Mert A. Treatment decision for impacted mandibular third molars: effects of cone-beam computed tomography and level of surgeons' experience. PLoS One. 2024;19(12) doi: 10.1371/journal.pone.0314883.
- 38. Celik M.E. Deep learning based detection tool for impacted mandibular third molar teeth. Diagnostics (Basel). 2022;12(4):942. doi: 10.3390/diagnostics12040942.
- 39. Yilmaz S., Tasyurek M., Amuk M., Celik M., Canger E.M. Developing deep learning methods for classification of teeth in dental panoramic radiography. Oral Surg Oral Med Oral Pathol Oral Radiol. 2024;138(1):118–127. doi: 10.1016/j.oooo.2023.02.021.
- 40. Zirek T., Öziç M., Tassoker M. AI-driven localization of all impacted teeth and prediction of winter angulation for third molars on panoramic radiographs: clinical user interface design. Comput Biol Med. 2024;178 doi: 10.1016/j.compbiomed.2024.108755.
- 41. Kou Z., Zhang W., Li C., et al. A prediction model for external root resorption of the second molars associated with third molars. Int Dent J. 2025;75(1):195–205. doi: 10.1016/j.identj.2024.09.031.
- 42. Kaygısız Ö.F., Uranbey Ö., Gürsoytrak B., Gür Z.B., Çiçek A., Canbal M.A. A deep learning approach based on YOLO v11 for automatic detection of jaw cysts. BMC Oral Health. 2025;25(1):1518. doi: 10.1186/s12903-025-06767-9.
- 43. Szabó V., Orhan K., Dobó-Nagy C., et al. Deep learning-based periapical lesion detection on panoramic radiographs. Diagnostics (Basel). 2025;15(4):510. doi: 10.3390/diagnostics15040510.
- 44. Do K.Q., Thai T.T., Lam V.Q., Nguyen T.T. Development and validation of artificial intelligence models for automated periodontitis staging and grading using panoramic radiographs. BMC Oral Health. 2025;25(1):1623. doi: 10.1186/s12903-025-07025-8.
- 45. Vinayahalingam S., Xi T., Berge S., Maal T., de Jong G. Automated detection of third molars and mandibular nerve by deep learning. Sci Rep. 2019;9(1):9007. doi: 10.1038/s41598-019-45487-3.
- 46. Zadrożny Ł., Regulski P., Brus-Sawczuk K., et al. Artificial intelligence application in assessment of panoramic radiographs. Diagnostics (Basel). 2022;12(1) doi: 10.3390/diagnostics12010224.
- 47. Soltani P., Sohrabniya F., Mohammad-Rahimi H., et al. A two-stage deep-learning model for determination of the contact of mandibular third molars with the mandibular canal on panoramic radiographs. BMC Oral Health. 2024;24(1):1373. doi: 10.1186/s12903-024-04850-1.
- 48. Pornprasertsuk-Damrongsri S., Vachmanus S., Papasratorn D., et al. Clinical application of deep learning for enhanced multistage caries detection in panoramic radiographs. Sci Rep. 2025;15(1) doi: 10.1038/s41598-025-16591-4.
- 49. Salehizeinabadi M., Neghab S., Ameli N., Baghi K.K., Pacheco-Pereira C. Automated classification of dental caries in bitewing radiographs using machine learning and the ICCMS framework. Int J Dent. 2025;2025 doi: 10.1155/ijod/6644310.
- 50. Alaqla A., Khanagar S.B., Albelaihi A.I., Singh O.G., Alfadley A. Application and performance of artificial intelligence-based models in the detection, segmentation and classification of periapical lesions: a systematic review. Front Dent Med. 2025;6 doi: 10.3389/fdmed.2025.1717343.