Abstract
Purpose
Panoramic radiographs have recently become a platform for deep learning models, which show potential in enhancing diagnostic accuracy for detecting contact between mandibular third molars and the mandibular canal. However, detailed information regarding the accuracy of these models in identifying such contact remains limited.
Materials and Methods
In accordance with the PRISMA-2020 and PRISMA-DTA guidelines, the PubMed, ScienceDirect, Web of Science, Embase, and EBSCO databases were systematically searched up to September 2024. Eligible studies employed deep learning models based on convolutional neural networks to classify the contact between mandibular third molars and the mandibular canal. Extracted metrics included accuracy, sensitivity, specificity, precision, and F1-score. A meta-analysis using random effects models pooled these performance metrics, while univariate and multivariate meta-regressions were conducted to explore sources of heterogeneity. Study quality was assessed using the QUADAS-2 tool.
Results
Seven studies incorporating 4,955 panoramic radiographs reported pooled performance metrics of 83.4% accuracy, 80.2% sensitivity, 85.8% specificity, 83.3% precision, and an F1-score of 80.9%. High heterogeneity (I² > 90%) was primarily attributable to variations in sample size, image resolution, model architecture, and model complexity. Meta-regression analyses identified image resolution and architecture (e.g., VGG-16, AlexNet) as key factors. Although the overall risk of bias was low, the patient selection domain was often unclear.
Conclusion
Deep learning models exhibit significant promise in evaluating mandibular third molar and mandibular canal contact on panoramic radiographs, potentially complementing traditional methods. The adoption of standardized protocols, diverse datasets, and explainable artificial intelligence will be crucial for broader clinical application.
Keywords: Deep Learning, Convolutional Neural Networks, Mandibular Third Molar, Panoramic Radiography, Mandibular Canal
Introduction
Mandibular third molar (MM3) extractions rank among the most common dental procedures, yet they carry the risk of damaging the inferior alveolar nerve (IAN) located within the mandibular canal (MC), potentially leading to temporary or permanent sensory impairment.1 Although the reported incidence of IAN injury is below 8.4%, injury occurs most frequently when MM3s are closely associated with the MC.2,3,4 Thus, an accurate preoperative evaluation of the MM3–MC relationship is essential. Panoramic radiographs (PANs) are frequently used due to their accessibility, affordability, and lower radiation exposure compared to cone-beam computed tomography (CBCT).5,6
Nonetheless, PAN—a 2-dimensional imaging modality—can complicate the precise assessment of the MM3 and MC relationship.7 CBCT remains the gold standard for detailed evaluation because of its 3-dimensional imaging capabilities.8,9 However, CBCT is associated with higher costs, increased radiation exposure, and issues with accessibility.10,11 Consequently, there is a pressing need for accurate, non-invasive diagnostic tools that can efficiently work with PANs.
Deep learning (DL), a branch of machine learning and artificial intelligence (AI), is distinguished by its ability to learn from unstructured or unlabeled data through representations inspired by the human brain.12 It has demonstrated promise in image processing and classification, outperforming traditional methods in various diagnostic fields, including dentistry.13,14,15 Recent advances in DL models, particularly convolutional neural networks (CNNs), have shown potential in automatically determining the contact status between MM3s and the MC on PANs.16,17,18,19,20,21,22,23,24,25,26 These models have achieved results that match or exceed those of human clinicians, particularly when distinguishing between contact and non-contact cases, thus potentially bridging the gap between the accessibility of PAN and the accuracy of CBCT.
This systematic review and meta-analysis aimed to assess the diagnostic performance of DL models in classifying the contact status between MM3 and the MC on PANs. By synthesizing current evidence, the review compares the accuracy, precision, and clinical utility of AI models to traditional diagnostic methods, thereby emphasizing their prospective role in routine clinical practice.
Materials and Methods
The study was registered with the National Institute for Health Research International Prospective Register of Systematic Reviews (PROSPERO, registration number CRD42024586600).
This systematic review and meta-analysis adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 (PRISMA-2020) guidelines and its extension for Diagnostic Test Accuracy Studies (PRISMA-DTA).27,28 The study protocol was based on the following PICO elements:29 population: patients' PANs that displayed the MM3 and MC, intervention: DL models for classifying the contact status between the MC and MM3 on PANs, comparison: CBCT or CT images, outcome: diagnostic performance of DL in classifying the MM3–MC contact status.
Eligibility criteria
Studies were included if they met the consensus criteria established by the reviewers (HA, HA, and SA): 1) prospective or retrospective observational studies, 2) use of DL models based on CNNs to classify contact status on PANs, 3) classification of the contact between the MC and third molars as either contact or non-contact, 4) reporting of performance metrics to assess DL models, 5) use of CBCT or CT as the gold standard for confirming MC and third molar contact, and 6) publication in English. Exclusion criteria comprised 1) systematic or narrative reviews, editorials, letters, correspondence, and book chapters, 2) studies utilizing AI tools other than CNN-based DL models, 3) studies that relied solely on expert opinion on PANs for confirming contact status, and 4) non-English publications.
Information sources and search strategy
An electronic search was performed in 5 databases up to September 7, 2024: PubMed, ScienceDirect, Web of Science, Embase, and EBSCO, following a search strategy developed by 2 reviewers (HA and HA).
After incorporating the variables of the research question and identifying the criteria for article selection, a search strategy using key terms and logical operators (AND, OR) was devised for PubMed, ScienceDirect, and Web of Science. No publication date restrictions were applied. The key terms included: "machine learning," "deep learning," "artificial intelligence," "mandibular canal," "inferior alveolar nerve canal," "inferior alveolar nerve," "third molar," "wisdom tooth," "panoramic radiograph," and "orthopantomogram."
Each of these 3 databases was queried using the following search string: (((machine learning OR deep learning OR artificial intelligence) AND (mandibular canal OR inferior alveolar nerve canal OR Inferior alveolar nerve)) AND (third molar OR wisdom tooth)) AND (panoramic radiograph OR orthopantomogram).
Screening and selection process
The Rayyan QCRI online platform was used for article screening.30 After duplicate records were removed, 2 reviewers (HA, HA) independently screened titles and abstracts. Discrepancies were resolved by discussion or by consulting a third reviewer (SA). Full-text assessments were then performed on all potentially eligible papers against the inclusion and exclusion criteria.
Data collection process
Two independent reviewers (HA and HA) extracted key information from each study, including study characteristics (year, location, sample size, and contact status), model architectures, augmentation techniques, input features, and training protocols. Diagnostic performance metrics—including accuracy, sensitivity, precision, F1-score, and specificity—were also recorded, with any missing metrics computed from the available data. Discrepancies were resolved through discussion with a third reviewer (SA), and limitations regarding model performance and generalizability were documented and summarized.
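For illustration, any metric missing from a study report can be recovered from the raw confusion-matrix counts using the standard formulas; the sketch below (not the reviewers' actual script) uses hypothetical counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive standard diagnostic metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)           # recall: contacts correctly detected
    specificity = tn / (tn + fp)           # non-contacts correctly ruled out
    precision = tp / (tp + fp)             # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Hypothetical example: 80 true contacts detected, 20 missed,
# 170 non-contacts correctly identified, 30 false alarms.
m = classification_metrics(tp=80, fp=30, tn=170, fn=20)
print({k: round(v, 3) for k, v in m.items()})
```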
Risk of bias and applicability
Following the Cochrane Collaboration recommendations, the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies) tool was used to evaluate the risk of bias and applicability of all eligible articles.31 Two reviewers (HA and HA) conducted the assessment, resolving any disagreements through discussion. The risk of bias evaluation included 4 domains: patient selection, index test, reference standard, and flow and timing, with the first 3 also informing the applicability assessment.32
Meta-analysis and meta-regression
A meta-analysis was undertaken to evaluate the diagnostic performance of DL models in classifying the contact status between the MC and MM3 on PANs. The dataset was cleaned and prepared with a focus on key performance metrics: accuracy, sensitivity, precision, F1-score, and specificity. Heterogeneity among studies was quantified using the I² statistic, which indicated substantial variability. To account for this, the DerSimonian-Laird random effects model was applied to calculate pooled effect sizes, and confidence intervals were computed to delineate the expected range of the true effect sizes. Forest plots were generated to visually represent individual study results alongside the pooled estimates. Python (version 3.9.13) was used for these meta-analytical calculations.
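The DerSimonian-Laird procedure described above can be sketched in a few lines of Python. This is an illustrative reimplementation with hypothetical per-study accuracies and variances, not the authors' code:

```python
import math

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling with I² heterogeneity."""
    k = len(effects)
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q statistic measures observed between-study dispersion
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)                   # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]       # random-effects weights
    pooled = sum(ws * yi for ws, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Hypothetical study-level accuracies and their sampling variances
acc = [0.86, 0.79, 0.91, 0.81, 0.84, 0.78]
var = [0.0004, 0.0009, 0.0003, 0.0008, 0.0005, 0.0010]
pooled, ci, i2 = dersimonian_laird(acc, var)
print(f"pooled={pooled:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f}), I2={i2:.1f}%")
```

Because the pooled estimate is computed on the raw proportion scale with a normal approximation, its confidence bounds can extend beyond 0–1, which is why some pooled CIs reported in meta-analyses of proportions exceed 100%.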
Univariate and multivariate meta-regression analyses were then conducted to examine the influence of various factors—such as sample size, image resolution, model architecture, and model complexity (measured by the number of layers)—on diagnostic metrics including accuracy, precision, recall (sensitivity), F1-score, and specificity. For categorical variables like model architecture, dummy coding was employed. Coefficients, P-values, and R² values were extracted to interpret these relationships quantitatively. The multivariate meta-regression evaluated the combined effects of these factors, analyzing predictor combinations (e.g., sample size and image resolution, or model complexity and architecture) to determine their joint impact. A random-effects model accounted for study heterogeneity, and potential overfitting and multicollinearity were assessed via R² values. Sensitivity analyses further evaluated the robustness of the models.
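A random-effects meta-regression of this kind reduces to weighted least squares once the between-study variance is fixed. The sketch below is purely illustrative: the effect sizes, resolutions, dummy coding (1 = VGG-16), and the assumed tau² value are hypothetical, not data from the included studies:

```python
import numpy as np

# Hypothetical study-level data: accuracy, its sampling variance,
# a continuous moderator (image resolution), and a dummy-coded
# architecture indicator (1 = VGG-16, 0 = other).
y   = np.array([0.86, 0.79, 0.91, 0.81, 0.84, 0.78])
var = np.array([0.0004, 0.0009, 0.0003, 0.0008, 0.0005, 0.0010])
res = np.array([512, 1024, 256, 1024, 512, 1536])
vgg = np.array([1, 0, 1, 0, 1, 0])

tau2 = 0.001                    # between-study variance (assumed known here)
w = 1.0 / (var + tau2)          # random-effects meta-regression weights

# Weighted least squares via row scaling: minimise sum of w_i * (y_i - X b)^2
X = np.column_stack([np.ones_like(y), res, vgg])
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

# Weighted R^2: share of weighted variance explained by the moderators
resid = y - X @ beta
r2 = 1 - (w * resid**2).sum() / (w * (y - np.average(y, weights=w))**2).sum()
print("coefficients (intercept, resolution, VGG-16):", beta.round(5))
print("weighted R2 =", round(r2, 3))
```

In practice a dedicated package would also return P-values and residual heterogeneity; this sketch only shows how dummy-coded moderators and random-effects weights enter the design matrix.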
Results
Study selection
The search identified 320 records: 264 from ScienceDirect, 19 from PubMed, 18 from Web of Science, 4 from Embase, and 15 from EBSCO. Two reviewers (HA and HA) independently screened the records. Following duplicate removal, 280 records remained. After screening titles and abstracts, 268 records were excluded, leaving 12 for full-text review. Five studies were subsequently excluded, resulting in 7 studies that met the eligibility criteria and were included for data extraction and quality assessment (Fig. 1).
Fig. 1. PRISMA flow diagram of the study selection process.
Risk of bias and applicability
The QUADAS-2 tool was used to assess risk of bias and applicability for all included articles. As shown in Table 1 and illustrated in Figures 2 and 3, more than half of the studies did not clearly indicate whether patients were consecutively or randomly enrolled, resulting in 71.4% (5/7) of the articles exhibiting an unclear risk of bias in the patient selection domain.18,19,20,22 Nonetheless, all studies were judged to have a low risk of bias in the index test, reference standard, and flow and timing domains. The applicability concerns mirrored this pattern, with all studies receiving low-risk ratings in patient selection, index test, and reference standard categories.
Table 1. QUADAS-2 assessment of risk of bias and applicability concerns in the reviewed studies.
Fig. 2. Proportional analysis of risk of bias across various study domains.
Fig. 3. Proportional analysis of applicability concerns across study domains.
Study characteristics
This review includes studies conducted between 2021 and 2024 that evaluated the diagnostic performance of DL models in detecting contact between MM3 and the MC. The 7 studies originated from various Asian countries, including China, Japan, Thailand, South Korea, and Iran. Sample sizes varied, with Jing et al.17 reporting the largest cohort, comprising 798 contact cases and 1745 non-contact cases. Participant ages across the studies ranged from 18 to 76 years. Detailed study characteristics are provided in Table 2.
Table 2. Characteristics of the included studies.
MM3: mandibular third molar
Deep learning models in dental radiography
A range of DL models and data augmentation strategies were employed to enhance the detection of contacts between MM3 and the inferior alveolar canal (IAC). Zhu et al.19 utilized the YOLOv4 model, applying horizontal and vertical flipping and mosaicking as augmentation techniques to train on PAN data. In contrast, Sukegawa et al.20 and Yari et al.22 implemented rotation and translation adjustments using the ResNet50 architecture, focusing on detailed features from cropped radiographic images. Takebe et al.16 and Jing et al.17 also used YOLO models, but with different versions and input feature modifications, to adapt to the resolution and quality of panoramic images. Papasratorn et al.18 integrated multiple models, including AlexNet and GoogLeNet, and applied complex augmentations like warping and brightness adjustment to train the models on cropped images emphasizing the MM3 and IAC. Jeon et al.33 selected EfficientDet-D4 and RetinaNet, focusing on improving image quality through resizing and manual labeling to identify critical dental structures more accurately. The specific augmentation techniques, model architectures, and input features utilized in these studies are detailed in Table 3.
Table 3. Overview of deep learning models, augmentation techniques, and input features.
PAN: panoramic radiograph, MM3: mandibular third molar, IAN: inferior alveolar nerve, IAC: inferior alveolar canal, MC: mandibular canal
Data handling, model training, and performance evaluation
The reviewed studies employed various data splits for training, validation, and testing to optimize DL detection of MM3 contact with the MC. Zhu et al.19 and Jing et al.17 primarily used an 80/10/10 split for their datasets, focusing on extensive training to refine model performance. In contrast, Jeon et al.33 utilized a 60/20/20 split, allowing more substantial validation and testing phases. Optimization algorithms varied, with Sukegawa et al.20 implementing stochastic gradient descent and sharpness-aware minimization (SAM) for enhanced generalization, running for 300 epochs with a batch size of 32. Papasratorn et al.18 and Yari et al.22 utilized the Adam optimizer, with training durations of 1000 and 50 epochs, respectively. Training epochs and batch sizes varied widely overall, from a minimum of 2 in the study by Jeon et al.33 to a maximum of 1200 in the study by Jing et al.17 The specifics of each study's data split, optimization algorithms, and training details are further outlined in Table 4.
Table 4. Details of the training protocols and optimization algorithms.
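A shuffled 80/10/10 partition of the kind described above can be sketched as follows. This is an illustrative helper, not code from any of the included studies; the cohort size of 2543 mirrors the Jing et al.17 dataset (798 contact plus 1745 non-contact cases):

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and partition a dataset into train/validation/test subsets."""
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    items = list(items)
    random.Random(seed).shuffle(items)   # seeded for reproducibility
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Hypothetical: 2543 radiographs split 80/10/10
train, val, test = split_dataset(range(2543))
print(len(train), len(val), len(test))  # → 2034 254 255
```

In practice splits for medical images should also be stratified by class (contact vs. non-contact) and grouped by patient so that no patient's images appear in more than one subset.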
Zhu et al.,19 utilizing YOLOv4, focused on sensitivity metrics, highlighting the model's ability to identify true positives correctly. Sukegawa et al.20 achieved commendable F1 scores with ResNet50 and a variant thereof, indicating an effective balance between precision and recall. Takebe et al.,16 using YOLOv3, outperformed traditional diagnostic methods, significantly advancing automated dental imaging assessments.
Models such as VGG-16 used by Papasratorn et al.18 demonstrated superior performance metrics. Conversely, Jeon et al.33 encountered challenges in optimizing sensitivity with RetinaNet and EfficientDet-D4, which could impact their utility in real-world clinical settings. Furthermore, Yari et al.22 and Jing et al.17 with ResNet-50 and YOLOv5 revealed high specificity, reinforcing their potential to minimize false positives in clinical diagnostics. The specific performance metrics for these models are presented in Table 5.
Table 5. Evaluation metrics and model comparisons.
Meta-analysis and meta-regression
This meta-analysis evaluated the diagnostic performance of DL models in identifying the contact status between the MC and MM3 on PANs. Given the substantial heterogeneity among studies, indicated by an I² statistic exceeding 90%, a random effects model was applied to account for this variability.
The study by Zhu et al.19 was excluded from the meta-analysis because it did not report accuracy metrics, and these metrics could not be calculated from the available data. For the remaining studies, the DerSimonian-Laird random effects model was used to pool effect sizes for key performance metrics, including accuracy, sensitivity (recall), precision, F1-score, and specificity, as outlined in Table 6.
Table 6. Pooled diagnostic performance metrics of deep learning models from the meta-analysis.
The letters a, b, and c denote multiple models in the same study. CI: confidence interval
The pooled results indicated an accuracy of 83.4% (95% CI: 0.630-1.038), sensitivity of 80.2% (95% CI: 0.478-1.126), precision of 83.3% (95% CI: 0.644-1.022), an F1-score of 80.9% (95% CI: 0.544-1.074), and specificity of 85.8% (95% CI: 0.653-1.063).
Forest plots illustrating individual study effects, confidence intervals, and pooled estimates for accuracy, recall (sensitivity), precision, specificity, and F1-score are provided in Figures 4A to 4E.
Fig. 4. Forest plots illustrating diagnostic performance metrics for deep learning models. A. Accuracy. B. Sensitivity. C. Precision. D. Specificity. E. F1-score.
Univariate meta-regression analysis demonstrated that sample size had a positive but not statistically significant association with all performance metrics (P>0.05), with R² values ranging from 0.01 to 0.15. Image resolution exhibited a significant negative effect on accuracy (P=0.031), precision (P=0.024), and F1-score (P=0.019). Model architecture revealed considerable variability; architectures such as VGG-16 (P=0.012), AlexNet (P=0.015), and GoogLeNet (P=0.018) consistently enhanced performance metrics, whereas RetinaNet (P=0.042) and YOLOv3 (P=0.048) adversely affected certain metrics.
The multivariate meta-regression identified image resolution (P=0.027) and model architecture (P=0.011) as primary determinants of performance metrics (R²=1.0000). Predictor combinations, including sample size with image resolution, accounted for a substantial portion of the variance in accuracy (R²=77.8%, P=0.033) and precision (R²=76.9%, P=0.029). Sensitivity analyses confirmed the stability of the predictor coefficients and R² values across iterations. Detailed results of the meta-regression analysis are available in the supplementary information.
Discussion
The findings of this systematic review and meta-analysis reveal promising diagnostic performance metrics for DL models in classifying the contact status between MM3 and the MC on PANs. Pooling data from 7 studies involving 4,955 PANs yielded high accuracy, sensitivity, specificity, precision, and F1 scores, all exceeding 80%.16,17,18,19,20,22,33 These results underscore the potential of DL models as a complementary diagnostic tool, especially in clinical settings where CBCT is either unavailable or impractical. However, substantial heterogeneity across studies introduces several challenges that must be addressed to improve the reliability and applicability of these models in diverse clinical environments.
Although the pooled metrics indicate strong diagnostic performance, the wide variability in confidence intervals (for example, the sensitivity CI spanning 47.8% to 112.6%, an upper bound above the logical maximum of 100%) raises concerns regarding clinical reliability. This variability is likely attributable to differences in dataset characteristics, model architectures, and training methodologies. While high pooled metrics such as 80.2% sensitivity and 85.8% specificity are encouraging, the observed variability underscores the need for standardized imaging protocols and rigorous validation across diverse populations. Clinicians should consider these models as adjunctive tools rather than complete replacements for CBCT, particularly in cases demanding high diagnostic certainty. High sensitivity (80.2%) minimizes missed cases, which is crucial for patient safety, whereas high specificity (85.8%) reduces false positives, thereby avoiding unnecessary interventions. The trade-off between these metrics highlights the importance of tailoring model application to specific clinical contexts—models with high sensitivity may be more suitable when case detection is prioritized, whereas high specificity is advantageous when diagnostic precision is paramount.
The marked heterogeneity, as evidenced by an I² statistic exceeding 90%, reflects differences in study design, DL model architectures, data preprocessing techniques, and dataset characteristics. Variations in image resolution and augmentation methods significantly impacted diagnostic outcomes. For instance, models such as ResNet-50 and YOLOv5 achieved high specificity but struggled with sensitivity, indicating that the choice of model should be aligned with specific clinical needs.17,22 Furthermore, advanced architectures like VGG-16 and AlexNet consistently improved performance metrics, whereas models like RetinaNet and YOLOv3 exhibited more variability, emphasizing the importance of selecting an appropriate architecture for each clinical task.18,33 These findings suggest that optimizing model architecture and preprocessing strategies can mitigate variability and improve diagnostic consistency. Moreover, the scope of model comparisons varied widely across studies. Certain studies evaluated a single DL model architecture, such as YOLOv316 or ResNet-50,22 without benchmarking against other architectures. In contrast, others compared multiple models, including AlexNet, VGG-16, and GoogLeNet,18 introducing variability in the reported diagnostic metrics. Studies that compared a broader range of models often reported greater performance variability due to inherent differences in model strengths and weaknesses. For example, while VGG-16 demonstrated superior specificity, YOLOv4 excelled in sensitivity, further complicating the interpretation of pooled performance metrics.
Meta-regression analysis identified key factors contributing to heterogeneity, including image resolution, sample size, and model complexity. Although larger sample sizes generally enhanced model performance, this effect was not statistically significant, highlighting the need for more extensive datasets.19,20 Image resolution negatively impacted several performance metrics, possibly due to issues like overfitting or computational inefficiencies. These insights underscore the necessity of standardizing imaging protocols and incorporating diverse, high-quality datasets to achieve robust model performance across varied clinical scenarios.
Most of the included studies were conducted in Asian countries, which may limit the generalizability of the findings to other populations. Variations in imaging equipment, patient demographics, and dental healthcare systems could affect the applicability of these models in Western or African contexts. Although the predominance of studies from Asia limits generalizability, future research can address this bias by expanding data sources to include panoramic radiographs from Western, African, and Middle Eastern populations. Additionally, advanced data augmentation techniques—such as adaptive histogram equalization and synthetic image generation—can simulate a wider range of imaging conditions and anatomical variations. Transfer learning, wherein models pre-trained on Asian datasets are fine-tuned using smaller, region-specific datasets, may also enhance adaptability to different populations. Finally, multinational collaborations could facilitate data sharing and validation of deep learning models across varied clinical settings, ensuring broader applicability in dental diagnostics.17,19
While DL models offer significant promise as complements to traditional diagnostic methods, challenges remain regarding their integration into routine clinical practice. Standardized training protocols, robust validation on diverse datasets, and the development of explainable AI tools are needed to enhance transparency and foster clinician trust. For instance, techniques such as Grad-CAM and LIME can provide visual explanations for model predictions, thereby promoting acceptance among practitioners.16 Additionally, achieving an optimal balance between sensitivity and specificity is critical to minimize the risks of both overdiagnosis and underdiagnosis.
Despite the encouraging performance of deep learning models on panoramic radiographs, current limitations suggest they are best used as adjunctive tools rather than outright replacements for CBCT. Although these models can deliver rapid, automated assessments of MM3 and MC relationships, they should be integrated with traditional diagnostic methods, particularly in complex cases that require precise 3-dimensional evaluation.9,11 Future advancements—such as the incorporation of 3-dimensional imaging data and refined data augmentation techniques—could further enhance the clinical utility of these tools.
In conclusion, DL models exhibit promising diagnostic performance in assessing MM3 and MC contact on panoramic radiographs, with pooled metrics exceeding 80% for accuracy, sensitivity, precision, F1-score, and specificity. These models hold considerable potential to complement traditional diagnostic approaches, particularly in contexts where CBCT is unavailable or impractical. However, our meta-regression analysis indicates that factors such as image resolution and model architecture significantly affect DL model performance, emphasizing the need for standardized imaging protocols and model optimization. Despite challenges including performance variability, limited generalizability, and the need for standardization in model development and evaluation, DL models offer transformative possibilities for dental diagnostics.
Nonetheless, these models should be considered an adjunct to, rather than a replacement for, CBCT—especially in complex cases that require precise 3-dimensional assessments. Future work should focus on refining model architectures, expanding data diversity, and establishing standardized validation protocols to improve the reliability and clinical integration of DL-based diagnostics, ultimately optimizing decision-making, minimizing inferior alveolar nerve injury risks during MM3 extraction, and reducing unnecessary radiation exposure from CBCT when appropriate.
Footnotes
Conflicts of Interest: None
References
- 1.Carter K, Worthington S. Predictors of third molar impaction: a systematic review and meta-analysis. J Dent Res. 2016;95:267–276. doi: 10.1177/0022034515615857. [DOI] [PubMed] [Google Scholar]
- 2.Gomes AC, Vasconcelos BC, Silva ED, Caldas Ade F, Jr, Pita Neto IC. Sensitivity and specificity of pantomography to predict inferior alveolar nerve damage during extraction of impacted lower third molars. J Oral Maxillofac Surg. 2008;66:256–259. doi: 10.1016/j.joms.2007.08.020. [DOI] [PubMed] [Google Scholar]
- 3.Gu L, Zhu C, Chen K, Liu X, Tang Z. Anatomic study of the position of the mandibular canal and corresponding mandibular third molar on cone-beam computed tomography images. Surg Radiol Anat. 2018;40:609–614. doi: 10.1007/s00276-017-1928-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Korkmaz YT, Kayıpmaz S, Senel FC, Atasoy KT, Gumrukcu Z. Does additional cone beam computed tomography decrease the risk of inferior alveolar nerve injury in high-risk cases undergoing third molar surgery? Does CBCT decrease the risk of IAN injury? Int J Oral Maxillofac Surg. 2017;46:628–635. doi: 10.1016/j.ijom.2017.01.001. [DOI] [PubMed] [Google Scholar]
- 5.Różyło-Kalinowska I. Panoramic radiography in dentistry. Clin Dent Rev. 2021;5:26 [Google Scholar]
- 6.Ghaeminia H, Meijer GJ, Soehardi A, Borstlap WA, Mulder J, Bergé SJ. Position of the impacted third molar in relation to the mandibular canal. Diagnostic accuracy of cone beam computed tomography compared with panoramic radiography. Int J Oral Maxillofac Surg. 2009;38:964–971. doi: 10.1016/j.ijom.2009.06.007. [DOI] [PubMed] [Google Scholar]
- 7.Del Lhano NC, Ribeiro RA, Martins CC, Assis NMSP, Devito KL. Panoramic versus CBCT used to reduce inferior alveolar nerve paresthesia after third molar extractions: a systematic review and meta-analysis. Dentomaxillofac Radiol. 2020;49:20190265. doi: 10.1259/dmfr.20190265. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Elkhateeb SM, Awad SS. Accuracy of panoramic radiographic predictor signs in the assessment of proximity of impacted third molars with the mandibular canal. J Taibah Univ Med Sci. 2018;13:254–261. doi: 10.1016/j.jtumed.2018.02.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Baqain ZH, AlHadidi A, AbuKaraky A, Khader Y. Does the use of cone-beam computed tomography before mandibular third molar surgery impact treatment planning? J Oral Maxillofac Surg. 2020;78:1071–1077. doi: 10.1016/j.joms.2020.03.002. [DOI] [PubMed] [Google Scholar]
- 10.Jain S, Choudhary K, Nagi R, Shukla S, Kaur N, Grover D. New evolution of cone-beam computed tomography in dentistry: combining digital technologies. Imaging Sci Dent. 2019;49:179–190. doi: 10.5624/isd.2019.49.3.179. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Matzen LH, Berkhout E. Cone beam CT imaging of the mandibular third molar: a position paper prepared by the European Academy of DentoMaxilloFacial Radiology (EADMFR) Dentomaxillofac Radiol. 2019;48:20190039. doi: 10.1259/dmfr.20190039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Halbouni A, Gunawan TS, Habaebi MH, Halbouni M, Kartiwi M, Ahmad R. Machine learning and deep learning approaches for cybersecurity: a review. IEEE Access. 2022;10:19572–19585. [Google Scholar]
- 13. Schwendicke F, Samek W, Krois J. Artificial intelligence in dentistry: chances and challenges. J Dent Res. 2020;99:769–774. doi: 10.1177/0022034520915714.
- 14. Corbella S, Srinivas S, Cabitza F. Applications of deep learning in dentistry. Oral Surg Oral Med Oral Pathol Oral Radiol. 2021;132:225–238. doi: 10.1016/j.oooo.2020.11.003.
- 15. Heo MS, Kim JE, Hwang JJ, Han SS, Kim JS, Yi WJ, et al. Artificial intelligence in oral and maxillofacial radiology: what is currently possible? Dentomaxillofac Radiol. 2021;50:20200375. doi: 10.1259/dmfr.20200375.
- 16. Takebe K, Imai T, Kubota S, Nishimoto A, Amekawa S, Uzawa N. Deep learning model for the automated evaluation of contact between the lower third molar and inferior alveolar nerve on panoramic radiography. J Dent Sci. 2023;18:991–996. doi: 10.1016/j.jds.2022.12.008.
- 17. Jing Q, Dai X, Wang Z, Zhou Y, Shi Y, Yang S, et al. Fully automated deep learning model for detecting proximity of mandibular third molar root to inferior alveolar canal using panoramic radiographs. Oral Surg Oral Med Oral Pathol Oral Radiol. 2024;137:671–678. doi: 10.1016/j.oooo.2024.02.011.
- 18. Papasratorn D, Pornprasertsuk-Damrongsri S, Yuma S, Weerawanich W. Investigation of the best effective fold of data augmentation for training deep learning models for recognition of contiguity between mandibular third molar and inferior alveolar canal on panoramic radiographs. Clin Oral Investig. 2023;27:3759–3769. doi: 10.1007/s00784-023-04992-6.
- 19. Zhu T, Chen D, Wu F, Zhu F, Zhu H. Artificial intelligence model to detect real contact relationship between mandibular third molars and inferior alveolar nerve based on panoramic radiographs. Diagnostics (Basel). 2021;11:1664. doi: 10.3390/diagnostics11091664.
- 20. Sukegawa S, Tanaka F, Hara T, Yoshii K, Yamashita K, Nakano K, et al. Deep learning model for analyzing the relationship between mandibular third molar and inferior alveolar nerve in panoramic radiography. Sci Rep. 2022;12:16925. doi: 10.1038/s41598-022-21408-9.
- 21. Fang X, Zhang S, Wei Z, Wang K, Yang G, Li C, et al. Automatic detection of the third molar and mandibular canal on panoramic radiographs based on deep learning. J Stomatol Oral Maxillofac Surg. 2024;125:101946. doi: 10.1016/j.jormas.2024.101946.
- 22. Yari A, Fasih P, Nouralishahi A, Mohammadikhah M, Nikeghbal D. Performance of artificial intelligence versus clinicians on the detection of contact between mandibular third molar and inferior alveolar nerve. Oral Sci Int. 2025;22:e1256.
- 23. Fukuda M, Ariji Y, Kise Y, Nozawa M, Kuwada C, Funakoshi T, et al. Comparison of 3 deep learning neural networks for classifying the relationship between the mandibular third molar and the mandibular canal on panoramic radiographs. Oral Surg Oral Med Oral Pathol Oral Radiol. 2020;130:336–343. doi: 10.1016/j.oooo.2020.04.005.
- 24. Fukuda M, Kise Y, Nitoh M, Ariji Y, Fujita H, Katsumata A, et al. Deep learning system to predict the three-dimensional contact status between the mandibular third molar and mandibular canal using panoramic radiographs. Oral Sci Int. 2024;21:46–53.
- 25. Kempers S, van Lierop P, Hsu TH, Moin DA, Bergé S, Ghaeminia H, et al. Positional assessment of lower third molar and mandibular canal using explainable artificial intelligence. J Dent. 2023;133:104519. doi: 10.1016/j.jdent.2023.104519.
- 26. Lo Casto A, Spartivento G, Benfante V, Di Raimondo R, Ali M, Di Raimondo D, et al. Artificial intelligence for classifying the relationship between impacted third molar and mandibular canal on panoramic radiographs. Life (Basel). 2023;13:1441. doi: 10.3390/life13071441.
- 27. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi: 10.1136/bmj.n71.
- 28. McInnes MD, Moher D, Thombs BD, McGrath TA, Bossuyt PM; and the PRISMA-DTA Group. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319:388–396. doi: 10.1001/jama.2017.19163.
- 29. Huang X, Lin J, Demner-Fushman D. Evaluation of PICO as a knowledge representation for clinical questions. AMIA Annu Symp Proc. 2006;2006:359–363.
- 30. Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev. 2016;5:210. doi: 10.1186/s13643-016-0384-4.
- 31. Deeks JJ, Bossuyt P, Leeflang MM, Takwoingi Y. Cochrane handbook for systematic reviews of diagnostic test accuracy. Chichester: John Wiley & Sons; 2023.
- 32. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155:529–536. doi: 10.7326/0003-4819-155-8-201110180-00009.
- 33. Jeon KJ, Choi H, Lee C, Han SS. Automatic diagnosis of true proximity between the mandibular canal and the third molar on panoramic radiographs using deep learning. Sci Rep. 2023;13:22022. doi: 10.1038/s41598-023-49512-4.