Abstract
Objectives
Artificial intelligence (AI) has shown increasing potential in dental diagnostics, yet its accuracy for binary classification of dental caries across different imaging modalities remains unclear. This study aimed to systematically evaluate the diagnostic performance of AI models using clinical intraoral images and dental radiographs.
Methods
Following the PRISMA-DTA guidelines, PubMed, Embase, Scopus, Web of Science, and IEEE Xplore were systematically searched for studies published between January 2015 and June 2025. Eligible studies applied AI models for caries diagnosis with extractable sensitivity and specificity. Data on dentition, dataset, analysis unit, caries prevalence in test dataset, and preprocessing methods were extracted. Reporting quality and risk of bias were assessed using CLAIM and QUADAS-2. Pooled estimates were calculated with a bivariate random-effects model, with subgroup analyses by image type and analytical unit.
Results
Twenty-five studies met the inclusion criteria, and 13 were included in the meta-analysis. Pooled sensitivity, specificity, and area under the curve (AUC) were 0.86, 0.91, and 0.94, respectively. Intraoral image–based models achieved higher sensitivity (0.88) and AUC (0.95), while radiograph-based models showed higher specificity (0.92). Tooth-level analyses yielded stable, clinically relevant performance (sensitivity 0.87, specificity 0.91). High heterogeneity (I² > 90%) was partly explained by image type, model architecture, reference standard variation, and test-set caries prevalence.
Conclusion
AI models showed good diagnostic accuracy for caries detection across imaging modalities and analytical units. However, given the substantial heterogeneity and limitations in study quality and reference standards, these summary estimates should be interpreted with caution. AI-based systems may serve as complementary decision-support tools in clinical practice, but further standardization, external validation, and high-quality multicenter studies are required before broad clinical implementation.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12903-026-07770-4.
Keywords: Artificial intelligence, Deep learning, Dental caries, Diagnostic accuracy, Systematic review, Meta-analysis
Introduction
Dental caries is one of the most prevalent chronic diseases worldwide and affects individuals across all age groups [1]. Advanced carious lesions may progress to pulpitis and apical periodontitis, ultimately leading to tooth loss and adverse systemic consequences. In addition to its clinical impact, dental caries imposes a substantial economic burden on both individuals and healthcare systems. According to the World Health Organization, the direct and indirect costs associated with the treatment of dental caries amount to billions of U.S. dollars annually, representing a major public health challenge [2–4]. Early detection and accurate diagnosis are therefore critical for preventing disease progression and enabling timely, minimally invasive intervention. However, conventional diagnostic approaches—such as visual inspection, tactile probing, and radiographic interpretation—are inherently subjective and highly dependent on clinician experience, particularly for early-stage and proximal lesions. These limitations are further amplified in regions with limited access to trained dental professionals, underscoring the need for scalable, objective, and reliable diagnostic tools [5].
In recent years, artificial intelligence (AI), particularly deep learning–based approaches, has emerged as a powerful tool for image-based medical diagnosis. In dentistry, AI models have been applied to a wide range of dental imaging modalities, including intraoral photographs and various forms of radiography, to support caries detection [6–20]. Growing evidence suggests that AI systems can extract clinically relevant features from these images and achieve diagnostic performance comparable to, and in some settings exceeding, that of human examiners, while also offering advantages in speed and consistency [8, 21–23]. These developments have generated considerable interest in the potential role of AI as a decision-support tool for dental caries diagnosis.
Despite this rapidly expanding body of literature, substantial methodological heterogeneity exists across published studies. Variations in dataset composition, image type, diagnostic unit (e.g., tooth-level, surface-level, or image-level), reference standards, and model architecture complicate the interpretation and comparability of reported results [24]. Moreover, AI-based caries studies have encompassed a wide range of analytical tasks, including lesion segmentation, detection, and diagnostic classification, further contributing to inconsistencies in outcome reporting and performance metrics.
Two recent systematic reviews have evaluated the diagnostic accuracy of AI for dental caries detection using bitewing radiographs, with a particular focus on approximal or proximal lesions [25, 26]. While these studies provided important evidence for radiograph-based AI, their scope was restricted to a single imaging modality and did not examine how diagnostic performance varies across different image types, analytical units, or reporting quality. In addition, these reviews did not focus exclusively on binary diagnostic classification. In contrast, the present review was designed to synthesize evidence across multiple imaging modalities—including intraoral photographs, near-infrared and fluorescence-based imaging, as well as panoramic and periapical radiographs—while focusing specifically on binary caries diagnosis and integrating PRISMA-DTA, CLAIM, and QUADAS-2 to jointly evaluate diagnostic performance, reporting quality, and risk of bias.
Therefore, the present systematic review and meta-analysis was designed to address these gaps by quantitatively evaluating the diagnostic accuracy of AI models for binary classification of dental caries (presence versus absence) using clinical imaging data. By integrating diagnostic performance with rigorous assessment of reporting quality, risk of bias, and heterogeneity, this study aims to provide a robust evidence base for the clinical translation of AI in caries diagnosis and to identify priorities for future research.
Methods
This systematic review was designed and reported in accordance with the PRISMA for Diagnostic Test Accuracy (PRISMA-DTA) checklist to enhance transparency and reproducibility in diagnostic accuracy reviews [27]. The study protocol was prospectively registered with PROSPERO (CRD420251108203).
Eligibility criteria
Inclusion criteria
Studies were included if they met all of the following conditions: (i) employed AI models to perform binary diagnostic classification of dental caries using either dental radiographs or clinical intraoral images, or used alternative classification schemes whose reported data could be converted into a binary diagnostic outcome; (ii) reported at least one standard diagnostic accuracy metric, such as sensitivity, specificity, accuracy, or the area under the receiver operating characteristic curve (AUC); (iii) were published between January 2015 and June 2025.
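To make the inclusion criteria concrete, the sketch below shows how an ordinal caries score can be collapsed into a binary presence/absence label and how the standard diagnostic accuracy metrics follow from the resulting 2 × 2 table. The threshold and the example scores are hypothetical illustrations, not criteria or data taken from any included study.

```python
# Illustrative sketch: collapsing an ordinal caries score into a binary
# label and computing standard diagnostic accuracy metrics from the
# resulting 2x2 table. The threshold (score >= 1 means "caries present")
# is a hypothetical example, not a rule used by any included study.

def binarize(scores, threshold=1):
    """Convert ordinal lesion scores to binary presence/absence labels."""
    return [1 if s >= threshold else 0 for s in scores]

def two_by_two(pred, truth):
    """Tally TP, FP, FN, TN for binary predictions against a reference."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and accuracy from 2x2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical ordinal model outputs and reference labels for 10 teeth.
model_scores = [0, 2, 3, 0, 1, 0, 4, 0, 2, 0]
reference    = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]

tp, fp, fn, tn = two_by_two(binarize(model_scores), reference)
print(metrics(tp, fp, fn, tn))
```

Any study reporting cell counts (or metrics from which the counts could be derived) at a stated analytical unit could therefore be converted to the binary scheme used in this review.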
Exclusion criteria
Studies were excluded if they: (i) focused exclusively on image segmentation or lesion localization without performing binary diagnostic classification; (ii) did not report diagnostic performance metrics or lacked essential methodological details; (iii) were non-original research articles, such as reviews, conference abstracts, case reports, or commentaries.
Information sources and search
A systematic search was conducted in the following electronic databases: PubMed/MEDLINE, Embase, Web of Science, Scopus, Google Scholar, and IEEE Xplore. To ensure reproducibility and avoid uncontrolled retrieval, the Google Scholar search was restricted to the first 200 results sorted by relevance, and the same predefined search terms and inclusion criteria were applied. The search covered literature published between January 2015 and June 2025. The lower time limit was selected because modern deep learning–based methods for medical image analysis became established after 2015, whereas earlier studies primarily relied on traditional machine-learning techniques that are not directly comparable to contemporary AI models. Search strategies combined keywords and Medical Subject Headings (MeSH) related to dental caries and artificial intelligence, as detailed in Table 1. This strategy is consistent with recent systematic reviews in this field and was designed to capture the full spectrum of AI-based caries detection studies while minimizing irrelevant retrieval.
Table 1.
Search strategy
| Search | Topic and terms |
|---|---|
| #1 | Artificial intelligence: “artificial intelligence” OR “deep learning” OR “machine learning” OR “neural network” OR “computer vision” |
| #2 | Dentistry: “dental caries” OR “tooth decay” OR “carious lesion” |
| #3 | #1 AND #2 |
Study selection
After duplicate entries were removed, all retrieved records were screened independently by two reviewers based on titles and abstracts. Studies meeting the inclusion criteria proceeded to full-text evaluation. Disagreements during the selection process were resolved through discussion, with a third independent reviewer adjudicating when consensus could not be reached.
Data collection and extraction
Two reviewers independently extracted key variables using a predesigned, standardized data collection form. The extracted information included: author and year of publication, imaging modality, dentition, dataset size and split (training/validation/test), unit of analysis, caries prevalence in the test dataset, reference standard, inter-observer agreement when reported (e.g., kappa statistics), image preprocessing and augmentation methods, AI platform or model architecture, and diagnostic performance metrics.
Reporting standards assessment
To systematically evaluate reporting quality and methodological transparency, the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) [28] was applied to all eligible studies. CLAIM is a standardized framework for assessing key aspects of AI-based medical imaging research, including model development, data handling, result reporting, and reproducibility. A predefined subset of eight CLAIM items most relevant to diagnostic accuracy studies was selected, and each item was evaluated according to the original definitions proposed by Mongan et al. These domains included: (i) model type and construction; (ii) data source and image acquisition; (iii) image preprocessing and augmentation; (iv) dataset partitioning; (v) performance metrics; (vi) model interpretability; (vii) external validation and generalizability; and (viii) code availability or reproducibility. Each domain was rated based on whether the corresponding information was clearly and explicitly reported, and overall reporting quality was categorized as high, moderate, or low.
Risk of bias (RoB) assessment
The RoB for all included diagnostic studies was independently assessed by two reviewers using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [29]. Any disagreements were resolved through discussion, with adjudication by a third reviewer when necessary. QUADAS-2 evaluates study quality across four key domains: patient selection, index test, reference standard, and flow and timing. Applicability concerns were assessed based on the first three domains. Each study was rated as having low, high, or unclear RoB.
Quantitative synthesis
Quantitative synthesis was performed using the MIDAS v3.0 module and the METANDI command in STATA version 17 (StataCorp, College Station, TX, USA) [30]. Only studies with complete or derivable 2 × 2 contingency tables were included in the meta-analysis. A bivariate random-effects model was applied to estimate the pooled sensitivity and specificity of AI models in the diagnosis of dental caries and to generate a summary receiver operating characteristic (SROC) curve. Between-study heterogeneity was assessed using the I² statistic and by visual inspection of forest plots, while publication bias was evaluated using Deeks’ funnel plot asymmetry test.
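As a simplified illustration of the pooling logic, the sketch below applies a univariate DerSimonian–Laird random-effects pool of study-level sensitivities on the logit scale and derives I² from Cochran's Q. This is only an approximation for exposition; the review itself used the bivariate model implemented in STATA's MIDAS/METANDI, which models sensitivity and specificity jointly. The study counts are hypothetical.

```python
# Simplified univariate random-effects pooling of proportions on the logit
# scale (DerSimonian-Laird), with I^2 from Cochran's Q. Illustration only:
# the actual analysis used the bivariate model (STATA MIDAS/METANDI).
import math

def pool_logit(events, totals):
    """Pool study-level proportions; returns (pooled proportion, I^2 %)."""
    y, v = [], []
    for e, n in zip(events, totals):
        e, n = e + 0.5, n + 1.0  # continuity correction
        y.append(math.log(e / (n - e)))         # per-study logit
        v.append(1.0 / e + 1.0 / (n - e))       # within-study variance
    w = [1.0 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Cochran's Q and between-study variance tau^2
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights and pooled logit
    w_re = [1.0 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return 1.0 / (1.0 + math.exp(-y_re)), i2  # back-transform to proportion

# Hypothetical per-study true positives and diseased counts:
tp = [80, 150, 45, 200, 60]
diseased = [100, 170, 60, 230, 65]
pooled_sens, i2 = pool_logit(tp, diseased)
print(f"pooled sensitivity ~ {pooled_sens:.2f}, I^2 = {i2:.1f}%")
```

The bivariate model extends this idea by estimating the correlation between logit sensitivity and logit specificity across studies, which is what allows the SROC curve and its confidence and prediction regions to be drawn.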
Certainty of evidence
The certainty of evidence for pooled diagnostic estimates was assessed using the GRADE approach, considering risk of bias, inconsistency, indirectness, imprecision, and publication bias.
Results
Search and selection
A total of 1,538 records were initially retrieved. After removing duplicates, 1,243 records remained. Following title and abstract screening, 113 articles were assessed as potentially eligible. After full-text evaluation, 25 studies met the inclusion criteria for the qualitative synthesis, of which 13 were eligible for inclusion in the meta-analysis. The detailed selection process is illustrated in the PRISMA flow diagram (Fig. 1).
Fig. 1.
PRISMA flow diagram illustrating the search strategy
Basic characteristics of included studies
All 25 included studies [11, 12, 18, 19, 31–51] applied artificial intelligence models for dental caries detection based on dental imaging data. Their detailed characteristics are summarized in Supplementary Table S1, and a high-level overview is provided in Table 2. The included articles were published between 2018 and 2024, with the majority appearing after 2021 and a peak in 2024 (n = 11). Although the literature search extended to June 2025, no studies published in 2025 met the predefined inclusion criteria.
Table 2.
Overview of characteristics of included studies (n = 25)
| Category | No. Of studies (%) |
|---|---|
| Imaging modality | |
| BWR | 3 (12%) |
| PA | 1 (4%) |
| PAN | 7 (28%) |
| IOP/QLF/NILT | 11 (44%) |
| Mixed modalities | 3 (12%) |
| Unit of analysis | |
| Tooth-level | 17 (68%) |
| Image-level | 5 (20%) |
| Surface-level | 1 (4%) |
| NR | 3 (12%) |
| Dentition | |
| PER | 11 (44%) |
| PRI | 2 (8%) |
| Mixed | 3 (12%) |
| NR | 9 (36%) |
| Reference standard | |
| ≥ 2 experts | 13 (52%) |
| Single reader | 7 (28%) |
| Diagnostic report | 1 (4%) |
| Artificial lesions (no clinical reference) | 1 (4%) |
| NR | 3 (12%) |
| Inter-observer agreement reported | |
| κ reported | 4 (16%) |
| NR | 21 (84%) |
BWR Bitewing radiographs, PA Periapical radiographs, PAN Panoramic radiographs, IOP Intraoral photographs, QLF Quantitative light-induced fluorescence, NILT Near-infrared light transillumination, PER Permanent dentition, PRI Primary dentition, NR Not reported, κ kappa statistic for inter-observer agreement
Reference standard categories include ≥2 experts (two or more human examiners, with or without formal consensus), single reader, clinical records/diagnostic report (medical or diagnostic records as ground truth), and artificial lesions (caries artificially generated rather than derived from clinical patient data). One study reported both tooth-level and image-level outcomes and was counted in both categories
Across the included studies, a wide range of imaging modalities, analytical units, and modeling approaches was observed (Supplementary Table S1), reflecting substantial methodological diversity in this field. Most studies relied on clinical radiographs or intraoral images, whereas only a small number used ex vivo specimens or artificially generated lesions. Caries prevalence in the test sets varied markedly, spanning from low-prevalence screening-type datasets to high-prevalence diagnostic cohorts.
The AI models evaluated encompassed conventional convolutional neural networks, transformer-based architectures, and self-supervised learning frameworks. Most studies assessed diagnostic performance at the tooth level, while fewer adopted image-level or surface-level units of analysis. Image preprocessing and data augmentation were commonly applied to enhance model robustness. Reference standards were typically based on expert annotation, although reporting of inter-observer agreement was limited, with kappa statistics available in only a small subset of studies.
CLAIM checklist
In the included AI-based diagnostic studies, reporting quality was evaluated using the eight core items of the CLAIM checklist. Most studies provided reasonably comprehensive descriptions of the model type, image source, and image preprocessing procedures, and reported performance metrics. Nevertheless, several recurring limitations were observed. First, external validation was lacking in 70% of the studies, as no independent external test set was employed, limiting the assessment of model generalizability. Second, model interpretability was insufficient in 62% of the studies, which failed to provide visual explanations (e.g., Grad-CAM) or to elucidate the underlying decision-making process. Third, code availability and reproducibility were restricted: only one study provided an online link to the model, whereas the remainder did not disclose source code or model specifications, hampering transparency and reproducibility. Overall, based on total scores, one study (4%) was classified as “high” quality, 21 (84%) as “moderate,” and three (12%) as “low.”
RoB assessment
RoB was assessed using the QUADAS-2 tool; detailed assessments of risk of bias and applicability concerns are provided in Supplementary Table S2. Twelve studies (48%) demonstrated a low RoB across all four domains, and 14 studies (56%) were judged to have low applicability concerns. Among the four domains, the highest proportion of high RoB was observed in the “reference standard” domain, in which only 17 studies (68%) were classified as low risk. The main reasons included unclear definition of the reference standard, lack of a standardized criterion, or absence of blinded comparison.
Of the 25 included studies, 13 met the eligibility criteria for meta-analysis. The remaining 12 studies were excluded due to factors such as in vitro study design, use of artificially generated caries images, unclear reference standards, insufficient description of the AI model, or inability to extract a 2×2 contingency table.
Quantitative analysis
High between-study heterogeneity was observed, with I² values exceeding 90% for both sensitivity and specificity, indicating substantial variability across the included studies. Despite this, the pooled analysis demonstrated that AI models for caries diagnosis achieved a combined sensitivity of 0.86 (95% CI: 0.82–0.89) and a combined specificity of 0.91 (95% CI: 0.88–0.94) (Fig. 2). The SROC curve yielded an AUC of 0.94 (95% CI: 0.92–0.96), indicating excellent overall diagnostic performance (Fig. 3). Most data points clustered in the upper-left region of the curve, reflecting generally high diagnostic accuracy across studies. The 95% confidence region in the SROC plot was relatively narrow, whereas the prediction region was comparatively wider, suggesting that although heterogeneity was substantial, the pooled estimates remained stable at the population level.
Fig. 2.
Forest plot for sensitivity and specificity
Fig. 3.
Summary receiver operating characteristic (SROC) curve for the included studies
The Deeks’ funnel plot (Fig. 4) revealed no obvious asymmetry in the distribution between 1/√ESS and the log diagnostic odds ratio. The regression line was approximately vertical, and scatter points were largely symmetrical, with one small-sample study located near the X-axis. The P-value of the Deeks’ test was 0.63, indicating no statistically significant evidence of publication bias (P > 0.05). These findings suggest that no strong small-study effects or publication bias were detected.
Fig. 4.
Deeks’ funnel plot for publication bias assessment
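The asymmetry test described above can be sketched as a weighted regression of the log diagnostic odds ratio (lnDOR) on 1/√ESS, weighted by the effective sample size; a slope near zero (non-significant) indicates no small-study effect. The 2 × 2 counts below are hypothetical, and a normal approximation stands in for the t-distribution p-value so the sketch stays standard-library only; it is not the exact STATA implementation used in the review.

```python
# Sketch of Deeks' funnel-plot asymmetry test: weighted least-squares
# regression of lnDOR on 1/sqrt(effective sample size), weighted by ESS.
# Hypothetical data; p-value via normal approximation (illustration only).
import math

def deeks_test(tables):
    """tables: list of (tp, fp, fn, tn). Returns (slope, two-sided p)."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        x.append(1 / math.sqrt(ess))
        y.append(math.log((tp * tn) / (fp * fn)))  # lnDOR
        w.append(ess)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    # Weighted residual variance and standard error of the slope
    resid = [yi - (ybar + slope * (xi - xbar)) for xi, yi in zip(x, y)]
    s2 = sum(wi * ri ** 2 for wi, ri in zip(w, resid)) / (len(x) - 2)
    se = math.sqrt(s2 / sxx)
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return slope, p

tables = [(80, 10, 20, 90), (150, 20, 20, 160), (45, 8, 15, 70),
          (200, 25, 30, 210), (60, 12, 5, 55)]
slope, p = deeks_test(tables)
print(f"slope = {slope:.2f}, p = {p:.2f}")
```

Because smaller studies have larger 1/√ESS, a systematic drift of lnDOR with study size would appear as a non-zero slope, which is what the funnel plot visualizes.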
Subgroup analysis
A subgroup analysis was performed based on image type and unit of analysis (Fig. 5). Among the 13 studies included, 8 used dental radiographs, while 5 employed clinical intraoral images. Regarding the unit of analysis, 11 studies evaluated individual teeth, one assessed entire images, and one assessed individual tooth surfaces. In tooth-level studies, each individual tooth was treated as a separate diagnostic unit, with AI predictions and reference standard labels assigned per tooth rather than per image. Given the limited number of studies in the latter two categories, and considering that tooth-level analysis more closely reflects routine clinical decision-making, only studies using the tooth as the unit of analysis were included in the subgroup analysis.
Fig. 5.
Subgroup performance comparison of AI models for dental caries diagnosis
Among the 8 studies using dental radiographs, the pooled sensitivity was 0.83 (95% CI: 0.79–0.87) and the pooled specificity was 0.92 (95% CI: 0.87–0.95), with an AUC of 0.93 (95% CI: 0.90–0.95). Heterogeneity analysis revealed high heterogeneity, with I² values of 89.30% for sensitivity and 94.70% for specificity. For the 5 studies using clinical intraoral images, the pooled sensitivity was 0.88 (95% CI: 0.83–0.92) and the pooled specificity was 0.91 (95% CI: 0.86–0.94), with an AUC of 0.95 (95% CI: 0.93–0.97). Sensitivity and specificity I² values were 93.39% and 96.00%, respectively, also indicating substantial heterogeneity.
In addition, the pooled results from the 11 studies using the individual tooth as the unit of analysis demonstrated a sensitivity of 0.87 (95% CI: 0.83–0.89) and a specificity of 0.91 (95% CI: 0.88–0.94), with an AUC of 0.94 (95% CI: 0.92–0.96). Heterogeneity remained high in this subgroup, with I² values of 88.13% for sensitivity and 95.71% for specificity.
Certainty of evidence
Using the GRADE framework, the certainty of evidence for pooled sensitivity, specificity, and AUC was rated as low to moderate. Downgrades were mainly due to high heterogeneity (I² > 90%), moderate methodological quality in most studies, variability in datasets and reference standards, and potential selective reporting. While results indicate promising diagnostic accuracy of AI models for dental caries, the current evidence base remains insufficient for strong clinical recommendations.
Discussion
This systematic review and meta-analysis assessed the diagnostic performance of AI models for the diagnosis of dental caries. A key strength of this study was its emphasis on the binary diagnostic accuracy of AI models in determining whether dental caries are present, rather than on broader tasks such as lesion segmentation or localization. This targeted approach narrowed the scope of the review, allowing for stricter inclusion criteria that captured only studies with well-defined diagnostic objectives and standardized performance metrics. This approach improved the homogeneity and comparability of the meta-analysis, thereby strengthening the representativeness and clinical relevance of its conclusions. Unlike previous reviews covering multiple task types [52, 53], this review focused on a core task with clear clinical translational potential, offering stronger evidence for applying AI to caries screening and diagnostic assistance.
The meta-analysis demonstrated that AI models achieved good diagnostic accuracy, with pooled sensitivity of 0.86 (95% CI: 0.82–0.89), specificity of 0.91 (95% CI: 0.88–0.94), and an AUC of 0.94 (95% CI: 0.92–0.96), indicating substantial potential for clinical application in dental caries diagnosis. Subgroup analyses further refined and validated the applicability and robustness of these findings. These findings are broadly consistent with two recent systematic reviews by Carvalho [25] and Ammar [26], which also reported high sensitivity and specificity of AI for proximal caries detection based on bitewing radiographs. However, those reviews were restricted to a single imaging modality and did not focus specifically on binary diagnostic classification or methodological reporting quality. By contrast, the present study extends this evidence base by synthesizing results across multiple imaging modalities and analytical units, while integrating diagnostic performance with CLAIM and QUADAS-2 assessments. This broader analytical framework provides a more comprehensive and clinically informative evaluation of AI-based caries detection.
The subgroup analysis further revealed the influence of image type and unit of analysis on the diagnostic performance of AI models. Models using clinical intraoral images showed higher sensitivity (0.88) and AUC (0.95), likely because high-resolution color images could clearly display changes in lesion color, surface texture, and morphology, enabling deep learning models to capture subtle pathological features [12, 54]. However, such images were more prone to interference from lighting, saliva reflections, calculus, and staining, which might account for their slightly lower specificity compared with dental radiographs. In contrast, models using dental radiographs showed a slight specificity advantage (0.92), likely because they provided a stable view of hard tissue density changes and were less influenced by lighting or color factors [52, 55]. Nevertheless, radiographs had limitations in visualizing early enamel caries, which might lead to underestimation of lesion extent and reduced sensitivity.
For the unit of analysis, tooth-level studies showed pooled sensitivity of 0.87 and specificity of 0.91, indicating balanced performance and close alignment with routine clinical workflows. Using the tooth as the analytical unit can reduce cumulative error from detecting multiple objects within a single image and minimize bias from inconsistent local annotations, in line with the “unit-of-analysis” consistency principle proposed in previous studies [56, 57]. Therefore, future AI model development and validation should prioritize tooth-level analysis. However, subgroup analyses by dentition type were limited by the small and imbalanced number of available studies and should therefore be interpreted cautiously. Combining tooth-level analysis with multimodal imaging may further enhance diagnostic precision by leveraging the complementary strengths of different modalities.
With respect to publication bias, although the Deeks’ funnel plot test did not reveal statistically significant bias (P = 0.63) and the data points between 1/√ESS and log(DOR) were generally symmetrically distributed, this result should be interpreted with caution. First, the inclusion of only 13 studies in the meta-analysis limited the statistical power for detecting bias. Second, AI research is often published with a focus on “high-performance metrics.” This practice may encourage selective reporting, in which well-performing models are preferentially reported while failed experiments or negative results are omitted. Such bias may not be fully captured by the current funnel plot analysis.
Despite the overall favorable diagnostic performance observed in this meta-analysis, a first major limitation was the substantial heterogeneity across studies (I² > 90%), which indicates that the pooled estimates should be interpreted cautiously and may not be directly generalizable to all clinical settings. Several methodological and clinical factors may have contributed to this variability. One important source of heterogeneity was the imaging modality. Although subgroup analyses were conducted for dental radiographs and clinical intraoral photographs, heterogeneity remained high within each subgroup. This may be attributable to differences in image resolution, contrast, acquisition protocols, and the visual presentation of carious lesions across imaging systems and clinical settings.
In addition to imaging modality, substantial heterogeneity was also driven by variability in model architectures, reference standards, and dataset characteristics. The included studies employed a wide range of convolutional neural networks, transformer-based models, and self-supervised learning frameworks, each with different feature extraction mechanisms, training data requirements, and optimization strategies, which are known to influence diagnostic performance. However, the limited number of studies within each specific model category and their highly unbalanced distribution precluded formal model-type–specific subgroup analyses or meta-regression in the present study. Moreover, although most studies relied on expert-based annotations, key methodological details such as assessor blinding, the number of evaluators, and inter-observer agreement were often incompletely reported, with kappa statistics available in only 4 of 25 studies. Importantly, expert-based labeling should not be regarded as a uniform or error-free gold standard, particularly for early or proximal lesions, and such variability in reference standards introduces uncertainty into both sensitivity and specificity estimates across studies.
Dataset-level factors may have further contributed to heterogeneity. Most studies were based on single-center datasets with variable participant demographics, dentition types, and image quality. In addition, caries prevalence in the test sets ranged widely (15.68%–80.63%), reflecting spectrum effects that can substantially influence diagnostic metrics and limit the generalizability of pooled estimates to routine clinical populations.
A second major limitation of the current evidence base relates to the quality and transparency of reporting, as reflected by the CLAIM assessment. Among the 25 included studies, only one was rated as high quality, while the majority were classified as moderate quality and three were considered low quality. The primary deficiencies were concentrated in two critical domains: model interpretability and reproducibility. More than 60% of the studies did not provide visual explanations of model decision-making (e.g., Grad-CAM or LIME), which may undermine clinical interpretability, acceptability, and trustworthiness. In addition, most studies failed to release source code, training protocols, or model weights, severely limiting opportunities for independent validation and reproducibility. Only one study provided an online access link to the developed model, whereas the remaining studies disclosed no such information. It should also be acknowledged that CLAIM scoring was based exclusively on publicly available information without contacting original authors, which may have contributed to lower ratings in certain domains. Nevertheless, from the perspective of clinical translation and scientific reproducibility, such reporting gaps represent a substantive limitation of the current literature. These shortcomings restrict the transferability and adaptability of AI models across diverse clinical environments and hinder their inclusion in future large-scale comparative analyses or ensemble learning frameworks.
Taken together, the heterogeneity in diagnostic performance and the methodological limitations identified through the CLAIM assessment indicate that AI-based caries detection should be interpreted within a clearly defined clinical context. From a clinical perspective, current AI models are best viewed as complementary decision-support and screening tools rather than substitutes for professional judgment, as clinicians integrate multiple sources of information beyond image-based assessment alone. In this setting, the relatively high pooled sensitivity observed in this meta-analysis suggests that AI may help reduce missed caries cases in routine practice; however, because most models are trained and evaluated using binary labels without lesion-stage stratification, they may be more reliable for identifying established rather than very early lesions. Accordingly, the clinical utility of these models is likely to vary across populations with different caries prevalence, with potentially greater value in high-prevalence settings and an increased risk of false-positive findings in low-prevalence contexts.
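The prevalence dependence noted above can be made concrete with a short worked example: holding the pooled sensitivity (0.86) and specificity (0.91) from this meta-analysis fixed, the positive and negative predictive values shift markedly as caries prevalence changes across screening populations.

```python
# Worked example of the spectrum effect: predictive values computed from
# the pooled sensitivity/specificity of this meta-analysis at three
# illustrative prevalence levels (the prevalence values are hypothetical).

def predictive_values(sens, spec, prev):
    """PPV and NPV from sensitivity, specificity, and prevalence."""
    ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
    npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

for prev in (0.10, 0.40, 0.80):
    ppv, npv = predictive_values(0.86, 0.91, prev)
    print(f"prevalence {prev:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

At 10% prevalence roughly half of positive AI calls would be false positives despite the high pooled specificity, whereas in high-prevalence diagnostic cohorts the PPV approaches the specificity-limited ceiling, consistent with the caution expressed above for low-prevalence screening contexts.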
For translation into routine practice, regulatory, ethical, and workflow considerations will be equally important. Regulatory approval will depend not only on diagnostic accuracy but also on evidence of robustness, transparency, and generalizability across diverse populations and imaging systems. Ethically, the deployment of AI in dental diagnostics requires careful attention to issues of accountability, automation bias, and the preservation of professional oversight. From a workflow perspective, AI systems are most likely to be adopted successfully when they are seamlessly integrated into existing imaging and reporting platforms, providing real-time, interpretable decision support without increasing clinician burden. Addressing these aspects, together with methodological standardization and external validation, will be essential for enabling safe, effective, and scalable clinical implementation of AI-based caries detection.
Conclusions
This systematic review and meta-analysis evaluated the diagnostic performance of AI models for dental caries detection. The pooled results suggest that AI systems demonstrate promising diagnostic accuracy, with a sensitivity of 0.86, a specificity of 0.91, and an AUC of 0.94. Subgroup analyses further indicate favorable performance across both imaging modalities, with intraoral image–based models achieving higher sensitivity and radiograph-based models higher specificity, and with stable results at the tooth level. However, these findings should be interpreted with caution due to substantial heterogeneity, variability in reference standards, and limitations in reporting quality, as reflected by the low-to-moderate certainty of evidence. Future well-designed, multicenter studies with standardized diagnostic criteria and external validation are needed to confirm these results and support broader clinical implementation.
Supplementary Information
Authors’ contributions
Jing Lai, Shanshan Guo, and Xueman Wang conceived and designed the study and drafted the manuscript. Ke Wang, Xulan Li, and Xin Yu contributed to data collection, analysis, and interpretation, and provided critical revisions to the manuscript. All authors reviewed the findings, commented on manuscript drafts, and approved the final version for submission.
Funding
This research was supported by the internal scientific research funding of Chongqing Dental Hospital.
Data availability
All data generated or analyzed in this study are included in this article and its online supplementary materials. Additional inquiries can be addressed to the corresponding author.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. GBD 2017 Oral Disorders Collaborators, Bernabe E, Marcenes W, et al. Global, regional, and national levels and trends in burden of oral conditions from 1990 to 2017: a systematic analysis for the Global Burden of Disease 2017 Study. J Dent Res. 2020;99(4):362–73. 10.1177/0022034520908533.
- 2. Feldens CA, Ardenghi TM, Dos Santos Dullius AI, Vargas-Ferreira F, Hernandez PA, Kramer PF. Clarifying the impact of untreated and treated dental caries on oral health-related quality of life among adolescents. Caries Res. 2016;50(4):414–21. 10.1159/000447095.
- 3. Listl S, Galloway J, Mossey PA, Marcenes W. Global economic impact of dental diseases. J Dent Res. 2015;94(10):1355–61. 10.1177/0022034515602879.
- 4. World Health Organization. Global oral health status report: towards universal health coverage for oral health by 2030. Geneva: World Health Organization; 2022.
- 5. Frencken JE, Sharma P, Stenhouse L, Green D, Laverty D, Dietrich T. Global epidemiology of dental caries and severe periodontitis – a comprehensive review. J Clin Periodontol. 2017;44(S18):S94–105. 10.1111/jcpe.12685.
- 6. Patil S, Albogami S, Hosmani J, et al. Artificial intelligence in the diagnosis of oral diseases: applications and pitfalls. Diagnostics (Basel). 2022;12(5):1029. 10.3390/diagnostics12051029.
- 7. Bayraktar Y, Ayan E. Diagnosis of interproximal caries lesions with deep convolutional neural network in digital bitewing radiographs. Clin Oral Investig. 2022;26(1):623–32. 10.1007/s00784-021-04040-1.
- 8. Cantu AG, Gehrung S, Krois J, et al. Detecting caries lesions of different radiographic extension on bitewings using deep learning. J Dent. 2020;100:103425. 10.1016/j.jdent.2020.103425.
- 9. Mao YC, Chen TY, Chou HS, et al. Caries and restoration detection using bitewing film based on transfer learning with CNNs. Sensors (Basel). 2021;21(13):4613. 10.3390/s21134613.
- 10. Lee S, Oh SI, Jo J, Kang S, Shin Y, Park JW. Deep learning for early dental caries detection in bitewing radiographs. Sci Rep. 2021;11(1):16807. 10.1038/s41598-021-96368-7.
- 11. Chen X, Guo J, Ye J, Zhang M, Liang Y. Detection of proximal caries lesions on bitewing radiographs using deep learning method. Caries Res. 2022;56(5–6):455–63. 10.1159/000527418.
- 12. Lee JH, Kim DH, Jeong SN, Choi SH. Detection and diagnosis of dental caries using a deep learning-based convolutional neural network algorithm. J Dent. 2018;77:106–11. 10.1016/j.jdent.2018.07.015.
- 13. Li S, Liu J, Zhou Z, et al. Artificial intelligence for caries and periapical periodontitis detection. J Dent. 2022;122:104107. 10.1016/j.jdent.2022.104107.
- 14. Lin X, Hong D, Zhang D, Huang M, Yu H. Detecting proximal caries on periapical radiographs using convolutional neural networks with different training strategies on small datasets. Diagnostics. 2022;12(5):1047. 10.3390/diagnostics12051047.
- 15. Bui TH, Hamamoto K, Paing MP. Automated caries screening using ensemble deep learning on panoramic radiographs. Entropy (Basel). 2022;24(10):1358. 10.3390/e24101358.
- 16. Asci E, Kilic M, Celik O, et al. A deep learning approach to automatic tooth caries segmentation in panoramic radiographs of children in primary dentition, mixed dentition, and permanent dentition. Children (Basel). 2024;11(6):690. 10.3390/children11060690.
- 17. Zhu H, Cao Z, Lian L, Ye G, Gao H, Wu J. CariesNet: a deep learning approach for segmentation of multi-stage caries lesion from oral panoramic X-ray image. Neural Comput Appl. 2022. 10.1007/s00521-021-06684-2.
- 18. Kühnisch J, Meyer O, Hesenius M, Hickel R, Gruhn V. Caries detection on intraoral images using artificial intelligence. J Dent Res. 2022;101(2):158–65. 10.1177/00220345211032524.
- 19. Holtkamp A, Elhennawy K, Cejudo Grano de Oro JE, Krois J, Paris S, Schwendicke F. Generalizability of deep learning models for caries detection in near-infrared light transillumination images. J Clin Med. 2021;10(5):961. 10.3390/jcm10050961.
- 20. Xiong Y, Zhang H, Zhou S, et al. Simultaneous detection of dental caries and fissure sealant in intraoral photos by deep learning: a pilot study. BMC Oral Health. 2024;24(1):553. 10.1186/s12903-024-04254-1.
- 21. Mörch CM, Atsu S, Cai W, et al. Artificial intelligence and ethics in dentistry: a scoping review. J Dent Res. 2021;100(13):1452–60. 10.1177/00220345211013808.
- 22. Mertens S, Krois J, Cantu AG, Arsiwala LT, Schwendicke F. Artificial intelligence for caries detection: randomized trial. J Dent. 2021;115:103849. 10.1016/j.jdent.2021.103849.
- 23. Güneç HG, Ürkmez EŞ, Danaci A, Dilmaç E, Onay HH, Cesur Aydin K. Comparison of artificial intelligence vs. junior dentists’ diagnostic performance based on caries and periapical infection detection on panoramic images. Quant Imaging Med Surg. 2023;13(11):7494–503. 10.21037/qims-23-762.
- 24. Mohammad-Rahimi H, Motamedian SR, Rohban MH, et al. Deep learning for caries detection: a systematic review. J Dent. 2022;122:104115. 10.1016/j.jdent.2022.104115.
- 25. Carvalho BKG, Nolden EL, Wenning AS, et al. Diagnostic accuracy of artificial intelligence for approximal caries on bitewing radiographs: a systematic review and meta-analysis. J Dent. 2024;151:105388. 10.1016/j.jdent.2024.105388.
- 26. Ammar N, Kühnisch J. Diagnostic performance of artificial intelligence-aided caries detection on bitewing radiographs: a systematic review and meta-analysis. Jpn Dent Sci Rev. 2024;60:128–36. 10.1016/j.jdsr.2024.02.001.
- 27. McInnes MDF, Moher D, Thombs BD, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. 2018;319(4):388–96. 10.1001/jama.2017.19163.
- 28. Mongan J, Moy L, Kahn CE. Checklist for artificial intelligence in medical imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell. 2020;2(2):e200029. 10.1148/ryai.2020200029.
- 29. Sounderajah V, Ashrafian H, Rose S, et al. A quality assessment tool for artificial intelligence-centered diagnostic test accuracy studies: QUADAS-AI. Nat Med. 2021;27:1663–5. 10.1038/s41591-021-01521-y.
- 30. Nyaga VN, Arbyn M, Aerts M. METANDI and MIDAS: commands for meta-analysis of diagnostic accuracy data in Stata. J Stat Softw. 2014;56(4):1–18. 10.18637/jss.v056.i04.
- 31. ForouzeshFar P, Safaei AA, Ghaderi F, Hashemikamangar SS. Dental caries diagnosis from bitewing images using convolutional neural networks. BMC Oral Health. 2024;24(1):211. 10.1186/s12903-024-03973-9.
- 32. Zhang JW, Fan J, Zhao FB, Ma B, Shen XQ, Geng YM. Diagnostic accuracy of artificial intelligence-assisted caries detection: a clinical evaluation. BMC Oral Health. 2024;24(1):1095. 10.1186/s12903-024-04847-w.
- 33. Frenkel E, Neumayr J, Schwarzmaier J, et al. Caries detection and classification in photographs using an artificial intelligence-based model: an external validation study. Diagnostics (Basel). 2024;14(20):2281. 10.3390/diagnostics14202281.
- 34. Kang S, Shon B, Park EY, Jeong S, Kim EK. Diagnostic accuracy of dental caries detection using ensemble techniques in deep learning with intraoral camera images. PLoS ONE. 2024;19(9):e0310004. 10.1371/journal.pone.0310004.
- 35. Turosz N, Chęcińska K, Chęciński M, Lubecka K, Bliźniak F, Sikora M. Artificial intelligence (AI) assessment of pediatric dental panoramic radiographs (DPRs): a clinical study. Pediatr Rep. 2024;16(3):794–805. 10.3390/pediatric16030067.
- 36. Szabó V, Szabó BT, Orhan K, et al. Validation of artificial intelligence application for dental caries diagnosis on intraoral bitewing and periapical radiographs. J Dent. 2024;147:105105. 10.1016/j.jdent.2024.105105.
- 37. Das M, Shahnawaz K, Raghavendra K, Kavitha R, Nagareddy B, Murugesan S. Evaluating the accuracy of AI-based software vs human interpretation in the diagnosis of dental caries using intraoral radiographs: an RCT. J Pharm Bioallied Sci. 2024;16(Suppl 1):S812–4. 10.4103/jpbs.jpbs_1029_23.
- 38. Alam MK, Alanazi NH, Alazmi MS, Nagarajappa AK. AI-based detection of dental caries: comparative analysis with clinical examination. J Pharm Bioallied Sci. 2024;16(Suppl 1):S580–2. 10.4103/jpbs.jpbs_872_23.
- 39. Kawazu T, Takeshita Y, Fujikura M, et al. Preliminary study of dental caries detection by deep neural network applying domain-specific transfer learning. J Med Biol Eng. 2024;44(1):43–8. 10.1007/s40846-023-00874-1.
- 40. Liu Y, Xia K, Cen Y, Ying S, Zhao Z. Artificial intelligence for caries detection: a novel diagnostic tool using deep learning algorithms. Oral Radiol. 2024;40(3):375–84. 10.1007/s11282-024-00741-x.
- 41. Oztekin F, Katar O, Sadak F, et al. An explainable deep learning model to prediction dental caries using panoramic radiograph images. Diagnostics (Basel). 2023;13(2):226. 10.3390/diagnostics13020226.
- 42. Lasri I, El-Marzouki N, Riadsolh A, et al. Automated detection of dental caries from oral images using deep convolutional neural networks. Int J Online Biomed Eng (iJOE). 2023;19(18):53–70. 10.3991/ijoe.v19i18.45133.
- 43. Portella PD, de Oliveira LF, Ferreira MFC, Dias BC, de Souza JF, Assunção LRDS. Improving accuracy of early dental carious lesions detection using deep learning-based automated method. Clin Oral Investig. 2023;27(12):7663–70. 10.1007/s00784-023-05355-x.
- 44. Zhou X, Yu G, Yin Q, et al. Tooth type enhanced transformer for children caries diagnosis on dental panoramic radiographs. Diagnostics (Basel). 2023;13(4):689. 10.3390/diagnostics13040689.
- 45. Park EY, Jeong S, Kang S, Cho J, Cho JY, Kim EK. Tooth caries classification with quantitative light-induced fluorescence (QLF) images using convolutional neural network for permanent teeth in vivo. BMC Oral Health. 2023;23(1):981. 10.1186/s12903-023-03669-6.
- 46. AlSayyed A, Taqateq AM, Al-Sayyed R, et al. Employing CNN ensemble models in classifying dental caries using oral photographs. Int J Data Netw Sci. 2023;7(4):1465–76. 10.5267/j.ijdns.2023.8.002.
- 47. Taleb A, Rohrer C, Bergner B, et al. Self-supervised learning methods for label-efficient dental caries classification. Diagnostics (Basel). 2022;12(5):1237. 10.3390/diagnostics12051237.
- 48. Zhou X, Yu G, Yin Q, Liu Y, Zhang Z, Sun J. Context aware convolutional neural network for children caries diagnosis on dental panoramic radiographs. Comput Math Methods Med. 2022;2022:6029245. 10.1155/2022/6029245.
- 49. Vinayahalingam S, Kempers S, Limon L, et al. Classification of caries in third molars on panoramic radiographs using deep learning. Sci Rep. 2021;11(1):12609. 10.1038/s41598-021-92121-2.
- 50. Bui TH, Hamamoto K, Paing MP. Deep fusion feature extraction for caries detection on dental panoramic radiographs. Appl Sci. 2021;11(5):2005. 10.3390/app11052005.
- 51. Schwendicke F, Elhennawy K, Paris S, Friebertshäuser P, Krois J. Deep learning for caries lesion detection in near-infrared light transillumination images: a pilot study. J Dent. 2020;92:103260. 10.1016/j.jdent.2019.103260.
- 52. Schwendicke F, Tzschoppe M, Paris S. Artificial intelligence in caries detection: a systematic review. J Dent. 2019;91:103226. 10.1016/j.jdent.2019.103226.
- 53. Prajapati S, Nagarajappa AK, Mitra D, et al. Artificial intelligence in dentistry: a scoping review on current applications and future perspectives. J Dent. 2021;108:103632. 10.1016/j.jdent.2021.103632.
- 54. Srivastava MM, Kumar P, Ribeiro E, et al. Deep learning for detection of caries in bitewing radiographs. J Dent Res. 2019;98(11):1223–9. 10.1177/0022034519878800.
- 55. Devito KL, de Souza Barbosa F, Felippe Filho WN. The performance of artificial intelligence in detecting approximal caries in digital bitewing radiographs. Oral Surg Oral Med Oral Pathol Oral Radiol Endod. 2008;106(6):879–87. 10.1016/j.tripleo.2008.07.006.
- 56. Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36. 10.7326/0003-4819-155-8-201110180-00009.
- 57. Heo MS, Kim JE, Hwang JJ, et al. Influence of analysis unit on diagnostic accuracy in dental imaging AI. Dentomaxillofac Radiol. 2022;51(4):20210395. 10.1259/dmfr.20210395.