Journal of Conservative Dentistry and Endodontics. 2026 Mar 30;29(4):399–405. doi: 10.4103/JCDE.JCDE_1068_25

Beta testing and clinical evaluation of an artificial intelligence-powered clinical decision support system for endodontic case assessment

Rohan Gupta
PMCID: PMC13086377  PMID: 42004790

Abstract

Background:

Structured digital clinical decision support may help reduce variability in endodontic diagnosis, case difficulty assessment, and referral communication.

Aim:

To evaluate the agreement between a structured clinical decision support system for endodontic case assessment and independent expert endodontist judgement.

Materials and Methods:

Sixty anonymised simulated endodontic cases were entered into a previously developed clinical decision support system. Thirty cases were entered by an endodontist and thirty by a general dental practitioner. System-generated diagnostic and treatment assessments were independently reviewed by two blinded endodontists using standardised diagnostic terminology. Independent expert judgement served as the reference standard. Proportion agreement with 95% confidence intervals was calculated using the Wilson method.

Results:

Agreement with the expert reference standard ranged from 83% to 97%. Inter-operator agreement between reviewers was 95%. Discrepancies were primarily associated with incomplete clinical information.

Conclusion:

The system demonstrated close alignment with expert judgement in this exploratory evaluation. Structured digital prescriptions may support consistent diagnostic reasoning while preserving clinician oversight.

Keywords: Artificial intelligence, beta testing, clinical decision support, diagnosis, endodontics, referral

INTRODUCTION

Artificial intelligence (AI) has been increasingly discussed within conservative dentistry and endodontics as a supportive tool capable of augmenting clinical decision-making rather than replacing clinician judgement. Recent perspectives in the published literature have highlighted that AI applications are gradually transitioning from experimental image analysis toward broader roles that include diagnostic assistance, workflow optimisation, and standardisation of clinical reasoning in complex cases. The potential of AI as a futuristic adjunct in advanced endodontics has been widely discussed, particularly where it complements established biological and clinical principles rather than functioning autonomously, and where its integration into routine practice is clinically validated and ethically guided.[1,2] The present study focuses on the clinical evaluation of an AI-assisted decision support system, examining its performance when applied to endodontic case assessments. In endodontics, diagnostic accuracy, assessment of procedural complexity, and referral decisions are traditionally informed by clinician experience, interpretation of radiographic findings, and application of professional guidelines.[3,4] Even among specialists, variability in interpretation has been repeatedly documented, particularly in radiographic assessment of periapical disease and borderline diagnostic scenarios.[5,6]

AI applications in endodontics have largely been evaluated using image-based accuracy metrics, comparing lesion detection performance against expert annotations.[7,8] However, fewer studies have explored how AI-assisted systems perform when integrated into broader clinical reasoning workflows that include patient history, clinical examination findings, and treatment planning. Moreover, reference standards in such evaluations must be defined carefully.

The present beta testing study was therefore designed to evaluate agreement between system-generated assessments and independent expert endodontist judgement. This approach is consistent with prior observer-agreement studies and evaluations of clinical decision support systems reported in dentistry and endodontics.[9,10]

MATERIALS AND METHODS

This application was developed using AI. Sections of this article have been polished for readability with the help of a large language model (NotebookLM).

Study design

An exploratory agreement-based evaluation design was employed. The objective was to assess concordance between system-generated assessments and independent expert endodontist evaluations, as well as inter-operator reliability between expert reviewers. The study did not aim to evaluate treatment outcomes or predictive accuracy.

Case selection

Sixty anonymised simulated endodontic cases were included during the beta testing phase. Cases were selected to represent a range of pulpal and periapical diagnoses, anatomical variations, and procedural complexity levels. Simulated cases were used to permit controlled inclusion of complete and incomplete data scenarios while avoiding patient-identifiable information.

Thirty cases were entered by an endodontist and thirty by a general dental practitioner (GDP). All cases included patient history, chief complaint, clinical examination findings, sensibility testing results, and radiographic images [Figure 1a and b]. Where relevant, cases included features suggestive of traumatic dental injuries.

Figure 1. Standardised case report generated by the clinical decision support system showing: (a) clinician-entered submission data, (b) guideline-based assessment and referral recommendation, (c) treatment plan and medications, (d) documentation elements for referral and continuity of care

Consent and ethical considerations

The cases used in this study were anonymised. Users of the platform were required to confirm that informed patient consent had been obtained for AI-assisted analysis of clinical data and radiographs.

System-generated assessments

For each case, the system generated a structured digital prescription comprising pulpal and apical diagnoses using established endodontic diagnostic terminology, case difficulty classification, and evidence-based treatment considerations consistent with published endodontic literature and clinical practice guidelines.[11,12,13,14] Medication guidance was provided where appropriate, and referral recommendations were generated when predefined complexity thresholds were met. Outputs were exported without modification [Figure 1a-d].

Reference standard definition

Independent expert endodontist judgement served as the reference standard. Two experienced endodontists, not involved in system development or case entry, independently reviewed all system-generated outputs. Reviewers were blinded to the identity of the case entrant and to each other’s assessments.

Each reviewer assigned pulpal and apical diagnoses and proposed treatment plans using standardised terminology and guideline-based reasoning. Disagreements were recorded, allowing direct assessment of inter-operator reliability.

Statistical analysis

Agreement between system-generated assessments and each blinded reviewer was calculated as the proportion of cases with consistent diagnostic and management outcomes. Ninety-five per cent confidence intervals (CIs) were estimated using the Wilson score method. Agreement beyond chance was evaluated using Cohen’s kappa for diagnostic categories. Inter-operator reliability between the two blinded endodontists was assessed using overall per cent agreement. Differences in diagnostic agreement between the endodontist and GDP subgroups were evaluated using Fisher’s exact test rather than the Chi-square test, as the contingency tables contained cells with expected frequencies of <5 due to the high agreement rates. A P value of <0.05 was considered statistically significant. All analyses were conducted in keeping with recommendations for exploratory evaluation of clinical decision support systems. Statistical analysis was performed using R (version 4.5.2; R Core Team, Vienna, Austria).

RESULTS

Overall agreement

Across all 60 evaluated cases, agreement between the system-generated assessments and the blinded expert reviewers was high. Concordance was observed consistently across diagnostic, case difficulty, and referral decision domains.

Beta testing outcomes

Among cases entered by the endodontist, both blinded reviewers agreed with the system output in 29 of 30 cases, corresponding to an agreement proportion of 0.97. In contrast, for cases entered by the GDP, agreement was observed in 25 of 30 cases for one reviewer and 26 of 30 cases for the other, yielding agreement proportions of 0.83 and 0.87, respectively.

Five disagreement cases occurred within the GDP group, while one disagreement case occurred within the endodontist group. Disagreement cases within the GDP group were associated with incomplete or ambiguous clinical information, particularly relating to history of present illness and diagnostic testing.

Diagnostic agreement

Agreement between the AI-assisted system and the blinded reviewers was high for pulpal diagnosis. Agreement with reviewer 1 was 90.0% (κ = 0.87; 95% CI: 0.80–0.95), while agreement with reviewer 2 was 91.7% (κ = 0.89; 95% CI: 0.82–0.96).

For apical diagnosis, complete agreement was observed between the system and both reviewers, with 100% agreement (κ = 1.00; 95% CI: 0.94–1.00).
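
The kappa values above correct raw agreement for the agreement expected by chance. As a minimal sketch of that calculation, assuming two equal-length lists of categorical labels, the function below computes Cohen’s kappa from observed and expected agreement. The function name `cohens_kappa` and the example labels are illustrative assumptions and are not taken from the study data or its R code.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (observed - expected) / (1 - expected)


# Hypothetical labels for illustration only (not study data).
system = ["pulp necrosis", "reversible pulpitis", "pulp necrosis", "normal pulp"]
reviewer = ["pulp necrosis", "reversible pulpitis", "normal pulp", "normal pulp"]
print(f"kappa = {cohens_kappa(system, reviewer):.2f}")
```

Kappa equals 1 only when every case receives the same label from both raters and more than one category is in use, which is the situation reported for apical diagnosis.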

Case difficulty classification and referral decisions

Case difficulty classification demonstrated substantial to near-perfect agreement. Agreement between the system and reviewer 1 was 91.7% (κ = 0.87; 95% CI: 0.82–0.96), while agreement with reviewer 2 reached 95.0% (κ = 0.92; 95% CI: 0.86–0.98).

Referral recommendations showed complete agreement between the system and both reviewers (100%; κ = 1.00), indicating consistent alignment in decisions regarding specialist referral.

Inter-operator reliability

Inter-operator agreement between the two blinded endodontists was observed in 57 of 60 cases, corresponding to a proportion agreement of 0.95 with a Wilson 95% CI of 0.86–0.98. This high level of agreement supports the internal consistency of the expert reference standard used in the study.
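
The Wilson interval quoted here can be checked directly from the counts. The following is a minimal sketch of the Wilson score interval for a binomial proportion; the function name `wilson_interval` is an assumption for illustration, and the study itself reports using R rather than Python.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


low, high = wilson_interval(57, 60)
print(f"57/60: {low:.2f} to {high:.2f}")
```

For 57 of 60 this gives 0.86 to 0.98, matching the interval reported above. Applied to 54 of 60 and 60 of 60, the same function gives 0.80 to 0.95 and 0.94 to 1.00, which coincide with the intervals listed alongside κ for pulpal and apical diagnosis. This suggests that those intervals may describe the agreement proportion rather than κ itself, although the text does not state this explicitly.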

Paired comparison analysis

McNemar’s tests demonstrated no statistically significant systematic disagreement between system-generated assessments and expert judgements across diagnostic and decision-making domains (P > 0.05 for all comparisons). This indicates that the observed discrepancies were not directionally biased.
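
McNemar’s test considers only the discordant pairs, that is, cases where the system and the expert disagree, and asks whether the disagreements fall more often in one direction than the other. The sketch below shows the exact (binomial) form of the test for a binary decision such as refer versus do not refer. The article does not state whether the exact or the chi-square form was used, and the counts in the example are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P value from the two discordant counts.

    b: cases where the system said "yes" and the expert said "no"
    c: cases where the system said "no" and the expert said "yes"
    """
    if b < 0 or c < 0:
        raise ValueError("discordant counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of directional bias
    k = min(b, c)
    # Under H0 each discordant pair is equally likely to fall either way.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts for illustration only.
print(f"P = {mcnemar_exact(2, 3):.3f}")
```

With only a handful of discordant pairs, as in a 60-case sample with high agreement, the test has little power, so a non-significant result is compatible with, but does not prove, the absence of directional bias.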

Subgroup and sensitivity analyses

Agreement was higher for cases entered by the endodontist (97%) than for those entered by the GDP (83%–87%), which may reflect the influence of data completeness and precision on system performance. However, this difference did not reach statistical significance (Fisher’s exact test, P = 0.112), suggesting that while data quality may influence system performance, the sample size was insufficient to confirm a systematic disparity.
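
A subgroup comparison of this kind can be expressed as a 2 × 2 table of agreement versus disagreement by data entry source. The sketch below implements a two-sided Fisher’s exact test from the hypergeometric distribution. The article does not specify which reviewer’s counts, or which pooled table, produced P = 0.112, so the table used here (29/30 versus 25/30) is an assumption, and the resulting value need not match the reported one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Assumed table: rows are endodontist-entered and GDP-entered cases,
# columns are agreement and disagreement with one reviewer.
print(f"P = {fisher_exact_two_sided(29, 1, 25, 5):.3f}")
```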

In sensitivity analyses excluding cases with incomplete clinical information, agreement between the system and expert reviewers exceeded 95% across diagnostic, difficulty classification, and referral outcomes. The findings of this study are presented in Tables 1 and 2 and Figure 2. These findings underscore the dependence of AI-assisted clinical decision support on structured and comprehensive clinical inputs.
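
The sensitivity analysis described above amounts to recomputing the agreement proportion after dropping cases flagged as incomplete. The following minimal sketch shows that filtering step; the `CaseResult` record, its `complete` flag, and the example data are hypothetical and do not reproduce the study data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    case_id: int
    complete: bool  # hypothetical flag: all required clinical fields were entered
    agrees: bool    # system output matched the expert reviewer


def agreement(cases):
    """Proportion of cases in which system and reviewer agree."""
    cases = list(cases)
    if not cases:
        raise ValueError("no cases to evaluate")
    return sum(case.agrees for case in cases) / len(cases)


def sensitivity_agreement(cases):
    """Agreement restricted to cases with complete clinical information."""
    return agreement(case for case in cases if case.complete)


# Hypothetical data: ten cases, two incomplete and both in disagreement.
cases = [CaseResult(i, complete=i > 2, agrees=i > 2) for i in range(1, 11)]
print(f"all cases: {agreement(cases):.2f}")
print(f"complete cases only: {sensitivity_agreement(cases):.2f}")
```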

Table 1.

Agreement between the artificial intelligence-assisted system and blinded endodontist reviewers across diagnostic, case difficulty, and referral domains

Outcome | Comparator | Agreement (%) | Cohen’s κ | 95% CI
Pulpal diagnosis | System versus reviewer 1 | 90.0 | 0.87 | 0.80–0.95
Pulpal diagnosis | System versus reviewer 2 | 91.7 | 0.89 | 0.82–0.96
Apical diagnosis | System versus reviewer 1 | 100 | 1.00 | 0.94–1.00
Apical diagnosis | System versus reviewer 2 | 100 | 1.00 | 0.94–1.00
Case difficulty | System versus reviewer 1 | 91.7 | 0.87 | 0.82–0.96
Case difficulty | System versus reviewer 2 | 95.0 | 0.92 | 0.86–0.98
Referral decision | System versus reviewer 1 | 100 | 1.00 | 0.94–1.00
Referral decision | System versus reviewer 2 | 100 | 1.00 | 0.94–1.00

Agreement is expressed as percentage agreement and Cohen’s kappa (κ) with 95% CI. CI: Confidence interval, κ: Cohen’s kappa

Table 2.

Subgroup agreement by data entry source and inter-operator reliability between blinded endodontist reviewers

Comparison | Cases (n) | Agreement (%) | Statistical test
Endodontist-entered cases | 30 | 97 | Descriptive
GDP-entered cases | 30 | 83–87 | Fisher’s exact test (P = 0.112)
Reviewer 1 versus reviewer 2 | 60 | 95 | Percent agreement

Statistical significance for subgroup comparisons was assessed using Fisher’s exact test where applicable. GDP: General dental practitioner

Figure 2. Agreement profiles between the clinical decision support system and blinded expert endodontists across diagnostic and management domains: (a) percentage agreement between system and reviewers 1 and 2, (b) effect of data completeness on agreement proportion, (c) agreement proportion for endodontist-entered versus general dental practitioner-entered cases, (d) confusion matrix for case difficulty classification

DISCUSSION

The findings indicate that when complete and structured clinical information is provided, the system can generate assessments closely aligned with expert endodontist judgement. Agreement levels observed in this study are comparable to those reported for AI-based radiographic interpretation systems evaluated in isolation.[6,7,8] Importantly, the present system extends beyond imaging by integrating clinical history, examination findings, and assessment of procedural complexity into a unified workflow.

Although the difference in agreement between endodontist-entered and GDP-entered cases was not statistically significant, the trend suggests that the system’s performance depends on the precision of clinical inputs. The lower agreement in the GDP group was driven by five cases in which ambiguous or incomplete data entry led to system outputs that diverged from expert judgement. This highlights a fundamental principle of clinical decision support systems: output quality is dependent on input quality. AI cannot compensate for incomplete clinical assessment and should function as an adjunct rather than a substitute for thorough examination.[1,2]

Existing AI applications in endodontics largely focus on lesion detection or image classification.[7,8] While such tools demonstrate technical capability, they do not address treatment planning, procedural difficulty assessment, or referral communication. Similarly, digital complexity assessment tools such as the Endodontic Complexity Assessment Tool (E-CAT) and American Association of Endodontists (AAE) Endocase provide structured scoring but remain static instruments without automated synthesis of clinical and radiographic data.[15]

The system evaluated in this study differs in its ability to generate a structured, editable digital prescription that consolidates diagnosis, complexity assessment, treatment considerations, and referral justification within a single workflow. This integrative approach aligns with definitions of advanced clinical decision support systems, which emphasise synthesis of patient-specific data with a knowledge base to generate contextual guidance.

From a clinical perspective, such integration may support consistency in diagnostic reasoning and referral justification, particularly in general practice settings where endodontic complexity may be underestimated. Preservation of full clinical context within referral documentation may also improve communication between referring clinicians and specialists, a recognised limitation in current patient care pathways.[10,16]

The editable nature of system outputs is a deliberate safeguard against automation bias. Clinicians retain responsibility for final decisions, consistent with ethical frameworks governing AI deployment in healthcare.[17]

Recent literature on AI in endodontics suggests that AI can contribute meaningfully to diagnostic consistency when applied to well-defined clinical problems and supported by appropriate validation. Studies focusing on caries detection have shown that AI-based systems can improve identification and classification accuracy on radiographic images, while still requiring clinician oversight to ensure appropriate interpretation and clinical relevance. The progression from experimental models toward end-to-end systems capable of segmentation, classification, and clinical interpretation underscores a broader shift toward integrating AI within routine dental workflows. However, these studies also emphasise that the effectiveness of such technologies depends on careful training, validation, and alignment with established clinical decision-making processes. Within this evolving landscape, the findings of the present study support the view that AI-assisted tools in endodontics may be most valuable when designed to complement clinician judgement, reinforce structured reasoning, and facilitate transparent documentation, rather than function as standalone diagnostic authorities.[18,19,20]

Limitations

The sample size was modest. Patient outcomes and longitudinal effects on referral patterns were not evaluated. The reference standard relied on expert judgement rather than histological confirmation or long-term outcomes, reflecting a common limitation in endodontic diagnostic research.

CONCLUSION

This beta testing study demonstrates that an AI-powered clinical decision support system integrating multimodal clinical data, radiographic interpretation, and guideline-based reasoning can generate assessments closely aligned with expert endodontist judgement. High agreement and inter-operator reliability support the feasibility of such integrated workflows, while observed discrepancies underscore the importance of complete clinical data and clinician oversight. These findings provide a foundation for further clinical validation and refinement.

Notes

Views expressed in the paper are those of the author and do not represent views of the Indian Armed Forces.

Ethical considerations statement

This clinical decision support system was developed using independently implemented computational logic derived from publicly accessible specialty guidelines for academic and non-commercial research purposes. The system does not reproduce proprietary documents. Its design adheres to established ethical principles for dental AI, including transparency, clinician accountability, respect for patient autonomy, privacy protection, equity, and governance oversight. All outputs are clinician-editable, and final clinical responsibility remains with the treating practitioner. The platform is intended as a proof-of-concept research tool to enhance diagnostic consistency and professional communication while preserving human supervision and ethical safeguards in accordance with current guidance on responsible AI deployment in dentistry.[17]

Conflicts of interest

There are no conflicts of interest.

Funding Statement

Nil.

REFERENCES

1. Singh S, Asthana G. Artificial intelligence. A futuristic tool for advanced endodontics. J Conserv Dent Endod. 2024;27:447–8. doi: 10.4103/JCDE.JCDE_171_24.
2. Marwaha J. Artificial intelligence in conservative dentistry and endodontics: A game-changer. J Conserv Dent Endod. 2023;26:514–8. doi: 10.4103/JCDE.JCDE_7_23.
3. Ørstavik D, Pitt Ford TR. Essential Endodontology: Prevention and Treatment of Apical Periodontitis. 2nd ed. Oxford: Wiley-Blackwell; 2008.
4. Ng YL, Mann V, Rahbaran S, Lewsey J, Gulabivala K. Outcome of primary root canal treatment: Systematic review of the literature – Part 2. Influence of clinical factors. Int Endod J. 2008;41:6–31. doi: 10.1111/j.1365-2591.2007.01323.x.
5. Rosenberg PA, Frisbie J, Lee J, Lee K, Frommer H, Kottal S, et al. Evaluation of pathologists (histopathology) and radiologists (cone beam computed tomography) differentiating radicular cysts from granulomas. J Endod. 2010;36:423–8. doi: 10.1016/j.joen.2009.11.005.
6. Patel S, Wilson R, Dawood A, Foschi F, Mannocci F. The detection of periapical pathosis using digital periapical radiography and cone beam computed tomography – Part 2: A 1-year post-treatment follow-up. Int Endod J. 2012;45:711–23. doi: 10.1111/j.1365-2591.2012.02076.x.
7. Ekert T, Krois J, Meinhold L, Elhennawy K, Emara R, Golla T, et al. Deep learning for the radiographic detection of apical lesions. J Endod. 2019;45:917–22.e5. doi: 10.1016/j.joen.2019.03.016.
8. Pauwels R, Brasil DM, Yamasaki MC, Jacobs R, Bosmans H, Freitas DQ, et al. Artificial intelligence for detection of periapical lesions on intraoral radiographs: Comparison between convolutional neural networks and human observers. Oral Surg Oral Med Oral Pathol Oral Radiol. 2021;131:610–6. doi: 10.1016/j.oooo.2021.01.018.
9. Mendonça EA. Clinical decision support systems: Perspectives in dentistry. J Dent Educ. 2004;68:589–97.
10. Sutton RT, Pincock D, Baumgart DC, Sadowski DC, Fedorak RN, Kroeker KI. An overview of clinical decision support systems: Benefits, risks, and strategies for success. NPJ Digit Med. 2020;3:17. doi: 10.1038/s41746-020-0221-y.
11. Essam O, Umerji S, Blundell K. Endodontic assessment, complexity, diagnosis and treatment planning. Br Dent J. 2025;238:441–7. doi: 10.1038/s41415-025-8452-6.
12. American Association of Endodontists. Consensus Endodontic Diagnostic Terminology. Chicago: Colleagues for Excellence; 2013. Available from: https://www.aae.org/specialty/wp-content/uploads/sites/2/2023/12/Fall2013-EndoDiagnosis.pdf. [Last accessed on 2026 Jan 21]
13. American Association of Endodontists. Endodontic Case Difficulty Assessment form and Guidelines. 2022. Available from: https://www.aae.org/specialty/wp-content/uploads/sites/2/2022/01/CaseDifficultyAssessmentFormFINAL2022.pdf. [Last accessed on 2026 Jan 21]
14. Duncan HF, Kirkevang LL, Peters OA, El-Karim I, Krastl G, Del Fabbro M, et al. Treatment of pulpal and apical disease: The European Society of Endodontology (ESE) S3-level clinical practice guideline. Int Endod J. 2023;56(Suppl 3):238–95. doi: 10.1111/iej.13974.
15. Essam O, Boyle EL, Whitworth JM, Jarad FD. The endodontic complexity assessment tool (E-CAT): A digital form for assessing root canal treatment case difficulty. Int Endod J. 2021;54:1189–99. doi: 10.1111/iej.13506.
16. Liew J, Zainal Abidin I, Cook N, Kanagasingam S. Clinical decision-making in complex endodontic cases between postgraduate students across dental specialties at a UK dental school: A pilot study. Eur J Dent Educ. 2022;26:707–16. doi: 10.1111/eje.12751.
17. Rokhshad R, Ducret M, Chaurasia A, Karteva T, Radenkovic M, Roganovic J, et al. Ethical considerations on artificial intelligence in dentistry: A framework and checklist. J Dent. 2023;135:104593. doi: 10.1016/j.jdent.2023.104593.
18. Marwaha J, Singla M, Nath A, Arya A. Revolutionizing the diagnosis of dental caries using artificial intelligence-based methods. J Conserv Dent Endod. 2025;28:401–5. doi: 10.4103/JCDE.JCDE_172_25.
19. Boy AF, Akhyar A, Arif TY, Syahrial S. Artificial intelligence for dental caries detection: A mixup, fine-tuning, and quantization approach on the MobileNetV2 model. J Conserv Dent Endod. 2025;28:764–71. doi: 10.4103/JCDE.JCDE_362_25.
20. Marwaha J, Nath A, Singla M, Arya A. An end-to-end deep-learning system for segmentation and classification of dental caries from radiovisiography images. J Conserv Dent Endod. 2025;28:1133–8. doi: 10.4103/JCDE.JCDE_384_25.

Articles from Journal of Conservative Dentistry and Endodontics are provided here courtesy of Wolters Kluwer -- Medknow Publications
