PLOS Digital Health. 2025 Jul 23;4(7):e0000940. doi: 10.1371/journal.pdig.0000940

Machine learning in dentistry: a scoping review

Shrey Lakhotia 1, Hormazd Godrej 2, Amandeep Kaur 3, Chaitanya Sai Nutakki 4, Michelle Mun 5,6,#, Pascal Eber 7,8,9,#, Leo Anthony Celi 10,11,12,¤,*
Editor: Erika Ong
PMCID: PMC12286321  PMID: 40700462

Abstract

Artificial intelligence (AI), specifically machine learning (ML), is increasingly applied in decision-making for dental diagnosis, prognosis, and treatment. However, the methodological completeness of published models has not been rigorously assessed. We performed a scoping review of PubMed-indexed articles (English, 1 January 2018‒31 December 2023) that used ML in any dental specialty. Each study was evaluated with the TRIPOD+AI rubric for key reporting elements such as data preprocessing, model validation, and clinical performance. Out of 1,506 identified studies, 280 met the inclusion criteria. Oral and maxillofacial radiology (27.5%), oral and maxillofacial surgery (15.0%), and general dentistry (14.3%) were the most represented specialties. Sixty-four studies (22.9%) lacked comparison with a clinical reference standard or an existing model performing the same task. Most models focused on classification (59.6%), whereas generative applications were relatively rare (1.4%). Key gaps included limited assessment of model bias, poor outlier reporting, scarce calibration evaluation, low reproducibility, and restricted data access. ML could transform dental care, but robust calibration assessment and equity evaluation are critical for real-world adoption. Future research should prioritize error explainability, outlier reporting, reproducibility, fairness, and prospective validation.

Author summary

Machine learning (ML) techniques are increasingly applied to imaging-driven clinical specialties such as dentistry. We reviewed all English-language PubMed studies (2018–2023) that applied ML in dentistry. Each paper was evaluated for key reporting areas such as data preprocessing, model validation, clinical performance, calibration, reproducibility, and equity considerations. Among the 280 eligible studies, most addressed the subspecialty of oral and maxillofacial radiology; however, fewer than one-third reported calibration, outlier handling, equity considerations, or reproducibility. We underline the need to address equity to ensure safe implementation of ML in diverse populations. Open-source code and deidentified data will strengthen reproducibility and accelerate innovation. We advocate standardized evaluation criteria to guide responsible ML integration into dental diagnostics and treatment planning.

Introduction

The use of artificial intelligence (AI) in healthcare has transformed diagnosis and treatment across many medical fields [1], providing unique opportunities for data-driven decision-making. Dentistry, a field that relies heavily on imaging data, stands to benefit substantially from such advancements. Machine learning (ML), a subfield of AI, has become an important tool, enabling predictive modeling and decision-support systems that can improve diagnostic accuracy and tailor treatment plans to individual patients.

Despite this progress, the use of ML in dentistry remains inconsistent. Specific areas, such as oral and maxillofacial radiology [2], have shown significant improvement by using ML for tasks such as image segmentation and disease classification. However, issues of generalizability, explainability of errors in clinical performance, and fairness have not been adequately addressed, which hinders clinical acceptance of these models. These limitations highlight the need for a full analysis of ML models in dentistry.

For example, Arsiwala-Scheppach et al. reviewed 168 papers published from January 2015 through May 2021 and highlighted reporting deficits but did not evaluate calibration performance or fairness [3]. A recent systematic review by Kukreja, which examined dental AI adoption in education and clinical practice (7 papers), likewise omitted calibration performance and fairness metrics [4]. By screening the literature from 2018 to 2023, extracting 280 studies across all dental specialties, and systematically recording granular elements of model reporting, including metrics for calibration performance, clinical performance, and fairness, we believe our review provides the first field-wide map of model reliability and equity.

To our knowledge, this is the first review to systematically address these critical aspects to improve the transparency and reliability of ML models in dental practice within the PubMed-indexed literature. This review provides the largest scoping synthesis of ML in dentistry, with 280 studies included over a six-year period (2018–2023). Through this paper, we seek to advance the responsible and effective adoption of ML in dentistry, ensuring that it meets the demands of diverse clinical settings and patient populations. The research question addressed in this scoping review is: What applications of machine learning have been reported in dentistry, how are model performance metrics documented, and to what extent do published studies address equity and reproducibility?

Materials and methods

The review was conducted as a scoping review following the Arksey & O’Malley framework (as updated by JBI). Reporting adhered to the PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) guidelines [5], which ensure transparency and rigor in reporting scoping reviews. The completed PRISMA-ScR checklist is provided as supplementary material S1 File. The protocol was not registered in a public database such as the Open Science Framework (OSF) before the evaluation started, owing to the exploratory nature of the review.

Scope of the review

This scoping review evaluates ML models applied in dentistry (2018–2023) using the PICO framework [6].

Information sources and search strategy

We queried PubMed using Boolean operators to combine search terms. PubMed was selected for its broad coverage of topics in dentistry and machine learning, ensuring the clinical relevance of included studies. OR was used to broaden the scope by including synonyms and related terms, and AND was used to intersect the machine learning and dentistry concepts. We started with the search terms: “machine learning” OR “deep learning” AND “dentistry”. After evaluating the preliminary results, which involved consultation among reviewers with expertise in both dentistry and data science, additional terms such as “neural network”, “artificially intelligent”, “natural language processing”, “algorithm”, “artificial intelligence”, “stomatology”, and “oral medicine” were included to expand the scope of the search. The final query was applied as a single combined search. S1 Table provides the full search query for clarity and reproducibility.

The most recent search was conducted on December 27, 2024.
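For illustration, a search of this kind can be run programmatically against PubMed with NCBI’s E-utilities. The sketch below uses Biopython’s Entrez module; the query string is a simplified stand-in for the full query in S1 Table, and the email address is a placeholder.

```python
# Illustrative sketch, not the authors' exact pipeline: querying PubMed via
# NCBI E-utilities using Biopython. The query string is a simplified stand-in
# for the full search in S1 Table; the email address is a placeholder.
from Bio import Entrez

Entrez.email = "you@example.org"  # NCBI requires a contact address

query = ('("machine learning" OR "deep learning" OR "neural network" OR '
         '"artificial intelligence" OR "natural language processing") '
         'AND (dentistry OR stomatology OR "oral medicine")')

handle = Entrez.esearch(db="pubmed", term=query, datetype="pdat",
                        mindate="2018/01/01", maxdate="2023/12/31",
                        retmax=2000)
record = Entrez.read(handle)
handle.close()

print("Total hits:", record["Count"])
print("First five PMIDs:", record["IdList"][:5])
```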

Selection process

The search was performed with Boolean operators, and only original studies published in print and in English between 2018 and 2023 were considered. We employed a two-stage methodology to review the resulting 1,506 studies. Stage one involved the initial screening of identified studies: reviewer teams, each consisting of a data scientist and a dentist, independently assessed titles, abstracts, and, where needed, the full text to determine eligibility against the inclusion criteria. Any discrepancies within a team were adjudicated by a third reviewer (a dentist). Stage two focused on data extraction from the studies confirmed for inclusion after the initial screening.

Data collection process

The .nbib file, a standard bibliographic format containing metadata such as titles, authors, publication dates, and abstracts, was downloaded from the PubMed search results and imported into Zotero [7], a reference management tool, to organize the bibliographic information. A CSV file with the relevant fields (title, abstract, publication year, and authors) was exported from Zotero and uploaded to Google Sheets. Extra columns were added to this sheet to capture all the evaluation criteria based on the TRIPOD+AI guidelines [8]. In stage one, two reviewers independently recorded whether each paper met the inclusion criteria as categorical yes/no/maybe responses. Any paper marked “maybe”, or on which the two reviewers disagreed, was assessed by a third reviewer, whose response was recorded as final. In stage two, all papers with “yes” responses from stage one were included. For this stage, reviewer pairs (each comprising a dentist and a data scientist) reviewed the full-text articles collaboratively and reached a consensus before systematically logging the required data points; other reviewers were consulted whenever clarification was needed. Most criteria were logged as binary yes/no responses, with certain criteria recorded as descriptive categorical variables as defined in S2 Table. For studies whose full text could not be found in public sources, access was obtained through researchers affiliated with educational institutions. This study was based solely on the analysis of published studies and did not involve human participants or patient-level data; therefore, ethical approval was not applicable.
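A minimal sketch of the stage-one adjudication rule described above; the titles, responses, and column names are invented for illustration, not taken from the authors’ actual sheet.

```python
# Hypothetical sketch of the stage-one screening logic: flag papers where the
# two reviewers disagreed or either answered "maybe" for third-reviewer review.
import pandas as pd

screen = pd.DataFrame({
    "title":     ["Paper A", "Paper B", "Paper C", "Paper D"],
    "reviewer1": ["yes",     "no",      "maybe",   "yes"],
    "reviewer2": ["yes",     "yes",     "no",      "yes"],
})

needs_third = screen[
    (screen["reviewer1"] != screen["reviewer2"])
    | (screen["reviewer1"] == "maybe")
    | (screen["reviewer2"] == "maybe")
]
included = screen[(screen["reviewer1"] == "yes") & (screen["reviewer2"] == "yes")]

print(f"{len(needs_third)} papers referred to a third reviewer; "
      f"{len(included)} provisionally included")
```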

Eligibility criteria

This review followed the Arksey & O’Malley framework (as updated by JBI) and integrates the PICO elements directly into the inclusion criteria below.

Inclusion criteria: PICO stands for Population, Intervention, Comparator, and Outcome, and was used to structure the review. Studies were included if they met all the following conditions:

  1. Population (P) – Original research involving human dental patients.

  2. Intervention (I) – Studies that propose a machine learning model using individual patient-level data intended for direct point-of-care use that performs any of the following tasks (a task is considered as the main objective of the ML model addressed in the paper):
    • Classification (predicts categorical outcomes)
    • Regression (predicts a continuous outcome)
    • Segmentation (predicts the specific region or structure in an image)
    • Generation (generation of new data)

    For the purpose of this review, ‘direct point-of-care use’ was interpreted as models designed to directly inform the clinician of an immediate clinical decision, such as diagnostic (e.g., identifying caries on a radiograph), prognostic assessment (e.g., predicting treatment success), or treatment planning for an individual patient. This excluded models developed primarily for administrative purposes, population-level risk stratification, or image processing tasks (e.g., image quality enhancement) that did not themselves yield a diagnostic or prognostic output, even if they might indirectly support clinical decisions.

  3. English language, published in print 2018–2023, full text accessible. We chose this time period to capture trends in this topic over a longer time frame, given the surge of interest in recent years.

  4. Study Type – Only peer-reviewed, original research articles reporting primary data were considered eligible; reviews, editorials, commentaries, and similar publication types were excluded.

The Comparison elements (C) and the Outcome (O) of the PICO framework were descriptively treated: the studies were eligible regardless of whether they reported a comparator (e.g., clinician evaluation) or outcome (e.g., specific performance metrics). Whenever such data were present, we extracted them for synthesis.

Exclusion criteria: Studies were excluded if they met any of the following:

  1. The described “algorithms” were rule-based standards derived from expert judgment, not learned from data using machine learning.

  2. The paper was inaccessible despite institutional access (e.g., two articles published in 2022 and 2023 were omitted for this reason).

  3. The paper was a preprint.

  4. The study focused on data processing tasks (e.g., data optimization) without developing an ML model.

In total, we selected 280 papers for examination and assessed them against the TRIPOD+AI guidelines to ensure transparency and reproducibility of the research.

The study selection process is summarized in Fig 1.

Fig 1. PRISMA flow diagram for study selection.


Each investigation was evaluated against predetermined criteria to ensure consistency and rigor. The evaluation process focused on the following key elements:

  • Clinical goal: The clinical outcome targeted by the machine learning model, as defined in the abstract or introduction.

  • Dental specialty: The dental specialty addressed by the model (e.g., orthodontics, oral radiology).

  • Model type: The type of machine learning model used (e.g., classification, regression, segmentation).

  • Outlier reporting: Whether the study reported handling of outliers.

  • Validation strategy: Whether the study used any approach to validate the model’s performance, including k-fold cross-validation or external datasets.

  • Performance metrics: Evaluation of model performance, including discrimination (e.g., AUROC), calibration (e.g., Brier score), and clinically relevant metrics (e.g., PPV, NPV).

  • Interpretability and bias: Whether the study reported methods to interpret model outputs (e.g., SHAP values) and addressed bias or fairness.

The full list of evaluation criteria, along with their detailed descriptions and definitions, is provided in S2 Table.

Synthesis of results

The extracted variables were summarized as percentages.

Subgroup analyses were performed based on: (a) dental specialty (e.g., orthodontics, oral radiology); (b) model type (e.g., classification, segmentation); and (c) machine learning approach (supervised vs. unsupervised).

We present the synthesis in three ways: (a) a brief narrative summary highlighting key findings; (b) a summary table listing the percentage of key features (e.g., validation strategy, calibration reporting, bias assessment); and (c) a stacked bar chart that visualizes the distribution of studies by publication year and dental specialty (see the sketch below for how such a chart can be produced).
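For illustration, a chart of this kind can be assembled from the extraction sheet with pandas and matplotlib; the toy rows and column names below are assumptions, not the authors’ actual data or code.

```python
# Sketch of the Fig 2-style chart, assuming a table with one row per included
# study; the toy rows below stand in for the real extraction sheet.
import pandas as pd
import matplotlib.pyplot as plt

studies = pd.DataFrame({
    "year":      [2018, 2019, 2019, 2020, 2021, 2021, 2022, 2023, 2023],
    "specialty": ["Radiology", "Radiology", "Orthodontics", "Surgery",
                  "Radiology", "Endodontics", "Orthodontics", "Radiology",
                  "General dentistry"],
})

# Count studies per (year, specialty) cell, then pivot specialties to columns.
counts = studies.groupby(["year", "specialty"]).size().unstack(fill_value=0)

counts.plot(kind="bar", stacked=True, figsize=(9, 5))
plt.xlabel("Publication year")
plt.ylabel("Number of studies")
plt.title("ML studies in dentistry by specialty")
plt.tight_layout()
plt.savefig("specialty_by_year.png", dpi=300)
```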

Results

Overall study characteristics

The overall characteristics of the 280 included studies are summarized below:

Study design:

  • 267 studies (95.36%) were retrospective.

  • 10 (3.57%) were prospective, and 3 (1.07%) used both retrospective and prospective designs.

Data handling:

  • 236 studies (84.29%) did not report any handling of outliers.

Model tasks:

  • Classification tasks were the most common, conducted in 167 studies (59.64%).

  • Generative tasks appeared in 4 studies (1.43%), beginning from 2021 onward.

Dental specialties:

  • Oral and maxillofacial radiology comprised 77 studies (27.50%), making it the most researched specialty.

  • Endodontics, pediatric dentistry, oral medicine, orofacial pain, and dental anesthesiology were the least represented, together accounting for 8.93%.

Measures of model performance

  • Validation strategies: 14.64% of the studies did not report any validation strategy.

  • Calibration metrics: Calibration metrics, such as Brier scores and calibration plots, were reported in only 20.36% of studies, indicating a lack of focus on the accuracy of predicted probabilities.

  • Clinically relevant metrics: 68.21% of the included studies reported at least one clinically relevant performance metric, such as true positives, true negatives, false positives, or false negatives. However, as detailed in the Discussion, a comprehensive discussion of the clinical implications of these specific error types was often lacking.

Bias and fairness considerations:

Only 25.71% of studies explicitly addressed bias, fairness, or generalizability to diverse populations, underscoring an area needing significant improvement.

Subgroup analysis results

  1. By dental specialties (considering specialties with 10 or more papers):
    • (a) Studies related to endodontics focused primarily on classification tasks (80%).
    • (b) Oral and maxillofacial surgery had the highest percentage of regression tasks (26.19%).
  2. By model tasks:
    • (a) Classification tasks dominated, accounting for 59.64% of studies.
    • (b) Segmentation, regression, and generative tasks followed, comprising 23.21%, 15.71%, and 1.43%, respectively.
  3. By machine learning approach:
    • (a) Supervised approach was the most common (98.21%).

Key findings

  • Discrimination and calibration performance evaluation: There is a notable discrepancy between the share of studies reporting discrimination performance (86.43%) and those reporting calibration performance metrics (20.36%). This gap raises questions about the trustworthiness of these models and the associated systemic risks when they are used in clinical settings.

  • While 68.21% of studies reported clinically relevant performance metrics, there was limited discussion of the distinctions among and implications of such metrics. These metrics are essential for dental professionals evaluating the practical utility of adopting machine learning (ML) models in real-world clinical workflows. For instance, understanding how false negatives (missed diagnoses) differ from false positives (overdiagnoses) provides insight into a model’s reliability in clinical practice.

  • Outliers: 84.29% of studies did not mention outlier handling. This matters because improper handling of outliers can bias model parameters.

  • Reproducibility: Methodologies were described in a manner that allows reproduction in only 30.71% of studies, and datasets were publicly available in only 11.79%.

  • Bias and equity: Only 25.71% of studies considered matters of bias and fairness, underscoring the risk of propagating health inequities when these models are deployed in real-world settings.

These findings are summarized in Table 1.

Table 1. Key findings: performance metrics and bias reporting.

Metric | Percentage (%)
Outliers reported | 15.71
Discrimination metrics reported | 86.43
Calibration metrics reported | 20.36
Clinically relevant metrics reported (e.g., true positives, true negatives, false positives, false negatives) | 68.21
Interpretability tools used | 39.64
Bias/fairness addressed | 25.71
Datasets publicly available | 11.79

Notes: The table summarizes the key findings regarding the performance metrics and bias reporting in the reviewed studies.

See Fig 2 for a stacked bar chart showing the annual number of ML studies by dental specialty.

Fig 2. Annual count of studies by dental specialty.


Stacked bar chart showing the number of machine learning papers published per calendar year from 2018 through 2023 (n = 280). Each bar is subdivided by dental specialty (color legend at right).

The results highlight the urgent need for better standards in assessing machine learning models, especially to ensure equitable healthcare outcomes across patient demographic groups and greater reproducibility.

The study underscores the necessity for consistent reporting criteria in future research by highlighting the variation in methodological quality among the included studies.

Discussion

In this scoping review, we evaluated papers proposing machine learning (ML) models in dentistry against criteria covering study design, data management, methods, measures of model performance, and consideration of bias and fairness. Our results show a nascent interest in unsupervised approaches in dental ML research since 2021.

To ensure the efficient and fair implementation of ML in dentistry, several issues must be resolved:

More ML research in other dental specialties

The predominance of studies in oral and maxillofacial radiology (27.50%), oral and maxillofacial surgery (15.00%), and general dentistry (14.29%) highlights the potential of ML in diagnostic imaging and treatment planning. Methods including random forests, support vector machines (SVMs), and deep learning have been used to improve diagnostic precision and predict treatment outcomes. Notwithstanding these developments, the sparse use of ML in other dental specialties suggests a need for broader exploration and implementation.

Outlier reporting

Outlier handling in medical imaging is a nuanced task, but it must be implemented to reduce model bias and improve performance metrics. Compared with numerical data, imaging data require a specialized approach.

Model performance metrics

The majority of studies reported discrimination performance metrics, but far fewer reported calibration metrics. In high-risk environments like healthcare, published models should address calibration, because models must quantify risk reliably to ensure transparency, trust, and equity in the use of AI in healthcare. When ML models are not calibrated, they may produce overconfident predictions that skew clinical judgment and misallocate resources. For example, a poorly calibrated model may exaggerate risk, resulting in needless treatments, or underestimate illness risk in specific populations, resulting in insufficient therapy.
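To make these notions concrete, the sketch below computes the Brier score, a reliability curve, and the expected calibration error on synthetic predictions using scikit-learn and NumPy; it is an illustration under stated assumptions, not code from any reviewed study.

```python
# Illustrative calibration check on synthetic data (not from a reviewed study).
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(42)
y_prob = rng.uniform(0, 1, 5000)                         # model's predicted risks
y_true = (rng.uniform(0, 1, 5000) < y_prob).astype(int)  # outcomes drawn to match the risks

# Brier score: mean squared gap between predicted probability and outcome.
print(f"Brier score: {brier_score_loss(y_true, y_prob):.3f}")

# Reliability curve: observed event rate vs. mean predicted risk per bin.
obs_rate, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
for o, p in zip(obs_rate, mean_pred):
    print(f"mean predicted {p:.2f} -> observed {o:.2f}")

# Expected calibration error (ECE): bin-weighted |observed - predicted| gap.
bin_ids = np.digitize(y_prob, np.linspace(0, 1, 11)[1:-1])
ece = sum(
    np.mean(bin_ids == b)
    * abs(y_true[bin_ids == b].mean() - y_prob[bin_ids == b].mean())
    for b in range(10) if np.any(bin_ids == b)
)
print(f"Expected calibration error: {ece:.3f}")
```

Because the synthetic outcomes are drawn consistently with the predicted probabilities, this example is well calibrated by construction; for a miscalibrated model, the reliability curve departs from the diagonal and the ECE grows.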

Interpretability and explainability of errors

In our study, at least one clinically important performance indicator, such as true positives, true negatives, false positives, or false negatives, was reported in 68.21% of studies. These metrics are pivotal in assessing model performance beyond general accuracy or other broad measures.

Although interpretability is essential for establishing trust in machine learning (ML) models, focusing just on it runs the risk of reinforcing bias by oversimplifying each decision an ML model makes and neglecting the underlying systemic biases in the data. Therefore, we advocate for more emphasis on the explainability of errors in ML model performance rather than on their general interpretability. For example, false positives (overdiagnoses) might result in wasteful procedures, while false negatives (missed diagnoses) can postpone critical treatments. Similarly, true positives and true negatives provide insights into the model’s strength in correctly identifying clinical outcomes.
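As a worked illustration of these error types, consider the hypothetical confusion-matrix counts below; the numbers are invented for this example, not drawn from any reviewed study.

```python
# Hypothetical confusion-matrix counts for a caries-detection model; the
# numbers are invented for illustration only.
tp, fn = 85, 15    # true positives; false negatives = missed diagnoses
fp, tn = 20, 180   # false positives = overdiagnoses; true negatives

sensitivity = tp / (tp + fn)  # share of diseased cases the model catches
specificity = tn / (tn + fp)  # share of healthy cases correctly cleared
ppv = tp / (tp + fp)          # chance a positive call is truly diseased
npv = tn / (tn + fn)          # chance a negative call is truly healthy

print(f"Sensitivity {sensitivity:.2f} | Specificity {specificity:.2f} | "
      f"PPV {ppv:.2f} | NPV {npv:.2f}")
# A clinician weighs these asymmetrically: the 15 false negatives delay
# treatment, while the 20 false positives risk unnecessary procedures.
```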

Researchers can increase clinician trust, enhance patient outcomes, and guarantee a safer integration of machine learning into dental practices by highlighting the explainability of these errors.

Equity, bias, and fairness considerations

Few studies addressed issues of bias, fairness, or generalizability across diverse populations. This is critical, as the datasets used for training ML models, more often than not, do not capture the implicit variables that affect the outcomes across geographies and at different times. Hence, such models may not perform adequately in varied demographic settings, potentially exacerbating health inequity. Obermeyer et al. [9] demonstrated the importance of subgroup analyses by showing that a widely used clinical model systematically underestimated health risk among Black patients. Their findings underscore the critical importance of selecting appropriate outcome variables: the original model predicted healthcare costs, which did not accurately reflect true health status due to underlying systemic inequities in healthcare access.
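A minimal sketch of the subgroup analysis this motivates, reporting discrimination separately per demographic group; the groups, scores, and outcomes below are synthetic.

```python
# Illustrative subgroup check on synthetic data: per-group AUROC gaps can
# reveal the kind of disparity Obermeyer et al. describe. All data invented.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n = 2000
df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=n),
    "y_true": rng.integers(0, 2, size=n),
})
# Scores that separate classes well for group A but poorly for group B.
noise = np.where(df["group"] == "A", 0.15, 0.45)
df["y_score"] = df["y_true"] + rng.normal(0, noise, size=n)

for group, sub in df.groupby("group"):
    auc = roc_auc_score(sub["y_true"], sub["y_score"])
    print(f"Group {group}: AUROC = {auc:.3f}")  # a large gap signals inequity
```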

Regulatory and ethical implications

Following legal and ethical guidelines is essential when integrating machine learning into healthcare practice. In its proposal for an artificial intelligence law, the European Commission highlights the importance of trustworthy AI by classifying AI applications according to risk categories and defining specifications for high-risk systems [10]. Dental machine learning applications, especially those that impact clinical decisions, can be classified as high-risk, requiring strict adherence to guidelines to ensure patient safety [11]. With the recent increase in FDA-approved medical devices, it becomes more imperative to recalibrate evaluation standards regularly [12].

Limitations and future directions

This scoping review has a few limitations. First, the review was limited to PubMed, which may have excluded relevant studies indexed in other databases. This single-database approach means our findings, particularly claims regarding the ‘field-wide’ scope or being the ‘largest’ synthesis, should be interpreted as specific to the PubMed-indexed literature. Second, our reliance on studies published in English may have introduced language bias. Third, our review does not include studies published from January 1, 2024, as reviewing the existing 280 studies required considerable time and effort; including more recent studies might have strengthened the review, given that the rapid pace of ML research may render some findings obsolete. Fourth, the review is limited by the heterogeneity of the included studies, which differed in model task and in the type of data on which models were trained (e.g., different types of radiographs, photographs, clinical notes).

Our findings are consistent with previous reviews that highlight the growing use of deep learning in medical imaging, but also indicate shortcomings in bias reporting [13] and calibration performance evaluation [14]. Our specific recommendations for future dental ML research are four-fold. First, bias mitigation strategies such as (a) subgroup analysis; (b) fairness-aware machine learning techniques; and (c) prospective studies with diverse populations should be implemented to promote fairness in dental AI research. Second, to enable wider clinical adoption and guide clinical decision-making, it is imperative to include calibration performance metrics such as: (a) reliability diagrams plotting predicted probabilities against actual outcomes, with the calibration error reporting the difference between the two; (b) bin-based error aggregates such as expected calibration error (ECE) and maximum calibration error (MCE); and (c) scalar summaries such as the Brier score (a proper scoring rule), calibration slope, and calibration intercept. Third, future reviews should incorporate additional databases, such as those strong in computer science (e.g., IEEE Xplore, Scopus), beyond PubMed, which focuses primarily on biomedical and health sciences literature. Finally, techniques such as thresholding based on autoencoder neural networks and clustering algorithms can be used to identify image outliers; future studies should report their handling of outliers, even if none were identified.
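As a sketch of the autoencoder-based thresholding named in the final recommendation: train an autoencoder on the images, score each image by its reconstruction error, and flag those with unusually high error for manual review. The architecture, the 3-sigma threshold, and the random placeholder data below are illustrative assumptions (implemented with TensorFlow/Keras), not a validated recipe.

```python
# Illustrative outlier screen: flag images whose autoencoder reconstruction
# error is unusually high. All data here are random placeholders.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n, h, w = 500, 64, 64
images = np.random.rand(n, h, w).astype("float32")  # stand-in for radiograph crops in [0, 1]
x = images.reshape(n, h * w)

autoencoder = tf.keras.Sequential([
    layers.Input(shape=(h * w,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(32, activation="relu"),    # bottleneck forces a compact representation
    layers.Dense(256, activation="relu"),
    layers.Dense(h * w, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=10, batch_size=64, verbose=0)

# Images the model reconstructs poorly are candidate outliers.
errors = np.mean((x - autoencoder.predict(x, verbose=0)) ** 2, axis=1)
threshold = errors.mean() + 3 * errors.std()   # simple rule; tune per dataset
outliers = np.flatnonzero(errors > threshold)
print(f"Flagged {outliers.size} of {n} images for manual review")
```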

Collaborations between dental professionals, data scientists, and regulatory bodies [15] are essential to harness the full potential of ML in dentistry while safeguarding ethical and equitable care. Future reviews should include additional databases to ensure comprehensive coverage.

Critical-care datathons built on the MIMIC (Medical Information Mart for Intensive Care) database [16] that aim to address specific clinical challenges using AI demonstrate how open data, an official scoring script, and a public leaderboard can accelerate research progress. Dentistry currently lacks such community benchmarks. Professional societies and researchers should coordinate to release de-identified dental image repositories with standard train/validation/test splits and automated evaluation scripts, akin to the CheXpert [17] or PhysioNet [18] Challenge frameworks.

Implications for stakeholders

  • Practitioners: Evaluate calibration plots and subgroup errors before deployment.

  • Educators: Incorporate fairness toolkits into dental curricula.

  • Policymakers: Require public reporting of calibration and bias analyses for AI devices.

  • Researchers & professional societies: Establish open, standardized data repositories (e.g., multi-institutional radiograph sets with de-identified metadata) and public benchmark tasks so models can be compared on identical test sets.

Conclusion

The application of ML in dentistry has evolved rapidly alongside advances in the field. However, critical gaps remain, including the evaluation of calibration performance, the reporting of bias and outlier handling, and data and code sharing. Addressing these gaps is essential to avoid reinforcing biases and to ensure equitable, patient-centered care.

Supporting information

S1 File. PRISMA-ScR checklist.

The completed PRISMA-ScR checklist is provided in the supporting information file labeled “S1_File.pdf” to ensure transparency and adherence to reporting standards.

(PDF)

pdig.0000940.s001.pdf (177KB, pdf)
S2 File. Data extracted from analyzed studies.

This file is provided in the supporting information file labeled “S2_File.xlsx”. It contains the data extracted from all included studies.

(XLSX)

pdig.0000940.s002.xlsx (394.5KB, xlsx)
S1 Table. Search strategy for PubMed.

The search strategy is provided in the supporting information file labeled “S1_Table.pdf”.

(PDF)

pdig.0000940.s003.pdf (34.9KB, pdf)
S2 Table. Evaluation criteria for included studies.

The evaluation criteria are provided in the supporting information file labeled “S2_Table.pdf”.

(PDF)

pdig.0000940.s004.pdf (102.1KB, pdf)

Acknowledgments

We would like to thank Stuti Agrawal, Dr. Kshitij Chavan, Dr. Rounak Dey, Soaad Hossain, Dr. Kokila Jaiswal, Dr. Gayathri Ramasamy, N P V S Subrahmanya Sastry, Dev Sharma, Professor Ashlesha Shimpi, Dr. Neel Shimpi, Dr. Arjun Singh, Dr. Mohita Sinha, Deeti Tarsaria, Dr. Herninder Kaur Thind, Gnana Kartheek Tirumalasetti, Eptehal Nashnoush and Professor Karmen Williams for their invaluable support in reviewing the papers. Their feedback and insights have significantly enriched the quality of this work.

Data Availability

All data analyzed during this study were obtained from publicly available databases, including PubMed. The search strategy and inclusion criteria are outlined in the manuscript, and a detailed search strategy is provided in the Supporting Information. The extracted data used for this analysis are provided as supplementary material S2 File to ensure transparency and reproducibility. No additional data were generated or analyzed. For further inquiries, please contact the corresponding author.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. doi: 10.1038/s41591-018-0300-7
  • 2. Leite AF, Vasconcelos K de F, Willems H, Jacobs R. Radiomics and machine learning in oral healthcare. Proteomics Clin Appl. 2020;14(3):e1900040. doi: 10.1002/prca.201900040
  • 3. Arsiwala-Scheppach LT, Chaurasia A, Müller A, Krois J, Schwendicke F. Machine learning in dentistry: a scoping review. J Clin Med. 2023;12(3):937. doi: 10.3390/jcm12030937
  • 4. Jha Kukreja B, Kukreja P. Integration of artificial intelligence in dentistry: a systematic review of educational and clinical implications. Cureus. 2025;17(2):e79350. doi: 10.7759/cureus.79350
  • 5. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73. doi: 10.7326/M18-0850
  • 6. Richardson WS, Wilson MC, Nishikawa J, Hayward RSA. The well-built clinical question: a key to evidence-based decisions. ACP J Club. 1995;123(3):A12. doi: 10.7326/acpjc-1995-123-3-a12
  • 7. Zotero (RRID:SCR_013784). https://www.zotero.org
  • 8. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024;385:e078378. doi: 10.1136/bmj-2023-078378
  • 9. Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447–53. doi: 10.1126/science.aax2342
  • 10. European Commission. Proposal for a regulation laying down harmonised rules on artificial intelligence. https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence
  • 11. Wu E, Wu K, Daneshjou R, Ouyang D, Ho DE, Zou J. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med. 2021;27(4):582–4. doi: 10.1038/s41591-021-01312-x
  • 12. Muehlematter UJ, Daniore P, Vokinger KN. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): a comparative analysis. Lancet Digit Health. 2021;3(3):e195–203. doi: 10.1016/S2589-7500(20)30292-2
  • 13. Chen F, Wang L, Hong J, Jiang J, Zhou L. Unmasking bias in artificial intelligence: a systematic review of bias detection and mitigation strategies in electronic health record-based models. J Am Med Inform Assoc. 2024;31(5):1172–83. doi: 10.1093/jamia/ocae060
  • 14. Andaur Navarro CL, Damen JAA, van Smeden M, Takada T, Nijman SWJ, Dhiman P, et al. Systematic review identifies the design and methodological conduct of studies on machine learning-based prediction models. J Clin Epidemiol. 2023;154:8–22. doi: 10.1016/j.jclinepi.2022.11.015
  • 15. Benjamens S, Dhunnoo P, Meskó B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digit Med. 2020;3:118. doi: 10.1038/s41746-020-00324-0
  • 16. Johnson AEW, Bulgarelli L, Shen L, Gayles A, Shammout A, Horng S, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1. doi: 10.1038/s41597-022-01899-x
  • 17. Irvin J, Rajpurkar P, Ko M, Yu Y, Ciurea-Ilcus S, Chute C, et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. AAAI. 2019;33(01):590–7. doi: 10.1609/aaai.v33i01.3301590
  • 18. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):E215–20. doi: 10.1161/01.cir.101.23.e215
PLOS Digit Health. doi: 10.1371/journal.pdig.0000940.r001

Decision Letter 0

Henry Horng-Shing Lu, Erika Ong

15 Apr 2025

PDIG-D-24-00600
Machine learning in dentistry: a scoping review
PLOS Digital Health

Dear Dr. Celi,

Thank you for submitting your manuscript to PLOS Digital Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Digital Health's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 30 days. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at digitalhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pdig/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to any formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Erika Ong
Academic Editor
PLOS Digital Health

Leo Anthony Celi
Editor-in-Chief
PLOS Digital Health
orcid.org/0000-0001-6712-6626

Journal Requirements:

1. We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex.

Additional Editor Comments (if provided):

Reviewers' Comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: No

Reviewer #3: N/A

Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This scoping review provides a much-needed overview of the landscape of machine learning in dentistry. It sheds light on important shortcomings in current research (notably the lack of calibration reporting, bias considerations, and open data). The paper is generally well-organized and informative. I offer the following constructive suggestions to further enhance the clarity and impact of the review:

• The Introduction could be made more concise, for example, the sentence “Dentistry... is ready to gain a lot from these advancements” could be rephrased in a more scholarly tone. Additionally, it’s better to avoid repetition – the phrase “to the best of our knowledge” appears twice in proximity.

• The Authors mention that previous reviews have explored ML in dentistry but that this work is unique in emphasizing explainability of metrics, calibration, and bias (which is an excellent positioning). It would strengthen the introduction to briefly cite those prior reviews and explicitly state how this review differs from them.

• The methodology is generally well-detailed and rigorous. One concern is the restriction to PubMed-indexed articles. Considering the interdisciplinary nature of ML, relevant studies might appear in computer science venues or other databases. In the manuscript’s Limitations, the authors acknowledge that limiting to PubMed and English could exclude some studies. To preempt readers’ concerns, it would help to briefly justify this in the Methods as well – for example, stating that PubMed was chosen for its coverage of biomedical journals.

• The authors should perhaps specify whether the multi-investigator reviewing process applied to full-text screening as well (it’s currently implied).

• One suggestion is to clarify in the Methods how the TRIPOD+AI rubric was applied. TRIPOD+AI is a reporting guideline; did the authors score each study on certain checklist items, or simply used it as an informal guide to decide which information to record (like calibration, interpretability, bias)?

• One area the authors might want to expand on is model validation approaches. Many readers will wonder: how many studies performed external validation or multi-center evaluation versus those that only did internal validation?

• Table 1 succinctly summarizes key performance metric reporting frequencies, which is a great addition. Depending on journal space, the authors might consider an additional figure or table to enrich the results. For example, a bar chart of the number of studies per dental specialty or per year could visually illustrate trends.

• To further enrich the discussion, the authors can consider adding concrete examples or referencing known techniques. For instance, when discussing bias and fairness, they can mention subgroup analysis and fairness-aware algorithms as remedies– citing a relevant study or framework (perhaps outside dentistry) that successfully applied such techniques, to illustrate what future dental ML studies could do.

• Since this is a scoping review, it would be helpful to explicitly state what the implications are for different stakeholders. The authors can consider mentioning implications for practitioners or educators.

• One additional future direction to mention could be the development or adoption of community data repositories or benchmarks for dental AI. Given that small number of studies shared their data publicly, the field would benefit from common datasets.

• Minor points: Overall, the manuscript is well-written. There are a few minor typographical and grammatical errors to correct for a polished final version. For example, “knolwedge” should be “knowledge”, and “simulatenously” should be “simultaneously”.

Reviewer #2: This manuscript addresses an important and rapidly evolving topic—the use of machine learning (ML) in dentistry—and commendably covers a wide breadth of studies. While the review is comprehensive in scope, incorporating the following refinements could further enhance clarity, transparency, and methodological rigor:

1. While scoping reviews can serve exploratory purposes, transparency and reproducibility would be further strengthened if a protocol is registered or published a priori (e.g., on Open Science Framework (OSF) or in journals that accept scoping review protocols). Just as a note for future scoping reviews: PROSPERO does not accept scoping review protocols.

2. Although PRISMA-ScR was used for reporting, it is unclear what methodological framework guided the conduct of the review (e.g., JBI methodology guidance for scoping reviews, Arksey and O'Malley, or its updated versions). Clarifying this could provide readers with greater insight into the review process.

3. Including the study types (study design) in the eligibility criteria could assist readers in understanding the weight and diversity of the evidence reviewed. Additionally, describing the study types in the "Characteristics of Included Studies" section would provide a clearer overview of the included literature.

4. Is there a specific rationale for selecting the time frame 2018 to 2023? If so, it would be helpful to mention this explicitly in the eligibility criteria.

5. Since the PICO framework is used, consider integrating it directly into the inclusion criteria section. This could streamline the methodology and avoid redundancy.

6. The PRISMA flow diagram could be further improved by including details such as the number of duplicates removed, specific reasons for article exclusion, and information at each screening stage.

7. The PRISMA-ScR checklist item “Synthesis of Results” is listed as reported on Page 4, but there is no clear description of how results will be synthesized on that page. This section could be a suitable place to describe the planned analyses, such as subgroup analysis or other methods of result synthesis.

8. It may be helpful to provide a comprehensive description of the characteristics of included studies (e.g., study design, geographical distribution) and use a thematic approach to present the findings in a more concise and organized manner, which could improve readability.

9. You highlighted the need to address equity and bias; however, providing specific examples from the reviewed literature could better demonstrate potential real-world implications or consequences of unaddressed biases.

10. Including practical recommendations for how dental practitioners can critically evaluate ML models before integrating them into clinical practice could further enhance the applicability of your findings.

11. While you acknowledged the limitation regarding the exclusive use of PubMed, it may be worth explicitly recommending that future reviews incorporate additional databases (e.g., Embase, Scopus) to ensure a more comprehensive literature search.

12. Minor Comments and Corrections:

• Grammar and Proofreading: Page 3, line 24: typo correction — "knolwedge" → "knowledge".

• Percentage Reporting: Ensure consistency in reporting percentages (e.g., use either 60% or 59.8% consistently).

• Figures: Consider including more graphical summaries (e.g., visual representations of dental specialties and ML tasks) to facilitate quick interpretation for readers.

Reviewer #3: The authors perform a scoping review of the dentistry literature to assess ML models in dentistry from 2018 to 2023. They follow the PICO framework and using PRISMA-ScR.

The overall quality of the review is high with careful attention paid to critical details regarding potential interpretability and downstream translation of these models. The emphasis on calibration and bias of the models is particularly important given the impact these elements will have on downstream translation. The search strategy only included PubMed and did not include IEEE Xplore for more technical type papers which may have offered greater methodological detail in modeling strategies. The authors should note the use of only one database in their limitations.

It would be helpful also to understand how many studies were in each type of application, for example, computer vision vs. tabular analysis to achieve its relevant tasks of classification, segmentation, etc.

Minor:

In the manuscript there is use of informal English such as "can't" rather than cannot and "couldn't" rather than could not. Would defer to the publisher regarding appropriate use of contractions.

Reviewer #4: General Assessment:

The manuscript explores a timely and relevant topic with well-defined objectives and a structured methodology, appropriate for a scoping review. However, several critical issues must be addressed before it can be considered for publication.

Major Concerns:

Language and Style:

The manuscript requires thorough language editing. Grammatical errors and awkward phrasing affect clarity. Redundancies (e.g., “To the best of our knowledge” appearing twice) and placeholders (e.g., "REFERENCE") should be corrected.

Methods:

The definition of machine learning is unclear. For example, exclusion criterion #1 may confuse traditional statistical methods with ML. The description of evaluation criteria lacks details on scoring, instruments used, and reviewer agreement resolution. The absence of PROSPERO registration needs clearer justification.

Results Interpretation:

Although percentages are presented clearly, the discussion is often superficial. Important findings, like the low reporting of calibration metrics, should be connected to their clinical relevance. Repetitive phrases should be consolidated, and vague language replaced with more precise analysis.

Bias, Equity, and Reproducibility:

The treatment of fairness and bias is shallow. Mentioning that 25% of studies addressed bias is not enough—examples and tools (e.g., AIF360, Fairlearn) should be discussed. Similarly, the low data sharing rate (11.72%) demands stronger commentary on its impact on reproducibility and open science.

Discussion and Conclusion:

The discussion is generally well-structured, with relevant ethical and regulatory context. However, claims about unsupervised learning trends contradict the data and should be revised. The discussion on interpretability needs concrete examples. Limitations should be more deeply analyzed, especially regarding the exclusive use of PubMed. The conclusion lacks a strong final message and should include 2–3 practical recommendations for future research.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.

Reproducibility: To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

PLOS Digit Health. doi: 10.1371/journal.pdig.0000940.r003

Decision Letter 1

Henry Horng-Shing Lu, Erika Ong

25 Jun 2025

Machine learning in dentistry: a scoping review

PDIG-D-24-00600R1

Dear Dr Celi,

We are pleased to inform you that your manuscript 'Machine learning in dentistry: a scoping review' has been provisionally accepted for publication in PLOS Digital Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email from a member of our team. 

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact digitalhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Digital Health.

Best regards,

Erika Ong

Academic Editor

PLOS Digital Health

***********************************************************

Additional Editor Comments (if provided):

Reviewer Comments (if any, and for reference):

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

Reviewer #3: All comments have been addressed

Reviewer #4: (No Response)

**********

2. Does this manuscript meet PLOS Digital Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

Reviewer #3: N/A

Reviewer #4: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: (No Response)

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Digital Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: (No Response)

Reviewer #3: Yes

Reviewer #4: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I appreciate the time and effort that the authors have put into correcting the original manuscript and incorporating the feedback.

I agree with the changes made and have no additional comments.

Reviewer #2: (No Response)

Reviewer #3: (No Response)

Reviewer #4: Thank you for the opportunity to review your manuscript entitled “Machine learning in dentistry: a scoping review.” This work addresses a timely and important topic at the intersection of artificial intelligence and dental clinical practice. The study is methodologically grounded, follows PRISMA-ScR guidelines, and provides a broad overview of 280 studies published between 2018 and 2023.

While the manuscript has strong potential, it does not yet meet the full publication criteria of PLOS Digital Health in its current form. Below are detailed comments and recommendations for improvement:

1. Language and Presentation

- The manuscript requires significant language editing. The English is often unclear or inconsistent, with a mix of academic and informal tone. Phrases like “dentistry has a long way to go” and “this is critical, as…” weaken scientific precision.

- Some terminology is used without clear definition (e.g., “generative tasks,” “direct point-of-care use”), and terms like “bias” and “fairness” are often interchanged without technical clarity.

Recommendation: Have the manuscript professionally edited for clarity, consistency, and scientific tone.

2. Redundancy and Structure

- The manuscript frequently repeats concepts across sections, particularly related to calibration, explainability, and bias. This weakens the impact of the key messages.

- The Introduction and Discussion sections contain overlapping content that can be streamlined.

Recommendation: Consolidate redundant sections and focus each paragraph on a distinct, well-supported point.

3. Analytical Depth

- The analysis is largely descriptive (e.g., percentage reporting of various model features), which is appropriate for a scoping review. However, the conclusions drawn often overreach the data, suggesting systemic field-wide risks without deeper statistical or stratified analysis.

- Claims regarding calibration, fairness, and risk in real-world implementation would benefit from more nuanced analysis — for example, by grouping results by geography, algorithm type, or model task.

Recommendation: Limit conclusions to what is strictly supported by the data, or enhance the analysis to justify broader claims.

4. Transparency and Supplementary Material

- The manuscript refers to supplementary materials (e.g., S2 File with extracted data), which is good practice. However, it was not possible to assess these files from the PDF itself.

Recommendation: Ensure all supporting files (search strategy, extracted data) are included and clearly referenced in submission.

5. Ethical and Methodological Standards

- The manuscript meets ethical requirements. It does not involve human subjects or private data, and the methodology is aligned with scoping review standards.

- The use of TRIPOD+AI criteria is a strength and improves reporting rigor.

Overall Recommendation: Minor Revision

The topic is highly relevant, and the paper is methodologically promising. However, the manuscript requires substantial improvements in:

- Language and clarity;

- Depth and precision of analysis;

- Alignment between results and conclusions;

- Overall scientific tone and structure.

Addressing these points will significantly improve the manuscript’s clarity, impact, and suitability for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

Do you want your identity to be public for this peer review? If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

**********

Attachment

Submitted filename: Response to Reviewers.docx

pdig.0000940.s005.docx (26.7KB, docx)