BMC Oral Health. 2023 Jun 3;23:358. doi: 10.1186/s12903-023-03027-6

Artificial intelligence in the diagnosis of dental diseases on panoramic radiographs: a preliminary study

Junhua Zhu 1, Zhi Chen 1, Jing Zhao 1, Yueyuan Yu 1, Xiaojuan Li 2, Kangjian Shi 2, Fan Zhang 2, Feifei Yu 1, Keying Shi 1, Zhe Sun 1, Nengjie Lin 1, Yuanna Zheng 1
PMCID: PMC10239110  PMID: 37270488

Abstract

Background

Artificial intelligence (AI) has been introduced to interpret panoramic radiographs (PRs). The aim of this study was to develop an AI framework to diagnose multiple dental diseases on PRs and to perform an initial evaluation of its performance.

Methods

The AI framework was developed based on 2 deep convolutional neural networks (CNNs), BDU-Net and nnU-Net. A total of 1996 PRs were used for training, and diagnostic evaluation was performed on a separate evaluation dataset of 282 PRs. Sensitivity, specificity, Youden's index, the area under the curve (AUC), and diagnostic time were calculated. Dentists with 3 different levels of seniority (H: high, M: medium, L: low) diagnosed the same evaluation dataset independently. The Mann-Whitney U test and the DeLong test were used for statistical analysis (α = 0.05).

Results

Sensitivity, specificity, and Youden's index of the framework for diagnosing the 5 diseases were 0.964, 0.996, and 0.960 (impacted teeth); 0.953, 0.998, and 0.951 (full crowns); 0.871, 0.999, and 0.870 (residual roots); 0.885, 0.994, and 0.879 (missing teeth); and 0.554, 0.990, and 0.544 (caries). The AUCs of the framework were 0.980 (95% CI: 0.976–0.983, impacted teeth), 0.975 (95% CI: 0.972–0.978, full crowns), 0.935 (95% CI: 0.929–0.940, residual roots), 0.939 (95% CI: 0.934–0.944, missing teeth), and 0.772 (95% CI: 0.764–0.781, caries). The AUC of the AI framework was comparable to that of all dentists in diagnosing residual roots (p > 0.05), and its AUC values were similar to (p > 0.05) or better than (p < 0.05) those of M-level dentists for all 5 diseases. However, the AUC of the framework was statistically lower than that of some H-level dentists for diagnosing impacted teeth, missing teeth, and caries (p < 0.05). The mean diagnostic time of the framework was significantly shorter than that of all dentists (p < 0.001).

Conclusions

The AI framework based on BDU-Net and nnU-Net demonstrated high specificity and high efficiency in diagnosing impacted teeth, full crowns, missing teeth, residual roots, and caries. The clinical feasibility of the AI framework was preliminarily verified, since its performance was similar to or even better than that of dentists with 3–10 years of experience. However, the framework's caries diagnosis should be improved.

Keywords: Artificial intelligence (AI), Dental disease, Diagnosis, Panoramic radiograph, Preliminary reading

Background

Dental diseases are prevalent all over the world. According to the 2017 Global Burden of Disease study, approximately 3.5 billion people worldwide suffer from dental diseases, mainly untreated caries, severe periodontal disease, edentulism, and severe tooth loss (with just 1 to 9 remaining teeth) [1]. Dental diseases, especially untreated ones, may cause infections, pain, restricted mouth opening and even life-threatening conditions that seriously affect quality of life, productivity and work capacity, and social participation of patients [2].

Clinical examination combined with radiographs is a commonly used method for diagnosing dental diseases [3]. Owing to complex anatomy and disease progression, interpreting radiographs quickly and accurately is challenging for dentists [4]. Artificial intelligence (AI) has been shown to significantly increase workflow efficiency and accuracy in medical imaging [5]. Nowadays, dental images are commonly digitized and easily translated into computer language [6]. Therefore, the application of AI in the auxiliary diagnosis of dental diseases is promising [7, 8].

In the field of oral and maxillofacial radiology, studies on the application of AI have mainly been based on panoramic radiographs (PRs) [9], since PRs have a wide display range, can be easily obtained in the dental clinic, and are suitable for computer-aided diagnosis of various dental diseases and conditions [10]. However, the low contrast, overlapping structures, and unclear tooth edges in PRs increase the difficulty of segmentation [11, 12]. In recent years, considerable segmentation results have been achieved using convolutional neural network (CNN)-based image segmentation [13, 14]. Many CNN-based models have been developed for the diagnosis of a particular disease or condition [15–19]. In practice, however, patients often suffer from multiple dental diseases at the same time, which can be identified on PRs [20, 21]. To date, studies on CNN-based diagnosis of multiple diseases on PRs remain limited [22–24].

In our previous study, we proposed a dual-subnetwork structure based on border guidance and feature map distortion, called BDU-Net [25]. It showed great potential for improving the performance of teeth instance segmentation; in the presence of missing teeth or misalignment, BDU-Net's segmentation performance appeared to be better than that of other networks. Therefore, in this study we aimed to build an AI framework based on 2 deep CNNs, BDU-Net and no-new-Net (nnU-Net), for diagnosing 5 common dental diseases on PRs; the null hypothesis was that there is no difference between the performance of the AI framework and that of dentists. The initial performance of the AI framework was satisfactory for all diseases except caries. The clinical feasibility of the AI framework was preliminarily verified by comparison with the diagnostic results and efficiency of dentists with different levels of experience, but some limitations and problems were also revealed.

Methods

Ethics approval

The study was conducted at the Stomatology Hospital of Zhejiang Chinese Medical University. PRs were taken with the patients' informed consent for therapeutic or diagnostic purposes, and these data could be used for medical research without compromising their privacy. Therefore, no additional informed consent was obtained from these patients for this study. The study was approved by the Ethics Committee of Stomatology Hospital of Zhejiang Chinese Medical University (approval no. 330108002-202200005) and was performed in accordance with the Declaration of Helsinki.

Selection of panoramic radiographs

The PRs were retrospectively selected from an image database of patients who visited the hospital between April 2019 and July 2021. The inclusion criteria were permanent dentition and age > 16 years. The exclusion criteria were: (1) retained deciduous teeth or deciduous dentition; (2) severely crowded teeth (more than 8 mm of crowding per arch); (3) blurred or incomplete PRs; (4) artifacts from earrings, glasses, or removable dentures on the PRs; (5) edentulous jaws. All PRs were produced with a Sirona digital machine (ORTHOPHOS XG 5 DS Ceph, Bensheim, Germany) using standard parameters, with tube voltages between 60 and 90 kV and tube currents between 1 and 10 mA. A default program of the device with a predetermined magnification of 1.9× and a rotation time of 14.1 s was used for X-ray exposures. The resolution of the PRs was 2440 × 1280 pixels, and the images were exported in Portable Network Graphics (PNG) format.

Annotation of the data

A total of 1996 images of 1996 patients, including 912 males and 1084 females, with a mean age of 37 years (range 17–83 years), made up the training dataset. The free open-source software 3D Slicer was used as the annotation tool. Three dentists with more than 12 years of clinical experience independently and blindly marked the areas of impacted teeth, residual roots, caries, full crowns, and other teeth on the PRs. All caries lesions identifiable on PRs, both primary and secondary, were marked; early lesions that had not yet caused hard-tissue defects were therefore not studied. The annotated images were reviewed and revised by another 2 oral and maxillofacial imaging experts and received final confirmation [26]. Prior to the annotation and review process, each participant was instructed and calibrated on the annotation task using a standardized protocol. The points labeled in common by most annotations were selected as the ground truth.
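The paper does not spell out the exact consensus rule beyond "the common points of most labels"; the sketch below illustrates one plausible reading, a pixel-wise majority vote over the annotators' binary masks. The function name and the use of NumPy arrays are assumptions for illustration, not the study's actual tooling.

```python
import numpy as np

def consensus_mask(masks: list[np.ndarray]) -> np.ndarray:
    """Pixel-wise majority vote over same-shape binary masks (1 = disease).

    Hypothetical helper: keeps the points labeled in common by most
    annotators, one plausible reading of the paper's ground-truth rule.
    """
    stacked = np.stack(masks, axis=0)          # (annotators, H, W)
    votes_needed = stacked.shape[0] // 2 + 1   # strict majority
    return (stacked.sum(axis=0) >= votes_needed).astype(np.uint8)
```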

All confirmed data were divided into 3 mutually exclusive sets. The training set in Table 1 was used to train the framework. The validation set was used in the training phase to verify the effectiveness of the framework training and to select hyperparameters. The test set was used for initial framework performance evaluation.

Table 1.

The numbers of diseases in the training set and evaluation set

Diseases  Training set  Evaluation set
Caries 1648 689
Residual roots 230 62
Impacted teeth 808 384
Full crowns 256 275
Missing teeth 1091 512

The AI framework development

Our proposed AI framework incorporated a full-mouth teeth instance segmentation network and a multiple dental disease segmentation network to enable the diagnosis of multiple dental diseases on PRs within a single framework (Fig. 1).

Fig. 1. Multiple dental diseases' diagnostic process of the proposed AI framework on PRs.

nnU-Net was used to segment the semantics of dental diseases. Since one nnU-Net can segment just one disease, 4 parallel nnU-Nets were designed to segment impacted teeth, residual roots, caries, and full crowns, respectively. Like other U-Net architectures, a U-shaped configuration of convolutional network layers with skip connections was used [27]. nnU-Net analyzes the characteristics of the input dataset and performs suitable pre-processing operations based on the information obtained from that analysis. The hyperparameters in nnU-Net, such as training batch size, image block size, and number of down-sampling steps, were set automatically (Fig. 2). This study used five-fold cross-validation, with cross-entropy loss and Dice loss as the loss functions during training. Adam was chosen as the optimizer, with a dynamically adjusted learning rate, and an online data augmentation strategy was applied during training.
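As a concrete illustration of the loss described above, here is a minimal PyTorch sketch of a cross-entropy plus soft Dice loss. The equal weighting of the two terms and the function name are assumptions; the paper only states that both losses were used.

```python
import torch
import torch.nn.functional as F

def ce_plus_dice_loss(logits: torch.Tensor, target: torch.Tensor,
                      eps: float = 1e-5) -> torch.Tensor:
    """Cross-entropy plus soft Dice loss for a batch of segmentation maps.

    logits: (N, C, H, W) raw network outputs; target: (N, H, W) integer labels.
    """
    ce = F.cross_entropy(logits, target)
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1])  # (N, H, W, C)
    one_hot = one_hot.permute(0, 3, 1, 2).float()             # (N, C, H, W)
    dims = (0, 2, 3)  # aggregate over batch and spatial dims per class
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2 * intersection + eps) / (cardinality + eps)
    return ce + (1 - dice.mean())
```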

Fig. 2. The workflow of disease segmentation by nnU-Net.

To obtain tooth position information and further diagnose missing teeth, a teeth instance segmentation network called BDU-Net was introduced. BDU-Net is mainly composed of two sub-networks: a region sub-network that generates the region segmentation results, and a border sub-network that adjusts the segmentation boundaries (Fig. 3). In this study, BDU-Net was used to segment all the teeth on the PRs; teeth were then numbered and missing teeth were reported. Boundary labels were generated with the Canny algorithm, a conventional boundary detector, so no additional manual annotation was required [28]. During training, random affine elastic transformation was used to augment the data. To ensure fairness, all experiments were implemented with the SGD optimizer, with a learning rate of 0.01, a momentum of 0.9, a batch size of 1, and 100 epochs. The network was implemented on an NVIDIA GeForce RTX 2080Ti GPU using the PyTorch framework. Finally, the 2 segmentation results were combined to generate a complete diagnostic result containing both disease type and disease location.
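The following sketch illustrates the two training ingredients named above: boundary labels derived with OpenCV's Canny detector (the specific hysteresis thresholds are assumptions, not taken from the paper) and the reported SGD settings. The placeholder module merely stands in for the actual BDU-Net.

```python
import cv2
import numpy as np
import torch

def boundary_labels(mask: np.ndarray) -> np.ndarray:
    """Derive tooth-boundary labels from a binary instance mask with the
    Canny edge detector, so no extra manual boundary annotation is needed.
    The 100/200 thresholds are illustrative assumptions."""
    edges = cv2.Canny((mask * 255).astype(np.uint8), 100, 200)
    return (edges > 0).astype(np.uint8)

# Reported optimizer settings: SGD, lr 0.01, momentum 0.9, batch size 1.
model = torch.nn.Conv2d(1, 2, kernel_size=3, padding=1)  # stand-in for BDU-Net
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```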

Fig. 3. The structure of BDU-Net for teeth instance segmentation.

The sensitivity and specificity of the AI framework for detecting the 5 dental diseases were initially evaluated using the test set. Sensitivity (Sen) refers to the ability of the framework to find all positive samples, that is, how many real positive samples are covered by the framework's predictions. Specificity (Spe) is the analogous measure for negative samples, that is, how many of the actual negative samples are predicted correctly. The index values were calculated from the confusion matrix according to the following formulas:

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively.
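A minimal sketch of these two formulas, computing per-disease sensitivity and specificity directly from confusion-matrix counts (the function name is an assumption for illustration):

```python
def sensitivity_specificity(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Sensitivity and specificity from confusion-matrix counts."""
    sen = tp / (tp + fn)  # share of true disease sites that are detected
    spe = tn / (tn + fp)  # share of healthy sites that are correctly cleared
    return sen, spe
```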

Sensitivity and specificity were 0.863 and 0.983 for missing teeth, 0.821 and 0.989 for caries, 0.718 and 0.997 for residual roots, 0.942 and 0.986 for impacted teeth, and 0.835 and 0.991 for full crowns. These results were close to or better than those of related studies [24, 29].

Separate evaluation dataset

The diagnostic performance of the proposed framework was evaluated by using a separate evaluation dataset. The sample size of the dataset was calculated according to the following formula:

N(Sen) = Z(1−α/2)² × Sen × (1 − Sen) / (d² × Prev)
N(Spe) = Z(1−α/2)² × Spe × (1 − Spe) / (d² × (1 − Prev))

Prev is the prevalence and d is the precision of the estimate (i.e., the maximum marginal error) [30]. According to the literature, the Prev values for these 5 diseases were set as 86.2%, 60.37%, 24.6%, 24%, and 22.3%, respectively [31–34]. With α = 0.05, Z(1−α/2) = 1.96 and d = 0.1. The sample sizes calculated using the above parameters were 53, 94, 226, 95, 221, 11, 2, 7, 5, and 47, so the recommended sample size was the maximum, 226. In the present study, a total of 282 images of 282 patients, including 131 males and 151 females, with a mean age of 34 years (range 18–85 years), made up the final evaluation dataset.
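A short sketch of this sample-size calculation under the stated parameters (α = 0.05, d = 0.1); the function names are assumptions. For example, plugging in the caries prevalence (60.37%) with the test-set sensitivity of 0.821 reproduces 94, one of the reported values.

```python
import math

Z = 1.96  # Z(1 - alpha/2) for alpha = 0.05

def n_for_sensitivity(sen: float, prev: float, d: float = 0.1) -> int:
    """Sample size to estimate sensitivity with precision d at prevalence prev."""
    return math.ceil(Z**2 * sen * (1 - sen) / (d**2 * prev))

def n_for_specificity(spe: float, prev: float, d: float = 0.1) -> int:
    """Sample size to estimate specificity; negatives have share 1 - prev."""
    return math.ceil(Z**2 * spe * (1 - spe) / (d**2 * (1 - prev)))

print(n_for_sensitivity(0.821, 0.6037))  # -> 94, matching a reported value
```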

Three dentists, each with more than 15 years of experience who did not participate in the annotation of the training dataset, independently read the images and made diagnoses. Any disagreements were discussed among the three dentists, and the consensus results were used as the gold standard.

Performance evaluation of the proposed AI framework

The 282 PRs were uploaded to the framework and automatically read and marked. Since the images lacked annotations, classification indicators were used to assess the dentists’ performance and the framework’s performance, instead of segmentation indicators. Sensitivity, specificity, Youden’s index, and AUC were assessed. Youden’s index was calculated according to the formula:

Youden's index = Sensitivity + Specificity − 1

AUC is an effective way to summarize the overall diagnostic accuracy of a test; it was calculated with MedCalc Statistical Software version 19.2.1 (MedCalc Software Ltd., Ostend, Belgium).
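A hedged Python equivalent of these two indexes (the study itself used MedCalc for the AUC); y_true and y_score below are illustrative placeholders, not study data.

```python
from sklearn.metrics import roc_auc_score

def youdens_index(sen: float, spe: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sen + spe - 1

# Placeholder labels and scores, purely for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.90, 0.20, 0.85, 0.40, 0.10, 0.35, 0.75, 0.30]
auc = roc_auc_score(y_true, y_score)
print(youdens_index(0.964, 0.996))  # -> 0.960, the impacted-teeth value
```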

To test the validity of the framework, 9 dentists with 3 different levels of seniority from the Stomatology Hospital of Zhejiang Chinese Medical University were invited to evaluate the same batch of PRs independently and to generate a clinical imaging report for each PR. Three dentists with high seniority had over 10 years of clinical experience (H1, H2, H3), 3 dentists with medium seniority had 3–10 years of clinical experience (M1, M2, M3), and 3 dentists with low seniority had less than 3 years of clinical experience (L1, L2, L3). Before the experiment, the dentists were pre-trained to diagnose the 5 dental diseases on PRs to familiarize themselves with the diagnostic procedure. The diagnostic results of the 9 dentists and the framework were compared with the gold standard (Table 1).

Diagnostic time of both the framework and the dentists was calculated to evaluate the efficiency. The framework’s diagnostic time was the time taken from the image input to the result output, which was recorded automatically on the computer. The dentists’ diagnostic time was measured by an observer using a stopwatch, starting when the image was opened on the computer and ending when the dentist had completed the initial full diagnosis of the PR.

Statistical analysis

The Mann-Whitney U test was used to assess the differences between the diagnostic times of the framework and the dentists; this analysis was conducted in SPSS 26.0 (IBM SPSS Statistics Base Integrated Edition 26, Armonk, NY, USA). The AUCs of the framework and the dentists were statistically compared using the DeLong test in MedCalc Statistical Software version 19.2.1 (MedCalc Software Ltd., Ostend, Belgium). The level of significance was set at α = 0.05 for both tests.
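For the Mann-Whitney U test, a SciPy sketch follows; the times are placeholder values, not the study's measurements. SciPy has no built-in DeLong test, which is presumably why MedCalc was used for the AUC comparisons.

```python
from scipy.stats import mannwhitneyu

# Placeholder per-PR diagnostic times in seconds (illustrative only).
framework_times = [1.3, 1.5, 1.7, 1.4, 1.6, 1.2, 1.8]
dentist_times = [42.0, 61.5, 38.2, 77.9, 55.1, 49.6, 60.3]

u_stat, p_value = mannwhitneyu(framework_times, dentist_times,
                               alternative="two-sided")
print(u_stat, p_value)
```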

Results

Table 2 shows the diagnostic performance of the framework for impacted teeth. Compared with the dentists, the framework had the lowest specificity (0.996). The framework's sensitivity (0.964), Youden's index (0.960), and AUC (0.980) were similar to those of M3 and lower only than those of H1 and H2. The AUC of the framework was significantly higher than those of M1, M2, L1, L2, and L3 (p < 0.05), and significantly lower than that of H1 (p < 0.05).

Table 2.

Diagnostic performance of the framework and individual dentists for impacted teeth

Reader Spe Sen Youden's index AUC (SE; 95% CI) P Z
Framework 0.996 0.964 0.960 0.980 (0.00480; 0.976–0.983) ref ref
H1 0.997 0.987 0.984 0.992 (0.00291; 0.990–0.994) 0.009* 2.629
H2 0.998 0.966 0.964 0.982 (0.00463; 0.979–0.985) 0.607 0.514
H3 0.997 0.956 0.953 0.976 (0.00527; 0.973–0.980) 0.501 0.673
M1 0.998 0.924 0.922 0.961 (0.00676; 0.957–0.966) 0.013* 2.487
M2 0.999 0.932 0.931 0.966 (0.00642; 0.962–0.970) 0.049* 1.972
M3 0.997 0.964 0.961 0.980 (0.00480; 0.977–0.983) 0.906 0.118
L1 0.999 0.878 0.877 0.938 (0.00837; 0.933–0.944) < 0.0001* 5.009
L2 0.999 0.901 0.900 0.950 (0.00763; 0.945–0.955) 0.000* 3.876
L3 1.000 0.883 0.883 0.941 (0.00822; 0.936–0.947) < 0.0001* 4.531

The DeLong test was used to statistically analyze the difference in AUC between the framework and the dentists. * indicates a significant difference (p < 0.05)

Table 3 shows the diagnostic performance of the framework for full crowns. Compared with the dentists, the framework had the lowest specificity (0.998). The framework's sensitivity (0.953), Youden's index (0.951), and AUC (0.975) were at a medium level, lower than those of all H-level dentists. A significant difference in AUC existed only between the framework and L2 (p < 0.05).

Table 3.

Diagnostic performance of the framework and individual dentists for full crowns

Reader Spe Sen Youden's index AUC (SE; 95% CI) P Z
Framework 0.998 0.953 0.951 0.975 (0.00641; 0.972–0.978) ref ref
H1 0.999 0.971 0.970 0.985 (0.00508; 0.982–0.988) 0.241 1.173
H2 0.999 0.960 0.959 0.980 (0.00592; 0.976–0.982) 0.642 0.465
H3 1.000 0.964 0.964 0.982 (0.00566; 0.979–0.984) 0.458 0.743
M1 0.999 0.953 0.952 0.976 (0.00641; 0.973–0.979) 0.944 0.070
M2 1.000 0.938 0.938 0.969 (0.00728; 0.965–0.972) 0.464 0.732
M3 1.000 0.953 0.953 0.976 (0.00641; 0.973–0.979) 0.922 0.098
L1 0.999 0.920 0.919 0.960 (0.00820; 0.955–0.964) 0.130 1.516
L2 0.999 0.909 0.908 0.954 (0.00869; 0.950–0.958) 0.037* 2.090
L3 0.999 0.956 0.955 0.978 (0.00617; 0.974–0.981) 0.775 0.286

The DeLong test was used to statistically analyze the difference in AUC between the framework and the dentists. * indicates a significant difference (p < 0.05)

Table 4 shows the diagnostic performance of the framework for missing teeth. Compared with the dentists, the framework had the lowest specificity (0.994). The framework's sensitivity (0.885), Youden's index (0.879), and AUC (0.939) were at a medium level, lower than those of all H-level dentists. The AUC of the framework was significantly lower than those of H2 and H3 (p < 0.05) and significantly higher than that of L1 (p < 0.05).

Table 4.

Diagnostic performance of the framework and individual dentists for missing teeth

Reader Spe Sen Youden's index AUC (SE; 95% CI) P Z
Framework 0.994 0.885 0.879 0.939 (0.00708; 0.934–0.944) ref ref
H1 0.995 0.889 0.884 0.942 (0.00697; 0.937–0.947) 0.748 0.321
H2 0.998 0.914 0.912 0.956 (0.00620; 0.951–0.960) 0.023* 2.279
H3 0.997 0.920 0.917 0.958 (0.00601; 0.954–0.963) 0.012* 2.515
M1 0.997 0.885 0.882 0.941 (0.00707; 0.935–0.946) 0.857 0.180
M2 0.996 0.900 0.896 0.948 (0.00663; 0.943–0.953) 0.268 1.109
M3 0.996 0.869 0.865 0.933 (0.00747; 0.927–0.938) 0.448 0.759
L1 0.994 0.832 0.826 0.913 (0.00828; 0.907–0.919) 0.006* 2.770
L2 0.997 0.875 0.872 0.936 (0.00732; 0.931–0.941) 0.707 0.376
L3 0.996 0.852 0.848 0.924 (0.00787; 0.918–0.930) 0.053 1.938

The DeLong test was used to statistically analyze the difference in AUC between the framework and the dentists. * indicates a significant difference (p < 0.05)

Table 5 shows the diagnostic performance of the framework for residual roots. The specificity of the framework (0.999) was very close or equal to that of all dentists. The framework's sensitivity (0.871), Youden's index (0.870), and AUC (0.935) were at a medium level, lower than those of all H-level dentists. No significant difference in AUC was found between the framework and any dentist (p > 0.05).

Table 5.

Diagnostic performance of the framework and individual dentists for residual roots

Reader Spe Sen Youden's index AUC (SE; 95% CI) P Z
Framework 0.999 0.871 0.870 0.935 (0.02150; 0.929–0.940) ref ref
H1 1.000 0.919 0.919 0.960 (0.01740; 0.955–0.964) 0.310 1.016
H2 0.999 0.919 0.918 0.959 (0.01740; 0.955–0.963) 0.316 1.003
H3 1.000 0.887 0.887 0.943 (0.02030; 0.938–0.948) 0.731 0.344
M1 0.999 0.823 0.822 0.911 (0.02450; 0.905–0.917) 0.412 0.821
M2 0.999 0.839 0.838 0.919 (0.02350; 0.913–0.925) 0.567 0.572
M3 0.999 0.855 0.854 0.927 (0.02260; 0.921–0.932) 0.761 0.304
L1 0.999 0.839 0.838 0.919 (0.02350; 0.912–0.925) 0.592 0.536
L2 1.000 0.871 0.871 0.935 (0.02150; 0.930–0.941) 0.987 0.017
L3 0.999 0.839 0.838 0.919 (0.02350; 0.913–0.925) 0.569 0.570

The DeLong test was used to statistically analyze the difference in AUC between the framework and the dentists. * indicates a significant difference (p < 0.05)

Table 6 shows the diagnostic performance of the framework for caries. Compared with the dentists, the framework had one of the highest specificities (0.990). The framework's sensitivity (0.554), Youden's index (0.544), and AUC (0.772) were lower than those of nearly all M- and H-level dentists. The AUC of the framework was significantly lower than those of H1, H2, and H3 (p < 0.05), and significantly higher than those of L1 and L3 (p < 0.05).

Table 6.

Diagnostic performance of the framework and individual dentists for caries

Reader Spe Sen Youden's index AUC (SE; 95% CI) P Z
Framework 0.990 0.554 0.544 0.772 (0.00949; 0.764–0.781) ref ref
H1 0.980 0.621 0.601 0.800 (0.00928; 0.792–0.809) 0.007* 2.721
H2 0.944 0.755 0.699 0.849 (0.00830; 0.842–0.857) < 0.0001* 7.099
H3 0.975 0.669 0.644 0.822 (0.00901; 0.814–0.830) < 0.0001* 4.799
M1 0.978 0.569 0.547 0.773 (0.00947; 0.765–0.782) 0.917 0.104
M2 0.976 0.586 0.562 0.781 (0.00943; 0.772–0.789) 0.424 0.800
M3 0.988 0.546 0.534 0.767 (0.00951; 0.758–0.775) 0.623 0.492
L1 0.988 0.417 0.405 0.702 (0.00942; 0.693–0.712) < 0.0001* 6.847
L2 0.995 0.509 0.504 0.752 (0.00954; 0.743–0.761) 0.060 1.884
L3 0.990 0.402 0.392 0.696 (0.00936; 0.687–0.706) < 0.0001* 6.859

The DeLong test was used to statistically analyze the difference in AUC between the framework and the dentists. * indicates a significant difference (p < 0.05)

The DeLong test results comparing the AUCs of the framework and the dentists are summarized in Fig. 4. The framework exhibited performance comparable to or even better than that of the M-level dentists in diagnosing dental diseases. In diagnosing residual roots and full crowns in particular, the framework's performance reached the level of the H-level dentists.

Fig. 4. Schematic representation of DeLong test results based on AUC for 5 different diseases. The results of the framework were compared with 9 dentists at 3 different levels of seniority (H: high, M: medium, L: low); statistical significance was set at p < 0.05.

The framework's performance in diagnosing the 5 diseases is compared in Fig. 5. For both impacted teeth and full crowns, all 4 indexes of the framework were over 0.95. The framework's specificity for all 5 diseases was 0.99 or higher; however, the other 3 indexes varied widely across diseases. Among the 5 diseases, the framework achieved the highest sensitivity, Youden's index, and AUC in diagnosing impacted teeth, and the highest specificity in diagnosing residual roots. The lowest sensitivity, specificity, Youden's index, and AUC were all obtained in diagnosing caries, consistent with the dentists' results.

Fig. 5. Performance of the proposed AI framework on 5 diseases based on 4 evaluation indexes.

As shown in Table 7, the mean diagnostic time of the framework per PR (1.5 ± 0.3 s) was roughly one thirty-fifth of that of the dentists (53.8 ± 46.0 s) and was significantly shorter than that of all dentists (p < 0.001).

Table 7.

Mean time for initial interpretation of each panoramic radiograph

Groups      Mean time per reader (s)   Mean time per seniority level (s)   Mean time, all dentists (s)
Framework   1.5 ± 0.3*                 –                                   –
H1          30.0 ± 33.4*               37.2 ± 27.9 (H-level)               53.8 ± 46.0
H2          41.7 ± 22.4*
H3          39.9 ± 25.5*
M1          50.7 ± 27.6*               52.1 ± 38.2 (M-level)
M2          48.5 ± 26.9*
M3          57.0 ± 53.6*
L1          94.0 ± 76.0*               72.0 ± 59.3 (L-level)
L2          67.0 ± 55.8*
L3          55.0 ± 29.4*

* indicates that the mean time of the framework was significantly lower than that of the dentists (p < 0.001).

Discussion

To improve the efficiency of interpreting PRs, reduce misdiagnoses, and mitigate missed diagnoses caused by human factors, we proposed an AI framework for diagnosing multiple dental diseases on PRs. The null hypothesis of this study was rejected, as differences in diagnostic performance existed between the AI framework and dentists of different seniority.

The U-Net structure, which combines the deep semantic information and the shallow image detail information of a neural network, performs well in medical image segmentation. In this study, we jointly applied two improved versions of U-Net, nnU-Net and BDU-Net, to build an AI framework for the first time. Selecting a suitable network for each disease separately would be cumbersome and would hinder subsequent extension of the framework to other diseases. nnU-Net can automatically adapt to any dataset by adjusting its hyperparameters according to the data characteristics [35]. BDU-Net focuses on enhanced generalization and instance boundary adjustment, improving not only the accuracy of tooth position identification but also the segmentation of tooth boundaries [25].
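To make the skip-connection idea concrete, here is a minimal two-level U-Net-style module in PyTorch: the decoder concatenates the encoder's shallow, detail-rich feature map with the upsampled deep, semantic features. It is a teaching sketch, not the architecture of nnU-Net or BDU-Net.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal two-level U-Net illustrating a single skip connection."""
    def __init__(self, in_ch: int = 1, num_classes: int = 2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shallow = self.enc(x)                        # shallow detail features
        deep = self.bottleneck(self.down(shallow))   # deep semantic features
        up = self.up(deep)                           # upsample back to input size
        merged = torch.cat([up, shallow], dim=1)     # the skip connection
        return self.head(self.dec(merged))
```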

Previous studies on diagnosing multiple dental diseases with AI are limited. Zadrożny et al. [23] evaluated the reliability of a commercial AI model for detecting multiple conditions on 30 PRs. The specificities were over 0.9 except for periodontal bone loss, but the sensitivity for different conditions varied greatly: the 2 highest sensitivities were 0.961 for missing teeth and 0.957 for restorations, and the 2 lowest were 0.445 for caries and 0.390 for periapical lesions. Basaran et al. [24] evaluated the performance of another commercial AI model based on the Faster R-CNN method and the GoogLeNet Inception v2 architecture. A large evaluation dataset of 1084 PRs was used to detect 10 conditions, with sensitivity ranging from 0.3026 to 0.9674, precision from 0.1923 to 0.9259, and F1 score from 0.1257 to 0.9433. These 3 indexes were consistent, indicating that the model performed well in detecting crowns, implants, and fillings, but struggled to accurately detect caries and dental calculus. Vinayahalingam et al. [22] developed a model based on Mask R-CNN with ResNet-50 combined with a rule-based heuristic algorithm and a combinatorial search algorithm. The model was trained on 2000 PRs, 200 of which were set aside as a test dataset. The precision, recall, and F1 score for detecting teeth, crowns, implants, fillings, and root canal fillings were all above 0.90, but for root remnants they were 0.852, 0.766, and 0.807, respectively.

In this study, the diagnostic performance of the AI framework was evaluated on a separate evaluation dataset and compared with that of dentists of different experience levels. Both the framework and the dentists demonstrated high specificity in diagnosing the five diseases, and the framework's performance was particularly stable, with a specificity of 0.99 or higher. This suggests that the framework made very few false-positive errors [22] and was able to filter out most teeth without disease, reducing the examination burden on dentists. Consistent with previous research, the framework performed better than dentists in terms of sensitivity, screening performance, and overall diagnostic accuracy in diagnosing impacted teeth and crowns. These conditions have high contrast and clear boundaries on PRs, making them easy to distinguish. However, for missing teeth and residual roots, while the AUC values were generally high, the sensitivity and Youden's index were below 0.90. This indicates that non-detection errors existed in the framework, such as residual roots being mistaken for teeth and second molars being identified as third molars [22]. For caries, the sensitivity and Youden's index decreased further, falling below 0.6. This could be due to the wide variation in the position, extent, and shape of caries. Some lesions, such as interproximal dish-shaped root caries and caries with smaller cavitary changes, are difficult to detect clinically without radiographic examination. As a result, such structures were under-segmented by the framework, increasing the number of false negatives [22]. Previous research on clinical visual inspection of caries showed that dentists' sensitivity for occlusal caries (0.777) was significantly higher than for proximal caries (0.224) [36], consistent with the results of both the dentists and the framework in this study. Efforts have been made to improve the accuracy of AI for detecting caries, such as applying a gradient-weighted class activation map in MobileNet V2 to highlight carious areas in cropped images for classifying caries lesions [37]. These findings provide ideas for adjusting the framework in the future.

Overall, the framework demonstrated similar or better overall diagnostic accuracy than M-level dentists and, in many cases, outperformed L-level dentists. In the diagnosis of residual roots and crowns, the framework’s overall diagnostic accuracy reached the level of H-level dentists. However, for caries, H-level dentists performed significantly better than the framework. Additionally, the framework had clear advantages over dentists in terms of diagnostic efficiency, and there is still potential for further improvement.

AI can be utilized in various fields of dentistry to aid in diagnosis, treatment planning, and prediction of treatment outcomes [38]. In recent years, digital X-rays have greatly advanced the development of AI in dentistry [39, 40]. The large number of oral X-rays taken each year during routine dental practice provides a valuable resource for image interpretation and image-based diagnosis. In the future, AI will revolutionize clinical workflows. Taking AI reports based on digital images as an example, on one hand, patients can conveniently manage their oral health, and on the other hand, these reports can help dentists complete clinical examinations and diagnoses more efficiently and accurately [9]. Dentists must therefore possess the ability to critically evaluate and ethically use AI applications. To prepare for future changes, dental education must also evolve. Basic knowledge of AI should become an integral part of the theoretical curriculum. Moreover, students should be trained in scenarios that AI has already affected, such as patient communication and management, paper writing, etc. [41].

It is important to consider the limitations of this study, particularly when interpreting the results. First, the study examined only five common dental diseases, which may limit the generalizability of the findings. In addition, the images lacked clinical data and the diagnoses were not clinically verified, which may introduce machine learning-specific bias and overdiagnosis; the absence of clinical examination in developing the gold standard may likewise affect the reliability and generalizability of the results. In the future, more dental diseases, such as cysts, fillings, and periodontal diseases, will be included in training and developing the framework, and we will attempt to combine PRs with electronic medical records for more rigorous model training and more accurate evaluation [42]. Second, the limited number of images was obtained from a single source, which raises concerns about overfitting and generalization. It is therefore imperative to construct a large, heterogeneous, multicenter panoramic radiograph dataset to ensure that all relevant variation in patient demographics and disease status in the clinical setting is fully represented when the system is applied [43, 44]. Third, although the performance of the framework was compared with the results of clinical dentists, its clinical relevance still needs to be strengthened. In the future, the framework's impact on treatment decisions and patient outcomes will be investigated.

Conclusions

The AI framework based on nnU-Net and BDU-Net was successfully developed and demonstrated high efficiency and specificity in diagnosing impacted teeth, full crowns, missing teeth, residual roots, and caries. The clinical feasibility of the AI framework was preliminarily verified, since its accuracy and efficiency were similar to or even better than those of dentists with 3–10 years of experience. This indicates that the AI framework could improve the accuracy and speed of dental disease diagnosis and treatment planning, potentially leading to better patient outcomes and lower healthcare costs. Caries diagnosis by the AI framework remained a challenge; applying AI to other dental imaging modalities or exploring ways to improve the accuracy of caries detection should be considered in future studies.

Acknowledgements

The authors are particularly grateful to the Ruiyi Artificial Intelligence Research Center of Zhejiang University for its great support of this research, and to Jinhua Fang, Shuai Zhang, Zhenyu Shen, Lifang Wu, Manjia Jin, Yuan Li, Liang Yu, Fangyue Xiang, and Zhecan Zhu for their voluntary participation.

Abbreviations

AI

Artificial intelligence

PRs

Panoramic radiographs

ROC

Receiver operating characteristic

AUC

Area under the ROC curve

CNNs

Convolutional neural networks

PNG

Portable Network Graphics

Author contributions

YZ conceived the original idea, contributed to the conception, conducted the experiments, provided clinical direction and validation, and formatted the manuscript. JZ and ZC contributed to the conception, conducted the experiments, formatted the manuscript, and critically revised the manuscript. JZ contributed to the conception, conducted the experiments, critically revised the manuscript, and supervised the project. YY conducted the experiments, provided clinical direction, and performed data analysis. XL, KS and FZ designed and implemented the framework and formatted the manuscript. FY, KS, ZS and NL conducted the experiments and performed data analysis. All authors read and approved the final manuscript.

Funding

This study was supported by Projects of Zhejiang Chinese Medical University (Project No. 2022-HT-414 and No. 352219A00605).

Data Availability

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Declarations

Ethics approval and consent to participate

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Stomatology Hospital of Zhejiang Chinese Medical University (approval no. 330108002-202200005). PRs were taken with the patients' informed consent for therapeutic or diagnostic purposes, and these data could be used for medical research without compromising their privacy. The Stomatology Hospital of Zhejiang Chinese Medical University institutional review board waived the need for informed consent for this study.

Consent for publication

Not applicable.

Competing interests

The authors have no potential conflicts of interest in relation to this study.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Collaborators GBDOD, Bernabe E, Marcenes W, Hernandez CR, Bailey J, Abreu LG, Alipour V, Amini S, Arabloo J, Arefi Z, et al. Global, regional, and national levels and trends in burden of oral conditions from 1990 to 2017: a systematic analysis for the Global Burden of Disease 2017 study. J Dent Res. 2020;99(4):362–73. doi: 10.1177/0022034520908533.
2. Global oral health status report: towards universal health coverage for oral health by 2030. Executive summary. Geneva: World Health Organization; 2022.
3. Keenan JR, Keenan AV. Accuracy of dental radiographs for caries detection. Evid Based Dent. 2016;17(2):43. doi: 10.1038/sj.ebd.6401166.
4. Fourcade A, Khonsari RH. Deep learning in medical image analysis: a third eye for doctors. J Stomatol Oral Maxillofac Surg. 2019;120(4):279–88. doi: 10.1016/j.jormas.2019.06.002.
5. Nguyen TT, Larrivée N, Lee A, Bilaniuk O, Durand R. Use of artificial intelligence in dentistry: current clinical trends and research advances. J Can Dent Assoc. 2021;87:l7.
6. Svenson B, Stahlnacke K, Karlsson R, Falt A. Dentists' use of digital radiographic techniques: part I - intraoral X-ray: a questionnaire study of Swedish dentists. Acta Odontol Scand. 2018;76(2):111–8. doi: 10.1080/00016357.2017.1387930.
7. Chen YW, Stanley K, Att W. Artificial intelligence in dentistry: current applications and future perspectives. Quintessence Int. 2020;51(3):248–57. doi: 10.3290/j.qi.a43952.
8. Choi JW. Assessment of panoramic radiography as a national oral examination tool: review of the literature. Imaging Sci Dent. 2011;41(1):1–6. doi: 10.5624/isd.2011.41.1.1.
9. Schwendicke F, Golla T, Dreher M, Krois J. Convolutional neural networks for dental image diagnostics: a scoping review. J Dent. 2019;91:103226. doi: 10.1016/j.jdent.2019.103226.
10. MacDonald D, Yu W. Incidental findings in a consecutive series of digital panoramic radiographs. Imaging Sci Dent. 2020;50(1):53–64. doi: 10.5624/isd.2020.50.1.53.
11. Terlemez A, Tassoker M, Kizilcakaya M, Gulec M. Comparison of cone-beam computed tomography and panoramic radiography in the evaluation of maxillary sinus pathology related to maxillary posterior teeth: do apical lesions increase the risk of maxillary sinus pathology? Imaging Sci Dent. 2019;49(2):115–22. doi: 10.5624/isd.2019.49.2.115.
12. Lee JH, Han SS, Kim YH, Lee C, Kim I. Application of a fully deep convolutional neural network to the automation of tooth segmentation on panoramic radiographs. Oral Surg Oral Med Oral Pathol Oral Radiol. 2020;129(6):635–42. doi: 10.1016/j.oooo.2019.11.007.
13. Tsuneki M. Deep learning models in medical image analysis. J Oral Biosci. 2022;64(3):312–20. doi: 10.1016/j.job.2022.03.003.
14. Tuzoff DV, Tuzova LN, Bornstein MM, Krasnov AS, Kharchenko MA, Nikolenko SI, Sveshnikov MM, Bednenko GB. Tooth detection and numbering in panoramic radiographs using convolutional neural networks. Dentomaxillofac Radiol. 2019;48(4):20180051. doi: 10.1259/dmfr.20180051.
15. Ishibashi K, Ariji Y, Kuwada C, Kimura M, Hashimoto K, Umemura M, Nagao T, Ariji E. Efficacy of a deep learning model created with the transfer learning method in detecting sialoliths of the submandibular gland on panoramic radiography. Oral Surg Oral Med Oral Pathol Oral Radiol. 2022;133(2):238–44. doi: 10.1016/j.oooo.2021.08.010.
16. De Araujo Faria V, Azimbagirad M, Viani Arruda G, Fernandes Pavoni J, Cezar Felipe J, Dos Santos E, Murta Junior LO. Prediction of radiation-related dental caries through PyRadiomics features and artificial neural network on panoramic radiography. J Digit Imaging. 2021;34(5):1237–48. doi: 10.1007/s10278-021-00487-6.
17. Watanabe H, Ariji Y, Fukuda M, Kuwada C, Kise Y, Nozawa M, Sugita Y, Ariji E. Deep learning object detection of maxillary cyst-like lesions on panoramic radiographs: preliminary study. Oral Radiol. 2021;37(3):487–93. doi: 10.1007/s11282-020-00485-4.
18. Aliaga I, Vera V, Vera M, García E, Pedrera M, Pajares G. Automatic computation of mandibular indices in dental panoramic radiographs for early osteoporosis detection. Artif Intell Med. 2020;103:101816. doi: 10.1016/j.artmed.2020.101816.
19. Warin K, Limprasert W, Suebnukarn S, Inglam S, Jantana P, Vicharueang S. Assessment of deep convolutional neural network models for mandibular fracture detection in panoramic radiographs. Int J Oral Maxillofac Surg. 2022;51(11):1488–94. doi: 10.1016/j.ijom.2022.03.056.
20. Farman AG. There are good reasons for selecting panoramic radiography to replace the intraoral full-mouth series. Oral Surg Oral Med Oral Pathol Oral Radiol Endod. 2002;94(6):653–4. doi: 10.1067/moe.2002.129766.
21. Bekiroglu N, Mete S, Ozbay G, Yalcinkaya S, Kargul B. Evaluation of panoramic radiographs taken from 1,056 Turkish children. Niger J Clin Pract. 2015;18(1):8–12. doi: 10.4103/1119-3077.146965.
22. Vinayahalingam S, Goey RS, Kempers S, Schoep J, Cherici T, Moin DA, Hanisch M. Automated chart filing on panoramic radiographs using deep learning. J Dent. 2021;115:103864. doi: 10.1016/j.jdent.2021.103864.
23. Zadrożny Ł, Regulski P, Brus-Sawczuk K, Czajkowska M, Parkanyi L, Ganz S, Mijiritsky E. Artificial intelligence application in assessment of panoramic radiographs. Diagnostics (Basel). 2022;12(1):224. doi: 10.3390/diagnostics12010224.
24. Basaran M, Celik O, Bayrakdar IS, Bilgir E, Orhan K, Odabas A, Aslan AF, Jagtap R. Diagnostic charting of panoramic radiography using deep-learning artificial intelligence system. Oral Radiol. 2022;38(3):363–9. doi: 10.1007/s11282-021-00572-0.
25. Zhang F, Zhu J, Hao P, Wu F, Zheng Y. BDU-Net: toward accurate segmentation of dental image using border guidance and feature map distortion. Int J Imaging Syst Technol. 2022;32(4):1221–30. doi: 10.1002/ima.22704.
26. Majkowska A, Mittal S, Steiner DF, Reicher JJ, McKinney SM, Duggan GE, Eswaran K, Cameron Chen PH, Liu Y, Kalidindi SR, et al. Chest radiograph interpretation with deep learning models: assessment with radiologist-adjudicated reference standards and population-adjusted evaluation. Radiology. 2020;294(2):421–31. doi: 10.1148/radiol.2019191293.
27. Isensee F, Jaeger PF, Kohl SA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203–11. doi: 10.1038/s41592-020-01008-z.
28. Canny J. A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell. 1986;8(6):679–98. doi: 10.1109/TPAMI.1986.4767851.
29. Revilla-Leon M, Gomez-Polo M, Vyas S, Barmak BA, Ozcan M, Att W, Krishnamurthy VR. Artificial intelligence applications in restorative dentistry: a systematic review. J Prosthet Dent. 2021;128(5):867–75. doi: 10.1016/j.prosdent.2021.02.010.
30. Hajian-Tilaki K. Sample size estimation in diagnostic test studies of biomedical informatics. J Biomed Inform. 2014;48:193–204. doi: 10.1016/j.jbi.2014.02.013.
31. Chen YW, Chi LY, Lee OK. Associations between aging and second molar diseases in patients having adjacent impacted third molar extraction. J Formos Med Assoc. 2021;120(1 Pt 2):380–7. doi: 10.1016/j.jfma.2020.06.003.
32. Ezoddini AF, Sheikhha MH, Ahmadi H. Prevalence of dental developmental anomalies: a radiographic study. Community Dent Health. 2007;24(3):140–4.
33. Guo J, Ban JH, Li G, Wang X, Feng XP, Tai BJ, Hu Y, Lin HC, Wang B, Si Y, et al. Status of tooth loss and denture restoration in the Chinese adult population: findings from the 4th National Oral Health Survey. Chin J Dent Res. 2018;21(4):249–57. doi: 10.3290/j.cjdr.a41083.
34. Bilge NH, Yesiltepe S, Torenek Agirman K, Caglayan F, Bilge OM. Investigation of prevalence of dental anomalies by using digital panoramic radiographs. Folia Morphol (Warsz). 2018;77(2):323–8. doi: 10.5603/FM.a2017.0087.
35. Wachinger C, Reuter M, Klein T. DeepNAT: deep convolutional neural network for segmenting neuroanatomy. NeuroImage. 2018;170:434–45. doi: 10.1016/j.neuroimage.2017.02.035.
36. Gimenez T, Piovesan C, Braga MM, Raggio DP, Deery C, Ricketts DN, Ekstrand KR, Mendes FM. Visual inspection for caries detection: a systematic review and meta-analysis. J Dent Res. 2015;94(7):895–904. doi: 10.1177/0022034515586763.
37. Vinayahalingam S, Kempers S, Limon L, Deibel D, Maal T, Hanisch M, Bergé S, Xi T. Classification of caries in third molars on panoramic radiographs using deep learning. Sci Rep. 2021;11(1):12609. doi: 10.1038/s41598-021-92121-2.
38. Thurzo A, Urbanová W, Novák B, Czako L, Siebert T, Stano P, Mareková S, Fountoulaki G, Kosnáčová H, Varga I. Where is the artificial intelligence applied in dentistry? Systematic review and literature analysis. Healthc (Basel). 2022;10(7):1269. doi: 10.3390/healthcare10071269.
39. Putra RH, Doi C, Yoda N, Astuti ER, Sasaki K. Current applications and development of artificial intelligence for digital dental radiography. Dentomaxillofac Radiol. 2022;51(1):20210197. doi: 10.1259/dmfr.20210197.
40. Mazurowski MA, Buda M, Saha A, Bashir MR. Deep learning in radiology: an overview of the concepts and a survey of the state of the art with focus on MRI. J Magn Reson Imaging. 2019;49(4):939–54. doi: 10.1002/jmri.26534.
41. Thurzo A, Strunga M, Urban R, Surovková J, Afrashtehfar KI. Impact of artificial intelligence on dental education: a review and guide for curriculum update. Educ Sci. 2023;13(2):150. doi: 10.3390/educsci13020150.
42. Liang H, Tsui BY, Ni H, Valentim CCS, Baxter SL, Liu G, Cai W, Kermany DS, Sun X, Chen J, et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med. 2019;25(3):433–8. doi: 10.1038/s41591-018-0335-9.
43. Debray TP, Vergouwe Y, Koffijberg H, Nieboer D, Steyerberg EW, Moons KG. A new framework to enhance the interpretation of external validation studies of clinical prediction models. J Clin Epidemiol. 2015;68(3):279–89. doi: 10.1016/j.jclinepi.2014.06.018.
44. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 2019;17(1):1–9. doi: 10.1186/s12916-019-1426-2.
