Abstract
Digital dental X-ray images are an important basis for diagnosing dental diseases, especially endodontic and periodontal diseases. Conventional diagnostic methods depend on the experience of doctors, so they are highly subjective and labor-intensive. Current computer-aided interpretation technology has low accuracy and poor lesion classification. This study proposes an efficient and accurate method for identifying common lesions in digital dental X-ray images with a convolutional neural network (CNN). In total, 188 digital dental X-ray images that had previously been diagnosed by dentists at Qilu Hospital of Shandong University as showing periapical periodontitis, dental caries, periapical cysts, and other common dental diseases were collected and augmented. The images and labels were input into four CNN models for training: visual geometry group (VGG)-16, InceptionV3, residual network (ResNet)-50, and densely connected convolutional networks (DenseNet)-121. The average classification accuracy of the four trained network models on the test set was 95.9%, while the classification accuracy of the trained DenseNet-121 network model reached 99.5%. These results demonstrate that using CNNs to interpret digital dental X-ray images is an efficient and accurate way to conduct auxiliary diagnoses of dental diseases.
Keywords: Convolutional neural network, Digital dental X-ray images, Computer-aided interpretation technology
Introduction
Medical imaging is an important tool for the screening, diagnosis, treatment guidance, and evaluation of clinical diseases [1]. Common medical imaging techniques include X-ray, ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI). With the development of medical imaging technology, a large number of medical images are produced every day. How to mine useful information from medical image data has become a research hotspot in academic and medical fields [2].
Digital dental X-ray imaging is an important basis for the diagnosis of endodontic and periodontal diseases, especially tooth defects, apical lesions, and alveolar bone resorption. Conventional diagnostic methods depend on the professional knowledge of doctors, so they are subjective and labor-intensive and can prolong the treatment period. Therefore, new intelligent technology for image interpretation is needed to solve these problems.
Deep learning [3] uses a deep neural network to automatically learn the abstract features of the data by simulating the human brain to reflect the essential features of data [4]. In recent years, deep learning has achieved great success in image recognition [5], speech recognition [6, 7], natural language processing [8, 9], computer vision [10, 11], and other fields. In medical image processing, deep learning based on convolutional neural networks (CNNs) [12] has become the mainstream research method for topics such as breast cancer recognition [13], lung nodule detection [14], diabetic retinopathy detection [15], brain tumor segmentation [16], and Alzheimer’s detection [17].
Research on endodontic and periodontal disease diagnosis based on deep learning is still in its infancy, and the main research direction is to determine whether an input image contains caries or periodontal lesions. In this study, by analyzing the characteristics of CNNs, four different types of network models with different depths were selected and trained on a multilesion dataset. The traditional two-class approach was thereby extended to multiclass classification, and the best-performing network model was identified.
Materials and Methods
Dataset
The digital dental X-ray images used in the experiment were all taken during actual treatments in Qilu Hospital of Shandong University. They could not be fed directly into the network because the sizes of the original images were inconsistent, and some images were so large that they would slow down network training. Therefore, referring to the related literature [12], the original images were preprocessed by the following steps:
Data filtering: Under the guidance of professional doctors, we screened the original data and excluded images of other lesion types (unrelated to this paper) and images with poor imaging quality. If such images were input into the network for training, the training results would be greatly affected.
Data labeling: After data filtering, professional doctors classified and labeled the images according to the type of lesions, which were divided into four categories: normal, periapical periodontitis, caries, and periapical cyst.
Data augmentation: Because only 188 original images were collected, we augmented the data by mirroring, rotating, randomly cropping, and scaling. First, the original data were expanded fivefold through vertical mirroring, horizontal mirroring, 180° rotation, and scale magnification; the expanded images were then randomly cropped to obtain a total of 1880 images.
Data normalization: Using the “resize()” function in OpenCV, all the images were normalized to 256 × 256 × 3.
Finally, a dataset of 1880 images, each of size 256 × 256 × 3, was obtained, including 320 normal images, 440 periapical periodontitis images, 880 caries images, and 240 periapical cyst images. The four types of images are shown in Fig. 1.
Fig. 1.
Examples of four kinds of digital dental X-ray images
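The augmentation and normalization steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the code used in the study: the function names, the crop fraction of 0.9, and the nearest-neighbour resize (a stand-in for OpenCV's `resize()`) are all hypothetical, the scale-magnification step is omitted, and the sketch does not reproduce the exact tenfold expansion from 188 to 1880 images.

```python
import numpy as np

def mirror_and_rotate(image):
    """Return the original plus three fixed variants (assumed set)."""
    return [
        image,
        np.flipud(image),    # vertical mirroring
        np.fliplr(image),    # horizontal mirroring
        np.rot90(image, 2),  # 180-degree rotation
    ]

def random_crop(image, fraction, rng):
    """Crop a random window covering `fraction` of each side (assumed value)."""
    h, w = image.shape[:2]
    ch, cw = int(h * fraction), int(w * fraction)
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return image[top:top + ch, left:left + cw]

def resize_nearest(image, size=256):
    """Nearest-neighbour resize to size x size; a stand-in for OpenCV's resize()."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]

rng = np.random.default_rng(0)
# Dummy array standing in for one radiograph of arbitrary size.
original = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)

variants = mirror_and_rotate(original)
augmented = [resize_nearest(v) for v in variants]
augmented += [resize_nearest(random_crop(v, 0.9, rng)) for v in variants]
```

Every array in `augmented` has the 256 × 256 × 3 shape expected by the networks, regardless of the size of the original image.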
To speed up data reading and facilitate migration of the dataset, the h5py library in Python was used to package the preprocessed images into a dataset. The h5py library can encapsulate data and labels in a compressed .h5 file, which greatly facilitates dataset migration. In addition, a dataset stored in this way can be read in a single pass, which is faster than reading the original image files directly. The dataset was divided into a training set and a verification set at a ratio of 8:1. Thirty images were reselected and preprocessed as test data. Finally, 188 images were obtained as the final test set to verify the generalization ability of the developed network.
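As an illustration of the 8:1 division, the sketch below shuffles image indices and splits them into training and verification subsets. The random (non-stratified) shuffle, the seed, the rounding rule, and the function name are assumptions, since the text does not describe how the split was drawn.

```python
import random

def split_train_val(n_samples, train_parts=8, val_parts=1, seed=0):
    """Shuffle sample indices and split them train:val = train_parts:val_parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = round(n_samples * val_parts / (train_parts + val_parts))
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_train_val(1880)
print(len(train_idx), len(val_idx))
```

With 1880 images and this rounding rule, the split yields 1671 training images and 209 verification images; the exact counts used in the study are not reported.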
Measures
Before introducing the measures, we will introduce four common professional terms:
TP (true positives): patients who are correctly diagnosed as sick.
FP (false positives): healthy people who are wrongly diagnosed as sick.
TN (true negatives): healthy people who are correctly diagnosed as healthy.
FN (false negatives): patients who are wrongly diagnosed as healthy.
In this study, the measures we used include sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy. Their calculation formulas are as follows:
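Written in terms of the counts TP, FP, TN, and FN, the standard definitions of these five measures are given below. For the four-class problem, each measure is presumably evaluated per class in a one-vs-rest fashion (the class of interest counts as "sick" and the other three as "healthy"); this is an assumption that is consistent with the per-class tables in the Results.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}, \\
\text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}.
\end{aligned}
```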
Experiments
By analyzing the structural characteristics of CNNs and the network structures used in existing research, four networks with different depths were selected for training: visual geometry group (VGG)-16 [18], InceptionV3 [19], residual network (ResNet)-50 [20], and DenseNet-121 [21]. Among them, VGG-16 and InceptionV3 had already been used in existing studies. The specific design for all experiments in this paper was as follows:
1. Put the training set and verification set into four kinds of network models for training, and obtain the trained network model. During the training process, the Adam optimizer was selected as the gradient descent optimization function without adding regularization and dropout layers, and the loss function was the categorical cross-entropy loss function.
2. Carry out the classification accuracy test and time consumption test.
   2.1 Put the test set data into the trained network model to obtain the classification accuracies of the four network models on the test set.
   2.2 Calculate the time consumptions of the four network models.
   2.3 Select the best network model according to the classification accuracy and time consumption results.
3. Use a class activation heat map for the visualization experiment, which represents the importance of each position in the original image to the class.
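To make the training objective concrete, the following sketch evaluates the categorical cross-entropy loss for a single image in plain Python. It is an illustration only: the class order, the example scores, and the helper names are assumptions, and in practice the loss would be supplied by the deep learning framework rather than written by hand.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: -sum(y_k * log(p_k))."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

# Assumed class order: normal, periapical periodontitis, caries, periapical cyst.
label = [0, 0, 1, 0]                        # one-hot label of a caries image
confident = softmax([0.0, 0.0, 5.0, 0.0])   # network strongly favours caries
uncertain = softmax([0.0, 0.0, 0.0, 0.0])   # network has no preference

loss_confident = categorical_cross_entropy(label, confident)
loss_uncertain = categorical_cross_entropy(label, uncertain)
```

The loss is small when the predicted probability of the true class is close to 1 and equals ln 4 when the four classes are predicted with equal probability, which is what the optimizer drives down during training.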
All the experiments were run on Windows 10; the programming language was Python 3.6, the development environment was PyCharm 2020.1, and the network framework was TensorFlow 2.2.0. The computer processor used in the experiment was an Intel Core i5-10400 with a main frequency of 2 GHz, the RAM was 16 GB, and the GPU was an NVIDIA RTX 2060.
Results
The image interpretation results of the four network models (VGG-16, InceptionV3, ResNet-50, DenseNet-121) on the test set are shown in Tables 1, 2, 3, 4, and 5 and Fig. 2. The five sets of results show that DenseNet-121 matched or outperformed the other three network models in all measures. DenseNet-121 achieved the highest accuracy, with only one of the 188 test images recognized incorrectly. On inspecting this image, we found that its main lesion feature was dental caries, but it also showed a slight symptom of periapical periodontitis, which led to its recognition as an image with periapical periodontitis.
Table 1.
Sensitivity of the four networks for different types of images
| Model | Normal | Periapical periodontitis | Caries | Periapical cyst |
|---|---|---|---|---|
| VGG-16 | 1.000 | 1.000 | 0.814 | 1.000 |
| InceptionV3 | 1.000 | 1.000 | 0.929 | 1.000 |
| ResNet-50 | 1.000 | 0.979 | 0.957 | 1.000 |
| DenseNet-121 | 1.000 | 1.000 | 0.986 | 1.000 |
Table 2.
Specificity of the four networks for different types of images
| Model | Normal | Periapical periodontitis | Caries | Periapical cyst |
|---|---|---|---|---|
| VGG-16 | 0.959 | 0.979 | 1.000 | 0.975 |
| InceptionV3 | 1.000 | 0.964 | 1.000 | 1.000 |
| ResNet-50 | 1.000 | 0.979 | 0.992 | 1.000 |
| DenseNet-121 | 1.000 | 0.993 | 1.000 | 1.000 |
Table 3.
PPV of the four networks for different types of images
| Model | Normal | Periapical periodontitis | Caries | Periapical cyst |
|---|---|---|---|---|
| VGG-16 | 0.870 | 0.941 | 1.000 | 0.882 |
| InceptionV3 | 1.000 | 0.906 | 1.000 | 1.000 |
| ResNet-50 | 1.000 | 0.940 | 0.985 | 1.000 |
| DenseNet-121 | 1.000 | 0.980 | 1.000 | 1.000 |
Table 4.
NPV of the four networks for different types of images
| Model | Normal | Periapical periodontitis | Caries | Periapical cyst |
|---|---|---|---|---|
| VGG-16 | 1.000 | 1.000 | 0.901 | 1.000 |
| InceptionV3 | 1.000 | 1.000 | 0.959 | 1.000 |
| ResNet-50 | 1.000 | 0.993 | 0.975 | 1.000 |
| DenseNet-121 | 1.000 | 1.000 | 0.992 | 1.000 |
Table 5.
Accuracy of the four networks for different types of images
| Model | Normal | Periapical periodontitis | Caries | Periapical cyst |
|---|---|---|---|---|
| VGG-16 | 0.968 | 0.984 | 0.931 | 0.979 |
| InceptionV3 | 1.000 | 0.973 | 0.973 | 1.000 |
| ResNet-50 | 1.000 | 0.979 | 0.979 | 1.000 |
| DenseNet-121 | 1.000 | 0.995 | 0.995 | 1.000 |
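The per-class measures in Tables 1 to 5 can be derived from a four-class confusion matrix by treating each class in turn as the positive class. The sketch below does this for a hypothetical confusion matrix: the per-class test counts (40, 48, 70, 30) are not reported in the text and are assumed here, chosen only so that a single caries image predicted as periapical periodontitis among 188 test images reproduces the DenseNet-121 rows.

```python
CLASSES = ["normal", "periapical periodontitis", "caries", "periapical cyst"]

# Rows are true classes, columns are predicted classes.
# Hypothetical counts: the per-class totals (40, 48, 70, 30) are assumed, and
# the single error is a caries image predicted as periapical periodontitis.
CONFUSION = [
    [40, 0, 0, 0],
    [0, 48, 0, 0],
    [0, 1, 69, 0],
    [0, 0, 0, 30],
]

def one_vs_rest_metrics(matrix, k):
    """Treat class k as 'sick' and all other classes as 'healthy'."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp
    fp = sum(row[k] for row in matrix) - tp
    tn = total - tp - fn - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }

for k, name in enumerate(CLASSES):
    metrics = one_vs_rest_metrics(CONFUSION, k)
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Under these assumed counts, the caries class gives a sensitivity of 0.986, an NPV of 0.992, and an accuracy of 0.995, and the periapical periodontitis class gives a specificity of 0.993 and a PPV of 0.980, in agreement with the DenseNet-121 rows above.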
The time periods required for the four network models to interpret a digital dental X-ray image were all on the order of milliseconds. The most time-consuming model was DenseNet-121, which took 19.98 ms; the fastest network model was VGG-16, which took 9.75 ms. Table 6 shows the parameter information and time consumptions of the four networks in detail.
Table 6.
Information and time consumptions of the four networks
| Model | Layer count | Parameter count | Time (ms) |
|---|---|---|---|
| VGG-16 | 16 | 17991992 | 9.75 |
| InceptionV3 | 48 | 29176088 | 17.14 |
| ResNet-50 | 50 | 36695416 | 14.45 |
| DenseNet-121 | 121 | 13591608 | 19.98 |
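The text does not describe how the per-image times were measured. One common approach, sketched below as an assumption rather than as the procedure used in the study, is to average the wall-clock time of repeated predictions after a few untimed warm-up calls; `mean_latency_ms` and `fake_predict` are hypothetical names, and the dummy function stands in for a trained model.

```python
import time

def mean_latency_ms(predict, sample, runs=100, warmup=10):
    """Average wall-clock time of predict(sample) in milliseconds."""
    for _ in range(warmup):  # warm-up calls are excluded from timing
        predict(sample)
    start = time.perf_counter()
    for _ in range(runs):
        predict(sample)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs

# Dummy stand-in for a trained model's prediction function.
def fake_predict(image):
    return sum(image) % 4

latency = mean_latency_ms(fake_predict, list(range(1000)))
print(f"{latency:.3f} ms per image")
```

Excluding warm-up calls matters on a GPU, where the first predictions typically include one-off initialization costs that would otherwise inflate the average.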
Considering both classification accuracy and time consumption, the DenseNet-121 network achieved the highest classification accuracy with a time consumption of the same order (milliseconds) as those of the other three networks. Therefore, we conducted visualization experiments only on the DenseNet-121 network to verify the accuracy of feature extraction. Figure 3 indicates that the features learned by the trained network model were accurate to some extent.
Fig. 3.
Class activation heat maps for three kinds of lesions
Discussion
In China, nearly 50% of residents suffered from various dental diseases in 2020. Although both the number of dental clinics and the number of dentists are increasing every year, the quality of treatment still needs to be improved. As the main imaging basis for dental disease diagnosis, digital dental X-ray imaging has a high penetration rate because of its simple imaging principle, short capture time, and low cost. The accurate and efficient identification of lesions in an input image is a crucial link in the process of dental disease treatment, which requires practitioners to have sufficient medical knowledge and rich clinical experience. When deep learning is used to recognize digital dental X-ray images, a computer can judge the presence of lesions immediately after imaging and mark the recognition results on the image. The recognition results can help doctors, especially doctors at grassroots hospitals, make efficient and accurate diagnoses, which has enormous economic and social significance.
The training process of a neural network requires a large amount of data, and in general, the more data are available, the better the training result; when the amount of data is limited, data augmentation can be used. Thanathornwong and Suebnukarn [22] used a data augmentation method to expand the number of digital panoramic radiographs in a dataset and input them into a faster regional CNN to detect periodontal damage in teeth. The digital dental X-ray images used in this study came from different equipment, and the data distribution had a certain level of complexity. In addition, 188 images were expanded to 1880 images by data augmentation. Compared with those of the original data, the locations and sizes of the lesions were changed, which further enriched the data distribution. Therefore, the network could learn more features, which was beneficial for improving the generalization ability of the network model.
It has only been a few years since the application of deep learning in dental disease diagnosis. In 2018, Lee et al. [23] achieved good experimental results by using InceptionV3 to identify dental caries lesions in digital dental X-ray images. In the same year, the same group used the VGG-16 network to identify periodontal lesions in digital X-ray images [24]. On this basis, this study proposed integrating the images of multiple lesion types into the same multiclassification dataset, used a variety of network structures with different depths to train neural network models on the multiclassification dataset, and finally obtained the best network structure: DenseNet-121.
In our study, the four network models have different characteristics in addition to different depths. VGG-16 was proposed earliest, and its network structure is obtained by "stacking" convolutional layers, so it has the advantages of a simple structure and easy implementation. InceptionV3 introduced a method of splitting a larger two-dimensional convolution into two smaller one-dimensional convolutions, which is called the idea of factorization into small convolutions. The core feature of ResNet-50 is the introduction of the "residual block," in which a shortcut connection carries the output of an earlier layer past the intermediate layers and feeds it into a later layer; this showed that neural networks can be made considerably deeper. DenseNet-121 contains not only the idea of factorization into small convolutions of InceptionV3, but also the idea of the "residual block" of ResNet-50, so its structure is more complex, its depth is greater, and its feature extraction ability is stronger. It is because of these features that DenseNet-121 can diagnose dental diseases more accurately.
Figure 2 shows that the recognition accuracy of the DenseNet-121 network reached 99.5% on the existing test set. There were two reasons for this result. One was that the network was the deepest of the four, which is helpful for extracting more features. The other was that DenseNet-121 used dense connections that combined low-dimensional information with high-dimensional information to judge the types of lesions. According to Table 6, although DenseNet-121 had the most layers (121 layers), it had the fewest parameters because of the use of convolution layers with a 1 × 1 kernel in the design of the network. On the one hand, this reduced the possibility of overfitting and improved the generalization ability of the network. On the other hand, it reduced the amount of computation and the time consumption, which was why DenseNet-121 had nearly eight times as many layers as VGG-16, but its time consumption was only about twice as large.
Fig. 2.
The image interpretation results of the four network models on the test set
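The effect of the 1 × 1 convolution on the parameter count can be checked with simple arithmetic. The sketch below compares a single 3 × 3 convolution with a 1 × 1 bottleneck followed by a 3 × 3 convolution. The input width of 1024 feature maps is a hypothetical value, and the growth rate of 32 with a bottleneck of four times the growth rate are the settings commonly associated with DenseNet-121, assumed here rather than taken from the text; bias terms are ignored.

```python
def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution without bias."""
    return kernel * kernel * c_in * c_out

C_IN = 1024              # assumed number of input feature maps late in the network
GROWTH = 32              # feature maps added per dense layer (assumed growth rate)
BOTTLENECK = 4 * GROWTH  # width of the 1 x 1 bottleneck (assumed)

# Option A: apply the 3 x 3 convolution directly to all input feature maps.
direct = conv_weights(3, C_IN, GROWTH)

# Option B: reduce the channels with a 1 x 1 convolution first, then apply 3 x 3.
bottleneck = conv_weights(1, C_IN, BOTTLENECK) + conv_weights(3, BOTTLENECK, GROWTH)

print(direct, bottleneck)
```

For these values the direct layer needs 294,912 weights and the bottleneck version 167,936, a reduction of roughly 43% for the same number of output feature maps.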
The process of deep learning–based training is similar to a closed black box with little human intervention. We input the image into the network model, and the rest of the work is left to the network for self-training. Therefore, the accuracy of the network in terms of learning features cannot be verified in the training process. In a case with a small dataset, the trained network may have high recognition accuracy, but the result of recognition might not be based on the features we expect to learn. Therefore, the visualization experiment is of great significance for measuring the accuracy of feature extraction after network training. In this study, a class activation heat map was used as the visualization method. Figure 3 shows that the trained DenseNet-121 network could accurately extract the features of various lesions and classify them according to these features.
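The text does not state which class activation mapping variant was used. As a minimal sketch under that caveat, the original class activation map for a network ending in global average pooling and a dense layer is a weighted sum of the last convolutional feature maps, with the dense-layer weights of the class of interest as coefficients. The array shapes and random values below are dummies, and clipping negative values is a display convention assumed here.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the last conv feature maps for one class.

    feature_maps: array of shape (H, W, C) from the last convolutional layer.
    class_weights: array of shape (C,), the dense-layer weights for the class.
    """
    cam = np.tensordot(feature_maps, class_weights, axes=([2], [0]))
    cam = np.maximum(cam, 0.0)              # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # scale to [0, 1] for display

rng = np.random.default_rng(0)
features = rng.random((8, 8, 16))  # dummy 8 x 8 grid with 16 channels
weights = rng.normal(size=16)      # dummy weights for one lesion class
heat_map = class_activation_map(features, weights)
```

For display, the low-resolution map would be upsampled to the input size and overlaid on the radiograph, so that bright regions mark the positions that contributed most to the predicted class.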
In summary, a self-made multiclassification (normal, periapical periodontitis, caries, and periapical cyst) dataset was used, and four networks with different depths, VGG-16, InceptionV3, ResNet-50, and DenseNet-121, were selected for training. Finally, the best network model was obtained by comparison in our study. The experimental results illustrated that the four network models used in this study had high classification accuracies with respect to dental diseases. In our research, the network models could identify more odontogenic diseases than other approaches. After comparing and analyzing the test results of the four network models, the DenseNet-121 network model had the best performance on the homemade dataset. In a follow-up study, the trained network model will be trained for a second time on a larger dataset to increase the generalization ability of the network.
Funding
Our research was supported by the National Natural Science Foundation of China (No. 52172282).
Footnotes
Feng Liu and Lei Gao are the co-first authors.
Contributor Information
Chao Liu, Email: qiluliuchao@sdu.edu.cn.
Min Han, Email: hanmin@sdu.edu.cn.
References
- 1. Shi J, Wang L, Wang S, et al. Applications of deep learning in medical imaging: a survey. Journal of Image and Graphics. 2020;25(10):1953–1981.
- 2. Liu F, Zhang J, Yang H. Research Progress of Medical Image Recognition Based on Deep Learning. Chinese Journal of Biomedical Engineering. 2018;37(1):86–94.
- 3. Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504–507. doi: 10.1126/science.1127647.
- 4. Hinton GE, Osindero S, Teh YW. A fast learning algorithm for deep belief nets. Neural Computation. 2006;18(7):1527–1554. doi: 10.1162/neco.2006.18.7.1527.
- 5. Bengio Y, Lamblin P, Dan P, et al. Greedy layer-wise training of deep networks. International Conference on Neural Information Processing Systems. Kitakyushu: Computer Science 2007;153–160.
- 6. Suk HI, Lee SW, Shen D, et al. Latent feature representation with stacked auto-encoder for AD/MCI diagnosis. Brain Structure and Function. 2015;220(2):841–859. doi: 10.1007/s00429-013-0687-3.
- 7. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine. 2012;29(6):82–97. doi: 10.1109/MSP.2012.2205597.
- 8. Xu W, Rudnicky A. Language modeling for dialog system. International Conference on Spoken Language Processing. Beijing: DBLP 2000;118–121.
- 9. Mikolov T, Deoras A, Povey D, et al. Strategies for training large scale neural network language models. Automatic Speech Recognition and Understanding. Providence: IEEE 2012;196–201.
- 10. Hinton G. Modeling pixel means and covariances using factorized third-order Boltzmann machines. 2010 IEEE Conference on Computer Vision and Pattern Recognition. San Francisco: IEEE 2010;2551–2558.
- 11. Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE 2016;2818–2826.
- 12. Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998;86(11):2278–2324. doi: 10.1109/5.726791.
- 13. Kiymet S, Aslankaya M Y, Taskiran M, et al. Breast Cancer Detection From Thermography Based on Deep Neural Networks. 2019 Innovations in Intelligent Systems and Applications Conference 2019.
- 14. Winkels M, Cohen TS. Pulmonary Nodule Detection in CT Scans with Equivariant CNNs. Medical Image Analysis. 2019;55:15–26. doi: 10.1016/j.media.2019.03.010.
- 15. Doshi D, Shenoy A, Sidhpura D, et al. Diabetic retinopathy detection using deep convolutional neural networks. 2016 International Conference on Computing, Analytics and Security Trends. IEEE 2016.
- 16. Ren L, Li Q, Guan X, et al. Three-Dimensional Segmentation of Brain Tumors in Magnetic Resonance Imaging Based on Improved Continuous Max-Flow. Laser & Optoelectronics Progress. 2018;55(11):221–229.
- 17. Wang X, Qi J, Yang Y, et al. A Survey of Disease Progression Modeling Techniques for Alzheimer's Diseases. IEEE 17th International Conference on Industrial Informatics. IEEE 2019.
- 18. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Computer Science, 2014.
- 19. Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the Inception Architecture for Computer Vision. Computer Vision and Pattern Recognition. IEEE 2016;2818–2826.
- 20. He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition. Computer Vision and Pattern Recognition. IEEE 2016:770–778.
- 21. Huang G, Liu Z, Laurens V, et al. Densely Connected Convolutional Networks. Conference on Computer Vision and Pattern Recognition. IEEE 2017;2261–2269.
- 22. Thanathornwong B, Suebnukarn S. Automatic detection of periodontal compromised teeth in digital panoramic radiographs using faster regional convolutional neural networks. Imaging Science in Dentistry. 2020;50(2):169–174. doi: 10.5624/isd.2020.50.2.169.
- 23. Lee JH, Kim DH, Jeong SN, et al. Detection and diagnosis of dental caries using a deep learning-based convolutional neural network algorithm. Journal of Dentistry. 2018;77:106–111. doi: 10.1016/j.jdent.2018.07.015.
- 24. Lee JH, Kim DH, Jeong SN, et al. Diagnosis and prediction of periodontally compromised teeth using a deep learning-based convolutional neural network algorithm. Journal of Periodontal & Implant Science. 2018;48(2):114–123. doi: 10.5051/jpis.2018.48.2.114.



