Abstract
The purpose of this study was to propose a continuity-aware contextual network (Canal-Net) for the automatic and robust 3D segmentation of the mandibular canal (MC) with consistently high accuracy throughout the entire MC volume in cone-beam CT (CBCT) images. The Canal-Net was designed based on a 3D U-Net with bidirectional convolutional long short-term memory (ConvLSTM) under a multi-task learning framework. Specifically, the Canal-Net learned the 3D anatomical context information of the MC by incorporating spatio-temporal features from ConvLSTM, and complementarily learned the structural continuity of the overall MC volume under a multi-task learning framework using multi-planar projection losses. The Canal-Net showed higher segmentation accuracies in 2D and 3D performance metrics (p < 0.05) and, in particular, a significant improvement in Dice similarity coefficient scores and mean curve distance (p < 0.05) throughout the entire MC volume compared to other popular deep learning networks. As a result, the Canal-Net achieved consistently high accuracy in 3D segmentations of the entire MC despite areas of low visibility caused by the unclear and ambiguous cortical bone layer. Therefore, the Canal-Net demonstrated automatic and robust 3D segmentation of the entire MC volume by improving structural continuity and boundary details of the MC in CBCT images.
Subject terms: Oral anatomy, Machine learning
Introduction
The mandibular canal (MC) is an important mandibular structure that supplies sensation to the lower teeth, chin, and lower lip1. Any injury to the MC can lead to temporary or permanent damage resulting in sensory disturbance sequelae such as paresthesia, hypoesthesia, and dysesthesia, which affect speech, mastication, and quality of life2–5. Therefore, knowing the exact location of the MC is essential in planning appropriate oral-maxillofacial surgeries such as implant placement and third molar extractions6,7. In preoperative assessments and surgical planning in dental clinics, panoramic radiographs are used as a standard dental imaging tool8,9, but they are limited in that it is challenging to determine the actual 3D rendering of the entire canal structure, as a panoramic radiograph shows the canal in only a single view10. Therefore, additional investigations using CT may be recommended to verify the exact position of the canal in a 3D view in accordance with the as low as reasonably achievable (ALARA) principle8. Due to its advantages such as a lower radiation dose, inexpensive image acquisition cost, and high spatial resolution, CBCT has been widely used in dental clinics for 3D diagnosis and treatment planning in the field of oral and maxillofacial surgery11–13. However, the manual segmentation of the MC, which is generally performed using 3D cross-sectional slices in CBCT images, is time-consuming and labor-intensive10,14. In addition, the ambiguous cortical bone layer surrounding the canal and the unclear medulla pattern also make it difficult to distinguish the entire MC because of the lower contrast of CBCT images15. Therefore, automatic segmentation of the MC is required to alleviate the workload of dental clinicians by overcoming the limitations of CBCT images.
Among studies for automatic MC segmentation in CBCT images, atlas-based segmentation (ARS) and statistical shape model (SSM) methods have been proposed as two conventional representatives of MC segmentation methods16–18. The SSM method utilized the prior knowledge of shape models to perform MC segmentation17,18. This prior knowledge was required to reconstruct a 3D model of CBCT images, which highly affects the segmentation result17,18. On the other hand, the ARS method only requires the atlas image for MC segmentation, which is independent of prior knowledge16. However, both the SSM and ARS methods exhibit a limitation in dealing with new forms of data beyond the predefined standard since they depend on prior knowledge or other preprocessing techniques16–18. Recently, deep learning methods have been widely used for the detection19–21, classification22–24, segmentation25,26, and enhancement27,28 of medical and dental images. Several convolutional neural networks (CNN) such as 3D U-Net, a type of deep learning method, were used for MC segmentation in CBCT images exhibiting a high accuracy of segmentation10,14. However, these CNNs failed to segment the MC with high consistent accuracy throughout its entire range because of the occasional unclear and ambiguous cortical bone layer caused by the overall lower contrast of CBCT images10,14. CNNs for the segmentation of the entire MC exhibited lower accuracy around the mandibular and mental foramens compared to other parts of the canal10,14 since discrimination of the canal from its surroundings became increasingly less clear towards the mental foramen region, and visibility of the MC clearly decreased on cross-sectional images of more distal regions of the MC15. Precise MC segmentation with high consistent accuracy throughout the entire MC is essential for avoiding nerve injury in oral and maxillofacial surgeries such as mandibular osteotomy and implant surgery29.
We hypothesized that a deep learning model would yield more robust 3D segmentation of the entire MC volume in CBCT images by learning the spatio-temporal features and structural continuity of the MC volume. In this study, we proposed a continuity-aware contextual network (Canal-Net) for the automatic and robust 3D segmentation of the MC with consistently high accuracy throughout the entire MC volume in CBCT images and compared our network with other networks in terms of volumetric accuracy over the entire canal. Our main contributions were as follows: (1) We proposed a continuity-aware contextual network (Canal-Net) that was robust to ambiguous or unclear cortical bone regions of the MC and lower contrast of CBCT images in 3D segmentations of the entire MC. (2) We applied bidirectional convolutional LSTM (ConvLSTM) in order to learn 3D anatomical contextual information of the MC by incorporating spatio-temporal features. (3) We used a multi-task learning framework with multi-planar projection losses (MPL) in three anatomical planes in order to evaluate the global structural continuity of the MC.
Materials and methods
Data acquisition and preparation
We included 50 patients (27 females and 23 males; mean age 25.56 ± 6.73 years) who underwent dental implant surgeries or third molar extractions at the Seoul National University Dental Hospital (2019–2020). The patients had different mandibular canal shapes with various dental conditions including the metallic crowns and implants. The patient data were obtained at 80 kVp and 8 mA using CBCT (CS9300; Carestream Health, New York, USA). The CBCT images had dimensions of 841 × 841 × 289 pixels, voxel sizes of 0.2 × 0.2 × 0.2 mm3, and 16-bit depth. This study was performed with approval from the institutional review board of the Seoul National University Dental Hospital (ERI18001). The ethics committee approved the waiver for the informed consent because this was a retrospective study. The study was performed in accordance with the Declaration of Helsinki.
The mandibular canals, including the surrounding cortical bone, were manually annotated by an oral and maxillofacial radiologist using software (3D Slicer for Windows 10, Version 4.10.2; MIT, Massachusetts, USA)30. To reduce the memory requirement, we used cropped images consisting of 200 slices of 128 × 128 pixels that were centered at the left and right mandibular regions. Because the mandibular canals differed in length among patients, zero-padding was performed to keep the input volume at the same length for all patients. For deep learning, we prepared 60 volumes from 30 patients for the training dataset, 20 from ten patients for the validation dataset, and 20 from ten patients for the test dataset, where the right mandible images were horizontally flipped to match the left. We performed five-fold cross-validation, where each training cycle consisted of 60, 20, and 20 volumes for the training, validation, and test datasets, respectively.
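The zero-padding and horizontal flipping described above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the axis order (slices, height, width), padding at the end of the slice axis, the choice of the width axis as the horizontal direction, and the function names are all assumptions.

```python
import numpy as np

TARGET_SLICES = 200  # slices per cropped volume, as stated in the text
CROP_SIZE = 128      # in-plane crop size in pixels, as stated in the text


def pad_to_length(volume: np.ndarray, target: int = TARGET_SLICES) -> np.ndarray:
    """Zero-pad a (slices, height, width) volume at the end of the slice axis."""
    n_slices = volume.shape[0]
    if n_slices > target:
        raise ValueError(f"volume has {n_slices} slices, more than {target}")
    pad = [(0, target - n_slices), (0, 0), (0, 0)]
    return np.pad(volume, pad, mode="constant", constant_values=0)


def mirror_right_side(volume: np.ndarray) -> np.ndarray:
    """Flip a right-side volume along the width axis so it resembles a left one."""
    return volume[:, :, ::-1].copy()


# Hypothetical right-side crop in which the canal spans only 150 slices.
rng = np.random.default_rng(0)
right_crop = rng.random((150, CROP_SIZE, CROP_SIZE)).astype(np.float32)

prepared = pad_to_length(mirror_right_side(right_crop))
print(prepared.shape)  # (200, 128, 128)
```

Flipping before padding keeps the padded slices at the same end of the volume for both sides, so left and right canals are presented to the network in a common orientation.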
We estimated the minimum sample size required to detect significant differences in accuracy between the Canal-Net and the other networks when both were assessed on the same subjects (CBCT images). The calculation was designed to detect a mean accuracy difference of 0.05 with a standard deviation of 0.10 between the Canal-Net and the other networks. Based on an effect size of 0.5, a significance level of 0.05, and a statistical power of 0.80, we obtained a sample size of N = 128 (G*Power for Windows 10, Version 3.1.9.7; Universität Düsseldorf, Germany). Finally, we split the CBCT dataset of 2D images into 10,185, 2546, and 3183 images for the training, validation, and test datasets, respectively.
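The arithmetic behind these figures can be checked with a short sketch. The effect size follows directly from the stated mean difference and standard deviation. The sample-size step below uses the normal approximation for two independent groups of equal size, which is an assumption: the text does not state which test family was selected in G*Power.

```python
import math
from statistics import NormalDist

mean_difference = 0.05  # target difference in accuracy (from the text)
std_dev = 0.10          # standard deviation (from the text)
alpha = 0.05            # two-sided significance level (from the text)
power = 0.80            # statistical power (from the text)

# Standardized effect size (Cohen's d).
effect_size = mean_difference / std_dev

# Standard normal quantiles for the two-sided test and the requested power.
z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
z_beta = NormalDist().inv_cdf(power)

# Normal approximation for two independent, equally sized groups (assumption).
n_per_group = math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(effect_size, n_per_group, 2 * n_per_group)
```

Under these assumptions the approximation gives about 63 samples per group. An exact calculation based on the t distribution rounds this up to 64 per group, which is consistent with the reported total of N = 128.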
Continuity-aware contextual network (Canal-Net)
We designed a continuity-aware contextual network (Canal-Net) which had 3D encoder-decoder architecture under a multi-task learning framework consisting of time-distributed convolution blocks, multi-scale inputs31, skip connections, and bidirectional convolutional LSTM (ConvLSTM) with side-output layers31,32 (Fig. 1). The bidirectional ConvLSTM was used to capture anatomical context information in concatenated feature maps extracted from the corresponding encoding path and the previous decoding up-sampling layer. A multi-task learning approach was adopted to simultaneously output the entire MC volume and its 2D multi-planar projections in three anatomical planes, which helped the network learn the overall MC volume and structural continuity (multi-planar projection outputs and the output volume in Fig. 1). The network under multi-task learning was optimized in an end-to-end manner, where the MC segmentation output was generated directly from the input volumes of the CBCT images.
At the encoder, the time-distributed convolution blocks processed sequential information from 3D volumetric inputs as series of features for 2D slices33 (white blocks in Fig. 1). It was a typical convolution passed to a time-distributed wrapper that could be applied to every temporal frame of the input independently33. Convolutional blocks were comprised of two repeated modules of two 3 × 3 × 3 convolutions, batch normalization, ReLU, and 2 × 2 × 2 max-pooling at the encoder path. The number of feature maps gradually decreased from 128 to 64, 32, and 16. To mitigate spatio-temporal information loss caused by max-pooling operations, the multi-scale inputs down-sampled from the original input volume by 2 × 2 × 2 average pooling were concatenated at each level of the encoder (multi-scale inputs in Fig. 1).
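The multi-scale inputs can be illustrated with a small NumPy sketch of non-overlapping 2 × 2 × 2 average pooling. This is an illustration only: the function names are hypothetical, and the use of three pooled copies (one per down-sampling step between four encoder levels) is an assumption based on the four feature-map sizes listed above.

```python
import numpy as np


def average_pool_3d(volume: np.ndarray, factor: int = 2) -> np.ndarray:
    """Down-sample a (D, H, W) volume by non-overlapping average pooling."""
    d, h, w = volume.shape
    if d % factor or h % factor or w % factor:
        raise ValueError("each dimension must be divisible by the pooling factor")
    # Split every axis into (blocks, factor) and average over the factor axes.
    blocks = volume.reshape(d // factor, factor, h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3, 5))


def multi_scale_inputs(volume: np.ndarray, levels: int = 3) -> list:
    """Return progressively pooled copies of the input, one per deeper level."""
    scales = []
    current = volume
    for _ in range(levels):
        current = average_pool_3d(current)
        scales.append(current)
    return scales


volume = np.ones((200, 128, 128), dtype=np.float32)
scales = multi_scale_inputs(volume)
print([s.shape for s in scales])  # [(100, 64, 64), (50, 32, 32), (25, 16, 16)]
```

Each pooled copy has the same spatial size as the feature maps at the matching encoder level, so it can be concatenated with them along the channel axis.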
At the decoder, the features from time-distributed convolutions at the encoder33 were concatenated with the corresponding up-sampling layer and fed to bidirectional ConvLSTM blocks (Skip connection and yellow blocks in Fig. 1). Long short-term memory (LSTM), one of the recurrent neural networks (RNN)34, was an efficient network for handling spatio-temporal data and was widely used in contextual processing such as natural language processing35 and video segmentation36. The internal matrix multiplication of the original LSTM was replaced by the convolution operation to maintain the input dimension in ConvLSTM37. The ConvLSTM blocks were composed of two repeated modules of two 3 × 3 × 3 bidirectional ConvLSTMs, batch normalization, ReLU, and 2 × 2 × 2 up-sampling at the decoder path. The number of feature maps gradually increased from 16 to 32, 64, and 128. The ConvLSTM captured 3D local anatomical contextual information more effectively by learning the spatio-temporal features of the 3D volumetric data37.
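For reference, the replacement of matrix multiplication by convolution can be written out explicitly. The equations below give the commonly used ConvLSTM formulation without peephole connections, which is not necessarily the exact variant used in the Canal-Net. Here $X_t$ denotes the feature map of slice $t$, $H_t$ and $C_t$ the hidden and cell states, $*$ the convolution operation, and $\circ$ the element-wise product. A bidirectional layer runs the same recurrence over the slice sequence in both directions and combines the two hidden states; combination by concatenation is an assumption here.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
\tilde{C}_t &= \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tilde{C}_t \\
H_t &= o_t \circ \tanh\left(C_t\right) \\
H_t^{\mathrm{bi}} &= \left[\overrightarrow{H}_t \,;\, \overleftarrow{H}_t\right]
\end{aligned}
```

Because every gate is computed by convolution, the states $H_t$ and $C_t$ keep the spatial layout of the input feature maps, which is what allows the layer to carry anatomical context from slice to slice.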
At the output layer, the averaged side-outputs generated from a local output map from every level of the decoder were merged and fed to the bidirectional ConvLSTM, which mitigated the vanishing gradient problem by encouraging the back-propagation of gradient flow (Side output and average layers in Fig. 1). The 3D volume loss and multi-planar projection losses (MPL) from the 2D projections simultaneously encouraged the network to learn the global structural continuity information of the canal under the multi-task learning framework. The MPL were calculated from the 2D projection maps of the output in three anatomical planes. The Dice similarity coefficient score (DSC) was used for the two loss functions38. The loss function ($L_{total}$) of the Canal-Net consisted of the 3D volume loss ($L_{3D}$) for the entire canal volume and the MPL as the sum of the 2D projection losses in the axial ($L_{axial}$), coronal ($L_{coronal}$), and sagittal ($L_{sagittal}$) planes, that is, $L_{total} = \alpha L_{3D} + \beta (L_{axial} + L_{coronal} + L_{sagittal})$, where α and β were constant weights for the 3D volume loss and the summation of the 2D projection map losses, respectively (equation of $L_{total}$ in Fig. 1). The weights of α and β were optimized for the best performance through an ablation study. The weights of 0.7 and 0.3 for the 3D volume loss and MPL, respectively, exhibited the best performance compared to other weight options (Table 1).
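A weighted combination of a 3D Dice loss and three projection Dice losses of this kind can be sketched in NumPy as follows. Several details are assumptions made for illustration and are not taken from the text: the projection operator (a maximum projection along each axis), the soft Dice form with a small smoothing constant, and all function names.

```python
import numpy as np

EPS = 1e-7  # smoothing constant to avoid division by zero (assumption)


def dice_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Soft Dice loss, defined as 1 - DSC."""
    intersection = float(np.sum(pred * target))
    denominator = float(np.sum(pred) + np.sum(target))
    return 1.0 - (2.0 * intersection + EPS) / (denominator + EPS)


def projections(volume: np.ndarray) -> list:
    """Maximum projections along the three axes (assumed projection operator)."""
    return [volume.max(axis=axis) for axis in range(3)]


def combined_loss(pred: np.ndarray, target: np.ndarray,
                  alpha: float = 0.7, beta: float = 0.3) -> float:
    """alpha * (3D volume loss) + beta * (sum of the three 2D projection losses)."""
    volume_loss = dice_loss(pred, target)
    projection_loss = sum(
        dice_loss(p, t) for p, t in zip(projections(pred), projections(target))
    )
    return alpha * volume_loss + beta * projection_loss


target = np.zeros((8, 8, 8), dtype=np.float32)
target[2:6, 3:5, 3:5] = 1.0          # a small tube-like structure
disjoint = np.zeros_like(target)
disjoint[0, 0, 0] = 1.0              # a prediction with no overlap at all

loss_perfect = combined_loss(target, target)
loss_disjoint = combined_loss(disjoint, target)
print(loss_perfect, loss_disjoint)
```

With α = 0.7 and β = 0.3 the loss ranges from 0 for a perfect prediction to about 0.7 + 0.3 × 3 = 1.6 for a prediction with no overlap in the volume or in any projection. Since the three projection terms are summed, the result does not depend on which array axis is mapped to which anatomical plane.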
Table 1.
Loss weight | 3D volume | 2D axial | 2D coronal | 2D sagittal |
---|---|---|---|---|
α = 0.1, β = 0.9 | 0.84 ± 0.14 | 0.92 ± 0.07 | 0.91 ± 0.17 | 0.91 ± 0.14 |
α = 0.2, β = 0.8 | 0.85 ± 0.12 | 0.91 ± 0.10 | 0.90 ± 0.12 | 0.89 ± 0.13 |
α = 0.3, β = 0.7 | 0.84 ± 0.16 | 0.90 ± 0.14 | 0.87 ± 0.17 | 0.88 ± 0.17 |
α = 0.4, β = 0.6 | 0.83 ± 0.16 | 0.90 ± 0.15 | 0.83 ± 0.19 | 0.88 ± 0.16 |
α = 0.5, β = 0.5 | 0.86 ± 0.13 | 0.91 ± 0.10 | 0.88 ± 0.16 | 0.91 ± 0.11 |
α = 0.6, β = 0.4 | 0.84 ± 0.14 | 0.89 ± 0.13 | 0.86 ± 0.17 | 0.89 ± 0.13 |
α = 0.7, β = 0.3 | 0.87 ± 0.05 | 0.93 ± 0.07 | 0.91 ± 0.14 | 0.94 ± 0.08 |
α = 0.8, β = 0.2 | 0.87 ± 0.09 | 0.92 ± 0.09 | 0.91 ± 0.14 | 0.92 ± 0.09 |
α = 0.9, β = 0.1 | 0.86 ± 0.13 | 0.91 ± 0.10 | 0.87 ± 0.16 | 0.92 ± 0.10 |
The proposed networks were trained for 300 epochs using an Adam optimizer with a batch size of 1; the initial learning rate of 0.00025 was reduced on plateau by a factor of 0.5 every 25 epochs. They were implemented in Python 3 using Keras with a TensorFlow backend on a single NVIDIA Titan RTX GPU (24 GB).
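One possible reading of this learning-rate schedule is sketched below in plain Python: the rate is halved whenever the monitored loss has not improved for 25 consecutive epochs. The monitored quantity (validation loss), the interpretation of "every 25 epochs" as a patience value, and the class name are assumptions. In Keras, comparable behaviour is available through the `ReduceLROnPlateau` callback with its `factor` and `patience` arguments.

```python
class ReduceOnPlateau:
    """Halve the learning rate when the monitored loss stops improving."""

    def __init__(self, lr: float = 0.00025, factor: float = 0.5, patience: int = 25):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement

    def step(self, val_loss: float) -> float:
        """Update the schedule with one epoch's loss and return the current rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr


# Hypothetical loss history: one improving epoch followed by 30 flat epochs.
scheduler = ReduceOnPlateau()
for loss in [1.0] * 31:
    current_lr = scheduler.step(loss)
print(current_lr)  # 0.000125
```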
Performance evaluation of Canal-Net for MC segmentation
We compared the performance of the MC segmentation by Canal-Net with those by other networks: 2D U-Net39, SegNet40, 3D U-Net41, 3D U-Net with MPL (MPL 3D U-Net), and 3D U-Net with ConvLSTM (ConvLSTM 3D U-Net). To evaluate the performances quantitatively, we compared the 2D segmentation performance metrics of the Dice similarity coefficient score ($DSC = \frac{2TP}{2TP + FP + FN}$), Jaccard index ($JI = \frac{TP}{TP + FP + FN}$), precision ($PR = \frac{TP}{TP + FP}$), and recall ($RC = \frac{TP}{TP + FN}$) among networks, where TP, FP, and FN denoted true positives, false positives, and false negatives, and also the 3D volumetric performance metrics of volume of error (VOE) and relative volume difference (RVD), which were computed from the numbers of voxels for the ground truth and for the predicted volume. We also evaluated the mean curve distance (MCD), which was computed from the coordinates of the ground truth and predicted voxels14 after an operation that extracted the center curve line through skeletonization for each set of voxels14. The higher values of DSC, JI, PR, and RC, and the lower values of VOE, RVD, and MCD indicated better segmentation performance. We used paired two-tailed t-tests to compare performances between Canal-Net and others (SPSS Statistics for Windows 10, Version 26.0; IBM, Armonk, New York, USA). The statistical significance level was set at 0.05. We also performed Bland–Altman analysis to assess the bias and limits of agreement between the numbers of pixels in the ground truth and in the prediction results of each segmentation model.
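The four overlap metrics that depend only on TP, FP, and FN can be computed from a pair of binary masks as in the sketch below. The definitions used are the standard ones for DSC, JI, PR, and RC; the function names and the toy masks are illustrative, and the 3D metrics (VOE, RVD, MCD) are deliberately left out.

```python
import numpy as np


def confusion_counts(pred: np.ndarray, truth: np.ndarray) -> tuple:
    """Count true positives, false positives, and false negatives."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    return tp, fp, fn


def overlap_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Standard overlap metrics; assumes at least one positive pixel in each mask."""
    tp, fp, fn = confusion_counts(pred, truth)
    return {
        "DSC": 2 * tp / (2 * tp + fp + fn),
        "JI": tp / (tp + fp + fn),
        "PR": tp / (tp + fp),
        "RC": tp / (tp + fn),
    }


# Toy example: a 2 x 2 ground truth block and a prediction shifted by one column.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 0:2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:3] = True

metrics = overlap_metrics(pred, truth)
print(metrics)
```

In this toy case TP = FP = FN = 2, so DSC, PR, and RC are 0.5 and JI is 1/3, which also illustrates that JI is always less than or equal to DSC for the same pair of masks.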
Results
The performances of Canal-Net, ConvLSTM 3D U-Net, MPL 3D U-Net, 3D U-Net, SegNet, and 2D U-Net were evaluated for a total of 20 mandibular canals not used for training. Among them, ConvLSTM 3D U-Net, MPL 3D U-Net, and 3D U-Net were evaluated to demonstrate the effectiveness of the corresponding components in Canal-Net, while the other networks were used for performance comparisons between 2D and 3D CNN-based approaches. In addition, the Canal-Net was evaluated for the impacts of the weights of α and β on 3D volume loss and MPL, respectively. The Canal-Net with loss weights of α = 0.7 and β = 0.3 achieved the best segmentation performance of 0.87, 0.93, 0.91, and 0.94 DSC for 3D volume, axial, coronal, and sagittal planes, respectively (Table 1).
Table 2 shows the quantitative results of the segmentation performance by the networks. The performances of Canal-Net, ConvLSTM 3D U-Net, MPL 3D U-Net, 3D U-Net, SegNet, and 2D U-Net were compared using a total of 20 mandibular canals. The Canal-Net achieved the highest values of 0.87 DSC (p < 0.05), 0.80 JI (p < 0.05), 0.89 PR (p = 0.05), and 0.88 RC (p = 0.05) in 2D performance metrics, and also the lowest values of 0.14 RVD (p < 0.05), 0.10 VOE (p < 0.05), and 0.62 MCD (p < 0.05) in 3D performance metrics (Table 2). The Canal-Net outperformed all the other networks in DSC, JI, PR, RC, RVD, and VOE, and significantly so in MCD (p < 0.05) (Table 2). The performance of the networks is also plotted in boxplots (Fig. 2). The Canal-Net achieved higher performance than the other networks, with a smaller dispersion of data, shorter whiskers, and few outliers (Fig. 2).
Table 2.
Network | DSC | JI | PR | RC | RVD | VOE | MCD (mm) |
---|---|---|---|---|---|---|---|
Canal-Net | 0.87 ± 0.05 | 0.80 ± 0.06 | 0.89 ± 0.06 | 0.88 ± 0.06 | 0.14 ± 0.04 | 0.10 ± 0.04 | 0.62 ± 0.10 |
ConvLSTM 3D U-Net | 0.85 ± 0.08* | 0.77 ± 0.08* | 0.87 ± 0.08* | 0.86 ± 0.09* | 0.17 ± 0.05* | 0.13 ± 0.05* | 0.66 ± 0.12* |
MPL 3D U-Net | 0.84 ± 0.06† | 0.75 ± 0.07† | 0.88 ± 0.07† | 0.82 ± 0.08† | 0.19 ± 0.04† | 0.14 ± 0.06† | 0.69 ± 0.15† |
3D U-Net | 0.83 ± 0.07‡ | 0.74 ± 0.07‡ | 0.85 ± 0.08‡ | 0.84 ± 0.09‡ | 0.19 ± 0.05‡ | 0.15 ± 0.07‡ | 0.69 ± 0.13‡ |
SegNet | 0.84 ± 0.06+ | 0.77 ± 0.06+ | 0.85 ± 0.06+ | 0.85 ± 0.07+ | 0.18 ± 0.04+ | 0.14 ± 0.05+ | 0.78 ± 0.19+ |
2D U-Net | 0.84 ± 0.07Φ | 0.77 ± 0.07Φ | 0.85 ± 0.07Φ | 0.84 ± 0.08Φ | 0.18 ± 0.04Φ | 0.14 ± 0.05Φ | 0.87 ± 0.22Φ |
*Significant difference between Canal-Net and ConvLSTM 3D U-Net (p < 0.05).
†Between Canal-Net and MPL 3D U-Net (p < 0.05).
‡Between Canal-Net and 3D U-Net (p < 0.05).
+Between Canal-Net and SegNet (p < 0.05).
ΦBetween Canal-Net and 2D U-Net (p < 0.05).
In Fig. 3, the Canal-Net exhibited more accurate predictions with more true positives (yellow) and fewer false positives (red) and false negatives (green) compared to the other networks for MCs with unclear and ambiguous cortical bone layers and metallic objects in CBCT images of lower contrast (Fig. 3a–e). In the 3D segmentation results, the Canal-Net also demonstrated better prediction results with fewer false positives and false negatives compared to the other networks in the mental foramen area of the various MC shapes (Fig. 4a–e). Furthermore, only a few outlier cases were observed in the results of the Canal-Net, due to other causes such as the presence of a third molar beside the MC (Figs. 3f, 4f). The Canal-Net predicted the entire MC volume more accurately, and demonstrated improved structural continuity and boundary details of the MC from the mental foramen to the mandibular foramen compared to the other networks (Fig. 4a–e).
The DSC and MCD for the whole test dataset were plotted from the mental foramen to the mandibular foramen, and the 3D networks generally exhibited less variations of the performances compared to the 2D networks (Figs. 5 and 6). The Canal-Net demonstrated the most consistent performances with the smallest fluctuations of true segmentation compared to the other networks throughout the entire MC volume (Figs. 5 and 6). As a result, the Canal-Net represented the best 3D segmentation accuracies of RVD, VOE, and MCD throughout the entire MC volume among the networks (Table 2). The Bland–Altman plot between the ground truth and prediction results from the Canal-Net showed higher linear relationships and better agreement limits than those from the other networks (Fig. 7). Therefore, the Canal-Net represented more accurate and robust MC segmentation performance of the entire MC compared to the other networks.
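The bias and agreement limits in a Bland–Altman analysis are derived from the paired differences between predicted and ground truth pixel counts. A minimal sketch is given below, using the conventional limits of the mean difference ± 1.96 standard deviations. The pixel counts are made-up illustrative values, not data from this study, and the function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, predicted):
    """Return the bias and the lower and upper 95% limits of agreement."""
    differences = [p - r for r, p in zip(reference, predicted)]
    bias = mean(differences)
    spread = stdev(differences)  # sample standard deviation of the differences
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Illustrative pixel counts only (not taken from the study).
ground_truth_pixels = [100, 120, 90, 110]
predicted_pixels = [102, 118, 93, 111]

bias, lower, upper = bland_altman(ground_truth_pixels, predicted_pixels)
print(bias, lower, upper)
```

A bias close to zero together with a narrow interval between the lower and upper limits indicates that a model neither systematically over-segments nor under-segments the canal.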
Discussion
In this study, we proposed a continuity-aware contextual network (Canal-Net) which learned 3D local anatomical contextual information and the global continuity of the MC complementally in order to segment the MC with high consistent accuracy throughout the entire MC volume in cone-beam CT (CBCT) images. We employed time-distributed convolution layers for handling time-distributed sequential features with multi-scale inputs at the encoder path33, and bidirectional ConvLSTM layers for extracting spatio-temporal features at the decoder path37. The Canal-Net was able to learn the local anatomical variations of the MC by incorporating the spatio-temporal features effectively, and the global structural continuity information of the MC under the multi-task learning framework, complementally. The Canal-Net used optimized weights for 3D volume loss and multi-planar projection losses in multi-task learning. Therefore, the Canal-Net improved the performance of automatic segmentation of the MC by combining anatomical context information and global structural continuity information, resulting in higher consistent accuracy throughout the entire MC volume in CBCT images.
We compared the Canal-Net with other popular segmentation networks such as 2D U-Net, SegNet, and 3D U-Net, and also with our MPL 3D U-Net and ConvLSTM 3D U-Net for MC segmentation. In MC segmentation in CBCT images, 2D U-Net and SegNet generally exhibited lower accuracies than the 3D networks. False negatives and positives were observed at a higher rate around the mental foramen area with ambiguous or unclear cortical bone layers. Since the 2D networks were not able to learn the 3D contextual features of the MC volume in CBCT images, they exhibited coarser 3D segmentation volumes with more fluctuations of 3D performance accuracy from the mental to the mandibular foramen regions. In terms of learning 3D spatial contextual information between image slices of the 3D anatomical structures, 3D U-Net was generally expected to generate more accurate segmentation results compared to 2D networks41. In the present study, the 3D U-Net predicted a more accurate segmentation of the MC with fewer false positives and negatives compared to the 2D U-Net and SegNet. However, the 3D U-Net still had limitations in accurately segmenting the MC regions with unclear cortical bone layers, because it learned only 3D spatial information between image slices, and it exhibited inaccurate segmentation results with disconnections around the mental foramen area.
Both MPL 3D U-Net and ConvLSTM 3D U-Net demonstrated better segmentation results than 3D U-Net in different aspects. The MPL 3D U-Net showed an improved travel course of the MC compared to 3D U-Net because its spatial information was complemented with the global structural continuity information by learning through multi-planar projections. Although the structural continuity of the MC volume was improved by multi-task learning, the MPL 3D U-Net exhibited difficulties in producing detailed segmentation boundaries around the mental foramen area. On the other hand, the ConvLSTM learned anatomical context information through spatio-temporal features, and the MC volume showed smooth boundaries with more consistent accuracies even in unclear cortical bone layer regions in the CBCT images. Therefore, the Canal-Net demonstrated the most accurate segmentation of the entire MC volume compared to the other networks by simultaneously learning global structural continuity through MPL, and anatomical context information through ConvLSTM. Compared with previous studies using 3D U-Net10,14, our Canal-Net achieved a DSC of 0.87 and a mean intersection over union (IoU) of 0.80, while two previous studies reported a DSC of 0.5810,14 and a mean IoU of 0.5810,14. Compared with the previous studies10,14, the Canal-Net showed substantially enhanced performance of the MC segmentation in CBCT images.
In the Canal-Net, the MPL provided global structural continuity from three anatomical projection maps with ConvLSTM anatomical context information by spatio-temporal features, complementally. In the MC areas of low visibility with ambiguous or unclear cortical bone layers in CBCT images, the Canal-Net exhibited the best outcomes with continuous and consistent MC volumes from the mental to mandibular foramens. The Canal-Net especially surpassed other networks by showing continuous MC volumes around the mental foramen area where the visibility of the MC tended to diminish15, and in areas affected by metallic objects such as implant fixtures or dental crowns in CBCT images. As a result, the Canal-Net demonstrated the most robust MC segmentation with high consistent DSC throughout the entire MC volume in CBCT images.
The primary reason for improved segmentation performance by Canal-Net was that its network architecture was constructed to complementally learn the 3D anatomical context information of the MC by the spatio-temporal features from the bidirectional ConvLSTM layers and the global structural continuity information by MPL. In the Canal-Net, the complementary context information was successfully learned in the proposed framework, leading to maintaining continuous and consistent MC volumes from the mental to the mandibular foramen areas. The proposed learning process has several advantages. First, it could increase the discriminative capability of intermediate feature representations with multiple regularizations on disentangling subtly correlated tasks48, potentially improving the robustness of the segmentation performance. Second, in the application of MC segmentation, the multi-task learning framework could also provide complementary context information that would serve well to segment the MC maintaining overall continuous and consistent volumes. This could improve the performance accuracy of MC segmentations substantially, especially in MC regions with ambiguous or unclear cortical bone layers in lower contrast CBCT images.
The accurate identification of the whole MC structure in the mandible is an essential prerequisite for the preoperative planning of third molar extractions and implant surgeries to avoid any surgical complications7. However, the exact recognition of the entire canal structure is considered to be a challenging and delicate task for several reasons15. CBCT, the most commonly used 3D dental imaging tool, has lower contrast than CT, which negatively affects the ability to distinguish MCs10,42. As a result, the low visibility of MCs, such as in ambiguous or unclear cortical bone regions, affects the structural continuity of MC segmentation in CBCT images10,14. Furthermore, the visibility of the MC itself is low due to variable cortications and bone densities of the canal wall, the diverse travel courses of the canal, and the spread of vessels and nerve branches15,43–47. The Canal-Net could be used in automatic and robust 3D segmentation of the MC structure for the preoperative planning of third molar extractions and implant surgeries to avoid any surgical complications when using CBCT images. The automatic segmentation of the MC volume by the Canal-Net could provide clinicians with accurate identification of the MC structure in the mandible with high consistent accuracy throughout the entire MC volume ranging from the mental foramen to the mandibular foramen while reducing time and effort.
However, our study had several limitations. First, as there was the problem of reducing the memory requirements for dealing with large amounts of data when using deep 3D networks running on the GPU, it was necessary to optimize the way the memory was used in order to maximize GPU utilization. Therefore, we used the cropped images with smaller dimensions than the original, and preprocessing of the images required additional time and labor. Second, our study had a potential limitation of generalization ability due to using internal data from a single organization. Overfitting of training a deep learning model, which resulted in the model learning statistical regularity specific to the training dataset, could negatively impact the model’s ability to generalize to a new dataset49. Although the proposed network did not show the presence of overfitting for the internal dataset in the five-fold cross-validation, it needs to be trained and evaluated using large datasets from multiple organizations or devices for generalization. Third, the results presented in this study were based on datasets from 50 patients. The proposed method needs to be evaluated for datasets from more patients with various dental restorations and implants. In future studies, we will improve the generalization ability and clinical efficacy of the Canal-Net by using large CBCT datasets acquired under various imaging conditions from multiple organizations or devices.
Conclusions
In this study, we proposed a continuity-aware contextual network (Canal-Net) that was robust to ambiguous or unclear cortical bone regions of the MC and lower contrast of CBCT images in 3D segmentations of the entire MC. The Canal-Net was designed based on a 3D U-Net with the ConvLSTM under the multi-task learning framework using MPL in order to complementarily learn anatomical contexts and global structural continuity information. As a result, the Canal-Net achieved substantially enhanced 2D and 3D performance compared to other networks such as 2D U-Net, SegNet, 3D U-Net, MPL 3D U-Net, and ConvLSTM 3D U-Net. Furthermore, Canal-Net demonstrated automatic and robust 3D segmentation of the entire MC volume by improving structural continuity and boundary details of the MC in CBCT images. The Canal-Net could contribute to accurate and automatic identification of the MC structure for the preoperative planning of third molar extractions and implant surgeries to avoid any surgical complications.
Acknowledgements
This work was supported by a Seoul National University Research Grant in 2021. This work was also supported by the Korea Medical Device Development Fund Grant funded by the Korean Government (The Ministry of Science and ICT, The Ministry of Trade, Industry, and Energy, The Ministry of Health & Welfare, and The Ministry of Food and Drug Safety) (Project Number: 1711174552, KMDF_PR_20200901_0147 and Project Number: 1711174543, RS-2020-KD000011).
Author contributions
B.-S.J.: Contributed to the conception and design, data acquisition, analysis and interpretation, and drafted and critically revised the manuscript. S.Y.: Contributed to the conception and design, data analysis and interpretation, and drafted and critically revised the manuscript. S.-J.L.: Contributed to the data analysis and interpretation, and drafted the manuscript. T.-I.K.: Contributed to conception and design, data interpretation, and drafted the manuscript. J.-M.K.: Contributed to conception and design, data interpretation, and drafted the manuscript. J.-E.K.: Contributed to the conception and design, data interpretation, and drafted the manuscript. K.-H.H.: Contributed to the conception and design, data interpretation, and drafted the manuscript. S.-S.L.: Contributed to the conception and design, data interpretation, and drafted the manuscript. M.-S.H.: Contributed to the conception and design, data interpretation, and drafted the manuscript. W.-J.Y.: Contributed to the conception and design, data acquisition, analysis and interpretation, and drafted and critically revised the manuscript. All authors gave their final approval and agreed to be accountable for all aspects of the work.
Data availability
The datasets generated and/or analyzed during the current study are not publicly available due to the restriction by the Institutional Review Board (IRB) of Seoul National University Dental Hospital in order to protect patients’ privacy but are available from the corresponding author on reasonable request. Please contact the corresponding author for any commercial implementation of our research.
Competing interests
The authors declare no competing interests.
Footnotes
The original online version of this Article was revised: The Acknowledgment section in the original version of this Article was incorrect. Full information regarding the corrections made can be found in the correction for this Article.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Bo-Soung Jeoun and Su Yang.
Change history
12/7/2022
A Correction to this paper has been published: 10.1038/s41598-022-25677-2