Author manuscript; available in PMC: 2024 Mar 19.
Published in final edited form as: Proc IEEE Int Symp Biomed Imaging. 2023 Sep 1. doi: 10.1109/isbi53787.2023.10230397

DENTALMODELSEG: FULLY AUTOMATED SEGMENTATION OF UPPER AND LOWER 3D INTRA-ORAL SURFACES

Mathieu Leclercq a, Antonio Ruellas b, Marcela Gurgel b, Marilia Yatabe b, Jonas Bianchi c, Lucia Cevidanes b, Martin Styner a, Beatriz Paniagua d, Juan Carlos Prieto a
PMCID: PMC10949221  NIHMSID: NIHMS1964468  PMID: 38505097

Abstract

In this paper, we present a deep learning-based method for surface segmentation. The technique acquires 2D views of the surface and extracts features such as the normal vectors. The rendered images are analyzed with a 2D convolutional neural network, such as a U-Net. We test our method in a dental application for the segmentation of dental crowns. The neural network is trained for multi-class segmentation, using image labels as ground truth. A 5-fold cross-validation was performed, and the segmentation task achieved an average Dice of 0.97, sensitivity of 0.98, and precision of 0.98. Our method and algorithms are available as a 3D Slicer extension.

Keywords: 3D surface model, segmentation, deep learning

1. INTRODUCTION

Developments in dentistry have led to a broader adoption of 3D technologies such as intra-oral surface (IOS) scanners, which are used to design ceramic crowns, veneers, inlays, and occlusal guards, as well as to assist with implants. IOS are increasingly used for automated diagnosis such as caries detection [1], analysis of risk factors for tooth movement [2], and treatment planning [3]. These 3D surface models require shape analysis techniques for analyzing and understanding their geometry and for performing segmentation, classification, and/or retrieval tasks, among others. In this paper, we present a novel method for 3D surface segmentation based on a multi-view approach. Fast and accurate segmentation of the IOS remains a challenge due to the varied geometrical shapes of teeth, complex tooth arrangements, differing dental model qualities, and varying degrees of crowding [4]. Our target application is multi-class segmentation following the Universal Numbering System proposed by the American Dental Association (ADA), the dental notation system used in the United States.

The multi-view approach consists of generating 2D images of the 3D surface from different viewpoints. The generated images serve as a training set for a neural network. We use PyTorch3D to generate images on the fly during training, along with a one-to-one mapping that relates faces in the 3D model to pixels in the generated images. This mapping is useful at inference time, when the labels predicted in the images must be put back onto the 3D model. The remainder of the manuscript is organized as follows: the materials used in this study, related work, details of our implementation, results, and conclusions.

2. MATERIALS

The dataset consists of 78 IOS (40 upper and 38 lower dental arches). Each digital dental model was acquired with the TRIOS 3D intra-oral scanner, which uses "ultrafast optical sectioning" and confocal microscopy to generate 3D images from multiple 2D images with an accuracy of 6.9 ± 0.9 μm. All scans were obtained according to the manufacturer's instructions by one trained operator, and the intra-oral surfaces were manually segmented by 3 experts. Training was done on an NVIDIA TITAN V GPU with 12 GB of memory.

2.1. 3D shape analysis

Learning-based methods for shape analysis learn descriptors directly from 3D models. There are mainly three types of learning-based methods: multi-view, volumetric, and multi-layer-perceptron (MLP) based.

Multi-view approaches adapt state-of-the-art 2D CNNs to work on 3D shapes. The main impediment is the irregular structure of 3D models, which are usually represented by point clouds or triangular meshes, whereas deep learning algorithms expect the regular grid-like structure found in 2D/3D images [5, 6]. By rendering 3D objects from different viewpoints, features can be extracted with 2D CNNs [7, 8, 9]. Volumetric approaches, on the other hand, use 3D voxel grids to represent the shape and apply 3D convolutions to learn shape features [10, 11, 12]. Finally, other approaches consume point clouds directly and implement multi-layer-perceptrons, transformer architectures, or generalizations of typical CNNs [13, 14, 15, 16].

2.2. Related work

MeshSegNet [14] uses raw surface attributes as inputs and integrates a graph-constrained learning module, followed by a dense fusion strategy that combines local-to-global geometric features to learn higher-level features for mesh cell annotation. MeshSegNet predictions are further post-processed by a graph-cut refinement step for the final segmentation. Zanjani et al. [17] propose an end-to-end deep learning framework for the segmentation of teeth from point clouds representing IOS. TSegNet [18] is a fully automatic algorithm that segments teeth on 3D dental models guided by tooth centroid information. Mask-MCNet [19] localizes each individual tooth instance by predicting its 3D bounding box and segments the points that belong to each instance. FlyByCNN [5, 6] is a multi-view approach that uses U-Nets [20] to segment each individual image. The merging and annotating approach for teeth and root canals [6] uses a classification model to identify upper or lower jaws, then aligns the 3D objects to a template and labels each crown with its universal id.

3. METHOD

3.1. Rendering the 2D views

The PyTorch3D framework renders the 3D object from different viewpoints and extracts views that can be fed to a CNN in an end-to-end process. The rendering engine provides a map that relates pixels in the images to faces in the mesh, which allows rapid extraction of point data (normals, curvatures, labels, etc.) as well as setting information back onto the mesh after inference. To obtain different viewpoints, we apply random rotations to the camera so that it moves on the surface of a unit sphere. For each snapshot, we generate two images. The first contains the surface normals encoded in the RGB components. The second is the label map used as ground truth in the segmentation task. We set the resolution of the rendered images to 320 px and use ambient lighting so that the rendered images do not have any specular components.
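To illustrate the pixel-to-face mapping idea, here is a minimal NumPy sketch of how per-pixel predictions can be accumulated as votes on mesh faces. It assumes a `pix_to_face` array of the kind exposed by PyTorch3D's rasterizer (face index per pixel, -1 for background); the function name and toy data are ours, not from the paper.

```python
import numpy as np

def votes_from_view(pix_to_face, pixel_labels, n_faces, n_classes):
    """Accumulate per-face label votes from one rendered view.

    pix_to_face:  (H, W) int array mapping each pixel to a mesh face
                  index, or -1 for background pixels.
    pixel_labels: (H, W) int array of predicted class labels.
    Returns an (n_faces, n_classes) vote matrix.
    """
    votes = np.zeros((n_faces, n_classes), dtype=np.int64)
    mask = pix_to_face >= 0                  # ignore background pixels
    faces = pix_to_face[mask].ravel()
    labels = pixel_labels[mask].ravel()
    np.add.at(votes, (faces, labels), 1)     # unbuffered scatter-add
    return votes

# Toy example: a 2x2 view over a 3-face mesh with 2 classes.
p2f = np.array([[0, 1], [1, -1]])
lab = np.array([[1, 0], [0, 0]])
v = votes_from_view(p2f, lab, n_faces=3, n_classes=2)
```

Summing such vote matrices over many views and taking a per-face argmax is one way to realize the majority-voting aggregation described in Section 3.3.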

3.2. Training the network

We have 78 different IOS in total. To make the evaluation more reliable, we implemented a 5-fold cross-validation. For each fold, we used 55 scans for training, 7 for validation, and 15 or 16 for testing. To create the neural networks, we use MONAI, an open-source PyTorch-based framework for deep learning in healthcare imaging. MONAI provides a complete framework to easily create datasets and integrate neural networks such as a U-Net. We use the DiceCELoss, which computes the Dice loss as well as the cross-entropy loss and returns a weighted sum of the two; this loss is frequently used for segmentation tasks. The learning rate is set to 1e-4 with the Adam optimizer. The training learns to identify 34 different labels, one-hot encoded: 32 different crowns across the upper and lower jaws, in addition to the gum and the background. During training, one crown is randomly removed each time a sample is rendered. This step makes the model more robust, as it learns how to deal with missing teeth.
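The combined loss can be written compactly. The following NumPy sketch mirrors the spirit of MONAI's DiceCELoss (a weighted sum of soft Dice loss and cross-entropy); the exact reductions, weights, and smoothing constants here are illustrative assumptions, not the library's defaults.

```python
import numpy as np

def dice_ce_loss(probs, onehot, w_dice=1.0, w_ce=1.0, eps=1e-5):
    """Weighted sum of soft Dice loss and cross-entropy.

    probs:  (N, C) softmax probabilities per pixel.
    onehot: (N, C) one-hot ground-truth labels.
    """
    inter = (probs * onehot).sum(axis=0)
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    dice = 1.0 - ((2.0 * inter + eps) / (denom + eps)).mean()
    ce = -(onehot * np.log(probs + 1e-12)).sum(axis=1).mean()
    return w_dice * dice + w_ce * ce

# A perfect prediction drives both terms to (near) zero.
probs = np.eye(2)
loss = dice_ce_loss(probs, probs)
```

In practice one would use MONAI's implementation directly, which also handles logits, class weighting, and batch reduction.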

3.3. Prediction

The prediction is composed of three major steps: 1. render 2D views from the 3D object; 2. run inference on the 2D views; 3. map the information back onto the 3D mesh. The approach renders multiple views and aggregates the results. The reported metrics are calculated using 70 views with a resolution of 320 px. The camera positions are evenly distributed on the surface of the sphere following a Fibonacci lattice. In a test with 64 views, the prediction takes about 6 seconds to compute for all the views. After running inference on the 2D views, we use a weighted majority voting scheme to put the information back onto the 3D mesh.
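The Fibonacci lattice used to place the cameras can be sketched with the standard golden-angle construction; the function below is a self-contained illustration (the paper does not give its exact implementation).

```python
import math

def fibonacci_sphere(n):
    """Return n approximately evenly spaced points on the unit sphere,
    generated with the golden-angle (Fibonacci lattice) construction."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        y = 1.0 - 2.0 * (i + 0.5) / n       # latitude, from ~1 to ~-1
        r = math.sqrt(1.0 - y * y)          # radius of the latitude circle
        theta = golden_angle * i            # longitude
        pts.append((r * math.cos(theta), y, r * math.sin(theta)))
    return pts

# 70 camera positions, as used for the reported metrics.
cams = fibonacci_sphere(70)
```

Each point can then serve as a camera position looking at the origin, where the surface is centered in its unit sphere.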

3.4. Post-Process

If some faces of the surface are not assigned to any output at the end of the prediction, we apply an "island removal" approach that assigns the closest connected label. We also perform a closing operation (dilation + erosion) on the boundary of each tooth to smooth it.
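One plausible way to implement the closest-connected-label assignment is a breadth-first flood fill over the face adjacency graph, so every unassigned face inherits the label of its nearest labeled connected neighbor. This is a sketch of the idea, not necessarily the authors' exact implementation.

```python
from collections import deque

def fill_unlabeled(labels, adjacency):
    """Propagate labels to unassigned faces (label -1) by breadth-first
    search over face adjacency: each unassigned face takes the label of
    its closest labeled connected neighbor.

    labels:    list of int labels per face, -1 for unassigned.
    adjacency: dict mapping a face index to its neighboring face indices.
    """
    out = list(labels)
    queue = deque(i for i, lab in enumerate(out) if lab != -1)
    while queue:
        f = queue.popleft()
        for n in adjacency.get(f, []):
            if out[n] == -1:
                out[n] = out[f]
                queue.append(n)
    return out

# Toy chain of 4 faces: the two middle faces are unassigned.
filled = fill_unlabeled([2, -1, -1, 5], {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]})
```

Because the BFS expands one ring of neighbors at a time from all labeled faces simultaneously, ties are broken by graph distance, which matches the "closest-connected label" behavior described above.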

3.5. Distance metric

One way to assess the accuracy of our model is to compute the average distance between the borders of the ground truth and the predicted labels for every crown. We do this by isolating each crown with a threshold and extracting the points on its border. We then compute the distance between each border point of the predicted label and the closest border point of the ground truth. Finally, we divide the sum of these distances by the number of points to obtain the average distance for one specific crown.
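The metric described above reduces to a directed average nearest-neighbor distance between two border point sets. A minimal NumPy sketch (brute-force pairwise distances, fine for the small border sets involved):

```python
import numpy as np

def mean_border_distance(pred_pts, gt_pts):
    """Average, over predicted border points, of the distance to the
    closest ground-truth border point.

    pred_pts: (N, 3) predicted-border coordinates.
    gt_pts:   (M, 3) ground-truth-border coordinates.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=2)
    # For each predicted point, keep its nearest ground-truth point,
    # then average over all predicted points.
    return d.min(axis=1).mean()

dist = mean_border_distance(
    np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]),
)
```

For larger borders, a KD-tree (e.g. `scipy.spatial.cKDTree`) would replace the O(N·M) distance matrix.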

4. RESULTS

4.1. Distance metric

The model was trained mostly on surfaces with only 14 crowns and no wisdom teeth. We therefore show results for this kind of sample only. Predictions for scans with wisdom teeth tend to be less accurate, not only for the wisdom teeth labels but also for those of neighboring crowns. Moreover, some IOS scans do not show the entirety of the wisdom teeth, as they sit at the border of the jaw and are sometimes not fully erupted. The poor results for labels adjacent to the wisdom teeth arise because most wisdom teeth receive the label of the neighboring crown. Nevertheless, some predictions for models with wisdom teeth remain accurate.

4.2. Dice coefficients

5 out of the 7 IOS for the upper jaw have Dice coefficients > 0.9. On these 5 scans, all crowns are correctly labeled and the result is almost identical to the ground truth. The poorer result for the 3rd scan may come from the fact that this IOS looks different from the training data (see Appendix). The 4th scan for the upper jaw and the 3rd, 5th, 8th, and 9th for the lower jaw show worse Dice coefficients because they contain 16 teeth (including wisdom teeth). As mentioned before, the training set contains very few samples with wisdom teeth. With a wider range of training data, we can expect much better results for these types of scans as well.

Figure 3 shows that when the network makes predictions on the data types on which it has been trained, it produces excellent results. The Dice coefficients are always above 0.9, and above 0.95 for 25 out of the 28 different crowns.

Fig. 3. Dice coefficients for jaws with no wisdom teeth. Left: upper jaw; right: lower jaw. We picked from the test sets 22 upper jaws and 24 lower jaws with only the 14 "regular" teeth (i.e., no wisdom teeth). In each plot, each "violin" represents a separate crown.

4.3. Resulting labeled IOS

Figure 4 shows the output segmentation for one of our test cases.

Fig. 4. Resulting labeled IOS from the test set. Top: upper jaws; bottom: lower jaws.

4.4. Comparison with competing methods

Method  DSC            SEN            PPV
PN      0.84 ± 0.11    0.91 ± 0.12    0.79 ± 0.13
PN++    0.90 ± 0.06    0.98 ± 0.03    0.84 ± 0.10
PC      0.93 ± 0.06    0.98 ± 0.04    0.90 ± 0.08
MS      0.981 ± 0.028  0.98 ± 0.03    0.97 ± 0.03
Ours    0.96 ± 0.03    0.98 ± 0.11    0.98 ± 0.03

This table shows results for three state-of-the-art deep learning methods (PointNet (PN) [13], PointNet++ (PN++) [21], and PointConv (PC) [16]) as well as MeshSegNet (MS) [14]. The results for our segmentation method were obtained on the samples of the 5-fold cross-validation.

5. CONCLUSION

This new method for automatic multi-class segmentation of 3D surfaces has proven accurate and effective, as well as easy to integrate into existing processing pipelines, since the input and output surfaces have the same number of points and faces after inference, i.e., there is no sub-sampling of the surface as required by the competing approaches. A great advantage of this method is its ability to predict the universal ids of the crowns in both the upper and lower jaws. The results reported by competing approaches focused on upper models only or used a classification model to identify upper/lower jaws. Our approach is fully automated and labels both upper and lower crowns with a single model, which learns to identify features specific to each jaw. We are aware that this model can still be improved with larger and more diverse datasets. However, the results obtained are competitive with existing methods such as MeshSegNet and PointConv. Prediction for jaws with no wisdom teeth is excellent, and we can reasonably expect better results for wisdom teeth segmentation as our sample size increases.

Our method is available as a 3D Slicer extension.

Supplementary Material

Appendix

Fig. 1. Example of a rendered 2D view. Left: surface normals encoded in the RGB components (additional surface properties may be rendered or extracted from the surface using the face-id maps). Right: ground-truth labels for the dental crowns rendered with a color map. These image pairs are used in training for the segmentation task.

Fig. 2. Violin plots showing the distance metric. Left: upper jaw (22 samples); right: lower jaw (24 samples). The samples used for this figure are those with no wisdom teeth.

6. REFERENCES

  • [1] Lim Jung-Hwa, Mangal Utkarsh, Nam Na-Eun, Choi SungHwan, Shim June-Sung, and Kim Jong-Eun, "A comparison of accuracy of different dental restorative materials between intraoral scanning and conventional impression-taking: An in vitro study," Materials, vol. 14, no. 8, p. 2060, 2021.
  • [2] Commer P, Bourauel C, Maier K, and Jäger A, "Construction and testing of a computer-based intraoral laser scanner for determining tooth positions," Medical Engineering & Physics, vol. 22, no. 9, pp. 625–635, 2000.
  • [3] Flügge Tabea V, Schlager Stefan, Nelson Katja, Nahles Susanne, and Metzger Marc C, "Precision of intraoral digital dental impressions with iTero and extraoral digitization with the iTero and a model scanner," American Journal of Orthodontics and Dentofacial Orthopedics, vol. 144, no. 3, pp. 471–478, 2013.
  • [4] Li Zhongyi and Wang Hao, "Interactive tooth separation from dental model using segmentation field," PLoS ONE, vol. 11, no. 8, p. e0161159, 2016.
  • [5] Boubolo Louis, Dumont Maxime, Brosset Serge, Bianchi Jonas, Ruellas Antonio, Gurgel Marcela, Massaro Camila, Aliaga Del Castillo Aron, Ioshida Marcos, Yatabe Marilia S, et al., "FlyBy CNN: a 3D surface segmentation framework," in Medical Imaging 2021: Image Processing. International Society for Optics and Photonics, 2021, vol. 11596, p. 115962B.
  • [6] Deleat-Besson Romain, Le Celia, Zhang Winston, Al Turkestani Najla, Cevidanes Lucia, Bianchi Jonas, Ruellas Antonio, Gurgel Marcela, Massaro Camila, Aliaga Del Castillo Aron, et al., "Merging and annotating teeth and roots from automated segmentation of multimodal images," in International Workshop on Multimodal Learning for Clinical Decision Support. Springer, 2021, pp. 81–92.
  • [7] Su Hang, Maji Subhransu, Kalogerakis Evangelos, and Learned-Miller Erik, "Multi-view convolutional neural networks for 3D shape recognition," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 945–953.
  • [8] Kanezaki Asako, Matsushita Yasuyuki, and Nishida Yoshifumi, "RotationNet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5010–5019.
  • [9] Ma Chao, Guo Yulan, Yang Jungang, and An Wei, "Learning multi-view representation with LSTM for 3-D shape recognition and retrieval," IEEE Transactions on Multimedia, vol. 21, no. 5, pp. 1169–1182, 2018.
  • [10] Wu Zhirong, Song Shuran, Khosla Aditya, Yu Fisher, Zhang Linguang, Tang Xiaoou, and Xiao Jianxiong, "3D ShapeNets: A deep representation for volumetric shapes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1912–1920.
  • [11] Wang Peng-Shuai, Liu Yang, Guo Yu-Xiao, Sun Chun-Yu, and Tong Xin, "O-CNN: Octree-based convolutional neural networks for 3D shape analysis," ACM Transactions on Graphics (TOG), vol. 36, no. 4, pp. 1–11, 2017.
  • [12] Riegler Gernot, Ulusoy Ali Osman, and Geiger Andreas, "OctNet: Learning deep 3D representations at high resolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3577–3586.
  • [13] Qi Charles R, Su Hao, Mo Kaichun, and Guibas Leonidas J, "PointNet: Deep learning on point sets for 3D classification and segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652–660.
  • [14] Lian Chunfeng, Wang Li, Wu Tai-Hsien, Wang Fan, Yap Pew-Thian, Ko Ching-Chang, and Shen Dinggang, "Deep multi-scale mesh feature learning for automated labeling of raw dental surfaces from 3D intraoral scanners," IEEE Transactions on Medical Imaging, vol. 39, no. 7, pp. 2440–2450, 2020.
  • [15] Li Yangyan, Bu Rui, Sun Mingchao, Wu Wei, Di Xinhan, and Chen Baoquan, "PointCNN: Convolution on X-transformed points," Advances in Neural Information Processing Systems, vol. 31, 2018.
  • [16] Wu Wenxuan, Qi Zhongang, and Fuxin Li, "PointConv: Deep convolutional networks on 3D point clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 9621–9630.
  • [17] Ghazvinian Zanjani Farhad, Anssari Moin David, Verheij Bas, Claessen Frank, Cherici Teo, Tan Tao, et al., "Deep learning approach to semantic segmentation in 3D point cloud intra-oral scans of teeth," in International Conference on Medical Imaging with Deep Learning. PMLR, 2019, pp. 557–571.
  • [18] Cui Zhiming, Li Changjian, Chen Nenglun, Wei Guodong, Chen Runnan, Zhou Yuanfeng, Shen Dinggang, and Wang Wenping, "TSegNet: an efficient and accurate tooth segmentation network on 3D dental model," Medical Image Analysis, vol. 69, p. 101949, 2021.
  • [19] Ghazvinian Zanjani Farhad, Pourtaherian Arash, Zinger Svitlana, Anssari Moin David, Claessen Frank, Cherici Teo, Parinussa Sarah, and de With Peter HN, "Mask-MCNet: Tooth instance segmentation in 3D point clouds of intra-oral scans," Neurocomputing, vol. 453, pp. 286–298, 2021.
  • [20] Ronneberger Olaf, Fischer Philipp, and Brox Thomas, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
  • [21] Qi Charles R, Yi Li, Su Hao, and Guibas Leonidas J, "PointNet++: Deep hierarchical feature learning on point sets in a metric space," arXiv preprint arXiv:1706.02413, 2017.
