Abstract
Purpose:
The purpose of this study was to reduce experience dependence in orthognathic surgical planning, which involves virtually simulating the corrective procedure for jaw deformities.
Methods:
We introduce a geometric deep learning framework for generating reference facial bone shape models to objectively guide surgical planning. First, we propose a surface deformation network that warps a patient's deformed bone to a set of normal bones, generating a dictionary of patient-specific normal bony shapes. Subsequently, sparse representation learning is employed to estimate a reference shape model from the dictionary.
Results:
We evaluated our method on a clinical dataset of 24 patients and compared it with a state-of-the-art method that relies on landmark-based sparse representation. Our method estimates normal jaws with significantly higher accuracy than the competing method and preserves the midfaces of patients' facial bones as well as the conventional method does.
Conclusions:
Experimental results indicate that our method generates accurate shape models that meet clinical standards.
Keywords: orthognathic surgical planning, surface deformation, unsupervised learning, 3D point cloud
1 ∣. INTRODUCTION
Orthognathic surgery is a surgical procedure to correct jaw deformities. During surgical planning, computed tomography (CT) or cone-beam computed tomography (CBCT) scans are acquired to generate a three-dimensional (3D) shape model of craniomaxillofacial (CMF) bones.1 The deformed upper and lower jaws are virtually osteotomized from the 3D model (Figure 1a) and cut into several small segments. Each bony segment is moved to a desired location to form a new normal-looking bone model, that is, the planned bone (Figure 1b). This planned bone then guides surgeons to perform surgical correction at the time of surgery (Figure 1c).2,3
FIGURE 1.
(a) The bony surface of a patient with a jaw deformity; the normal region is marked as “midface” and the deformed region (in red) is marked as “jaw.” (b) The jaw is cut into several pieces, which are moved to reconstruct a new normal-looking shape model. (c) The postoperative facial bony shape model
Orthognathic surgical planning is experience dependent. Surgeons move bony segments based on their imagination of what the patient's normal-looking bone should look like. Although some guidance can be obtained by comparing the patient's cephalometric analysis measurements4 with the corresponding normative values represented as means and standard deviations, these measurements provide only limited guidance for the planning procedure and therefore cannot fully meet clinical requirements. From the clinician's perspective, an objective reference shape model representing what a patient's normal facial bone should look like would be a paradigm change. Such a reference shape model would enable a more accurate, personalized surgical plan and thus significantly improve surgical outcomes.
Wang et al.5 developed a method to predict patient-specific reference bony shape models using CMF bony landmarks. The patient's bony landmarks are divided into jaw and midface landmarks, and the midface landmarks are represented with a set of sparse coefficients based on a normal midface dictionary. These coefficients are then applied to a normal jaw dictionary to estimate the patient's normal jaw landmarks. Combining the estimated jaw landmarks with the patient's midface landmarks yields the full set of estimated landmarks, which is used to compute a deformation that warps the patient's bony surface into a reference bony shape model. This method relies on linear representation and might not work as expected when a patient's bony shape differs significantly from those in the dictionary. Moreover, it depends on landmark digitization, which can be labor intensive and error prone.
Geometric deep learning6 can be applied for shape estimation via point cloud representation.7 Qi et al.8 proposed PointNet by applying shared multilayer perceptrons (s-MLPs) and max-pooling to learn deep point features over a point cloud, giving good performance in classification and segmentation tasks. Based on PointNet, Qi et al.9 further introduced PointNet++ to learn local-global shape features from a point cloud with a hierarchical network. Following PointNet++, a series of more advanced point cloud networks have been proposed for classification, segmentation, object detection, and tracking.10 However, it is challenging to apply these techniques to our task, since they are supervised and require paired data that are not always available. Paired deformed and normal bones are almost impossible to acquire in practice.
In this paper, we introduce a framework to estimate reference CMF bony shape models by applying point cloud deep learning without relying on paired training data. Specifically, in the first step, we propose an unsupervised surface deformation network. Subsequently, a dictionary of patient-specific normal bones is constructed by warping a patient's deformed bone to a set of bones from a collection of normal subjects using the proposed network. Finally, sparse representation learning is employed based on the dictionary to generate a patient-specific reference bony shape model. Experimental results show that the accuracy of estimating reference shape models yielded by the proposed framework is superior to that from the sparse representation method.5
The rest of this paper is organized as follows. We detail our method in Section 2, present experimental results in Section 3, and discuss and conclude in Section 4.
2 ∣. MATERIALS AND METHODS
We propose a framework to estimate a patient's normal bone from its deformed counterpart (Figure 2). The core of our framework is a surface deformation network (SDNet) that operates on point clouds. Given the vertices of a random pair of deformed and normal bony surfaces, SDNet predicts vertex-wise displacements for bone correction; when correcting the jaw, the midface is left mostly unchanged. Using SDNet, the deformed bone is warped to a set of normal bones to generate a dictionary of patient-specific normal bones. Based on this dictionary, we estimate an accurate reference bony shape model tailored to the patient using sparse representation learning.
FIGURE 2.
(Left) The surface deformation network (SDNet). (Right) The framework for reference CMF bony shape model estimation. The deformed jaw is marked in red. M is the number of normal subjects
2.1 ∣. SDNet architecture
SDNet (Figure 3) learns and fuses a set of hierarchical point features from any pair of deformed and normal bony surfaces to predict vertex-wise displacements. Two encoding branches learn features from the two input surfaces separately. Using several encoding layers in each branch, SDNet extracts local-to-global shape information from the coordinate vectors of the N vertices on a surface. In each encoding layer, farthest point sampling9 is used to select a point subset from the input points; the surface vertices form the input points for the first layer. Around each sampled point, neighboring input points are gathered in a 3D ball with radius r. Each group of neighboring points is then aggregated into a feature vector via PointConv,11 which is essentially s-MLPs applied to point coordinates and point distributions.12 By propagating point features over a cascade of encoding layers, the number of points (Nsub) is gradually reduced while the dimension of the point feature vector (Cfeature) and the receptive field of each sampled point are increased, finally producing a set of local-to-global features. The two encoding branches share their convolutional weights to ensure consistent learning for the two input surfaces.
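The sampling-and-grouping step of each encoding layer can be illustrated with a minimal NumPy sketch of farthest point sampling and ball-radius grouping. The function names and the `max_neighbors` cap are our own illustrative choices, not taken from the original implementation.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Iteratively pick the point farthest from all previously chosen points."""
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=int)
    min_dist = np.full(n, np.inf)  # distance of each point to the chosen set
    chosen[0] = 0
    for k in range(1, n_samples):
        d = np.linalg.norm(points - points[chosen[k - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        chosen[k] = int(np.argmax(min_dist))
    return chosen

def ball_group(points, centers, radius, max_neighbors=32):
    """For each center, gather indices of input points within a 3D ball."""
    groups = []
    for c in centers:
        d = np.linalg.norm(points - c, axis=1)
        groups.append(np.flatnonzero(d <= radius)[:max_neighbors])
    return groups
```

Each group would then be fed through PointConv-style s-MLPs to produce one feature vector per sampled point.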
FIGURE 3.
The architecture of SDNet, including point feature encoding, fusion, and decoding layers. The number (Nsub) of points and the dimension (Cfeature) of point feature vectors vary across the layers, that is, N > N1 > N2 > N3 and Cout ≤ C1 < C2 < C3
SDNet fuses the features learned from the two encoding branches and propagates the fused features through a set of decoding layers. In each fusion layer, we concatenate point features from a pair of weight-shared encoding layers and fuse them via s-MLPs. The fused features are then decoded with three operations: upsampling via point interpolation,9 grouping via 3D ball neighborhoods, and PointConv convolution. The decoded point features are concatenated with the features of the next fusion layer and fed into the next decoding layer for further processing. The point features are repeatedly fused and decoded until the number of points matches the input. Finally, the decoded features are mapped to N × 3 output displacement vectors with s-MLPs.
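The upsampling step in each decoding layer can be sketched as inverse-distance-weighted interpolation over the k nearest sparse points, in the style of PointNet++.9 This NumPy version is illustrative only; the function and parameter names are hypothetical.

```python
import numpy as np

def interpolate_features(dense_xyz, sparse_xyz, sparse_feats, k=3, eps=1e-8):
    """Propagate features from sparse points to dense points by
    inverse-distance weighting over the k nearest sparse points."""
    out = np.empty((dense_xyz.shape[0], sparse_feats.shape[1]))
    for i, p in enumerate(dense_xyz):
        d = np.linalg.norm(sparse_xyz - p, axis=1)
        nn = np.argsort(d)[:k]            # k nearest sparse points
        w = 1.0 / (d[nn] + eps)           # closer points weigh more
        w /= w.sum()
        out[i] = w @ sparse_feats[nn]
    return out
```

A dense point that coincides with a sparse point essentially inherits that point's feature, which keeps the upsampling consistent with the encoder's subsampling.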
2.2 ∣. Loss function
We design a loss function to encourage SDNet to correct the deformed jaw while keeping the normal midface fixed:

$$\mathcal{L} = L_{\text{jaw}} + \alpha L_{\text{midface}} + \beta L_{\text{df}} + L_{\text{reg}} \tag{1}$$

where Ljaw captures the shape dissimilarity between the warped deformed jaw bone and the normal jaw bone based on the relative coordinates of the surface vertices. The relative coordinates are computed with respect to a set of landmarks, which are a small number of clinically relevant surface vertices on each bone (Figure 4). That is, the relative coordinate vector of the i-th vertex of the normal bone, with coordinate vector c_norm_jaw(i), is computed with respect to the j-th landmark, with coordinate vector c_norm_landmark(j), as

$$\mathbf{r}_{\text{norm}}(i, j) = \mathbf{c}_{\text{norm\_jaw}}(i) - \mathbf{c}_{\text{norm\_landmark}}(j) \tag{2}$$
FIGURE 4.
(a) Landmarks localized on midface (green) and jaw (red) of the deformed bone, positions of jaw landmarks will be updated during warping. (b) Landmarks on the normal bone
Similarly, for the warped deformed bone, we have

$$\mathbf{r}_{\text{warp}}(i, j) = \mathbf{c}_{\text{warp\_jaw}}(i) - \mathbf{c}_{\text{warp\_landmark}}(j) \tag{3}$$
The loss term Ljaw is defined as

$$L_{\text{jaw}} = \frac{1}{N_{\text{jaw}}} \sum_{i=1}^{N_{\text{jaw}}} \sum_{j=1}^{K} w(i, j) \left\| \mathbf{r}_{\text{warp}}(i, j) - \mathbf{r}_{\text{norm}}(i, j) \right\|_2 \tag{4}$$

where Njaw is the number of jaw vertices, K is the number of landmarks, and ∥·∥2 is the ℓ2-norm. The weight w(i, j) is negatively correlated with the Euclidean distance between the i-th vertex and the j-th landmark, and is calculated as

$$w(i, j) = \frac{\exp(-d(i, j))}{\sum_{j'=1}^{K} \exp(-d(i, j'))} \tag{5}$$

where $d(i, j) = \left\| \mathbf{c}_{\text{norm\_jaw}}(i) - \mathbf{c}_{\text{norm\_landmark}}(j) \right\|_2$.
To keep the midface fixed during warping, we minimize the average magnitude of the displacement vectors on the midface, that is,

$$L_{\text{midface}} = \frac{1}{N_{\text{midface}}} \sum_{i=1}^{N_{\text{midface}}} \left\| \mathbf{c}_{\text{warp\_midface}}(i) - \mathbf{c}_{\text{def\_midface}}(i) \right\|_2 \tag{6}$$

where Nmidface is the number of midface vertices, and c_warp_midface(i) and c_def_midface(i) are the coordinate vectors of the i-th pair of corresponding vertices in the warped deformed bone and the original deformed bone, respectively.
We encourage a smooth displacement field to avoid mesh folding by defining Ldf based on the spatial gradients of the vertex displacements:

$$L_{\text{df}} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\left| \mathcal{N}(V_i) \right|} \sum_{V_j \in \mathcal{N}(V_i)} \left\| \mathbf{d}(V_i) - \mathbf{d}(V_j) \right\|_2 \tag{7}$$

where d(Vi) and d(Vj) are the displacement vectors of vertices Vi and Vj on the deformed bone, respectively; 𝒩(Vi) is the one-ring neighboring set13 of Vi, and N is the total number of vertices in the deformed bone.
Finally, ℓ2-norm regularization of the network parameters, realized using Lreg, is incorporated to avoid overfitting.
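The loss terms described above can be sketched in NumPy as follows. The exact form of the weighting w(i, j) is not fully specified here, so this sketch assumes a softmax over negative vertex-landmark distances; the smoothness term assumes precomputed one-ring neighbor index lists. All function names are our own.

```python
import numpy as np

def relative_coords(verts, landmarks):
    # (Nv, K, 3): coordinate of vertex i relative to landmark j
    return verts[:, None, :] - landmarks[None, :, :]

def jaw_loss(warp_jaw, warp_lmk, norm_jaw, norm_lmk):
    """Weighted relative-coordinate dissimilarity between warped and normal jaws."""
    r_warp = relative_coords(warp_jaw, warp_lmk)
    r_norm = relative_coords(norm_jaw, norm_lmk)
    d = np.linalg.norm(r_norm, axis=2)          # vertex-landmark distances
    w = np.exp(-d)
    w = w / w.sum(axis=1, keepdims=True)        # assumed softmax normalization
    diff = np.linalg.norm(r_warp - r_norm, axis=2)
    return float((w * diff).sum(axis=1).mean())

def midface_loss(warp_mid, orig_mid):
    """Average displacement magnitude over midface vertices."""
    return float(np.linalg.norm(warp_mid - orig_mid, axis=1).mean())

def smoothness_loss(disp, neighbors):
    """Average displacement difference over one-ring neighborhoods."""
    total = 0.0
    for i, nb in enumerate(neighbors):
        total += np.linalg.norm(disp[i] - disp[nb], axis=1).mean()
    return total / len(neighbors)
```

All three terms vanish when the warped bone matches the normal jaw, the midface is untouched, and the displacement field is constant, which matches their intended roles.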
2.3 ∣. Network training
Vertex-wise correspondences of all surfaces are established by matching a template surface from the training set nonrigidly with all training bony surfaces (Figure 5).
FIGURE 5.
Vertex-wise correspondence is established by warping a bony surface template to each training bony surface
Specifically, a group of corresponding landmarks localized on the training surfaces are rigidly aligned. The surface with landmarks that give the smallest distance to the average landmarks is selected as the template, and warped to each surface using landmark-based thin plate spline (TPS) interpolation,14 refined with nonrigid coherent point drift (CPD) matching.15 The warped versions of the surface template are used for network training. To reduce computation cost during training, we reduce the number of vertices via surface simplification.16 The vertex coordinates are min–max normalized. The loss function (1) is minimized with Adam17 optimizer to determine the optimal network parameters.
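The template-selection step above (choosing the surface whose landmarks lie closest to the average landmark configuration) can be sketched as follows, assuming the landmark sets have already been rigidly aligned; the function name is hypothetical.

```python
import numpy as np

def select_template(landmark_sets):
    """landmark_sets: (M, K, 3) rigidly aligned landmark configurations.
    Return the index of the configuration closest to the average."""
    mean_lmk = landmark_sets.mean(axis=0)                       # (K, 3) average shape
    dists = np.linalg.norm(landmark_sets - mean_lmk, axis=2)    # (M, K) per-landmark error
    return int(np.argmin(dists.mean(axis=1)))                   # surface with smallest mean error
```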
For testing, we uniformly sample the same number of vertices from a random pair of deformed and normal bony surfaces as input to SDNet. The resulting displacement field, interpolated using TPS, is used to warp the deformed bony surface to generate a normal-looking bony surface.
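Landmark-based TPS interpolation of correspondences or displacements can be sketched as fitting a biharmonic radial basis interpolant (kernel φ(r) = r in 3D) with an affine term. This is a generic sketch under those assumptions, not the authors' implementation.

```python
import numpy as np

def tps_3d_fit(src, dst):
    """Fit a 3D thin-plate-spline-style map (phi(r) = r) sending src -> dst."""
    n = src.shape[0]
    K = np.linalg.norm(src[:, None] - src[None, :], axis=2)  # (n, n) kernel matrix
    P = np.hstack([np.ones((n, 1)), src])                    # (n, 4) affine basis
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T                                          # orthogonality constraints
    b = np.zeros((n + 4, 3))
    b[:n] = dst
    sol = np.linalg.lstsq(A, b, rcond=None)[0]
    return sol[:n], sol[n:]                                  # kernel weights, affine part

def tps_3d_apply(points, src, w, a):
    """Evaluate the fitted map at arbitrary points."""
    K = np.linalg.norm(points[:, None] - src[None, :], axis=2)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return K @ w + P @ a
```

By construction the fitted map interpolates the control points exactly, which is the property exploited when warping the template (or the displacement field) through landmark correspondences.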
2.4 ∣. Inferring patient-specific reference bones
A dictionary of patient-specific normal bones is produced by applying SDNet to warp a deformed bone to a series of normal bones. Using the dictionary, we employ sparse representation learning18 to estimate the patient-specific reference bone. Specifically, we construct two dictionaries, that is, a midface dictionary Dmidface and a jaw dictionary Djaw, using corresponding vertices of all warped deformed bony surfaces. Then, the midface vertices Vdef_midface of the deformed bone are represented with a set of sparse coefficients Cmin according to
$$\mathbf{C}_{\min} = \arg\min_{\mathbf{C}} \left\| \mathbf{V}_{\text{def\_midface}} - \mathbf{D}_{\text{midface}} \mathbf{C} \right\|_2^2 + \lambda_1 \left\| \mathbf{C} \right\|_1 + \lambda_2 \left\| \mathbf{C} \right\|_2^2 \tag{8}$$
where ∥·∥1 and ∥·∥2 denote ℓ1-norm and ℓ2-norm, respectively, and λ1 and λ2 control the sparsity of representation. With the sparse coefficients Cmin, the normal jaw vertices Vest_jaw are estimated by calculating
$$\mathbf{V}_{\text{est\_jaw}} = \mathbf{D}_{\text{jaw}} \mathbf{C}_{\min} \tag{9}$$
The original midface vertices Vdef_midface and the estimated jaw vertices Vest_jaw are then combined to estimate a smooth deformation field, which is finally applied to warp the original deformed bone to generate the reference bony shape model.
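The sparse representation step, an ℓ1/ℓ2-regularized least-squares problem, can be sketched with ISTA (iterative soft-thresholding). The solver choice, iteration count, and default hyperparameters here are assumptions for illustration; columns of `D` would hold flattened vertex coordinates of the dictionary bones.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1-norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(v, D, lam1=0.1, lam2=0.01, n_iter=500):
    """Minimize 0.5 * ||v - D c||^2 + lam1 * ||c||_1 + lam2 * ||c||_2^2 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2 + 2 * lam2   # Lipschitz constant of the smooth part
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ c - v) + 2 * lam2 * c
        c = soft_threshold(c - grad / L, lam1 / L)
    return c
```

Given the coefficients, the jaw estimate is simply `D_jaw @ c`, mirroring how the midface coefficients are transferred to the jaw dictionary.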
3 ∣. EXPERIMENTS AND RESULTS
3.1 ∣. Experimental data
For training, we used CT scans of 47 normal subjects from a previous study19 and CT scans of 61 patients with jaw deformities, approved by our Institutional Review Board (#Pro00009723). These CT scans were segmented20 to extract bony masks, and bony surface meshes were reconstructed from the segmentation masks using the marching cubes algorithm.21 A total of 51 landmarks (Table 1) were localized on each bony surface by an experienced oral surgeon; these landmarks were used to calculate the loss function during training. Following Section 2.3, we selected a bony surface template to establish dense vertex correspondences. Finally, a total of 2867 (47 × 61) random pairs of deformed and normal bones were used to train the network.
TABLE 1.
Fifty-one anatomical landmarks with 19 in the midface and 32 in the jaw
No. | Name | Region | No. | Name | Region | No. | Name | Region |
---|---|---|---|---|---|---|---|---|
1 | N | Midface | 18 | Co-R | Midface | 35 | SIG-R | Lower Jaw |
2 | Rh | Midface | 19 | Co-L | Midface | 36 | Cr-L | Lower Jaw |
3 | Fz-R | Midface | 20 | ANS | Upper Jaw | 37 | SIG-L | Lower Jaw |
4 | Fz-L | Midface | 21 | IC | Upper Jaw | 38 | RMA-R | Lower Jaw |
5 | OrM-R | Midface | 22 | GPF-R | Upper Jaw | 39 | Gos-R | Lower Jaw |
6 | OrM-L | Midface | 23 | GPF-L | Upper Jaw | 40 | Go-R | Lower Jaw |
7 | SOF-R | Midface | 24 | U0 | Upper Jaw | 41 | Ag-R | Lower Jaw |
8 | SOF-L | Midface | 25 | U3T-R | Upper Jaw | 42 | RMA-L | Lower Jaw |
9 | Or-R | Midface | 26 | U3T-L | Upper Jaw | 43 | Gos-L | Lower Jaw |
10 | Or-L | Midface | 27 | U5BC-R | Upper Jaw | 44 | Go-L | Lower Jaw |
11 | ION-R | Midface | 28 | U7DBC-R | Upper Jaw | 45 | Ag-L | Lower Jaw |
12 | ION-L | Midface | 29 | U7DBC-L | Upper Jaw | 46 | L0 | Lower Jaw |
13 | Zy-R | Midface | 30 | MF-R | Lower Jaw | 47 | L3T-R | Lower Jaw |
14 | J-R | Midface | 31 | MF-L | Lower Jaw | 48 | L3T-L | Lower Jaw |
15 | Zy-L | Midface | 32 | B | Lower Jaw | 49 | L5BC-L | Lower Jaw |
16 | J-L | Midface | 33 | Pg | Lower Jaw | 50 | L7DBC-R | Lower Jaw |
For testing, we acquired paired pre- and postoperative CT scans from another 24 patients. The postoperative bones were used as ground truth in evaluation. Fifty-one landmarks were manually digitized on the bony surfaces. The postoperative bone was then rigidly registered to its preoperative bone by matching the corresponding surgically unaltered midface landmarks (Figure 6a). The preoperative bone was eventually warped to the postoperative bone with landmark-based TPS interpolation to generate a remeshed postoperative bony surface (Figure 6b).
FIGURE 6.
(a) A pair of preoperative and postoperative bony surfaces with 51 landmarks in green for the midface region and in red for the jaw. (b) The remeshed postoperative bone generated from the preoperative bone overlaps well with the original postoperative bone. The remeshed midface deviates slightly from the postoperative midface due to inevitable errors in bony segmentation and landmark localization
3.2 ∣. Experimental settings
SDNet was implemented with four encoding, four feature-fusion, and four decoding layers. The numbers of points for the encoding and decoding layers were , where N is the number of input points and can differ for training and testing. During training, N = 4724 after mesh simplification. During testing, N = 10 000. The radius r was set to {0.1, 0.2, 0.4, 0.8} and {0.8, 0.4, 0.2, 0.1} for the four encoding and four decoding layers. Convolutions in the four encoding layers output point features with dimensions Cfeature = {64, 128, 256, 512}. The four feature-fusion and decoding layers output features with dimensions Cfeature = {512, 256, 128, 128}.
Empirically, we set α = 0.3 and β = 0.1 in the loss function (1). A total of K = 51 landmarks were used to calculate Ljaw in (4). The network was trained for 200 epochs with an initial learning rate of 0.0001, decayed by a factor of 0.5 every 100 epochs. For sparse representation, we set λ1 = 0.1 and λ2 = 0.01 in (8).
3.3 ∣. Evaluation metrics
We employed four metrics for quantitative evaluation: vertex distance (VD), edge-length distance (ED), surface coverage (SC), and landmark distance (LD). VD measures the average vertex distance between the estimated and ground-truth bony surfaces:
$$\text{VD} = \frac{1}{N_v} \sum_{i=1}^{N_v} \left\| \mathbf{c}_{\text{est}}(i) - \mathbf{c}_{\text{gt}}(i) \right\|_2 \tag{10}$$
where Nv is the number of vertices in the estimated surface, and cest (i) and cgt (i) are the coordinate vectors of the i-th pair of corresponding vertices. ED measures how well the ground-truth mesh topology is preserved in the estimated bony surface:
$$\text{ED} = \frac{1}{N_e} \sum_{i=1}^{N_e} \left| l_{\text{est}}(i) - l_{\text{gt}}(i) \right| \tag{11}$$
where Ne is the total number of edges in the surface mesh; lest (i) and lgt (i) are the lengths of the i-th pair of corresponding edges. SC measures how well two surfaces are overlapped:
$$\text{SC} = \frac{M_v}{N_v} \tag{12}$$

where Mv ≤ Nv is the number of unique nearest-neighbor vertices identified on the ground-truth surface with respect to vertices on the estimated surface. Larger SC indicates a greater extent of overlap and thus higher estimation accuracy. LD measures the average distance between two sets of corresponding landmarks representing clinically important positions:
$$\text{LD} = \frac{1}{N_l} \sum_{i=1}^{N_l} \left\| \mathbf{c}_{\text{est\_landmark}}(i) - \mathbf{c}_{\text{gt\_landmark}}(i) \right\|_2 \tag{13}$$
where Nl = 51 is the number of landmarks, cest_landmark (i) and cgt_landmark (i) are the coordinate vectors of the i-th pair of corresponding landmarks.
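The four metrics can be computed directly from corresponding vertices, edge lengths, and landmarks; the following is a minimal NumPy sketch (with brute-force nearest-neighbor search for SC), using our own function names.

```python
import numpy as np

def vertex_distance(c_est, c_gt):
    """VD: mean distance between corresponding vertices (also used for LD)."""
    return float(np.linalg.norm(c_est - c_gt, axis=1).mean())

def edge_length_distance(l_est, l_gt):
    """ED: mean absolute difference between corresponding edge lengths."""
    return float(np.abs(l_est - l_gt).mean())

def surface_coverage(c_est, c_gt):
    """SC: fraction of estimated vertices mapping to unique ground-truth
    nearest neighbors (brute-force search, O(Nv^2))."""
    nn = np.array([np.argmin(np.linalg.norm(c_gt - p, axis=1)) for p in c_est])
    return np.unique(nn).size / c_est.shape[0]
```

Identical surfaces give VD = 0, ED = 0, and SC = 1, the best attainable values.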
3.4 ∣. Evaluation on patient data
We tested our framework on the CMF bones of the 24 patients and compared it with landmark-based sparse representation (LSR).5 Figure 7 shows the estimated bony surfaces and vertex-distance heat maps for five randomly selected patients. Inspection by an experienced oral surgeon indicated that all bones estimated by our method are clinically acceptable, whereas only 20 of the LSR estimates are. Table 2 shows that, for the jaw, our method is significantly more accurate (p < 0.05) than LSR on all four metrics. Table 3 shows that the two methods are comparable (p > 0.05) in maintaining the midface.
FIGURE 7.
Reference bony surfaces estimated for five random patients. Heat maps of surface vertex distance are calculated by comparing the estimated bony surfaces with their postoperative bony surfaces (ground truth)
TABLE 2.
Statistics for VD (mm), ED (mm), SC, and LD (mm) for the jaws of 24 patients
Method | Mean | SD | Median | Min | Max |
---|---|---|---|---|---|
VD | |||||
LSR | 5.74 | 2.09 | 5.45 | 3.20 | 10.99 |
SDNet | 3.82 | 0.86 | 3.87 | 2.24 | 5.26 |
ED | |||||
LSR | 0.34 | 0.06 | 0.32 | 0.26 | 0.47 |
SDNet | 0.27 | 0.04 | 0.27 | 0.19 | 0.37 |
SC | |||||
LSR | 0.63 | 0.09 | 0.63 | 0.41 | 0.75 |
SDNet | 0.71 | 0.05 | 0.70 | 0.64 | 0.82 |
LD | |||||
LSR | 5.50 | 1.66 | 5.12 | 3.68 | 9.84 |
SDNet | 3.70 | 0.72 | 3.67 | 2.56 | 5.04 |
TABLE 3.
Statistics for VD (mm), ED (mm), SC, and LD (mm) for the midfaces of 24 patients
Method | Mean | SD | Median | Min | Max |
---|---|---|---|---|---|
VD | |||||
LSR | 1.32 | 0.38 | 1.22 | 0.80 | 2.08 |
SDNet | 1.32 | 0.38 | 1.31 | 0.70 | 1.91 |
ED | |||||
LSR | 0.15 | 0.04 | 0.14 | 0.08 | 0.23 |
SDNet | 0.14 | 0.04 | 0.14 | 0.07 | 0.20 |
SC | |||||
LSR | 0.96 | 0.04 | 0.97 | 0.88 | 1.00 |
SDNet | 0.95 | 0.04 | 0.96 | 0.88 | 1.00 |
LD | |||||
LSR | 1.29 | 0.43 | 1.23 | 0.72 | 2.01 |
SDNet | 1.30 | 0.43 | 1.23 | 0.73 | 2.00 |
3.5 ∣. Comparison with alternative point cloud networks
We compared SDNet with two alternative point cloud networks:
PointNet-Reg: constructed by replacing PointConv in SDNet with the standard convolutional operator (i.e., s-MLPs with max-pooling) used in PointNet++.9
CPD-Net:22 with its loss function replaced by (1) to match our task.
Figure 8 shows example results for two randomly selected patients. Figure 9 shows that SDNet yields significantly better performance than CPD-Net (p < 0.05) for the jaw on the four metrics. Compared with PointNet-Reg, SDNet yields comparable VD, SC, and LD (p > 0.05), but significantly improved ED (p < 0.05). The three networks are comparable (p > 0.05) in maintaining the midface (Figure 10).
FIGURE 8.
Comparison of reference bony surfaces estimated using three competing networks for two patients
FIGURE 9.
Jaw estimation accuracy of three networks for 24 patients
FIGURE 10.
Midface estimation accuracy of three networks for 24 patients
3.6 ∣. Ablation studies
We performed the following ablation studies:
Without Ljaw: Ljaw calculated with the surface centroid instead of the landmarks.
Without Ldf: SDNet without Ldf.
Figure 11 shows example results yielded by the three versions of SDNet for two random patients. Figure 12 shows the effectiveness of Ljaw in the full SDNet in improving jaw estimation in terms of the four metrics (p < 0.05). Figure 12 also shows the effectiveness of Ldf in the full SDNet in improving jaw estimation, producing significantly improved VD, SC, and LD (p < 0.05), and comparable ED (p > 0.05). Figure 13 indicates that the three methods yield comparable performance (p > 0.05) in midface estimation.
FIGURE 11.
Example bony surfaces estimated using three different loss functions in SDNet
FIGURE 12.
Jaw estimation accuracy with three different loss functions for 24 patients
FIGURE 13.
Midface estimation accuracy with three different loss functions for 24 patients
3.7 ∣. Computational cost
The proposed framework is implemented with TensorFlow23 and VTK.24 Using a 12 GB NVIDIA Xp GPU with 64 GB RAM, training takes about 6 min per epoch, and inferring a dictionary of patient-specific bony shapes from the 47 normal subjects takes about 1 min. Sparse representation requires about 2 s per subject.
4 ∣. DISCUSSION AND CONCLUSION
Our framework uses SDNet to construct a dictionary of patient-specific normal bony shapes, which corrects nonlinear shape differences and therefore allows sparse representation to be used effectively for estimating reference bones. Because SDNet is trained with bony surfaces from unpaired patients and normal subjects, it makes effective use of unpaired data.
SDNet is designed to predict surface deformation (based on shape information), not vertex correspondences. Vertex correspondences are utilized to train SDNet, but are not necessary during testing. Note that SDNet is derived from PointNet++, which is invariant to the order of input points.8 SDNet learns hierarchical point features and outperforms CPD-Net,22 which only captures global shape features.
Sparse representation assumes a linear relationship between the midface and the jaw in predicting the reference bone. In the future, a nonlinear method for fusing the dictionary of bony surfaces to estimate the reference bone could be implemented in a deep learning framework, potentially allowing end-to-end training of a network for deformed-to-normal bone estimation.
To conclude, we have proposed a surface deformation network for estimating reference CMF bony shape models for orthognathic surgical planning. We demonstrated using a clinical dataset that the proposed framework yields significant performance improvements over sparse representation learning. The reference bony shape models can provide objective guidance for personalized surgical planning to improve surgical outcomes.
ACKNOWLEDGMENTS
This work was supported in part by United States National Institutes of Health (NIH)/National Institute of Dental and Craniofacial Research (NIDCR) grants R01 DE022676, R01 DE027251, and R01 DE021863.
CONFLICT OF INTEREST
The authors have no conflicts to disclose.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.
REFERENCES
- 1. Xia JJ, Gateno J, Teichgraeber JF. Three-dimensional computer-aided surgical simulation for maxillofacial surgery. Atlas Oral Maxillofac Surg Clin North Am. 2005;13:25–39.
- 2. Hsu SS, Gateno J, Bell RB, et al. Accuracy of a computer-aided surgical simulation protocol for orthognathic surgery: a prospective multicenter study. J Oral Maxillofac Surg. 2013;71:128–142.
- 3. Xia JJ, Gateno J, Teichgraeber JF, et al. Algorithm for planning a double-jaw orthognathic surgery using a computer-aided surgical simulation (CASS) protocol. Part 1: planning sequence. Int J Oral Maxillofac Surg. 2015;44:1431–1440.
- 4. Xia JJ, Gateno J, Teichgraeber JF, et al. Algorithm for planning a double-jaw orthognathic surgery using a computer-aided surgical simulation (CASS) protocol. Part 2: three-dimensional cephalometry. Int J Oral Maxillofac Surg. 2015;44:1441–1450.
- 5. Wang L, Ren Y, Gao Y, et al. Estimating patient-specific and anatomically correct reference model for craniomaxillofacial deformity via sparse representation. Med Phys. 2015;42:5809–5816.
- 6. Xiao Y, Lai Y, Zhang F, Li C, Gao L. A survey on deep geometry learning: from a representation perspective. Comput Vis Media. 2020;6:113–133.
- 7. Ahmed E, Saint A, Shabayek AR, et al. A survey on deep learning advances on different 3D data representations. 2018. arXiv preprint arXiv:1808.01462.
- 8. Qi CR, Su H, Mo K, Guibas LJ. PointNet: deep learning on point sets for 3D classification and segmentation. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit.; 2017:652–660.
- 9. Qi CR, Yi L, Su H, Guibas LJ. PointNet++: deep hierarchical feature learning on point sets in a metric space. In: Proc. Adv. Neural Inf. Process. Syst.; 2017:5099–5108.
- 10. Guo Y, Wang H, Hu Q, Liu H, Liu L, Bennamoun M. Deep learning for 3D point clouds: a survey. IEEE Trans Pattern Anal Mach Intell. 2020. 10.1109/TPAMI.2020.3005434
- 11. Wu W, Qi Z, Fuxin L. PointConv: deep convolutional networks on 3D point clouds. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit.; 2019:9621–9630.
- 12. Turlach BA. Bandwidth selection in kernel density estimation: a review. Institut de Statistique; 1993.
- 13. Gatzke TD, Grimm CM. Estimating curvature on triangular meshes. Int J Shape Model. 2006;12:1–28.
- 14. Bookstein FL. Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans Pattern Anal Mach Intell. 1989;11:567–585.
- 15. Myronenko A, Song X. Point set registration: coherent point drift. IEEE Trans Pattern Anal Mach Intell. 2010;32:2262–2275.
- 16. Garland M, Heckbert PS. Surface simplification using quadric error metrics. In: Proc. SIGGRAPH; 1997:209–216.
- 17. Kingma DP, Ba J. Adam: a method for stochastic optimization. In: Proc. Int. Conf. Learn. Representations; 2015.
- 18. Donoho DL. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Commun Pure Appl Math. 2006;59:797–829.
- 19. Yan J, Shen GF, Fang B, et al. Three-dimensional CT measurement for the craniomaxillofacial structure of normal occlusion adults in Jiangsu, Zhejiang and Shanghai area. China J Oral Maxillofac Surg. 2010;8:2–9.
- 20. Wang L, Gao Y, Shi F, et al. Automated segmentation of dental CBCT image with prior-guided sequential random forests. Med Phys. 2016;43:336–346.
- 21. Lorensen WE, Cline HE. Marching cubes: a high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput Graph. 1987;21:163–169.
- 22. Wang L, Li X, Chen J, Fang Y. Coherent point drift networks: unsupervised learning of non-rigid point set registration. 2019. arXiv preprint arXiv:1906.03039.
- 23. Abadi M, Barham P, Chen J, et al. TensorFlow: a system for large-scale machine learning. In: Proc. USENIX Symp. Oper. Syst. Design Implement.; 2016:265–283.
- 24. Schroeder W, Martin K, Lorensen B. The Visualization Toolkit. 4th ed. Kitware Inc.; 2006.