Unsupervised learning of reference bony shapes for orthognathic surgical planning with a surface deformation network

Deqiang Xiao; Hannah Deng; Chunfeng Lian; Tianshu Kuang; Qin Liu; Lei Ma; Yankun Lang; Xu Chen; Daeseung Kim; Jaime Gateno; Steve Guofang Shen; Dinggang Shen; Pew-Thian Yap; James J Xia

doi:10.1002/mp.15126

. Author manuscript; available in PMC: 2021 Dec 17.

Published in final edited form as: Med Phys. 2021 Aug 11;48(12):7735–7746. doi: 10.1002/mp.15126

Unsupervised learning of reference bony shapes for orthognathic surgical planning with a surface deformation network

Deqiang Xiao ¹, Hannah Deng ², Chunfeng Lian ¹, Tianshu Kuang ², Qin Liu ¹, Lei Ma ¹, Yankun Lang ¹, Xu Chen ¹, Daeseung Kim ², Jaime Gateno ^2,³, Steve Guofang Shen ⁴, Dinggang Shen ¹, Pew-Thian Yap ¹, James J Xia ^2,³

PMCID: PMC8678379 NIHMSID: NIHMS1762335 PMID: 34309844

Abstract

Purpose:

The purpose of this study was to reduce the experience dependence during the orthognathic surgical planning that involves virtually simulating the corrective procedure for jaw deformities.

Methods:

We introduce a geometric deep learning framework for generating reference facial bone shape models for objective guidance in surgical planning. First, we propose a surface deformation network to warp a patient's deformed bone to a set of normal bones for generating a dictionary of patient-specific normal bony shapes. Subsequently, sparse representation learning is employed to estimate a reference shape model based on the dictionary.

Results:

We evaluated our method on a clinical dataset containing 24 patients, and compared it with a state-of-the-art method that relies on landmark-based sparse representation. Our method yields significantly higher accuracy than the competing method for estimating normal jaws and maintains the midfaces of patients’ facial bones as well as the conventional way.

Conclusions:

Experimental results indicate that our method generates accurate shape models that meet clinical standards.

Keywords: orthognathic surgical planning, surface deformation, unsupervised learning, 3D point cloud

1 ∣. INTRODUCTION

Orthognathic surgery is a surgical procedure to correct jaw deformities. During surgical planning, computed tomography (CT) or cone-beam computed tomography (CBCT) scans are acquired to generate a three-dimensional (3D) shape model of craniomaxillofacial (CMF) bones.¹ The deformed upper and lower jaws are virtually osteotomized from the 3D model (Figure 1a) and cut into several small segments. Each bony segment is moved to a desired location to form a new normal-looking bone model, that is, the planned bone (Figure 1b). This planned bone then guides surgeons to perform surgical correction at the time of surgery (Figure 1c).^2,3

(a) The bony surface of a patient with jaw deformity, its normal region is marked as “midface,” and its deformed region (in red) is marked as “jaw.” (b) The jaw is cut into several pieces, which are moved to reconstruct a new normal-looking shape model. (c) The postoperative facial bony shape model

Orthognathic surgical planning is experience dependent. Surgeons move bony segments based on their imagination of what the patient's normal-looking bone should look like. Although some guidance can be obtained by comparing the patient's cephalometric analysis measurements⁴ with the corresponding normative values represented as means and standard deviations, these measurements just provide little guidance to the planning procedure and therefore cannot fully meet clinical requirements. From the clinician's perspective, an objective reference shape model that represents what a patient's normal facial bone should look like is a paradigm change. This reference shape model will deliver a more accurate personalized surgical plan, and thus significantly improve surgical outcomes.

Wang et al.⁵ developed a method to predict patient-specific reference bony shape models using CMF bony landmarks. By dividing a patient's bony landmarks into jaw and midface landmarks, the patient's midface landmarks are represented with a set of sparse coefficients based on a normal midface dictionary. Then, these coefficients are applied to a normal jaw dictionary to estimate the patient's normal jaw landmarks. By combining the estimated jaw landmarks with the patient's midface landmarks, the whole estimated landmarks are obtained and then used to compute a deformation that deforms the patient's bony surface and generates a reference bony shape model. This method is reliant on linear representation and might not work as expected when the bony shape of a patient differs significantly from the ones in the dictionary. Moreover, this method is dependent on landmark digitization, which can be labor sensitive and error prone.

Geometric deep learning⁶ can be applied for shape estimation via point cloud representation.⁷ Qi et al.⁸ proposed PointNet by applying shared multilayer perceptrons (s-MLPs) and max-pooling to learn deep point features over a point cloud, giving good performance in classification and segmentation tasks. Based on PointNet, Qi et al.⁹ further introduced PointNet++ to learn local-global shape features from a point cloud with a hierarchical network. Following PointNet++, a series of more advanced point cloud networks have been proposed for classification, segmentation, object detection, and tracking.¹⁰ However, it is challenging to employ these techniques to our task, since they are supervised and require paired data that are not always available. Data for paired deformed normal bones are almost impossible to acquire in practice.

In this paper, we introduce a framework to estimate reference CMF bony shape models by applying point cloud deep learning without relying on paired training data. Specifically, in the first step, we propose an unsupervised surface deformation network. Subsequently, a dictionary of patient-specific normal bones is constructed by warping a patient's deformed bone to a set of bones from a collection of normal subjects using the proposed network. Finally, sparse representation learning is employed based on the dictionary to generate a patient-specific reference bony shape model. Experimental results show that the accuracy of estimating reference shape models yielded by the proposed framework is superior to that from the sparse representation method.⁵

The rest of this paper is organized as follows. We detail our method in Section 2, present experimental results in Section 3, and discuss and conclude in Section 4.

2 ∣. MATERIALS AND METHODS

We propose a framework to estimate a patient's normal bone from its deformed counterpart (Figure 2). The core of our framework is a surface deformation network (SDNet) that operates on point clouds. SDNet predicts vertex-wise displacements for bone correction based on vertices from a random pair of deformed and normal bony surfaces. When correcting the jaw, the midface is mostly unchanged. Using SDNet, the deformed bone is warped to a set of normal bones to generate a dictionary of patient-specific normal bones. Based on the dictionary, we estimate an accurate reference bony shape model tailored for the patient based on sparse representation learning.

(Left) The surface deformation network (SDNet). (Right) The framework for reference CMF bony shape model estimation. The deformed jaw is marked in red. M is the number of normal subjects

2.1 ∣. SDNet architecture

SDNet (Figure 3) learns and fuses a set of hierarchical point features from any pair of deformed and normal bony surfaces to predict vertex-wise displacements. Two encoding branches are applied to learn features from the input two surfaces separately. Using several encoding layers in each branch, SDNet extracts local-to-global shape information with the coordinate vectors of N vertices on a surface. For each encoding layer, the furthest point sampling⁹ is utilized to select a point subset from the input points; the surface vertices form the input points for the first layer. Around each sampled point, the neighboring points from the input points are gathered in a 3D ball with radius r. Each group of neighboring points is then aggregated to generate a feature vector via PointConv,¹¹ which is essentially s-MLPs applied on point coordinates and point distributions.¹² By propagating point features over a cascade of encoding layers, the number (N_sub) of points is gradually reduced while the dimension (C_feature) of the point feature vector and the receptive field for each sampled point are increased, finally generating a set of local-to-global features. The two encoding branches share their convolutional weights to ensure consistent learning for the input two surfaces.

The architecture of SDNet, including point feature encoding, fusion, and decoding layers. The number (N_sub) of points and the dimension (C_feature) of point feature vectors vary across the layers, that is, N > N₁ > N₂ > N₃ and C_out ≤ C₁ < C₂ < C₃

SDNet fuses the features learned from the two encoding branches and propagates the fused features over a set of decoding layers. In each fusion layer, we concatenate point features from a pair of weight-shared encoding layers and fuse them via s-MLPs. The fused features are then decoded with three operations, that is, upsampling via point interpolation,⁹ grouping via 3D ball neighboring, and PointConv convolution. The decoded point features are concatenated with features of the next fusion layer and fed into the next decoding layer for further processing. The point features are repeatedly fused and decoded until the number of points matches the input. Finally, the final features are mapped to N × 3 output displacement vectors with s-MLPs.

2.2 ∣. Loss function

We design a loss function to encourage SDNet to correct the deformed jaw and fix the normal midface:

L = L_{jaw} + L_{midface} + α L_{df} + β L_{reg},

(1)

where L_jaw captures the shape dissimilarity between the warped deformed jaw bone and the normal jaw bone based on the relative coordinates of the surface vertices. The relative coordinates are computed based on a set of landmarks, which are a small number of clinically relevant surface vertices for each bone (Figure 4). That is, the relative coordinate vector of the i-th vertex with coordinate vector c_{norm_jaw} (i) of the normal bone is computed with respect to the j-th landmark with coordinate vector c_{norm_landmark} (j) as

r_{norm_jaw} (i, j) = c_{norm_jaw} (i) - c_{norm_landmark} (j) .

(2)

(a) Landmarks localized on midface (green) and jaw (red) of the deformed bone, positions of jaw landmarks will be updated during warping. (b) Landmarks on the normal bone

Similarly, for the warped deformed bone, we have

r_{def_jaw}^{'} (i, j) = c_{def_jaw}^{'} (i) - c_{def_landmark}^{'} (j) .

(3)

The loss term L_jaw is defined as

L_{jaw} = \frac{1}{N_{jaw}} \sum_{i = 1}^{N_{jaw}} \sum_{j = 1}^{K} w (i, j) {‖ r_{def_jaw}^{'} (i, j) - r_{norm_jaw} (i, j) ‖}_{2}

(4)

where N_jaw is the number of jaw vertices, K is the number of landmarks, and ∥·∥₂ is the ℓ₂-norm. The weight w (i, j) is negatively correlated with the Euclidean distance between the i-th vertex and the j-th landmark, and calculated as

w (i, j) = \frac{1}{(K - 1)} (1 - \frac{d (i, j)}{\sum_{j = 1}^{K} d (i, j)}),

(5)

where d (i, j) = ∥c_{norm_jaw} (i)–c_{norm_landmark} (j)∥₂.

To fix the midface during warping, we minimize the average magnitude of displacement vectors on the midface, that is,

L_{midface} = \frac{1}{N_{midface}} \sum_{i = 1}^{N_{midface}} ‖ c'_{def_midface} (i) - c_{def_midface} (i) ‖_{2},

(6)

where N_midface is the number of midface vertices, $c_{def_midface}^{'}$ (i) and c_{def_midface} (i) are the coordinate vectors of the i-th pair of corresponding vertices in the warped deformed bone and the original deformed bone, respectively.

We encourage a smooth displacement field to avoid mesh folding, by defining L_df based on the spatial gradients of vertex displacements:

L_{df} = \frac{1}{N} \sum_{i = 1}^{N} \frac{1}{‖ N (v_{i}) ‖} \sum_{v_{j} \in N (v_{i})} ‖ u (i) - u (j) ‖_{2},

(7)

where $u (i) = c_{def}^{'} (i) - c_{def} (i)$ and $u (j) = c_{def}^{'} (j) - c_{def} (j)$ are the displacement vectors of vertices V_i and V_j on the deformed bone, respectively; $N (v_{i})$ is the one-ring neighboring set¹³ of V_i, and N is the total number of vertices in the deformed bone.

Finally, ℓ₂-norm regularization of the network parameters, realized using L_reg, is incorporated to avoid overfitting.

2.3 ∣. Network training

Vertex-wise correspondences of all surfaces are established by matching a template surface from the training set nonrigidly with all training bony surfaces (Figure 5).

Vertex-wise correspondence is established by warping a bony surface template to each training bony surface

Specifically, a group of corresponding landmarks localized on the training surfaces are rigidly aligned. The surface with landmarks that give the smallest distance to the average landmarks is selected as the template, and warped to each surface using landmark-based thin plate spline (TPS) interpolation,¹⁴ refined with nonrigid coherent point drift (CPD) matching.¹⁵ The warped versions of the surface template are used for network training. To reduce computation cost during training, we reduce the number of vertices via surface simplification.¹⁶ The vertex coordinates are min–max normalized. The loss function (1) is minimized with Adam¹⁷ optimizer to determine the optimal network parameters.

For testing, we uniformly sample the same number of vertices from a random pair of deformed and normal bony surfaces as input to SDNet. The resulting displacement field, interpolated using TPS, is used to warp the deformed bony surface to generate a normal-looking bony surface.

2.4 ∣. Inferring patient-specific reference bones

A dictionary of patient-specific normal bones is produced by applying SDNet to warp a deformed bone to a series of normal bones. Using the dictionary, we employ sparse representation learning¹⁸ to estimate the patient-specific reference bone. Specifically, we construct two dictionaries, that is, a midface dictionary D_midface and a jaw dictionary D_jaw, using corresponding vertices of all warped deformed bony surfaces. Then, the midface vertices V_{def_midface} of the deformed bone are represented with a set of sparse coefficients C_min according to

C_{min} = \underset{C}{arg min} ‖ D_{midface} C - V_{def_midface} ‖_{2}^{2} + λ_{1} ‖ C ‖_{1} + λ_{2} ‖ C ‖_{2},

(8)

where ∥·∥₁ and ∥·∥₂ denote ℓ₁-norm and ℓ₂-norm, respectively, and λ₁ and λ₂ control the sparsity of representation. With the sparse coefficients C_min, the normal jaw vertices V_{est_jaw} are estimated by calculating

V_{est_jaw} = D_{jaw} C_{min} .

(9)

The original midface vertices V_{def_midface} and the estimated jaw vertices V_{est_jaw} are then combined to estimate a smooth deformation field, which is finally applied to warp the original deformed bone to generate the reference bony shape model.

3 ∣. EXPERIMENTS AND RESULTS

3.1 ∣. Experimental data

For training, we used CT scans of 47 normal subjects from a previous study¹⁹ and CT scans of 61 patients with jaw deformities, approved by our Institutional Review Board (#Pro00009723). These CT scans were segmented²⁰ to extract bony masks. We reconstructed bony surface meshes from the segmentation masks using the marching cubes algorithm.²¹ A total of 51 landmarks (Table 1) were localized on each bony surface by an experienced oral surgeon, these landmarks were utilized to calculate the loss function during the training stage. Following Section 2.3, we selected a bony surface template to establish dense vertex correspondences. Finally, a total of 2867 (47 × 61) random pairs of deformed normal bones were used to train the network.

TABLE 1.

Fifty-one anatomical landmarks with 19 in the midface and 32 in the jaw

No.	Name	Region	No.	Name	Region	No.	Name	Region
1	N	Midface	18	Co-R	Midface	35	SIG-R	Lower Jaw
2	Rh	Midface	19	Co-L	Midface	36	Cr-L	Lower Jaw
3	Fz-R	Midface	20	ANS	Upper Jaw	37	SIG-L	Lower Jaw
4	Fz-L	Midface	21	IC	Upper Jaw	38	RMA-R	Lower Jaw
5	OrM-R	Midface	22	GPF-R	Upper Jaw	39	Gos-R	Lower Jaw
6	OrM-L	Midface	23	GPF-L	Upper Jaw	40	Go-R	Lower Jaw
7	SOF-R	Midface	24	U0	Upper Jaw	41	Ag-R	Lower Jaw
8	SOF-L	Midface	25	U3T-R	Upper Jaw	42	RMA-L	Lower Jaw
9	Or-R	Midface	26	U3T-L	Upper Jaw	43	Gos-L	Lower Jaw
10	Or-L	Midface	27	U5BC-R	Upper Jaw	44	Go-L	Lower Jaw
11	ION-R	Midface	28	U7DBC-R	Upper Jaw	45	Ag-L	Lower Jaw
12	ION-L	Midface	29	U7DBC-L	Upper Jaw	46	L0	Lower Jaw
13	Zy-R	Midface	30	MF-R	Lower Jaw	47	L3T-R	Lower Jaw
14	J-R	Midface	31	MF-L	Lower Jaw	48	L3T-L	Lower Jaw
15	Zy-L	Midface	32	B	Lower Jaw	49	L5BC-L	Lower Jaw
16	J-L	Midface	33	Pg	Lower Jaw	50	L7DBC-R	Lower Jaw

Open in a new tab

For testing, we acquired paired pre- and postoperative CT scans from other 24 patients. The postoperative bones were used as ground truth in evaluation. Fifty-one landmarks were manually digitized on the bony surfaces. The postoperative bone was then rigidly registered to its preoperative bone by matching the corresponding surgically unaltered midface landmarks (Figure 6a). The preoperative bone was eventually warped to the postoperative bone with landmark-based TPS interpolation to generate a remeshed postoperative bony surface (Figure 6b).

(a) A pair of preoperative and postoperative bony surfaces with 51 landmarks in green for the midface region and in red for the jaw. (b) The remeshed postoperative bone generated from the preoperative bone overlaps well with the original postoperative bone. The remeshed midface deviates slightly from the postoperative midface due to inevitable errors in bony segmentation and landmark localization

3.2 ∣. Experimental settings

SDNet was implemented with four encoding, four feature-fusion, and four decoding layers. The numbers of points for the encoding and decoding layers were $N_{sub} = {\frac{N}{4}, \frac{N}{16}, \frac{N}{64}, \frac{N}{128}, \frac{N}{64}, \frac{N}{16}, \frac{N}{4}, N}$ , where N is the number of input points and can differ for training and testing. During training, N = 4724 after mesh simplification. During testing, N = 10 000. The radius r was set to {0.1, 0.2, 0.4, 0.8} and {0.8, 0.4, 0.2, 0.1} for the four encoding and four decoding layers. Convolutions in the four encoding layers output point features with dimensions C_feature = {64, 128, 256, 512}. The four feature-fusion and decoding layers output features with dimensions C_feature = {512, 256, 128, 128}.

Empirically, we set α = 0.3 and β = 0.1 for the loss function (1). A total of K = 51 landmarks were used for calculating L_jaw in (5). The network was trained for 200 epochs with an initial learning rate of 0.0001, decayed with a rate of 0.5 at each 100 epochs. For sparse representation, we set λ₁ = 0.1 and λ₂ = 0.01 in (9).

3.3 ∣. Evaluation metrics

We employed four metrics for quantitative evaluation: Vertex distance (VD), edge-length distance (ED), surface coverage (SC), and landmark distance (LD). VD measures the average vertex distance between the estimated and ground-truth bony surfaces:

VD = \frac{1}{N_{v}} \sum_{i = 1}^{N_{v}} {‖ c_{est} (i) - c_{gt} (i) ‖}_{2},

(10)

where N_v is the number of vertices in the estimated surface, and c_est (i) and c_gt (i) are the coordinate vectors of the i-th pair of corresponding vertices. ED measures how well the ground-truth mesh topology is preserved in the estimated bony surface:

ED = \frac{1}{N_{e}} \sum_{i = 1}^{N_{e}} {‖ l_{est} (i) - l_{gt} (i) ‖}_{1},

(11)

where N_e is the total number of edges in the surface mesh; l_est (i) and l_gt (i) are the lengths of the i-th pair of corresponding edges. SC measures how well two surfaces are overlapped:

SC = \frac{M_{v}}{N_{v}},

(12)

where M_v ≤ N_v is the number of unique nearest-neighbor vertices identified on the ground-truth surface with respect to vertices on the estimated surface. Larger SC indicates a greater extent of overlap, meaning the higher estimation accuracy. LD measures the average distance between two sets of corresponding landmarks representing clinically important positions:

LD = \frac{1}{N_{l}} \sum_{i = 1}^{N_{l}} {‖ c_{est_landmark} (i) - c_{gt_landmark} (i) ‖}_{2},

(13)

where N_l = 51 is the number of landmarks, c_{est_landmark} (i) and c_{gt_landmark} (i) are the coordinate vectors of the i-th pair of corresponding landmarks.

3.4 ∣. Evaluation on patient data

We tested our framework on the CMF bones of the 24 patients, and compared it with landmark-based sparse representation (LSR).⁵ Figure 7 shows the estimated bony surfaces and the vertex distance heat maps for five randomly selected patients. Inspection by an experienced oral surgeon indicates that all estimated bones yielded by our method are clinically acceptable, while only 20 estimates are clinically acceptable for LSR. Table 2 indicates that, for the jaw, our method is significantly more accurate (p < 0.05) than LSR on the four metrics. Table 3 shows that the two methods are comparable (p > 0.05) in maintaining the midface.

Reference bony surfaces estimated for five random patients. Heat maps of surface vertex distance are calculated by comparing the estimated bony surfaces with their postoperative bony surfaces (ground truth)

TABLE 2.

Statistics for VD (mm), ED (mm), SC, and LD (mm) for the jaws of 24 patients

Method	Mean	SD	Median	Min	Max
VD
LSR	5.74	2.09	5.45	3.20	10.99
SDNet	3.82	0.86	3.87	2.24	5.26
ED
LSR	0.34	0.06	0.32	0.26	0.47
SDNet	0.27	0.04	0.27	0.19	0.37
SC
LSR	0.63	0.09	0.63	0.41	0.75
SDNet	0.71	0.05	0.70	0.64	0.82
LD
LSR	5.50	1.66	5.12	3.68	9.84
SDNet	3.70	0.72	3.67	2.56	5.04

Open in a new tab

TABLE 3.

Statistics for VD (mm), ED (mm), SC, and LD (mm) for the midfaces of 24 patients

Method	Mean	SD	Median	Min	Max
VD
LSR	1.32	0.38	1.22	0.80	2.08
SDNet	1.32	0.38	1.31	0.70	1.91
ED
LSR	0.15	0.04	0.14	0.08	0.23
SDNet	0.14	0.04	0.14	0.07	0.20
SC
LSR	0.96	0.04	0.97	0.88	1.00
SDNet	0.95	0.04	0.96	0.88	1.00
LD
LSR	1.29	0.43	1.23	0.72	2.01
SDNet	1.30	0.43	1.23	0.73	2.00

Open in a new tab

3.5 ∣. Comparison with alternative point cloud networks

We compared SDNet with two alternative point cloud networks:

PointNet-Reg constructed by replacing PointConv in SDNet with the standard convolutional operator (i.e., s-MLPs with max-pooling) used in PointNet++.⁹
CPD-Net²² with loss function replaced with (1) to match our task.

Figure 8 shows example results for two randomly selected patients. Figure 9 shows that SDNet yields significantly better performance than CPD-Net (p < 0.05) for the jaw on the four metrics. Compared with PointNet-Reg, SDNet yields comparable VD, SC, and LD (p > 0.05), but significantly improved ED (p < 0.05). The three networks are comparable (p > 0.05) in maintaining the midface (Figure 10).

Comparison of reference bony surfaces estimated using three competing networks for two patients

Jaw estimation accuracy of three networks for 24 patients

Midface estimation accuracy of three networks for 24 patients

3.6 ∣. Ablation studies

We performed the following ablation studies:

Without L_jaw: L_jaw calculated with the surface centroid instead of the landmarks.
Without L_df: SDNet without L_df.

Figure 11 shows example results yielded by the three versions of SDNet for two random patients. Figure 12 shows the effectiveness of L_jaw in the full SDNet in improving jaw estimation in terms of the four metrics (p < 0.05). Figure 12 also shows the effectiveness of L_df in the full SDNet in improving jaw estimation, producing significantly improved VD, SC, and LD (p < 0.05), and comparable ED (p > 0.05). Figure 13 indicates that the three methods yield comparable performance (p > 0.05) in midface estimation.

Example bony surfaces estimated using three different loss functions in SDNe

Jaw estimation accuracy with three different loss functions for 24 patients

Midface estimation accuracy with three different loss functions for 24 patients

3.7 ∣. Computational cost

The proposed framework is implemented with Tensorflow²³ and VTK.²⁴ Using a 12 GB NVIDIA Xp GPU with 64 GB RAM, it takes about 6 min to train for one epoch, and about 1 min to infer a dictionary of patient-specific bony shapes from 47 normal subjects. Sparse representation requires about 2 s per subject.

4 ∣. DISCUSSION AND CONCLUSION

Our framework uses SDNet to construct a dictionary of patient-specific normal bony shapes, which corrects nonlinear shape differences and therefore allows sparse representation to be used effectively for the estimation of reference bones. SDNet uses bony surfaces from unpaired patients and normal individuals for training and therefore allows effective use of unpaired data.

SDNet is designed to predict surface deformation (based on shape information), not vertex correspondences. Vertex correspondences are utilized to train SDNet, but are not necessary during testing. Note that SDNet is derived from PointNet++, which is invariant to the order of input points.⁸ SDNet learns hierarchical point features and outperforms CPD-Net,²² which only captures global shape features.

Sparse representation assumes linear relationship between the midface and the jaw in predicting the reference bone. In the future, a nonlinear method to fuse the dictionary of bony surfaces for estimating the reference bone can be implemented using a deep learning framework, potentially allowing end-to-end training of a network for deformed-to-normal bone estimation.

To conclude, we have proposed a surface deformation network for estimating reference CMF bony shape models for orthognathic surgical planning. We demonstrated using a clinical dataset that the proposed framework yields significant performance improvements over sparse representation learning. The reference bony shape models can provide objective guidance for personalized surgical planning to improve surgical outcomes.

ACKNOWLEDGMENTS

This work was supported in part by United States National Institutes of Health (NIH)/National Institute of Dental and Craniofacial Research (NIDCR) grants R01 DE022676, R01 DE027251, and R01 DE021863.

Footnotes

CONFLICT OF INTEREST

The authors have no conflicts to disclose.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are available from the corresponding author upon reasonable request.

REFERENCES

1.Xia JJ, Gateno J, Teichgraeber JF. Three-dimensional computer-aided surgical simulation for maxillofacial surger. Atlas Oral Maxillofac Surg Clin North Am. 2005;13:25–39. [DOI] [PubMed] [Google Scholar]
2.Hsu SS, Gateno J, Bell RB, et al. Accuracy of a computer-aided surgical simulation protocol for orthognathic surgery: a prospective multicenter study. J Oral Maxillofac Surg. 2013;71:128–142. [DOI] [PMC free article] [PubMed] [Google Scholar]
3.Xia JJ, Gateno J, Teichgraeber JF, et al. Algorithm for planning a double-jaw orthognathic surgery using a computer-aided surgical simulation (CASS) protocol. Part 1: planning sequence. Int J Oral Maxillofacial Surg. 2015;44:1431–1440. [DOI] [PMC free article] [PubMed] [Google Scholar]
4.Xia JJ, Gateno J, Teichgraeber JF, et al. Algorithm for planning a double-jaw orthognathic surgery using a computer-aided surgical simulation (CASS) protocol. Part 2: three-dimensional cephalometry. Int J Oral Maxillofacial Surg. 2015;44:1441–1450. [DOI] [PMC free article] [PubMed] [Google Scholar]
5.Wang L, Ren Y, Gao Y, et al. Estimating patient-specific and anatomically correct reference model for craniomaxillofacial deformity via sparse representation. Med Phys. 2015;42:5809–5816. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Xiao Y, Lai Y, Zhang F, Li C, Gao L. A survey on deep geometry learning: From a representation perspective. Comput Vis Media. 2020;6:113–133. [Google Scholar]
7.Ahmed E, Saint A & Shabayek AR et al. A survey on deep learning advances on different 3D data representations. 2018; arXiv preprint arXiv:1808.01462. [Google Scholar]
8.Qi CR, Su H, Mo K & Guibas LJ PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit; 2017:652–660. [Google Scholar]
9.Qi CR, Yi L, Su H, Guibas LJ. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In: Proc. Adv. Neural Inf. Process. Syst. 2017:5099–5108. [Google Scholar]
10.Guo Y, Wang H, Hu Q, Liu H, Liu L, Bennamoun M. Deep learning for 3D point clouds: A survey. IEEE Trans Pattern Anal Mach Intell. 2020:1. 10.1109/TPAMI.2020.3005434 [DOI] [PubMed] [Google Scholar]
11.Wu W, Qi Z, PointConv FL. Deep convolutional networks on 3D point clouds. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit; 2019:9621–9630. [Google Scholar]
12.Turlach BA. Bandwidth selection in kernel density estimation: A review. Inst. Statistique 1993. [Google Scholar]
13.Gatzke TD, Grimm CM. Estimating curvature on triangular meshes. Int J Shape Model. 2006;12:1–28. [Google Scholar]
14.Bookstein FL. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans Pattern Anal Mach Intell. 1989;11:567–585. [Google Scholar]
15.Myronenko A, Song X. Point set registration: Coherent point drift. IEEE Trans Pattern Anal Mach Intell. 2010;32:2262–2275. [DOI] [PubMed] [Google Scholar]
16.Garland M, Heckbert PS. Surface simplification using quadric error metrics. In: Proc. SIGGRAPH. 1997;97:209–216. [Google Scholar]
17.Kingma DP, Adam BJ. A method for stochastic optimization. In: Proc. Int. Conf. Learn. Representations; 2015. [Google Scholar]
18.Donoho DL. For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution. Commun Pure Appl Math. 2006;59:797–829. [Google Scholar]
19.Yan J, Shen GF, Fang B, et al. Three-dimensional CT measurement for the craniomaxillofacial structure of normal occlusion adults in Jiangsu, Zhejiang and Shanghai area. China J Oral Maxillofac Surg. 2010;8:2–9. [Google Scholar]
20.Wang L, Gao Y, Shi F, et al. Automated segmentation of dental CBCT image with prior-guided sequential random forests. Med Phys. 2016;43:336–346. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Lorensen WE, Cline HE. Marching cubes: A high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput Graph. 1987;21:163–169. [Google Scholar]
22.Wang L, Li X, Chen J, Fang Y. Coherent point drift networks: Unsupervised learning of non-rigid point set registration. 2019. arXiv preprint arXiv:1906.03039. [Google Scholar]
23.Abadi M, Barham P, Chen J, et al. Tensorflow: A system for large-scale machine learning. In: Proc. USENIX Symp. Oper. Syst. Design Implement. 2016:265–283. [Google Scholar]
24.Schroeder W, Martin K, Lorensen B. The Visualization Toolkit, 4th edn. Kitware Inc.; 2006. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

[R1] 1.Xia JJ, Gateno J, Teichgraeber JF. Three-dimensional computer-aided surgical simulation for maxillofacial surger. Atlas Oral Maxillofac Surg Clin North Am. 2005;13:25–39. [DOI] [PubMed] [Google Scholar]

[R2] 2.Hsu SS, Gateno J, Bell RB, et al. Accuracy of a computer-aided surgical simulation protocol for orthognathic surgery: a prospective multicenter study. J Oral Maxillofac Surg. 2013;71:128–142. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R3] 3.Xia JJ, Gateno J, Teichgraeber JF, et al. Algorithm for planning a double-jaw orthognathic surgery using a computer-aided surgical simulation (CASS) protocol. Part 1: planning sequence. Int J Oral Maxillofacial Surg. 2015;44:1431–1440. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R4] 4.Xia JJ, Gateno J, Teichgraeber JF, et al. Algorithm for planning a double-jaw orthognathic surgery using a computer-aided surgical simulation (CASS) protocol. Part 2: three-dimensional cephalometry. Int J Oral Maxillofacial Surg. 2015;44:1441–1450. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R5] 5.Wang L, Ren Y, Gao Y, et al. Estimating patient-specific and anatomically correct reference model for craniomaxillofacial deformity via sparse representation. Med Phys. 2015;42:5809–5816. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] 6.Xiao Y, Lai Y, Zhang F, Li C, Gao L. A survey on deep geometry learning: From a representation perspective. Comput Vis Media. 2020;6:113–133. [Google Scholar]

[R7] 7.Ahmed E, Saint A & Shabayek AR et al. A survey on deep learning advances on different 3D data representations. 2018; arXiv preprint arXiv:1808.01462. [Google Scholar]

[R8] 8.Qi CR, Su H, Mo K & Guibas LJ PointNet: Deep learning on point sets for 3D classification and segmentation. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit; 2017:652–660. [Google Scholar]

[R9] 9.Qi CR, Yi L, Su H, Guibas LJ. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In: Proc. Adv. Neural Inf. Process. Syst. 2017:5099–5108. [Google Scholar]

[R10] 10.Guo Y, Wang H, Hu Q, Liu H, Liu L, Bennamoun M. Deep learning for 3D point clouds: A survey. IEEE Trans Pattern Anal Mach Intell. 2020:1. 10.1109/TPAMI.2020.3005434 [DOI] [PubMed] [Google Scholar]

[R11] 11.Wu W, Qi Z, PointConv FL. Deep convolutional networks on 3D point clouds. In: Proc. IEEE Conf. Comput. Vis. Pattern Recognit; 2019:9621–9630. [Google Scholar]

[R12] 12.Turlach BA. Bandwidth selection in kernel density estimation: A review. Inst. Statistique 1993. [Google Scholar]

[R13] 13.Gatzke TD, Grimm CM. Estimating curvature on triangular meshes. Int J Shape Model. 2006;12:1–28. [Google Scholar]

[R14] 14.Bookstein FL. Principal warps: Thin-plate splines and the decomposition of deformations. IEEE Trans Pattern Anal Mach Intell. 1989;11:567–585. [Google Scholar]

[R15] 15.Myronenko A, Song X. Point set registration: Coherent point drift. IEEE Trans Pattern Anal Mach Intell. 2010;32:2262–2275. [DOI] [PubMed] [Google Scholar]

[R16] 16.Garland M, Heckbert PS. Surface simplification using quadric error metrics. In: Proc. SIGGRAPH. 1997;97:209–216. [Google Scholar]

[R17] 17.Kingma DP, Adam BJ. A method for stochastic optimization. In: Proc. Int. Conf. Learn. Representations; 2015. [Google Scholar]

[R18] 18.Donoho DL. For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution. Commun Pure Appl Math. 2006;59:797–829. [Google Scholar]

[R19] 19.Yan J, Shen GF, Fang B, et al. Three-dimensional CT measurement for the craniomaxillofacial structure of normal occlusion adults in Jiangsu, Zhejiang and Shanghai area. China J Oral Maxillofac Surg. 2010;8:2–9. [Google Scholar]

[R20] 20.Wang L, Gao Y, Shi F, et al. Automated segmentation of dental CBCT image with prior-guided sequential random forests. Med Phys. 2016;43:336–346. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] 21.Lorensen WE, Cline HE. Marching cubes: A high resolution 3D surface construction algorithm. ACM SIGGRAPH Comput Graph. 1987;21:163–169. [Google Scholar]

[R22] 22.Wang L, Li X, Chen J, Fang Y. Coherent point drift networks: Unsupervised learning of non-rigid point set registration. 2019. arXiv preprint arXiv:1906.03039. [Google Scholar]

[R23] 23.Abadi M, Barham P, Chen J, et al. Tensorflow: A system for large-scale machine learning. In: Proc. USENIX Symp. Oper. Syst. Design Implement. 2016:265–283. [Google Scholar]

[R24] 24.Schroeder W, Martin K, Lorensen B. The Visualization Toolkit, 4th edn. Kitware Inc.; 2006. [Google Scholar]

PERMALINK

Unsupervised learning of reference bony shapes for orthognathic surgical planning with a surface deformation network

Deqiang Xiao

Hannah Deng

Chunfeng Lian

Tianshu Kuang

Qin Liu

Lei Ma

Yankun Lang

Xu Chen

Daeseung Kim

Jaime Gateno

Steve Guofang Shen

Dinggang Shen

Pew-Thian Yap

James J Xia

Abstract

Purpose:

Methods:

Results:

Conclusions:

1 ∣. INTRODUCTION

FIGURE 1.

2 ∣. MATERIALS AND METHODS

FIGURE 2.

2.1 ∣. SDNet architecture

FIGURE 3.

2.2 ∣. Loss function

FIGURE 4.

2.3 ∣. Network training

FIGURE 5.

2.4 ∣. Inferring patient-specific reference bones

3 ∣. EXPERIMENTS AND RESULTS

3.1 ∣. Experimental data

TABLE 1.

FIGURE 6.

3.2 ∣. Experimental settings

3.3 ∣. Evaluation metrics

3.4 ∣. Evaluation on patient data

FIGURE 7.

TABLE 2.

TABLE 3.

3.5 ∣. Comparison with alternative point cloud networks

FIGURE 8.

FIGURE 9.

FIGURE 10.

3.6 ∣. Ablation studies

FIGURE 11.

FIGURE 12.

FIGURE 13.

3.7 ∣. Computational cost

4 ∣. DISCUSSION AND CONCLUSION

ACKNOWLEDGMENTS

Footnotes

DATA AVAILABILITY STATEMENT

REFERENCES

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases