Abstract
Accurate three-dimensional (3D) craniofacial soft tissue analysis is crucial for diagnosing malocclusion and formulating personalized orthodontic treatment plans. However, the automated localization of 3D landmarks is often hindered by complex anatomy and significant biological variability. To address this challenge, we developed an innovative two-stage attention-based deep learning framework for robust landmark detection and diagnostic classification. Our approach leverages PointTransformerV3 (PTv3) as its backbone, augmented by two novel modules: a Geodesic Crop module that isolates the facial region via curvature-aware geodesic masking, and a Dynamic Landmark Structure Learning module that incorporates anatomical priors to model spatial interdependencies. This integrated architecture significantly enhances localization precision and structural consistency. We evaluated the performance of our approach using three complementary metrics: mean radial error (MRE), successful detection rate (SDR) across clinically relevant thresholds (2–4 mm), and successful classification rate (SCR) for treatment difficulty. Our framework achieved state-of-the-art results, with an MRE of 2.17 ± 1.54 mm (test set) and 2.19 ± 1.60 mm (validation set), alongside high SDR values meeting clinical tolerances. Notably, the model achieved 91.74% diagnostic accuracy in classifying orthodontic treatment difficulty, underscoring its strong potential for clinical application. Comparative analyses confirmed significant improvements over existing methods in both landmark precision and diagnostic utility. Overall, these results validate the efficacy of our two-stage framework in automating craniofacial morphology assessment. By synergistically integrating geometric cropping, attention mechanisms, and anatomical constraints, the system offers orthodontists a reliable tool to enhance diagnostic precision, optimize treatment planning, and improve outcomes in patients with malocclusion.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-025-30383-w.
Keywords: Orthodontics, Craniofacial analysis, Diagnostic imaging, Orthodontic diagnosis
Subject terms: Anatomy, Computational biology and bioinformatics, Engineering, Health care, Mathematics and computing, Medical research
Introduction
Facial soft tissue configuration, rather than skeletal or dental anomalies alone, is a key driver of patient dissatisfaction and dominates contemporary interdisciplinary research1,2. Facial aesthetics are inherently subjective3. In clinical practice, orthodontists often redefine the concept of “aesthetics” through quantifiable objective analysis standards known as “esthetic evaluation”4. While the global prevalence of malocclusion is 60%–79.4%5,6, Chinese epidemiological data indicate a significant upward trajectory in pediatric and adolescent cases, escalating from 40% in the 1960s7 to 83.5% in the 2020s8.
In orthodontic practice, facial soft tissues are routinely assessed via visual inspection9, direct anthropometry10, facial photography11, and cephalometric analysis12. Manual diagnosis of craniofacial soft tissue is influenced by multiple factors, including the orthodontist’s experience and mental condition, time constraints, variability in facial images, and interclinician subjectivity. These complexities contribute to a high likelihood of missed and incorrect diagnoses13,14. Given ethical restrictions on frequent radiographic exposure in pediatric patients, photographic documentation serves as a practical alternative for monitoring craniofacial development during treatment because of its non-invasive nature and operational feasibility15. However, manual analysis of facial photographs remains labor-intensive and time-consuming. Recent advances focus on automating three-dimensional (3D) facial landmark identification and improving measurement accuracy16–18, enabling 3D mesh–based and point cloud–based landmark detection methods19–22 that demonstrate significant potential for automated landmark localization. Despite these innovations, the diagnostic validity of automated analyses still depends heavily on clinician expertise. Orthodontists must rapidly diagnose malocclusions, document them, and determine appropriate treatment plans, a challenge particularly for those early in their careers. Consequently, there is a pressing need to develop AI-driven systems capable of automated landmark annotation to support efficient malocclusion diagnosis and streamlined treatment planning23.
With the emergence of advanced deep learning architectures such as PointNet++24, graph convolutional networks25,26, and attention mechanisms27,28, significant progress has been made in processing 3D meshes and point clouds. These technologies have been successfully applied to landmark detection in 3D models, yielding notable advancements25,27,29–33. However, research specifically targeting 3D facial landmark detection in complex clinical scenarios remains insufficient, and existing methods still exhibit clear limitations. Most current studies overlook both the impact of background noise, such as hair, on feature learning and the anatomical spatial relationships among landmarks, leading to poor adaptability, particularly for patients with varying postures. Moreover, the diagnostic support provided by current tools is often limited to simple accuracy metrics, lacking comprehensive analysis. Thus, despite the widespread success of deep learning in many fields, its application in automated orthodontic diagnosis remains inadequate, falling short of clinical requirements for high precision, robustness, and holistic assessment.
Therefore, we developed a two-stage landmark detection framework based on PointTransformerV3 (PTv3), which generates initial landmark estimates during the coarse localization stage. First, we introduce the Geodesic Crop method, which uses geodesic distance to extract anatomically relevant regions of interest (ROIs), thereby reducing interference from background noise. Second, in the fine localization stage, the proposed RefineNet incorporates a core Dynamic Landmark Structure Learning (DLSL) module that dynamically models the geometric constraints among landmarks for each sample. By combining locally precise sampling with global structural priors, this framework significantly enhances robustness and stability against pose variations. Experimental results demonstrate that our method surpasses current state-of-the-art approaches.
Materials and methods
Inclusion and exclusion criteria
Inclusion criteria
Patients aged 6–18 years in mixed or permanent dentition stages who visited the Early Orthodontic Intervention Clinic of the Affiliated Stomatology Hospital, Zhejiang University School of Medicine between November 2024 and June 2025 were enrolled.
Exclusion criteria
Individuals meeting any of the following conditions were excluded:
(1) Significant head tilt or postural deviation (minor physiological asymmetries were not excluded from the study).
(2) Congenital craniofacial anomalies (e.g., cleft lip/palate).
(3) History of craniomaxillofacial trauma.
(4) Obesity (body mass index [BMI] ≥ 28 kg/m²).
Facial imaging
This study enrolled 521 patients (286 males and 235 females). Raw 3D facial scans were acquired using the DS FSCAN+ facial scanner (Shining, Hangzhou, China) and processed through its proprietary software. Data were exported in both Wavefront OBJ format (retaining geometric morphology) and JPEG textures (capturing surface details) to enable integrated shape and texture analyses. The images of human faces presented in Figs. 1, 2, 3 and 4 are of the first author of this study. Written consent for publication of these identifiable images has been obtained from the individual depicted.
Fig. 1.
Sample image of craniofacial 3D mesh.
Fig. 2.
An overview of the proposed coarse-to-fine framework for 3D facial landmark detection.
Fig. 3.
The architecture of RefineNet. The network employs a U-Net structure and PointTransformerV3 Blocks with a core Dynamic Landmark Structure Learning (DLSL) module guided by coarse landmarks.
Fig. 4.
Characteristic facial profiles associated with malocclusion types. (a) Orthognathic profile showing normative total convexity angle. (b) Convex profile demonstrating increased total convexity angle. (c) Concave profile exhibiting decreased total convexity angle. (d) Brachyfacial type with reduced facial height-mandibular width index. (e) Mesofacial type presenting normative facial height-mandibular width index. (f) Dolichofacial type displaying increased facial height-mandibular width index.
Manual annotation of 3D facial landmarks
Two experienced orthodontists independently annotated all landmarks using the open-source software 3D Slicer (version 5.9.0, www.slicer.org). A total of 54 landmarks were categorized into midline landmarks (n = 16) and bilateral paired landmarks (n = 38) based on craniofacial symmetry criteria (see Supplementary Table S1 for detailed definitions). Each operator performed the annotation twice for all 521 3D facial scans, with a minimum interval of 2 weeks between sessions to minimize recall bias. The ground truth position of each landmark was defined as the mean coordinate derived from the four repeated measurements (two operators × two sessions). Before annotation, both operators underwent standardized training to reach consensus on anatomical definitions and digital marking protocols. All coordinate data were exported and stored in Cartesian format (x, y, z) for subsequent analyses.
Deep learning method
Our method involved a two-stage framework (illustrated in Fig. 2) for precise 3D facial landmark localization, featuring two key innovations. First, after coarse initial localization, we developed a Geodesic Crop technique that strategically defined facial ROIs using geodesic distance to exclude nonfacial regions, such as hair and clothing. Second, our RefineNet incorporated a DLSL module, which used hierarchical attention to simultaneously capture local and global spatial–topological relationships among landmarks, thereby enforcing robust anatomical constraints for final localization.
Data preprocessing
To accommodate our point-cloud–based network architecture, we first converted input 3D facial meshes into dense point clouds. Our point cloud representation comprised an 18-dimensional feature vector per point, consisting of the point’s centroid coordinates (three elements), vertex coordinates from three adjacent vertices (nine elements), face normal vector (three elements), and RGB color values (three elements). This process transformed the continuous mesh surface into a discrete set of points while preserving essential geometric and textural information.
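For concreteness, the sketch below shows one way such a conversion can be implemented; the helper name, the use of the trimesh library, and the assumption that a per-face RGB value has already been sampled from the JPEG texture are illustrative choices rather than the exact pipeline used here.

```python
import numpy as np
import trimesh

def mesh_to_point_features(obj_path, face_rgb):
    """Convert a facial mesh into one point per face with an 18-D feature:
    centroid (3) + three adjacent vertex coordinates (9) + face normal (3) + RGB (3).
    `face_rgb` is assumed to be an (F, 3) array sampled from the texture image."""
    mesh = trimesh.load(obj_path, process=False)
    tri = mesh.vertices[mesh.faces]                  # (F, 3, 3) corner coordinates per face

    centroids = tri.mean(axis=1)                     # (F, 3) point positions
    corners = tri.reshape(len(mesh.faces), 9)        # (F, 9) the three adjacent vertices
    normals = mesh.face_normals                      # (F, 3) face normal vectors

    feats = np.concatenate([centroids, corners, normals, face_rgb], axis=1)  # (F, 18)
    return centroids, feats
```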
Coarse network (CoarseNet)
The first stage, CoarseNet, rapidly estimated approximate facial landmark locations to guide subsequent refinement. Based on PTv334, it avoided costly k-NN operations by serializing 3D points into a 1D sequence via a space-filling curve. PTv3’s serialized attention divided the sequence into nonoverlapping patches, enabling efficient local attention computation35. Furthermore, CoarseNet processed the full head scan point cloud, generating heatmaps for all landmarks to provide robust initial predictions.
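As an illustration of the serialization idea, the following sketch orders points along a Z-order (Morton) curve and splits the resulting 1D sequence into fixed-size patches; PTv3 itself supports several space-filling curves, so this is a simplified stand-in rather than its actual implementation.

```python
import numpy as np

def morton_codes(points, bits=10):
    """Quantize points to a 2^bits grid and interleave coordinate bits,
    yielding a Z-order (Morton) code per point."""
    mins, maxs = points.min(axis=0), points.max(axis=0)
    grid = ((points - mins) / (maxs - mins + 1e-9) * (2**bits - 1)).astype(np.int64)

    codes = np.zeros(len(points), dtype=np.int64)
    for b in range(bits):
        for axis in range(3):
            codes |= ((grid[:, axis] >> b) & 1) << (3 * b + axis)
    return codes

# Sorting by the code turns the 3D cloud into a 1D sequence; fixed-size,
# non-overlapping patches of that sequence are then used for local attention.
pts = np.random.rand(100_000, 3).astype(np.float32)
order = np.argsort(morton_codes(pts))
patch_size = 1024
usable = len(pts) // patch_size * patch_size
patches = pts[order][:usable].reshape(-1, patch_size, 3)
```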
Geodesic crop
To precisely segment the facial region and eliminate interference from irrelevant data such as hair, shoulders, and background noise, we employed geodesic distance to effectively crop the point cloud. This step significantly enhanced processing efficiency and accuracy by focusing the attention of the subsequent refinement network onto the relevant facial surface. The algorithm modeled the point cloud as a k-NN graph and approximated the geodesic distance by computing the shortest path between nodes using Dijkstra’s algorithm.
The cropping boundary was dynamically determined based on the coarse landmarks predicted in the first stage. The algorithm calculated the maximum geodesic distance from a stable central point to all other coarse landmarks. This distance was then multiplied by a safety margin to define the final cropping threshold. All points within this threshold were retained, generating a clean and focused facial point cloud for the refinement stage.
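A minimal sketch of this cropping step is shown below, assuming SciPy and scikit-learn and assuming the coarse landmarks and central point are supplied as indices into the point cloud; the margin value is illustrative only.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph

def geodesic_crop(points, landmark_idx, center_idx, k=16, margin=1.2):
    """points: (N, 3) head-scan point cloud; landmark_idx: indices of the coarse
    landmark predictions; center_idx: a stable central point (e.g., near the nose
    tip). Returns the cropped points and the boolean keep mask."""
    # Model the cloud as a k-NN graph with Euclidean edge weights.
    graph = kneighbors_graph(points, n_neighbors=k, mode="distance")

    # Approximate geodesic distances as shortest paths on the graph (Dijkstra).
    dist = dijkstra(graph, directed=False, indices=center_idx)

    # Threshold = farthest coarse landmark (geodesically) times a safety margin.
    threshold = dist[landmark_idx].max() * margin

    keep = dist <= threshold   # disconnected points have infinite distance and are dropped
    return points[keep], keep
```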
Refinement network and dynamic landmark structure learning
RefineNet performed high-precision localization on the geodesically cropped facial point cloud. Its overall architecture adopted the classic U-Net paradigm36, using PTv3 blocks as its core building units to leverage their powerful capabilities in large-scale point cloud feature extraction and modeling. As illustrated in Fig. 3, at the network’s bottleneck, we innovatively integrated the DLSL module. This module was designed to overcome a limitation in the native PTv3 mechanism, where attention was confined to arbitrary patches defined by spatial serialization. In contrast to this generic feature extraction approach, the DLSL module introduced an explicit, structure-aware learning strategy, dynamically guided by the coarse localization results, to inject critical structural prior knowledge into the network.
The workflow of the DLSL module reflected its fundamental difference from the native PTv3 methodology. First, guided by the coarse landmarks, the module partitioned the high-dimensional features from the encoder into local neighborhood groups for each landmark. Subsequently, an independent intra-group attention mechanism was applied within each group. This attention mechanism allowed for comprehensive feature interaction within the neighborhood, enabling the fine-grained extraction of core geometric patterns for each landmark, a process that contrasts with the patch-constrained serialized attention of PTv3. After local feature learning, we used a landmark-guided pooling operation to aggregate the features of each group into a compact representative vector. This was a target-oriented feature distillation, distinct from the generic downsampling used in the PTv3 encoder to reduce spatial resolution. Ultimately, these representative vectors were fed into an inter-landmark attention mechanism to model the global topological relationships between different landmarks. This combination of local and global attention mechanisms collectively formed an efficient hierarchical attention mechanism37–39.
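The sketch below captures this four-step workflow (grouping, intra-group attention, landmark-guided pooling, inter-landmark attention) in PyTorch; the layer sizes, the neighborhood size k, and the use of mean pooling are illustrative assumptions, not the module’s actual configuration.

```python
import torch
import torch.nn as nn

class DLSL(nn.Module):
    """Minimal sketch of Dynamic Landmark Structure Learning."""

    def __init__(self, dim=256, heads=4, k=32):
        super().__init__()
        self.k = k
        self.intra = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.inter = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, xyz, feats, coarse_lms):
        # xyz: (N, 3) point coordinates, feats: (N, C) bottleneck features,
        # coarse_lms: (L, 3) coarse landmark predictions.
        d = torch.cdist(coarse_lms, xyz)                   # (L, N) distances
        idx = d.topk(self.k, largest=False).indices        # k nearest points per landmark

        groups = feats[idx]                                # (L, k, C) neighborhood groups
        local, _ = self.intra(groups, groups, groups)      # intra-group attention

        tokens = local.mean(dim=1).unsqueeze(0)            # (1, L, C) landmark-guided pooling
        global_, _ = self.inter(tokens, tokens, tokens)    # inter-landmark attention
        return local, global_.squeeze(0)                   # refined local + global landmark features
```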
Network training and loss function
To address severe data imbalance in heatmap regression, we used a modified Adaptive Wing Loss40. This loss dynamically adjusted to prediction errors, prioritizing those near true landmark locations via a weighting term proportional to the ground-truth heatmap value, thereby enhancing localization accuracy. The pointwise loss between the ground-truth heatmap value y and the predicted value ŷ was defined as:

$$
L_{AW}(y,\hat{y}) =
\begin{cases}
\omega \ln\left(1 + \left|\dfrac{y-\hat{y}}{\epsilon}\right|^{\alpha-y}\right), & \text{if } |y-\hat{y}| < \theta \\
A\,|y-\hat{y}| - C, & \text{otherwise}
\end{cases}
$$

where ω, ε, α, and θ are hyperparameters controlling the shape of the loss curve, A is the slope that matches the two branches, and C is a constant ensuring continuity at the transition point θ.
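A hedged sketch of this loss in PyTorch follows; the hyperparameter values are the defaults reported in the original Adaptive Wing Loss paper, and the heatmap-proportional weighting shown is one simple form of the modification rather than necessarily the exact variant used here.

```python
import torch

def adaptive_wing_loss(pred, target, omega=14.0, theta=0.5, epsilon=1.0, alpha=2.1):
    """Adaptive Wing Loss between a predicted and a ground-truth heatmap,
    weighted so that points near true landmark locations count more."""
    delta = (pred - target).abs()
    power = alpha - target                      # the exponent depends on the ground-truth value

    # Constants A and C make the two branches meet smoothly at |delta| = theta.
    A = omega * (1 / (1 + (theta / epsilon) ** power)) * power \
        * (theta / epsilon) ** (power - 1) / epsilon
    C = theta * A - omega * torch.log1p((theta / epsilon) ** power)

    loss = torch.where(
        delta < theta,
        omega * torch.log1p((delta / epsilon) ** power),   # log-like near the target
        A * delta - C,                                     # linear far from the target
    )

    weight = 1.0 + target                       # weighting proportional to the ground-truth heatmap
    return (loss * weight).mean()
```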
Additionally, we trained CoarseNet and RefineNet separately. Each training iteration involved randomly sampling 50,000 points from the input point cloud for efficiency. To improve generalization and robustness, we applied online data augmentation via random point cloud rotations (± 15° per axis) and RGB channel noise (magnitude ≤ 0.1)41, optimized with Adam42 (lr = 0.01). The training spanned 200 epochs, with the learning rate halved every 30 epochs for stable convergence.
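The sketch below illustrates this training setup (point sampling, rotation and color augmentation, and the Adam schedule); the placeholder model and the exact augmentation mechanics are assumptions for illustration only.

```python
import numpy as np
import torch

def sample_and_augment(points, feats, n=50_000, max_deg=15.0, rgb_noise=0.1):
    """Randomly sample n points, rotate up to +/- 15 degrees per axis, and add
    RGB noise of magnitude <= 0.1 to the last three feature channels."""
    idx = np.random.choice(len(points), n, replace=len(points) < n)
    pts, f = points[idx].copy(), feats[idx].copy()

    ax, ay, az = np.radians(np.random.uniform(-max_deg, max_deg, size=3))
    Rx = np.array([[1, 0, 0], [0, np.cos(ax), -np.sin(ax)], [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)], [0, 1, 0], [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0], [np.sin(az), np.cos(az), 0], [0, 0, 1]])
    pts = pts @ (Rz @ Ry @ Rx).T

    f[:, -3:] = np.clip(f[:, -3:] + np.random.uniform(-rgb_noise, rgb_noise, (n, 3)), 0, 1)
    return pts, f

model = torch.nn.Linear(18, 54)   # placeholder standing in for CoarseNet / RefineNet
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# 200 epochs of training with the learning rate halved every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.5)
```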
Post-processing
After RefineNet predicted the heatmap, we employed a post-processing refinement called mean squared error (MSE)-over-mesh34 to ensure the final localized landmarks lie precisely on the surface of the original high-density mesh, thereby preventing off-manifold predictions. This method reframed the localization problem as an optimization task on the original mesh point set. The process started by calculating an initial center point from the heatmap predictions using a weighted average. Subsequently, a candidate subset of points was selected from the original mesh within a dynamically determined radius around this center. Finally, for each candidate point, the method computed the MSE between a Gaussian distribution and the network’s predicted heatmap. The final landmark was determined as the original mesh point that minimized this error.
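A minimal sketch of this refinement step follows; the search radius and Gaussian width are illustrative values, not the settings used in the paper.

```python
import numpy as np

def mse_over_mesh(mesh_pts, sampled_pts, heatmap, radius=5.0, sigma=1.5):
    """mesh_pts: (M, 3) original high-density mesh vertices; sampled_pts: (N, 3)
    points on which the network predicted `heatmap` (N,) for one landmark.
    Returns the mesh vertex whose induced Gaussian best matches the heatmap."""
    # Initial center: heatmap-weighted average of the sampled points.
    w = heatmap / (heatmap.sum() + 1e-9)
    center = (sampled_pts * w[:, None]).sum(axis=0)

    # Candidate subset: original mesh points within a radius of the center.
    cand = mesh_pts[np.linalg.norm(mesh_pts - center, axis=1) < radius]

    # For each candidate, compare the Gaussian it would induce on the sampled
    # points against the predicted heatmap and keep the lowest-MSE candidate.
    d = np.linalg.norm(sampled_pts[None, :, :] - cand[:, None, :], axis=2)   # (K, N)
    gauss = np.exp(-d ** 2 / (2 * sigma ** 2))
    mse = ((gauss - heatmap[None, :]) ** 2).mean(axis=1)
    return cand[np.argmin(mse)]
```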
Results
Training curves: mean radial error (MRE) and loss over epochs
To evaluate the model’s training performance and convergence behavior, we tracked and visualized the progression of the loss function (Loss) and MRE on both the training and validation sets. As illustrated in Fig. 5, these performance curves provide clear insights into the model’s learning dynamics, enabling an effective assessment of its generalization performance and the identification of potential issues, such as overfitting.
Fig. 5.
Training Curves: MRE and Loss over Epochs. (a) The Mean Radial Error (MRE) for the training and validation sets. (b) The loss curves for the training and validation sets. (c) An idealized MRE convergence curve for theoretical comparison.
Figure 5A and B depict the actual progression of the MRE and loss values over the epochs, respectively. As demonstrated, both the loss and MRE values exhibit a sharp decline during the initial stages of iteration, which reflects the model’s ability to rapidly learn data features in this phase. Subsequently, the rate of decrease decelerates, and the curves eventually stabilize, signifying that the model’s performance has converged. A key observation is that the validation curve consistently tracks the training curve closely, with no significant divergence emerging between them. This observation strongly indicates that the model possesses excellent generalization ability and has effectively avoided the risk of overfitting. For comparison, Fig. 5C presents an idealized MRE convergence curve, which represents the theoretical optimum for model training. In summary, these curves indicate that our model has stable training, good convergence, and excellent generalization performance.
Evaluation metric
For the quantitative analysis of the proposed models’ performance, we used three metrics spanning landmark detection and clinical diagnosis: the MRE and successful detection rate (SDR) for landmark localization, and the successful classification rate (SCR) for the classification task.
$$
\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\sqrt{(x_i - x_i^{*})^{2} + (y_i - y_i^{*})^{2} + (z_i - z_i^{*})^{2}}
$$

The predicted coordinates of the i-th sample are denoted as $(x_i, y_i, z_i)$, and the corresponding ground-truth coordinates are $(x_i^{*}, y_i^{*}, z_i^{*})$. Here, n is the total sample size, and the summation accumulates the radial error across all samples; the MRE is obtained by scaling this sum by $\frac{1}{n}$, giving the average error per sample.
$$
\mathrm{SDR} = \frac{N_{\mathrm{acc}}}{N_{\mathrm{total}}} \times 100\%
$$

where $N_{\mathrm{acc}}$ is the count of landmarks for which the localization error is within the predefined threshold and $N_{\mathrm{total}}$ is the total count of landmarks being evaluated.
$$
\mathrm{SCR} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%
$$

where TP (true positives) are correctly predicted positive cases, TN (true negatives) are correctly predicted negative cases, FP (false positives) are negative cases incorrectly predicted as positive, and FN (false negatives) are positive cases incorrectly predicted as negative.
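For illustration, the three metrics can be computed as in the short sketch below; array shapes and the example values are assumptions for this example only.

```python
import numpy as np

def mre(pred, gt):
    """Mean radial error: average Euclidean distance (mm) between predictions and ground truth."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def sdr(pred, gt, threshold):
    """Successful detection rate: percentage of landmarks with error within the threshold."""
    return 100.0 * (np.linalg.norm(pred - gt, axis=-1) <= threshold).mean()

def scr(tp, tn, fp, fn):
    """Successful classification rate: accuracy computed from confusion-matrix counts."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)

# Example with 54 landmarks for one scan.
gt = np.random.rand(54, 3) * 100
pred = gt + np.random.normal(scale=1.5, size=(54, 3))
print(mre(pred, gt), [sdr(pred, gt, t) for t in (1, 2, 3, 4)], scr(40, 45, 5, 4))
```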
Facial analysis metrics, including angular parameters (e.g., the nasolabial angle) and proportional indices (e.g., the facial width-to-height ratio), were selected as detection indicators (Supplementary Table S3 and Fig. 4). A classification outcome was deemed successful if the automated measurement matched the manual classification result.
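As an illustration of how these indicators are derived from the detected landmarks (see Table 5 for the full definitions), the sketch below computes an angular parameter and a proportional index from hypothetical 3D coordinates; the coordinate values are placeholders.

```python
import numpy as np

def angle(p1, vertex, p2):
    """Angle in degrees at `vertex` formed by p1 and p2, e.g. the nasolabial angle Cm-Sn-UL."""
    v1, v2 = p1 - vertex, p2 - vertex
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def proportional_index(num_a, num_b, den_a, den_b):
    """Distance ratio x 100, e.g. the facial index (N'-Me')/(ZyL-ZyR) x 100."""
    return 100.0 * np.linalg.norm(num_a - num_b) / np.linalg.norm(den_a - den_b)

# Hypothetical landmark coordinates (mm), for illustration only.
Cm, Sn, UL = np.array([0.0, 5.0, 95.0]), np.array([0.0, 0.0, 90.0]), np.array([0.0, -8.0, 92.0])
nasolabial = angle(Cm, Sn, UL)
# The measurement is then binned (e.g. small / normal / large) against reference
# ranges and compared with the clinician's manual classification to score SCR.
```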
Landmark performance analysis
Our proposed two-stage framework showed strong accuracy and robustness in localizing the 16 midsagittal landmarks. The detailed quantitative metrics are presented in Table 1, with the MRE on the test set closely aligning with that on the validation set, suggesting effective generalization (Fig. 6). For landmarks within the core facial region, such as A’, UL, Stmi, and LL, the model achieved remarkably low error and variance, attributable to the efficacy of the proposed DLSL module in constraining both fine-grained local features and the global anatomical structure. This high precision was further corroborated by the SDR analysis; for instance, over 95% of predictions for the A’ landmark fell within a 2-mm error margin, fully satisfying clinical application requirements.
Table 1.
MRE&SDR of midline landmarks of our model.
| Landmark Name | MRE(mm) | SD(mm) | SDR(%) | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Val | Test | |||||||||||
| Val | Test | Val | Test | 1 mm | 2 mm | 3 mm | 4 mm | 1 mm | 2 mm | 3 mm | 4 mm | |
| Tr | 6.76 | 5.00 | 9.84 | 4.50 | 5.77 | 17.31 | 36.54 | 46.15 | 4.95 | 24.75 | 39.60 | 52.48 |
| G’ | 2.67 | 2.38 | 1.35 | 1.39 | 5.77 | 38.46 | 67.31 | 76.92 | 16.83 | 47.52 | 72.28 | 84.16 |
| N’ | 1.70 | 1.58 | 1.26 | 1.29 | 30.77 | 71.15 | 86.54 | 92.31 | 35.64 | 78.22 | 90.10 | 96.04 |
| Prn | 1.41 | 1.30 | 0.86 | 0.84 | 32.69 | 78.85 | 94.23 | 98.08 | 43.56 | 86.14 | 95.05 | 99.01 |
| Cm | 1.24 | 1.12 | 0.91 | 0.96 | 46.15 | 84.62 | 94.23 | 98.08 | 53.47 | 92.08 | 98.02 | 98.02 |
| Sn | 1.01 | 0.94 | 0.59 | 0.53 | 53.85 | 96.15 | 98.08 | 100.00 | 61.39 | 95.05 | 100.00 | 100.00 |
| A’ | 0.97 | 0.80 | 0.49 | 0.45 | 53.85 | 96.15 | 100.00 | 100.00 | 74.26 | 98.02 | 100.00 | 100.00 |
| UL | 1.41 | 1.28 | 0.97 | 0.68 | 38.46 | 82.69 | 94.23 | 98.08 | 40.59 | 84.16 | 98.02 | 100.00 |
| Stms | 1.32 | 1.26 | 0.63 | 0.75 | 38.46 | 80.77 | 100.00 | 100.00 | 42.57 | 83.17 | 96.04 | 100.00 |
| Stmi | 1.35 | 1.27 | 0.77 | 0.75 | 38.46 | 84.62 | 94.23 | 100.00 | 43.56 | 81.19 | 97.03 | 100.00 |
| LL | 1.64 | 1.46 | 0.84 | 0.82 | 23.08 | 76.92 | 96.15 | 96.15 | 30.69 | 76.24 | 92.08 | 100.00 |
| B’ | 1.39 | 1.50 | 0.86 | 0.89 | 38.46 | 76.92 | 94.23 | 100.00 | 28.71 | 71.29 | 96.04 | 98.02 |
| Pog’ | 2.31 | 2.31 | 1.96 | 1.66 | 21.15 | 57.69 | 80.77 | 92.31 | 18.81 | 49.51 | 67.33 | 90.10 |
| Gn’ | 2.53 | 2.47 | 1.79 | 1.77 | 9.62 | 51.92 | 67.31 | 80.77 | 15.84 | 46.53 | 68.32 | 84.16 |
| Me’ | 2.59 | 2.85 | 1.48 | 2.06 | 9.62 | 44.23 | 67.31 | 82.69 | 14.85 | 39.60 | 63.37 | 78.22 |
| C | 4.26 | 3.89 | 4.88 | 2.24 | 13.46 | 28.85 | 51.92 | 59.62 | 2.97 | 22.77 | 41.58 | 60.40 |
Fig. 6.
Automated landmark positioning accuracy of midline landmarks across algorithmic models in the test set. The results show that our method outperforms all other compared methods in terms of both MRE and SDR across various error tolerances. (a) MRE at individual midline landmarks for each model. (b) SDR distribution of landmark localization in our model. (c) SDR distribution in the 1stage model. (d) SDR distribution in the 1stage + DLSL model. (e) SDR distribution in the 2stage model. (f) SDR distribution in the DGCNN model. (g) SDR distribution in the PointNet++ model. (h) SDR distribution in the PointTransformerV3 model. (i) SDR distribution in the SGCN model.
Despite the outstanding overall performance, challenges remain in the localization of certain boundary landmarks. Specifically, the Trichion (Tr) landmark exhibited the highest error, primarily attributable to its immediate proximity to the hairline, where it remains susceptible to interference and visual ambiguity from hair even after the application of our geodesic cropping technique. Likewise, the Cervicale (C) point on the neck demonstrated a comparatively higher error, reflecting the smooth, less discriminative local geometry of this region. These findings highlight the inherent difficulty of achieving high-precision localization in feature-sparse or occlusion-prone areas, thereby outlining a clear direction for our future optimization efforts.
Our model demonstrated a high accuracy and robust bilateral consistency in localizing symmetrical landmarks. As detailed in Table 2 and illustrated in Fig. 7, corresponding left–right pairs (e.g., ExL/ExR) show nearly identical MRE and SDR distributions. For feature-rich regions such as the canthi (EnL/R), over 80% of predictions fell within a 2-mm error margin, validating our framework’s efficacy. Conversely, performance degraded for certain landmarks, particularly the Zygion (ZyL/R), situated on smoother surfaces with less discriminative geometric cues. The most significant error was observed at the Gonion (GoL/R), whose performance was impacted not only by this lack of local features but also by its anatomical isolation from the main facial cluster. This spatial distance compromised our proposed DLSL module’s ability to effectively model global topological constraints, resulting in reduced localization accuracy.
Table 2.
MRE&SDR of paired landmarks of our model.
| Landmark Name | MRE(mm) | SD(mm) | SDR(%) | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Val | Test | |||||||||||
| Val | Test | Val | Test | 1 mm | 2 mm | 3 mm | 4 mm | 1 mm | 2 mm | 3 mm | 4 mm | |
| ExL | 1.63 | 1.61 | 0.96 | 0.96 | 32.69 | 67.31 | 90.38 | 98.08 | 31.68 | 70.30 | 91.09 | 98.02 |
| ExR | 1.66 | 1.76 | 0.99 | 1.44 | 32.69 | 71.15 | 86.54 | 98.08 | 34.65 | 65.35 | 86.14 | 92.08 |
| EnL | 0.92 | 0.95 | 0.63 | 0.69 | 61.54 | 96.15 | 98.08 | 100.00 | 62.38 | 89.11 | 98.02 | 100.00 |
| EnR | 1.15 | 1.18 | 0.93 | 1.77 | 55.77 | 86.54 | 96.15 | 98.08 | 63.37 | 91.09 | 94.06 | 97.03 |
| OsL | 2.92 | 3.11 | 2.01 | 2.13 | 11.54 | 38.46 | 63.46 | 75.00 | 12.87 | 36.63 | 56.44 | 74.26 |
| OsR | 3.25 | 3.17 | 1.82 | 1.95 | 11.54 | 30.77 | 50.00 | 65.38 | 9.90 | 30.69 | 53.47 | 73.27 |
| OrL | 2.87 | 2.78 | 2.77 | 1.52 | 15.38 | 40.38 | 65.38 | 84.62 | 10.89 | 38.61 | 59.41 | 80.20 |
| OrR | 2.91 | 2.70 | 2.73 | 1.42 | 19.23 | 38.46 | 67.31 | 75.00 | 10.89 | 33.66 | 64.36 | 81.19 |
| ChkL | 2.97 | 3.23 | 2.07 | 1.88 | 13.46 | 36.54 | 61.54 | 80.77 | 5.94 | 25.74 | 50.50 | 72.28 |
| ChkR | 3.03 | 3.14 | 2.10 | 1.90 | 7.69 | 44.23 | 55.77 | 75.00 | 7.92 | 33.66 | 53.47 | 69.31 |
| ZyL | 3.79 | 4.02 | 2.21 | 1.96 | 5.77 | 23.08 | 34.62 | 55.77 | 2.97 | 14.85 | 35.64 | 57.43 |
| ZyR | 4.19 | 4.09 | 2.76 | 2.51 | 3.85 | 23.08 | 44.23 | 61.54 | 5.94 | 19.80 | 42.57 | 54.46 |
| AlL | 1.58 | 1.40 | 1.32 | 1.01 | 40.38 | 76.92 | 86.54 | 94.23 | 40.59 | 82.18 | 93.07 | 96.04 |
| AlR | 1.93 | 1.47 | 1.22 | 0.98 | 21.15 | 59.62 | 80.77 | 90.38 | 36.63 | 74.26 | 92.08 | 97.03 |
| AcL | 1.09 | 1.10 | 0.68 | 0.63 | 53.85 | 92.31 | 98.08 | 100.00 | 47.52 | 92.08 | 99.01 | 100.00 |
| AcR | 1.21 | 0.97 | 0.74 | 0.55 | 40.38 | 86.54 | 94.23 | 100.00 | 60.40 | 95.05 | 100.00 | 100.00 |
| ItnL | 1.55 | 1.35 | 1.17 | 0.88 | 32.69 | 76.92 | 90.38 | 96.15 | 36.63 | 83.17 | 95.05 | 99.01 |
| ItnR | 1.52 | 1.32 | 1.02 | 1.07 | 38.46 | 80.77 | 88.46 | 94.23 | 47.52 | 83.17 | 96.04 | 97.03 |
| StnL | 1.20 | 1.19 | 1.18 | 0.95 | 57.69 | 84.62 | 92.31 | 98.08 | 55.45 | 87.13 | 97.03 | 99.01 |
| StnR | 1.31 | 1.22 | 0.91 | 0.83 | 44.23 | 84.62 | 94.23 | 98.08 | 42.57 | 89.11 | 97.03 | 99.01 |
| CphL | 1.42 | 1.69 | 0.72 | 1.33 | 34.62 | 80.77 | 98.08 | 100.00 | 29.70 | 73.27 | 94.06 | 97.03 |
| CphR | 1.59 | 1.60 | 0.97 | 1.63 | 26.92 | 76.92 | 94.23 | 96.15 | 33.66 | 79.21 | 95.05 | 98.02 |
| ChL | 1.20 | 1.51 | 0.69 | 2.34 | 40.38 | 86.54 | 98.08 | 100.00 | 44.55 | 77.23 | 96.04 | 98.02 |
| ChR | 1.23 | 1.27 | 0.66 | 0.84 | 46.15 | 86.54 | 98.08 | 100.00 | 44.55 | 87.13 | 96.04 | 98.02 |
| TL | 1.39 | 1.32 | 1.06 | 0.82 | 48.08 | 78.85 | 90.38 | 98.08 | 41.58 | 81.19 | 95.05 | 100.00 |
| TRR | 1.34 | 1.46 | 0.83 | 0.97 | 46.15 | 86.54 | 94.23 | 98.08 | 35.64 | 82.18 | 91.09 | 97.03 |
| SaL | 1.85 | 2.33 | 0.99 | 1.50 | 19.23 | 57.69 | 90.38 | 94.23 | 13.86 | 47.52 | 75.25 | 90.10 |
| SaR | 2.24 | 3.08 | 1.29 | 3.90 | 23.08 | 44.23 | 71.15 | 96.15 | 16.83 | 45.54 | 70.30 | 85.15 |
| PaL | 3.49 | 2.72 | 2.27 | 1.89 | 7.69 | 28.85 | 51.92 | 67.31 | 12.87 | 48.51 | 66.34 | 81.19 |
| PaR | 3.66 | 3.09 | 2.75 | 2.19 | 7.69 | 34.62 | 50.00 | 69.23 | 14.85 | 35.64 | 58.42 | 71.29 |
| PraL | 2.43 | 2.28 | 1.70 | 1.61 | 17.31 | 46.15 | 78.85 | 86.54 | 20.79 | 51.49 | 78.22 | 87.13 |
| PraR | 2.20 | 2.45 | 1.40 | 1.92 | 19.23 | 53.85 | 73.08 | 90.38 | 18.81 | 50.50 | 74.26 | 82.18 |
| SbaL | 1.88 | 2.15 | 1.10 | 1.48 | 26.92 | 63.46 | 76.92 | 98.08 | 19.80 | 50.50 | 79.21 | 93.07 |
| SbaR | 1.77 | 2.11 | 1.39 | 1.28 | 32.69 | 69.23 | 86.54 | 90.38 | 20.79 | 53.47 | 80.20 | 89.11 |
| PL | 1.63 | 1.52 | 1.09 | 0.91 | 30.77 | 75.00 | 92.31 | 96.15 | 31.68 | 77.23 | 94.06 | 97.03 |
| PR | 1.38 | 1.50 | 0.92 | 1.35 | 42.31 | 76.92 | 96.15 | 98.08 | 42.57 | 81.19 | 92.08 | 95.05 |
| GoL | 5.19 | 5.49 | 3.78 | 3.85 | 5.77 | 15.38 | 28.85 | 50.00 | 3.96 | 12.87 | 27.72 | 40.59 |
| GoR | 5.34 | 6.45 | 3.00 | 4.54 | 1.92 | 11.54 | 23.08 | 36.54 | 1.98 | 6.93 | 19.80 | 33.66 |
Fig. 7.
Automated landmark positioning accuracy of paired landmarks across algorithmic models in the test set. The results show that our method outperforms all other compared methods in terms of both MRE and SDR across various error tolerances. (a) MRE at individual paired landmarks for each model. (b) SDR distribution of landmark localization in our model. (c) SDR distribution in the 1stage model. (d) SDR distribution in the 1stage + DLSL model. (e) SDR distribution in the 2stage model. (f) SDR distribution in the DGCNN model. (g) SDR distribution in the PointNet++ model. (h) SDR distribution in the PointTransformerV3 model. (i) SDR distribution in the SGCN model.
Comparison of the results with State-of-the-Art approaches
This section provides a quantitative comparison of the core innovations in the proposed framework against several current state-of-the-art methods for 3D facial landmark localization. The comparison included two classic models from the general point cloud analysis domain, PointNet++ and DGCNN, the leading general-purpose model PTv3, and the domain-specific method SGCN. To ensure a fair comparison, all methods were integrated into our proposed two-stage framework and shared the same Geodesic Crop preprocessing and post-processing steps. This controlled setup ensures that the final performance differences reflect the distinct feature extraction and localization capabilities of the different core network architectures during the precision localization phase (RefineNet).
The results summarized in Table 3 demonstrate the superiority of our proposed network architecture (see Supplementary Tables S5–S8 for detailed per-landmark comparisons). Our method consistently outperformed all competing models across every metric. Notably, our approach achieved a state-of-the-art MRE of 2.1701 mm and a leading SDR of 87.24% at the < 4 mm threshold. These figures validate the accuracy and reliability of our model’s design, confirming its capability for high-precision landmark detection.
Table 3.
MRE & SDR comparison of different models.
| Group | Datasets | MRE(mm) | SD(mm) | SDR(%) | |||
|---|---|---|---|---|---|---|---|
| 1 mm | 2 mm | 3 mm | 4 mm | ||||
| PointNet++ | Val | 2.48 | 1.64 | 17.80 | 50.43 | 75.04 | 86.54 |
| Test | 2.66 | 1.93 | 16.13 | 47.18 | 70.24 | 83.54 | |
| DGCNN | Val | 2.48 | 1.87 | 24.25 | 57.30 | 74.75 | 84.69 |
| Test | 2.47 | 1.58 | 25.08 | 56.99 | 74.06 | 83.66 | |
| PointTransformerV3 | Val | 2.31 | 1.92 | 25.36 | 60.15 | 77.78 | 87.11 |
| Test | 2.37 | 2.04 | 30.31 | 61.88 | 76.48 | 85.33 | |
| SGCN | Val | 2.40 | 1.83 | 26.89 | 59.94 | 76.46 | 85.76 |
| Test | 2.35 | 1.56 | 28.71 | 60.67 | 75.85 | 84.69 | |
| Ours | Val | 2.19 | 1.60 | 28.45 | 62.57 | 78.77 | 87.50 |
| Test | 2.17 | 1.54 | 29.70 | 62.10 | 78.75 | 87.24 | |
Figure 8 illustrates the SDR performance comparison of our model against various other methods at four different error thresholds. Notably, our method consistently outperformed all other models across all four evaluation thresholds. Furthermore, under the stricter precision requirements of < 1 mm and < 2 mm, the performance advantage was especially significant, highlighting our model’s excellent precision. Our method clearly shows robustness and accuracy in 3D facial landmark detection across all curves.
Fig. 8.
Comparison of the SDR for different models at various precision thresholds. The plot shows the SDR (%) performance of our method against seven other baseline models under error tolerances of 1 mm, 2 mm, 3 mm, and 4 mm.
Confusion matrix analysis
The results of the model’s confusion matrix (Fig. 9) indicated a robust overall accuracy of 91.34%. The detailed confusion matrices for each classification metric are provided in Supplementary Fig. S3 and Supplementary Fig. S4. These matrices reveal that while the model effectively distinguishes between the non-adjacent “small” and “large” classes, the majority of misclassifications occur between the “normal” class and its adjacent categories. This suggests that the model’s primary challenge lies in precisely defining the classification boundaries of the “normal” class.
Fig. 9.

The average confusion matrix summarizing the classification performance across multiple diagnostic indicators on the combined validation and test sets.
Effectiveness of geodesic crop
The test results clearly demonstrated the positive impact of geodesic cropping on the accuracy of 3D facial landmark localization (Table 4). The model with geodesic cropping alone (the two-stage configuration without DLSL) achieved an MRE of 2.358 mm on the test set, indicating that cropping effectively reduces interference from irrelevant background elements and allows the model to focus more precisely on key facial features. While the improvement was particularly notable at the 4-mm precision level, where the SDR rose from 78.548% to 86.157%, significant gains were also observed at stricter precision levels, indicating that the method not only increases the overall number of successful localizations but also enhances the model’s high-precision performance. These findings validate geodesic cropping as a highly effective, foundational strategy for improving the accuracy and reliability of landmark localization.
Table 4.
MRE&SDR of different ablation study.
| Group | Geodesic crop | Structure learning | Datasets | MRE(mm) | SD(mm) | SDR(%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 mm | 2 mm | 3 mm | 4 mm | ||||||
| ours | √ | √ | Val | 2.19 | 1.60 | 28.45 | 62.57 | 78.77 | 87.50 |
| Test | 2.17 | 1.54 | 29.70 | 62.10 | 78.75 | 87.24 | |||
| 1stage | × | × | Val | 2.94 | 2.57 | 17.41 | 46.62 | 67.38 | 79.52 |
| Test | 2.91 | 2.61 | 19.38 | 47.76 | 67.27 | 78.55 | |||
| 1stage + DLSL | × | √ | Val | 2.71 | 2.21 | 21.01 | 52.49 | 70.37 | 82.16 |
| Test | 2.71 | 2.28 | 22.13 | 51.41 | 69.99 | 81.81 | |||
| 2stage | √ | × | Val | 2.30 | 1.93 | 25.36 | 60.22 | 77.53 | 87.36 |
| Test | 2.36 | 2.24 | 27.36 | 59.86 | 77.48 | 86.16 | |||
Effectiveness of the DLSL module
The introduction of the DLSL module further optimized landmark localization precision (Table 4). First, when applied to the baseline model without geodesic cropping, the DLSL module reduced the test set MRE from 2.936 to 2.712 mm, demonstrating its standalone value in modeling spatial relationships. Second, its role as a powerful refinement tool becomes evident when it is used in conjunction with geodesic cropping. On the test set, this combined model achieved a final MRE of 2.171 mm, outperforming the 2.358 mm of the model with only geodesic cropping. This indicates a clear synergistic effect: geodesic cropping first provides a clean, focused facial area, after which the DLSL module models the subtle spatial relationships between landmarks to achieve finer adjustments. Moreover, the DLSL module’s contribution is also reflected in the SDR metric, where the combined model reached 78.749% and 87.238% at the 3- and 4-mm precision levels, respectively, compared with 77.484% and 86.157% for the model without DLSL. These findings indicate that the DLSL module is a critical component for refinement, significantly boosting overall performance, as detailed for each landmark in Supplementary Tables S2–S4.
Discussion
Innovations in orthodontic diagnosis have primarily focused on identifying skeletal and soft tissue landmarks43–45. Most research emphasizes processing two-dimensional (2D) data, such as facial photographs46–49 and cephalometric radiographs50–54, and 3D work has concentrated largely on CBCT55–58, whereas comprehensive studies on 3D facial surface scans remain scarce. While CBCT offers essential hard-tissue (skeletal) information, as our previous study showed59, its use of ionizing radiation remains a concern60. In contrast, 3D facial surface scans provide a non-invasive and radiation-free modality for high-fidelity soft-tissue analysis. This capability is particularly crucial for aesthetics-driven orthodontic diagnosis and for the frequent monitoring of pediatric patients. Consequently, these two modalities are best viewed as complementary rather than competitive. Existing algorithms improve efficiency and reduce workloads, but they do not provide a direct path to diagnosis, limiting their clinical application. With the shift to 3D examinations, traditional 2D algorithms are inadequate61. To address these challenges, we developed a new algorithm based on 3D facial landmarks that offers more accurate measurements and preliminary diagnoses.
The two-stage framework proposed here establishes a new state of the art in 3D facial landmark localization. Experimental results demonstrate that our method exhibits comprehensive superiority across all key evaluation metrics. Our model achieves the lowest MRE of 2.1701 mm, a reduction of approximately 7.8% compared with the next-best method, SGCN (2.3529 mm). Concurrently, the lowest SD of 1.5360 mm attests to the high stability and consistency of our model’s predictions. Our method also leads comprehensively in SDR, which is crucial for clinical and practical applications; this is especially evident in the high-precision < 3 mm range, where our model reaches 78.75%, surpassing the second-place PTv334 by over 2 percentage points and proving its reliability in landmark detection tasks.
RefineNet, and particularly its innovative DLSL module, outperforms the alternatives even when all competing methods share the same preprocessing and post-processing pipelines, providing strong evidence for the superiority of its core network architecture. Relative to general-purpose point cloud architectures24,28,34, our method moves beyond their powerful yet structurally agnostic attention mechanisms. Drawing inspiration from recent advancements in structure learning62–64, our DLSL module explicitly injects prior knowledge of facial anatomical structure into the localization task. Furthermore, our core advantage over other domain-specific methods such as SGCN25 lies in its dynamic nature. SGCN relies on a static “average face” template to impose global constraints, which limits its adaptability to samples with varying poses and expressions. Conversely, our DLSL module dynamically generates a unique topological structure for each sample based on its coarse localization results, thereby providing far greater individual adaptability and robustness. This task-oriented, dynamic structural modeling ultimately allows our method to excel in both precision and generalization.
The overall localization accuracy of craniofacial landmarks in 3D facial scans was clinically acceptable (mean error < 2.5 mm), with most landmarks exhibiting mean positioning errors within an acceptable range. However, specific landmarks demonstrated significant inaccuracies: Tr was imprecisely identified because hair occlusion obscures the frontal hairline boundary, while the mental landmarks (C, Pog’, Me’) showed reduced accuracy in subjects with skeletal Class II malocclusion and vertical growth patterns, attributable to mandibular retrusion and the absence of prominent bony protrusions, which complicate soft-tissue identification. Additionally, non-relaxed facial musculature during scanning induced atypical soft-tissue morphology, further increasing localization uncertainty in the mental region. Despite excluding individuals with elevated BMI, variations in soft-tissue thickness, particularly adipose accumulation in the neck and cheeks, adversely affected landmarks such as ZyL/R and GoL/R, as confirmed through regression analysis (p < 0.05).
Quantitative validation revealed an MRE of 2.19 ± 1.60 mm for the validation set and 2.17 ± 1.54 mm for the test set. The SDR at incremental tolerance thresholds (1–4 mm) was 28.45%, 62.57%, 78.77%, and 87.50% for the validation set, and 29.70%, 62.10%, 78.75%, and 87.23% for the test set. Landmark-specific errors indicated the lowest MRE for the bilateral endocanthion (EnL/R) owing to well-defined anatomical boundaries, whereas GoL/R exhibited the highest MRE, resulting from soft-tissue coverage obscuring the underlying bony landmarks. Among midline landmarks, nasolabial points (e.g., subnasale Sn, labrale superius A′) showed minimal MRE, while C and Tr demonstrated maximal MRE. Comprehensive distributions of MRE and SDR across all landmarks are illustrated in Figs. 6 and 7 for the test set and in Supplementary Fig. S1 and Supplementary Fig. S2 for the validation set. The SCR results for both sets are detailed in Table 5.
Table 5.
SCR(%) of our model in test set and validation set.
| Measurement | Definition | SCR of Test Set | SCR of Validation Set |
|---|---|---|---|
| Mentocervical angle | ∠C-Me’/G’-Pog’ | 89.11 | 88.46 |
| Angle of facial convexity | ∠G’-Sn-Pog’ | 85.15 | 78.85 |
| Angle of total facial convexity | ∠G’-Prn-Pog’ | 98.02 | 100.00 |
| Angle of the medium facial third | ∠N’-T-Sn | 82.18 | 88.46 |
| Angle of the inferior facial third | ∠Sn-T-Me’ | 89.11 | 84.62 |
| Nasofrontal angle | ∠G’-N’-Prn | 92.08 | 92.31 |
| Nasal tip angle | ∠N’-Prn/Cm-Sn | 86.14 | 78.85 |
| Nasolabial angle | ∠Cm-Sn-UL | 86.14 | 80.77 |
| Nasofacial angle | ∠G’-Pog’/N’-Prn | 85.15 | 76.92 |
| Nasomental angle | ∠N’-Prn-Pog’ | 90.10 | 82.69 |
| Chin-face height index | (B’-Me’)/(N’-Me’)×100 | 96.04 | 92.31 |
| Chin-mandible height index | (B’-Me’)/(Stmi-Me’)×100 | 92.08 | 86.54 |
| Lower vermilion contour index | (ChL-ChR)/(ChL-LL-ChR)×100 | 90.10 | 86.54 |
| Mouth width contour index | (ChL-ChR)/(ChL-Stmi + ChR-Stmi)×100 | 98.02 | 100.00 |
| Upper vermilion contour index | (ChL-ChR)/(ChL-UL-ChR)×100 | 100.00 | 100.00 |
| Vermilion arch index | (ChL-LL-ChR)/(ChL-UL-ChR)×100 | 94.06 | 94.23 |
| Intercanthal-nasal width index | (EnL-EnR)/(AlL-AlR)×100 | 88.12 | 92.31 |
| Intercanthal-mouth width index | (EnL-EnR)/(ChL-ChR)×100 | 85.15 | 92.31 |
| Intercanthal width-upper face height index | (EnL-EnR)/(N’-Stmi)×100 | 90.10 | 90.38 |
| Biocular-face width index | (ExL-ExR)/(ZyL-ZyR)×100 | 100.00 | 100.00 |
| Mandible-face width index | (GoL-GoR)/(ZyL-ZyR)×100 | 99.01 | 100.00 |
| Facial index | (N’-Me’)/(ZyL-ZyR)×100 | 98.02 | 100.00 |
| Nose-face height index | (N’-Sn)/(N’-Me’)×100 | 85.15 | 82.69 |
| Nose-face height index | (N’-Sn)/(Sn-Me’)×100 | 87.13 | 86.54 |
| Nose height-face width index | (N’-Sn)/(ZyL-ZyR)×100 | 90.10 | 88.46 |
| Upper face height-upper third face depth index | (N’-Stmi)/(T-N’)×100 | 85.15 | 94.23 |
| Upper face index | (N’-Stmi)/(ZyL-ZyR)×100 | 95.05 | 98.08 |
| Upper face-face height index | (N’-Stms)/(N’-Me’)×100 | 92.08 | 86.27 |
| Nostril width-nose height index | (SbaL-Sn)/(N’-Sn)×100 | 100.00 | 100.00 |
| Nostril-nose width index | (SbaL-Sn + SbaR-Sn)/(AlL-AlR)×100 | 100.00 | 100.00 |
| Nasal tip protrusion-nostril floor width index | (Sn-Prn)/(SbaL-Sn + SbaR-Sn)×100 | 100.00 | 100.00 |
| Lower face-face height index | (Sn-Me’)/(N’-Me’)×100 | 96.04 | 100.00 |
| Upper lip height-mouth width index | (Sn-Stmi)/(ChL-ChR)×100 | 89.11 | 82.69 |
| Upper lip-mandible height index | (Sn-Stms)/(Stmi-Me’)×100 | 85.15 | 80.39 |
| Lower lip-chin height index | (Stmi-B’)/(B’-Me’)×100 | 90.10 | 84.62 |
| Lower lip-mandible height index | (Stmi-B’)/(Stmi-Me’)×100 | 86.14 | 82.69 |
| Mandibulo-face height index | (Stmi-Me’)/(N’-Me’)×100 | 93.07 | 94.23 |
| Mandible height-lower third face depth index | (Stmi-Me’)/(T-Me’)×100 | 98.02 | 100.00 |
| Upper cheek-upper third face depth index | (T-ExR)/(T-N’)×100 | 100.00 | 100.00 |
Despite its contributions, this study has inherent limitations. First, because most orthodontic data are captured from the face, missing or incomplete data caused by occlusion, such as hair, cannot be avoided; integrating depth-sensing modules or generative inpainting models65,66 will be necessary to address this problem. Second, the algorithm was trained only on data from Asian patients, which may bias its landmark detection in other ethnic groups. Finally, the amount of training data is still somewhat limited, and augmentation with synthetic facial scans is recommended to mitigate overfitting67,68.
Conclusions
In summary, we developed a real-time 3D facial scanning analysis system based on an attention-driven deep neural network that detects 54 craniofacial landmarks in under 10 s. This system effectively balances accuracy and speed for orthodontic and maxillofacial applications. It features a novel cascaded local refinement module that reduces mean localization error by 33.7% compared with a one-stage framework, emphasizing the effectiveness of the multi-stage approach in enhancing precision while maintaining real-time performance.
Supplementary Information
Below is the link to the electronic supplementary material.
Author contributions
Mouyuan Sun, Xiangtao Liu and Fuli Wu conceptualized the study. Tao Qiu and Chaoran Hu executed the project. Tao Qiu and Jingyu Zhang built the data sets. Tao Qiu and Mouyuan Sun wrote the final manuscript and created all the images used in the manuscript. Fuli Wu advised on method improvement. Xiangtao Liu, Huiming Wang, and Mouyuan Sun supervised the entire project.
Funding
This work was supported by the National Natural Science Foundation of China (82501180), Zhejiang Provincial Natural Science Foundation of China (LQN25H140004), Zhejiang Province Traditional Chinese Medicine Science and Technology Plan Project (2026ZL0078), China Postdoctoral Science Foundation (2024T170782, 2023M743009), China Oral Health Foundation (C2024-002), and Zhejiang Provincial Medical and Health Science and Technology Plan (2025KY942).
Data availability
The datasets generated and analysed during the current study are not publicly available due to patient privacy and ethical restrictions imposed by the Human Research Ethics Committee of the Stomatology Hospital of Zhejiang University School of Medicine. The data generated in this study are available from the corresponding author upon request.
Code availability
Code for PointNet++, DGCNN, PointTransformerV3, and SGCN can be accessed from the following links: https://github.com/charlesq34/pointnet2, https://github.com/WangYueFt/dgcnn, https://github.com/Pointcept/PointTransformerV3 and https://github.com/gfacchi-dev/CVIU-2S-SGCN.
Declarations
Competing interests
The authors declare no competing interests.
Ethics declarations
The Human Research Ethics Committee of the Stomatology Hospital of Zhejiang University School of Medicine, reviewed and approved the research involving human participants. This study collected samples from patients during 2024–2025. Written informed consent for participation in the study was obtained from all participants and/or their parent(s)/legal guardian(s). All procedures were conducted according to the Helsinki Declaration’s outlined guidelines. The ethics number was No: 2025-066.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally to this work: Tao Qiu and Chaoran Hu.
Contributor Information
Fuli Wu, Email: fuliwu@zjut.edu.cn.
Xiangtao Liu, Email: hzmrlxt@163.com.
Mouyuan Sun, Email: sunmouyuan777@zju.edu.cn.
References
- 1.Liu, C. et al. Impact of orthodontic-induced facial morphology changes on aesthetic evaluation: a retrospective study. BMC Oral Health. 24, 24. 10.1186/s12903-023-03776-4 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Kouskoura, T., Ochsner, T., Verna, C., Pandis, N. & Kanavakis, G. The effect of orthodontic treatment on facial attractiveness: a systematic review and meta-analysis. Eur. J. Orthod.44, 636–649. 10.1093/ejo/cjac034 (2022). [DOI] [PubMed] [Google Scholar]
- 3.Zhang, Y., Wang, X., Xu, X., Feng, S. & Xia, L. The use of Eye-Tracking technology in Dento-Maxillofacial esthetics: A systematic review. J. Craniofac. Surg.35, e329–e333. 10.1097/scs.0000000000010008 (2024). [DOI] [PubMed] [Google Scholar]
- 4.Cai, J., Min, Z., Deng, Y., Jing, D. & Zhao, Z. Assessing the impact of occlusal plane rotation on facial aesthetics in orthodontic treatment: a machine learning approach. BMC Oral Health. 24, 30. 10.1186/s12903-023-03817-y (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Chen, H., Lin, L., Chen, J. & Huang, F. Prevalence of malocclusion traits in primary Dentition, 2010–2024: A systematic review. Healthcare12, 1321 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Lombardo, G. et al. Worldwide prevalence of malocclusion in the different stages of dentition: A systematic review and meta-analysis. Eur. J. Paediatr. Dent.21, 115–122. 10.23804/ejpd.2020.21.02.05 (2020). [DOI] [PubMed] [Google Scholar]
- 7.Yu, X. et al. Prevalence of malocclusion and occlusal traits in the early mixed dentition in Shanghai. China PeerJ. 7, e6630. 10.7717/peerj.6630 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Yin, J. et al. Prevalence and influencing factors of malocclusion in adolescents in Shanghai, China. BMC Oral Health. 23, 590. 10.1186/s12903-023-03187-5 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Ntovas, P. et al. The impact of dental midline angulation towards the facial flow curve on the esthetics of an asymmetric face: perspective of laypeople and orthodontists. J. Esthet Restor. Dent.36, 778–784. 10.1111/jerd.13177 (2024). [DOI] [PubMed] [Google Scholar]
- 10.Kukharev, G. A. & Kaziyeva, N. Digital facial anthropometry: application and implementation. Pattern Recognit. Image Anal.30, 496–511. 10.1134/s1054661820030141 (2020). [Google Scholar]
- 11.Dingemans, A. J. M. et al. PhenoScore quantifies phenotypic variation for rare genetic diseases by combining facial analysis with other clinical features using a machine-learning framework. Nat. Genet.55, 1598–1607. 10.1038/s41588-023-01469-w (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Jafargholkhanloo, A. F. & Shamsi, M. Cephalometry analysis of facial soft tissue based on two orthogonal views applicable for facial plastic surgeries. Multimedia Tools Appl.82, 30643–30668. 10.1007/s11042-023-14531-w (2023). [Google Scholar]
- 13.Putrino, A., Caputo, M., Galeotti, A., Marinelli, E. & Zaami, S. Type I dentin dysplasia: the literature review and case report of a family affected by misrecognition and late diagnosis. Med. (Kaunas). 59. 10.3390/medicina59081477 (2023). [DOI] [PMC free article] [PubMed]
- 14.Bin Ahmad, M. Z. et al. A case of linear scleroderma in a young child: A diagnosis easily missed in primary care. Am. J. Case Rep.24, e940148. 10.12659/ajcr.940148 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Rossi, O. et al. How to Establish the baseline for Non-Invasive technological regenerative esthetic medicine in the face and neck region: A literature review. J. Pers. Med.1510.3390/jpm15070283 (2025). [DOI] [PMC free article] [PubMed]
- 16.Al-baker, B., Alkalaly, A., Ayoub, A., Ju, X. & Mossey, P. Accuracy and reliability of automated three-dimensional facial landmarking in medical and biological studies. A systematic review. Eur. J. Orthod.45, 382–395. 10.1093/ejo/cjac077 (2023). [DOI] [PubMed] [Google Scholar]
- 17.Serafin, M. et al. Accuracy of automated 3D cephalometric landmarks by deep learning algorithms: systematic review and meta-analysis. Radiol. Med.128, 544–555. 10.1007/s11547-023-01629-2 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Navarro-Ballester, A. Artificial intelligence-driven radiological biomarkers: A narrative review of artificial intelligence in meningioma diagnosis. NeuroMarkers2, 100033. 10.1016/j.neumar.2024.100033 (2025). [Google Scholar]
- 19.Dogra, S., Kouznetsova, V. L., Kesari, S. & Tsigelny, I. F. Development of a miRNA-based deep learning model for autism spectrum disorder diagnosis. Adv. Technol. Neurosci.2, 72–76. 10.4103/atn.Atn-d-24-00033 (2025). [Google Scholar]
- 20.Martinez, B. & Peplow, P. V. MicroRNAs as potential biomarkers for diagnosis of major depressive disorder and influence of antidepressant treatment. NeuroMarkers1, 100001. 10.1016/j.neumar.2024.100001 (2024). [Google Scholar]
- 21.Yang, X., Xiong, T. & Li, S. Role of long noncoding RNAs in angiogenesis-related cerebrovascular disorders and regenerative medicine: a narrative review. Regenerative Med. Rep.1, 156–171. 10.4103/regenmed.Regenmed-d-24-00007 (2024). [Google Scholar]
- 22.Gao, Y., Mu, J., Liu, K. & Wang, M. Integrating molecular fingerprints with machine learning for accurate neurotoxicity prediction: an observational study. Adv. Technol. Neurosci.2, 109–115. 10.4103/atn.Atn-d-24-00034 (2025). [Google Scholar]
- 23.Aborode, A. T. et al. The role of machine learning in discovering biomarkers and predicting treatment strategies for neurodegenerative diseases: A narrative review. NeuroMarkers2, 100034. 10.1016/j.neumar.2024.100034 (2025). [Google Scholar]
- 24.Qi, C. R., Yi, L., Su, H. & Guibas, L. J. in Proceedings of the 31st International Conference on Neural Information Processing Systems 5105–5114. (Curran Associates Inc., Long Beach, California, USA, 2017).
- 25.Burger, J., Blandano, G., Facchi, G. M. & Lanzarotti, R. 2S-SGCN: A two-stage stratified graph convolutional network model for facial landmark detection on 3D data. Comput. Vis. Image Underst.250, 104227. 10.1016/j.cviu.2024.104227 (2025). [Google Scholar]
- 26.Wang, Y. et al. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 38, Article 146 (10.1145/3326362 (2019).
- 27.Lang, Y. et al. 444–452 (Springer Nature Switzerland).
- 28.Zhao, H., Jiang, L., Jia, J., Torr, P. & Koltun, V. in 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 16239–16248.
- 29.Berends, B. et al. Fully automated landmarking and facial segmentation on 3D photographs. Sci. Rep.14, 6463. 10.1038/s41598-024-56956-9 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.He, Z. et al. in 2024 IEEE International Symposium on Biomedical Imaging (ISBI). 1–5. [DOI] [PubMed]
- 31.Wang, Y., Cao, M., Fan, Z. & Peng, S. in Proceedings of the AAAI conference on artificial intelligence. 2595–2603.
- 32.Paulsen, R. R. et al. 706–719 (Springer International Publishing).
- 33.Lang, Y. et al. 478–487 (Springer International Publishing).
- 34.Wu, X. et al. in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4840–4851.
- 35.Sun, M. et al. Cuproptosis-related LncRNA JPX regulates malignant cell behavior and epithelial-immune interaction in head and neck squamous cell carcinoma via miR-193b-3p/PLAU axis. Int. J. Oral Sci.16, 63. 10.1038/s41368-024-00314-y (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36.Ronneberger, O., Fischer, P. & Brox, T. 234–241 (Springer International Publishing).
- 37.Wu, L. et al. A hierarchical attention model for social contextual image recommendation. IEEE Trans. Knowl. Data Eng.32, 1854–1867. 10.1109/TKDE.2019.2913394 (2020). [Google Scholar]
- 38.Wang, W., Chen, Z. & Hu, H. in Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence Vol. 33 Article 1099AAAI Press, Honolulu, Hawaii, USA, (2019).
- 39.Liu, Y. et al. Vision Transformers with hierarchical attention. Mach. Intell. Res.21, 670–683. 10.1007/s11633-024-1393-8 (2024). [Google Scholar]
- 40.Wang, X., Bo, L. & Fuxin, L. in 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 6970–6980.
- 41.Shorten, C. & Khoshgoftaar, T. M. A survey on image data augmentation for deep learning. J. Big Data. 6, 60. 10.1186/s40537-019-0197-0 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014).
- 43.Baksi, S., Freezer, S., Matsumoto, T. & Dreyer, C. Accuracy of an automated method of 3D soft tissue landmark detection. Eur. J. Orthod.43, 622–630. 10.1093/ejo/cjaa069 (2020). [DOI] [PubMed] [Google Scholar]
- 44.Jiang, Y. et al. Automatic identification of hard and soft tissue landmarks in cone-beam computed tomography via deep learning with diversity datasets: a methodological study. BMC Oral Health. 25, 505. 10.1186/s12903-025-05831-8 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Lee, Y. et al. Three-dimensional soft tissue landmark detection with marching cube algorithm. Sci. Rep.13, 1544. 10.1038/s41598-023-28792-w (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Zhou, Z. et al. in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 15475–15484.
- 47.Xia, J. et al. in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 4042–4051.
- 48.Micaelli, P., Vahdat, A., Yin, H., Kautz, J. & Molchanov, P. in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 22814–22825.
- 49.Huang, Y., Yang, H., Li, C., Kim, J. & Wei, F. in 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 3060–3070.
- 50.Zhu, H., Quan, Q., Yao, Q., Liu, Z. & Zhou, S. K. 24–34 (Springer Nature Switzerland).
- 51.Wu, H. et al. 155–165 (Springer Nature Switzerland).
- 52.Kim, Y. H., Lee, C., Ha, E. G., Choi, Y. J. & Han, S. S. A fully deep learning model for the automatic identification of cephalometric landmarks. Imaging Sci. Dent.51, 299–306. 10.5624/isd.20210077 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Jiao, Z. et al. Deep learning for automatic detection of cephalometric landmarks on lateral cephalometric radiographs using the mask Region-based convolutional neural network: a pilot study. Oral Surg. Oral Med. Oral Pathol. Oral Radiol.137, 554–562. 10.1016/j.oooo.2024.02.003 (2024). [DOI] [PubMed] [Google Scholar]
- 54.Dai, C. et al. 93–109 (Springer Nature Switzerland).
- 55.Chen, X. et al. Fast and accurate craniomaxillofacial landmark detection via 3D faster R-CNN. IEEE Trans. Med. Imaging. 40, 3867–3878. 10.1109/TMI.2021.3099509 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 56.Jiang, Y. et al. 227–237 (Springer Nature Switzerland).
- 57.Lu, G. et al. CMF-Net: craniomaxillofacial landmark localization on CBCT images using geometric constraint and transformer. Phys. Med. Biol.68, 095020. 10.1088/1361-6560/acb483 (2023). [DOI] [PubMed] [Google Scholar]
- 58.Charles, R. Q., Su, H., Kaichun, M. & Guibas, L. J. in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 77–85.
- 59.Sun, M. et al. Development and validation of a clinically applicable nomogram for predicting inferior alveolar nerve injury after mandibular cystectomy. J. Oral Maxillofac. Surg. 10.1016/j.joms.2025.09.019 [DOI] [PubMed]
- 60.Henn, A. D. et al. Smart biomanufacturing for health equity in regenerative medicine therapies. Regenerative Med. Rep.2, 31–35. 10.4103/regenmed.Regenmed-d-24-00021 (2025). [Google Scholar]
- 61.He, D. et al. Multi-omics and machine learning-driven CD8 + T cell heterogeneity score for head and neck squamous cell carcinoma. Mol. Therapy Nucleic Acids. 36, 102413. 10.1016/j.omtn.2024.102413 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 62.Chen, R. et al. Structure-Aware long Short-Term memory network for 3D cephalometric landmark detection. IEEE Trans. Med. Imaging. 41, 1791–1801. 10.1109/TMI.2022.3149281 (2022). [DOI] [PubMed] [Google Scholar]
- 63.Lang, Y. et al. Automatic localization of landmarks in craniomaxillofacial CBCT images using a local Attention-Based graph Convolution network. Med. Image Comput. Comput. Assist. Interv. 12264, 817–826. 10.1007/978-3-030-59719-1_79 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 64.Li, W. et al. 266–283 (Springer International Publishing).
- 65.Yu, X. et al. in 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 12478–12487.
- 66.Zhuang, Z. et al. A survey of point cloud completion. IEEE J. Sel. Top. Appl. Earth Observations Remote Sens.17, 5691–5711. 10.1109/JSTARS.2024.3362476 (2024). [Google Scholar]
- 67.Boutros, F., Huber, M., Siebke, P. & Rieber, T. & Damer, N. in 2022 IEEE International Joint Conference on Biometrics (IJCB). 1–11.
- 68.Boutros, F., Struc, V., Fierrez, J. & Damer, N. Synthetic data for face recognition: current state and future prospects. Image Vis. Comput.135, 104688. 10.1016/j.imavis.2023.104688 (2023). [Google Scholar]