Table 1.
Author (Year) | Data Type | Dataset Size (Training/Test) | No. of Landmarks/Measurements | Algorithm | Performance |
---|---|---|---|---|---|
Payer et al. (2019) [28] | Lateral cephalograms | 150/250 | 19/0 | CNN | Error radii: 26.67% (2 mm), 21.24% (2.5 mm), 16.76% (3 mm), and 10.25% (4 mm). |
Nishimoto et al. (2019) [29] | Lateral cephalograms | 153/66 | 10/12 | CNN | Average prediction error: 17.02 pixels. Median prediction error: 16.22 pixels. |
Zhong et al. (2019) [30] | Lateral cephalograms | 150/100 (additional 150 images as validation set) | 19/0 | U-Net | Test 1: MRE: 1.12 ± 0.88 mm. SDR within 2, 2.5, 3, and 4 mm: 86.91%, 91.82%, 94.88%, and 97.90%, respectively. Test 2: MRE: 1.42 ± 0.84 mm. SDR within 2, 2.5, 3, and 4 mm: 76.00%, 82.90%, 88.74%, and 94.32%, respectively. |
Park et al. (2019) [31] | Lateral cephalograms | 1028/283 | 80/0 | YOLOv3, SSD | YOLOv3 demonstrated overall superiority over SSD in terms of accuracy and computational performance. For YOLOv3, SDR within 2, 2.5, 3, and 4 mm: 80.40%, 87.4%, 92.00%, and 96.2%, respectively. |
Moon et al. (2020) [32] | Lateral cephalograms | Training: 50, 100, 200, 400, 800, 1200, 1600, 2000. Test: 200. | 19, 40, 80 | CNN (YOLOv3) | AI accuracy was positively correlated with the size of the training dataset and negatively correlated with the number of detection targets. |
Hwang et al. (2020) [33] | Lateral cephalograms | 1028/283 | A total of 80 | CNN (YOLOv3) | Mean detection error: 1.46 ± 2.97 mm. |
Oh et al. (2020) [34] | Lateral cephalograms | 150/100 (additional 150 images as validation set) | 19/8 | CNN (DACFL) | MRE: 14.55 ± 8.22 pixels. SDR within 2, 2.5, 3, and 4 mm: 75.9%, 83.4%, 89.3%, and 94.7%, respectively. Classification accuracy: 83.94%. |
Kim et al. (2020) [35] | Lateral cephalograms | 1675/400 | 23/8 | Stacked hourglass deep learning model | Point-to-point error: 1.37 ± 1.79 mm. SCR: 88.43%. |
Kunz et al. (2020) [36] | Lateral cephalograms | 1792/50 | 18/12 | CNN | The CNN models showed almost no statistically significant differences from the human gold standard. |
Alqahtani et al. (2020) [37] | Lateral cephalograms | -/30 | 16/16 | Commercially available web-based platform (CephX, https://www.orca-ai.com/, accessed on 23 August 2023) | The results obtained from CephX and manual landmarking did not exhibit clinically significant differences. |
Lee et al. (2020) [38] | Lateral cephalograms | 150/250 | 19/8 | Bayesian CNN | Mean landmark error: 1.53 ± 1.74 mm. SDR within 2, 3, and 4 mm: 82.11%, 92.28%, and 95.95%, respectively. Classification accuracy: 72.69~84.74%. |
Yu et al. (2020) [39] | Lateral cephalograms | A total of 5890 | Four skeletal classification indicators. | Multimodal CNN | Sensitivity, specificity, and accuracy for vertical and sagittal skeletal classification: >90%. |
Li et al. (2020) [40] | Lateral cephalograms | 150/100 (additional 150 images as validation set) | 19/0 | GCN | MRE: 1.43 mm. SDR within 2, 2.5, 3, and 4 mm: 76.57%, 83.68%, 88.21%, and 94.31%, respectively. |
Tanikawa et al. (2021) [41] | Lateral cephalograms | 1755/30 for each subgroup | 26/0 | CNN | Mean success rate: 85~91%. Mean identification error: 1.32~1.50 mm. |
Zeng et al. (2021) [42] | Lateral cephalograms | 150/100 (additional 150 images as validation set) | 19/8 | CNN | MRE: 1.64 ± 0.91 mm. SDR within 2, 2.5, 3, and 4 mm: 70.58%, 79.53%, 86.05%, and 93.32%, respectively. SCR: 79.27%. |
Kim et al. (2021) [24] | Lateral cephalograms | 2610/100 (additional 440 images as validation set) | 20/0 | Cascade CNN | Overall detection error: 1.36 ± 0.98 mm. |
Hwang et al. (2021) [43] | Lateral cephalograms | 1983/200 | 19/8 | CNN (YOLOv3) | SDR within 2, 2.5, 3, and 4 mm: 75.45%, 83.66%, 88.92%, and 94.24%, respectively. SCR: 81.53%. |
Bulatova et al. (2021) [44] | Lateral cephalograms | -/110 | 16/0 | CNN (YOLOv3) (Ceppro software) | For 12 of the 16 landmarks, the absolute differences between AI and manual landmarking were not statistically significant. |
Jeon et al. (2021) [45] | Lateral cephalograms | -/35 | 16/26 | CNN | None of the measurements showed statistically significant differences except the saddle angle and the linear measurements from the maxillary incisor to the NA line and from the mandibular incisor to the NB line. |
Hong et al. (2022) [46] | Lateral cephalograms | 3004/184 | 20/ | Cascade CNN | Total mean error was 1.17 mm. Accuracy percentage: 74.2%. |
Le et al. (2022) [47] | Lateral cephalograms | 1193/100 | 41/8 | CNN (DACFL) | MRE: 1.87 ± 2.04 mm. SDR within 2, 2.5, 3, and 4 mm: 73.32%, 80.39%, 85.61%, and 91.68%, respectively. Average SCR: 83.75%. |
Mahto et al. (2022) [48] | Lateral cephalograms | -/30 | 18/12 | Commercially available web-based platform (WebCeph, https://webceph.com, accessed on 23 August 2023) | Intraclass correlation coefficient: 7 parameters >0.9 (excellent agreement), 5 parameters: 0.75~0.9 (good agreement). |
Uğurlu et al. (2022) [49] | Lateral cephalograms | 1360/180 (additional 140 images as validation set) | 21/0 | CNN (FARNet) | MRE: 3.4 ± 1.57 mm. SDR within 2, 2.5, 3, and 4 mm: 76.2%, 83.5%, 88.2%, and 93.4%, respectively. |
Yao et al. (2022) [50] | Lateral cephalograms | 312/100 (additional 100 images than validation set) | 37/0 | CNN | MRE: 1.038 ± 0.893 mm. SDR within 1, 1.5, 2, 2.5, 3, 3.5, 4 mm: 54.05%, 91.89%, 97.30%, 100%, 100%, 100%, respectively. |
Lu et al. (2022) [51] | Lateral cephalograms | 150/250 | 19/0 | GCN | MRE: 1.19 mm. SDR within 2, 2.5, 3, and 4 mm: 83.20%, 88.93%, 92.88%, and 97.07%, respectively. |
Tsolakis et al. (2022) [52] | Lateral cephalograms | -/100 | 16/18 | CNN (commercially available software: CS imaging V8). | Differences between the AI software (CS imaging V8) and manual landmarking were not clinically significant. |
Duran et al. (2023) [53] | Lateral cephalograms | -/50 | 32/18 | Commercially available web-based platforms (OrthoDx, https://orthodx.phimentum.com; WebCeph, https://webceph.com, accessed on 23 August 2023) | Consistency between AI software and manual landmarking: statistically significant and good for angular measurements; weak for linear measurements and soft tissue parameters. |
Ye et al. (2023) [54] | Lateral cephalograms | -/43 | 32/0 | Commercially available software (MyOrthoX, Angelalign, and Digident) | MRE: MyOrthoX: 0.97 ± 0.51 mm. Angelalign: 0.80 ± 0.26 mm. Digident: 1.11 ± 0.48 mm. SDR (%) (within 1/1.5/2 mm): MyOrthoX: 67.02 ± 10.23/82.80 ± 7.36/89.99 ± 5.17. Angelalign: 78.08 ± 14.23/89.29 ± 14.02/93.09 ± 13.64. Digident: 59.13 ± 10.36/78.72 ± 5.97/87.53 ± 4.84. |
Ueda et al. (2023) [55] | Lateral cephalometric data | A total of 220 | 0/8 | RF | Overall accuracy: 0.823 ± 0.060. |
Bao et al. (2023) [56] | Reconstructed lateral cephalograms from CBCT | -/85 | 19/23 | Commercially available software (Planmeca Romexis 6.2) | For landmarks: MRE: 2.07 ± 1.35 mm. SDR within 1, 2, 2.5, 3, and 4 mm: 18.82%, 58.58%, 71.70%, 82.04%, and 91.39%, respectively. For measurements: rates of consistency within the 95% limits of agreement: 91.76~98.82%. |
Kim et al. (2021) [57] | Reconstructed posteroanterior cephalograms from CBCT | 345/85 | 23/0 | Multi-stage CNN | MRE: 2.23 ± 2.02 mm. SDR within 2 mm: 60.88%. |
Takeda et al. (2021) [58] | Posteroanterior cephalograms | 320/80 | 4/1 | CNN, RF | For the distance from the vertical reference line to menton, the CNN showed a higher coefficient of determination and a lower mean absolute error than RF. The CNN with a stochastic gradient descent optimizer performed best. |
Lee et al. (2019) [59] | CBCT | 20/7 | 7 | Deep learning | Average point-to-point error: 1.5 mm. |
Torosdagli et al. (2019) [60] | CBCT | A total of 50 | 9/0 | Deep geodesic learning | Errors in the pixel space: <3 pixels for all landmarks. |
Yun et al. (2020) [61] | CBCT | 230/25 | 93/0 | CNN | Average point-to-point error: 3.63 mm. |
Kang et al. (2021) [62] | CT | 20/8 | 16/0 | Multi-stage DRL | Mean detection error: 1.96 ± 0.78. SDR within 2, 2.5, 3, and 4 mm: 58.99%, 75.39%, 86.52%, and 95.70%, respectively. |
Ghowsi et al. (2022) [63] | CBCT | -/100 | 53/0 | Commercially available software (Stratovan Corporation) | Mean absolute error: 1.57 mm. Mean error distance: 3.19 ± 2.6 mm. SDR within 2, 2.5, 3, and 4 mm: 35%, 48%, 59%, and 75%, respectively. |
Dot et al. (2022) [64] | CT | 128/38 (additional 32 images as validation set) | 33/15 | SCN | For landmarks: MRE: 1.0 ± 1.3 mm. SDR within 2, 2.5, and 3 mm: 90.4%, 93.6%, and 95.4%, respectively. For measurements: Mean errors: −0.3 ± 1.3° (angular), −0.1 ± 0.7 mm (linear). |
Blum et al. (2023) [65] | CBCT | 931/114 | 35/0 | CNN | Mean error: 2.73 mm. |
MRE, mean radial error; SDR, success detection rate; AI, artificial intelligence; CNN, convolutional neural network; YOLOv3, You-Only-Look-Once version 3; SSD, Single-Shot Multibox Detector; SCR, success classification rate; DACFL, deep anatomical context feature learning; CBCT, cone-beam computed tomography; GCN, graph convolutional network; FARNet, feature aggregation and refinement network; RF, random forest; DRL, deep reinforcement learning; CT, computed tomography; SCN, SpatialConfiguration-Net.
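Most studies in Table 1 report MRE (often as mean ± SD) and SDR at several radii. The sketch below shows how these two metrics can be computed from paired predicted and reference landmark coordinates. It makes several assumptions. Coordinates are 2D pixel positions, and a single isotropic pixel spacing is supplied by the caller through the hypothetical parameter `mm_per_pixel`. The "±" term is taken to be the sample standard deviation. "Within z mm" is treated as an inclusive (≤) test. Individual studies may define any of these differently. The function names are illustrative only and are not taken from any cited work.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) landmark position in pixels


def radial_errors(pred: Sequence[Point], truth: Sequence[Point],
                  mm_per_pixel: float = 1.0) -> List[float]:
    """Euclidean (radial) error per landmark, scaled to mm by an assumed pixel spacing."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("need equal-length, non-empty landmark lists")
    return [math.dist(p, t) * mm_per_pixel for p, t in zip(pred, truth)]


def mean_radial_error(errors: Sequence[float]) -> Tuple[float, float]:
    """Return (MRE, SD); the SD here is the sample standard deviation (an assumption)."""
    sd = statistics.stdev(errors) if len(errors) > 1 else 0.0
    return statistics.mean(errors), sd


def success_detection_rate(errors: Sequence[float],
                           thresholds: Sequence[float] = (2.0, 2.5, 3.0, 4.0)
                           ) -> Dict[float, float]:
    """Percentage of landmarks whose radial error is <= each threshold (in mm)."""
    n = len(errors)
    return {z: 100.0 * sum(e <= z for e in errors) / n for z in thresholds}


if __name__ == "__main__":
    # Hypothetical predictions vs. reference annotations for four landmarks.
    pred = [(10, 10), (50, 50), (100, 100), (200, 200)]
    truth = [(13, 14), (50, 50), (100, 122), (200, 235)]
    errs = radial_errors(pred, truth, mm_per_pixel=0.1)  # assumed 0.1 mm/pixel
    mre, sd = mean_radial_error(errs)
    print(f"MRE: {mre:.2f} ± {sd:.2f} mm")
    print(success_detection_rate(errs))  # {2.0: 50.0, 2.5: 75.0, 3.0: 75.0, 4.0: 100.0}
```

In practice, radial errors are pooled over all landmarks and test images before averaging. Results reported in pixels (e.g., Nishimoto et al. [29] and Oh et al. [34]) can be compared with the millimetre-based MRE and SDR values only once the image pixel spacing is known.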