Table 5.
Performance of AI models on internal and external test datasets
Author (Year) | Application | Imaging modality | AI software/deep learning model | Internal testing: test dataset | Internal testing: performance | External testing: test dataset | External testing: performance | Main findings |
---|---|---|---|---|---|---|---|---|
Orhan et al. (2020) 31 | Detection and measurement of apical lesions | CBCT | Diagnocat software | N/A | N/A | 109 scans from Eskisehir Osmangazi University, Faculty of Dentistry | SPE = 0.89, PPV = 0.95, F1 = 0.93 | Diagnocat achieved high performance on external validation, with no significant differences between the volumes measured by Diagnocat and by an OMF radiologist. |
Krois et al. (2021) 100 | Detection of apical lesions | Panoramic radiography | CNNs (U-Net and EfficientNet-B5) | Dataset A = 150 images from Charité, Berlin, Germany; Dataset B = 150 images from King George Medical University, Lucknow, India | Model trained with images from dataset A and tested on images from dataset A: SEN = 0.48, SPE = 1.0, PPV = 0.64, F1 = 0.54; model trained with images from datasets A&B and tested on images from dataset A: SEN = 0.48, SPE = 1.0, PPV = 0.57, F1 = 0.51; model trained with images from datasets A&B and tested on images from dataset B: SEN = 0.40, SPE = 1.0, PPV = 0.54, F1 = 0.46 | Dataset B = 150 images from King George Medical University, Lucknow, India | Model trained with images from dataset A and tested on images from dataset B: SEN = 0.22, SPE = 1.0, PPV = 0.63, F1 = 0.327 | The model trained with images acquired from one hospital achieved lower performance (especially lower sensitivity) when tested on images acquired from another hospital. |
Zadrożny et al. (2022) 79 | Multitasking, including identification of missing teeth, caries, fillings, prosthetic restorations, endodontically treated teeth, residual roots, apical lesions, and periodontal bone loss | Panoramic radiography | Diagnocat software | N/A | N/A | 30 images from the Dental and Maxillofacial Radiology Department, Medical University of Warsaw, Poland | Missing tooth: SEN = 0.96, SPE = 0.98; caries: SEN = 0.45, SPE = 0.98; filling: SEN = 0.83, SPE = 0.99; prosthesis: SEN = 0.96, SPE = 0.99; endo-treated tooth: SEN = 0.87, SPE = 0.99; residual root: SEN = 0.82, SPE = 1.00; apical lesion: SEN = 0.39, SPE = 0.98; periodontal bone loss: SEN = 0.80, SPE = 0.85 | Diagnocat achieved high performance on external validation in identifying missing teeth, fillings, prostheses, endodontically treated teeth, residual roots, and periodontal bone loss, but low sensitivity in identifying caries and apical lesions. |
Ezhov et al. (2021) 78 | Segmentation of teeth and jaws, numbering of teeth, detection of caries, periapical lesions, and periodontitis | CBCT | Diagnocat software | Cropped images from 562 scans taken using 19 scanners | Overall SEN = 0.92, SPE = 0.99 | 30 scans taken using three different scanners from three clinics | 12 dentists with/without the aid of Diagnocat: overall SEN = 0.85/0.77, SPE = 0.97/0.96 | The overall sensitivity of 12 dentists with the aid of Diagnocat on external images was lower than that of Diagnocat on internal images. |
De Angelis et al. (2022) 80 | Tooth numbering and detection of dental implants, prosthetic crowns, fillings, root remnants, and root canal treatment | Panoramic radiography | Promaton software | N/A | N/A | 120 images from the Department of Oral and Maxillofacial Sciences of Sapienza University of Rome, Italy | Overall AUC = 0.94, SEN = 0.89, SPE = 0.98, PPV = 0.94, NPV = 0.97 | Promaton achieved high overall performance on external validation. |
Nishiyama et al. (2021) 70 | Diagnosis of mandibular condyle fracture | Panoramic radiography | CNN (AlexNet) | 5-fold CV; Dataset A = 200 images from a university dental hospital; Dataset B = 200 images from a general hospital | Model trained with and tested on images from dataset A/B: AUC = 0.85/0.86, ACC = 0.80/0.81, SEN = 0.80/0.80, SPE = 0.79/0.82; model trained with images from datasets A&B and tested on images from dataset A/B: AUC = 0.89/0.91, ACC = 0.82/0.85, SEN = 0.83/0.85, SPE = 0.80/0.84 | Dataset A = 200 images from a university dental hospital; Dataset B = 200 images from a general hospital | Model trained with images from dataset A and tested on images from dataset B: AUC = 0.58, ACC = 0.59, SEN = 0.60, SPE = 0.58; model trained with images from dataset B and tested on images from dataset A: AUC = 0.58, ACC = 0.60, SEN = 0.61, SPE = 0.59 | The model trained with images acquired from one hospital achieved much lower diagnostic performance when tested on images acquired from another hospital. |
Jung et al. (2021) 57 | Segmentation of maxillary sinus lesions | CBCT | CNN (3D nnU-Net) | 20 scans from Korea University Anam Hospital | DSC (air) = 0.93, DSC (lesions) = 0.76 | 20 scans from Korea University Ansan Hospital | DSC (air) = 0.97, DSC (lesions) = 0.54 | The model achieved similar performance on external images in segmenting the air space of the sinus but much lower performance in segmenting the sinus lesions. |
3D, three-dimensional; ACC, accuracy; AI, artificial intelligence; AUC, area under the ROC curve; CBCT, cone-beam computed tomography; CNN, convolutional neural network; CV, cross-validation; DSC, Dice similarity coefficient; F1, F1-score; N/A, not available; NPV, negative predictive value; OMF, oral and maxillofacial; PPV, positive predictive value (Precision); SEN, sensitivity (Recall); SPE, specificity.
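
For readers who want to relate the metrics abbreviated above to one another, the following is a minimal Python sketch of their standard definitions in terms of confusion-matrix counts. The `Confusion` class, the function names, and the example counts are hypothetical and are not taken from any study in Table 5; only the SEN and PPV values passed to `f1_score` at the end come from the table (Krois et al., model trained on datasets A&B and tested on dataset B). AUC is omitted because it cannot be derived from a single set of counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical confusion-matrix counts (illustrative only)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def sensitivity(c: Confusion) -> float:
    """SEN (recall): share of actual positives that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """SPE: share of actual negatives that were correctly rejected."""
    return c.tn / (c.tn + c.fp)


def ppv(c: Confusion) -> float:
    """PPV (precision): share of positive calls that were correct."""
    return c.tp / (c.tp + c.fp)


def npv(c: Confusion) -> float:
    """NPV: share of negative calls that were correct."""
    return c.tn / (c.tn + c.fn)


def accuracy(c: Confusion) -> float:
    """ACC: share of all calls that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def f1_score(sen: float, ppv_value: float) -> float:
    """F1: harmonic mean of sensitivity (recall) and PPV (precision)."""
    return 2 * sen * ppv_value / (sen + ppv_value)


def dice(c: Confusion) -> float:
    """DSC over binary labels; algebraically identical to F1."""
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Hypothetical counts, chosen only to exercise the formulas.
example = Confusion(tp=40, fp=10, tn=45, fn=5)
example_f1 = f1_score(sensitivity(example), ppv(example))

# SEN and PPV as reported in Table 5 for Krois et al. (A&B -> B).
krois_f1 = f1_score(0.40, 0.54)

print(f"SEN={sensitivity(example):.2f} SPE={specificity(example):.2f} "
      f"PPV={ppv(example):.2f} NPV={npv(example):.2f} "
      f"ACC={accuracy(example):.2f} F1={example_f1:.2f} DSC={dice(example):.2f}")
print(f"F1 recomputed from reported SEN and PPV: {krois_f1:.2f}")
```

Because the table reports rounded values, an F1 recomputed from the listed SEN and PPV can differ from the listed F1 in the last digit for some rows; such small discrepancies do not by themselves indicate an error in the source studies.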