2022 Dec 12;52(1):20220335. doi: 10.1259/dmfr.20220335

Table 5.

Performance of AI models on internal and external test datasets

Author (Year) | Application | Imaging modality | AI software / deep learning model | Internal testing: test dataset | Internal testing: performance | External testing: test dataset | External testing: performance | Main findings
Orhan et al.
(2020) 31
Detection and measurement of apical lesions | CBCT | Diagnocat software | N/A | N/A | 109 scans from Eskisehir Osmangazi University, Faculty of Dentistry | SPE = 0.89
PPV = 0.95
F1 = 0.93
Diagnocat achieved high performance on external validation, with no significant difference between the volumes measured by Diagnocat and by an OMF radiologist.
Krois et al.
(2021) 100
Detection of apical lesions | Panoramic radiography | CNNs (U-Net and EfficientNet-B5) | Dataset A = 150 images from Charité, Berlin, Germany
Dataset B = 150 images from King George Medical University, Lucknow, India
Model trained with images from dataset A and tested on images from dataset A
SEN = 0.48
SPE = 1.0
PPV = 0.64
F1 = 0.54
Model trained with images from datasets A&B and tested on images from dataset A
SEN = 0.48
SPE = 1.0
PPV = 0.57
F1 = 0.51
Model trained with images from datasets A&B and tested on images from dataset B
SEN = 0.40
SPE = 1.0
PPV = 0.54
F1 = 0.46
Dataset B = 150 images from King George Medical University, Lucknow, India | Model trained with images from dataset A and tested on images from dataset B
SEN = 0.22
SPE = 1.0
PPV = 0.63
F1 = 0.327
The model trained with images acquired from one hospital achieved lower performance (especially lower sensitivity) when tested on images acquired from another hospital.
Zadrożny et al.
(2022) 79
Multitasking, including identification of missing teeth, caries, fillings, prosthetic restorations, endodontically treated teeth, residual roots, apical lesions, and periodontal bone loss | Panoramic radiography | Diagnocat software | N/A | N/A | 30 images from the Dental and Maxillofacial Radiology Department, Medical University of Warsaw, Poland | Missing tooth
SEN = 0.96, SPE = 0.98
Caries
SEN = 0.45, SPE = 0.98
Filling
SEN = 0.83, SPE = 0.99
Prosthesis
SEN = 0.96, SPE = 0.99
Endo-treated tooth
SEN = 0.87, SPE = 0.99
Residual root
SEN = 0.82, SPE = 1.00
Apical lesion
SEN = 0.39, SPE = 0.98
Periodontal bone loss
SEN = 0.80, SPE = 0.85
Diagnocat achieved high performance on external validation in identifying missing teeth, fillings, prostheses, endodontically treated teeth, residual roots, and periodontal bone loss, but low sensitivity for caries and apical lesions.
Ezhov et al.
(2021) 78
Segmentation of teeth and jaws, numbering of teeth, detection of caries, periapical lesions, and periodontitis | CBCT | Diagnocat software | Cropped images from 562 scans taken using 19 scanners | Overall
SEN = 0.92
SPE = 0.99
30 scans taken using three different scanners from three clinics | 12 dentists with/without the aid of Diagnocat
Overall
SEN = 0.85/0.77
SPE = 0.97/0.96
The overall sensitivity of 12 dentists with the aid of Diagnocat on external images was lower than that of Diagnocat on internal images.
De Angelis et al.
(2022) 80
Tooth numbering and detection of dental implants, prosthetic crowns, fillings, root remnants, and root canal treatment | Panoramic radiography | Promaton software | N/A | N/A | 120 images from the Department of Oral and Maxillofacial Sciences of Sapienza University of Rome, Italy | Overall
AUC = 0.94
SEN = 0.89
SPE = 0.98
PPV = 0.94
NPV = 0.97
Promaton achieved high overall performance on external validation.
Nishiyama et al.
(2021) 70
Diagnosis of mandibular condyle fracture | Panoramic radiography | CNN (AlexNet) | 5-fold CV
Dataset A = 200 images from a university dental hospital
Dataset B = 200 images from a general hospital
Model trained with and tested on images from dataset A/B
AUC = 0.85/0.86
ACC = 0.80/0.81
SEN = 0.80/0.80
SPE = 0.79/0.82
Model trained with images from datasets A&B and tested on images from dataset A/B
AUC = 0.89/0.91
ACC = 0.82/0.85
SEN = 0.83/0.85
SPE = 0.80/0.84
Dataset A = 200 images from a university dental hospital
Dataset B = 200 images from a general hospital
Model trained with images from dataset A and tested on images from dataset B
AUC = 0.58
ACC = 0.59
SEN = 0.60
SPE = 0.58
Model trained with images from dataset B and tested on images from dataset A
AUC = 0.58
ACC = 0.60
SEN = 0.61
SPE = 0.59
The model trained with images acquired from one hospital achieved much lower diagnostic performance when tested on images acquired from another hospital.
Jung et al.
(2021) 57
Segmentation of maxillary sinus lesions | CBCT | CNN (3D nnU-Net) | 20 scans from Korea University Anam Hospital | DSC (air) = 0.93
DSC (lesions) = 0.76
20 scans from Korea University Ansan Hospital | DSC (air) = 0.97
DSC (lesions) = 0.54
The model achieved similar performance on external images in segmenting the air space of the sinus, but much lower performance in segmenting the sinus lesions.

3D, three-dimensional; ACC, accuracy; AI, artificial intelligence; AUC, area under the ROC curve; CBCT, cone-beam computed tomography; CNN, convolutional neural network; CV, cross-validation; DSC, Dice similarity coefficient; F1, F1-score; N/A, not available; NPV, negative predictive value; OMF, oral and maxillofacial; PPV, positive predictive value (Precision); SEN, sensitivity (Recall); SPE, specificity.
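
For readers who want to relate the abbreviated metrics to one another, the following is a minimal Python sketch of their standard definitions, assuming binary confusion-matrix counts and non-zero denominators. The `Confusion` class, the function names, and the example counts are hypothetical and do not come from the reviewed studies; only the SEN, PPV, AUC, and DSC values passed in the last lines are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (hypothetical helper, not from the review)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def sensitivity(c: Confusion) -> float:
    """SEN (recall): share of actual positives that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    """SPE: share of actual negatives that were correctly rejected."""
    return c.tn / (c.tn + c.fp)


def ppv(c: Confusion) -> float:
    """PPV (precision): share of positive calls that were correct."""
    return c.tp / (c.tp + c.fp)


def npv(c: Confusion) -> float:
    """NPV: share of negative calls that were correct."""
    return c.tn / (c.tn + c.fn)


def accuracy(c: Confusion) -> float:
    """ACC: share of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def f1_score(ppv_value: float, sen_value: float) -> float:
    """F1: harmonic mean of PPV and SEN."""
    return 2 * ppv_value * sen_value / (ppv_value + sen_value)


def dice(a: set, b: set) -> float:
    """DSC between two segmentations given as sets of voxel indices."""
    return 2 * len(a & b) / (len(a) + len(b))


def external_drop(internal: float, external: float) -> float:
    """Absolute drop of a metric from internal to external testing."""
    return round(internal - external, 2)


# Hypothetical counts, for illustration only.
example = Confusion(tp=8, fp=2, tn=85, fn=5)
print(round(sensitivity(example), 3), round(ppv(example), 3))

# Values taken from the table above.
print(round(f1_score(0.63, 0.22), 3))   # Krois et al., trained on A, tested on B
print(external_drop(0.85, 0.58))        # Nishiyama et al., AUC, trained on A
print(external_drop(0.76, 0.54))        # Jung et al., DSC (lesions)
```

Recomputing F1 from the rounded SEN and PPV in the table gives about 0.326, close to the reported 0.327; small differences of this kind are expected when the published F1 was derived from unrounded values.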