. 2022 Dec 12;52(1):20220335. doi: 10.1259/dmfr.20220335

Table 5.

Performance of AI models on internal and external test datasets

Author (Year)	Application	Imaging modality	AI software/ deep learning model	Internal testing: test dataset	Internal testing: performance	External testing: test dataset	External testing: performance	Main findings
Orhan et al. (2020)³¹	Detection and measurement of apical lesions	CBCT	Diagnocat software	N/A	N/A	109 scans from Eskisehir Osmangazi University, Faculty of Dentistry	SPE = 0.89, PPV = 0.95, F1 = 0.93	Diagnocat achieved high performance on external validation, with no significant differences in the volumes measured by Diagnocat and by an OMF radiologist.
Krois et al. (2021)¹⁰⁰	Detection of apical lesions	Panoramic radiography	CNNs (U-Net and EfficientNet-B5)	Dataset A = 150 images from Charité, Berlin, Germany; Dataset B = 150 images from King George Medical University, Lucknow, India	Model trained with images from dataset A and tested on images from dataset A: SEN = 0.48, SPE = 1.0, PPV = 0.64, F1 = 0.54; Model trained with images from datasets A&B and tested on images from dataset A: SEN = 0.48, SPE = 1.0, PPV = 0.57, F1 = 0.51; Model trained with images from datasets A&B and tested on images from dataset B: SEN = 0.40, SPE = 1.0, PPV = 0.54, F1 = 0.46	Dataset B = 150 images from King George Medical University, Lucknow, India	Model trained with images from dataset A and tested on images from dataset B: SEN = 0.22, SPE = 1.0, PPV = 0.63, F1 = 0.327	The model trained with images acquired from one hospital achieved lower performance (especially lower sensitivity) when tested on images acquired from another hospital.
Zadrożny et al. (2022)⁷⁹	Multitasking including identification of missing teeth, caries, fillings, prosthetic restorations, endodontically treated teeth, residual roots, apical lesions, and periodontal bone loss	Panoramic radiography	Diagnocat software	N/A	N/A	30 images from the Dental and Maxillofacial Radiology Department, Medical University of Warsaw, Poland	Missing tooth: SEN = 0.96, SPE = 0.98; Caries: SEN = 0.45, SPE = 0.98; Filling: SEN = 0.83, SPE = 0.99; Prosthesis: SEN = 0.96, SPE = 0.99; Endo-treated tooth: SEN = 0.87, SPE = 0.99; Residual root: SEN = 0.82, SPE = 1.00; Apical lesion: SEN = 0.39, SPE = 0.98; Periodontal bone loss: SEN = 0.80, SPE = 0.85	Diagnocat achieved high performance on external validation in identifying missing teeth, fillings, prostheses, endodontically treated teeth, residual roots, and periodontal bone loss, but low sensitivities for identifying caries and apical lesions.
Ezhov et al. (2021)⁷⁸	Segmentation of teeth and jaws, numbering of teeth, detection of caries, periapical lesions, and periodontitis	CBCT	Diagnocat software	Cropped images from 562 scans taken using 19 scanners	Overall SEN = 0.92, SPE = 0.99	30 scans taken using three different scanners from three clinics	12 dentists with/without the aid of Diagnocat: Overall SEN = 0.85/0.77, SPE = 0.97/0.96	The overall sensitivity of 12 dentists with the aid of Diagnocat on external images was lower than that of Diagnocat on internal images.
De Angelis et al. (2022)⁸⁰	Tooth numbering and detection of dental implants, prosthetic crowns, fillings, root remnants, and root canal treatment	Panoramic radiography	Promaton software	N/A	N/A	120 images from the Department of Oral and Maxillofacial Sciences of Sapienza University of Rome, Italy	Overall AUC = 0.94, SEN = 0.89, SPE = 0.98, PPV = 0.94, NPV = 0.97	Promaton achieved high overall performance on external validation.
Nishiyama et al. (2021)⁷⁰	Diagnosis of mandibular condyle fracture	Panoramic radiography	CNN (AlexNet)	5-fold CV; Dataset A = 200 images from a university dental hospital; Dataset B = 200 images from a general hospital	Model trained with and tested on images from dataset A/B: AUC = 0.85/0.86, ACC = 0.80/0.81, SEN = 0.80/0.80, SPE = 0.79/0.82; Model trained with images from datasets A&B and tested on images from dataset A/B: AUC = 0.89/0.91, ACC = 0.82/0.85, SEN = 0.83/0.85, SPE = 0.80/0.84	Dataset A = 200 images from a university dental hospital; Dataset B = 200 images from a general hospital	Model trained with images from dataset A and tested on images from dataset B: AUC = 0.58, ACC = 0.59, SEN = 0.60, SPE = 0.58; Model trained with images from dataset B and tested on images from dataset A: AUC = 0.58, ACC = 0.60, SEN = 0.61, SPE = 0.59	The model trained with images acquired from one hospital achieved much lower diagnostic performance when tested on images acquired from another hospital.
Jung et al. (2021)⁵⁷	Segmentation of maxillary sinus lesions	CBCT	CNN (3D nnU-Net)	20 scans from Korea University Anam Hospital	DSC (air) = 0.93, DSC (lesions) = 0.76	20 scans from Korea University Ansan Hospital	DSC (air) = 0.97, DSC (lesions) = 0.54	The model achieved similar performance on external images in segmenting the air space of the sinus but much lower performance in segmenting the sinus lesions.

3D, three-dimensional; ACC, accuracy; AI, artificial intelligence; AUC, area under the ROC curve; CBCT, cone-beam computed tomography; CNN, convolutional neural network; CV, cross-validation; DSC, Dice similarity coefficient; F1, F1-score; N/A, not available; NPV, negative predictive value; OMF, oral and maxillofacial; PPV, positive predictive value (Precision); SEN, sensitivity (Recall); SPE, specificity.
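
For readers less familiar with the metrics abbreviated above, the following minimal Python sketch shows how SEN, SPE, PPV, F1, and DSC are conventionally defined. It is an illustration under standard textbook definitions, not the evaluation code of any study in the table. The function names and the confusion-matrix counts in the usage lines are hypothetical and chosen only for demonstration.

```python
def sensitivity(tp, fn):
    """SEN (recall): share of true positives among all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """SPE: share of true negatives among all actual negatives."""
    return tn / (tn + fp)


def ppv(tp, fp):
    """PPV (precision): share of true positives among all positive calls."""
    return tp / (tp + fp)


def f1_from_ppv_sen(ppv_value, sen_value):
    """F1-score: harmonic mean of PPV and SEN."""
    return 2 * ppv_value * sen_value / (ppv_value + sen_value)


def dice(pred_voxels, true_voxels):
    """DSC for two segmentations given as sets of voxel indices."""
    overlap = len(pred_voxels & true_voxels)
    return 2 * overlap / (len(pred_voxels) + len(true_voxels))


# Hypothetical counts, for illustration only (not taken from any study).
tp, fp, fn, tn = 8, 2, 2, 88
sen = sensitivity(tp, fn)
spe = specificity(tn, fp)
precision = ppv(tp, fp)
f1 = f1_from_ppv_sen(precision, sen)

# F1 recomputed from the rounded PPV and SEN reported for Krois et al.
# (external testing: PPV = 0.63, SEN = 0.22).
krois_external_f1 = f1_from_ppv_sen(0.63, 0.22)

# Two toy segmentations sharing two of their three voxels each.
toy_dsc = dice({1, 2, 3}, {2, 3, 4})
```

Recomputing F1 from the rounded PPV and SEN of the external test in Krois et al. gives a value close to the reported 0.327; the small gap is presumably due to rounding of the published inputs.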