2023 Mar 21;239(4):499–513. doi: 10.1159/000530225

Table 5. Summary of the quality assessment of the AI models reviewed

| Domain | Item | Checklist item | Fully addressed, n (%) | Partially addressed, n (%) | Not addressed, n (%) |
| --- | --- | --- | --- | --- | --- |
| Data | 1 | Image types | 21 (95) | 1 (5) | 0 (0) |
| Data | 2 | Image artifacts | 12 (55) | 5 (23) | 5 (23) |
| Data | 3 | Technical acquisition details | 22 (100) | 0 (0) | 0 (0) |
| Data | 4 | Pre-processing procedures | 20 (91) | 0 (0) | 2 (9) |
| Data | 5 | Synthetic images made public if used | 22 (100)^a | 0 (0) | 0 (0) |
| Data | 6 | Public images adequately referenced | 22 (100) | 0 (0) | 0 (0) |
| Data | 7 | Patient-level metadata | 5 (23) | 17 (77) | 0 (0) |
| Data | 8 | Skin tone information and procedure by which skin tone was assessed | 3 (14) | 16 (73) | 3 (14) |
| Data | 9 | Potential biases that may arise from use of patient information and metadata | 9 (41) | 7 (32) | 6 (27) |
| Data | 10 | Dataset partitions | 12 (55) | 9 (41) | 1 (5) |
| Data | 11 | Sample sizes of training, validation, and test sets | 7 (32) | 14 (64) | 1 (5) |
| Data | 12 | External test set | 3 (14) | 2 (9) | 17 (77) |
| Data | 13 | Multivendor images | 20 (91) | 2 (9) | 0 (0) |
| Data | 14 | Class distribution and balance | 5 (23) | 15 (68) | 2 (9) |
| Data | 15 | OOD images | 2 (9) | 7 (32) | 13 (59) |
| Technique | 16 | Labeling method (ground truth, who did it) | 15 (68) | 7 (32) | 0 (0) |
| Technique | 17 | References to common/accepted diagnostic labels | 22 (100) | 0 (0) | 0 (0) |
| Technique | 18 | Histopathologic review for malignancies | 16 (73) | 2 (9) | 4 (18) |
| Technique | 19 | Detailed description of algorithm development | 14 (64) | 6 (27) | 2 (9) |
| Technical assessment | 20 | How to publicly evaluate algorithm | 5 (23) | 0 (0) | 17 (77) |
| Technical assessment | 21 | Performance measures | 9 (41) | 13 (59) | 0 (0) |
| Technical assessment | 22 | Benchmarking, technical comparison, and novelty | 15 (68) | 0 (0) | 7 (32) |
| Technical assessment | 23 | Bias assessment | 10 (45) | 6 (27) | 6 (27) |
| Application | 24 | Use cases and target conditions (inside distribution) | 16 (73) | 6 (27) | 0 (0) |
| Application | 25 | Potential impacts on the healthcare team and patients | 3 (14) | 13 (59) | 6 (27) |

^a No studies included synthetic images (checklist item 5); the item was therefore marked as "fully addressed" so as not to negatively impact the quality score.

OOD, out of distribution.
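The percentages in the table can be reproduced from the raw counts. A minimal sketch, assuming N = 22 reviewed studies (as implied by the rows reading 22 (100)) and rounding to the nearest integer, so some rows sum to 99–101%:

```python
N = 22  # assumed total number of AI models/studies reviewed

def pct(count: int, n: int = N) -> int:
    """Percentage rounded to the nearest integer, as reported in the table."""
    return round(100 * count / n)

# Example: checklist item 11 (sample sizes of training, validation, and test sets)
row = {"fully": 7, "partially": 14, "not": 1}
formatted = {k: f"{v} ({pct(v)})" for k, v in row.items()}
print(formatted)  # counts with their rounded percentages
```

Running this on item 11 yields 7 (32), 14 (64), and 1 (5), matching the table.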