2023 Mar 21;239(4):499–513. doi: 10.1159/000530225

Table 4.

Reader studies comparing AI models with human experts (e.g., dermatologists) and non-experts (e.g., dermatology residents, GPs)

Reference AI performance Expert performance Non-expert performance
Piccolo et al. (2002) [23] Sensitivity: 92%
Specificity: 74%
Sensitivity: 92%
Specificity: 99%
Sensitivity: 69%
Specificity: 94%
Chang et al. (2013) [25] Accuracy
Melanoma: 91%
Non-melanoma: 83%
Sensitivity: 86%
Specificity: 88%
Accuracy: 81%
Sensitivity: 83%
Specificity: 86%
Yu et al. (2018) [29] Accuracy: 82%
Sensitivity: 93%
Specificity: 72%
AUC: 0.80
Accuracy: 81%
Sensitivity: 97%
Specificity: 67%
AUC: 0.80
Accuracy: 65%
Sensitivity: 45%
Specificity: 84%
AUC: 0.65
Huang et al. (2020) [37] Sensitivity: 90%
AUC: 0.94
Sensitivity: 85%
Specificity: 90%
Sensitivity: 66%
Specificity: 72%
Han et al. (2020) [35] Sensitivity: 89%
Specificity: 78%
AUC: 0.92
Sensitivity: 95%
Specificity: 72%
AUC: 0.91
Accuracy
Dermatology resident: 94%
Non-dermatology clinician: 77%
Sensitivity
Dermatology resident: 69%
Non-dermatology clinician: 65%
AUC
Dermatology resident: 0.88
Non-dermatology clinician: 0.73
Fujisawa et al. (2019) [31] Accuracy
Binary: 92%
Multiclass: 75%
Accuracy
Binary: 85%
Multiclass: 60%
Accuracy
Binary: 74%
Multiclass: 42%
Jinnai et al. (2019) [38] Accuracy: 92%
Sensitivity: 83%
Specificity: 95%
Accuracy: 87%
Sensitivity: 86%
Specificity: 87%
Accuracy: 85%
Sensitivity: 84%
Specificity: 86%
Zhao et al. (2019) [32] Sensitivity
Benign: 90%
Low risk: 90%
High risk: 75%
Sensitivity
Benign: 61%
Low risk: 50%
High risk: 64%
Cho et al. (2020) [33] Sensitivity
Dataset 1: 76%
Dataset 2: 70%
Specificity
Dataset 1: 80%
Dataset 2: 76%
AUC
Dataset 1: 0.83
Dataset 2: 0.77
Sensitivity
-Without algorithm: 90%
-With algorithm: 90%
Specificity
-Without algorithm: 58%
-With algorithm: 61%
Sensitivity
Dermatology resident
-Without algorithm: 80%
-With algorithm: 85%
Non-dermatology clinician
-Without algorithm: 65%
-With algorithm: 74%
Specificity
Dermatology resident
-Without algorithm: 53%
-With algorithm: 71%
Non-dermatology clinician
-Without algorithm: 46%
-With algorithm: 49%
AUC
Dermatology resident
-Without algorithm: 0.33
-With algorithm: 0.42
Non-dermatology clinician
-Without algorithm: 0.11
-With algorithm: 0.23
Han et al. (2020) [36] Multiclass model
Accuracy
Top 1: 45%
Top 3: 69%
Top 5: 78%
Multiclass model
Accuracy (without algorithm)
Top 1: 50%
Top 3: 67%
Accuracy (with algorithm)
Top 1: 53%
Top 3: 74%
Binary model
Accuracy
-Without algorithm: 77%
-With algorithm: 85%
Han et al. (2020) [34] Binary model
Sensitivity: 67%
Specificity: 87%
Multiclass accuracy
Top 1: 50%
Top 3: 70%
Binary model
Sensitivity: 66%
Specificity: 67%
Multiclass accuracy
Top 1: 38%
Top 3: 53%
Li et al. (2020) [44] Accuracy
Binary: 73%
Multiclass: 86%
Accuracy
Binary: 83%
Multiclass: 74%
Liu et al. (2020) [39] Accuracy
Top 1: 66%
Top 3: 90%
Accuracy
Top 1: 63%
Top 3: 75%
Accuracy
Primary care physician
Top 1: 44%
Top 3: 60%
Nurse practitioner
Top 1: 40%
Top 3: 55%
Minagawa et al. (2021) [42] Accuracy: 71%
Accuracy: 90%