. 2023 Sep 27;6:180. doi: 10.1038/s41746-023-00914-8

Table 3.

Outcomes of deep learning algorithms for the diagnosis of the five main categories of skin disease.

| Outcome | Accuracy (%), all studies | Accuracy (%), externally validated/tested studies | Sensitivity (%), all studies | Sensitivity (%), externally validated/tested studies | Specificity (%), all studies | Specificity (%), externally validated/tested studies |
|---|---|---|---|---|---|---|
| Inflammatory disorders (psoriasis, eczematous disorders, lichenoid disorders, immunobullous diseases, angioedema, urticaria, discoid lupus erythematosus, fixed drug eruption, papulosquamous eruptions) | | | | | | |
| Median (IQR) | 91.6 (80.0–95.8) | 82.7 (52.9–99.9) | 77.3 (63.3–92.0) | 58.1 (48.4–71.6) | 97.8 (94.8–99.3) | 99.5 (97.6–99.8) |
| Range | 35.0–100.0 | 35.0–100.0 | 35.3–99.6 | 35.3–91.7 | 71.6–100.0 | 95.7–100.0 |
| Number of studies | 30 | 6 | 47 | 12 | 40 | 10 |
| Follicular disorders of skin (acne, rosacea, hidradenitis suppurativa) | | | | | | |
| Median (IQR) | 93.0 (86.8–96.7) | 84.0 (n/a) | 87.4 (67.0–93.9) | 86.9 (62.8–91.9) | 96.9 (93.0–98.9) | 94.4 (n/a) |
| Range | 49.3–99.7 | 49.3–99.7 | 0.0–100.0 | 41.7–93.9 | 91.7–100.0 | 94.1–94.6 |
| Number of studies | 16 | 3 | 19 | 4 | 15 | 2 |
| Alopecia | | | | | | |
| Median (IQR) | 100.0 (n/a) | 100.0 (n/a) | 84.1 (83.0–94.1) | 94.1 (n/a) | 99.6 (n/a) | 99.3 (n/a) |
| Range | 100.0 | 100.0 | 82.0–100.0 | 88.1–100.0 | 99.3–99.8 | 99.3 |
| Number of studies | 2 | 2 | 5 | 2 | 2 | 1 |
| Pigmentary disorders (vitiligo, melasma) | | | | | | |
| Median (IQR) | 87.8 (80.4–99.0) | 98.0 (n/a) | 86.1 (73.7–91.9) | 79.4 (73.7–88.4) | 97.4 (80.2–98.8) | 98.6 (n/a) |
| Range | 75.0–100.0 | 75.0–100.0 | 71.9–97.2 | 72.4–92.9 | 79.4–98.9 | 98.5–98.8 |
| Number of studies | 5 | 3 | 8 | 4 | 6 | 2 |
| Skin infections (viral, bacterial, fungal, parasitic skin infections) | | | | | | |
| Median (IQR) | 87.5 (60.2–94.9) | 59.3 (50.0–73.7) | 76.9 (63.1–92.5) | 70.2 (55.8–80.3) | 98.9 (93.2–99.7) | 99.1 (97.4–99.8) |
| Range | 26.7–100.0 | 26.7–95.6 | 26.7–100.0 | 26.7–96.9 | 72.7–100.0 | 72.7–99.9 |
| Number of studies | 17 | 7 | 33 | 17 | 25 | 13 |

The five skin disease categories are inflammatory disorders, follicular disorders of skin, alopecia, pigmentary disorders and skin infections. Studies assessing multiple diseases are reported under each relevant disease category. Where a study reports multiple outcomes from variations of DL algorithms or datasets, only the best-performing result is presented. Outcomes for ‘externally validated/tested studies’ (i.e. studies that validate and/or test DL algorithms on datasets independent of the training dataset) are presented separately from ‘all studies’, as these studies are presumed to be at a lower risk of overfitting.

Deep learning (DL), interquartile range (IQR).
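The notes above describe two steps: each study contributes its best-performing accuracy, sensitivity or specificity, and these per-study values are pooled into a median (IQR), range and study count. The Python sketch below illustrates that pooling under stated assumptions. The metric formulas are the standard confusion-matrix definitions. The function `summarise`, its input layout and the `min_n_for_iqr` cut-off are hypothetical. Every IQR in Table 3 that rests on three or fewer studies is shown as n/a, so the sketch uses a cut-off of four, but the review does not state such a rule. The quartile method (`"inclusive"`) is also an assumption, because the source does not say how its IQRs were computed.

```python
from statistics import median, quantiles


def sensitivity(tp: int, fn: int) -> float:
    """Share of truly diseased cases the model labels as diseased, in %."""
    return 100.0 * tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly non-diseased cases the model labels as non-diseased, in %."""
    return 100.0 * tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases classified correctly, in %."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def summarise(results_per_study, min_n_for_iqr=4):
    """Pool one best-performing value per study into median (IQR), range and n.

    results_per_study: hypothetical layout, one list per study holding the
        values reported for each algorithm or dataset variant.
    min_n_for_iqr: hypothetical cut-off below which the IQR is reported as n/a.
    """
    # Keep only the best-performing result from each study.
    best = sorted(max(variants) for variants in results_per_study)
    n = len(best)
    mid = median(best)
    if n >= min_n_for_iqr:
        # Assumed quartile method; the source does not specify one.
        q1, _, q3 = quantiles(best, n=4, method="inclusive")
        median_iqr = f"{mid:.1f} ({q1:.1f}–{q3:.1f})"
    else:
        median_iqr = f"{mid:.1f} (n/a)"
    lo, hi = best[0], best[-1]
    # A range collapses to a single value when all studies agree.
    value_range = f"{lo:.1f}" if lo == hi else f"{lo:.1f}–{hi:.1f}"
    return {"median_iqr": median_iqr, "range": value_range, "n": n}


# Illustrative numbers only; these are not data from the review.
example = [[88.0, 91.5], [79.2], [95.0, 96.4, 93.1], [70.0], [99.0]]
print(summarise(example))
# {'median_iqr': '91.5 (79.2–96.4)', 'range': '70.0–99.0', 'n': 5}
```

A single-study column, such as the externally validated specificity for alopecia, would come out of this sketch as a median with "n/a" for the IQR and a one-value range. That matches how such cells are displayed in the table.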