2023 Sep 27;6:180. doi: 10.1038/s41746-023-00914-8

Table 3.

Outcomes of deep learning algorithms for the diagnosis of the five main categories of skin disease.

Outcome	Accuracy (%)		Sensitivity (%)		Specificity (%)
	All studies	Externally validated/tested studies	All studies	Externally validated/tested studies	All studies	Externally validated/tested studies
Inflammatory disorders
(psoriasis, eczematous disorders, lichenoid disorders, immunobullous diseases, angioedema, urticaria, discoid lupus erythematosus, fixed drug eruption, papulosquamous eruptions)
Median (IQR)	91.6 (80.0–95.8)	82.7 (52.9–99.9)	77.3 (63.3–92.0)	58.1 (48.4–71.6)	97.8 (94.8–99.3)	99.5 (97.6–99.8)
Range	35.0–100.0	35.0–100.0	35.3–99.6	35.3–91.7	71.6–100.0	95.7–100.0
Number of studies	30	6	47	12	40	10
Follicular disorders of skin
(acne, rosacea, hidradenitis suppurativa)
Median (IQR)	93.0 (86.8–96.7)	84.0 (n/a)	87.4 (67.0–93.9)	86.9 (62.8–91.9)	96.9 (93.0–98.9)	94.4 (n/a)
Range	49.3–99.7	49.3–99.7	0.0–100.0	41.7–93.9	91.7–100.0	94.1–94.6
Number of studies	16	3	19	4	15	2
Alopecia
Median (IQR)	100.0 (n/a)	100.0 (n/a)	84.1 (83.0–94.1)	94.1 (n/a)	99.6 (n/a)	99.3 (n/a)
Range	100.0	100.0	82.0–100.0	88.1–100.0	99.3–99.8	99.3
Number of studies	2	2	5	2	2	1
Pigmentary disorders
(vitiligo, melasma)
Median (IQR)	87.8 (80.4–99.0)	98.0 (n/a)	86.1 (73.7–91.9)	79.4 (73.7–88.4)	97.4 (80.2–98.8)	98.6 (n/a)
Range	75.0–100.0	75.0–100.0	71.9–97.2	72.4–92.9	79.4–98.9	98.5–98.8
Number of studies	5	3	8	4	6	2
Skin infections
(viral, bacterial, fungal, parasitic skin infections)
Median (IQR)	87.5 (60.2–94.9)	59.3 (50.0–73.7)	76.9 (63.1–92.5)	70.2 (55.8–80.3)	98.9 (93.2–99.7)	99.1 (97.4–99.8)
Range	26.7–100.0	26.7–95.6	26.7–100.0	26.7–96.9	72.7–100.0	72.7–99.9
Number of studies	17	7	33	17	25	13

The five skin disease categories are inflammatory disorders, follicular disorders of skin, alopecia, pigmentary disorders and skin infections. Studies assessing multiple diseases are reported under each relevant disease category. Where a study reports multiple outcomes from different DL algorithms or datasets, the best-performing result is presented. Outcomes for ‘externally validated/tested studies’ (i.e. studies in which DL algorithms were validated and/or tested on datasets independent of the training dataset) are presented separately from ‘all studies’, as these studies are presumed to be at lower risk of overfitting.

Deep learning (DL), interquartile range (IQR).
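
As an illustration of how summary rows like those in the table could be derived, the following is a minimal Python sketch, not the authors' code. It keeps the best-performing value per study, as the table note describes, and then computes the median, IQR, range and number of studies. The function names `best_per_study` and `summarize`, the sample values, and the inclusive quartile method are assumptions; the review does not state which quartile definition was used. Returning no IQR for fewer than four studies is inferred from the table, where an IQR appears only when at least four studies contribute.

```python
from statistics import median, quantiles


def best_per_study(results):
    """Keep the highest value reported by each study.

    `results` is a list of (study_id, value) pairs; a study may appear
    several times if it reports several algorithms or datasets.
    """
    best = {}
    for study_id, value in results:
        if study_id not in best or value > best[study_id]:
            best[study_id] = value
    return best


def summarize(values):
    """Return median, IQR, range and count for a list of percentages.

    The IQR is None when fewer than four values are available
    (assumption inferred from the n/a entries in the table).
    """
    values = sorted(values)
    n = len(values)
    if n >= 4:
        # Inclusive quartiles are an assumption, not the paper's stated method.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = (q1, q3)
    else:
        iqr = None
    return {
        "median": median(values),
        "iqr": iqr,
        "range": (values[0], values[-1]),
        "n": n,
    }


# Hypothetical sensitivities (%); study "A" reports two model variants.
reported = [("A", 80.0), ("A", 90.0), ("B", 70.0), ("C", 95.0), ("D", 85.0)]
summary = summarize(best_per_study(reported).values())
```

With a different quartile convention the IQR bounds would shift slightly, especially for small numbers of studies, so the sketch should not be expected to reproduce the published values exactly.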