Published in final edited form as: Nature. 2017 Jan 25;542(7639):115–118. doi: 10.1038/nature21056

Extended Data Table 2 | General validation results

a.
Classifier         Three-way accuracy
Dermatologist 1    65.6%
Dermatologist 2    66.0%
CNN                69.4 ± 0.8%
CNN-PA             72.1 ± 0.9%

b.
Classifier         Nine-way accuracy
Dermatologist 1    53.3%
Dermatologist 2    55.0%
CNN                48.9 ± 1.9%
CNN-PA             55.4 ± 1.7%
c. Disease classes: three-way classification

0. Benign single lesions
1. Malignant single lesions
2. Non-neoplastic lesions

d. Disease classes: nine-way classification

0. Cutaneous lymphoma and lymphoid infiltrates
1. Benign dermal tumors, cysts, sinuses
2. Malignant dermal tumors
3. Benign epidermal tumors, hamartomas, milia, and growths
4. Malignant and premalignant epidermal tumors
5. Genodermatoses and supernumerary growths
6. Inflammatory conditions
7. Benign melanocytic lesions
8. Malignant melanoma

Here we show ninefold cross-validation classification accuracy on 127,463 images organized according to two different partitioning strategies. In each fold, a different ninth of the dataset is used for validation and the rest for training. Reported values are the mean and standard deviation of the validation accuracy across all n = 9 folds. These images are labelled by dermatologists, not necessarily through biopsy, so this metric is less rigorous than one computed on biopsy-proven images; we therefore compare against two dermatologists only as a check that the algorithm is learning relevant information. a, Three-way classification accuracy comparison between algorithms and dermatologists. The dermatologists are tested on 180 random images from the validation set (60 per class). The three classes used are the first-level nodes of our taxonomy. A CNN trained directly on these three classes achieves inferior performance to one trained with our partitioning algorithm (PA). b, Nine-way classification accuracy comparison between algorithms and dermatologists. The dermatologists are tested on 180 random images from the validation set (20 per class). The nine classes used are the second-level nodes of our taxonomy. A CNN trained directly on these nine classes likewise achieves inferior performance to one trained with our partitioning algorithm. c, Disease classes used for the three-way classification represent highly general disease classes. d, Disease classes used for the nine-way classification represent groups of diseases with similar aetiologies.
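For concreteness, the minimal Python sketch below illustrates the two procedures this caption relies on: the ninefold cross-validation used to produce the mean ± s.d. accuracies in panels a and b, and one plausible reading of the partitioning algorithm's inference step, in which the probability assigned to a coarse disease class is the sum of the predicted probabilities of its fine-grained child classes. The names CHILD_TO_PARENT and train_and_eval, the toy class mapping, and the random seed are illustrative assumptions, not the authors' code.

```python
import numpy as np

# --- Partitioning-algorithm inference step (a sketch) ---
# Assumed child-to-parent mapping from fine-grained training classes to the
# nine second-level classes in panel d (indices 0-8). The mapping shown here
# is a toy example; the paper's taxonomy has many more training classes.
CHILD_TO_PARENT = {0: 7, 1: 8, 2: 8, 3: 4, 4: 6}

def parent_probabilities(child_probs, child_to_parent, n_parents=9):
    """Sum the predicted probabilities of all child classes under each
    parent node to obtain coarse (nine-way or three-way) probabilities."""
    parent_probs = np.zeros(n_parents)
    for child, parent in child_to_parent.items():
        parent_probs[parent] += child_probs[child]
    return parent_probs

# --- Ninefold cross-validation (a sketch) ---
def ninefold_cv_accuracy(x, y, train_and_eval, n_folds=9, seed=0):
    """Hold out a different ninth of the data in each fold, train on the
    rest, and report the mean and standard deviation of validation accuracy
    across the n = 9 folds, as in panels a and b."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    folds = np.array_split(order, n_folds)
    accuracies = []
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        # train_and_eval is a user-supplied function: train a CNN on the
        # training split and return top-1 accuracy on the validation split.
        accuracies.append(train_and_eval(x[train_idx], y[train_idx],
                                         x[val_idx], y[val_idx]))
    return float(np.mean(accuracies)), float(np.std(accuracies))
```

Because each image appears in exactly one validation fold, the nine accuracies are computed on disjoint subsets, which is what makes the reported standard deviation across folds meaningful.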