Extended Data Table 2
a. Classifier | Three-way accuracy | b. Classifier | Nine-way accuracy
---|---|---|---
Dermatologist 1 | 65.6% | Dermatologist 1 | 53.3%
Dermatologist 2 | 66.0% | Dermatologist 2 | 55.0%
CNN | 69.4 ± 0.8% | CNN | 48.9 ± 1.9%
CNN-PA | 72.1 ± 0.9% | CNN-PA | 55.4 ± 1.7%

Class | c. Disease classes: three-way classification | d. Disease classes: nine-way classification
---|---|---
0 | Benign single lesions | Cutaneous lymphoma and lymphoid infiltrates
1 | Malignant single lesions | Benign dermal tumors, cysts, sinuses
2 | Non-neoplastic lesions | Malignant dermal tumor
3 | | Benign epidermal tumors, hamartomas, milia, and growths
4 | | Malignant and premalignant epidermal tumors
5 | | Genodermatoses and supernumerary growths
6 | | Inflammatory conditions
7 | | Benign melanocytic lesions
8 | | Malignant melanoma

Here we show ninefold cross-validation classification accuracy on 127,463 images, organized under two different strategies. In each fold, a different ninth of the dataset is used for validation and the rest is used for training. Reported values are the mean and standard deviation of the validation accuracy across all n = 9 folds. These images are labelled by dermatologists, not necessarily through biopsy, so this metric is less rigorous than one based on biopsy-proven images. We therefore compare against only two dermatologists, as a check that the algorithm is learning relevant information. a, Three-way classification accuracy comparison between algorithms and dermatologists. The dermatologists are tested on 180 random images from the validation set (60 per class). The three classes used are the first-level nodes of our taxonomy. A CNN trained directly on these three classes achieves inferior performance to one trained with our partitioning algorithm (PA; the CNN-PA row). b, Nine-way classification accuracy comparison between algorithms and dermatologists. The dermatologists are tested on 180 random images from the validation set (20 per class). The nine classes used are the second-level nodes of our taxonomy. A CNN trained directly on these nine classes likewise achieves inferior performance to one trained with our partitioning algorithm. c, Disease classes used for the three-way classification represent highly general disease classes. d, Disease classes used for the nine-way classification represent groups of diseases that have similar aetiologies.
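
The evaluation protocol described in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the reported results: the function names (`ninefold_splits`, `summarize`, `balanced_sample`), the shuffling seed, and the use of the sample (n − 1) standard deviation are all hypothetical choices, since the legend does not specify them.

```python
import random
import statistics


def ninefold_splits(n_items, k=9, seed=0):
    """Yield (train_idx, val_idx) pairs; each fold holds out a different k-th of the data."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed: random assignment of images to folds
    folds = [indices[i::k] for i in range(k)]  # k disjoint parts of near-equal size
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def summarize(fold_accuracies):
    """Return (mean, standard deviation) of the per-fold validation accuracies."""
    mean = statistics.mean(fold_accuracies)
    std = statistics.stdev(fold_accuracies)  # assumed: sample standard deviation
    return mean, std


def balanced_sample(labels, val_idx, per_class, seed=0):
    """Draw `per_class` random validation indices for every class label."""
    rng = random.Random(seed)
    by_class = {}
    for i in val_idx:
        by_class.setdefault(labels[i], []).append(i)
    sample = []
    for cls in sorted(by_class):
        # Raises ValueError if a class has fewer than `per_class` validation items.
        sample.extend(rng.sample(by_class[cls], per_class))
    return sample
```

With 127,463 images the folds cannot be exactly equal, so this sketch lets their sizes differ by at most one. A dermatologist test set of 180 images would correspond to `per_class=60` for the three-way task and `per_class=20` for the nine-way task.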