Journal of Korean Medical Science. 2025 Jul 28;40(35):e220. doi: 10.3346/jkms.2025.40.e220

Large-Scale Dermatopathology Dataset for Lesion Segmentation: Model Development and Analysis

Yosep Chong 1,*, Daseul Park 2,*, Youngbin Ahn 3,*, Yoonjin Kwak 4, Seyeon Park 5, Seung Wan Back 2, Changwoo Lee 5, Gyeongsin Park 1, Mohammad Rizwan Alam 1, Binna Kim 1, Kee-Taek Jang 6, Nayoung Han 7, Chong Woo Yoo 7,8, Jonghyuck Lee 9, Cheol Lee 4, Young-Gon Kim 2,10
PMCID: PMC12418205  PMID: 40923506

Abstract

Background

With the increasing incidence of skin cancer, the workload for pathologists has surged. The diagnosis of skin samples, especially for complex lesions such as malignant melanomas and melanocytic lesions, has shown higher diagnostic variability compared to other organ samples. Consequently, artificial intelligence (AI)-based diagnostic assistance programs are increasingly needed to support dermatopathologists in achieving more consistent diagnoses. However, large-scale skin pathology image datasets for AI learning are often insufficient or limited to specific diseases. This study aimed to build and assess a large-scale dermatopathology image dataset for an AI model.

Methods

We trained and evaluated a lesion segmentation model based on this dataset, which consisted of 34,376 histopathology slide images collected from four institutions, including normal skin and six types of common skin lesions: epidermal cysts, seborrheic keratosis, Bowen disease/squamous cell carcinoma, basal cell carcinoma, melanocytic nevus, and malignant melanoma. Each image was accompanied by labeled data consisting of lesion area annotations and clinical information. To ensure the high quality and accuracy of the dataset, we employed data quality management methods, including syntactic accuracy, semantic accuracy, statistical diversity, and validity evaluation.

Results

The results of the dataset quality assessment confirmed high quality, with syntactic accuracy and semantic accuracy at 0.99 and 0.95, respectively. Statistical diversity was verified to follow a natural distribution. The validity evaluation verified the strong performance of the segmentation model for each group of data, with a Dice score ranging from 80% to 91%.

Conclusion

The results demonstrated that our constructed dataset provides a well-suited resource for deep learning training, offering a large-scale multi-institutional dermatopathology dataset that can drive advancements in AI-driven dermatopathology diagnosis.

Keywords: Large-Scale Dermatopathology Dataset, Whole Slide Image, Lesion Segmentation, Deep Learning

Graphical Abstract


INTRODUCTION

The incidence of skin cancer is increasing globally, particularly in developed countries,1 owing to the aging global population and increased sun exposure.2 These factors have increased the workload for pathologists and made it particularly challenging to accurately diagnose skin samples containing melanomas or atypical melanocytic lesions. Consultations for challenging cases can be requested internally or externally; however, the complexity of the process makes this possible only on a limited basis.

With the establishment of a database of whole-slide images (WSIs) in a digital pathology system, the use of deep learning-based artificial intelligence (AI) technology is expected to reduce diagnostic errors caused by excessive workloads and inconsistencies among pathologists.3 There is a growing need for an AI-based pathology diagnosis assistance program for dermatopathological diagnoses. Studies using AI and histopathology image sets have shown remarkable accuracy in several groups, including distinguishing malignant melanoma from benign nevi and basal cell carcinoma from normal tissue.4,5 However, these studies have limitations, such as insufficient datasets from a single institution or targeting only limited specific diseases.

This study aimed to address these previous limitations by constructing a large-scale, multi-institutional dermatopathology dataset, including normal skin images, quality-validated for AI learning, and exploring the potential of AI in segmenting six types of common skin lesions.

METHODS

Demographics

Data were collected from four medical institutions: Seoul National University Hospital, Catholic Medical Center, National Cancer Center, and Samsung Medical Center. A total of 34,376 WSIs were collected and classified into seven groups (six disease groups and one normal group) (Table 1). In Group 1, epidermal cysts comprised 5,036 cases (14.65% of the total), with 2,423 (48.11%) collected through biopsy and 2,613 (51.89%) through resection. Group 2 (seborrheic keratosis) comprised 5,133 cases (14.93% of the total), of which 4,726 (92.07%) were collected via biopsy and 407 (7.93%) via resection. Group 3 (Bowen’s disease/squamous cell carcinoma) comprised 5,031 cases (14.64% of the total), of which 2,073 (41.20%) were collected via biopsy and 2,958 (58.80%) via resection. Group 4 (basal cell carcinoma) comprised 5,419 cases (15.76% of the total), of which 1,968 (36.31%) were collected via biopsy and 3,451 (63.69%) via resection. Group 5 (melanocytic nevus) comprised 5,036 cases (14.65% of the total), of which 3,775 (74.96%) were collected via biopsy and 1,261 (25.04%) via resection. Group 6 (malignant melanoma) comprised 3,000 cases (8.73% of the total), of which 2,331 (77.70%) were collected via biopsy and 669 (22.30%) via resection. Group 7 (normal skin tissue, i.e., non-neoplastic skin samples with nonspecific histologic findings) comprised 5,721 cases (16.64% of the total), of which 3,949 (68.96%) were collected through biopsy and 1,772 (31.04%) through resection. Further demographic details for each group are shown in Table 1.

Table 1. Demographics of the study subjects.

Variables Group 1: EC Group 2: SK Group 3: BD/SQCC Group 4: BCC Group 5: MN Group 6: MM Group 7: Normal
Total 5,036 (14.7) 5,133 (14.9) 5,031 (14.6) 5,419 (15.8) 5,036 (14.7) 3,000 (8.7) 5,721 (16.6)
Institution
SNUH 1,621 1,682 1,407 1,659 1,622 1,175 1,828
CMC 1,715 1,750 1,624 1,940 1,711 598 1,825
NCC - - 400 211 - 196 229
SMC 1,700 1,701 1,600 1,609 1,703 1,031 1,839
Age, yr
< 10 67 (1.3) 2 (0.0) 1 (0.0) 2 (0.0) 170 (3.4) 27 (0.9) 86 (1.5)
10–19 184 (3.6) 26 (0.5) 0 (0.0) 47 (0.9) 617 (12.3) 4 (0.1) 77 (1.4)
20–29 536 (10.6) 58 (1.1) 23 (0.5) 106 (2.0) 886 (17.6) 38 (1.3) 222 (3.9)
30–39 713 (14.2) 166 (3.2) 98 (2.0) 172 (3.2) 974 (19.3) 122 (4.1) 335 (5.9)
40–49 912 (18.1) 521 (10.2) 216 (4.3) 343 (6.3) 846 (16.8) 305 (10.2) 529 (9.3)
50–59 993 (19.7) 1,103 (21.5) 633 (12.6) 839 (15.5) 790 (15.7) 533 (17.8) 979 (17.1)
60–69 964 (19.1) 1,549 (30.2) 1,022 (20.3) 1,259 (23.2) 490 (9.7) 962 (32.1) 1,102 (19.3)
70–79 513 (10.2) 1,174 (22.9) 1,443 (28.7) 1,532 (28.3) 206 (4.1) 686 (22.9) 1,181 (20.6)
80–89 147 (2.9) 463 (9.0) 1,281 (25.5) 997 (18.4) 55 (1.1) 297 (9.9) 1,161 (20.3)
≥ 90 7 (0.1) 71 (1.4) 314 (6.2) 123 (2.3) 2 (0.0) 26 (0.9) 49 (0.9)
Gender
Male 2,891 (57.4) 2,751 (53.6) 2,624 (52.2) 2,491 (46.0) 1,893 (37.6) 1,501 (50.0) 2,148 (37.6)
Female 2,145 (42.6) 2,382 (46.4) 2,407 (47.8) 2,928 (54.0) 3,143 (62.4) 1,499 (50.0) 3,573 (62.5)
Lesion location
Head & Neck 2,212 (43.9) 2,987 (56.4) 2,442 (48.5) 4,767 (88.0) 2,506 (49.8) 447 (14.9) 1,902 (33.3)
Trunk 2,323 (46.1) 1,681 (32.8) 1,095 (21.8) 445 (8.2) 1,529 (30.4) 565 (18.8) 2,511 (43.9)
Extremity 467 (9.3) 531 (10.3) 1,418 (28.2) 150 (2.8) 945 (18.8) 1,967 (65.6) 1,258 (22.0)
No specific 34 (0.7) 24 (0.5) 76 (1.5) 57 (1.1) 56 (1.1) 21 (0.7) 50 (0.9)
Procedure
Biopsy 2,423 (48.1) 4,726 (92.1) 2,073 (41.2) 1,968 (36.3) 3,775 (75.0) 2,331 (77.7) 3,949 (69.0)
Resection 2,613 (51.9) 407 (7.9) 2,958 (58.8) 3,451 (63.7) 1,261 (25.0) 669 (22.3) 1,772 (31.0)

Values are presented as number (%).

EC = epidermal cyst, SK = seborrheic keratosis, BD = Bowen’s disease, SQCC = squamous cell carcinoma, BCC = basal cell carcinoma, MN = melanocytic nevus, MM = malignant melanoma, SNUH = Seoul National University Hospital, CMC = Catholic Medical Center, NCC = National Cancer Center, SMC = Samsung Medical Center.

Overall dataset establishment and assessment flow

Here, we describe the data-collection process for our pathology image dataset. First, we scanned a large number of WSIs at ×20 or ×40 magnification depending on the vendor, with more than 90% of the images at ×40 magnification, and then refined the collection by selecting only high-quality data. In addition, identifying information such as patient identification numbers and barcodes was anonymized. The flow of labeling tasks is as follows. The labeling process was carried out by a group of trained annotators who received specialized education from pathologists prior to the task. The labeling team consisted of 21 members, with an average annotation experience of about 1 year. These annotators were primarily responsible for the initial region-of-interest labeling. The verification processes were supervised by pathologists from three institutions, with an overall average clinical experience of about 13 years. At Catholic Medical Center, five pathologists participated in the review process, with an average clinical experience of about 14 years. At Seoul National University Hospital, two pathologists were involved in the verification, with an average experience of about 7 years. At the National Cancer Center, two pathologists conducted the review, with an average experience of about 15 years. Next, the annotation process was conducted by experts who outlined and validated the regions of interest, generating XML annotation files. After verification, these XML files were integrated with the corresponding WSI files and clinical information to create the final labeled dataset in JSON format. This JSON file served as the primary labeling file and was verified using an in-house tool.
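The XML-to-JSON integration step described above can be sketched as follows. This is a minimal illustration assuming ImageScope-style `Region`/`Vertex` XML elements; the output field names (`annotations`, `clinical_info`) are hypothetical and do not represent the dataset's actual schema.

```python
import json
import xml.etree.ElementTree as ET

def xml_to_labeled_json(xml_text, clinical_info):
    """Merge polygon annotations from an ImageScope-style XML export with
    clinical metadata into one JSON labeling record (illustrative schema)."""
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter("Region"):
        # Each Region holds an ordered list of Vertex elements with X/Y attributes.
        points = [(float(v.get("X")), float(v.get("Y")))
                  for v in region.iter("Vertex")]
        regions.append({"vertices": points})
    return json.dumps({"annotations": regions, "clinical_info": clinical_info})
```

A record produced this way can then be parsed back for downstream checks, e.g. `json.loads(xml_to_labeled_json(xml_text, {"age": 55}))`.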
By rigorously selecting only high-quality data—excluding significantly poor images that a typical pathologist cannot interpret because of issues such as poor fixation, dehydration, staining, sectioning, or scanning, as well as those with lesions too small to provide meaningful information—we ensured that the WSI data were of high quality and accuracy, making them suitable for use in AI and related applications. Four items were considered to verify data quality: 1) syntactic accuracy, 2) semantic accuracy, 3) statistical diversity, and 4) validity evaluation. Syntactic accuracy verifies the structure and format of the constructed labeled data files; the labeled data were scrutinized to confirm the absence of grammar issues, data structure problems, typos, or missing information. Semantic accuracy evaluates the correctness of the labeled regions; the outlines were verified by pathologists who did not participate in constructing the dataset. Statistical diversity verifies the distribution of demographic data using statistical analysis; we checked the distribution of procedures, sex, age, and lesion location. Finally, for validity evaluation, deep-learning-based models were trained and evaluated to determine whether the dataset is suitable for model development.
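A syntactic accuracy check of the kind described can be sketched as a simple validator over the JSON labeling records. The required top-level keys used here are hypothetical, since the paper does not publish the exact JSON schema.

```python
import json

# Hypothetical required fields; the dataset's actual JSON schema is not specified.
REQUIRED_KEYS = {"case_id", "diagnosis", "annotations", "clinical_info"}

def syntactic_accuracy(json_records):
    """Fraction of label records that parse as JSON and contain all
    required top-level keys (a structure/format check, not a content check)."""
    if not json_records:
        return 0.0
    valid = 0
    for text in json_records:
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            continue  # malformed record counts as a syntactic error
        if isinstance(record, dict) and REQUIRED_KEYS.issubset(record):
            valid += 1
    return valid / len(json_records)
```

Run over all labeling files, a score near 1.0 would correspond to the < 1% syntactic error rate reported in the Results.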

Labeling tool

Annotation tasks were performed using two open-platform tools: Aperio ImageScope (ver. 12.3.3; Leica Biosystems, Vista, CA, USA) and Automated Slide Analysis Platform (ASAP) (ver. 1.9.0; Computational Pathology Group, Nijmegen, The Netherlands) to accommodate the use of various digital scanners at different institutions, which resulted in differences in WSI formats. WSIs in “.svs” format obtained from Aperio Leica AT2 scanners used at Seoul National University Hospital and the National Cancer Center were annotated using Aperio ImageScope. WSIs in “.mrxs” format obtained from the Pannoramic Flash 250 III scanner (3DHistech, Budapest, Hungary) used at Catholic Medical Center, and “.tiff” format obtained from the Pannoramic 1000 scanner (3DHistech), were annotated using ASAP.

Validity of dataset

The evaluation of validity in our dataset followed a consistent approach of training deep-learning models for all groups. The data from each group were randomly divided at the patient level in an 8:1:1 ratio into training, validation, and test sets; the ratio of data from each institution was also considered during random selection. To train the models, patches of 512 × 512 pixels were extracted from the WSIs at ×5 magnification. A U-Net7 with a ResNet50 (Microsoft Research, Redmond, WA, USA)6 encoder pretrained on ImageNet was used as the training model. Model training was conducted using the Dice loss function with the Adam optimizer, and input image data were normalized beforehand. Early stopping was applied to prevent overfitting. Model performance was evaluated using the Dice coefficient score.8
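The Dice coefficient used for both the loss (as 1 − Dice) and the evaluation metric can be sketched for binary masks as follows; this is a generic NumPy illustration, not the authors' implementation.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary segmentation masks (values 0/1).
    The small eps term keeps the score defined when both masks are empty."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```

During training, the corresponding Dice loss would simply be `1.0 - dice_score(pred, target)` computed on the model's thresholded output.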

AI-Hub

To archive and share this database, we transferred all data to the National Information Society Agency (NIA) AI-Hub (https://www.aihub.or.kr), a platform for constructing and sharing AI data operated by the NIA in Korea. Its objective is to facilitate the construction and sharing of data for AI research and industrial ecosystems both domestically and internationally, with the goal of advancing and innovating AI technologies across various industrial sectors. To this end, the NIA AI-Hub collects, refines, and processes public and private data from diverse fields that are made available to users for development and research. Moreover, users can use the NIA AI-Hub’s data-sharing platform to access shared data and conduct research activities, facilitating collaboration in AI development.

Ethics statement

This study was approved by the Institutional Review Boards (IRBs) of Seoul National University Hospital (H-2202-060-1299), Catholic University of Korea College of Medicine (KC22SNSI0413 and UC22SNSI0028), National Cancer Center (NCC2022-0089), and Samsung Medical Center (2022-03-111-001). The requirement for informed consent was waived by each IRB.

RESULTS

Dataset quality assessment

All the datasets were evaluated for syntactic accuracy, semantic accuracy, statistical diversity, and validity. The error rate for syntactic accuracy was < 1% for all the data, and no errors were observed in the annotation coordinates. The main cause of errors was capitalization typos in the clinical information, which had minimal impact on model training. Statistical diversity was used solely for the purpose of verifying the natural distribution, considering the medical nature of the data. Regarding semantic accuracy, each disease-specific dataset demonstrated an accuracy of > 95%. Based on the validity evaluation, our model achieved good performance at the patch and WSI levels (87.1% and 85.2%, respectively).

Group evaluation

Table 2 shows the model performance at the patch and slide level for each disease group. In the patch-level Dice score analysis, epidermal cysts showed the highest performance (90.1%), whereas Bowen’s disease/squamous cell carcinoma showed a relatively low score (82%). Other lesions recorded Dice scores close to 90%: seborrheic keratosis, 89.9%; basal cell carcinoma, 88.9%; melanocytic nevus, 84.3%; and malignant melanoma, 87.4%.

Table 2. Patch-level and slide-level model performances.

Groups Dice scores
Patch-level Slide-level
Mean ± SD, % 95% CI Mean ± SD, % 95% CI
Group 1: EC 90.1 ± 8.9 89.1–90.8 81.9 ± 15.4 80.5–83.2
Group 2: SK 89.9 ± 8.4 89.0–90.6 90.2 ± 11.8 89.7–91.7
Group 3: BD/SQCC 82.0 ± 13.1 80.7–83.0 81.3 ± 13.0 80.2–82.5
Group 4: BCC 88.9 ± 9.6 87.9–89.6 85.3 ± 12.2 84.2–86.3
Group 5: MN 84.3 ± 11.8 83.1–85.2 90.8 ± 9.4 90.0–91.6
Group 6: MM 87.4 ± 12.1 85.7–88.6 81.7 ± 19.5 79.5–84.0

SD = standard deviation, CI = confidence interval, EC = epidermal cyst, SK = seborrheic keratosis, BD = Bowen’s disease, SQCC = squamous cell carcinoma, BCC = basal cell carcinoma, MN = melanocytic nevus, MM = malignant melanoma.

In the performance evaluation at the slide level, melanocytic nevus and seborrheic keratosis recorded high average Dice scores (90.8% and 90.2%, respectively), indicating excellent performance. In contrast, Bowen disease/squamous cell carcinoma showed relatively low performance (81.3%). The classification and segmentation of skin disease sites using a deep learning model demonstrated sufficient validity of the dataset.

Fig. 1 shows microscopic images demonstrating the best and worst model performances for each group. For the epidermal cyst group, the best performance was 97.7%, whereas the worst performance dropped to 10.56% (Fig. 1A and B). In the seborrheic keratosis group, the highest and lowest performances were 93.92% and 23.35% (Fig. 1C and D). The Bowen’s disease and squamous cell carcinoma groups showed the best performance at 97.7% and the worst at 27.62% (Fig. 1E and F). In the basal cell carcinoma group, the model achieved a maximum accuracy of 98.33% and a minimum accuracy of 28.41% (Fig. 1G and H). The melanocytic nevus group showed the best performance at 99.09% and the worst at 69.23% (Fig. 1I and J). Finally, the malignant melanoma group yielded the best performance (97.79%) and the worst performance (10.00%) (Fig. 1K and L).

Fig. 1. Comparison of artificial intelligence model performance on various skin lesions. (A) Best performance for epidermal cyst. (B) Worst performance for epidermal cyst. (C) Best performance for seborrheic keratosis. (D) Worst performance for seborrheic keratosis. (E) Best performance for Bowen disease/squamous cell carcinoma. (F) Worst performance for Bowen disease/squamous cell carcinoma. (G) Best performance for basal cell carcinoma. (H) Worst performance for basal cell carcinoma. (I) Best performance for melanocytic nevus. (J) Worst performance for melanocytic nevus. (K) Best performance for malignant melanoma. (L) Worst performance for malignant melanoma.


DISCUSSION

This study established a large-scale dataset of diverse skin tumors for AI training through dataset quality assessments of syntactic accuracy, semantic accuracy, statistical diversity, and validity evaluation. We have successfully compiled the most comprehensive dermatopathology dataset to date, sourced from four leading Korean medical institutions. We assembled 34,376 WSIs and classified them into seven categories based on the presence of common cutaneous neoplasms. The dataset covers a range of skin conditions, including epidermal cysts, seborrheic keratosis, Bowen disease/squamous cell carcinoma, basal cell carcinoma, melanocytic nevus, malignant melanoma, and normal skin.

The syntactic accuracy was over 99% across our entire dataset, with errors such as capitalization typos having minimal impact on AI model training. Statistical diversity was appropriately distributed according to the natural distribution. Semantic accuracy was also verified through multiple reviews by experts, ensuring that only lesion areas were labeled. Finally, the validity evaluation achieved a high score through the training of the lesion segmentation AI model, with Dice scores ranging from 82% to 90%. The AI algorithms applied in this study showed promising performance in diagnosing common cutaneous neoplasms. The U-Net architecture based on a ResNet50 encoder pretrained on ImageNet achieved average accuracies of 87.1% and 85.2% at the patch and slide levels, respectively. In particular, Bowen’s disease/squamous cell carcinoma showed the lowest performance in our model at both the patch and slide levels (82% and 81.3%, respectively). This outcome aligns with the clinical challenge of visually distinguishing Bowen’s disease and well/very well-differentiated squamous cell carcinoma from normal or reactive squamous epithelium. Compared with other organs, the morphologic atypia of cutaneous squamous cell carcinoma is often minimal and confined to the epithelium. This inherent complexity may have contributed to the model’s difficulty in distinguishing these two skin lesions from adjacent normal or reactive squamous epithelium. This observation demonstrates a compelling intersection between real-world medical challenges and the limitations of deep learning models.

To the best of our knowledge, this is the most comprehensive dermatopathology image dataset available to date. The initial application of AI technology, specifically machine learning, to dermatopathology diagnosis commenced with the 1987 TEGUMENT system.9 Since then, most early deep learning diagnostic applications for dermatopathology images have been designed for binary classification. These applications leverage machine learning and deep learning algorithms on histopathology images, as shown in Table 3, with a particular focus on tasks such as the binary classification of malignant melanoma versus melanocytic nevus.10,11

Table 3. Previous studies applying AI models to dermatopathology images.

Target task Year Author Dataset size (train: validation: test) Dataset type Model type Model application Reported model performance
Semantic tree of diagnoses 1987 Potter and Ronan9 147 cases Features (from a series of text-based screens) LM - Acc: 91.8% (same/similar diagnosis as DP)
MM, non-MM classification 2019 Wang et al.12 38: 6: 35 (external validation 76) H&E VGG16, Inception V1, Inception V3 MM, non-MM (62, 93) Acc: 98.2%, Sens: 100.0%, Spec: 96.5%, AUC: 99.8%
MM vs. MN classification 2019 Hekler et al.13 595: - : 100 H&E ResNet50 MM, MN (345, 350) Acc: 68.0%, Sens: 76.0%, Spec: 60.0%
2021 Höhn et al.14 232: 67: 132 H&E ResNeXt50 MM, MN (209, 222) Acc: 83.2%, AUC: 92.3%
2021 Xie et al.4 841 WSIs H&E ResNet50 MM, MN (392, 449) Acc: 93.3%, Sens: 88.7%, Spec: 92.5%, AUC: 96.2%
2021 Ba et al.15 Train/test 781: validation 104 H&E CNN-based detection model, RF MM, MN (86, 695) AUC: 99.0%, Sens: 100.0%, Spec: 94.7%
2021 Li et al.16 701 WSIs H&E ResNet50 MM, MN (IN, CN, JN) AUC: 97.1%
2011 Osborne et al.17 126 WSIs H&E SVM MM, MN Acc: 90.0%, Sens: 100.0%, Spec: 75.0%
2022 Brinker et al.18 100 WSIs H&E CNN MM, MN (50, 50) Acc: 94.0%, Sens: 90.0%, Spec: 92.0%, AUC: 97.0% (annotated)
MM vs. NL classification 2020 De Logu et al.11 Train/test 60: validation 40 H&E Inception-ResNet-v2 MM, NL (50, 50) Acc: 96.5%, Sens: 95.7%, Spec: 97.7%
Spitzoid lesion classification 2021 Del Amor et al.19 54 WSIs H&E SeaNet SMT Acc: 92.3%, 80.0%
Benign, malignant (30, 21)
Multi-class classification using machine learning 2018 Xu et al.20 66 WSIs H&E mSVM MM, MN, NL (32, 17, 17) Acc: 95.0%
2012 Lu and Mandal21 66 WSIs H&E mSVM SSM, MN, NL (32, 17, 17) Acc: 90.0%, Sens: 92.0%, Spec: 97.0%
Mitosis detection in melanoma 2022 Sturm et al.22 99 WSIs H&E CNN Mitosis detection (MM 60, MN 35, IML 4) Acc: 75.0%
2017 Andres et al.23 59 WSIs H&E; Ki-67-stained slides RF Mitosis detection (MM 59) Acc: 83.2%
MM segmentation & recognition 2021 Liu et al.24 227 WSIs H&E Mask-R-CNN MM segmentation Dice score: 92.7%
2021 Zhang et al.25 30 WSIs H&E MPMR (Multi-Scale Feature and Probability Map for Melanoma) MM recognition Acc: 95.5%, Precision: 97.4%, Recall: 98.6%
Melanocyte detection 2021 Alheejawi et al.26 4 (train 75: test 25) H&E INS-Net CNN Melanocyte detection Acc: 94.1%, Dice coefficient: 89.2%
2020 Kucharski et al.27 70 (49: 10: 11) H&E Convolutional autoencoder Melanocyte nest detection Sens: 76.0%, Spec: 94.0%, Dice similarity coefficient: 81%
MM, BCC, SQCC, NL classification 2020 Kuiava et al.28 Train: 2,732, test: 284 H&E MobileNet, Inception V3, MM, BCC, SQCC, NL Sens: 98.3%, Spec: 98.8%
Convolutional network models
Ianni et al.29 Train: 5,070, test: 13,537 H&E Tri-CNN Cascade Acc: 98.0%
nBCC, DN, and SK diagnosis 2018 Olsen et al.30 WSIs 1,111 (train: 750: test: 361) WSIs CNN nBCC, DN, SK (424: 339: 348) Acc: 99.5% (nBCC), Acc: 99.4% (DN), Acc: 100% (SK)
SN vs ConvN classification 2019 Hart et al.31 WSIs 200 (train: 140, validation: 60) WSIs Inception V3 SN, ConvN (50: 50) Acc: 99.0%, Sens: 85.0%, Spec: 99.0%
BCC classification 2019 Jiang et al.32 MOIC 6610/MOIS 1436, WSIs 128 (train: MOIC & MOIS, test: WSIs) Smartphone photomicrographs, WSI Classification: Inception V3 BCC Sens: 97.0%, Spec: 94.0%, AUC: 98.7%
Segmentation: Deeplab V3
Distant metastatic recurrence rate prediction 2020 Kulkarni et al.33 Train: 108: test: 155 H&E CNN, RNN Melanoma AUC: 90.5% (classifier), AUC: 88.0% (recurrence)
Prediction of sentinel lymph node status 2021 Brinker et al.34 415 H&E ANN Lymph node status AUC: 61.8%
TIL score and disease-specific survival 2021 Moore et al.36 Train: 80, validation: 145 H&E NN TIL scoring -
Predict the mutation status in WSIs of melanoma 2022 Kim et al.37 Train: 256, validation: 21, validation 28 H&E CNN Prediction BRAF genotype AUC: 89.0%
Assessment of melanocytic lesions 2022 Sturm et al.20 102 (not specified for train, validation, and test) H&E CNN-based mitosis algorithm Diagnosing melanocytic skin lesions The concordance rate between pathologists and AI is 90%

MM = malignant melanoma, H&E = hematoxylin and eosin, Sens = sensitivity, Spec = specificity, AUC = area under the curve, ResNet50 = 50-layer residual network, MN = melanocytic nevus, WSI = whole-slide image, CNN = convolutional neural network, IN = intradermal nevus, CN = compound nevus, JN = junctional nevus, SVM = support vector machine, NL = no lesion, SSM = superficial spreading melanoma, SMT = Spitzoid melanocytic tumors, IML = intermediate melanocytic lesion, RF = random forest, BCC = basal cell carcinoma, SQCC = squamous cell carcinoma, DN = dermal nevus, SK = seborrheic keratosis, SN = Spitz nevus, nBCC = nodular basal cell carcinoma, ConvN = conventional nevus, MOIC = microscopic ocular image classification, MOIS = microscopic ocular image segmentation, RNN = recurrent neural network, ANN = artificial neural network, NN = neural network, TIL = tumor-infiltrating lymphocyte.

In several studies, deep learning models have been found to be equivalent or superior to human diagnoses. A remarkable study by Hekler et al.13 in 2019 demonstrated that a convolutional neural network outperformed 11 pathologists in distinguishing between malignant melanoma and melanocytic nevus, thus showing the potential for high accuracy and consistency in deep learning models. This convolutional neural network achieved a sensitivity of 76%, specificity of 60%, and overall accuracy of 68%. In contrast, the pathologists achieved a sensitivity of 66.5%, specificity of 51.8%, and accuracy of 59.2%. However, studies conducted by Wang et al.12 in 2019 and Xie et al.4 in 2021 made significant strides in advancing diagnostic performance beyond Hekler’s comparison with human pathologists, thereby reaffirming the potential of AI in diagnostics.13 Wang et al.’s convolutional neural network model12 outperformed human pathologists in diagnosing malignant melanoma at the WSI level with an impressive accuracy of 98.2%. Furthermore, the model highlighted lesion areas on the WSI using stochastic heat maps, providing useful visual insights. In 2021, Xie et al.4 employed the gradient-weighted class activation mapping (Grad-CAM) method to validate the diagnostic evidence of their proposed model, achieving an accuracy of 93.3% in classifying malignant melanoma and melanocytic nevus. However, it is crucial to note that these studies were conducted within a limited scope, restricted to one or two hospitals in China.

Our study addresses these limitations and proposes a future research field. This expansive dataset not only enhances the robustness of our model but also improves its generalizability. Moreover, instead of focusing on binary or ternary classification groups, we challenged our model to differentiate among seven categories of skin tissue. Despite this increase in complexity, our model achieved commendable Dice scores, with an average of 87.1% at the patch level and 85.2% at the slide level. This indicates the capability of our model to provide accurate and reliable diagnoses across a broader range of skin conditions, surpassing the scope of previous studies. Well-structured dermatopathology datasets are essential resources not only for AI training but also for education and further research applications. Diagnostic images inferred by an AI model trained on our comprehensive and well-structured dataset can serve as a valuable tool for clinical dermatopathology training. Providing immediate feedback on visual diagnoses and report-writing ability can significantly enhance the dermatopathology training process.

Recent dermatopathology studies have demonstrated the potential of AI in predicting disease prognosis and genetic mutations.35 Kulkarni et al.33 utilized deep learning methods to predict disease-specific survival in patients with primary melanoma by extracting high-dimensional features from tissue slides for accurate predictions. Brinker et al.18 developed digital biomarkers for predicting lymph node metastasis. Moore et al.36 emphasized the importance of tumor-infiltrating lymphocytes within the tumor, whereas Kim et al.37 proposed a novel approach to predicting BRAF genetic mutations, demonstrating the potential of AI for identifying microscopic genetic alterations. In addition, the ability of AI to detect mitotic figures in skin lesions, as demonstrated by Sturm et al.,22 also shows promise. Our dataset, which encompasses a variety of dermatopathological conditions, could aid in the development of digital biomarkers that allow for more precise prediction of lymph node metastasis or visceral recurrence in cases such as malignant melanoma.

In summary, this study overcomes previous limitations and improves performance by utilizing a well-curated, large, multi-institutional dataset. This has significant potential in areas such as training, digital biomarker development, and prognostic prediction. The patch-based AI segmentation models using this dataset showed fair overall performance. A limitation of our dataset and AI models is that they focused on Asian patients and lacked racial diversity. To guarantee a broader application range and generalizability of our models, further validation using external public resources, such as The Cancer Genome Atlas data, is necessary. Additional research is required to overcome these limitations, expand racial diversity, and strengthen external validation.

ACKNOWLEDGMENTS

The authors thank Jamshid Abdul-Ghafar for advice on the experimental design and assistance with data collection.

Footnotes

Funding: This study was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT) (No. 2021R1F1A1063684).

Disclosure: Young-Gon Kim is a corporate officer (CAIO) of L’imagin Inc., but he made no influence on this work in relation with the company or its products. Other authors have no potential conflicts of interest to disclose.

Author Contributions:
  • Conceptualization: Jung Y, Lee C, Kim YG.
  • Data curation: Jung Y, Kwak Y, Back SW, Park G, Park G, Kim B, Jang KT, Han N, Yoo CW, Lee C, Lee J.
  • Formal analysis: Jung Y, Park D, Kwak Y, Ahn Y, Back SW, Park G, Park G, Kim B, Jang KT, Han N, Yoo CW, Lee C, Kim YG.
  • Investigation: Jung Y, Kwak Y, Jang KT, Han N, Yoo CW, Lee C.
  • Methodology: Park D, Ahn Y, Park S, Lee C, Park G, Kim B, Lee C, Kim YG.
  • Project administration: Jung Y, Lee C, Kim YG.
  • Resources: Jung Y, Kwak Y, Park G, Jang KT, Han N, Yoo CW, Lee C.
  • Software: Park D, Ahn Y, Kim YG.
  • Supervision: Jung Y, Lee C, Kim YG.
  • Validation: Jung Y, Park D, Kwak Y, Ahn Y, Park G, Han N, Yoo CW, Lee C, Kim YG.
  • Visualization: Park D, Ahn Y.
  • Writing - original draft: Jung Y, Park D, Kim YG.
  • Writing - review & editing: Jung Y, Park D, Kwak Y, Ahn Y, Park G, Park G, Kim B, Jang KT, Han N, Yoo CW, Lee C, Kim YG.

