Table 2.
Author | Degree of Supervision | Task | Cancer Type | Type of WSI | Dataset | Algorithm/Model | Performance | Clinical Application |
---|---|---|---|---|---|---|---|---|
Shen et al. [112] | Fully supervised | Classification | Gastric cancer | H&E | Training, validation and testing: 432 WSIs (TCGA-STAD cohort) + 460 WSIs (TCGA-COAD) + 171 WSIs (TCGA-READ) + 400 WSIs (Camelyon16) | DenseNet + Deformable Conditional Random Field model | Accuracy: 0.9398 (TCGA-STAD), 0.9337 (TCGA-COAD), 0.9294 (TCGA-READ), 0.9468 (Camelyon16) | Identification of suspected cancer area from histological imaging |
Song et al. [30] | Fully supervised | Classification | Gastric cancer | H&E | Training: 2123 WSIs; Validation: 300 WSIs; Internal testing: 100 WSIs; External validation: 3212 WSIs (daily gastric dataset) + 595 WSIs (PUMCH) + 987 WSIs (CHCAMS and Peking Union Medical College) | DeepLab v3 | Malignant vs. benign. Training AUC: 0.923; Internal testing AUC: 0.931; AUC: 0.995 (daily gastric dataset); AUC: 0.990 (PUMCH); AUC: 0.996 (CHCAMS and Peking Union Medical College) | Diagnosis of gastric cancer |
Su et al. [28] | Fully supervised | Classification and detection | Gastric cancer | H&E | Training: 348 WSIs; Testing: 88 WSIs; External validation: 31 WSIs | ResNet-18 | Poorly differentiated adenocarcinoma vs. well-differentiated adenocarcinoma and other normal tissue. F1 score: 0.8615. Well-differentiated adenocarcinoma vs. poorly differentiated adenocarcinoma and other normal tissue. F1 score: 0.8977. Patients with MSI vs. without MSI. Accuracy: 0.7727 (95% CI: 0.6857–0.8636) | Differentiation of cancer grade and diagnosis of MSI |
Song et al. [29] | Fully supervised | Classification | Colorectal cancer | H&E | Training: 177 WSIs; Validation: 40 WSIs; Internal test: 194 WSIs; External validation: 168 WSIs | DeepLab v2 with ResNet34 | Adenomatous vs. normal. AUC: 0.92; Accuracy: 0.904 | Diagnosis of colorectal adenomas |
Sirinukunwattana et al. [114] | Fully supervised | Classification | Colorectal cancer | H&E | Training: 510 WSIs; External validation: 431 WSIs (TCGA cohort) + 265 WSIs (GRAMPIAN cohort) | Inception V3 | Colorectal cancer consensus molecular subtypes 1 vs. 2 vs. 3 vs. 4. Training average accuracy: 70%; Training AUC: 0.9; External validation accuracy: 0.64 (TCGA cohort), 0.72 (GRAMPIAN cohort); External validation AUC: 0.84 (TCGA cohort), 0.85 (GRAMPIAN cohort) | Prediction of colorectal cancer molecular subtype |
Popovici et al. [115] | Fully supervised | Classification | Colorectal cancer | H&E | Training: 100 WSIs; Test: 200 WSIs | VGG-F | Molecular subtype A vs. B vs. C vs. D vs. E. Overall accuracy: 0.84 (95% CI: 0.79–0.88); Overall recall: 0.85 (95% CI: 0.80–0.89); Overall precision: 0.84 (95% CI: 0.80–0.88) | Prediction of colorectal cancer molecular subtype |
Korbar et al. [116] | Fully supervised | Classification | Colorectal cancer | H&E | Training: 458 WSIs; Testing: 239 WSIs | ResNet-152 | Hyperplastic polyp vs. sessile serrated polyp vs. traditional serrated adenoma vs. tubular adenoma vs. tubulovillous/villous adenoma vs. normal. Accuracy: 0.930 (95% CI: 0.890–0.959); Precision: 0.897 (95% CI: 0.852–0.932); Recall: 0.883 (95% CI: 0.836–0.921); F1 score: 0.888 (95% CI: 0.841–0.925) | Characterization of colorectal polyps |
Wei et al. [118] | Fully supervised | Classification | Colorectal cancer | H&E | Training: 326 WSIs; Validation: 25 WSIs; Internal test: 157 WSIs; External validation: 238 WSIs | Ensemble ResNet×5 | Hyperplastic polyp vs. sessile serrated adenoma vs. tubular adenoma vs. tubulovillous or villous adenoma. Internal test mean accuracy: 0.935 (95% CI: 0.896–0.974); External validation mean accuracy: 0.870 (95% CI: 0.827–0.913) | Colorectal polyp classification |
Gupta et al. [119] | Fully supervised | Classification | Colorectal cancer | H&E | Training and testing: 303,012 normal WSI patches and approximately 1,000,000 abnormal WSI patches | Customized Inception-ResNet-v2 Type 5 (IR-v2 Type 5) model | Abnormal region vs. normal region. F-score: 0.99; AUC: 0.99 | Identification of suspected cancer area from histological imaging |
Kather et al. [117] | Fully supervised | Classification and prognosis | Colorectal cancer | H&E | Training: 86 WSIs; Testing: 25 WSIs; External validation: 862 WSIs (TCGA cohort) + 409 WSIs (DACHS cohort) | VGG19 | Adipose tissue vs. background vs. lymphocytes vs. mucus vs. smooth muscle vs. normal colon mucosa vs. cancer-associated stroma vs. colorectal adenocarcinoma epithelium. Internal testing overall accuracy: 0.99; External testing overall accuracy: 0.943. High deep stroma score predicts shorter survival. Hazard ratio: 1.99 (95% CI: 1.27–3.12) | Colorectal cancer detection and prediction of patient survival outcome |
Zhu et al. [31] | Fully supervised | Classification and segmentation | Gastric and colorectal cancer | H&E | Training: 750 WSIs; Testing: 250 WSIs | Adversarial CAC-UNet | Malignant region vs. benign region. DSC: 0.8749; Recall: 0.9362; Precision: 0.9027; Accuracy: 0.8935 | Identification of suspected cancer area from histological imaging |
Xu et al. [33] | Fully supervised | Segmentation | Colorectal cancer | H&E | Training: 750 WSIs; Testing: 250 WSIs | CoUNet | Malignant region vs. benign region. Dice: 0.746; AUC: 0.980 | Identification of suspected cancer area from histological imaging |
Feng et al. [32] | Fully supervised | Segmentation | Colorectal cancer | H&E | Training: 750 WSIs; Testing: 250 WSIs | U-Net-16 | Malignant region vs. benign region. DSC: 0.7789; AUC: 1 | Identification of suspected cancer area from histological imaging |
Mahendra et al. [120] | Fully supervised | Segmentation | Colorectal cancer | H&E | Training: 270 WSIs (CAMELYON16) + 500 WSIs (CAMELYON17) + 660 WSIs (DigestPath) + 50 WSIs (PAIP); Testing: 129 WSIs (CAMELYON16) + 500 WSIs (CAMELYON17) + 212 WSIs (DigestPath) + 40 WSIs (PAIP) | DenseNet-121 + Inception-ResNet-V2 + DeeplabV3Plus | Malignant region vs. benign region. Cohen kappa score: 0.9090 (CAMELYON17); DSC: 0.782 (DigestPath) | Identification of suspected cancer area from histological imaging |
Gehrung et al. [34] | Fully supervised | Detection | Oesophageal cancer | H&E and TFF3 pathology slides | Training: 100 + 187 patients; Validation: 187 patients; External validation: 1519 patients | VGG-16 | Patients with Barrett’s oesophagus vs. no Barrett’s oesophagus. AUC: 0.88 (95% CI: 0.85–0.91); Sensitivity: 0.7262 (95% CI: 0.6742–0.7821); Specificity: 0.9313 (95% CI: 0.9004–0.9613); Simulated realistic cohort workload reduction: 57%; External validation cohort reduction: 57.41% | Detection of Barrett’s oesophagus |
Kather et al. [122] | Fully supervised | Detection | Gastric and colorectal cancer | H&E | Training: 81 patients (UMM and NCT tissue bank) + 216 patients (TCGA-STAD) + 278 patients (TCGA-CRC-KR) + 260 patients (TCGA-CRC-DX) + 382 patients (UCEC); External validation: 99 patients (TCGA-STAD) + 109 patients (TCGA-CRC-KR) + 100 patients (TCGA-CRC-DX) + 110 patients (UCEC) + 185 patients (KCCH) | ResNet-18 | Patients with MSI vs. no MSI. Training AUC: >0.99 (UMM and NCT tissue bank); AUC: 0.81 (CI: 0.69–0.90) (TCGA-STAD); AUC: 0.84 (CI: 0.73–0.91) (TCGA-CRC-KR); AUC: 0.77 (CI: 0.62–0.87) (TCGA-CRC-DX); AUC: 0.75 (CI: 0.63–0.83) (UCEC); AUC: 0.69 (CI: 0.52–0.82) (KCCH) | Detection of MSI |
Echle et al. [36] | Fully supervised | Detection | Colorectal cancer | H&E | Training: 6406 WSIs; External validation: 771 WSIs | ShuffleNet | Colorectal tumour sample with dMMR or MSI vs. no dMMR or MSI. Mean AUC: 0.92; AUPRC: 0.93; Specificity: 0.67; Sensitivity: 0.95; External validation AUC without colour normalisation: 0.95; External validation AUC with colour normalisation: 0.96 | Detection of MSI |
Cao et al. [121] | Fully supervised | Detection | Colorectal cancer | H&E | Training: 429 WSIs; External validation: 785 WSIs | ResNet-18 | Colorectal cancer patients with MSI vs. no MSI. AUC: 0.8848 (95% CI: 0.8185–0.9512); External validation AUC: 0.8504 (95% CI: 0.7591–0.9323) | Detection of MSI |
Meier et al. [124] | Fully supervised | Prognosis | Gastric cancer | H&E IHC staining, including CD8, CD20, CD68 and Ki67 | Training and testing: 248 patients | GoogLeNet | Risk of the presence of Ki67 & CD20. Hazard ratio: 1.47 (95% CI: 1.15–1.89). Risk of the presence of CD20 & CD68. Hazard ratio: 1.33 (95% CI: 1.07–1.67) | Cancer prognosis based on various IHC markers to predict patient survival outcome |
Bychkov et al. [123] | Fully supervised | Prognosis | Colorectal cancer | H&E | Training: 220 WSIs; Validation: 60 WSIs; Testing: 140 WSIs | VGG-16 | High-risk patients vs. low-risk patients. Prediction with small tissue hazard ratio: 2.3 (95% CI: 1.79–3.03) | Survival analysis of colorectal cancer |
Wang et al. [127] | Weakly supervised | Classification | Gastric cancer | H&E | Training: 408 WSIs; Testing: 200 WSIs | Recalibrated multi-instance deep learning | Cancer vs. dysplasia vs. normal. Accuracy: 0.865 | Diagnosis of gastric cancer |
Xu et al. [37] | Weakly supervised | Classification | Gastric cancer | H&E | Training, validation and testing: 185 WSIs (SRS dataset) + 2032 WSIs (Mars dataset) | Multiple instance classification framework based on graph convolutional networks | Tumour vs. normal. Recall: 0.904 (SRS dataset), 0.9824 (Mars dataset); Precision: 0.9116 (SRS dataset), 0.9826 (Mars dataset); F1-score: 0.9075 (SRS dataset), 0.9824 (Mars dataset) | Diagnosis of gastric cancer |
Huang et al. [39] | Weakly supervised | Classification | Gastric cancer | H&E | Training and testing: 2333 WSIs; External validation: 175 WSIs | GastroMIL | Gastric cancer vs. normal. External validation accuracy: 0.92. GastroMIL risk score associated with patient overall survival. Hazard ratio: 2.414 | Diagnosis of gastric cancer and prediction of patient survival outcome |
Li et al. [131] | Weakly supervised | Classification | Gastric cancer | H&E | Training and testing: 10,894 WSIs | DLA34 + Otsu’s method | Tumour vs. normal. Sensitivity: 1.0000; Specificity: 0.8932; AUC: 0.9906 | Diagnosis of gastric cancer |
Chen et al. [133] | Weakly supervised | Classification | Colorectal cancer | H&E | Training and testing: 400 WSIs | CNN classifier | Normal (including hyperplastic polyp) vs. adenoma vs. adenocarcinoma vs. mucinous adenocarcinoma vs. signet ring cell carcinoma. Overall accuracy: 0.76 | Prediction of colorectal cancer tumour grade |
Ye et al. [38] | Weakly supervised | Classification | Colorectal cancer | H&E | Training and testing: 100 WSIs | Multiple-instance CNN | With epithelial cell nuclei vs. no epithelial cell nuclei. Accuracy: 0.936; Precision: 0.922; Recall: 0.960 | Detection of colon cancer |
Sharma et al. [129] | Weakly supervised | Classification | Gastrointestinal cancer | H&E | Training and testing: 413 WSIs | Cluster-to-Conquer framework | Celiac cancer vs. normal. Accuracy: 0.862; Precision: 0.855; Recall: 0.922; F1-score: 0.887 | Detection of gastrointestinal cancer |
Klein et al. [130] | Weakly supervised | Detection | Gastric cancer | H&E + Giemsa staining | Training: 191 H&E WSIs and 286 Giemsa-stained WSIs; Validation: 71 H&E WSIs and 87 Giemsa-stained WSIs; External validation: 364 H&E WSIs and 347 Giemsa-stained WSIs | VGG+ + active learning | H. pylori vs. no H. pylori. External validation AUC: 0.81 (H&E), 0.92 (Giemsa-stained) | Detection of H. pylori |
WSI = whole-slide imaging; H&E = haematoxylin and eosin; AUC = area under the curve; CI = confidence interval; TCGA = The Cancer Genome Atlas; STAD = stomach adenocarcinoma; DACHS = Darmkrebs: Chancen der Verhütung durch Screening; MSI = microsatellite instability; dMMR = deficient mismatch repair; TFF3 = trefoil factor 3; DSC = Dice similarity coefficient; UMM = University Medical Centre Mannheim, Heidelberg University; NCT = National Centre for Tumour Diseases; CRC = colorectal cancer; PUMCH = Peking Union Medical College Hospital; CHCAMS = Chinese Academy of Medical Sciences; H. pylori = Helicobacter pylori; IHC = immunohistochemistry; CNN = convolutional neural network.