2022 Aug 3;14(15):3780. doi: 10.3390/cancers14153780

Table 2.

Histopathologically related deep learning models used for clinical applications in GI cancers. Deep learning algorithms and models are grouped according to their specific computational task and GI cancer type to compare their performance and clinical applications. The sources of the datasets and sample sizes are also summarized.

Author | Degree of Supervision | Task | Cancer Type | Type of WSI | Dataset | Algorithm/Model | Performance | Clinical Application
Shen et al. [112] Fully
supervised
Classification Gastric cancer H&E Training, validation and testing: 432 WSIs (TCGA-STAD cohort) + 460 WSIs (TCGA-COAD)
+ 171 WSIs (TCGA-READ) + 400 WSIs (Camelyon16)
DenseNet + Deformable Conditional
Random Field model
Accuracy: 0.9398 (TCGA-STAD), 0.9337 (TCGA-COAD), 0.9294 (TCGA-READ), 0.9468 (Camelyon16) Identification of suspected cancer area
from histological imaging
Song et al. [30] Fully
supervised
Classification Gastric cancer H&E Training: 2123 WSIs
Validation: 300 WSIs
Internal testing: 100 WSIs
External validation: 3212 WSIs (daily gastric dataset) + 595 WSIs (PUMCH) + 987 WSIs (CHCAMS and Peking Union Medical College)
DeepLab v3 Malignant vs. benign
Training AUC: 0.923
Internal testing AUC: 0.931
AUC: 0.995 (daily gastric dataset)
AUC: 0.990 (PUMCH)
AUC: 0.996 (CHCAMS and Peking Union Medical College)
Diagnosis of gastric cancer
Su et al. [28] Fully
supervised
Classification and detection Gastric cancer H&E Training: 348 WSIs
Testing: 88 WSIs
External Validation: 31 WSIs
ResNet-18 Poorly differentiated adenocarcinoma vs. well-differentiated adenocarcinoma and other normal tissue
F1 score: 0.8615
Well-differentiated adenocarcinoma vs. poorly differentiated adenocarcinoma and other normal tissue
F1 score: 0.8977
Patients with MSI vs. without MSI
Accuracy: 0.7727 (95% CI 0.6857–0.8636)
Differentiation of cancer grade and diagnosis of MSI
Song et al. [29] Fully
supervised
Classification Colorectal cancer H&E Training: 177 WSIs
Validation: 40 WSIs
Internal test: 194 WSIs
External validation: 168 WSIs
DeepLab v2 with ResNet34 Adenomatous vs. normal
AUC: 0.92
Accuracy: 0.904
Diagnosis of colorectal adenomas
Sirinukunwattana et al. [114] Fully
supervised
Classification Colorectal cancer H&E Training: 510 WSIs
External validation: 431 WSIs (TCGA cohort) + 265 WSIs (GRAMPIAN cohort)
Inception V3 Colorectal cancer consensus molecular subtypes 1 vs. 2 vs. 3 vs. 4
Training average accuracy: 70%
Training AUC: 0.9
External validation accuracy: 0.64 (TCGA cohort) + 0.72 (GRAMPIAN cohort)
External validation AUC: 0.84 (TCGA cohort) + 0.85 (GRAMPIAN cohort)
Prediction of colorectal cancer
molecular subtype
Popovici et al. [115] Fully
supervised
Classification Colorectal cancer H&E Training: 100 WSIs
Test: 200 WSIs
VGG-F Molecular subtype A vs. B vs. C vs. D vs. E
Overall accuracy: 0.84 (95% CI: 0.79–0.88)
Overall recall: 0.85 (95% CI: 0.80–0.89)
Overall precision: 0.84 (95% CI: 0.80–0.88)
Prediction of colorectal cancer
molecular subtype
Korbar et al. [116] Fully
supervised
Classification Colorectal cancer H&E Training: 458 WSIs
Testing: 239 WSIs
ResNet-152 Hyperplastic polyp vs. sessile serrated polyp vs.
traditional serrated adenoma vs. tubular adenoma vs. tubulovillous/villous adenoma vs. normal
Accuracy: 0.930 (95% CI: 0.890–0.959)
Precision: 0.897 (95% CI: 0.852–0.932)
Recall: 0.883 (95% CI: 0.836–0.921)
F1 score: 0.888 (95% CI: 0.841–0.925)
Characterization of colorectal polyps
Wei et al. [118] Fully
supervised
Classification Colorectal cancer H&E Training: 326 WSIs
Validation: 25 WSIs
Internal test: 157 WSIs
External validation: 238 WSIs
Ensemble ResNet×5 Hyperplastic polyp vs. sessile serrated adenoma vs. tubular adenoma vs. tubulovillous or villous adenoma.
Internal test mean accuracy: 0.935 (95% CI: 0.896–0.974)
External validation mean accuracy: 0.870 (95% CI: 0.827–0.913)
Colorectal polyp classification
Gupta et al. [119] Fully
supervised
Classification Colorectal cancer H&E Training and testing: 303,012 normal WSI patches
and approximately 1,000,000 abnormal WSI patches
Customized
Inception-ResNet-v2
Type 5 (IR-v2 Type 5) model.
Abnormal region vs. normal region
F-score: 0.99
AUC: 0.99
Identification of suspected cancer area
from histological imaging
Kather et al. [117] Fully
supervised
Classification and prognosis Colorectal cancer H&E Training: 86 WSIs
Testing: 25 WSIs
External validation: 862 WSIs (TCGA cohort) + 409 WSIs (DACHS cohort)
VGG19 Adipose tissue vs. background vs. lymphocytes vs. mucus vs. smooth muscle vs. normal colon mucosa vs. cancer-associated stroma vs. colorectal adenocarcinoma epithelium
Internal testing overall accuracy: 0.99
External testing overall accuracy: 0.943
High deep stroma score predicts shorter survival
Hazard ratio: 1.99 (95% CI: 1.27–3.12)
Colorectal cancer detection and
prediction of patient survival outcome
Zhu et al. [31] Fully
supervised
Classification and segmentation Gastric and colorectal cancer H&E Training: 750 WSIs
Testing: 250 WSIs
Adversarial CAC-UNet Malignant region vs. benign region
DSC: 0.8749
Recall: 0.9362
Precision: 0.9027
Accuracy: 0.8935
Identification of suspected cancer area
from histological imaging
Xu et al. [33] Fully
supervised
Segmentation Colorectal cancer H&E Training: 750 WSIs
Testing: 250 WSIs
CoUNet Malignant region vs. benign region
Dice: 0.746
AUC: 0.980
Identification of suspected cancer area
from histological imaging
Feng et al. [32] Fully
supervised
Segmentation Colorectal cancer H&E Training: 750 WSIs
Testing: 250 WSIs
U-Net-16 Malignant region vs. benign region
DSC: 0.7789
AUC: 1
Identification of suspected cancer area
from histological imaging
Mahendra et al. [120] Fully
supervised
Segmentation Colorectal cancer H&E Training: 270 WSIs (CAMELYON16)
+ 500 WSIs (CAMELYON17)
+ 660 WSIs (DigestPath)
+ 50 WSIs (PAIP)
Testing: 129 WSIs (CAMELYON16)
+ 500 WSIs (CAMELYON17)
+ 212 WSIs (DigestPath)
+ 40 WSIs (PAIP)
DenseNet-121 +
Inception-ResNet-V2 + DeeplabV3Plus
Malignant region vs. benign region
Cohen kappa score: 0.9090 (CAMELYON17)
DSC: 0.782 (DigestPath)
Identification of suspected cancer area
from histological imaging
Gehrung et al. [34] Fully
supervised
Detection Oesophageal cancer H&E and TFF3 pathology slides Training: 100 + 187 patients
Validation: 187 patients
External validation: 1519 patients
VGG-16 Patients with Barrett’s oesophagus vs. no Barrett’s oesophagus
AUC: 0.88 (95% CI: 0.85–0.91)
Sensitivity: 0.7262 (95% CI: 0.6742–0.7821)
Specificity: 0.9313 (95% CI: 0.9004–0.9613)
Simulated realistic cohort workload reduction: 57%
External validation cohort reduction: 57.41%
Detection of
Barrett’s oesophagus
Kather et al. [122] Fully
supervised
Detection Gastric and colorectal cancer H&E Training:
81 patients (UMM and NCT tissue bank) + 216 patients (TCGA-STAD) + 278 patients (TCGA-CRC-KR) + 260 patients (TCGA-CRC-DX) + 382 patients (UCEC)
External validation: 99 patients (TCGA-STAD) + 109 patients (TCGA-CRC-KR) + 100 patients (TCGA-CRC-DX) + 110 patients (UCEC) + 185 patients (KCCH)
ResNet-18 Patients with MSI vs. no MSI
Training AUC: >0.99 (UMM and NCT tissue bank)
AUC: 0.81 (CI: 0.69–0.90) (TCGA-STAD)
AUC: 0.84 (CI: 0.73–0.91) (TCGA-CRC-KR)
AUC: 0.77 (CI: 0.62–0.87) (TCGA-CRC-DX)
AUC: 0.75 (CI: 0.63–0.83) (UCEC)
AUC: 0.69 (CI: 0.52–0.82) (KCCH)
Detection of
MSI
Echle et al. [36] Fully
supervised
Detection Colorectal cancer H&E Training: 6406 WSIs
External validation: 771 WSIs
ShuffleNet Colorectal tumour sample with dMMR or MSI vs. no dMMR or MSI
Mean AUC: 0.92
AUPRC: 0.93
Specificity: 0.67
Sensitivity: 0.95
External validation AUC without colour normalisation: 0.95
External validation AUC with colour normalisation: 0.96
Detection of
MSI
Cao et al. [121] Fully
supervised
Detection Colorectal cancer H&E Training: 429 WSIs
External validation: 785 WSIs
ResNet-18 Colorectal cancer patients with MSI vs. no MSI
AUC: 0.8848 (95% CI: 0.8185–0.9512)
External validation AUC: 0.8504 (95% CI: 0.7591–0.9323)
Detection of
MSI
Meier et al. [124] Fully
supervised
Prognosis Gastric cancer H&E
IHC staining, including CD8, CD20, CD68 and Ki67
Training and testing: 248 patients GoogLeNet Risk of the presence of Ki67&CD20
Hazard ratio = 1.47 (95% CI: 1.15–1.89)
Risk of the presence of CD20&CD68
Hazard ratio = 1.33 (95% CI: 1.07–1.67)
Cancer prognosis based on various IHC markers to predict patient survival outcome
Bychkov et al. [123] Fully
supervised
Prognosis Colorectal cancer H&E Training: 220 WSIs
Validation: 60 WSIs
Testing: 140 WSIs
VGG-16 High-risk patients vs. low-risk patients
Prediction with small tissue hazard ratio: 2.3 (95% CI: 1.79–3.03)
Survival analysis
of colorectal cancer
Wang et al. [127] Weakly supervised Classification Gastric cancer H&E Training: 408 WSIs
Testing: 200 WSIs
recalibrated multi-instance
deep learning
Cancer vs. dysplasia vs. normal
Accuracy: 0.865
Diagnosis of gastric cancer
Xu et al. [37] Weakly supervised Classification Gastric cancer H&E Training, validation and testing:
185 WSIs (SRS dataset) + 2032 WSIs (Mars dataset)
multiple instance classification
framework based on
graph convolutional networks
Tumour vs. normal
Recall: 0.904 (SRS dataset), 0.9824 (Mars dataset)
Precision: 0.9116 (SRS dataset), 0.9826 (Mars dataset)
F1-score: 0.9075 (SRS dataset), 0.9824 (Mars dataset)
Diagnosis of gastric cancer
Huang et al. [39] Weakly supervised Classification Gastric cancer H&E Training and testing: 2333 WSIs
External validation: 175 WSIs
GastroMIL Gastric cancer vs. normal
External validation accuracy: 0.92
GastroMIL risk score associated with patient overall survival
Hazard ratio: 2.414
Diagnosis of gastric cancer
and prediction of patient survival outcome
Li et al. [131] Weakly supervised Classification Gastric cancer H&E Training and testing: 10,894 WSIs DLA34 + Otsu’s method Tumour vs. normal
Sensitivity: 1.0000
Specificity: 0.8932
AUC: 0.9906
Diagnosis of gastric cancer
Chen et al. [133] Weakly supervised Classification Colorectal cancer H&E Training and testing: 400 WSIs CNN classifier Normal (including hyperplastic polyp) vs. adenoma vs.
adenocarcinoma vs. mucinous adenocarcinoma vs.
signet ring cell carcinoma
Overall accuracy: 0.76
Prediction of colorectal
cancer tumour grade
Ye et al. [38] Weakly supervised Classification Colorectal cancer H&E Training and testing: 100 WSIs Multiple-instance CNN With epithelial cell nuclei vs. no epithelial cell nuclei
Accuracy: 0.936
Precision: 0.922
Recall: 0.960
Detection of colon cancer
Sharma et al. [129] Weakly supervised Classification Gastrointestinal cancer H&E Training and testing: 413 WSIs Cluster-to-Conquer
framework
Celiac cancer vs. normal
Accuracy: 0.862
Precision: 0.855
Recall: 0.922
F1-score: 0.887
Detection of
gastrointestinal cancer
Klein et al. [130] Weakly supervised Detection Gastric cancer H&E + Giemsa staining Training: 191 H&E WSIs and 286 Giemsa-stained WSIs
Validation: 71 H&E WSIs and 87 Giemsa-stained WSIs
External validation: 364 H&E WSIs and 347 Giemsa-stained WSIs
VGG+ + active learning H. pylori vs. no H. pylori
External validation AUC: 0.81 (H&E) + 0.92 (Giemsa-stained)
Detection of H. pylori

WSI = whole-slide imaging; H&E = haematoxylin and eosin; AUC = area under the curve; CI = confidence interval; TCGA = The Cancer Genome Atlas; STAD = stomach adenocarcinoma; DACHS = Darmkrebs: Chancen der Verhütung durch Screening; MSI = microsatellite instability; dMMR = deficient mismatch repair; TFF3 = trefoil factor 3; DSC = Dice similarity coefficient; UMM = University Medical Centre Mannheim, Heidelberg University; NCT = National Centre for Tumour Diseases; CRC = colorectal cancer; PUMCH = Peking Union Medical College Hospital; CHCAMS = Chinese Academy of Medical Sciences; H. pylori = Helicobacter pylori; IHC = immunohistochemistry; CNN = convolutional neural network.
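
Several rows above report precision, recall, F1 score and DSC side by side. As a reading aid, the following minimal Python sketch shows how these quantities relate for a single binary task: F1 is the harmonic mean of precision and recall, and the DSC of a binary mask equals the F1 score computed from the same counts. The function names and the toy pixel counts are illustrative assumptions and are not taken from any of the cited studies; only the precision and recall values for Sharma et al. [129] come from the table.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 score as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def dice_from_counts(tp: int, fp: int, fn: int) -> float:
    """Dice similarity coefficient of a binary mask from pixel counts.

    tp: pixels predicted malignant that are malignant
    fp: pixels predicted malignant that are benign
    fn: malignant pixels the model missed
    """
    denom = 2 * tp + fp + fn
    # Two empty masks are treated as a perfect match (a convention, assumed here).
    return 2 * tp / denom if denom else 1.0


# Values reported in the table for Sharma et al. [129].
f1_sharma = f1_from_precision_recall(0.855, 0.922)
print(round(f1_sharma, 3))  # 0.887, matching the reported F1-score

# Toy counts (hypothetical) showing that Dice and F1 coincide for one binary class.
tp, fp, fn = 8, 2, 2
dice = dice_from_counts(tp, fp, fn)
f1_toy = f1_from_precision_recall(tp / (tp + fp), tp / (tp + fn))
print(dice, round(f1_toy, 6))
```

The harmonic-mean relation need not hold exactly for multi-class rows, where the reported precision, recall and F1 may each be averaged over classes.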