Schematic workflow of this study
(A) Schematic summary of attMIL and the multi-input DL architecture: a WSI is tessellated into smaller tiles, which are pre-processed and passed through the encoder to obtain image feature vectors. In the multi-input case, each image feature vector is concatenated with a vector representing the patient’s clinical data. The set of image feature vectors per WSI is then used as input to the attMIL model. In the first embedding block, the attMIL model reduces each tile’s feature vector from 2,048 dimensions (when using the Wang encoder; +4 if clinical data are included in the input) to 256. Next, an attention score is calculated for each tile. The attention-weighted sum over all embedded feature vectors then yields a 256-dimensional vector representing the entire WSI (green). Finally, this vector is passed through a classification block to obtain a biomarker prediction for the input WSI.
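As a rough illustration of panel (A), the NumPy sketch below runs one attMIL-style forward pass on the tile features of a single WSI. It is not the authors' implementation. Only the 2,048 (+4) input size and the 256-dimensional embedding come from the caption. Everything else is an assumption: the random weights, the ReLU in the embedding block, the tanh attention layer with a 128-dimensional hidden size (in the style of attention-based MIL), softmax normalization of the scores, two output classes, and the hypothetical helper names `make_params` and `attmil_forward`.

```python
import numpy as np

# Sizes from the caption: 2,048-dim Wang-encoder features, +4 clinical values
# in the multi-input case, 256-dim embedding. ATT_DIM and N_CLASSES are assumptions.
FEAT_DIM, CLIN_DIM, EMB_DIM, ATT_DIM, N_CLASSES = 2048, 4, 256, 128, 2

rng = np.random.default_rng(0)


def make_params(in_dim):
    """Hypothetical helper: random weights standing in for trained parameters."""
    def w(*shape):
        return rng.normal(scale=0.02, size=shape)
    return w(in_dim, EMB_DIM), w(EMB_DIM, ATT_DIM), w(ATT_DIM), w(EMB_DIM, N_CLASSES)


def attmil_forward(tile_feats, clinical, params):
    """Sketch of one attMIL forward pass for a single WSI.

    tile_feats: (n_tiles, 2048) encoder outputs for one slide.
    clinical:   optional (4,) clinical vector (None for the image-only model).
    Returns (class probabilities, per-tile attention weights).
    """
    if clinical is not None:
        # Multi-input case: append the same clinical vector to every tile.
        clin = np.broadcast_to(clinical, (tile_feats.shape[0], clinical.shape[0]))
        tile_feats = np.concatenate([tile_feats, clin], axis=1)
    W_emb, W_att, w_att, W_cls = params
    # Embedding block P: reduce each tile vector to 256 dims (ReLU is assumed).
    h = np.maximum(tile_feats @ W_emb, 0.0)        # (n_tiles, 256)
    # Attention layers A: one unnormalized score per tile (tanh form assumed).
    scores = np.tanh(h @ W_att) @ w_att            # (n_tiles,)
    a = np.exp(scores - scores.max())
    a /= a.sum()                                   # softmax over tiles (assumed)
    # Attention weighting (Pi) and sum (Sigma): one 256-dim slide vector.
    z = (a[:, None] * h).sum(axis=0)               # (256,)
    # Classification block C: logits -> class probabilities.
    logits = z @ W_cls
    p = np.exp(logits - logits.max())
    return p / p.sum(), a


# Usage: 50 random "tiles" plus hypothetical clinical values.
feats = rng.normal(size=(50, FEAT_DIM))
clin = np.array([1.0, 0.0, 0.63, 1.0])
probs, att = attmil_forward(feats, clin, make_params(FEAT_DIM + CLIN_DIM))
print(probs.shape, att.shape)  # prints (2,) (50,)
```

The clinical vector is appended to every tile before the embedding block, so the slide-level vector stays 256-dimensional in both variants. Only the first weight matrix grows by four rows.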
(B) Targets and cohorts used in internal and external validation. For internal validation, we tested for MSI, BRAF, PIK3CA, KRAS, and NRAS status; external validation was performed only for MSI and BRAF status.
(C) List of all six DL approaches compared in this study. E, encoder network; P, embedding block that embeds feature vectors into a lower-dimensional space; A, attention layers; Π, attention weighting; Σ, sum; C, classification block.
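Read together, the symbols in (C) can be composed into a single expression. The block below is one hedged reading for the attMIL approach, assuming softmax-normalized attention scores as in standard attention-based MIL. Here x_k denotes the k-th of K tiles of a WSI (notation introduced for illustration), and the product a_k h_k corresponds to the attention weighting Π.

```latex
\begin{aligned}
h_k &= P\big(E(x_k)\big), \qquad k = 1,\dots,K,\\
a_k &= \frac{\exp\big(A(h_k)\big)}{\sum_{j=1}^{K}\exp\big(A(h_j)\big)},\\
z &= \sum_{k=1}^{K} a_k\, h_k \in \mathbb{R}^{256},\\
\hat{y} &= C(z).
\end{aligned}
```

In the multi-input variant, E(x_k) would be replaced by the concatenation of E(x_k) with the patient's clinical vector before P is applied.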