Extended Data Fig. 6: Multi-head self-attention (MHSA) heatmap visualization of UNI across different image resolutions for CRC polyp classification in UniToPatho.
Each colored square represents a 16 × 16 patch token encoded by UNI, with heatmap color corresponding to the attention weight of that patch token to the global [CLS] token of the penultimate layer in UNI. We show MHSA visualizations for resized and center-cropped ROIs at 224², 448², 896² and 1792² resolutions for a. normal tissue, b. hyperplastic polyp, c. tubular adenoma with low-grade dysplasia, d. tubular adenoma with high-grade dysplasia, e. tubulo-villous adenoma with high-grade dysplasia, and f. tubulo-villous adenoma with low-grade dysplasia. In each panel, the left-most image is the original H&E ROI and the four images to its right are the MHSA visualizations, one per resolution. For comparison, we resize all images within the figure to the same dimensions; note that at higher resolutions, each colored square still corresponds to a 16 × 16 pixel region of the original image at 0.48 µm per pixel (mpp). As resolution increases, the heatmaps show stronger and increasingly fine-grained attention on the crypts in all cases except the hyperplastic polyp in b, concentrating on the areas a pathologist would use to make the diagnosis.
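To make the relationship between input resolution and heatmap granularity concrete, the following minimal sketch computes how many 16 × 16 patch tokens tile each resolution and reshapes a [CLS]-to-patch attention vector into a square heatmap grid. It is an illustration, not the authors' visualization code: the function names `tokens_per_side` and `cls_attention_grid` are hypothetical, the token order ([CLS] first, then patches in row-major order) is assumed, and averaging over heads is an assumption because the caption does not state how heads are combined.

```python
PATCH_SIZE = 16  # pixels per patch side, as stated in the caption


def tokens_per_side(resolution, patch_size=PATCH_SIZE):
    """Number of patch tokens along one side of a square input image."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of the patch size")
    return resolution // patch_size


def cls_attention_grid(cls_attention_per_head, side):
    """Average [CLS] attention over heads and reshape it to a side x side grid.

    cls_attention_per_head: one list per head, each of length 1 + side * side,
    holding the attention weights from the [CLS] query to every token.
    Assumption: index 0 is [CLS] itself, followed by row-major patch tokens.
    """
    n_heads = len(cls_attention_per_head)
    n_patches = side * side
    mean = [0.0] * n_patches
    for head in cls_attention_per_head:
        if len(head) != n_patches + 1:
            raise ValueError("expected 1 + side * side weights per head")
        # Skip the [CLS]-to-[CLS] weight and accumulate the head average.
        for i, weight in enumerate(head[1:]):
            mean[i] += weight / n_heads
    # Cut the flat patch vector into rows of the heatmap.
    return [mean[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    for resolution in (224, 448, 896, 1792):
        side = tokens_per_side(resolution)
        print(f"{resolution} px -> {side} x {side} grid, {side * side} patch tokens")
```

Under these assumptions, the four resolutions in the figure give grids of 14 × 14, 28 × 28, 56 × 56 and 112 × 112 colored squares, which is why the heatmaps appear finer-grained at higher resolution once all images are resized to the same display size.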