IEEE Access. Author manuscript; available in PMC: 2022 Feb 23.
Published in final edited form as: IEEE Access. 2021 Dec 6;9:163526–163541. doi: 10.1109/ACCESS.2021.3132958

FIGURE 1.

Overview of ScATNet for classifying skin biopsy images. To learn representations from large whole slide images (WSIs) at multiple input scales in an end-to-end fashion, ScATNet factorizes the classification pipeline into three steps. First, it learns local patch-wise embeddings for each input scale independently using an off-the-shelf CNN. Second, it learns inter-patch representations with transformers, producing contextualized patch embeddings for each input scale. Third, it learns inter-scale representations from the concatenated multi-scale contextualized patch embeddings using another transformer network, producing scale-aware embeddings that a linear classifier maps to diagnostic categories.
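The three steps in the caption can be illustrated with a minimal, dependency-free Python sketch that only shows how data flows through the pipeline. It is not the authors' implementation. Several pieces are assumptions made for illustration: a random linear map shared across scales stands in for the off-the-shelf CNN, each transformer is reduced to one single-head self-attention layer with untrained random weights, the scales are concatenated along the token axis, and the fused tokens are mean-pooled before the linear classifier. All names (`scatnet_sketch`, `self_attention`, and so on) and all sizes are hypothetical.

```python
import math
import random

# Hypothetical sketch of the three-step pipeline in Figure 1 (not the authors' code).
# Assumptions: random untrained weights, a linear map standing in for the CNN,
# single-head attention standing in for each transformer, token-axis concatenation.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rng, rows, cols):
    """Small random weight matrix (stand-in for learned parameters)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention plus a residual connection."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]                 # transpose of k
    scores = matmul(q, k_t)                              # (n x n) token affinities
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    out = matmul(attn, v)
    return [[xi + oi for xi, oi in zip(xr, orow)] for xr, orow in zip(x, out)]

def scatnet_sketch(patches_per_scale, num_classes, dim=8, seed=0):
    """Return class logits for one image given flattened patches at each scale."""
    rng = random.Random(seed)
    patch_len = len(patches_per_scale[0][0])             # assumes equal patch sizes
    w_patch = rand_matrix(rng, patch_len, dim)           # shared stand-in "CNN"
    contextualized = []
    for patches in patches_per_scale:
        # Step 1: local patch-wise embeddings, computed per scale independently.
        emb = matmul(patches, w_patch)
        # Step 2: inter-patch attention within this scale.
        wq, wk, wv = (rand_matrix(rng, dim, dim) for _ in range(3))
        contextualized.append(self_attention(emb, wq, wk, wv))
    # Step 3: concatenate all scales' tokens and attend across scales.
    tokens = [tok for scale in contextualized for tok in scale]
    wq, wk, wv = (rand_matrix(rng, dim, dim) for _ in range(3))
    fused = self_attention(tokens, wq, wk, wv)
    pooled = [sum(col) / len(fused) for col in zip(*fused)]   # mean over tokens
    w_cls = rand_matrix(rng, dim, num_classes)                # linear classifier
    return matmul([pooled], w_cls)[0]

# Usage: two scales of 12-value patches (4 coarse patches, 9 finer patches).
data_rng = random.Random(42)
scales = [
    [[data_rng.random() for _ in range(12)] for _ in range(4)],
    [[data_rng.random() for _ in range(12)] for _ in range(9)],
]
logits = scatnet_sketch(scales, num_classes=4)   # 4 is an arbitrary choice here
print(len(logits))  # 4
```

Because every scale yields a variable number of contextualized tokens, concatenating them along the token axis lets one attention layer relate patches both within and across scales. The paper's actual fusion and pooling choices may differ.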