2026 Apr 16;6(6):101217. doi: 10.1016/j.xgen.2026.101217

Figure 1.

ProtoCloud overview

(A) An illustration of the ProtoCloud model. The model has four major components: a probabilistic encoder, a probabilistic decoder, a prototype matrix, and a linear classifier. During inference, the model takes only raw UMI counts (a cell-by-gene count matrix) as input. Cell inputs are encoded into a low-dimensional latent space (d_z = 20 by default), in which colors indicate cell types. Larger points in the latent space represent prototypes, which are pre-initialized (six per cell type) and share the same latent space as the cell embeddings. Cell type information (brighter color) is encoded in the first half of the latent dimensions. The latent embeddings are used both for cell type prediction, based on similarity to the prototypes, and for reconstructing gene expression through the decoder.
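The similarity-based prediction step described in (A) can be sketched roughly as follows. This is a minimal illustration, not the ProtoCloud implementation: the exponential distance-based similarity, the per-type maximum used in place of the linear classifier, and the names `similarity` and `predict_cell_type` are all assumptions made for the example, and the toy latent space is 2-dimensional with two prototypes per type rather than 20-dimensional with six.

```python
import math

def similarity(z, prototype):
    """Similarity that decays with squared Euclidean distance (an assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(z, prototype))
    return math.exp(-sq_dist)

def predict_cell_type(z, prototypes):
    """Return (best_type, per-type scores) for one latent embedding.

    prototypes: dict mapping a cell type name to a list of prototype vectors.
    Each type is scored by its best-matching prototype; the figure describes a
    linear classifier over similarities, which this sketch simplifies away.
    """
    scores = {
        cell_type: max(similarity(z, p) for p in protos)
        for cell_type, protos in prototypes.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy prototypes in a 2-D latent space.
prototypes = {
    "type_1": [[0.0, 0.0], [0.5, 0.0]],
    "type_2": [[3.0, 3.0], [3.5, 3.0]],
}
label, scores = predict_cell_type([0.4, 0.1], prototypes)
```

Because prototypes and cell embeddings live in the same latent space, the prediction reduces to comparing a cell against a small set of reference points, which is what makes the per-prototype scores directly inspectable.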

(B) ProtoCloud provides inherent interpretability through prototypical relevance propagation (PRP), which automatically generates gene-level relevance scores for each prototype once the model is trained. Genes with higher relevance scores are the decision-relevant genes of the corresponding cell type.

(C–E) Applications of ProtoCloud. (C) The model enables accurate and robust transfer of cell type annotations across datasets. Shapes denote ground truth cell types, and colors indicate predicted labels. Symbols with black edges represent prototypes, while those without edges are cell embeddings. Crosses mark newly added query cells predicted to belong to the corresponding cell types (indicated by color). (D) ProtoCloud’s similarity-based classification process enables the detection of anomalous annotations, improving the quality of the ground truth annotations. A cell with ground truth type 1 (diamond shape) was initially mislabeled as type 2. ProtoCloud corrects this annotation by assigning the cell to type 1 based on a higher similarity score. (E) The identified cell-type-relevant genes can act as candidate markers and provide molecular insights into cell identities.
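The anomaly-detection idea in (D) can be illustrated with a small sketch that flags cells whose annotated type disagrees with the type of highest similarity. The function name `flag_suspect_labels`, the `margin` threshold, and the score values are hypothetical and chosen only for illustration; the caption does not specify how ProtoCloud decides that an annotation is anomalous.

```python
def flag_suspect_labels(given_labels, score_table, margin=0.2):
    """Return (cell index, given label, suggested label) for suspect annotations.

    given_labels: annotated cell type per cell.
    score_table: per-cell dict mapping cell type -> similarity score.
    A cell is flagged when another type outscores its annotated type by at
    least `margin` (an assumed, illustrative threshold).
    """
    suspects = []
    for i, (given, scores) in enumerate(zip(given_labels, score_table)):
        suggested = max(scores, key=scores.get)
        gap = scores[suggested] - scores.get(given, 0.0)
        if suggested != given and gap >= margin:
            suspects.append((i, given, suggested))
    return suspects

# Made-up scores for three cells.
given_labels = ["type_1", "type_2", "type_2"]
score_table = [
    {"type_1": 0.90, "type_2": 0.05},
    {"type_1": 0.85, "type_2": 0.10},  # annotated type_2, but closest to type_1
    {"type_1": 0.45, "type_2": 0.50},
]
suspects = flag_suspect_labels(given_labels, score_table)
```

Here only the second cell is flagged, mirroring the panel's example of a type 1 cell that was annotated as type 2. A margin keeps borderline cells, such as the third one, from being relabeled on a marginal score difference.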