2026 Apr 16;6(6):101217. doi: 10.1016/j.xgen.2026.101217

Figure 3.

Latent space organization and HRG verification in ProtoCloud

(A) Evaluation of latent space disentanglement using the PBMC30K dataset (ref. 63). Comparison of single-cell integration benchmarking (scIB) metrics for the biological subspace (first half of the latent space, z1) and the batch subspace (second half of the latent space, z2). Metrics are grouped into biological conservation and batch correction categories. A score closer to 1 indicates better performance.

(B) Visualization of ProtoCloud latent representations of PBMC30K. UMAP projections show cell-type-specific clustering in z1 (left) and batch effects in z2 (right) of the latent space. Prototypes are represented by dots with black outlines.

(C) Visualization of ProtoCloud latent representations of PBMC30K. UMAP projections show the representation using all 3,000 HVGs (left) compared to the representation using the union of the top 30 HRGs of each cell type (right), resulting in 184 unique genes in total.

(D) Gene Ontology (GO) biological process enrichment analysis of HRGs across cell types in PBMC30K. The dot plot displays the top enriched pathways for each cell type. Dot size represents the number of HRGs associated with the pathway, and color indicates statistical significance.

(E) Cell type specificity of HRGs in PBMC30K. Heatmaps comparing row-normalized gene signature scores based on the top 15 HRGs across nine cell types. Diagonal elements (matching cell types) represent on-target specificity, where high values indicate relatively high expression in the respective cell type, while off-diagonal elements represent off-target expression.
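The signature-score heatmap in (E) can be illustrated with a minimal sketch. The caption does not state how scores are computed or which row normalization is used, so everything here is an assumption: the score is taken as the mean expression of a signature's genes within each cell type, rows are scaled by min-max, and the function name `signature_score_matrix` is hypothetical.

```python
def signature_score_matrix(mean_expr, signatures):
    """Row-normalized signature scores (illustrative assumption, not the paper's code).

    mean_expr:  list of per-cell-type profiles, mean_expr[c][g] = mean expression
                of gene g in cell type c.
    signatures: list of gene-index lists, one signature per row of the output.
    Returns a matrix with one row per signature and one column per cell type,
    each row min-max scaled to [0, 1].
    """
    scores = []
    for genes in signatures:
        # Raw score: average expression of the signature genes in each cell type.
        row = [sum(profile[g] for g in genes) / len(genes) for profile in mean_expr]
        lo, hi = min(row), max(row)
        span = (hi - lo) or 1.0  # avoid division by zero for a flat row
        scores.append([(v - lo) / span for v in row])
    return scores


# Toy example: two cell types, four genes, one two-gene signature per cell type.
toy_expr = [[5.0, 4.0, 0.0, 1.0],
            [1.0, 0.0, 6.0, 4.0]]
toy_scores = signature_score_matrix(toy_expr, [[0, 1], [2, 3]])
```

Under this normalization, a signature that is most highly expressed in its own cell type yields a diagonal entry of 1, and off-diagonal entries measure relative off-target expression.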

(F) Quantitative comparison of gene specificity across three gene sets in PBMC30K. Violin plots display the distributions of Tau specificity scores for HRGs (top 20 per cell type, n = 120), canonical marker genes (n = 69), and randomly selected genes (n = 100). HRGs exhibit cell type specificity comparable to that of known markers, with a median Tau score of 0.828, while randomly selected genes show significantly lower specificity. Statistical significance was assessed using a two-sided Mann-Whitney U test (ns, not significant, p > 0.05; ∗∗∗∗p ≤ 0.0001).
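For readers unfamiliar with the Tau score in (F), the following sketch computes the commonly used form of the index, tau = sum_i (1 - x_i / x_max) / (n - 1), where x_i is a gene's expression in cell type i and n is the number of cell types. The caption does not specify the exact variant or any preprocessing, so the use of non-negative per-cell-type mean expression and the function name `tau_specificity` are assumptions.

```python
import math


def tau_specificity(mean_expr):
    """Tau index for one gene from its per-cell-type mean expression.

    Returns 0 for uniform expression and 1 for expression confined to a
    single cell type. Assumes non-negative values (illustrative sketch).
    """
    x = [float(v) for v in mean_expr]
    if len(x) < 2:
        raise ValueError("need expression values for at least two cell types")
    x_max = max(x)
    if x_max <= 0:
        return math.nan  # gene not expressed anywhere; Tau is undefined
    return sum(1.0 - v / x_max for v in x) / (len(x) - 1)


specific = tau_specificity([0.0, 0.0, 5.0, 0.0])   # expressed in one cell type
uniform = tau_specificity([2.0, 2.0, 2.0])         # expressed equally everywhere
graded = tau_specificity([4.0, 2.0, 0.0])          # intermediate case
```

A score near 1 therefore marks a gene as cell-type-specific, which is the sense in which the reported median of 0.828 for HRGs indicates high specificity.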

(G) Expression profiles of the top ten B cell-specific HRGs in PBMC30K. The y axis lists the genes in descending rank of relevance, and their expression distributions are shown across cell types. High expression in B cells (left column) contrasted with low expression in other cell types demonstrates strong cell type specificity.

(H) Expression profiles of the top differentially ranked HRGs between mature and naive B cells in the TSCA lung dataset (ref. 64). Differential expression significance was assessed using the Wilcoxon rank-sum test with Benjamini-Hochberg correction, denoted by asterisks (∗p ≤ 0.05 and ∗∗∗p ≤ 0.001).
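The Benjamini-Hochberg correction mentioned in (H) can be sketched as follows. This is a plain-Python illustration of the standard step-up adjustment, not the authors' code; the function names are hypothetical, and the asterisk mapping uses only the two thresholds stated in the caption (any intermediate levels the figure may use are not given, so they are not modeled).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_label(p_adj):
    """Map an adjusted p-value to the asterisks defined in the caption."""
    if p_adj <= 0.001:
        return "***"
    if p_adj <= 0.05:
        return "*"
    return ""


raw = [0.01, 0.04, 0.03, 0.005]       # toy p-values, one per gene
adj = benjamini_hochberg(raw)
labels = [significance_label(p) for p in adj]
```

In practice the raw p-values would come from a rank-sum test per gene; the toy values above only demonstrate the adjustment step.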