SpaCross deciphers spatial structures and corrects batch effects in multi-slice spatially resolved transcriptomics

Donghai Fang; Wenwen Min

doi:10.1038/s42003-025-08810-5

2025 Sep 30;8:1393.

PMCID: PMC12484869 PMID: 41028333

Abstract

Spatially Resolved Transcriptomics (SRT) has revolutionized tissue architecture analysis by integrating gene expression with spatial coordinates. However, existing spatial domain identification methods struggle with unsupervised learning constraints, lack of implicit supervision in latent space, and challenges in balancing local spatial continuity with global semantic consistency, particularly in multi-slice integration. To address these issues, we propose SpaCross, a comprehensive deep learning framework for SRT that enhances spatial pattern recognition and cross-slice consistency. SpaCross employs a cross-masked graph autoencoder to reconstruct gene expression features while preserving spatial relationships and mitigating identity mapping issues. A cross-masked latent consistency module reinforces implicit constraints on latent representations, improving feature robustness. More importantly, an adaptive spatial-semantic graph structure dynamically integrates local and global contextual information, enabling effective multi-slice integration. Extensive evaluations demonstrate that SpaCross outperforms thirteen state-of-the-art methods on single-slice datasets and achieves robust batch effect correction while preserving biologically meaningful spatial architectures in multi-slice integration. Notably, SpaCross integrates embryonic mouse tissues across developmental stages, identifying conserved regions and uncovering stage-specific structures such as the dorsal root ganglion. In the heart domain, it reconstructs developmental trajectories capturing key transcriptional transitions and gene programs associated with cardiac maturation.

Subject terms: Functional clustering

SpaCross uses a cross-masked graph autoencoder with adaptive spatial-semantic integration to advance multi-slice spatial transcriptomics and reveal conserved and stage-specific tissue structures.

Introduction

The organizational function of multicellular organisms depends on the precise spatial coordination and regulation between cells¹. Traditional single-cell sequencing technologies can analyze cellular expression heterogeneity, but they fail to capture the “spatial code” that governs tissue function. The breakthrough development of Spatially Resolved Transcriptomics (SRT) has coupled gene expression profiles with spatial coordinates, offering a novel paradigm for revealing the molecular blueprint of tissue structure^2–4. The current SRT technological framework is broadly categorized into two types⁵: in situ capture sequencing platforms and in situ hybridization (ISH)-based detection strategies. The former, represented by technologies such as 10x Visium⁶, Slide-Seq⁷, Stereo-Seq⁸, and spatial transcriptomics (ST)², integrates spatial localization information with high-throughput sequencing to enable transcriptomic analysis of thousands of genes within the tissue microenvironment⁹. The latter includes methods such as MERFISH¹⁰ and seqFISH¹¹, which break the single-cell resolution limit and precisely establish the mapping between gene expression profiles and spatial coordinates⁹. These advances enable researchers to gain deeper insights into the spatial organization of biology and the progression of diseases^12,13.

The spatial complexity of biological tissues arises from the differentiation of heterogeneous regions, which form spatial domains with specific biological characteristics. Therefore, interpreting spatial domains is a key challenge in the SRT field for understanding physiology and pathology³. Current spatial transcriptomic clustering methods can be divided into two categories¹⁴: non-spatial clustering methods and spatial clustering methods. Classical non-spatial methods (such as K-means and Louvain¹⁵) partition spatial domains based solely on gene expression matrices, neglecting the biological relationships between spatially adjacent sequencing spots¹⁶. To overcome this limitation, spatial clustering methods explicitly incorporate spatial information to improve domain identification. Among them, some methods do not rely on graph-based learning frameworks. For example, CellCharter¹⁷ employs a latent representation learned from gene expression using variational autoencoders, and integrates spatial adjacency during clustering without the use of graph convolutions. MENDER¹⁸, on the other hand, is a fast and scalable approach that constructs multiscale spatial context representations by computing the distribution of cell states across multiple neighborhood ranges. These non-GNN spatial methods provide interpretable and computationally efficient solutions, especially suitable for large-scale datasets.

Among spatial clustering methods, graph neural network (GNN)-based approaches have emerged as a powerful framework to jointly model spatial topology and transcriptomic features. For example, SpaGCN¹⁹ integrates spatial distance and histological features into an adjacency matrix, combines it with gene expression data, and uses Graph Convolutional Networks (GCNs) to learn graph embeddings under an unsupervised clustering loss. SEDR²⁰ uses a deep autoencoder network to learn spatial representations while simultaneously embedding spatial information with a variational autoencoder. STAGATE³ uses an adaptive graph attention autoencoder to learn low-dimensional latent representations of SRT data and identify spatial domains. Although these methods combine gene expression and spatial information for spatial domain identification, they rely entirely on unsupervised learning, which leads to embedding representations that lack accuracy and are inconsistent with pathological annotations⁹.

In recent years, the development of generative self-supervised learning frameworks based on masking mechanisms has rapidly advanced^21–23. By randomly masking parts of the input features, these frameworks force the model to predict the masked content based on contextual information, thus guiding the model to learn better clustering representations. For example, STMGraph²⁴, based on a dynamic graph attention network, employs a mask-remask mechanism to establish dual-decoding views, allowing embeddings to preserve unmasked features while reconstructing the masked features. STMGAC²⁵ introduces a self-distilled masked graph autoencoder with contrastive triplet learning, which provides latent space supervision and improves spatial clustering accuracy. SpaMask²⁶ constructs a dual-masked graph autoencoder, where node and edge masks ensure that spatially neighboring spots are similar at the feature level. Although these methods effectively avoid the pitfalls of traditional graph autoencoders, which only reconstruct based on adjacency matrices, their supervision signals are limited to the explicit reconstruction task in the raw feature space. This neglects implicit supervision in the latent space, which weakens the robustness of the model.

Graph Contrastive Learning (GCL), as an emerging self-supervised learning framework, constructs positive and negative sample pairs to drive the model to learn discriminative low-dimensional embeddings, thus overcoming the impact of data noise and high dimensionality on clustering^27–29. For instance, GAAEST⁵ constructs a neighborhood graph based on spatial locations, encodes gene expression using a graph attention network, and applies contrastive learning at local, global, and contextual levels to enhance spatial representation learning. GraphST³⁰ introduces self-supervised graph contrastive learning to learn information representations of gene expression maps and their spatial coordinates, enriching the latent representations. ConGI³¹ applies contrastive learning to adapt gene expression to tissue pathology images, thus deciphering spatial domains. stDCL⁹ integrates spatial location and gene expression information, using spatially aware contrastive learning and clustering-level feature contrastive learning mechanisms to identify spatial domains in complex tissue structures. However, these GCL-based spatial domain identification models often capture only local or global information, making it difficult to balance the two, resulting in a disconnection between spatial and expression information and blurry domain boundaries. They heavily rely on the refinement process of clustering results.

Finally, multi-slice integration faces significant methodological challenges^23,32. Inconsistent coordinate systems across consecutive tissue slices obstruct geometric integration: single-slice clustering methods that rely on spatial coordinates to establish neighborhood relationships (e.g., SpaGCN¹⁹) lack a mechanism for transferring information across slices and therefore struggle to integrate multiple slices. Some methods, such as GraphST³⁰ and STitch3D³³, first align spatial coordinates to eliminate physical batch effects before integrating expression data, but rigid coordinate alignment alone struggles to alleviate batch effects when slices show significant physical deformations³⁴. Moreover, methods that do not depend on multi-slice spatial coordinate alignment, such as Splane³⁵ (a discriminative model-based approach) and SEDR²⁰ (which relies on the Harmony process), have difficulty eliminating technical biases from different sequencing platforms due to differences in slice thickness and molecular capture efficiency.

To address these limitations, we propose SpaCross, a framework that integrates a cross-masked graph autoencoder with adaptive spatial-semantic fusion for robust spatial domain identification and multi-slice integration. SpaCross leverages a masking-enhanced graph autoencoder to reconstruct gene expression features through cross-masked inputs that explicitly preserve spatial relationships while mitigating identity mapping issues, and it employs a cross-masked latent consistency (CMLC) module to enforce implicit constraints between latent representations derived from dual-masking perspectives, thereby enhancing feature robustness. Importantly, an adaptive hybrid spatial-semantic graph (AHSG) structure dynamically integrates local spatial continuity and global semantic consistency, which further facilitates effective multi-slice integration.

Our comprehensive evaluations demonstrate the superior performance of SpaCross across diverse spatial transcriptomic scenarios. On single-slice datasets, SpaCross outperforms thirteen state-of-the-art methods in spatial domain identification, showing strong robustness to technical variation. In multi-slice integration tasks, including the human dorsolateral prefrontal cortex (DLPFC) and mouse hypothalamic preoptic area datasets, SpaCross effectively balances batch correction with the preservation of biologically coherent architectures. In developing mouse embryos, SpaCross reconstructs spatiotemporal tissue dynamics across stages E9.5 to E11.5, identifying both conserved domains such as the heart and brain, and stage-specific structures like the dorsal root ganglia. Additionally, SpaCross enables cross-platform integration of mouse olfactory bulb data, capturing shared laminar structures and resolving platform-specific variation. These results highlight SpaCross as a generalizable framework for spatial analysis across complex developmental and anatomical contexts.

Results

Overview of SpaCross

SpaCross is a comprehensive analytical framework designed for spatial transcriptomics data, aiming to enhance the accuracy of spatial pattern recognition and cross-slice consistency. The approach comprises four key modules: data preprocessing, masking-enhanced self-supervised learning, hybrid graph modeling, and cross-slice integration. It uses graph neural networks and contrastive learning to jointly model spatial and semantic information (see Fig. 1).

To support the integrative analysis of multi-slice spatial transcriptomics data, SpaCross performs data preprocessing by first integrating gene expression matrices from multiple slices along the spot dimension. The data is then filtered and normalized to remove low-quality genes, retaining only highly variable genes for subsequent modeling. Next, principal component analysis (PCA) is applied for dimensionality reduction to decrease computational complexity. Subsequently, SpaCross incorporates a 3D spatial registration method, utilizing the iterative closest point (ICP)^33,36 algorithm to align spatial coordinates across different slices and dynamically construct a 3D adjacency matrix, ensuring the continuity of spatial relationships. Based on the adjusted 3D spatial coordinates, Euclidean distances are computed to construct a k-nearest neighbor (k-NN) graph, thereby forming the topological structure of the spatial graph. This ensures that the model captures the spatial continuity of neighboring spots while preserving cross-slice semantic propagation (Fig. 1A).
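As a rough illustration of the graph construction step, the sketch below builds a symmetric k-NN adjacency matrix from already registered 3D coordinates using Euclidean distances. The function name `knn_adjacency`, the toy coordinates, and the decision to symmetrise the graph are assumptions made for illustration; ICP registration is not shown, and the actual SpaCross implementation may differ.

```python
import numpy as np

def knn_adjacency(coords: np.ndarray, k: int) -> np.ndarray:
    """Return a symmetric 0/1 adjacency matrix linking each spot to its k nearest spots."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between all spots (n x n).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a spot is never its own neighbour
    # Column indices of the k smallest distances in each row.
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1
    # Symmetrise so the graph is undirected (an assumption of this sketch).
    return np.maximum(adj, adj.T)

# Toy example: spots 0-2 lie on one slice (z = 0), spots 3-4 on the next (z = 1).
coords = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [5.0, 5.0, 1.0],
])
adj = knn_adjacency(coords, k=2)
```

In this toy case spot 3 sits directly above spot 0, so the two are linked across slices, which is the kind of cross-slice edge that lets information propagate between sections.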

To enhance the model’s robustness against missing data and noise, SpaCross introduces a cross-masked self-supervised learning mechanism (Fig. 1B). Specifically, two complementary masked views are randomly generated on the input features, serving respectively for feature reconstruction and latent space consistency learning. The masked feature matrix, which simulates missing information during training, improves the model’s imputation capability, while the complementary mask provides consistent supervision for the latent space, effectively mitigating overfitting in the autoencoder. The graph encoder, integrating feedforward neural networks and graph convolutional networks, leverages the graph structure to propagate features among neighboring nodes, resulting in more robust latent representations.
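The idea of two complementary masked views can be sketched as follows. This is a minimal illustration that assumes masking is applied at the spot (row) level and that masked rows are set to zero; the mask rate, the helper name `complementary_masks`, and the zero fill are hypothetical choices, not details confirmed by the text above.

```python
import numpy as np

def complementary_masks(n_spots: int, mask_rate: float, rng: np.random.Generator):
    """Randomly split spots into a masked set and its exact complement."""
    perm = rng.permutation(n_spots)
    n_masked = int(round(mask_rate * n_spots))
    mask_a = np.zeros(n_spots, dtype=bool)
    mask_a[perm[:n_masked]] = True
    mask_b = ~mask_a  # every spot is masked in exactly one of the two views
    return mask_a, mask_b

rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))  # toy (spots x features) matrix
mask_a, mask_b = complementary_masks(features.shape[0], mask_rate=0.5, rng=rng)

# View A hides the spots selected by mask_a; view B hides the remaining spots.
view_a = features.copy()
view_a[mask_a] = 0.0
view_b = features.copy()
view_b[mask_b] = 0.0
```

Because the masks are complementary, each spot is visible in exactly one view, so neither view can be reconstructed by a trivial identity mapping of its own input.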

In the latent representation learning process, SpaCross further enhances the reliability of these representations through a cross-masked latent consistency (CMLC) mechanism (Fig. 1B). The model aligns latent embeddings generated from complementary views using contrastive learning, thereby reinforcing consistency across different views and ensuring stable feature representation even when handling incomplete data or diverse data augmentation views.
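One common way to align embeddings from two views with contrastive learning is an InfoNCE-style loss, sketched below. The text above does not specify the exact loss used by the CMLC module, so the loss form, the temperature value, and the function name `info_nce` are all assumptions.

```python
import numpy as np

def info_nce(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.5) -> float:
    """InfoNCE-style loss: row i of z_a should match row i of z_b, not other rows."""
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))  # toy latent embeddings for 8 spots
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
loss_shuffled = info_nce(z, z[::-1])  # rows deliberately mismatched
```

The loss is small when the two views of the same spot agree and grows when they are mismatched, which is the behaviour a latent consistency constraint relies on.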

To address the insufficient integration of local and global information in SRT data, SpaCross uses an adaptive hybrid spatial-semantic graph (AHSG) modeling method (Fig. 1B). Based on the latent embeddings, the model selects local spatial neighbors and global semantic cluster neighbors, fusing them into a unified mixed neighborhood. This strategy not only preserves spatial continuity but also introduces semantic consistency across regions, significantly enhancing the discriminative power in downstream tasks. By aggregating mixed neighborhood features and optimizing node embeddings with contrastive learning, the model effectively delineates the boundaries between different categories.
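The fusion of spatial and semantic neighbors can be illustrated with a small sketch. It assumes that cluster labels are already available (for example, from clustering the current embeddings) and that semantic neighbors are the nearest same-cluster spots in embedding space; the function name `hybrid_neighbors` and the parameter `k_sem` are hypothetical, and the selection rule in SpaCross itself may differ.

```python
import numpy as np

def hybrid_neighbors(spatial_nbrs, embeddings, labels, k_sem):
    """Union of each spot's spatial neighbours and its k_sem closest same-cluster spots."""
    n = embeddings.shape[0]
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    hybrid = []
    for i in range(n):
        # Candidates: spots in the same cluster, excluding the spot itself.
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        closest = same[np.argsort(dist[i, same])[:k_sem]]
        hybrid.append(set(spatial_nbrs[i]) | set(closest.tolist()))
    return hybrid

# Toy data: spots 0 and 3 share a cluster but are not spatially adjacent.
spatial_nbrs = {0: [1], 1: [0], 2: [3], 3: [2]}
embeddings = np.array([[0.0], [1.0], [1.1], [0.1]])
labels = np.array([0, 1, 1, 0])
hybrid = hybrid_neighbors(spatial_nbrs, embeddings, labels, k_sem=1)
```

Here spot 0 keeps its spatial neighbor (spot 1) and gains a semantic neighbor (spot 3), so the mixed neighborhood carries both local and global context.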

In downstream tasks (Fig. 1C), SpaCross enables complex spatial domain identification within tissue slices while preserving spatial continuity and clustering accuracy. It supports the integration of consecutive slices, cross-developmental stage comparisons, and cross-platform datasets. These capabilities provide a unified framework for spatial omics studies in development, disease, and cross-species research.

SpaCross enhances clustering and layer-specific identification in DLPFC

We comprehensively assessed SpaCross’s performance in spatial domain identification by benchmarking it against thirteen state-of-the-art methods spanning diverse modeling paradigms, including classical probabilistic models, graph-based frameworks, generative architectures, and contrastive learning strategies. For clarity, we categorized these methods based on their core algorithmic principles, while acknowledging that some span multiple paradigms. Non-GNN-based models included CellCharter¹⁷, which utilizes scVI-based probabilistic embeddings, and MENDER¹⁸, a scalable proximity-based method. Graph-based approaches such as SpaGCN¹⁹, SEDR²⁰, STAGATE³, and DeepST³⁷ explicitly incorporate spatial adjacency via graph neural networks. Generative models like SpaMask²⁶, STMGAC²⁵, and DiffusionST³⁸ enrich spatial representations through masked gene prediction or diffusion modeling. Methods employing contrastive learning, including GraphST³⁰, stDCL⁹, GAAEST⁵, and CCST²⁸, enhance embeddings by contrasting intra- and inter-domain spot pairs.

All methods were evaluated on the human dorsolateral prefrontal cortex (DLPFC) dataset from the 10x Visium platform³⁹, which includes 12 annotated tissue sections categorized into six cortical layers (L1-L6) and white matter (WM) by Maynard et al.³⁹.

To quantify clustering accuracy, we computed the Adjusted Rand Index (ARI) and clustering Accuracy (ACC), where higher values indicate better correspondence with manual annotations (Fig. 2A; see Methods). Each method was run 10 times with different random seeds across all 12 DLPFC slices to ensure robustness and statistical rigor. As all models operated under identical settings (using the same number of spatial domains per slice), their outputs are directly comparable. Importantly, we retained the original post-processing procedures specified by each baseline method when evaluating both ARI and ACC, to ensure a fair and faithful comparison. For each run, we calculated the ARI and ACC per slice, averaged the results, and applied Wilcoxon signed-rank tests to assess statistical significance between methods (Fig. 2A; the experiments of SpaCross under 50 random seeds are shown in Supplementary Fig. 1).
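For readers unfamiliar with the ARI, the following sketch computes it from the standard pair-counting definition using only the Python standard library. The function name and toy labels are illustrative; in practice a library routine such as scikit-learn's `adjusted_rand_score` would normally be used.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Number of spot pairs that fall in the same cell, row, or column of the table.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings have a single cluster
    return (sum_pairs - expected) / (max_index - expected)

annotation = [0, 0, 1, 1]
ari_relabelled = adjusted_rand_index(annotation, [1, 1, 0, 0])  # same partition, new names
ari_crossed = adjusted_rand_index(annotation, [0, 1, 0, 1])     # partition cuts across layers
```

The score depends only on the partition, not on the label names, so a prediction that matches the annotation up to renaming scores 1, while a prediction worse than chance can be negative.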

SpaCross significantly outperformed 11 out of 13 baseline methods in terms of ARI, and 10 out of 13 in terms of ACC, based on Wilcoxon signed-rank tests across all slices (Fig. 2A). These results underscore its robust and consistent performance across diverse integration paradigms. Notably, SpaCross achieved the highest mean ARI (0.557) and ACC (0.667), as well as the highest median ACC (0.673) among all evaluated approaches. Compared to non-GNN models such as CellCharter and MENDER, SpaCross demonstrated significantly better alignment with manual annotations (p < 0.05). When evaluated against GNN-based methods like STAGATE and SEDR, SpaCross achieved consistently superior ARI and ACC values, with all improvements reaching strong statistical significance (p < 0.001). While generative models such as SpaMask and STMGAC yielded competitive performance, they still lagged behind SpaCross (p < 0.05). Among contrastive learning methods, GraphST exhibited the closest performance to SpaCross. However, SpaCross maintained higher median scores and reduced inter-slice variability, reflecting its improved stability and generalizability. Although differences with GraphST and stDCL did not reach statistical significance in ARI, SpaCross still outperformed both in mean scores (0.557 vs. 0.543 and 0.517, respectively). These findings confirm SpaCross as a robust and accurate tool for spatial domain identification. Supplementary Fig. 2 presents the visualized spatial domain assignments across all 12 DLPFC slices, while Supplementary Text 1 and Supplementary Fig. 3 further discuss the key architectural differences between SpaCross and related methods such as SpaMask, STMGAC, and GAAEST.

For in-depth evaluation, we analyzed DLPFC slice 151675. As illustrated in Fig. 2B, SpaCross not only achieved the highest clustering scores (ARI = 0.683, ACC = 0.761) but also produced spatial domains with clearly delineated boundaries and minimal spot intermixing. In contrast, DeepST exhibited discontinuous laminar transitions, especially between Domains 2 and 4. SEDR misassigned Domain 1 into Domain 6, while STAGATE, though generally effective, failed to separate Domain 1 from Domain 6. Post-processing-dependent methods (DiffusionST, GraphST, GAAEST, and stDCL) improved domain continuity via label refinement¹⁹, but still displayed localized embedding inconsistencies (Fig. 2C). SpaMask and STMGAC exhibited moderate robustness but still suffered from occasional misclassifications, particularly at cortical layer boundaries. In comparison, GAAEST was particularly sensitive to refinement: its performance deteriorated substantially without this step, with marked drops in ARI and ACC. To further evaluate the robustness and generalizability of SpaCross, we conducted a comparative analysis with GAAEST under varying post-processing refinement radii across all 12 DLPFC slices. Although GAAEST occasionally achieved higher ARI on a subset of slices when the refinement radius exceeded 30, SpaCross consistently outperformed GAAEST on the majority of slices in both ARI and ACC, even without any post-hoc refinement (Supplementary Figs. 4–6). These results underscore SpaCross’s inherent ability to produce accurate and biologically meaningful domains without relying on parameter-sensitive refinement procedures.

To elucidate these discrepancies, we visualized latent embeddings using UMAP (Fig. 2C). SpaCross displayed a clear linear trajectory from Domain 1 to Domain 7, aligning with cortical layer stratification. In comparison, stDCL and GraphST presented overlapping and disordered embeddings, reflecting weaker separation of spatial heterogeneity. Their limitations likely stem from neglecting global cluster semantics during training. To quantify embedding quality, we computed the Discreteness Score (DIS)⁴⁰, where lower values indicate stronger topological coherence. SpaCross achieved the lowest DIS (0.0394), outperforming stDCL (0.1675) and GraphST (0.0831), with consistent trends observed across all slices (Supplementary Fig. 2).

SpaCross’s ability to identify spatially variable genes (SVGs) was also evaluated using a SpaGCN-inspired pipeline¹⁹. In slice 151675, SpaCross detected 30 SVGs with distinct spatial patterns, including 23 genes (e.g., PLP1) in Domain 8, one gene (NEFL) in Domain 5, and six genes (e.g., ENC1) in Domain 3 (Fig. 2D). We compared Moran’s I metrics for these SVGs with five leading methods. SpaCross showed the highest median Moran’s I values, indicating stronger spatial autocorrelation, although the differences did not reach statistical significance (Mann-Whitney U test, p > 0.05).
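Moran's I, the spatial autocorrelation statistic used here, can be computed directly from its definition. The sketch below assumes a binary neighbor weight matrix and a toy chain of four spots; how the weights are actually built for the SVG analysis is not stated in this section.

```python
import numpy as np

def morans_i(values, weights) -> float:
    """Moran's I for one gene, given expression values and a spot-by-spot weight matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    # Weighted cross-products of deviations over all spot pairs.
    numerator = (w * np.outer(z, z)).sum()
    denominator = (z ** 2).sum()
    return float((len(x) / w.sum()) * numerator / denominator)

# Four spots in a line: 0-1, 1-2, 2-3 are neighbours.
weights = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
clustered = morans_i([1, 1, 0, 0], weights)    # similar values sit next to each other
alternating = morans_i([1, 0, 1, 0], weights)  # neighbours always differ
```

Positive values indicate that neighboring spots have similar expression, which is why a higher Moran's I is read as a stronger spatial pattern.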

Further analysis of slice 151675 revealed challenges shared across methods in resolving subtle boundaries, particularly between layers 1–2 and 3–4 (Fig. 2B). For example, stDCL and GraphST did not recover a distinct layer 4; SEDR and DeepST yielded fragmented or overextended boundaries; and STAGATE, though accurate overall, struggled to separate layers 1 and 2. These challenges reflect broader modeling limitations rather than issues unique to SpaCross.

The default configuration of SpaCross assigned seven domains per slice, matching the six annotated layers and white matter. This choice promotes consistency with manual labels but may restrict detection of finer-grained structures. Although SpaCross achieved the highest ARI and ACC under this constraint, spatial maps showed layer 1 split into two domains and layer 4 absent, suggesting over- and under-segmentation (Fig. 2B).

To probe whether these results stemmed from biological similarity or model limitations, we increased the number of domains to eight (Fig. 2B). The revised segmentation clearly separated layers 3 and 4 into Domain_4 and Domain_5, demonstrating SpaCross’s ability to resolve finer distinctions when unconstrained.

To investigate the biological basis of the inferred boundaries, we computed pairwise Pearson correlations of gene expression across domains (Fig. 2E). Domain_4 and Domain_5 (layers 3 and 4) had high similarity (correlation = 0.76), explaining their earlier merging. In contrast, Domain_1 and Domain_2, both from layer 1, had low correlation (0.24), indicating marked molecular heterogeneity.
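A domain-by-domain correlation matrix of this kind can be sketched as below. The sketch assumes that each domain is summarised by its mean expression profile before Pearson correlation is applied; the text does not state the exact aggregation, and the toy matrix and the name `domain_correlation` are illustrative only.

```python
import numpy as np

def domain_correlation(expr: np.ndarray, domains: np.ndarray):
    """Pearson correlation between the mean expression profiles of each domain."""
    labels = np.unique(domains)
    # One row per domain: average expression over the spots assigned to it.
    means = np.vstack([expr[domains == d].mean(axis=0) for d in labels])
    return labels, np.corrcoef(means)

# Toy (spots x genes) matrix with three domains of two spots each.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [4.0, 3.0, 2.0, 1.0],
])
domains = np.array(["A", "A", "B", "B", "C", "C"])
labels, corr = domain_correlation(expr, domains)
```

Domains with proportional profiles (A and B) correlate strongly, while a domain with an opposite gradient (C) does not, mirroring how high correlation can explain merged domains and low correlation can indicate heterogeneity.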

This heterogeneity was further validated via differential expression analysis between Domain_1 and Domain_2. The volcano plot (Fig. 2F) highlighted domain-specific marker genes: MALAT1, MGP, and MYH11 (vascular/stromal cells) in Domain_1, and glial/neuronal markers CXCL14 and AQP4 in Domain_2. Spatial expression maps (Fig. 2G) confirmed distinct localization, supporting the conclusion that layer 1 subdivision reflects real biological variation rather than model artifacts.

SpaCross robustly delineates tissue structures across diverse experimental platforms

To systematically validate the cross-platform robustness of the SpaCross algorithm, we first conducted performance evaluation using a complex human breast cancer (BRCA) dataset generated by the 10x Visium platform²⁰. This dataset contains 20 histopathological regions meticulously annotated by Xu et al.²⁰ based on H&E-stained images, encompassing ductal carcinoma in situ/lobular carcinoma in situ (DCIS/LCIS), healthy tissue, invasive ductal carcinoma (IDC), and tumor edge regions (Fig. 3A). Experimental results demonstrated SpaCross’s significant performance advantages: it achieved the highest spatial domain clustering accuracy (ARI = 0.65 and ACC = 0.72), outperforming the second-best methods, DiffusionST (ARI = 0.58) and DeepST (ARI = 0.58), by over 0.07 (Fig. 3C). This improvement margin aligns with recent advancements in spatial clustering methodologies, where state-of-the-art approaches typically show high performance gains over predecessors (Supplementary Table 1). For visual comparison across methods, spatial domain identification results of all baseline models are provided in Supplementary Fig. 7.

Fig. 3 — A Histological image (left) and manually annotated spatial domains (right) for the human breast cancer (BRCA) dataset from 10x Visium. B Boxplot comparisons of Moran’s I and Geary’s C metrics for SVGs identified by SpaCross and baseline methods on the BRCA dataset. Mann-Whitney U tests were used to assess statistical significance. C Spatial domain maps on the BRCA dataset generated by SpaCross and representative baseline methods. D Spatial domain identification results on the mouse primary visual cortex (MVC) dataset (STARmap), including manual annotation (top-left) and results from SpaCross and baseline methods. E UMAP visualizations and PAGA trajectories of learned embeddings on the MVC dataset, shown for SpaCross and baseline methods. F Evolution of hybrid spatial-semantic graphs during SpaCross optimization on the MVC dataset. Left: gene expression similarity graph based on PCA-derived correlations. Right: hybrid graphs at epochs 50, 100, and 300, showing progressive refinement of intra-domain structure. G Spatial domain identification results on the mouse somatosensory cortex (MSC) dataset (osmFISH), including manual annotation (leftmost) and spatial domains identified by SpaCross and baseline methods.

At the tissue topology resolution level, SpaCross effectively addressed segmentation biases observed in existing algorithms (including stDCL, DiffusionST, GraphST, and SEDR) towards IDC_2 and IDC_5 regions. As shown in Fig. 3C, while these methods erroneously fragmented contiguous IDC areas into discrete subclusters despite clear block-like pathological features, SpaCross’s identification results exhibited remarkable consistency with manual annotations. Quantitative analysis through inter-cluster similarity matrices (Supplementary Fig. 8) revealed strong spatial co-localization between SpaCross-defined domains 8, 20, 5 and IDC_1, IDC_2, IDC_4, respectively. Notably, the algorithm precisely delineated dynamic tumor edge transition zones, mapping spatial domain 1 to Tumor_edge_3 and domain 3 to Tumor_edge_2 (Fig. 3C), demonstrating exceptional tumor-stroma boundary resolution capability comparable to recent contrastive learning frameworks.

SVG analysis further confirmed SpaCross’s superiority. Since each method employs distinct criteria for identifying spatially variable genes (SVGs), the resulting gene sets differ in both content and size, violating the matched-sample assumption of the Wilcoxon signed-rank test. Therefore, we adopted the Mann-Whitney U test, which is more appropriate for comparing two independent distributions with potentially unequal sample sizes. On the BRCA dataset (Fig. 3B), this test revealed statistically significant differences in Moran’s I values between SpaCross and five representative baselines: p = 0.0031 (stDCL), 7.76 × 10⁻¹⁰ (GraphST), 0.0365 (STAGATE), 8.82 × 10⁻⁷ (SEDR), and 0.0008 (SpaGCN). This provides strong evidence that SpaCross identifies SVGs with significantly higher spatial autocorrelation. Geary’s C values exhibited consistent trends, with SpaCross achieving significantly lower values, indicating better spatial coherence. These findings demonstrate that SpaCross not only aligns well with pathological annotations but also statistically outperforms other methods in capturing spatial expression patterns.
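To make the choice of test concrete, the sketch below computes the Mann-Whitney U statistic for two independent samples of unequal size, with a two-sided p-value from the plain normal approximation. It omits tie and continuity corrections, so its p-values are only approximate for small samples and are not expected to match those reported above; the sample values are invented for illustration.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value from a normal approximation)."""
    n_x, n_y = len(x), len(y)
    # U counts the (x, y) pairs in which the x value is larger; ties count one half.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    mean_u = n_x * n_y / 2
    sd_u = sqrt(n_x * n_y * (n_x + n_y + 1) / 12)  # no tie correction in this sketch
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u, p_value

# Hypothetical Moran's I values for SVG sets of different sizes from two methods.
method_a = [0.6, 0.7, 0.8, 0.9]
method_b = [0.1, 0.2, 0.3]
u_stat, p_value = mann_whitney_u(method_a, method_b)
```

Because the statistic is built from all cross-sample pairs rather than matched pairs, the two gene sets do not need to contain the same genes or the same number of genes.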

Furthermore, we evaluated the spatial domain identification performance of SpaCross and baseline methods on a more complex mouse brain dataset from the 10x Visium platform, which contains 52 spatial domains. SpaCross achieved the highest accuracy among all methods, with an ARI of 0.52, and exhibited more spatially continuous domain segmentation. See Supplementary Fig. 9 and Supplementary Table 1 for details.

Next, we conducted a systematic evaluation of anatomical structure delineation capabilities on the mouse primary visual cortex (MVC)⁴¹ dataset generated by the STARmap platform, which contains manually annotated anatomical regions including the corpus callosum (CC), hippocampus (HPC), and six neocortical layers (L1–L6). Experimental results demonstrated that SpaCross significantly outperformed all benchmark methods in terms of both accuracy and anatomical plausibility (Fig. 3D). Notably, the non-GNN method MENDER achieved an ARI of 0.65, outperforming several GNN-based methods. No other baseline surpassed this score; only SpaCross did, with an ARI of 0.70 (Fig. 3D, Supplementary Fig. 10A and Supplementary Table 1). Despite its relatively high ARI, MENDER failed to distinguish the L1 and L2/3 layers, resulting in anatomically inconsistent boundaries.

Consistent with previous findings, stDCL failed to separate the HPC and CC regions. GraphST and SpaGCN exhibited severe cell admixture across cortical layers (Supplementary Fig. 10). STAGATE and SEDR were unable to resolve the boundary between L2/3 and L4. CCST erroneously fragmented the CC region into multiple subdomains and misclassified HPC together with L1 as a single anatomical domain. In contrast, SpaCross achieved near-perfect concordance with gold-standard annotations, producing sharply defined domain boundaries without cellular intermixing and effectively capturing the full laminar and structural architecture of the visual cortex.

UMAP visualization of latent embeddings (Fig. 3E and Supplementary Fig. 10B) revealed that SpaCross captured a linear topological structure aligning with cortical layer development. In contrast, CCST exhibited erroneous overlaps between domain 1 and 5 centroids, while GraphST, STAGATE, and SEDR generated biologically uninterpretable embedding distributions. To elucidate the biological relevance of SpaCross’s representations, we tracked the dynamic evolution of hybrid spatial-semantic proximity graphs during iterative optimization (Fig. 3F). Initial PCA-based gene expression similarity networks (left panel) showed intra-domain correlations only in CC, with anomalous cross-domain similarities prevalent in other regions. Through iterative refinement, SpaCross progressively strengthened intra-domain associations while maintaining biologically meaningful inter-domain relationships, demonstrating effective integration of spatial proximity and global co-expression patterns to establish biologically coherent topological representations.

Finally, we conducted quantitative benchmarking on the mouse somatosensory cortex (MSC)⁴² dataset generated by the osmFISH platform. In terms of anatomical accuracy measured by ARI, most benchmark methods achieved scores below 0.6, while the non-GNN method CellCharter reached an ARI of 0.66. SpaCross demonstrated the best performance with an ARI of 0.67 (Fig. 3G, Supplementary Fig. 11 and Supplementary Table 1). Notably, stDCL exhibited domain misassignment in Layer 5 through improper mixing of multiple anatomical regions, and GraphST erroneously bisected Layer 4 into two discrete partitions. In contrast, SpaCross not only precisely delineated laminar boundaries but also maintained exceptional biological fidelity in resolving spatial cellular distribution patterns (Fig. 3G and Supplementary Fig. 11).

SpaCross corrects batch effects in consecutive tissue slices

To comprehensively evaluate the cross-slice integration efficacy of SpaCross, we implemented it on independent DLPFC donor datasets and conducted comparative benchmarking against state-of-the-art multi-slice integration methods (SPIRAL⁴³, STitch3D³³, Splane³⁵, and STAligner³²). Our multidimensional evaluation framework incorporated: 1) Clustering concordance analysis using manual annotations as the gold standard (quantified by ARI and ACC); 2) Spatial domain discreteness assessment (DIS) with explicit exclusion of label refinement post-processing; 3) A composite metric F1-harmonized Local Inverse Simpson Index (F1LISI)⁴⁴ to quantify the balance between batch effect removal and biological conservation. The F1LISI index (range: 0–1) integrates batch-grouped LISI (LISI_batch) and domain-grouped LISI (LISI_domain) through a tunable weighting coefficient α (as detailed in the Methods section), with higher values indicating superior technical noise elimination while preserving biological variance.

As demonstrated in Donor3 (Fig. 4A), SpaCross exhibits superior performance across all evaluation metrics, achieving median values of ARI = 0.637, ACC = 0.702, DIS = 0.0413, and F1LISI = 0.915. This dominance stems from its innovative hybrid neighbor graph architecture for multi-slice integration, which enhances spatial domain continuity through cross-slice expression similarity modeling, coupled with a latent space consistency module that improves robustness across slices. SPIRAL demonstrates competitive clustering accuracy (ARI = 0.637) and batch mixing (F1LISI = 0.915) via its dual-embedding learning mechanism (independent optimization of biological and batch embeddings). However, its spatial domain results exhibit discrete anomalous spots (Fig. 4B), reflected in the highest DIS value of 0.0908, likely due to insufficient preservation of local spatial information during domain adaptation. STitch3D, employing ICP and PASTE algorithms for spatial coordinate alignment with single-cell reference data integration, achieves suboptimal median metrics (ARI = 0.538, ACC = 0.685). Its rigid transformation assumptions limit adaptability to complex tissue deformations. STAligner shows spatial domain discontinuity in Layer 4 (Fig. 4B), indicating failure of its graph attention mechanism to capture continuous structural features in deep tissue layers. UMAP visualization (Fig. 4C) reveals Splane’s poor multi-slice integration (F1LISI = 0.671), where slice 151673 forms isolated clusters with significant intra-domain mixing, highlighting limitations in spatial constraint modeling. While GraphST and SEDR achieve adequate batch mixing, their lack of clear spatial boundaries suggests a trade-off in local feature preservation within contrastive learning frameworks. Notably, only SpaCross and SPIRAL successfully balance batch correction with preservation of biological tissue topology.

A detailed comparative evaluation of SpaCross against related contrastive learning-based methods (STMGAC, SpaMask, and GAAEST) across multi-slice datasets is provided in Supplementary Text 2 and Supplementary Figs. 12 and 13. This analysis highlights SpaCross’s superior performance in integrating anatomically diverse, developmentally staged, and cross-platform spatial transcriptomic data.

SpaCross balances developmental consistency and stage-specific variability

To further explore the balance between developmental consistency (shared across stages) and stage-specific variability (regions emerging or regressing at particular stages) in embryonic tissue architecture, we applied the SpaCross algorithm to the mouse embryonic dataset⁸ generated by the Stereo-seq platform, integrating three developmental stages (E9.5, E10.5, and E11.5). Through cross-stage joint clustering, SpaCross identified 20 tissue-region clusters. In addition to recovering known anatomical regions, SpaCross also delineated previously unannotated sub-domains, including a hindbrain sub-region with distinct neuronal signatures. Detailed analysis is provided in Supplementary Text 3 and Supplementary Fig. 14. The clustering results aligned well with ground truth annotations⁴⁵ (Fig. 5A), consistently achieving higher ARI values than baseline methods (E9.5: 0.38, E10.5: 0.35, E11.5: 0.46; Supplementary Fig. 14A). The inter-cluster correlation matrix highlighted three highly coherent spatial domains (Fig. 5B), namely the heart (clusters 20/4), dorsal root ganglion (DRG, clusters 12/2), and brain (clusters 10/5), confirming SpaCross’s robustness in cross-stage spatial alignment. Notably, the DRG region was absent at E9.5 but prominently emerged at E10.5 and E11.5 (Fig. 5C), reflecting known biological events such as neural crest cell migration and ganglion formation. Meanwhile, conserved regions like the liver and dermomyotome persisted across all stages but exhibited marked shifts in spatial extent (Supplementary Fig. 14C).

Fig. 5 — A Spatial domain annotations of E9.5, E10.5, and E11.5 mouse embryos. B Inter-cluster correlation matrix revealing three conserved cross-stage regions: heart (clusters 20/4), dorsal root ganglion (DRG, clusters 12/2), and brain (clusters 10/5). C Spatial domains identified by SpaCross across embryonic stages. D Region-specific marker gene expression patterns: *Nppb* (heart), *Ppp1r17* (DRG), and *Hes5* (brain), confirming biological relevance of identified domains. E Pseudotime trajectory inferred from heart-region spots (clusters 4 and 20). Left: UMAP embedding of clusters 4 and 20, colored by developmental stages (E9.5, E10.5, and E11.5); arrows indicate the progression of stage-specific centroids. Middle: Diffusion pseudotime plotted on the same UMAP, with arrows connecting centroids of bins (pseudotime intervals of 0.1) from 0 to 1, highlighting the inferred developmental trajectory. Right: Expression levels of *Nppb* shown in the same embedding space. Red circles highlight the same spatial region across all panels, corresponding to spots with pseudotime > 0.8. F Heatmap of representative late-upregulated genes in the heart trajectory. G GO enrichment of late-phase genes, highlighting biological processes associated with cardiac structural maturation and functional activation.

To validate the biological relevance of these domains, we examined the spatial expression of region-specific marker genes in representative domains (Fig. 5D), with further examples shown in Supplementary Fig. 14D. In the heart domain, Nppb, a marker for hormone-secreting cardiac cells, maintained high expression from E9.5 to E11.5, indicating sustained cardiac differentiation. In the DRG region, Ppp1r17, a gene associated with neural crest migration, absent at E9.5, was strongly enriched at later stages, supporting a stage-specific emergence of this domain. The brain region showed consistent Hes5 expression, with spatial distribution evolving from a diffuse pattern at E9.5 to layered structures by E11.5, mirroring the gradual organization of neuroepithelial architecture during early corticogenesis.

To further investigate transcriptional dynamics within the SpaCross-identified heart domain, we extracted all spots assigned to clusters 4 and 20, two clusters that exhibit high transcriptomic similarity in the inter-cluster correlation matrix (Fig. 5B) and spatially co-localize with the annotated heart region, to construct a heart-specific subset for trajectory inference. Using a randomly selected spot from the E9.5 slice as the developmental origin, we applied diffusion pseudotime (DPT) analysis to reconstruct continuous transcriptional progression across the heart domain (Fig. 5E, middle).

To improve biological interpretability and avoid ambiguity in trajectory direction, we discretized pseudotime into bins of width 0.1, computed the centroid of each bin, and connected them to visualize the developmental trajectory in the middle panel of Fig. 5E. The left panel displays arrows connecting stage-specific centroids (E9.5, E10.5, E11.5), serving as a biological reference for the inferred pseudotime direction, while the right panel shows the spatial expression pattern of the late marker gene Nppb without directional overlays. The pseudotime distribution reflected a smooth developmental transition: early pseudotime included a mixture of E9.5, E10.5, and E11.5 spots, suggesting the persistence of early transcriptional features, while later pseudotime was dominated by E10.5 and E11.5 spots, corresponding to more mature cardiac states. Notably, Nppb expression increased along the pseudotime axis, consistent with its role in cardiac structural maturation and functional activation.
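As an illustration of the binning step described above, the following minimal sketch computes one centroid per pseudotime bin in a 2D embedding. It is not the authors' code: the function name `bin_centroids` and its arguments are hypothetical, NumPy is assumed, and pseudotime is assumed to be scaled to the interval [0, 1].

```python
import numpy as np

def bin_centroids(embedding, pseudotime, width=0.1):
    """Return (bin_index, centroid) pairs for non-empty pseudotime bins.

    embedding:  (n_spots, 2) array, e.g. UMAP coordinates (assumed input).
    pseudotime: (n_spots,) array with values in [0, 1] (assumed scaling).
    """
    embedding = np.asarray(embedding, dtype=float)
    pseudotime = np.asarray(pseudotime, dtype=float)
    n_bins = int(round(1.0 / width))
    # Assign each spot to a bin; a pseudotime of exactly 1.0 goes to the last bin.
    idx = np.minimum((pseudotime / width).astype(int), n_bins - 1)
    centroids = []
    for b in range(n_bins):
        members = idx == b
        if members.any():  # skip empty bins instead of emitting NaN centroids
            centroids.append((b, embedding[members].mean(axis=0)))
    return centroids
```

Connecting consecutive centroids with arrows, in bin order, gives a trajectory of the kind drawn in the middle panel of Fig. 5E.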

To examine transcriptional changes along the trajectory, we stratified the heart-region spots based on pseudotime into an early-to-intermediate group (≤0.8) and a late-phase group (>0.8). Differential expression analysis revealed a cohort of late-upregulated genes, including canonical markers of cardiac maturation and contractility such as Myh7, Myl2, and the natriuretic peptides Nppa and Nppb (Fig. 5F). GO enrichment analysis of late-phase genes highlighted processes related to cardiomyocyte structural maturation (e.g., myofibril assembly and cardiac myofibril organization) and functional activation (e.g., cGMP biosynthesis and systemic arterial blood pressure regulation) (Fig. 5G). These findings demonstrate that SpaCross not only provides spatially coherent annotations but also enables the extraction of temporally resolved, biologically meaningful transcriptional trajectories within anatomically localized domains.

SpaCross enables cross-platform integration of spatially resolved data

We next investigated SpaCross’s cross-platform integration capability by applying it to spatially resolved transcriptomic datasets of mouse olfactory bulb (MOB)⁸ acquired through Stereo-seq and Slide-seqV2⁷ platforms. These platforms differ substantially in capture chemistry and spatial resolution, introducing significant batch effects that pose challenges for integration methods. Unlike the DLPFC dataset, these cross-platform datasets are known to contain pronounced technical variability that cannot be attributed to biological similarity alone. To rigorously evaluate SpaCross’s performance under such conditions, we used DAPI-stained laminar structures²⁰ and Allen Brain Atlas annotations as biological ground truth (Fig. 6A) to assess whether SpaCross can effectively eliminate technical discrepancies while preserving biologically meaningful spatial domains. Both platforms captured MOB’s core laminar architecture, including the rostral migratory stream (RMS), granule cell layer (GCL), glomerular layer (GL), internal plexiform layer (IPL), mitral cell layer (MCL), external plexiform layer (EPL), and olfactory nerve layer (ONL), with Slide-seqV2 additionally covering the accessory olfactory bulb (AOB) and its granular layer (AOBgr), thereby providing finer spatial resolution.

Fig. 6 — A Schematic of MOB laminar structures from DAPI staining and Allen Brain Atlas annotations as biological reference annotations. B Spatial domain identification results from SpaCross, SPIRAL, STAligner, GraphST, SpaMask, and SEDR across Stereo-seq (top panels) and Slide-seqV2 (bottom panels) platforms. C UMAP visualization of batch correction performance (upper) and spatial domain assignments (lower) for different integration methods. D Domain-specific spatial patterns identified by SpaCross with corresponding marker gene expression validations across platforms.

Comparative analysis revealed distinct performance differences among integration methods. While SPIRAL demonstrated effective batch mixing in UMAP visualizations (Fig. 6C), its spatial domain assignments (Domains 1-9) showed chaotic partitioning inconsistent with anatomical structures (Fig. 6B). STAligner exhibited platform-specific annotation errors, misclassifying shared ONL regions as Slide-seqV2-exclusive (Fig. 6B, red circle in Fig. 6C). GraphST completely failed to integrate datasets due to its reliance on slice alignment assumptions (Fig. 6C). SpaMask, benefiting from a dual-masking mechanism, produced spatial domains with clear intra-platform boundaries (Fig. 6B). However, due to substantial differences in spatial resolution and coordinate distributions across platforms, SpaMask was unable to construct a unified 3D neighborhood graph and thus could not achieve effective cross-platform integration. Its UMAP embedding showed unintegrated batch clusters similar to GraphST (Fig. 6C). In contrast, SpaCross achieved superior spatial domain resolution, accurately identifying seven cross-platform conserved structures (RMS, GCL, MCL, EPL, GL, ONL, IPL) and Slide-seqV2-specific AOB/AOBgr regions. Its domain boundaries showed precise alignment with DAPI-revealed laminar organization (Fig. 6B), with Stereo-seq (blue) and Slide-seqV2 (orange) spots overlapping completely in shared regions while remaining distinct in platform-specific areas (Fig. 6C), demonstrating effective decoupling of technical noise from biological signals.

Notably, SpaCross enabled the robust delineation of the AOB and AOBgr regions, structures that were inconsistently resolved using either platform alone due to limitations in spatial resolution or coverage. This stable detection of AOB was further supported by distinct transcriptional signatures revealed in the integrated embedding space. Marker gene expression patterns validated spatial domain fidelity (Fig. 6D). The narrow annular MCL domain showed strong co-localization of mitral cell marker Gabra1 across platforms. GCL displayed continuous gradient expression of granule cell marker Pcp4. Domain-specific patterns were preserved for Cck (EPL), Nrep (RMS), Fabp7 (ONL), Nrsn1 (GL), and Kcnd2 (IPL). Critically, AOB and AOBgr exhibited localized enrichment of Slide-seqV2-specific genes such as Fxyd6 and Tac1, confirming the biological uniqueness of these domains.

To further characterize the AOB region, we performed differential gene expression analysis between AOB and all other spatial domains. The identified AOB-specific genes, including Ptgds, Snca, Uchl1, and Stmn2, suggest a specialized transcriptional program. Gene Ontology enrichment of these markers revealed pathways involved in vesicle trafficking, epithelial morphogenesis, and neurosecretory signaling, underscoring the dual immunomodulatory and neuromodulatory functions attributed to AOB in olfactory-driven behavioral processes. Detailed spatial domain delineation, marker gene expression, differential expression heatmaps, and enrichment results are presented in Supplementary Fig. 15.

Experimental results demonstrate that SpaCross effectively mitigates cross-platform batch effects while preserving biologically coherent spatial domains. It accurately reconstructs the multilayered architecture and enables the stable delineation of platform-specific, histologically validated regions such as the accessory olfactory bulb and its granular layer, which are not reliably identified by either platform alone due to differences in resolution and coverage.

SpaCross generalizes to complex multi-slice and multi-tissue contexts

To further validate the generalization capability of SpaCross in complex multi-slice and multi-tissue scenarios, we applied it to MERFISH-derived mouse hypothalamic preoptic area datasets consisting of five consecutive slices (Bregma coordinates: -0.04 mm to -0.24 mm)¹⁰. This dataset features single-cell resolution, with intercellular spacing (on the order of microns) significantly smaller than inter-slice intervals (Fig. 7A). Using manually annotated regions (BST, MPA, MPN, PV, PVH, V3, and Fx) as ground truth, we compared SpaCross with baseline methods including SPIRAL, GraphST, STAligner, SEDR, and SpaMask in terms of clustering accuracy and multi-slice integration F1LISI metrics (Fig. 7B). SpaCross achieved the highest mean ARI (0.5854), mean ACC (0.643), and the highest median F1LISI (0.758), significantly outperforming other methods (ARI < 0.5). This performance stems from SpaCross’s multi-slice hybrid graph architecture, which adaptively integrates intra-slice spatial domain constraints and inter-slice similarity, effectively mitigating variation between widely spaced slices.

As shown in Fig. 7C, STAligner failed to distinguish the paraventricular thalamus (PVT) and medial preoptic area (MPA) domains across multiple slices and erroneously partitioned the fornix (Fx) region into two subregions mixed with MPA domains. In contrast, SpaCross-identified spatial domains across five slices exhibited high consistency with manual annotations. Notably, the paraventricular hypothalamic nucleus (PVH) domain thickness gradually decreased from -0.04 mm (left) to -0.24 mm (right), aligning with true anatomical characteristics, while PVT and Fx domain thicknesses showed increasing trends consistent with manual annotations.

We further validated the integration performance of SpaCross on an adult mouse whole-brain (AMB) multi-slice dataset constructed using a ST platform⁴⁶. The experiment selected 35 coronal slice samples consecutively distributed along the anterior-posterior (AP) axis³³ (Fig. 7D), which exhibit progressive morphological transitions while showing significant spatial domain heterogeneity. In terms of clustering accuracy, SpaCross achieved the best performance with a median ARI of 0.44 (Fig. 7E). Spatial domain identification results (Fig. 7F) revealed that SpaCross specifically detected cluster 11, which was precisely registered to the hippocampal region of the Allen Mouse Brain Reference Atlas, while effectively preserving biological heterogeneity in complex tissue slices during AP axis trajectory analysis. Notably, the topological structure of isocortical regions (containing clusters 4 and 8 representing distinct cellular laminar organization) demonstrated excellent spatial continuity across consecutive slices, confirming the model’s reliability in 3D spatial reconstruction. This work demonstrates SpaCross’s superiority in resolving spatially complex and variable multi-slice datasets, highlighting its potential for large-scale spatial omics analysis.

SpaCross ablation experiments validate the efficacy of individual modules

To systematically assess the contributions of SpaCross’s architectural modules and loss function designs, we conducted a series of ablation studies. These experiments evaluated the impact of key components such as the cross-masked latent consistency module (CMLC), the adaptive hybrid spatial-semantic graph (AHSG), and associated training losses. Results confirmed that each component plays a distinct and essential role in maintaining spatial coherence and cross-slice alignment. Detailed ablation analyses are provided in Supplementary Text 4 and Supplementary Figs. 16-17.

Computational performance and scalability

We also evaluated the computational performance and scalability of SpaCross to assess its applicability to large-scale spatial transcriptomics datasets. We conducted benchmarking experiments on the DLPFC dataset by systematically increasing the number of slices from 1 to 12 (covering up to 48,000 spots) and recorded multiple computational metrics including runtime, GPU memory usage, and memory cache. In addition, we compared SpaCross with six representative baseline methods across different algorithmic categories. Detailed benchmarking procedures and comparative results are provided in Supplementary Text 5 as well as Supplementary Fig. 18 and Supplementary Table 2.

Discussion

In this study, we propose SpaCross as a comprehensive deep learning framework that addresses critical limitations in multi-slice spatial transcriptomics integrated analysis. By integrating a masked graph autoencoder for reconstruction learning with a cross-masked latent consistency (CMLC) module, the approach provides dual-space supervision that combines explicit reconstruction loss in the raw feature space with implicit latent consistency constraints. This dual strategy effectively enhances the robustness and accuracy of the learned embeddings, overcoming the shortcomings of traditional unsupervised methods that often yield representations misaligned with pathological annotations.

Conventional graph autoencoders typically rely solely on unsupervised learning, which can result in latent representations that lack biological interpretability. In contrast, SpaCross incorporates complementary masked views that simulate missing data, compelling the model to predict and reconstruct gene expression features while preserving spatial context. The inclusion of the CMLC module further enforces consistency across different masked perspectives, ensuring that the latent space remains stable and biologically meaningful even in the presence of noise. This methodological innovation not only improves the feature robustness but also facilitates the accurate delineation of spatial domains.

A key innovation of SpaCross lies in its adaptive hybrid spatial-semantic graph (AHSG) structure, which fuses local spatial continuity with global semantic coherence. By dynamically integrating information from spatial neighborhoods and semantic clusters, the framework balances fine-grained spatial details with overarching tissue architecture. This capability is particularly important for multi-slice integration, where traditional methods have struggled with batch effects and inter-slice variability. The adaptive graph construction allows SpaCross to effectively transfer information across slices, ensuring both that technical noise is minimized and that true biological variation is preserved.

Experimental evaluations reinforce the practical value of these innovations. Across multiple single-slice datasets, SpaCross consistently outperformed thirteen state-of-the-art methods in spatial domain identification, demonstrating superior clustering performance and enhanced robustness. Its application to human dorsolateral prefrontal cortex data and complex MERFISH datasets confirmed that the framework not only corrects batch effects but also maintains the integrity of tissue spatial organization.

In developmental contexts, SpaCross revealed dynamic spatiotemporal patterns in embryonic mouse tissues by integrating data across stages E9.5 to E11.5. In particular, it reconstructed a continuous transcriptional trajectory within the cardiac region, capturing transitions from early to mature cardiomyocyte states. This was supported by stage-aligned pseudotime progression, sustained expression of cardiac markers such as Nppb, and enrichment of genes involved in structural maturation and contractile function. These findings demonstrate SpaCross’s ability to resolve fine-grained developmental dynamics within anatomically defined domains.

Additionally, the cross-platform integration of mouse olfactory bulb data from Stereo-seq and Slide-seqV2 further illustrates the versatility of SpaCross. The framework successfully identified shared laminar structures and resolved platform-specific subdomains, providing biologically meaningful representations across distinct technologies.

While SpaCross demonstrates many strengths, it is worth noting that the graph-based approach might require some further tuning when applied to extremely high-resolution datasets or in cases of uneven spatial sampling. Additionally, although the dual supervision mechanism significantly enhances robustness, exploring complementary strategies for latent space regularization could offer further improvements under particularly challenging data conditions.

In summary, SpaCross represents a significant advancement in the field of spatial transcriptomics. Its innovative integration of cross-masked reconstruction learning, latent consistency enforcement, and adaptive hybrid graph modeling not only improves spatial domain identification and multi-slice integration but also provides valuable insights into tissue architecture across diverse biological contexts. The theoretical and practical implications of this work establish a solid foundation for future research and applications in developmental biology, neuroscience, oncology, and beyond.

Methods

Data preprocessing and spatial graph construction

SpaCross takes gene expression data and spatial coordinates from multiple tissue slices as input (Fig. 1). First, we retain the genes shared across all slices. Then, the gene expression matrices from multiple slices are concatenated along the spot dimension to obtain an integrated gene expression matrix. If there is only a single slice, this concatenation step is not necessary. The Scanpy tool⁴⁷ is then used to filter out uninformative genes and perform log normalization on the entire gene expression dataset. Subsequently, the top 2000 highly variable genes are selected, and Principal Component Analysis (PCA) is applied. The first N_pc principal components are chosen as features for the spatial spots, resulting in a feature matrix $X \in R^{N \times N_{pc}}$, where N is the total number of spots.
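The preprocessing steps above can be sketched as follows. This is a NumPy-only illustration rather than the Scanpy pipeline the paper uses: the function name `integrate_slices` is hypothetical, the gene-filtering step is omitted, the library-size target of 10,000 counts is an assumption, and highly variable genes are approximated by a plain variance ranking, which is simpler than Scanpy's selection procedure.

```python
import numpy as np

def integrate_slices(slices, n_pc, n_top_genes=2000):
    """slices: list of (gene_names, counts), counts shaped (spots, genes)."""
    # 1) Retain only the genes shared across all slices, in a fixed order.
    shared = sorted(set.intersection(*(set(genes) for genes, _ in slices)))
    # 2) Concatenate the slices along the spot dimension.
    blocks = []
    for genes, counts in slices:
        col = {g: k for k, g in enumerate(genes)}
        blocks.append(np.asarray(counts)[:, [col[g] for g in shared]])
    X = np.vstack(blocks).astype(float)
    # 3) Log normalization (assumed form: scale to 1e4 counts per spot, log1p).
    totals = X.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    X = np.log1p(X / totals * 1e4)
    # 4) Keep the most variable genes (variance ranking as a stand-in for HVGs).
    keep = np.argsort(X.var(axis=0))[::-1][:n_top_genes]
    X = X[:, keep]
    # 5) PCA via SVD of the centered matrix; keep the first n_pc components.
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return shared, U[:, :n_pc] * S[:n_pc]
```

The returned array plays the role of the feature matrix X, with one row per spot and N_pc columns.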

To fully incorporate spatial information, we first employ the Iterative Closest Point (ICP)³⁶ algorithm for spatial registration, minimizing the Euclidean distance between feature points in adjacent slices to unify the three-dimensional coordinate system (for algorithm details, refer to Supplementary Text 6). Based on this, a three-dimensional coordinate system is established, where the tissue slice plane is defined as the X-Y plane, and the z-axis represents the distance between adjacent slices. The adjacency relationship is determined using a dynamic threshold principle: if the three-dimensional Euclidean distance between two spots is less than 1.1 times the nearest neighbor distance within a slice, a topological connection is established, forming the three-dimensional adjacency matrix A. If spot j is a neighbor of spot i, then A_ij = A_ji = 1. The constructed adjacency matrix is subsequently used in the graph neural network for each step of the process.
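A minimal sketch of this thresholding rule is given below, assuming the slices have already been registered into a common 3D coordinate system. The function name `build_adjacency` is hypothetical, and using the median within-slice nearest-neighbor distance as a single global reference is an assumption; the text does not specify how that distance is aggregated. The dense distance matrix is only practical for small examples.

```python
import numpy as np

def build_adjacency(coords, slice_ids, factor=1.1):
    """coords: (N, 3) registered coordinates; slice_ids: (N,) slice labels."""
    coords = np.asarray(coords, dtype=float)
    slice_ids = np.asarray(slice_ids)
    # Pairwise 3D Euclidean distances between all spots.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Nearest-neighbor distance of each spot, restricted to its own slice.
    same_slice = slice_ids[:, None] == slice_ids[None, :]
    within = np.where(same_slice, dist, np.inf)
    np.fill_diagonal(within, np.inf)
    nearest = within.min(axis=1)
    # Assumption: one global threshold derived from the median of those distances.
    threshold = factor * np.median(nearest)
    A = (dist < threshold).astype(int)
    np.fill_diagonal(A, 0)  # no self-connections
    return A
```

Because the distance matrix is symmetric, the result satisfies A_ij = A_ji by construction.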

It is important to note that in cross-platform data integration or data from different developmental stages, significant spatial topological variations between samples exist, and direct application of ICP registration may lead to coordinate distortion. To address this, we adopt a hierarchical modeling strategy: first, the adjacency matrices $\{A_{t}\}_{t=1}^{T}$ for each slice are independently computed, and then a block diagonal matrix is constructed along the main diagonal. This block diagonal matrix is used as the input adjacency matrix A = diag(A₁, A₂, …, A_T). If there is only a single slice, the nearest-neighbor adjacency matrix is computed directly.
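The block diagonal construction can be illustrated with a short sketch. The helper name `block_diag_adjacency` is hypothetical; the point is only that spots from different slices receive no spatial edges between them.

```python
import numpy as np

def block_diag_adjacency(per_slice):
    """Stack per-slice adjacency matrices A_1..A_T along the main diagonal."""
    sizes = [a.shape[0] for a in per_slice]
    total = sum(sizes)
    A = np.zeros((total, total), dtype=per_slice[0].dtype)
    start = 0
    for a, n in zip(per_slice, sizes):
        A[start:start + n, start:start + n] = a  # intra-slice edges only
        start += n
    return A
```

Under this scheme, any cross-slice information has to come from another source than the spatial graph, for example from expression similarity.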

Data augmentation with spot masking

Before training SpaCross, we generate a masked feature matrix X_m and a complementary masked feature matrix X_cm. These matrices are used to generate the latent representation and to provide supervisory signals for the latent representation, respectively. Specifically, with a masking rate ρ, we randomly sample a masked subset $V_{m}$ from the set of all spots $V$. In contrast, the complementary masked subset is denoted as $V_{cm}$, such that $V_{m} \cup V_{cm} = V$ and $V_{m} \cap V_{cm} = \emptyset$.

Based on the spot masking mechanism, we construct a masked feature matrix $X_{m} \in R^{N \times N_{pc}}$ to address the “identity mapping” problem. Specifically, for any spot v_i, if $v_{i} \in V_{m}$, its corresponding feature vector is replaced with a learnable mask token $x_{[M]} \in R^{N_{pc}}$, i.e., x_m,i = x_[M]; otherwise, x_m,i = x_i.

Similarly, we construct a complementary masked feature matrix $X_{cm} \in R^{N \times N_{pc}}$ to provide persistent supervisory signals in the latent space. It is defined as follows: if $v_{i} \in V_{cm}$, its corresponding feature vector is replaced with the mask token, i.e., x_cm,i = x_[M]; otherwise, x_cm,i = x_i.
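The two masked views can be sketched as follows. The function name `cross_mask` is hypothetical, and the mask token is passed in as a fixed vector, whereas in the model it is a learnable parameter.

```python
import numpy as np

def cross_mask(X, mask_token, rho, rng):
    """Build the masked view X_m and the complementary masked view X_cm.

    X: (N, N_pc) features; mask_token: (N_pc,) vector; rho: masking rate.
    """
    n = X.shape[0]
    n_mask = int(round(rho * n))
    masked = np.zeros(n, dtype=bool)
    masked[rng.permutation(n)[:n_mask]] = True  # V_m; its complement is V_cm
    # X_m hides the spots in V_m; X_cm hides the spots in V_cm.
    X_m = np.where(masked[:, None], mask_token, X)
    X_cm = np.where(~masked[:, None], mask_token, X)
    return X_m, X_cm, masked
```

Every spot therefore keeps its original features in exactly one of the two views, which is what lets one view supervise the other.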

Latent representation learning via masked reconstruction

Graph encoding

The graph encoder $F_{g}$ consists of a feedforward neural network (FNN) and two layers of GCNs. It takes the spatial adjacency matrix A and the masked feature matrix X_m as input, and outputs the latent graph embedding $Z_{g} \in R^{N \times d}$, where d is the dimensionality of the latent space. That is, $Z_{g} = F_{g}(A, X_{m})$. Specifically, for the l-th layer of FNN, the input is $H_{f}^{(l-1)}$, and the output features $H_{f}^{(l)}$ are given by:

H_{f}^{(l)} = ELU (BN (W_{f}^{(l)} H_{f}^{(l - 1)} + b^{(l)}))

where $H_{f}^{(0)} = X_{m}$, $H_{f} = H_{f}^{(L)}$ and L = 2. ELU is the Exponential Linear Unit activation function, and BN denotes the Batch Normalization process.

Then, utilizing the information propagation mechanism of GCNs, the masked nodes can learn features from their unmasked neighboring nodes. The mathematical representation is as follows:

Z_{g} = \tilde{A} \cdot ReLU (BN (\tilde{A} H_{f} W_{g}^{(0)})) \cdot W_{g}^{(1)}

where $W_{g}^{(l)}$ is the weight for the l-th layer of the GCN, and $\tilde{A}$ is the symmetrically normalized adjacency matrix, defined as $\tilde{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$.
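The encoder's forward pass, combining the FNN layers and the two GCN layers defined above, can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, batch normalization is shown in training mode without learnable scale and shift, features are stored row-wise so the product W H in the FNN equation appears as `H @ W`, and the adjacency is normalized exactly as defined above, without added self-loops.

```python
import numpy as np

def batch_norm(H, eps=1e-5):
    # Per-feature standardization over spots (no learnable scale or shift).
    return (H - H.mean(axis=0)) / np.sqrt(H.var(axis=0) + eps)

def elu(H):
    return np.where(H > 0, H, np.expm1(np.minimum(H, 0)))

def normalize_adjacency(A):
    # Symmetric normalization D^{-1/2} A D^{-1/2}; isolated spots get zero rows.
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def encode(A, X_m, fnn_params, W_g0, W_g1):
    """fnn_params: list of (W, b) pairs, one per FNN layer (L = 2 in the text)."""
    H = X_m
    for W, b in fnn_params:
        H = elu(batch_norm(H @ W + b))
    A_norm = normalize_adjacency(A)
    hidden = np.maximum(batch_norm(A_norm @ H @ W_g0), 0.0)  # ReLU(BN(.))
    return A_norm @ hidden @ W_g1  # latent graph embedding Z_g
```

Each multiplication by the normalized adjacency mixes a spot's features with those of its neighbors, which is how a masked spot can recover information from unmasked ones.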

Once the training phase is completed, we utilize raw feature matrix X and A as the input to the graph encoder $F_{g}$ and obtain the latent graph embedding Z. This representation is then used for downstream tasks such as spatial domain identification and visualization.

Representation predicting

To improve self-supervised learning in mask-based graph representations, we propose a graph predictor, $F_{p}$, for latent space self-supervision. The graph predictor takes the remasked latent representation Z_m and the adjacency matrix A as input and produces the predicted representation, $Z_{p} \in R^{N \times d}$, such that $Z_{p} = F_{p}(A, Z_{m})$. The remasked latent representation, Z_m, is obtained by applying the remasking technique to the set of masked nodes, $V_{m}$, where node representations in the latent space are masked. Specifically, for a node v_i, if $v_{i} \in V_{m}$, its latent representation is replaced with a learnable mask token, $z_{[RM]} \in R^{d}$, i.e., z_m,i = z_[RM]; otherwise, z_m,i = z_g,i.

The predicted representation, Z_p, is computed using the weight matrix W_p as follows:

Z_{p} = \tilde{A} \cdot Z_{m} \cdot W_{p}

Z_p will be self-supervised by the complementary representation and used to reconstruct the raw features.

Feature decoding

The feature decoder, $F_{d}$, maps Z_p to the raw data space to reconstruct the raw features, resulting in $\hat{X} \in R^{N \times N_{pc}}$. This is computed using the weight matrix W_d as follows:

\hat{X} = \tilde{A} \cdot ReLU (BN (Z_{p})) \cdot W_{d}
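The remasking, prediction, and decoding steps can be sketched together. As before, this is a NumPy illustration with hypothetical function names; the remask token is a fixed vector here rather than a learnable parameter, the normalized adjacency is assumed to be precomputed, and batch normalization is shown without learnable scale and shift.

```python
import numpy as np

def remask(Z_g, masked, remask_token):
    # Replace latent vectors of spots in V_m with the remask token z_[RM].
    return np.where(masked[:, None], remask_token, Z_g)

def predict(A_norm, Z_m, W_p):
    # Graph predictor: Z_p = A_norm . Z_m . W_p
    return A_norm @ Z_m @ W_p

def decode(A_norm, Z_p, W_d, eps=1e-5):
    # Feature decoder: X_hat = A_norm . ReLU(BN(Z_p)) . W_d
    H = (Z_p - Z_p.mean(axis=0)) / np.sqrt(Z_p.var(axis=0) + eps)
    return A_norm @ np.maximum(H, 0.0) @ W_d
```

Because the masked spots carry only the shared token after remasking, their predicted representations must be inferred from neighboring spots through the adjacency product.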

Reconstruction loss in the raw space

One of the primary objectives is to reconstruct the masked feature of spots in $V_{m}$ , given a partially observed set of spots and their adjacency relationships. The Scaled Cosine Error (SCE) is employed as the objective function, defined under a predetermined scaling factor, γ, as follows:

L_{SCE} = \frac{1}{|V_{m}|} \sum_{v_{i} \in V_{m}} \left(1 - sim(x_{i}, \hat{x}_{i})\right)^{\gamma}, \quad \gamma \geq 1

Here, γ is set to 2 to diminish the contribution of simple samples during training, $∣ V_{m} ∣$ denotes the number of elements in the masked set, and the cosine similarity, sim( ⋅ , ⋅ ), is computed as follows:

sim (x, y) = \frac{x^{⊤} y}{∥ x ∥ ∥ y ∥}
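A small NumPy sketch of the SCE objective is given below for illustration. The function name sce_loss, the eps guard against zero norms, and the toy inputs are assumptions of the example.

```python
import numpy as np

def sce_loss(X, X_hat, masked_idx, gamma=2.0, eps=1e-12):
    """Scaled cosine error averaged over the masked spots."""
    x, x_hat = X[masked_idx], X_hat[masked_idx]
    norms = np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1)
    cos = np.sum(x * x_hat, axis=1) / (norms + eps)
    return float(np.mean((1.0 - cos) ** gamma))

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_hat = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

# Spot 0 is reconstructed perfectly (error 0); spot 1 is orthogonal (error 1).
loss = sce_loss(X, X_hat, masked_idx=np.array([0, 1]))
```

With γ = 2, a spot whose reconstruction is already close (small 1 − sim) contributes quadratically less than a poorly reconstructed one, which is the down-weighting of simple samples described above.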

Latent space guidance via CMLC

Complementary graph encoding

In unlabeled self-supervised learning models, such as those based on autoencoder structures, there is a risk of overfitting to the training data. To address this limitation, we design a Cross-Masked Latent Consistency (CMLC) module that delivers persistent supervisory signals for each masked point in the latent space. The CMLC framework implements a complementary masking strategy through graph encoder $F_{g}$ , which processes the complementary masked feature matrix X_cm and adjacency matrix A to generate the complementary graph embedding $Z_{c g} \in R^{N \times d}$ according to: $Z_{c g} = F_{g} (A, X_{c m})$ . This complementary graph embedding Z_cg provides persistent supervisory signals that guide the self-supervised matching of the predicted representations Z_p in latent space.

Consistency loss in the latent space

To enforce semantic consistency between the predicted representation Z_p (obtained from masked inputs X_m) and the complementary graph embedding Z_cg (derived from X_cm), we employ the InfoNCE (Noise Contrastive Estimation, NCE) loss as the learning objective for the CMLC module. Moreover, this loss function operates on the masked node set $V_{m}$ to align the dual-view latent spaces by contrasting node-specific agreement against perturbed negatives.

Formally, for each masked spot $v_{i} \in V_{m}$ , we treat the representations (z_p,i, z_cg,i) as a positive pair, where z_p,i and z_cg,i are latent vectors of spot i from Z_p and Z_cg, respectively. Negative pairs are constructed by pairing z_p,i with embeddings of unrelated nodes ${z_{c g, j}}_{j \in N_{i}}$ , where $N_{i}$ denotes a set of randomly sampled non-masked nodes. The NCE Loss is defined as:

L_{NCE} = - \frac{1}{∣ V_{m} ∣} \sum_{v_{i} \in V_{m}} \log \frac{\exp (sim (z_{p, i}, z_{c g, i}) / τ)}{\exp (sim (z_{p, i}, z_{c g, i}) / τ) + \sum_{j \in N_{i}} \exp (sim (z_{p, i}, z_{c g, j}) / τ)}

where τ = 0.5 is a temperature hyperparameter that sharpens the similarity distribution.
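The loss above can be illustrated with a short NumPy sketch. The function names, the dictionary neg_idx that maps each masked spot to its sampled negatives, and the toy embeddings are hypothetical.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def info_nce(Z_p, Z_cg, masked_idx, neg_idx, tau=0.5):
    """InfoNCE over masked spots; neg_idx[i] lists the negatives sampled for spot i."""
    losses = []
    for i in masked_idx:
        pos = np.exp(cosine(Z_p[i], Z_cg[i]) / tau)
        neg = sum(np.exp(cosine(Z_p[i], Z_cg[j]) / tau) for j in neg_idx[i])
        losses.append(-np.log(pos / (pos + neg)))
    return float(np.mean(losses))

# Two orthogonal embeddings: the positive pair has similarity 1, the negative 0.
Z_p = np.array([[1.0, 0.0], [0.0, 1.0]])
Z_cg = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = info_nce(Z_p, Z_cg, masked_idx=[0], neg_idx={0: [1]})
```

In this toy case the loss reduces to log(1 + exp(−1/τ)), so a smaller τ drives the loss of a well-separated pair toward zero more quickly.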

Discriminative representation learning in AHSG

Adaptive hybrid spatial-semantic graph (AHSG) construction

For any target spot $v_{i} \in V$ , in order to construct a hybrid spatial-semantic neighborhood, we first form a candidate nearest neighbor set $B_{i}$ that consists of the intra-slice candidate set $B_{i}^{i n t r a}$ and the inter-slice candidate set $B_{i}^{i n t e r}$ , i.e.,

B_{i} = B_{i}^{i n t r a} \cup B_{i}^{i n t e r}

Assume that T(i) = t indicates the slice to which spot v_i belongs, and let its latent representation be denoted by z_i. First, we construct the intra-slice candidate set by computing the cosine similarity sim(z_i, z_j) between spot v_i and every other spot $v_{j} \in V \setminus \{v_{i}\}$ within the same slice (i.e., satisfying T(j) = t). The latent graph embedding matrix $Z \in R^{N \times d}$ is obtained during the detached inference stage by encoding the features X with the graph encoder $F_{g}$ using the adjacency matrix A, i.e., $Z = F_{g} (A, X)$ . Then, we select the top K_intra spots with the highest similarity to v_i to form the intra-slice candidate set (for a single slice, the candidate set is simply the intra-slice candidate set), formally defined as:

B_{i}^{i n t r a} = \{v_{j} ∣ Rank (sim (z_{i}, z_{j})) \leq K_{i n t r a}, \forall v_{j} \in V \setminus \{v_{i}\}, T (j) = t\}

For the inter-slice candidate set, we consider all spots from slices other than T(i). That is, for every spot v_j in another slice (i.e., satisfying T(j) ≠ t), we compute the cosine similarity between v_i and v_j, and then select the top K_inter spots with the highest similarity. The formal definition is:

B_{i}^{i n t e r} = \{v_{j} ∣ Rank (sim (z_{i}, z_{j})) \leq K_{i n t e r}, \forall v_{j} \in V \setminus \{v_{i}\}, T (j) \neq t\}

To ensure that the candidate spots are not only similar in the latent semantic space but also exhibit spatial continuity, we introduce a spatial constraint and define the spatially constrained neighborhood set $N_{i}^{S}$ as:

N_{i}^{S} = B_{i} \cap A_{i}

where $A_{i}$ represents the local neighborhood set of spot v_i, and we define $A_{i} = {v_{j} \in V ∣ A_{i j} = 1}$ .

To capture global semantic consistency, we apply the k-means clustering algorithm to the latent representation Z, thereby partitioning all spots in the semantic space into several clusters, and define the semantically similar neighborhood set $N_{i}^{G}$ as:

N_{i}^{G} = B_{i} \cap C_{i}

where $C_{i}$ denotes the set of spots that belong to the same cluster as spot v_i. When spot v_i is assigned to a cluster c, we define $C_{i} = {v_{j} \in V ∣ v_{j} is assigned to cluster c}$ .

Finally, we fuse the spatially constrained neighborhood set $N^{S}$ with the semantically similar neighborhood set $N^{G}$ to form the spatial-semantic hybrid nearest neighbor $N^{F}$ :

N^{F} = N^{S} \cup N^{G}

The hybrid nearest neighbor $N^{F}$ not only preserves local spatial continuity but also emphasizes global semantic consistency, thereby providing richer and more refined neighborhood information for the clustering task.
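The construction of the hybrid neighborhood can be sketched in NumPy as follows. This is an illustrative reading of the definitions above, not the released implementation: cluster labels are passed in precomputed (k-means is not run here), and the names top_k, hybrid_neighbors, slice_id, and cluster are hypothetical.

```python
import numpy as np

def top_k(sim_row, candidates, k):
    """Return the k candidate indices with the highest similarity."""
    order = np.argsort(-sim_row[candidates])[:k]
    return set(candidates[order].tolist())

def hybrid_neighbors(Z, slice_id, A, cluster, k_intra, k_inter):
    """Fused neighborhood N_i^F = (B_i & A_i) | (B_i & C_i) for every spot."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    sim = Zn @ Zn.T                      # pairwise cosine similarity
    idx = np.arange(len(Z))
    fused = []
    for i in idx:
        intra = idx[(slice_id == slice_id[i]) & (idx != i)]
        inter = idx[slice_id != slice_id[i]]
        B = top_k(sim[i], intra, k_intra) | top_k(sim[i], inter, k_inter)
        spatial = set(np.flatnonzero(A[i] == 1).tolist())               # A_i
        semantic = set(np.flatnonzero(cluster == cluster[i]).tolist())  # C_i
        fused.append((B & spatial) | (B & semantic))
    return fused

# Toy data: spots 0-2 lie on slice 0, spot 3 on slice 1.
Z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05]])
slice_id = np.array([0, 0, 0, 1])
A = np.array([[0, 0, 1, 0],
              [0, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])
cluster = np.array([0, 0, 1, 0])         # assumed output of k-means on Z
N_F = hybrid_neighbors(Z, slice_id, A, cluster, k_intra=1, k_inter=1)
```

In the toy example, spot 0 keeps its cross-slice candidate (spot 3) only through the semantic branch, since no spatial edge links the two slices.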

Hybrid feature aggregation

Leveraging the adaptive hybrid spatial-semantic nearest neighbor, we extract an integrated node embedding matrix $S \in R^{N \times d}$ that captures both detailed spatial and semantic features. In particular, for each spot v_i, we compute its aggregated summary vector s_i by applying a neighborhood aggregation function over its fused neighborhood:

s_{i} = Sigmoid (\frac{1}{∣N_{i}^{F}∣} \sum_{v_{j} \in N_{i}^{F}} z_{g, j})

where $∣N_{i}^{F}∣$ denotes the number of neighbors in the hybrid set.
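A minimal sketch of this aggregation is shown below. The fallback to the spot's own embedding when the hybrid set is empty is an assumption of the example (the text does not describe that case), and the function and variable names are hypothetical.

```python
import numpy as np

def aggregate(Z_g, neighbors):
    """Summary vectors s_i = Sigmoid(mean of z_g,j over the hybrid set of spot i)."""
    S = np.zeros_like(Z_g)
    for i, nbrs in enumerate(neighbors):
        # Assumption: an empty hybrid set falls back to the spot's own embedding.
        mean = Z_g[sorted(nbrs)].mean(axis=0) if nbrs else Z_g[i]
        S[i] = 1.0 / (1.0 + np.exp(-mean))
    return S

Z_g = np.array([[0.0, 0.0], [2.0, -2.0], [-2.0, 2.0]])
neighbors = [{1, 2}, {0}, set()]
S = aggregate(Z_g, neighbors)
```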

Contrastive loss

The summary vector s_i serves as an anchor for aligning the spot embeddings. Specifically, the positive sample pair (z_g,i, s_i) is formed by the spot’s latent embedding z_g,i and its corresponding summary vector s_i. To generate negative samples, we perturb the original embeddings via a corruption function to obtain ${\tilde{Z}}_{g}$ ; thus, the pair $({\tilde{z}}_{g, i}, s_{i})$ serves as a negative example.

By maximizing the mutual information between node embeddings and their corresponding summary vectors, we enhance their alignment in the embedding space while simultaneously mitigating the collapse phenomenon. This alignment is enforced via a contrastive objective formulated with the Binary Cross-Entropy (BCE) loss:

L_{BCE} = - \frac{1}{∣V_{m}∣} \sum_{v_{i} \in V_{m}} [\log D (z_{g, i}, s_{i}) + \log (1 - D ({\tilde{z}}_{g, i}, s_{i}))]

where the discriminator $D (\cdot, \cdot)$ is implemented as a bilinear scoring function:

D (z_{g}, s) = Sigmoid (z_{g}^{⊤} W s)

This formulation not only ensures that each spot embedding z_g,i is highly informative relative to its aggregated summary s_i, but also robustly discriminates against corrupted embeddings, thereby significantly enhancing the model’s clustering performance.
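The contrastive objective with a bilinear discriminator can be sketched as follows. The corruption function is taken to be a row shuffle of the embeddings, which is an assumption of this example (the text only states that a corruption function is used); the function names and toy inputs are likewise hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_contrastive_loss(Z_g, S, W, idx, rng, eps=1e-12):
    """BCE loss with bilinear discriminator D(z, s) = Sigmoid(z^T W s)."""
    # Assumption: corruption is a random permutation of the rows of Z_g.
    Z_tilde = Z_g[rng.permutation(len(Z_g))]
    pos = sigmoid(np.einsum("nd,de,ne->n", Z_g[idx], W, S[idx]))
    neg = sigmoid(np.einsum("nd,de,ne->n", Z_tilde[idx], W, S[idx]))
    return float(-np.mean(np.log(pos + eps) + np.log(1.0 - neg + eps)))

rng = np.random.default_rng(0)
Z_g = rng.normal(size=(6, 4))
S = sigmoid(rng.normal(size=(6, 4)))
W = np.zeros((4, 4))   # untrained discriminator: D = 0.5 for every pair
loss = bce_contrastive_loss(Z_g, S, W, np.array([0, 2, 5]), rng)
```

With a zero weight matrix the discriminator cannot separate real from corrupted pairs, so the loss equals 2 log 2; training W and the encoder should push it below that value.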

Comprehensive loss function

The comprehensive loss function, regulated by the weight factors λ₁, λ₂, and λ₃, comprises three main components: the reconstruction loss $L_{SCE}$ in the raw space, which measures the reconstruction error of masked features; the matching loss $L_{NCE}$ in the latent space, aimed at ensuring the consistency of latent representations; and the contrastive loss $L_{BCE}$ , used to optimize the similarity and dissimilarity between samples. The total loss is expressed as:

L = λ_{1} L_{SCE} + λ_{2} L_{NCE} + λ_{3} L_{BCE}

Spatial clustering and visualization

After training, we obtain the graph embedding Z. We then apply spatial clustering using the Mclust algorithm, which fits a mixture of Gaussian distributions via Expectation-Maximization to automatically determine optimal clusters. Each cluster represents a distinct spatial domain.

To compute UMAP, we first build a neighbor graph with the scanpy.pp.neighbors function, capturing local structural relationships among spots. In this step, we set the number of neighbors to 12 and utilize the top 16 principal components of the graph embedding Z. Next, the scanpy.tl.umap function is applied to perform UMAP dimensionality reduction, which facilitates a clear visualization of the similarities and differences among spots. Finally, we conduct scanpy.tl.paga analysis to uncover potential relationships between spatial domains and visualize the results using the scanpy.pl.paga_compare function.

Identification of SVGs and meta-genes

We identify SVGs and meta-genes using the SpaGCN detection framework¹⁹. For SVGs, a Wilcoxon rank-sum test is performed to compare the target domain against its adjacent domains, selecting genes with adjusted P-values below 0.05. Additionally, genes are required to meet three criteria: (1) more than 80% of spots in the target domain express the gene, (2) the in/out score ratio (percentage of expressing spots in the target domain versus each adjacent domain) exceeds 1, and (3) the expression fold change is greater than 1.5. For meta-gene construction, the SVG filtering threshold is relaxed by lowering the minimum fold change to 1.2. Among these modestly enriched genes, one is randomly chosen as a base gene (gene₀). Its mean expression (e₀) in the target domain is calculated, and non-target spots with expression above e₀ are defined as the control group. A subsequent differential test then identifies additional genes with significant expression differences, which are aggregated to form meta-genes.
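The SVG filtering criteria can be expressed as a simple predicate. The sketch below assumes the per-gene statistics have already been computed elsewhere; the GeneStats container and its field names are hypothetical and are not part of the SpaGCN API.

```python
from dataclasses import dataclass

@dataclass
class GeneStats:
    adj_pval: float          # adjusted P-value of the Wilcoxon rank-sum test
    in_fraction: float       # fraction of target-domain spots expressing the gene
    min_in_out_ratio: float  # smallest in/out ratio over all adjacent domains
    fold_change: float       # expression fold change, target vs adjacent domains

def passes_svg_filter(g: GeneStats, min_fold_change: float = 1.5) -> bool:
    """Apply the four thresholds described in the text."""
    return (g.adj_pval < 0.05
            and g.in_fraction > 0.8
            and g.min_in_out_ratio > 1.0
            and g.fold_change > min_fold_change)

gene = GeneStats(adj_pval=0.01, in_fraction=0.9, min_in_out_ratio=1.3, fold_change=1.3)
is_svg = passes_svg_filter(gene)                                   # strict SVG threshold
is_metagene_candidate = passes_svg_filter(gene, min_fold_change=1.2)  # relaxed threshold
```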

Experimental details

In the data preprocessing stage, we first extract the top 200 principal components from 2000 highly variable genes using PCA as input features. For datasets with fewer than 2000 but more than 200 genes, PCA is applied directly to the available genes; if the number of genes is below 200, all normalized gene expressions are used without PCA transformation. For spatial neighborhood graph construction, we set K = 12 for datasets generated from 10x Visium, Stereo-seq, and STARmap platforms, and adaptively select K = 6–8 for other platforms based on performance. The encoder $F_{g}$ comprises two FNN layers (dimensions 64 and 32) followed by two GCN layers (dimensions 64 and 16), producing a 16-dimensional latent representation. The graph predictor $F_{p}$ uses a GCN with output dimension 16, while the decoder $F_{d}$ includes one GCN layer outputting 200 dimensions. The discriminator $D$ operates with a latent dimension of 16.

The masking rate is fixed at ρ = 0.5, with scaling factor γ = 2 and temperature τ = 0.5. In hybrid neighbor computation, the K_intra = K_inter = 15 most similar candidates are selected, with updates every 50 steps. Loss function weights are empirically set as λ₁ = 0.6, λ₂ = 0.3, and λ₃ = 0.7, corresponding to the scaled cosine error (SCE), the noise contrastive estimation loss (NCE), and the binary cross-entropy loss (BCE), respectively. The model is trained for 300 epochs using the Adam optimizer with an initial learning rate of 0.001 and weight decay of 0.0003.

A comprehensive investigation of hyperparameter selection, including neighborhood size, PCA dimensionality, and loss weighting, is provided in Supplementary Text 7 and Supplementary Figs. 19 and 20. The algorithm flow can be found in Supplementary Algorithm 1.

Evaluated metrics and criteria

We evaluate the performance of our spatial domain identification model using a combination of metrics that assess both clustering accuracy and spatial continuity. First, to quantify clustering accuracy, we employ the Adjusted Rand Index (ARI)⁴⁸, which measures the similarity between the predicted clusters and the manually annotated labels. The ARI is defined as

ARI = \frac{\sum_{i j} \binom{N_{i j}}{2} - \frac{[\sum_{i} \binom{N_{i}}{2}] [\sum_{j} \binom{N_{j}}{2}]}{\binom{N}{2}}}{\frac{1}{2} [\sum_{i} \binom{N_{i}}{2} + \sum_{j} \binom{N_{j}}{2}] - \frac{[\sum_{i} \binom{N_{i}}{2}] [\sum_{j} \binom{N_{j}}{2}]}{\binom{N}{2}}}

where N is the total number of spots, N_ij denotes the number of spots shared between the i-th predicted cluster (C_i ∈ C) and the j-th true cluster (Y_j ∈ Y), and N_i (or N_j) is the number of spots in cluster C_i (or Y_j).
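For illustration, the ARI can be computed from the contingency counts with the Python standard library alone. The sketch follows the standard Hubert-Arabie form, in which the expected index is the product of the two marginal pair counts divided by the total number of pairs; the function name is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(pred, true):
    """ARI from pair counts of the contingency table."""
    n = len(pred)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(pred, true)).values())
    sum_i = sum(comb(c, 2) for c in Counter(pred).values())
    sum_j = sum(comb(c, 2) for c in Counter(true).values())
    expected = sum_i * sum_j / comb(n, 2)
    max_index = 0.5 * (sum_i + sum_j)
    return (sum_ij - expected) / (max_index - expected)

# Identical partitions up to label permutation give ARI = 1.
ari_perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that splits every true cluster evenly scores below zero.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```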

In addition, we utilize the Normalized Mutual Information (NMI) metric⁴⁹ to quantify the shared information between the clustering results and the ground truth. For labeled data, clustering performance is evaluated using not only the NMI but also the Homogeneity (HOM) and Completeness (COM) metrics⁴⁰. The NMI is computed as

NMI (Y, C) = \frac{2 [H (Y) - H (Y ∣ C)]}{H (Y) + H (C)}

where the entropy H( ⋅ ) is given by

H (X) = - \sum_{i} p (x_{i}) \log p (x_{i})

Similarly, the HOM score measures whether each cluster contains data points from only one class, while the COM score evaluates whether all data points of a given class are assigned to the same cluster. They are defined as:

HOM = 1 - \frac{H (Y ∣ C)}{H (Y)}, COM = 1 - \frac{H (C ∣ Y)}{H (C)}

Both metrics range from 0 to 1, where higher values indicate better clustering quality in terms of purity (homogeneity) and completeness. We then define the overall accuracy score (ACC) as the average of NMI, HOM, and COM:

ACC = \frac{1}{3} (NMI + HOM + COM)

Higher ARI and ACC values (closer to 1) indicate better clustering precision.
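The entropy-based scores can be sketched with the standard library as follows. The function names are hypothetical, and the sketch assumes both label sets contain at least two classes so that H(Y) and H(C) are non-zero.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy H(X) of a label sequence (natural logarithm)."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def conditional_entropy(a, b):
    """H(a | b) = H(a, b) - H(b)."""
    return entropy(list(zip(a, b))) - entropy(b)

def clustering_scores(true, pred):
    h_y, h_c = entropy(true), entropy(pred)
    h_y_c = conditional_entropy(true, pred)   # H(Y | C)
    h_c_y = conditional_entropy(pred, true)   # H(C | Y)
    nmi = 2 * (h_y - h_y_c) / (h_y + h_c)
    hom = 1 - h_y_c / h_y
    com = 1 - h_c_y / h_c
    return {"NMI": nmi, "HOM": hom, "COM": com, "ACC": (nmi + hom + com) / 3}

scores = clustering_scores(true=[0, 0, 1, 1], pred=[1, 1, 0, 0])
partial = clustering_scores(true=[0, 0, 1, 1], pred=[0, 0, 0, 1])
```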

To assess spatial continuity, we introduce two metrics: the Spatial Chaos Score (CHAOS) and the Percentage of Anomalous Points (PAS). A lower CHAOS value signifies more coherent spatial domain continuity, while a lower PAS indicates fewer isolated or anomalous points within the spatial domains⁴⁰. To compute CHAOS, we first construct a 1-nearest neighbor (1-NN) graph for each dataset by connecting each spot to its closest neighbor in Euclidean space. Let d_ij denote the Euclidean distance between spot i and spot j; then, we define

w_{k i j} = \begin{cases} d_{i j}, & \text{if spots } i \text{ and } j \text{ are connected in the 1-NN graph within cluster } k \\ 0, & \text{otherwise} \end{cases}

If n_k is the number of spots in the k-th spatial domain, N is the total number of spots, and K is the total number of unique spatial domains, the CHAOS score is calculated as

CHAOS = \frac{1}{N} \sum_{k = 1}^{K} \sum_{i j}^{n_{k}} w_{k i j}

The PAS score is defined as the percentage of spots whose spatial domain label differs from that of at least six out of their ten nearest neighbors. Lower PAS values correspond to higher spatial homogeneity within domains.

Finally, we define the overall discreteness score (DIS) as the average of the CHAOS and PAS scores:

DIS = \frac{1}{2} (CHAOS + PAS)
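The two continuity scores and their average can be sketched in NumPy as below. This follows one reading of the definitions: for CHAOS, each spot is linked to its nearest neighbor among spots of the same domain, and no coordinate rescaling is applied. Both choices, and all function names, are assumptions of the example.

```python
import numpy as np

def pairwise_dist(coords):
    diff = coords[:, None, :] - coords[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)          # a spot is never its own neighbor
    return D

def chaos_score(coords, labels):
    """Sum of within-domain 1-NN distances divided by the number of spots."""
    D = pairwise_dist(coords)
    total = 0.0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        if len(idx) > 1:                 # singleton domains contribute nothing
            total += D[np.ix_(idx, idx)].min(axis=1).sum()
    return total / len(labels)

def pas_score(coords, labels, k=10, threshold=6):
    """Fraction of spots whose label differs from >= threshold of their k nearest neighbors."""
    nn = np.argsort(pairwise_dist(coords), axis=1)[:, :k]
    differing = (labels[nn] != labels[:, None]).sum(axis=1)
    return float((differing >= threshold).mean())

# Twelve spots on a line, split into two contiguous domains.
coords = np.column_stack([np.arange(12.0), np.zeros(12)])
labels = np.array([0] * 6 + [1] * 6)
chaos = chaos_score(coords, labels)
pas = pas_score(coords, labels)
dis = 0.5 * (chaos + pas)
```

Because CHAOS is expressed in coordinate units while PAS is a fraction, the scale of the coordinates affects DIS; comparing methods on the same dataset sidesteps this.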

In summary, higher ARI and ACC scores indicate better clustering accuracy, while lower DIS scores reflect improved spatial domain continuity. These metrics together provide a comprehensive evaluation of the spatial domain identification performance.

To quantify the balance between batch effect correction and preservation of spatial domain structures in multi-slice spatial data integration, we use the F1LISI metric. This metric leverages the Local Inverse Simpson Index (LISI), which separately measures the mixing of batches within the same spatial domain (LISI_batch) and the separation across distinct spatial domains (1 − LISI_domain). By integrating these two components through a dynamic weighting factor $α = \frac{N_{d o m a i n}}{N_{b a t c h} + N_{d o m a i n}}$ , F1LISI unifies batch mixing and domain separation into a single score using a harmonic mean formulation. The F1LISI is defined as follows⁴⁴:

F1LISI = \frac{(1 + α^{2}) (1 - LISI_{domain, norm}) (LISI_{batch, norm})}{α^{2} (1 - LISI_{domain, norm}) + LISI_{batch, norm}}

The factor α adaptively adjusts the evaluation focus based on the ratio of spatial domains to batches, prioritizing domain separation when domains are abundant and emphasizing batch mixing when batches dominate. A higher F1LISI score indicates superior performance in simultaneously removing technical batch noise and preserving biologically meaningful spatial patterns.
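As a worked example of the formula, the following sketch combines two already normalized LISI values. How the LISI scores are computed and normalized to [0, 1] is outside the scope of the example, and the function and argument names are hypothetical.

```python
def f1_lisi(lisi_batch_norm, lisi_domain_norm, n_batch, n_domain):
    """Weighted harmonic combination of batch mixing and domain separation."""
    alpha = n_domain / (n_batch + n_domain)
    separation = 1.0 - lisi_domain_norm   # higher means domains are better separated
    mixing = lisi_batch_norm              # higher means batches are better mixed
    return (1 + alpha ** 2) * separation * mixing / (alpha ** 2 * separation + mixing)

# Equal numbers of batches and domains give alpha = 0.5.
score = f1_lisi(lisi_batch_norm=0.8, lisi_domain_norm=0.2, n_batch=4, n_domain=4)
```

When the two components are equal, as in this example, the score equals their common value regardless of α.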

Comparison with other methods

To benchmark SpaCross, we compared it with a diverse set of methods spanning five categories: non-GNN-based (CellCharter¹⁷, MENDER¹⁸), GNN-based (SpaGCN¹⁹, SEDR²⁰, DeepST³⁷, STAGATE³), generative (STMGAC²⁵, SpaMask²⁶, DiffusionST³⁸), contrastive learning-based (CCST²⁸, GraphST³⁰, GAAEST⁵, stDCL⁹), and multi-slice integration methods (SPIRAL⁴³, STitch3D³³, Splane³⁵, STAligner³²). These methods cover a range of modeling strategies, including spatial proximity, graph convolution, masked prediction, contrastive objectives, and slice alignment via registration or structural priors. The implementation procedures and parameter settings for all comparison methods are described in Supplementary Text 8.

Statistics and reproducibility

All analyses were performed in Python using non-parametric tests (Wilcoxon signed-rank, Mann-Whitney U, Wilcoxon rank-sum), with p < 0.05 considered significant. Here, n denotes biologically independent tissue slices. Reproducibility was assessed by repeating analyses with different random seeds across datasets and averaging results across independent runs. A replicate is defined as an independent run under identical settings, and all baseline models used their default recommended parameters to ensure fairness.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

Supplementary Information^{(26MB, pdf)}

42003_2025_8810_MOESM2_ESM.docx^{(21.2KB, docx)}

Description of Additional Supplementary Files

Supplementary Data 1^{(9MB, xlsx)}

Reporting Summary^{(1.6MB, pdf)}

Transparent Peer Review file^{(39.2MB, pdf)}

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62262069) and the Young Talent Program of Yunnan Province (C619300A067).

Author contributions

W.M. conceived and supervised the project. D.F. and W.M. developed and implemented the SpaCross algorithm. D.F. and W.M. validated the methods and wrote the manuscript. All authors read and approved the final manuscript.

Peer review

Peer review information

Communications Biology thanks Noah Cohen Kalafut and the other, anonymous, reviewers for their contribution to the peer review of this work. Primary Handling Editors: Aylin Bircan, Ophelia Bu. A peer review file is available.

Data availability

All data sets used in this article are publicly available: (1) Human dorsolateral prefrontal cortex data³⁹ captured using 10x Visium technology can be downloaded from http://research.libd.org/spatialLIBD/. (2) Human breast cancer data obtained with 10x Visium technology can be downloaded from https://www.10xgenomics.com/datasets/human-breast-cancer-block-a-section-1-1-standard-1-1-0. (3) Mouse olfactory bulb tissue data generated by the Stereo-seq and Slide-seqV2 platforms can be accessed from https://github.com/JinmiaoChenLab/SEDR_analyses/tree/master/data and https://singlecell.broadinstitute.org/single_cell/study/SCP815, respectively. (4) The spatial transcriptomic data of the mouse embryo obtained with Stereo-seq technology⁸ can be downloaded from https://db.cngb.org/stomics/mosta/. (5) The mouse primary visual cortex (V1) STARmap dataset⁴¹ is available at https://www.starmapresources.com/data. (6) The mouse brain somatosensory cortex osmFISH dataset⁴² can be downloaded from http://linnarssonlab.org/osmFISH. (7) The mouse hypothalamus dataset from MERFISH⁵⁰ is available under the DOI 10.5061/dryad.8t8s248. (8) The mouse whole brain dataset⁴⁶ profiled by the ST platform can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE147747. Detailed descriptions of the various datasets can be found in Supplementary Table 3. The data used in this study have been uploaded to Zenodo and are freely available at: https://zenodo.org/records/15090086.

Code availability

An open-source implementation of the SpaCross algorithm can be downloaded from https://github.com/wenwenmin/SpaCross.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Donghai Fang, Wenwen Min.

Supplementary information

The online version contains supplementary material available at 10.1038/s42003-025-08810-5.

References

1. Rao, A., Barkley, D., França, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596, 211–220 (2021).
2. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
3. Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
4. Min, W., Shi, Z., Zhang, J., Wan, J. & Wang, C. Multimodal contrastive learning for spatial gene expression prediction using histology images. Brief. Bioinform. 25, bbae551 (2024).
5. Wang, T. et al. Graph attention automatic encoder based on contrastive learning for domain recognition of spatial transcriptomics. Commun. Biol. 7, 1351 (2024).
6. Rao, N., Clark, S. & Habern, O. Bridging genomics and tissue pathology: 10× genomics explores new frontiers with the Visium spatial gene expression solution. Genet. Eng. Biotechnol. News 40, 50–51 (2020).
7. Rodriques, S. G. et al. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
8. Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
9. Yu, Z. et al. Accurate spatial heterogeneity dissection and gene regulation interpretation for spatial transcriptomics using dual graph contrastive learning. Adv. Sci. 12, 2410081 (2025).
10. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
11. Shah, S. et al. Dynamics and spatial genomics of the nascent transcriptome by intron seqFISH. Cell 174, 363–376 (2018).
12. Zhang, C., Dong, K., Aihara, K., Chen, L. & Zhang, S. STAMarker: determining spatial domain-specific variable genes with saliency maps in deep learning. Nucleic Acids Res. 51, e103–e103 (2023).
13. Li, X., Zhu, F. & Min, W. SpaDiT: diffusion transformer for spatial gene expression prediction using scRNA-seq. Brief. Bioinform. 25, bbae571 (2024).
14. Li, H.-S., Tan, Y.-T. & Zhang, X.-F. Enhancing spatial domain detection in spatial transcriptomics with ensdd. Commun. Biol. 7, 1358 (2024).
15. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
16. Wang, H., Zhao, J., Nie, Q., Zheng, C. & Sun, X. Dissecting spatiotemporal structures in spatial transcriptomics via diffusion-based adversarial learning. Research 7, 0390 (2024).
17. Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. Cellcharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
18. Yuan, Z. Mender: fast and scalable tissue structure identification in spatial omics data. Nat. Commun. 15, 207 (2024).
19. Hu, J. et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
20. Xu, H. et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med. 16, 12 (2024).
21. Hou, Z. et al. GraphMAE: Self-supervised masked graph autoencoders. In Proc. 28th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD), 594–604 (ACM, 2022).
22. Tu, W. et al. RARE: Robust masked graph autoencoder. IEEE Trans. Knowl. Data Eng. 36, 5340–5353 (2023).
23. Chen, Y., Zhen, C., Mo, Y., Liu, J. & Zhang, L. Multiscale dissection of spatial heterogeneity by integrating multi-slice spatial and single-cell transcriptomics. Adv. Sci. 12, 2413124 (2025).
24. Lin, L. et al. STMGraph: spatial-context-aware of transcriptomes via a dual-remasked dynamic graph attention model. Brief. Bioinform. 26, bbae685 (2025).
25. Fang, D., Zhu, F., Xie, D. & Min, W. Masked graph autoencoders with contrastive augmentation for spatially resolved transcriptomics data. In Proc. 2024 IEEE Int. Conf. Bioinform. Biomed. (BIBM), 515–520 (IEEE, 2024).
26. Min, W., Fang, D., Chen, J. & Zhang, S. SpaMask: Dual masking graph autoencoder with contrastive learning for spatial transcriptomics. PLoS Comput. Biol. 21, e1012881 (2025).
27. Lee, N., Lee, J. & Park, C. Augmentation-free self-supervised learning on graphs. In Proc. AAAI Conf. on Artif. Intell., 7372–7380 (AAAI Press, 2022).
28. Li, J., Chen, S., Pan, X., Yuan, Y. & Shen, H.-B. Cell clustering for spatial transcriptomics data with graph neural networks. Nat. Comput. Sci. 2, 399–408 (2022).
29. Nie, W., Yu, Y., Wang, X., Wang, R. & Li, S. C. Spatially informed graph structure learning extracts insights from spatial transcriptomics. Adv. Sci. 11, 2403572 (2024).
30. Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
31. Zeng, Y. et al. Identifying spatial domain by adapting transcriptomics with histology through contrastive learning. Brief. Bioinform. 24, bbad048 (2023).
32. Zhou, X., Dong, K. & Zhang, S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat. Comput. Sci. 3, 894–906 (2023).
33. Wang, G. et al. Construction of a 3D whole organism spatial atlas by joint modelling of multiple slices with deep neural networks. Nat. Mach. Intell. 5, 1200–1213 (2023).
34. Zeira, R., Land, M., Strzalkowski, A. & Raphael, B. J. Alignment and integration of spatial transcriptomics data. Nat. Methods 19, 567–575 (2022).
35. Xu, H. et al. SPACEL: deep learning-based characterization of spatial transcriptome architectures. Nat. Commun. 14, 7603 (2023).
36. Arun, K. S., Huang, T. S. & Blostein, S. D. Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Anal. Mach. Intell. 1, 698–700 (1987).
37. Xu, C. et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res. 50, e131–e131 (2022).
38. Cui, Y. et al. DiffusionST: a deep generative diffusion model-based framework for enhancing spatial transcriptomics data quality and identifying spatial domains. Brief. Bioinform. 26, bbaf390 (2025).
39. Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
40. Yuan, Z. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat. Methods 21, 712–722 (2024).
41. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
42. Codeluppi, S. et al. Spatial organization of the somatosensory cortex revealed by osmfish. Nat. Methods 15, 932–935 (2018).
43. Guo, T. et al. SPIRAL: integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biol. 24, 241 (2023).
44. Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).
45. Richardson, L. et al. EMAGE mouse embryo spatial gene expression database: 2014 update. Nucleic Acids Res. 42, D835–D844 (2014).
46. Ortiz, C. et al. Molecular atlas of the adult mouse brain. Sci. Adv. 6, eabb3446 (2020).
47. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 1–5 (2018).
48. Rand, W. M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971).
49. Amelio, A. & Pizzuti, C. Correction for closeness: adjusting normalized mutual information measure for clustering comparison. Comput. Intell. 33, 579–601 (2017).
50. Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary Information^{(26MB, pdf)}

42003_2025_8810_MOESM2_ESM.docx^{(21.2KB, docx)}

Description of Additional Supplementary Files

Supplementary Data 1^{(9MB, xlsx)}

Reporting Summary^{(1.6MB, pdf)}

Transparent Peer Review file^{(39.2MB, pdf)}

Data Availability Statement

An open-source implementation of the SpaCross algorithm can be downloaded from https://github.com/wenwenmin/SpaCross.
