Communications Biology. 2025 Sep 30;8:1393. doi: 10.1038/s42003-025-08810-5

SpaCross deciphers spatial structures and corrects batch effects in multi-slice spatially resolved transcriptomics

Donghai Fang 1,#, Wenwen Min 1,✉,#
PMCID: PMC12484869  PMID: 41028333

Abstract

Spatially Resolved Transcriptomics (SRT) has revolutionized tissue architecture analysis by integrating gene expression with spatial coordinates. However, existing spatial domain identification methods struggle with unsupervised learning constraints, lack of implicit supervision in latent space, and challenges in balancing local spatial continuity with global semantic consistency, particularly in multi-slice integration. To address these issues, we propose SpaCross, a comprehensive deep learning framework for SRT that enhances spatial pattern recognition and cross-slice consistency. SpaCross employs a cross-masked graph autoencoder to reconstruct gene expression features while preserving spatial relationships and mitigating identity mapping issues. A cross-masked latent consistency module reinforces implicit constraints on latent representations, improving feature robustness. More importantly, an adaptive spatial-semantic graph structure dynamically integrates local and global contextual information, enabling effective multi-slice integration. Extensive evaluations demonstrate that SpaCross outperforms thirteen state-of-the-art methods on single-slice datasets and achieves robust batch effect correction while preserving biologically meaningful spatial architectures in multi-slice integration. Notably, SpaCross integrates embryonic mouse tissues across developmental stages, identifying conserved regions and uncovering stage-specific structures such as the dorsal root ganglion. In the heart domain, it reconstructs developmental trajectories capturing key transcriptional transitions and gene programs associated with cardiac maturation.

Subject terms: Functional clustering


SpaCross uses a cross-masked graph autoencoder with adaptive spatial-semantic integration to advance multi-slice spatial transcriptomics and reveal conserved and stage-specific tissue structures.

Introduction

The organization and function of multicellular organisms depend on precise spatial coordination and regulation between cells1. Traditional single-cell sequencing technologies can characterize cellular expression heterogeneity, but they fail to capture the “spatial code” that governs tissue function. Spatially Resolved Transcriptomics (SRT) couples gene expression profiles with spatial coordinates, offering a new paradigm for revealing the molecular blueprint of tissue structure2–4. Current SRT technologies fall broadly into two categories5: in situ capture sequencing platforms and in situ hybridization (ISH)-based detection strategies. The former, represented by technologies such as 10x Visium6, Slide-Seq7, Stereo-Seq8, and spatial transcriptomics (ST)2, couples spatial localization with high-throughput sequencing to profile thousands of genes within the tissue microenvironment9. The latter includes methods such as MERFISH10 and seqFISH11, which reach single-cell resolution and precisely map gene expression profiles to spatial coordinates9. These advances give researchers deeper insight into the spatial organization of tissues and the progression of disease12,13.

The spatial complexity of biological tissues arises from the differentiation of heterogeneous regions, which form spatial domains with specific biological characteristics. Identifying and interpreting spatial domains is therefore a key challenge in SRT for understanding physiology and pathology3. Current spatial transcriptomic clustering methods fall into two categories14: non-spatial and spatial clustering methods. Classical non-spatial methods (such as K-means and Louvain15) partition spatial domains based solely on gene expression matrices, neglecting the biological relationships between spatially adjacent sequencing spots16. To overcome this limitation, spatial clustering methods explicitly incorporate spatial information to improve domain identification. Some of these methods do not rely on graph-based learning. For example, CellCharter17 learns a latent representation of gene expression with variational autoencoders and incorporates spatial adjacency during clustering without graph convolutions. MENDER18, in contrast, is a fast and scalable approach that constructs multiscale spatial context representations by computing the distribution of cell states across multiple neighborhood ranges. These non-GNN spatial methods offer interpretable and computationally efficient solutions that are especially suitable for large-scale datasets.
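To make the multiscale-context idea concrete, the sketch below computes, for every cell, the fraction of each cell state within several radii and concatenates the results into one feature vector. It is a minimal illustration of the general technique under stated assumptions, not MENDER's implementation: the function name `multiscale_context`, the use of discrete integer cell-state labels, and the radius values are hypothetical choices.

```python
import numpy as np
from scipy.spatial import cKDTree


def multiscale_context(coords, labels, n_types, radii):
    """Concatenate, for every cell, the cell-state composition within each radius.

    coords: (n, 2) spatial coordinates.
    labels: (n,) integer cell-state labels in [0, n_types).
    radii: increasing neighborhood radii (hypothetical values, e.g. in microns).
    """
    tree = cKDTree(coords)
    features = []
    for r in radii:
        comp = np.zeros((len(coords), n_types))
        # query_ball_point returns, for each cell, the indices within distance r
        for i, neigh in enumerate(tree.query_ball_point(coords, r)):
            counts = np.bincount(labels[neigh], minlength=n_types)
            comp[i] = counts / counts.sum()  # the neighborhood includes the cell itself
        features.append(comp)
    return np.hstack(features)  # shape (n, n_types * len(radii))
```

The resulting features could then be clustered with any standard algorithm (for example k-means) to obtain spatial domains.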

Among spatial clustering methods, graph neural network (GNN)-based approaches have emerged as a powerful framework for jointly modeling spatial topology and transcriptomic features. For example, SpaGCN19 integrates spatial distance and histological features into an adjacency matrix and applies Graph Convolutional Networks (GCNs) to gene expression to learn graph embeddings under an unsupervised clustering loss. SEDR20 uses a deep autoencoder to learn gene expression representations while simultaneously embedding spatial information with a variational autoencoder. STAGATE3 uses an adaptive graph attention autoencoder to learn low-dimensional latent representations of SRT data and identify spatial domains. Although these methods combine gene expression and spatial information for spatial domain identification, they rely entirely on unsupervised objectives, which can yield embeddings that are imprecise and inconsistent with pathological annotations9.
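Most GNN-based tools build on some form of graph convolution. The sketch below shows the standard symmetric-normalized GCN propagation rule (Kipf and Welling), in which each spot's new representation is a degree-weighted average of its own and its neighbors' transformed features. It assumes a binary spot adjacency matrix and is not SpaGCN's exact formulation, which uses its own weighted adjacency; `gcn_layer` is a hypothetical name.

```python
import numpy as np


def gcn_layer(adj, features, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    adj: (n, n) binary adjacency between spots (no self-loops).
    features: (n, d_in) spot features, e.g. PCA-reduced expression.
    weight: (d_in, d_out) layer weights (learned in practice).
    """
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops so each spot keeps its own signal
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm_adj = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ features @ weight, 0.0)  # ReLU activation
```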

Generative self-supervised learning frameworks based on masking have advanced rapidly in recent years21–23. By randomly masking part of the input features, these frameworks force the model to predict the masked content from its context, guiding it toward representations that cluster better. For example, STMGraph24, based on a dynamic graph attention network, employs a mask-remask mechanism to establish dual decoding views, allowing embeddings to preserve unmasked features while reconstructing masked ones. STMGAC25 introduces a self-distilled masked graph autoencoder with contrastive triplet learning, which provides latent-space supervision and improves spatial clustering accuracy. SpaMask26 constructs a dual-masked graph autoencoder in which node and edge masks encourage spatially neighboring spots to be similar at the feature level. Although these methods avoid the pitfalls of traditional graph autoencoders that reconstruct only the adjacency matrix, their supervision signals are largely limited to explicit reconstruction in the raw feature space. This neglect of implicit supervision in the latent space weakens model robustness.
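The masked-reconstruction objective these generative frameworks share can be sketched in a few lines: hide the features of a random subset of spots behind a shared mask token, and score the reconstruction only on the hidden spots. The loss shown is the scaled cosine error popularized by GraphMAE; it is an assumption for illustration, since the exact losses of STMGraph, STMGAC, and SpaMask are not given here. The encoder and decoder are omitted, and the function names are hypothetical.

```python
import numpy as np


def mask_nodes(x, mask_rate, mask_token, rng):
    """Replace the features of a random subset of spots with a shared mask token."""
    n = x.shape[0]
    n_mask = int(round(mask_rate * n))
    masked_idx = rng.choice(n, size=n_mask, replace=False)
    x_masked = x.copy()
    x_masked[masked_idx] = mask_token  # broadcast the token over every masked row
    return x_masked, masked_idx


def scaled_cosine_error(x, x_rec, masked_idx, gamma=2.0):
    """Mean (1 - cosine similarity)^gamma, computed on the masked rows only."""
    a, b = x[masked_idx], x_rec[masked_idx]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-12
    )
    return float(np.mean((1.0 - cos) ** gamma))
```

In training, `x_masked` would be passed through a graph encoder and decoder to produce `x_rec` before calling `scaled_cosine_error`; restricting the loss to masked rows is what prevents the model from simply copying its input.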

Graph Contrastive Learning (GCL), an emerging self-supervised learning framework, constructs positive and negative sample pairs that drive the model to learn discriminative low-dimensional embeddings, mitigating the effects of data noise and high dimensionality on clustering27–29. For instance, GAAEST5 constructs a neighborhood graph from spatial locations, encodes gene expression with a graph attention network, and applies contrastive learning at local, global, and contextual levels to enhance spatial representation learning. GraphST30 uses self-supervised graph contrastive learning to learn informative representations of gene expression and spatial coordinates, enriching the latent representations. ConGI31 applies contrastive learning to integrate gene expression with tissue histopathology images to decipher spatial domains. stDCL9 integrates spatial location and gene expression, using spatially aware contrastive learning and cluster-level feature contrastive learning to identify spatial domains in complex tissue structures. However, these GCL-based spatial domain identification models often capture either local or global information and struggle to balance the two, which disconnects spatial and expression information and blurs domain boundaries. They also rely heavily on post hoc refinement of clustering results.
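One common way to make a contrastive objective spatially aware is to treat each spot's spatial neighbors as positives and all other spots as negatives. The minimal NumPy sketch below implements such a neighbor-contrastive loss on L2-normalized embeddings; it illustrates the general idea only and is not the objective used by GAAEST, GraphST, ConGI, or stDCL. The temperature `tau` and the function name are assumptions.

```python
import numpy as np


def neighbor_contrastive_loss(z, adj, tau=0.5):
    """Contrastive loss with spatial neighbors as positives.

    z: (n, d) spot embeddings; adj: (n, n) binary spatial adjacency, zero diagonal.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity via dot products
    sim = np.exp(z @ z.T / tau)
    np.fill_diagonal(sim, 0.0)  # exclude self-pairs from positives and negatives
    pos = (sim * adj).sum(axis=1)
    has_pos = pos > 0  # skip isolated spots that have no neighbors
    return float(-np.mean(np.log(pos[has_pos] / sim[has_pos].sum(axis=1))))
```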

Finally, multi-slice integration poses significant methodological challenges23,32. Consecutive tissue slices have inconsistent coordinate systems, which creates a geometric obstacle to integration: single-slice clustering methods that build neighborhood relationships from spatial coordinates (e.g., SpaGCN19) lack a mechanism for passing information across slices and therefore cannot readily integrate multiple slices. Some methods, such as GraphST30 and STitch3D33, first align spatial coordinates to remove physical batch effects before integrating expression data, but rigid coordinate alignment alone struggles to alleviate batch effects when slices are substantially deformed34. Conversely, methods that do not depend on multi-slice spatial coordinate alignment, such as Splane35 (a discriminative model-based approach) and SEDR20 (which relies on Harmony), have difficulty removing technical biases across sequencing platforms that arise from differences in slice thickness and molecular capture efficiency.

To address these limitations, we propose SpaCross, a framework that couples a cross-masked graph autoencoder with adaptive spatial-semantic fusion for robust spatial domain identification and multi-slice integration. SpaCross uses a masking-enhanced graph autoencoder to reconstruct gene expression features from cross-masked inputs, which explicitly preserves spatial relationships while mitigating the identity-mapping problem. A cross-masked latent consistency (CMLC) module enforces implicit constraints between latent representations derived from the two masking perspectives, improving feature robustness. Importantly, an adaptive hybrid spatial-semantic graph (AHSG) dynamically integrates local spatial continuity with global semantic consistency, which further facilitates effective multi-slice integration.

Our comprehensive evaluations demonstrate the superior performance of SpaCross across diverse spatial transcriptomic scenarios. On single-slice datasets, SpaCross outperforms thirteen state-of-the-art methods in spatial domain identification and is robust to technical variation. In multi-slice integration tasks, including the human dorsolateral prefrontal cortex (DLPFC) and mouse hypothalamic preoptic area datasets, SpaCross balances batch correction with the preservation of biologically coherent architectures. In developing mouse embryos, SpaCross reconstructs spatiotemporal tissue dynamics from stages E9.5 to E11.5, identifying both conserved domains such as the heart and brain and stage-specific structures such as the dorsal root ganglia. SpaCross also enables cross-platform integration of mouse olfactory bulb data, capturing shared laminar structures and resolving platform-specific variation. These results highlight SpaCross as a generalizable framework for spatial analysis across complex developmental and anatomical contexts.

Results

Overview of SpaCross

SpaCross is a comprehensive analytical framework for spatial transcriptomics data that aims to improve the accuracy of spatial pattern recognition and cross-slice consistency. Its key modules include data preprocessing, masking-enhanced self-supervised learning, hybrid graph modeling, and cross-slice integration, and it uses graph neural networks and contrastive learning to jointly model spatial and semantic information (Fig. 1).

Fig. 1. Overview of the SpaCross framework for spatial transcriptomics analysis.


A Workflow of multi-slice integration by SpaCross: integration of gene expression matrices, filtering, PCA for dimensionality reduction, 3D spatial alignment, and construction of a k-NN graph to preserve spatial continuity and enable cross-slice semantic propagation. B Masking-enhanced learning and hybrid graph modeling. The top and bottom modules form the cross-masked self-supervised learning framework, which generates complementary masked views to enable feature reconstruction and enforce latent consistency (CMLC). The middle module constructs an adaptive hybrid spatial-semantic graph (AHSG) that integrates local spatial neighbors with global semantic clusters, further optimized via contrastive learning. C Downstream applications. SpaCross enables complex spatial domain identification in tissue slices, supporting integration of consecutive slices, cross-developmental stages, and cross-platform data.

To support integrative analysis of multi-slice spatial transcriptomics data, SpaCross first concatenates the gene expression matrices of all slices along the spot dimension. The data are then filtered to remove low-quality genes and normalized, and only highly variable genes are retained for subsequent modeling. Principal component analysis (PCA) then reduces dimensionality to lower computational cost. Next, SpaCross performs 3D spatial registration, using the iterative closest point (ICP)33,36 algorithm to align spatial coordinates across slices and dynamically construct a 3D adjacency matrix that maintains the continuity of spatial relationships. Euclidean distances computed on the aligned 3D coordinates are used to build a k-nearest neighbor (k-NN) graph, which forms the topology of the spatial graph. This ensures that the model captures the spatial continuity of neighboring spots while preserving cross-slice semantic propagation (Fig. 1A).
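The registration and graph-building steps can be sketched as follows, assuming 2D spot coordinates per slice, point-to-point ICP with SVD-based (Kabsch) rigid fits, and slices stacked at a fixed, hypothetical z spacing. This is an illustration under those assumptions rather than SpaCross's implementation; the function names, `z_spacing`, and `k` are placeholders, and a dense adjacency matrix is used only for readability (real datasets would call for a sparse one).

```python
import numpy as np
from scipy.spatial import cKDTree


def rigid_transform(src, dst):
    """Least-squares rotation r and translation t with dst ~ src @ r.T + t (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    corr = np.diag([1.0] * (src.shape[1] - 1) + [d])
    r = vt.T @ corr @ u.T
    return r, dst_c - r @ src_c


def icp(src, dst, n_iter=50, tol=1e-8):
    """Align one slice's spot coordinates (src) to a reference slice (dst)."""
    tree = cKDTree(dst)
    cur = src.copy()
    prev_err = np.inf
    for _ in range(n_iter):
        dist, idx = tree.query(cur)  # closest-point correspondences
        r, t = rigid_transform(cur, dst[idx])
        cur = cur @ r.T + t
        err = dist.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return cur


def knn_graph_3d(coords_list, z_spacing=1.0, k=6):
    """Stack aligned slices along z and build a symmetric k-NN adjacency."""
    xyz = np.vstack([
        np.column_stack([c, np.full(len(c), i * z_spacing)])
        for i, c in enumerate(coords_list)
    ])
    # k + 1 because the closest hit is the spot itself (assumes no duplicate coordinates)
    _, idx = cKDTree(xyz).query(xyz, k=k + 1)
    n = len(xyz)
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), idx[:, 1:].ravel()] = 1.0
    return np.maximum(adj, adj.T)  # symmetrize: keep an edge if either end selected it
```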

To improve the model’s robustness to missing data and noise, SpaCross introduces a cross-masked self-supervised learning mechanism (Fig. 1B). Specifically, two complementary masked views are randomly generated from the input features and serve feature reconstruction and latent-space consistency learning, respectively. The masked feature matrix simulates missing information during training and improves the model’s imputation capability, while the complementary mask provides consistent supervision in the latent space, mitigating overfitting in the autoencoder. The graph encoder, which combines feedforward neural networks and graph convolutional networks, propagates features among neighboring nodes along the graph structure, yielding more robust latent representations.
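One reading of "two complementary masked views" is that the spots hidden in one view are exactly the spots visible in the other. The sketch below builds such a pair of views; this interpretation, the zero mask value, and the function name `complementary_views` are assumptions, not details taken from SpaCross.

```python
import numpy as np


def complementary_views(x, mask_rate, rng, mask_value=0.0):
    """Return two masked copies of x whose masked spot sets are complements."""
    n = x.shape[0]
    perm = rng.permutation(n)
    n_mask = int(round(mask_rate * n))
    mask_a = np.zeros(n, dtype=bool)
    mask_a[perm[:n_mask]] = True
    mask_b = ~mask_a  # spots hidden in view A are visible in view B, and vice versa
    view_a = np.where(mask_a[:, None], mask_value, x)
    view_b = np.where(mask_b[:, None], mask_value, x)
    return view_a, view_b, mask_a, mask_b
```

Because every spot is masked in exactly one view, the two views jointly cover the full input; with a zero mask value, `view_a + view_b` recovers `x`.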

In the latent representation learning process, SpaCross further enhances the reliability of these representations through a cross-masked latent consistency (CMLC) mechanism (Fig. 1B). The model aligns latent embeddings generated from complementary views using contrastive learning, thereby reinforcing consistency across different views and ensuring stable feature representation even when handling incomplete data or diverse data augmentation views.
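Aligning embeddings from two views with contrastive learning is commonly done with a symmetric InfoNCE loss, where the same spot's embeddings in the two views form the positive pair and all other spots serve as negatives. The sketch below implements that generic loss in NumPy; whether CMLC uses this exact form, and the temperature value, are assumptions.

```python
import numpy as np


def _diag_cross_entropy(logits):
    """Cross-entropy where the correct 'class' for row i is column i."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def cross_view_infonce(z1, z2, tau=0.2):
    """Symmetric InfoNCE: row i of z1 and row i of z2 are the positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau  # scaled cosine similarities between all cross-view pairs
    return float(0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits.T)))
```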

To address the insufficient integration of local and global information in SRT data, SpaCross introduces an adaptive hybrid spatial-semantic graph (AHSG) (Fig. 1B). Based on the latent embeddings, the model selects local spatial neighbors and global semantic cluster neighbors and fuses them into a unified hybrid neighborhood. This strategy preserves spatial continuity while introducing semantic consistency across regions, improving discriminative power in downstream tasks. By aggregating hybrid-neighborhood features and optimizing node embeddings with contrastive learning, the model sharpens the boundaries between domains.
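A hybrid neighborhood of this kind could be assembled as the union of each spot's spatial k nearest neighbors and its most similar spots (by cosine similarity of latent embeddings) within the same semantic cluster, followed by mean aggregation. The sketch below follows that recipe; the cluster labels are taken as given (for example from k-means on the embeddings), and the neighbor counts, uniform averaging, and function names are hypothetical rather than AHSG's actual design.

```python
import numpy as np
from scipy.spatial import cKDTree


def hybrid_neighbors(coords, z, cluster_labels, k_spatial=6, k_semantic=6):
    """Boolean adjacency: spatial k-NN united with same-cluster latent neighbors."""
    n = len(coords)
    _, sp_idx = cKDTree(coords).query(coords, k=k_spatial + 1)  # column 0 is the spot itself
    zn = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = zn @ zn.T  # cosine similarity of latent embeddings
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        adj[i, sp_idx[i, 1:]] = True  # local spatial neighbors
        same = np.flatnonzero((cluster_labels == cluster_labels[i]) & (np.arange(n) != i))
        top = same[np.argsort(-sim[i, same])[:k_semantic]]  # most similar same-cluster spots
        adj[i, top] = True  # global semantic neighbors
    return adj


def aggregate(adj, z):
    """Mean of neighbor embeddings; spots without neighbors keep their own embedding."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.where(deg > 0, (adj.astype(float) @ z) / np.maximum(deg, 1), z)
```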

In downstream tasks (Fig. 1C), SpaCross enables complex spatial domain identification within tissue slices while preserving spatial continuity and clustering accuracy. It supports the integration of consecutive slices, cross-developmental stage comparisons, and cross-platform datasets. These capabilities provide a unified framework for spatial omics studies in development, disease, and cross-species research.

SpaCross enhances clustering and layer-specific identification in DLPFC

We comprehensively assessed SpaCross’s performance in spatial domain identification by benchmarking it against thirteen state-of-the-art methods spanning diverse modeling paradigms, including classical probabilistic models, graph-based frameworks, generative architectures, and contrastive learning strategies. For clarity, we categorized these methods based on their core algorithmic principles, while acknowledging that some span multiple paradigms. Non-GNN-based models included CellCharter17, which utilizes scVI-based probabilistic embeddings, and MENDER18, a scalable proximity-based method. Graph-based approaches such as SpaGCN19, SEDR20, STAGATE3, and DeepST37 explicitly incorporate spatial adjacency via graph neural networks. Generative models like SpaMask26, STMGAC25, and DiffusionST38 enrich spatial representations through masked gene prediction or diffusion modeling. Methods employing contrastive learning, including GraphST30, stDCL9, GAAEST5, and CCST28, enhance embeddings by contrasting intra- and inter-domain spot pairs.

All methods were evaluated on the human dorsolateral prefrontal cortex (DLPFC) dataset from the 10x Visium platform39, which includes 12 annotated tissue sections categorized into six cortical layers (L1-L6) and white matter (WM) by Maynard et al.39.

To quantify clustering accuracy, we computed the Adjusted Rand Index (ARI) and clustering Accuracy (ACC), where higher values indicate better correspondence with manual annotations (Fig. 2A; see Methods). Each method was run 10 times with different random seeds across all 12 DLPFC slices to ensure robustness and statistical rigor. Because all models were run under identical settings, with the same number of spatial domains per slice, their outputs are directly comparable. Importantly, to ensure a fair and faithful comparison, we retained the original post-processing procedures specified by each baseline method when evaluating both ARI and ACC. For each run, we calculated ARI and ACC per slice and averaged the results, and we applied Wilcoxon signed-rank tests to assess statistical significance between methods (Fig. 2A; results of SpaCross under 50 random seeds are shown in Supplementary Fig. 1).
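ARI is available directly in scikit-learn, while ACC is usually computed by matching predicted clusters to annotated labels with the Hungarian algorithm before counting correctly assigned spots. The sketch below assumes that definition of ACC (the Methods may differ in detail) and runs a paired, two-sided Wilcoxon signed-rank test on the per-slice scores of two methods; the helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.stats import wilcoxon
from sklearn.metrics import adjusted_rand_score


def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of spots correct after the best one-to-one cluster-label matching."""
    t_ids, t = np.unique(y_true, return_inverse=True)
    p_ids, p = np.unique(y_pred, return_inverse=True)
    size = max(len(t_ids), len(p_ids))
    contingency = np.zeros((size, size), dtype=int)
    np.add.at(contingency, (p, t), 1)  # rows: predicted clusters, columns: annotations
    rows, cols = linear_sum_assignment(-contingency)  # maximize matched spot counts
    return contingency[rows, cols].sum() / len(y_true)


def compare_methods(scores_a, scores_b):
    """Paired, two-sided Wilcoxon signed-rank test on per-slice scores."""
    return wilcoxon(scores_a, scores_b).pvalue
```

For a single slice, `adjusted_rand_score(y_true, y_pred)` and `clustering_accuracy(y_true, y_pred)` give the two metrics; pairing by slice is what justifies the signed-rank test here.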

Fig. 2. SpaCross enables accurate and biologically consistent spatial domain identification in the human DLPFC.


Panels B–G are all based on slice 151675. A Boxplots of ARI (left) and ACC (right) across 12 annotated DLPFC slices, comparing SpaCross with 13 state-of-the-art methods under identical settings. Each method was repeated with 10 random seeds; significance was assessed using the Wilcoxon signed-rank test (*p < 0.05, **p < 0.01, ***p < 0.001). B Spatial domain maps including manual annotations (top-left), SpaCross results under clustering numbers n = 7 and n = 8, and results from representative baseline methods. C UMAP visualization of learned embeddings and PAGA trajectories, shown for SpaCross and representative baseline methods. D Spatial expression patterns of selected SVGs identified by SpaCross, including PLP1 (Domain 8), NEFL (Domain 5), and ENC1 (Domain 3). (Bottom right) Boxplot comparison of Moran’s I for SVGs across different methods (higher values indicate stronger spatial autocorrelation). E Pearson correlation matrix of gene expression across spatial domains based on SpaCross results under the 8-domain setting. F Volcano plot of differentially expressed genes between Domain_1 and Domain_2, highlighting selected marker genes. G Spatial expression maps of representative marker genes.

SpaCross significantly outperformed 11 of 13 baseline methods in ARI and 10 of 13 in ACC, based on Wilcoxon signed-rank tests across all slices (Fig. 2A). These results underscore its robust and consistent performance relative to diverse modeling paradigms. Notably, SpaCross achieved the highest mean ARI (0.557) and ACC (0.667), as well as the highest median ACC (0.673), among all evaluated approaches. Compared with non-GNN models such as CellCharter and MENDER, SpaCross showed significantly better alignment with manual annotations (p < 0.05). Against GNN-based methods such as STAGATE and SEDR, SpaCross achieved consistently higher ARI and ACC, with all improvements reaching strong statistical significance (p < 0.001). Generative models such as SpaMask and STMGAC were competitive but still lagged behind SpaCross (p < 0.05). Among contrastive learning methods, GraphST came closest to SpaCross; however, SpaCross maintained higher median scores and lower inter-slice variability, reflecting improved stability and generalizability. Although the differences from GraphST and stDCL did not reach statistical significance in ARI, SpaCross still outperformed both in mean ARI (0.557 vs. 0.543 and 0.517, respectively). These findings confirm SpaCross as a robust and accurate tool for spatial domain identification. Supplementary Fig. 2 presents the spatial domain assignments across all 12 DLPFC slices, while Supplementary Text 1 and Supplementary Fig. 3 discuss the key architectural differences between SpaCross and related methods such as SpaMask, STMGAC, and GAAEST.

For in-depth evaluation, we analyzed DLPFC slice 151675. As illustrated in Fig. 2B, SpaCross not only achieved the highest clustering scores (ARI = 0.683, ACC = 0.761) but also produced spatial domains with clearly delineated boundaries and minimal spot intermixing. In contrast, DeepST exhibited discontinuous laminar transitions, especially between Domains 2 and 4. SEDR misassigned Domain 1 to Domain 6, while STAGATE, though generally effective, failed to separate Domain 1 from Domain 6. Post-processing-dependent methods (DiffusionST, GraphST, GAAEST, and stDCL) improved domain continuity through label refinement19 but still displayed localized embedding inconsistencies (Fig. 2C). SpaMask and STMGAC exhibited moderate robustness but still occasionally misclassified spots, particularly at cortical layer boundaries. GAAEST was particularly sensitive to refinement: without this step, its ARI and ACC dropped markedly. To further evaluate the robustness and generalizability of SpaCross, we compared it with GAAEST under varying post-processing refinement radii across all 12 DLPFC slices. Although GAAEST occasionally achieved higher ARI on a subset of slices when the refinement radius exceeded 30, SpaCross outperformed GAAEST on the majority of slices in both ARI and ACC, even without any post hoc refinement (Supplementary Figs. 4–6). These results underscore SpaCross’s inherent ability to produce accurate and biologically meaningful domains without relying on parameter-sensitive refinement procedures.
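Label refinement of the kind these baselines apply is typically a neighborhood majority vote. In the sketch below, each spot takes the most frequent label among its nearest spots; treating the "refinement radius" as a count of nearest neighbors is an assumption, and `refine_labels` is a hypothetical helper rather than any baseline's API. Ties are broken toward the smallest label because `np.unique` returns sorted values.

```python
import numpy as np
from scipy.spatial import cKDTree


def refine_labels(coords, labels, n_neighbors=30):
    """Reassign each spot to the majority label among its n_neighbors nearest spots."""
    labels = np.asarray(labels)
    _, idx = cKDTree(coords).query(coords, k=n_neighbors + 1)
    refined = labels.copy()
    for i in range(len(labels)):
        neigh = labels[idx[i, 1:]]  # column 0 is the spot itself, so skip it
        values, counts = np.unique(neigh, return_counts=True)
        refined[i] = values[np.argmax(counts)]  # argmax takes the first (smallest) tie
    return refined
```

Because the vote overwrites the model's own assignments, a large neighborhood can smooth away genuine thin structures, which is one reason results that depend on this step are sensitive to its radius.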

To elucidate these discrepancies, we visualized latent embeddings using UMAP (Fig. 2C). SpaCross displayed a clear linear trajectory from Domain 1 to Domain 7, aligning with cortical layer stratification. In comparison, stDCL and GraphST presented overlapping and disordered embeddings, reflecting weaker separation of spatial heterogeneity. Their limitations likely stem from neglecting global cluster semantics during training. To quantify embedding quality, we computed the Discreteness Score (DIS)40, where lower values indicate stronger topological coherence. SpaCross achieved the lowest DIS (0.0394), outperforming stDCL (0.1675) and GraphST (0.0831), with consistent trends observed across all slices (Supplementary Fig. 2).

SpaCross’s ability to identify spatially variable genes (SVGs) was also evaluated using a SpaGCN-inspired pipeline19. In slice 151675, SpaCross detected 30 SVGs with distinct spatial patterns, including 23 genes (e.g., PLP1) in Domain 8, one gene (NEFL) in Domain 5, and six genes (e.g., ENC1) in Domain 3 (Fig. 2D). We compared the Moran’s I values of these SVGs with those of SVGs detected by five leading methods. SpaCross showed the highest median Moran’s I, indicating stronger spatial autocorrelation, although the differences did not reach statistical significance (Mann-Whitney U test, p > 0.05).
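Moran's I summarizes how strongly a gene's expression in one spot resembles that of its spatial neighbors: values near 1 indicate spatial clustering, values near 0 indicate a spatially random pattern, and negative values indicate a checkerboard-like pattern. The sketch below computes the global statistic for a single gene from a spatial weight matrix, assuming a binary neighbor matrix such as a k-NN graph; the function name is hypothetical.

```python
import numpy as np


def morans_i(x, w):
    """Global Moran's I of one gene's expression x under spatial weights w (n x n)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()  # deviations from the mean expression
    n, s0 = len(x), w.sum()
    return float(n / s0 * (z @ w @ z) / (z @ z))
```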

Further analysis of slice 151675 revealed challenges shared across methods in resolving subtle boundaries, particularly between layers 1–2 and 3–4 (Fig. 2B). For example, stDCL and GraphST did not recover a distinct layer 4; SEDR and DeepST yielded fragmented or overextended boundaries; and STAGATE, though accurate overall, struggled to separate layers 1 and 2. These challenges reflect broader modeling limitations rather than issues unique to SpaCross.

The default configuration of SpaCross assigned seven domains per slice, matching the six annotated layers and white matter. This choice promotes consistency with manual labels but may restrict detection of finer-grained structures. Although SpaCross achieved the highest ARI and ACC under this constraint, spatial maps showed layer 1 split into two domains and layer 4 absent, suggesting over- and under-segmentation (Fig. 2B).

To probe whether these results stemmed from biological similarity or model limitations, we increased the number of domains to eight (Fig. 2B). The revised segmentation clearly separated layers 3 and 4 into Domain_4 and Domain_5, demonstrating SpaCross’s ability to resolve finer distinctions when unconstrained.

To investigate the biological basis of the inferred boundaries, we computed pairwise Pearson correlations of gene expression across domains (Fig. 2E). Domain_4 and Domain_5 (layers 3 and 4) were highly similar (correlation = 0.76), explaining their earlier merging. In contrast, Domain_1 and Domain_2, both from layer 1, had low correlation (0.24), indicating marked molecular heterogeneity.

This heterogeneity was further validated by differential expression analysis between Domain_1 and Domain_2. The volcano plot (Fig. 2F) highlighted domain-specific marker genes: MALAT1 together with the vascular/stromal markers MGP and MYH11 in Domain_1, and the glial/neuronal markers CXCL14 and AQP4 in Domain_2. Spatial expression maps (Fig. 2G) confirmed their distinct localization, supporting the conclusion that the subdivision of layer 1 reflects genuine biological variation rather than a model artifact.
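A domain-level correlation matrix like the one in Fig. 2E can be obtained by averaging expression within each domain and correlating the resulting profiles. Whether the analysis used mean profiles, which genes were included, and the helper name are assumptions in the sketch below.

```python
import numpy as np


def domain_correlation(expr, domains):
    """Pearson correlation between the mean expression profiles of spatial domains.

    expr: (n_spots, n_genes) expression matrix; domains: (n_spots,) domain labels.
    """
    ids = np.unique(domains)
    profiles = np.vstack([expr[domains == d].mean(axis=0) for d in ids])
    return ids, np.corrcoef(profiles)  # rows of profiles are treated as variables
```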

SpaCross robustly delineates tissue structures across diverse experimental platforms

To systematically validate the cross-platform robustness of SpaCross, we first evaluated it on a complex human breast cancer (BRCA) dataset generated by the 10x Visium platform20. This dataset contains 20 histopathological regions carefully annotated by Xu et al.20 from H&E-stained images, encompassing ductal carcinoma in situ/lobular carcinoma in situ (DCIS/LCIS), healthy tissue, invasive ductal carcinoma (IDC), and tumor edge regions (Fig. 3A). SpaCross achieved the highest spatial domain clustering accuracy (ARI = 0.65 and ACC = 0.72), exceeding the next-best methods, DiffusionST (ARI = 0.58) and DeepST (ARI = 0.58), by approximately 0.07 in ARI (Fig. 3C). This margin is in line with the gains that recent spatial clustering methods report over their predecessors (Supplementary Table 1). For visual comparison, spatial domain identification results of all baseline models are provided in Supplementary Fig. 7.

Fig. 3. SpaCross robustly delineates tissue structures across diverse experimental platforms.


A Histological image (left) and manually annotated spatial domains (right) for the human breast cancer (BRCA) dataset from 10x Visium. B Boxplot comparisons of Moran’s I and Geary’s C metrics for SVGs identified by SpaCross and baseline methods on the BRCA dataset. Mann-Whitney U tests were used to assess statistical significance. C Spatial domain maps on the BRCA dataset generated by SpaCross and representative baseline methods. D Spatial domain identification results on the mouse primary visual cortex (MVC) dataset (STARmap), including manual annotation (top-left) and results from SpaCross and baseline methods. E UMAP visualizations and PAGA trajectories of learned embeddings on the MVC dataset, shown for SpaCross and baseline methods. F Evolution of hybrid spatial-semantic graphs during SpaCross optimization on the MVC dataset. Left: gene expression similarity graph based on PCA-derived correlations. Right: hybrid graphs at epochs 50, 100, and 300, showing progressive refinement of intra-domain structure. G Spatial domain identification results on the mouse somatosensory cortex (MSC) dataset (osmFISH), including manual annotation (leftmost) and spatial domains identified by SpaCross and baseline methods.

At the level of tissue topology, SpaCross avoided the segmentation biases toward the IDC_2 and IDC_5 regions observed in existing algorithms (including stDCL, DiffusionST, GraphST, and SEDR). As shown in Fig. 3C, these methods erroneously fragmented contiguous IDC areas into discrete subclusters despite their clear block-like pathological features, whereas SpaCross’s results were highly consistent with manual annotations. Inter-cluster similarity matrices (Supplementary Fig. 8) revealed strong spatial co-localization of SpaCross-defined domains 8, 20, and 5 with IDC_1, IDC_2, and IDC_4, respectively. Notably, SpaCross precisely delineated dynamic tumor edge transition zones, mapping spatial domain 1 to Tumor_edge_3 and domain 3 to Tumor_edge_2 (Fig. 3C), demonstrating tumor-stroma boundary resolution comparable to that of recent contrastive learning frameworks.

SVG analysis further confirmed SpaCross’s superiority. Since each method employs distinct criteria for identifying spatially variable genes (SVGs), the resulting gene sets differ in both content and size, violating the matched-sample assumption of the Wilcoxon signed-rank test. Therefore, we adopted the Mann-Whitney U test, which is more appropriate for comparing two independent distributions with potentially unequal sample sizes. On the BRCA dataset (Fig. 3B), this test revealed statistically significant differences in Moran’s I values between SpaCross and five representative baselines: p = 0.0031 (stDCL), 7.76 × 10−10 (GraphST), 0.0365 (STAGATE), 8.82 × 10−7 (SEDR), and 0.0008 (SpaGCN). This provides strong evidence that SpaCross identifies SVGs with significantly higher spatial autocorrelation. Geary’s C values exhibited consistent trends, with SpaCross achieving significantly lower values, indicating better spatial coherence. These findings demonstrate that SpaCross not only aligns well with pathological annotations but also statistically outperforms other methods in capturing spatial expression patterns.
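Geary's C is the complementary statistic, based on squared differences between neighbors, so values below 1 indicate positive spatial autocorrelation. The sketch below computes it for one gene and compares two independent sets of per-gene scores with SciPy's Mann-Whitney U test, which does not require equal set sizes; the helper names and the two-sided alternative are assumptions.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def gearys_c(x, w):
    """Global Geary's C of one gene's expression x under spatial weights w (n x n)."""
    x = np.asarray(x, dtype=float)
    n, s0 = len(x), w.sum()
    diff_sq = (x[:, None] - x[None, :]) ** 2  # squared differences for every spot pair
    return float((n - 1) * (w * diff_sq).sum() / (2 * s0 * ((x - x.mean()) ** 2).sum()))


def compare_score_sets(scores_a, scores_b):
    """Two-sided Mann-Whitney U test for two independent sets of per-gene scores."""
    return mannwhitneyu(scores_a, scores_b, alternative="two-sided").pvalue
```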

Furthermore, we evaluated the spatial domain identification performance of SpaCross and baseline methods on a more complex mouse brain dataset from the 10x Visium platform, which contains 52 spatial domains. SpaCross achieved the highest accuracy among all methods, with an ARI of 0.52, and exhibited more spatially continuous domain segmentation. See Supplementary Fig. 9 and Supplementary Table 1 for details.

Next, we systematically evaluated anatomical structure delineation on the mouse primary visual cortex (MVC)41 dataset generated by the STARmap platform, which contains manually annotated anatomical regions including the corpus callosum (CC), hippocampus (HPC), and six neocortical layers (L1–L6). SpaCross outperformed all benchmark methods in both accuracy and anatomical plausibility (Fig. 3D). Notably, the non-GNN method MENDER achieved an ARI of 0.65, outperforming several GNN-based methods; no other baseline surpassed this score, and only SpaCross exceeded it, with an ARI of 0.70 (Fig. 3D, Supplementary Fig. 10A and Supplementary Table 1). Despite its relatively high ARI, MENDER failed to distinguish the L1 and L2/3 layers, resulting in anatomically inconsistent boundaries.

Consistent with previous findings, stDCL failed to separate the HPC and CC regions. GraphST and SpaGCN exhibited severe cell admixture across cortical layers (Supplementary Fig. 10). STAGATE and SEDR were unable to resolve the boundary between L2/3 and L4. CCST erroneously fragmented the CC region into multiple subdomains and misclassified HPC together with L1 as a single anatomical domain. In contrast, SpaCross achieved near-perfect concordance with gold-standard annotations, producing sharply defined domain boundaries without cellular intermixing and effectively capturing the full laminar and structural architecture of the visual cortex.

UMAP visualization of latent embeddings (Fig. 3E and Supplementary Fig. 10B) revealed that SpaCross captured a linear topological structure aligned with cortical layer development. In contrast, CCST produced overlapping centroids for domains 1 and 5, while GraphST, STAGATE, and SEDR generated biologically uninterpretable embedding distributions. To elucidate the biological relevance of SpaCross’s representations, we tracked the dynamic evolution of hybrid spatial-semantic proximity graphs during iterative optimization (Fig. 3F). The initial PCA-based gene expression similarity network (left panel) showed intra-domain correlations only in CC, with anomalous cross-domain similarities prevalent in other regions. Through iterative refinement, SpaCross progressively strengthened intra-domain associations while maintaining biologically meaningful inter-domain relationships, demonstrating effective integration of spatial proximity and global co-expression patterns into biologically coherent topological representations.
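One way to quantify the graph evolution described above is to measure what fraction of a similarity graph's edges stay within a single annotated domain. The sketch below builds a cosine-similarity kNN graph on PCA-like features and reports that fraction. The choice of cosine similarity, the value of `k`, and the helper names `cosine_knn` and `intra_domain_fraction` are assumptions for illustration; the exact construction behind Fig. 3F is not specified in this section.

```python
import numpy as np


def cosine_knn(features: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k most cosine-similar spots for each spot (self excluded)."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick a spot as its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]


def intra_domain_fraction(neighbors: np.ndarray, labels: np.ndarray) -> float:
    """Share of kNN edges whose two endpoints carry the same domain label."""
    return float((labels[neighbors] == labels[:, None]).mean())


# Toy features: two groups of spots with clearly different expression directions.
features = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
                     [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]])
labels = np.array([0, 0, 0, 1, 1, 1])  # annotated domains
nbrs = cosine_knn(features, k=2)
print(intra_domain_fraction(nbrs, labels))  # 1.0: every edge stays within its domain
```

A graph that mixes domains, like the initial network in the left panel of Fig. 3F, would yield a lower fraction under this measure.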

Finally, we conducted quantitative benchmarking on the mouse somatosensory cortex (MSC)42 dataset generated by the osmFISH platform. In terms of anatomical accuracy measured by ARI, most benchmark methods achieved scores below 0.6, while the non-GNN method CellCharter reached an ARI of 0.66. SpaCross demonstrated the best performance with an ARI of 0.67 (Fig. 3G, Supplementary Fig. 11 and Supplementary Table 1). Notably, stDCL exhibited domain misassignment in Layer 5 through improper mixing of multiple anatomical regions, and GraphST erroneously bisected Layer 4 into two discrete partitions. In contrast, SpaCross not only precisely delineated laminar boundaries but also maintained exceptional biological fidelity in resolving spatial cellular distribution patterns (Fig. 3G and Supplementary Fig. 11).

SpaCross corrects batch effects in consecutive tissue slices

To comprehensively evaluate the cross-slice integration efficacy of SpaCross, we applied it to independent DLPFC donor datasets and benchmarked it against state-of-the-art multi-slice integration methods (SPIRAL43, STitch3D33, Splane35, and STAligner32). Our evaluation framework comprised: 1) clustering concordance with manual annotations as the gold standard (quantified by ARI and ACC); 2) spatial domain discreteness (DIS), computed without label-refinement post-processing; and 3) the F1-harmonized Local Inverse Simpson Index (F1LISI)44, a composite metric quantifying the balance between batch effect removal and biological conservation. F1LISI (range: 0–1) combines batch-grouped LISI (LISIbatch) and domain-grouped LISI (LISIdomain) through a tunable weighting coefficient α (see Methods); higher values indicate more complete removal of technical noise together with better preservation of biological variance.
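The exact F1LISI formula is given in the Methods and is not reproduced here, so the following is only a sketch under stated assumptions. LISI is simplified to the mean inverse Simpson index over uniform k-nearest-neighbour neighbourhoods (the original LISI uses perplexity-based Gaussian weights). Batch LISI and domain LISI are each rescaled to [0, 1], and the two scores are combined with an α-weighted harmonic mean, which reduces to the usual F1 form at α = 0.5. The names `simple_lisi` and `f1_lisi` and the default `k` are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def simple_lisi(emb: np.ndarray, labels: np.ndarray, k: int = 10) -> float:
    """Mean inverse Simpson index of `labels` over each spot's k nearest neighbours (self included)."""
    _, idx = cKDTree(emb).query(emb, k=k)
    neigh = labels[idx]  # (N, k) neighbour labels
    props = np.stack([(neigh == c).mean(axis=1) for c in np.unique(labels)], axis=1)
    return float((1.0 / (props ** 2).sum(axis=1)).mean())


def f1_lisi(emb, batch, domain, k=10, alpha=0.5):
    """Assumed alpha-weighted harmonic combination of normalised batch and domain LISI."""
    n_batch, n_domain = len(np.unique(batch)), len(np.unique(domain))
    batch_mix = (simple_lisi(emb, batch, k) - 1) / (n_batch - 1)  # 1 = batches fully mixed
    domain_sep = 1 - (simple_lisi(emb, domain, k) - 1) / (n_domain - 1)  # 1 = pure domains
    if batch_mix <= 0 or domain_sep <= 0:
        return 0.0
    return 1.0 / (alpha / batch_mix + (1 - alpha) / domain_sep)


# Two well-separated domains; within each, spots alternate between two batches.
pos = np.arange(20) * 0.1
emb = np.column_stack([np.r_[pos, pos + 100.0], np.zeros(40)])
domain = np.repeat([0, 1], 20)
batch = np.tile([0, 1], 20)
print(f1_lisi(emb, batch, domain))          # well integrated: 1.0
print(f1_lisi(emb, domain.copy(), domain))  # batch identical to domain: 0.0
```

The harmonic form penalizes an embedding that scores well on only one side: perfect batch mixing that blurs domains, or perfect domain purity that leaves batches separated, both pull the score down.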

As demonstrated in Donor3 (Fig. 4A), SpaCross exhibits superior performance across all evaluation metrics, achieving median values of ARI = 0.637, ACC = 0.702, DIS = 0.0413, and F1LISI = 0.915. We attribute this to its hybrid neighbor graph for multi-slice integration, which enhances spatial domain continuity by modeling cross-slice expression similarity, together with a latent-space consistency module that improves robustness across slices. SPIRAL demonstrates competitive clustering accuracy (ARI = 0.637) and batch mixing (F1LISI = 0.915) through its dual-embedding learning mechanism (independent optimization of biological and batch embeddings). However, its spatial domain maps contain scattered anomalous spots (Fig. 4B), reflected in the highest DIS value (0.0908), likely because local spatial information is insufficiently preserved during domain adaptation. STitch3D, which uses the ICP and PASTE algorithms for spatial coordinate alignment together with single-cell reference integration, achieves lower median metrics (ARI = 0.538, ACC = 0.685); its rigid-transformation assumptions likely limit adaptability to complex tissue deformations. STAligner shows spatial domain discontinuity in Layer 4 (Fig. 4B), suggesting that its graph attention mechanism fails to capture continuous structural features in deep tissue layers. UMAP visualization (Fig. 4C) reveals poor multi-slice integration by Splane (F1LISI = 0.671): slice 151673 forms isolated clusters with substantial intra-domain mixing, highlighting limitations in its spatial constraint modeling. GraphST and SEDR achieve adequate batch mixing, but their lack of clear spatial boundaries suggests a trade-off in local feature preservation within contrastive learning frameworks. Notably, only SpaCross and SPIRAL successfully balance batch correction with preservation of biological tissue topology.

Fig. 4. Comparative evaluation of multi-slice integration methods on consecutive DLPFC tissue sections.


A Quantitative benchmarking results on Donor3 slices across four evaluation metrics: clustering accuracy (ARI, ACC, Wilcoxon signed-rank test), spatial coherence (DIS, Wilcoxon signed-rank test), and batch correction performance (F1LISI). F1LISI jointly quantifies batch effect removal and biological conservation, with higher values indicating better integration quality. B Spatial domain maps identified by SpaCross, SPIRAL, STitch3D, and STAligner across integrated consecutive slices. C UMAP visualizations of integrated slices, showing batch identities (top), manual annotations (middle), and spatial domain assignments by each method (bottom).

A detailed comparative evaluation of SpaCross against related contrastive learning-based methods (STMGAC, SpaMask, and GAAEST) across multi-slice datasets is provided in Supplementary Text 2 and Supplementary Figs 12, 13. This analysis highlights SpaCross’s superior performance in integrating anatomically diverse, developmentally staged, and cross-platform spatial transcriptomic data.

SpaCross balances developmental consistency and stage-specific variability

To further explore the balance between developmental consistency (shared across stages) and stage-specific variability (regions emerging or regressing at particular stages) in embryonic tissue architecture, we applied SpaCross to the mouse embryonic dataset8 generated by the Stereo-seq platform, integrating three developmental stages (E9.5, E10.5, and E11.5). Through cross-stage joint clustering, SpaCross identified 20 tissue-region clusters. In addition to recovering known anatomical regions, SpaCross delineated previously unannotated sub-domains, including a hindbrain sub-region with distinct neuronal signatures (see Supplementary Text 3 and Supplementary Fig. 14 for details). The clustering results aligned well with ground truth annotations45 (Fig. 5A), achieving higher ARI values than baseline methods at every stage (E9.5: 0.38, E10.5: 0.35, E11.5: 0.46; Supplementary Fig. 14A). The inter-cluster correlation matrix highlighted three highly coherent spatial domains (Fig. 5B), namely the heart (clusters 20/4), dorsal root ganglion (DRG, clusters 12/2), and brain (clusters 10/5), confirming SpaCross’s robustness in cross-stage spatial alignment. Notably, the DRG region was absent at E9.5 but prominently emerged at E10.5 and E11.5 (Fig. 5C), reflecting known biological events such as neural crest cell migration and ganglion formation. Meanwhile, conserved regions such as the liver and dermomyotome persisted across all stages but exhibited marked shifts in spatial extent (Supplementary Fig. 14C).
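How the inter-cluster correlation matrix is computed is not detailed in this section; a common construction, assumed here, is the Pearson correlation between the mean expression profiles of clusters. The function name `inter_cluster_correlation` and the toy matrix are hypothetical.

```python
import numpy as np


def inter_cluster_correlation(expr: np.ndarray, clusters: np.ndarray):
    """Pearson correlation between mean expression profiles (centroids) of clusters."""
    ids = np.unique(clusters)
    centroids = np.stack([expr[clusters == c].mean(axis=0) for c in ids])
    return ids, np.corrcoef(centroids)  # rows/columns follow the order of `ids`


# Toy data: clusters 0 and 1 share a profile shape (as paired clusters such as 20/4 might),
# while cluster 2 has the reversed profile.
expr = np.array([[1, 2, 3, 4],
                 [1, 2, 3, 4],
                 [2, 4, 6, 8],
                 [4, 3, 2, 1]], dtype=float)
clusters = np.array([0, 0, 1, 2])
ids, corr = inter_cluster_correlation(expr, clusters)
print(np.round(corr, 3))
```

High off-diagonal entries flag cluster pairs from different stages that likely represent the same tissue region.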

Fig. 5. SpaCross enables spatiotemporal trajectory reconstruction of mouse embryonic heart development.


A Spatial domain annotations of E9.5, E10.5, and E11.5 mouse embryos. B Inter-cluster correlation matrix revealing three conserved cross-stage regions: heart (clusters 20/4), dorsal root ganglion (DRG, clusters 12/2), and brain (clusters 10/5). C Spatial domains identified by SpaCross across embryonic stages. D Region-specific marker gene expression patterns: Nppb (heart), Ppp1r17 (DRG), and Hes5 (brain), confirming biological relevance of identified domains. E Pseudotime trajectory inferred from heart-region spots (clusters 4 and 20). Left: UMAP embedding of clusters 4 and 20, colored by developmental stages (E9.5, E10.5, and E11.5); arrows indicate the progression of stage-specific centroids. Middle: Diffusion pseudotime plotted on the same UMAP, with arrows connecting centroids of bins (pseudotime intervals of 0.1) from 0 to 1, highlighting the inferred developmental trajectory. Right: Expression levels of Nppb shown in the same embedding space. Red circles highlight the same spatial region across all panels, corresponding to spots with pseudotime > 0.8. F Heatmap of representative late-upregulated genes in the heart trajectory. G GO enrichment of late-phase genes, highlighting biological processes associated with cardiac structural maturation and functional activation.

To validate the biological relevance of these domains, we examined the spatial expression of region-specific marker genes in representative domains (Fig. 5D), with further examples shown in Supplementary Fig. 14D. In the heart domain, Nppb, a marker for hormone-secreting cardiac cells, maintained high expression from E9.5 to E11.5, indicating sustained cardiac differentiation. In the DRG region, Ppp1r17, a gene associated with neural crest migration, absent at E9.5, was strongly enriched at later stages, supporting a stage-specific emergence of this domain. The brain region showed consistent Hes5 expression, with spatial distribution evolving from a diffuse pattern at E9.5 to layered structures by E11.5, mirroring the gradual organization of neuroepithelial architecture during early corticogenesis.

To further investigate transcriptional dynamics within the SpaCross-identified heart domain, we extracted all spots assigned to clusters 4 and 20, two clusters that show high transcriptomic similarity in the inter-cluster correlation matrix (Fig. 5B) and co-localize with the annotated heart region, to construct a heart-specific subset for trajectory inference. Using a randomly selected spot from the E9.5 slice as the developmental origin, we applied diffusion pseudotime (DPT) analysis to reconstruct continuous transcriptional progression across the heart domain (Fig. 5E, middle).

To improve biological interpretability and avoid ambiguity in trajectory direction, we discretized pseudotime into bins of width 0.1, computed the centroid of each bin, and connected them to visualize the developmental trajectory in the middle panel of Fig. 5E. The left panel displays arrows connecting stage-specific centroids (E9.5, E10.5, E11.5), serving as a biological reference for the inferred pseudotime direction, while the right panel shows the spatial expression pattern of the late marker gene Nppb without directional overlays. The pseudotime distribution reflected a smooth developmental transition: early pseudotime included a mixture of E9.5, E10.5, and E11.5 spots, suggesting the persistence of early transcriptional features, while later pseudotime was dominated by E10.5 and E11.5 spots, corresponding to more mature cardiac states. Notably, Nppb expression increased along the pseudotime axis, consistent with its role in cardiac structural maturation and functional activation.
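The binning step can be sketched as follows, assuming pseudotime is already scaled to [0, 1] and that each bin's centroid is the mean UMAP position of its spots. Consecutive non-empty centroids then define the arrows drawn on the embedding. The function name `pseudotime_bin_centroids` and the toy coordinates are hypothetical.

```python
import numpy as np


def pseudotime_bin_centroids(coords, pseudotime, width=0.1):
    """Mean embedding position of the spots in each non-empty pseudotime bin, ordered by bin."""
    n_bins = int(round(1.0 / width))
    bins = np.minimum((pseudotime / width).astype(int), n_bins - 1)  # pseudotime 1.0 -> last bin
    return [(b, coords[bins == b].mean(axis=0)) for b in range(n_bins) if np.any(bins == b)]


coords = np.array([[0.0, 0.0], [0.0, 2.0], [1.0, 1.0], [5.0, 5.0]])  # toy 2-D UMAP positions
pseudotime = np.array([0.01, 0.05, 0.15, 0.95])
centroids = pseudotime_bin_centroids(coords, pseudotime)
arrows = [(a[1], b[1]) for a, b in zip(centroids[:-1], centroids[1:])]  # (start, end) per arrow
print([b for b, _ in centroids])  # [0, 1, 9]
```

Empty bins are skipped, so an arrow may span several pseudotime intervals when intermediate bins contain no spots.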

To examine transcriptional changes along the trajectory, we stratified the heart-region spots based on pseudotime into an early-to-intermediate group (≤0.8) and a late-phase group (>0.8). Differential expression analysis revealed a cohort of late-upregulated genes, including canonical markers of cardiac maturation and contractility such as Myh7, Myl2, and the natriuretic peptides Nppa and Nppb (Fig. 5F). GO enrichment analysis of late-phase genes highlighted processes related to cardiomyocyte structural maturation (e.g., myofibril assembly and cardiac myofibril organization) and functional activation (e.g., cGMP biosynthesis and systemic arterial blood pressure regulation) (Fig. 5G). These findings demonstrate that SpaCross not only provides spatially coherent annotations but also enables the extraction of temporally resolved, biologically meaningful transcriptional trajectories within anatomically localized domains.
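The split-and-test step can be sketched as below. The specific differential expression test is not stated in this section; this sketch assumes a one-sided Wilcoxon rank-sum (Mann-Whitney U) test per gene on log-normalized expression, a positive mean difference as the direction filter, and no multiple-testing correction. The name `late_upregulated`, the cutoffs, and the toy genes are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def late_upregulated(expr, pseudotime, genes, threshold=0.8, p_cut=0.05):
    """Genes higher in late (> threshold) than early (<= threshold) spots; expr is log-normalized."""
    late = pseudotime > threshold
    res = mannwhitneyu(expr[late], expr[~late], alternative="greater", axis=0)  # one test per gene
    delta = expr[late].mean(axis=0) - expr[~late].mean(axis=0)  # mean difference on log scale
    keep = (res.pvalue < p_cut) & (delta > 0)
    order = np.argsort(res.pvalue[keep])  # most significant first
    return [genes[i] for i in np.flatnonzero(keep)[order]]


# Toy data: 10 early and 10 late spots, one gene rising late and one unchanged gene.
expr = np.column_stack([
    np.r_[np.arange(10) * 0.1, 5 + np.arange(10) * 0.1],
    np.r_[np.arange(10.0), np.arange(10.0)],
])
pseudotime = np.r_[np.linspace(0, 0.7, 10), np.linspace(0.85, 1.0, 10)]
print(late_upregulated(expr, pseudotime, ["gene_late", "gene_flat"]))
```

In practice a correction such as Benjamini-Hochberg would be applied before calling genes late-upregulated across thousands of tests.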

SpaCross enables cross-platform integration of spatially resolved data

We next investigated SpaCross’s cross-platform integration capability by applying it to spatially resolved transcriptomic datasets of mouse olfactory bulb (MOB)8 acquired with the Stereo-seq and Slide-seqV27 platforms. These platforms differ substantially in capture chemistry and spatial resolution, introducing strong batch effects that challenge integration methods. Unlike the DLPFC dataset, these cross-platform datasets contain pronounced technical variability that cannot be explained by biological differences alone. To rigorously evaluate SpaCross under such conditions, we used DAPI-stained laminar structures20 and Allen Brain Atlas annotations as biological ground truth (Fig. 6A) to assess whether SpaCross can eliminate technical discrepancies while preserving biologically meaningful spatial domains. Both platforms captured MOB’s core laminar architecture, including the rostral migratory stream (RMS), granule cell layer (GCL), glomerular layer (GL), internal plexiform layer (IPL), mitral cell layer (MCL), external plexiform layer (EPL), and olfactory nerve layer (ONL), with Slide-seqV2 additionally covering the accessory olfactory bulb (AOB) and its granular layer (AOBgr), thereby providing finer spatial resolution.

Fig. 6. Cross-platform integration of mouse olfactory bulb (MOB) spatial transcriptomics datasets using SpaCross.


A Schematic of MOB laminar structures from DAPI staining and Allen Brain Atlas annotations as biological reference annotations. B Spatial domain identification results from SpaCross, SPIRAL, STAligner, GraphST, SpaMask, and SEDR across Stereo-seq (top panels) and Slide-seqV2 (bottom panels) platforms. C UMAP visualization of batch correction performance (upper) and spatial domain assignments (lower) for different integration methods. D Domain-specific spatial patterns identified by SpaCross with corresponding marker gene expression validations across platforms.

Comparative analysis revealed distinct performance differences among integration methods. While SPIRAL demonstrated effective batch mixing in UMAP visualizations (Fig. 6C), its spatial domain assignments (Domains 1-9) showed chaotic partitioning inconsistent with anatomical structures (Fig. 6B). STAligner exhibited platform-specific annotation errors, misclassifying shared ONL regions as Slide-seqV2-exclusive (Fig. 6B, red circle in Fig. 6C). GraphST failed to integrate the datasets altogether, likely because of its reliance on slice-alignment assumptions (Fig. 6C). SpaMask, benefiting from a dual-masking mechanism, produced spatial domains with clear intra-platform boundaries (Fig. 6B). However, because spatial resolution and coordinate distributions differ substantially across platforms, SpaMask could not construct a unified 3D neighborhood graph and thus failed to achieve effective cross-platform integration; its UMAP embedding showed unintegrated batch clusters similar to those of GraphST (Fig. 6C). In contrast, SpaCross achieved superior spatial domain resolution, accurately identifying seven cross-platform conserved structures (RMS, GCL, MCL, EPL, GL, ONL, IPL) and the Slide-seqV2-specific AOB/AOBgr regions. Its domain boundaries aligned precisely with the DAPI-revealed laminar organization (Fig. 6B), with Stereo-seq (blue) and Slide-seqV2 (orange) spots overlapping completely in shared regions while remaining distinct in platform-specific areas (Fig. 6C), demonstrating effective decoupling of technical noise from biological signals.

Notably, SpaCross enabled robust delineation of the AOB and AOBgr regions, structures that were inconsistently resolved by either platform alone because of limited spatial resolution or coverage. This stable detection of AOB was further supported by distinct transcriptional signatures in the integrated embedding space. Marker gene expression patterns validated spatial domain fidelity (Fig. 6D). The narrow annular MCL domain showed strong co-localization of the mitral cell marker Gabra1 across platforms. GCL displayed a continuous expression gradient of the granule cell marker Pcp4. Domain-specific patterns were preserved for Cck (EPL), Nrep (RMS), Fabp7 (ONL), Nrsn1 (GL), and Kcnd2 (IPL). Critically, AOB and AOBgr exhibited localized enrichment of Slide-seqV2-specific genes such as Fxyd6 and Tac1, confirming the biological uniqueness of these domains.

To further characterize the AOB region, we performed differential gene expression analysis between AOB and all other spatial domains. The identified AOB-specific genes, including Ptgds, Snca, Uchl1, and Stmn2, suggest a specialized transcriptional program. Gene Ontology enrichment of these markers revealed pathways involved in vesicle trafficking, epithelial morphogenesis, and neurosecretory signaling, consistent with the dual immunomodulatory and neuromodulatory functions attributed to AOB in olfactory-driven behavioral processes. Detailed spatial domain delineation, marker gene expression, differential expression heatmaps, and enrichment results are presented in Supplementary Fig. 15.

Experimental results demonstrate that SpaCross effectively mitigates cross-platform batch effects while preserving biologically coherent spatial domains. It accurately reconstructs the multilayered architecture and enables the stable delineation of platform-specific, histologically validated regions such as the accessory olfactory bulb and its granular layer, which are not reliably identified by either platform alone due to differences in resolution and coverage.

SpaCross generalizes to complex multi-slice and multi-tissue contexts

To further validate the generalization capability of SpaCross in complex multi-slice and multi-tissue scenarios, we applied it to MERFISH-derived mouse hypothalamic preoptic area datasets consisting of five consecutive slices (Bregma coordinates: −0.04 mm to −0.24 mm)10. This dataset has single-cell resolution, with intercellular spacing (on the order of microns) much smaller than the inter-slice intervals (Fig. 7A). Using manually annotated regions (BST, MPA, MPN, PV, PVH, V3, and Fx) as ground truth, we compared SpaCross with baseline methods including SPIRAL, GraphST, STAligner, SEDR, and SpaMask in terms of clustering accuracy and multi-slice integration (F1LISI) (Fig. 7B). SpaCross achieved the highest mean ARI (0.5854), mean ACC (0.643), and median F1LISI (0.758), significantly outperforming the other methods (ARI < 0.5). We attribute this performance to SpaCross’s multi-slice hybrid graph architecture, which adaptively integrates intra-slice spatial domain constraints and inter-slice similarity, effectively mitigating variation across thick, widely spaced slices.

Fig. 7. Generalization of SpaCross to multi-slice and multi-tissue spatial transcriptomics datasets.


A Spatial distribution of five MERFISH slices (Bregma coordinates: −0.04 mm to −0.24 mm) visualized in 3D coordinate space. B Boxplots comparing SpaCross, SPIRAL, GraphST, STAligner, SEDR, and SpaMask on clustering accuracy (ARI, Wilcoxon signed-rank test) and multi-slice integration performance (F1LISI). C Manual annotations (top), SpaCross-derived spatial domains (middle), and STAligner-derived spatial domains (bottom) across five consecutive hypothalamic preoptic area slices. D Coronal anatomical reference atlas of the adult mouse brain with annotated regions. E ARI performance comparison across methods on the adult mouse whole-brain multi-slice dataset. F Spatial patterns of SpaCross-identified clusters: Cluster 11 localized to hippocampal regions (top) and Clusters 2/4 in isocortical laminar structures (bottom).

As shown in Fig. 7C, STAligner failed to distinguish the paraventricular thalamus (PVT) and medial preoptic area (MPA) domains across multiple slices and erroneously partitioned the fornix (Fx) region into two subregions mixed with MPA domains. In contrast, SpaCross-identified spatial domains across five slices exhibited high consistency with manual annotations. Notably, the paraventricular hypothalamic nucleus (PVH) domain thickness gradually decreased from -0.04 mm (left) to -0.24 mm (right), aligning with true anatomical characteristics, while PVT and Fx domain thicknesses showed increasing trends consistent with manual annotations.

We further validated the integration performance of SpaCross on an adult mouse whole-brain (AMB) multi-slice dataset generated with an ST platform46. We selected 35 coronal slices distributed consecutively along the anterior-posterior (AP) axis33 (Fig. 7D); these slices show progressive morphological transitions together with substantial spatial domain heterogeneity. In terms of clustering accuracy, SpaCross achieved the best performance, with a median ARI of 0.44 (Fig. 7E). Spatial domain identification results (Fig. 7F) revealed that SpaCross specifically detected cluster 11, which was precisely registered to the hippocampal region of the Allen Mouse Brain Reference Atlas, while effectively preserving biological heterogeneity in complex tissue slices during AP-axis trajectory analysis. Notably, the topological structure of isocortical regions (containing clusters 4 and 8, which represent distinct cellular laminar organization) showed excellent spatial continuity across consecutive slices, confirming the model’s reliability in 3D spatial reconstruction. These results demonstrate SpaCross’s superiority in resolving spatially complex and variable multi-slice datasets and highlight its potential for large-scale spatial omics analysis.

SpaCross ablation experiments validate the efficacy of individual modules

To systematically assess the contributions of SpaCross’s architectural modules and loss function designs, we conducted a series of ablation studies. These experiments evaluated the impact of key components such as the cross-masked latent consistency module (CMLC), the adaptive hybrid spatial-semantic graph (AHSG), and associated training losses. Results confirmed that each component plays a distinct and essential role in maintaining spatial coherence and cross-slice alignment. Detailed ablation analyses are provided in Supplementary Text 4 and Supplementary Figs. 16-17.

Computational performance and scalability

We also evaluated the computational performance and scalability of SpaCross to assess its applicability to large-scale spatial transcriptomics datasets. We conducted benchmarking experiments on the DLPFC dataset by systematically increasing the number of slices from 1 to 12 (covering up to 48,000 spots) and recorded multiple computational metrics including runtime, GPU memory usage, and memory cache. In addition, we compared SpaCross with six representative baseline methods across different algorithmic categories. Detailed benchmarking procedures and comparative results are provided in Supplementary Text 5 as well as Supplementary Fig. 18 and Supplementary Table 2.

Discussion

In this study, we propose SpaCross, a comprehensive deep learning framework that addresses critical limitations in the integrated analysis of multi-slice spatial transcriptomics data. By combining a masked graph autoencoder for reconstruction learning with a cross-masked latent consistency (CMLC) module, the approach provides dual-space supervision: an explicit reconstruction loss in the raw feature space and implicit latent consistency constraints. This dual strategy enhances the robustness and accuracy of the learned embeddings, overcoming the shortcomings of traditional unsupervised methods that often yield representations misaligned with pathological annotations.

Conventional graph autoencoders typically rely solely on unsupervised learning, which can result in latent representations that lack biological interpretability. In contrast, SpaCross incorporates complementary masked views that simulate missing data, compelling the model to predict and reconstruct gene expression features while preserving spatial context. The CMLC module further enforces consistency across different masked perspectives, keeping the latent space stable and biologically meaningful even in the presence of noise. This methodological innovation not only improves feature robustness but also facilitates accurate delineation of spatial domains.

A key innovation of SpaCross lies in its adaptive hybrid spatial-semantic graph (AHSG) structure, which fuses local spatial continuity with global semantic coherence. By dynamically integrating information from spatial neighborhoods and semantic clusters, the framework balances fine-grained spatial detail with overall tissue architecture. This capability is particularly important for multi-slice integration, where traditional methods have struggled with batch effects and inter-slice variability. The adaptive graph construction allows SpaCross to transfer information effectively across slices, minimizing technical noise while preserving true biological variation.

Experimental evaluations reinforce the practical value of these innovations. Across multiple single-slice datasets, SpaCross consistently outperformed thirteen state-of-the-art methods in spatial domain identification, demonstrating superior clustering performance and enhanced robustness. Its application to human dorsolateral prefrontal cortex data and complex MERFISH datasets confirmed that the framework not only corrects batch effects but also maintains the integrity of tissue spatial organization.

In developmental contexts, SpaCross revealed dynamic spatiotemporal patterns in embryonic mouse tissues by integrating data across stages E9.5 to E11.5. In particular, it reconstructed a continuous transcriptional trajectory within the cardiac region, capturing transitions from early to mature cardiomyocyte states. This was supported by stage-aligned pseudotime progression, sustained expression of cardiac markers such as Nppb, and enrichment of genes involved in structural maturation and contractile function. These findings demonstrate SpaCross’s ability to resolve fine-grained developmental dynamics within anatomically defined domains.

Additionally, the cross-platform integration of mouse olfactory bulb data from Stereo-seq and Slide-seqV2 further illustrates the versatility of SpaCross. The framework successfully identified shared laminar structures and resolved platform-specific subdomains, providing biologically meaningful representations across distinct technologies.

While SpaCross demonstrates many strengths, it is worth noting that the graph-based approach might require some further tuning when applied to extremely high-resolution datasets or in cases of uneven spatial sampling. Additionally, although the dual supervision mechanism significantly enhances robustness, exploring complementary strategies for latent space regularization could offer further improvements under particularly challenging data conditions.

In summary, SpaCross represents a significant advancement in the field of spatial transcriptomics. Its innovative integration of cross-masked reconstruction learning, latent consistency enforcement, and adaptive hybrid graph modeling not only improves spatial domain identification and multi-slice integration but also provides valuable insights into tissue architecture across diverse biological contexts. The theoretical and practical implications of this work establish a solid foundation for future research and applications in developmental biology, neuroscience, oncology, and beyond.

Methods

Data preprocessing and spatial graph construction

SpaCross takes gene expression data and spatial coordinates from multiple tissue slices as input (Fig. 1). First, we retain the genes shared across all slices. Then, the gene expression matrices of all slices are concatenated along the spot dimension to obtain an integrated gene expression matrix; for a single slice, this concatenation step is skipped. The Scanpy tool47 is then used to filter out uninformative genes and log-normalize the entire gene expression dataset. Subsequently, the top 2000 highly variable genes are selected, and Principal Component Analysis (PCA) is applied. The first $N_{pc}$ principal components are used as spot features, yielding a feature matrix $X \in \mathbb{R}^{N \times N_{pc}}$, where $N$ is the total number of spots.
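The preprocessing pipeline can be approximated without Scanpy, as in the numpy-only sketch below. It intersects gene sets, concatenates slices along the spot axis, filters rarely detected genes, applies library-size normalization with `log1p`, keeps the most variable genes, and projects onto the leading principal components via SVD. It is a simplification, not the authors' code: variance ranking stands in for Scanpy's dispersion-based highly-variable-gene selection, and `preprocess`, `min_spots`, `target_sum`, and the default `n_pc` are hypothetical.

```python
import numpy as np


def preprocess(slices, n_hvg=2000, n_pc=50, min_spots=3, target_sum=1e4):
    """slices: list of (counts, gene_names) pairs, counts shaped (spots, genes)."""
    shared = sorted(set.intersection(*(set(genes) for _, genes in slices)))  # genes in every slice
    mats = []
    for counts, genes in slices:
        col = {g: i for i, g in enumerate(genes)}
        mats.append(np.asarray(counts, dtype=float)[:, [col[g] for g in shared]])
    x = np.vstack(mats)  # concatenate along the spot axis
    x = x[:, (x > 0).sum(axis=0) >= min_spots]  # drop rarely detected genes
    lib = np.maximum(x.sum(axis=1, keepdims=True), 1e-12)
    x = np.log1p(x / lib * target_sum)  # library-size normalization, then log
    top = np.argsort(-x.var(axis=0))[: min(n_hvg, x.shape[1])]  # variance-ranked genes
    x = x[:, top] - x[:, top].mean(axis=0)  # centre before PCA
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    k = min(n_pc, s.size)
    return u[:, :k] * s[:k]  # X with shape (N, N_pc)


rng = np.random.default_rng(0)
s1 = (rng.poisson(2.0, size=(30, 50)), [f"g{i}" for i in range(50)])
s2 = (rng.poisson(2.0, size=(20, 40)), [f"g{i}" for i in range(10, 50)])
X = preprocess([s1, s2], n_hvg=20, n_pc=10)
print(X.shape)  # (50, 10)
```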

To fully incorporate spatial information, we first apply the Iterative Closest Point (ICP)36 algorithm for spatial registration, minimizing the Euclidean distance between feature points in adjacent slices to unify the three-dimensional coordinate system (see Supplementary Text 6 for algorithm details). In the resulting coordinate system, the tissue slice plane is the X-Y plane and the z-axis represents the distance between adjacent slices. Adjacency is determined by a dynamic threshold: two spots are connected if their three-dimensional Euclidean distance is less than 1.1 times the nearest-neighbor distance within a slice, forming the three-dimensional adjacency matrix $A$. If spot $j$ is a neighbor of spot $i$, then $A_{ij} = A_{ji} = 1$. This adjacency matrix is then used at every step of the graph neural network.
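A minimal sketch of the threshold rule follows, assuming the slices are already ICP-registered in the X-Y plane, that each slice sits at height `slice_index * slice_gap`, and that "the nearest-neighbor distance within a slice" is summarized by the median distance from each spot to its closest same-slice spot. All three choices, and the names `within_slice_nn_distance` and `adjacency_3d`, are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree


def within_slice_nn_distance(xy, slice_ids):
    """Median distance from each spot to its nearest other spot in the same slice."""
    d = []
    for s in np.unique(slice_ids):
        pts = xy[slice_ids == s]
        dist, _ = cKDTree(pts).query(pts, k=2)  # column 0 is the spot itself
        d.append(dist[:, 1])
    return float(np.median(np.concatenate(d)))


def adjacency_3d(xy, slice_ids, slice_gap, factor=1.1):
    """Symmetric 0/1 adjacency linking spots closer than factor x within-slice NN distance in 3-D."""
    coords = np.column_stack([xy, slice_ids.astype(float) * slice_gap])  # z = slice position
    r = factor * within_slice_nn_distance(xy, slice_ids)
    pairs = np.array(sorted(cKDTree(coords).query_pairs(r)), dtype=int).reshape(-1, 2)
    a = np.zeros((len(coords), len(coords)), dtype=np.int8)
    a[pairs[:, 0], pairs[:, 1]] = 1
    a[pairs[:, 1], pairs[:, 0]] = 1  # A_ij = A_ji = 1
    return a


grid = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)  # 3 x 3 spots
xy = np.vstack([grid, grid])  # two registered slices
slice_ids = np.repeat([0, 1], 9)
A = adjacency_3d(xy, slice_ids, slice_gap=1.0)
print(A.sum(axis=1)[:9])  # in-plane grid neighbours plus the spot directly above
```

With a larger slice gap the same rule creates no inter-slice edges, which is why the threshold is tied to the within-slice spacing rather than fixed in absolute units.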

Note that for cross-platform data or data from different developmental stages, samples differ substantially in spatial topology, and direct ICP registration may distort coordinates. To address this, we adopt a hierarchical modeling strategy: the adjacency matrices $\{A_t\}_{t=1}^{T}$ are first computed independently for each slice and then placed along the main diagonal of a block-diagonal matrix, which serves as the input adjacency matrix $A = \mathrm{diag}(A_1, A_2, \ldots, A_T)$. For a single slice, the nearest-neighbor adjacency matrix is computed directly.
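Assembling $A = \mathrm{diag}(A_1, \ldots, A_T)$ is straightforward with SciPy's sparse `block_diag`; the minimal sketch below assumes each per-slice adjacency has already been computed. The wrapper name `block_diagonal_adjacency` and the toy matrices are hypothetical.

```python
import numpy as np
from scipy.sparse import block_diag, csr_matrix


def block_diagonal_adjacency(per_slice):
    """A = diag(A_1, ..., A_T): no edges are created between different slices."""
    return block_diag([csr_matrix(a) for a in per_slice], format="csr")


a1 = np.array([[0, 1], [1, 0]])                    # slice 1: two linked spots
a2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # slice 2: a chain of three spots
A = block_diagonal_adjacency([a1, a2])
print(A.toarray())
```

Keeping $A$ sparse matters at scale, since the dense form grows quadratically with the total number of spots across slices.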

Data augmentation with spot masking

Before training SpaCross, we generate a masked feature matrix $X_m$ and a complementary masked feature matrix $X_{cm}$, which are used, respectively, to generate the latent representation and to provide supervisory signals for it. Specifically, given a masking rate $\rho$, we randomly sample a masked subset $V_m$ from the set of all spots $V$. The complementary masked subset $V_{cm}$ contains the remaining spots, such that $V_m \cup V_{cm} = V$ and $V_m \cap V_{cm} = \emptyset$.

Based on this spot-masking mechanism, we construct a masked feature matrix $X_m \in \mathbb{R}^{N \times N_{pc}}$ to address the “identity mapping” problem. Specifically, for any spot $v_i$, if $v_i \in V_m$, its feature vector is replaced with a learnable mask token $x_{[M]} \in \mathbb{R}^{N_{pc}}$, i.e., $x_{m,i} = x_{[M]}$; otherwise, $x_{m,i} = x_i$.

Similarly, we construct a complementary masked feature matrix $X_{cm} \in \mathbb{R}^{N \times N_{pc}}$ to provide persistent supervisory signals in the latent space: if $v_i \in V_{cm}$, its feature vector is replaced with the mask token, i.e., $x_{cm,i} = x_{[M]}$; otherwise, $x_{cm,i} = x_i$.
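The complementary masking scheme can be sketched in numpy as follows. One caveat: in SpaCross the mask token $x_{[M]}$ is learnable, whereas here it is a fixed vector, since no training loop is shown. The helper names `complementary_masks` and `apply_mask` are hypothetical.

```python
import numpy as np


def complementary_masks(n_spots: int, rho: float, rng: np.random.Generator):
    """Boolean indicators of V_m (masked) and its complement V_cm."""
    n_masked = int(round(rho * n_spots))
    masked = np.zeros(n_spots, dtype=bool)
    masked[rng.permutation(n_spots)[:n_masked]] = True  # random subset of size rho * N
    return masked, ~masked


def apply_mask(x: np.ndarray, mask: np.ndarray, token: np.ndarray) -> np.ndarray:
    """Replace the rows selected by `mask` with the (here fixed) mask token."""
    out = x.copy()
    out[mask] = token
    return out


rng = np.random.default_rng(0)
x = np.arange(20, dtype=float).reshape(10, 2)  # 10 spots, N_pc = 2
token = np.full(2, -1.0)                       # stands in for the learnable x_[M]
v_m, v_cm = complementary_masks(len(x), rho=0.3, rng=rng)
x_m, x_cm = apply_mask(x, v_m, token), apply_mask(x, v_cm, token)
```

Because the two views hide disjoint sets of spots, every spot is visible in exactly one of $X_m$ and $X_{cm}$. This property is what allows one view to supervise the other in latent space.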

Latent representation learning via masked reconstruction

Graph encoding

The graph encoder $F_g$ consists of a feedforward neural network (FNN) followed by two GCN layers. It takes the spatial adjacency matrix $A$ and the masked feature matrix $X_m$ as input and outputs the latent graph embedding $Z_g \in \mathbb{R}^{N \times d}$, where $d$ is the dimensionality of the latent space, i.e., $Z_g = F_g(A, X_m)$. For the $l$-th FNN layer, the input is $H_f^{(l-1)}$ and the output features $H_f^{(l)}$ are given by:

$$H_f^{(l)} = \mathrm{ELU}\left(\mathrm{BN}\left(W_f^{(l)} H_f^{(l-1)} + b^{(l)}\right)\right) \tag{1}$$

where $H_f^{(0)} = X_m$, $H_f = H_f^{(L)}$, and $L = 2$. ELU is the Exponential Linear Unit activation function, and BN denotes batch normalization.

Then, utilizing the information propagation mechanism of GCNs, the masked nodes can learn features from their unmasked neighboring nodes. The mathematical representation is as follows:

$$Z_g = \tilde{A}\,\mathrm{ReLU}\left(\mathrm{BN}\left(\tilde{A} H_f W_g^{(0)}\right)\right) W_g^{(1)} \tag{2}$$

where $W_g^{(l)}$ is the weight matrix of the $l$-th GCN layer and $\tilde{A}$ is the symmetrically normalized adjacency matrix, $\tilde{A} = D^{-1/2} A D^{-1/2}$, with $D$ the diagonal degree matrix of $A$.
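A minimal NumPy forward pass for Eqs. (1) and (2) is sketched below. It is not the authors' implementation: batch normalization is reduced to training-mode standardization without learnable scale and shift, weights are supplied by the caller, and the row-major product $H W$ stands in for the $W H$ notation of Eq. (1). All function names are hypothetical.

```python
import numpy as np

def sym_normalize(A):
    """A~ = D^{-1/2} A D^{-1/2}; isolated nodes (degree 0) get zero rows."""
    deg = A.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = deg[nz] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

def batch_norm(H, eps=1e-5):
    # Training-mode batch norm without the learnable scale/shift.
    return (H - H.mean(axis=0)) / np.sqrt(H.var(axis=0) + eps)

def elu(H):
    return np.where(H > 0, H, np.expm1(H))

def encoder_forward(A, X_m, Wf, bf, Wg0, Wg1):
    """Eqs. (1)-(2): FNN layers followed by a two-layer GCN."""
    H = X_m
    for W, b in zip(Wf, bf):              # L = 2 FNN layers
        H = elu(batch_norm(H @ W + b))    # row-wise form of W H + b
    A_t = sym_normalize(A)
    hidden = np.maximum(batch_norm(A_t @ H @ Wg0), 0.0)  # ReLU
    return A_t @ hidden @ Wg1             # Z_g
```

With the reported layer sizes (FNN 64 and 32, GCN 64 and 16), a 200-dimensional input yields a 16-dimensional embedding per spot.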

After training, we feed the unmasked feature matrix $X$ and the adjacency matrix $A$ into the graph encoder $F_g$ to obtain the latent graph embedding $Z$. This representation is then used for downstream tasks such as spatial domain identification and visualization.

Representation predicting

To strengthen self-supervised learning in mask-based graph representations, we introduce a graph predictor $F_p$ for latent-space self-supervision. The predictor takes the remasked latent representation $Z_m$ and the adjacency matrix $A$ as input and produces the predicted representation $Z_p \in \mathbb{R}^{N \times d}$, i.e., $Z_p = F_p(A, Z_m)$. $Z_m$ is obtained by remasking, in the latent space, the nodes of the masked set $V_m$: for a node $v_i$, if $v_i \in V_m$, its latent representation is replaced with a learnable mask token $z_{[RM]} \in \mathbb{R}^{d}$, i.e., $z_{m,i} = z_{[RM]}$; otherwise, $z_{m,i} = z_{g,i}$.

The predicted representation $Z_p$ is computed with the weight matrix $W_p$ as follows:

$$Z_p = \tilde{A} Z_m W_p \tag{3}$$

$Z_p$ is then supervised by the complementary graph embedding in the latent space and is also used to reconstruct the raw features.
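The remasking step and Eq. (3) can be sketched as follows, assuming the normalized adjacency $\tilde{A}$ has already been computed. The token $z_{[RM]}$ is learnable in the model and fixed here; the function name is hypothetical.

```python
import numpy as np

def remask_and_predict(A_norm, Z_g, masked, z_rm, W_p):
    """Hypothetical sketch: remask rows of V_m with z_[RM], then apply Eq. (3).

    A_norm : (N, N) symmetrically normalized adjacency A~
    Z_g    : (N, d) latent graph embedding from the encoder
    masked : (N,) boolean indicator of V_m
    z_rm   : (d,) remask token
    W_p    : (d, d) predictor weight
    """
    Z_m = Z_g.copy()
    Z_m[masked] = z_rm          # z_{m,i} = z_[RM] for v_i in V_m
    return A_norm @ Z_m @ W_p   # Z_p = A~ Z_m W_p
```

Because a remasked row carries no information of its own, its prediction can only come from its graph neighbors through $\tilde{A}$.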

Feature decoding

The feature decoder $F_d$ maps $Z_p$ back to the raw feature space, producing the reconstruction $\hat{X} \in \mathbb{R}^{N \times N_{pc}}$ with the weight matrix $W_d$:

$$\hat{X} = \tilde{A}\,\mathrm{ReLU}\left(\mathrm{BN}(Z_p)\right) W_d \tag{4}$$

Reconstruction loss in the raw space

A primary objective is to reconstruct the masked features of the spots in $V_m$ from the partially observed spots and their adjacency relationships. We use the Scaled Cosine Error (SCE) as the objective, with a predetermined scaling factor $\gamma$:

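$$\mathcal{L}_{SCE} = \frac{1}{|V_m|} \sum_{v_i \in V_m} \left(1 - \mathrm{sim}(x_i, \hat{x}_i)\right)^{\gamma}, \quad \gamma \ge 1 \tag{5}$$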

Here, $\gamma$ is set to 2 to down-weight easy samples during training, $|V_m|$ denotes the number of elements in the masked set, and the cosine similarity $\mathrm{sim}(\cdot, \cdot)$ is computed as:

$$\mathrm{sim}(x, y) = \frac{x^{\top} y}{\lVert x \rVert \, \lVert y \rVert} \tag{6}$$
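Eqs. (5) and (6) translate directly into NumPy. This sketch averages only over the masked spots; the function names are hypothetical.

```python
import numpy as np

def cosine_sim(x, y, eps=1e-12):
    """Row-wise cosine similarity, Eq. (6)."""
    num = (x * y).sum(axis=-1)
    den = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1)
    return num / np.maximum(den, eps)

def sce_loss(X, X_hat, masked, gamma=2.0):
    """Eq. (5): mean of (1 - cos)^gamma over the masked spots V_m only."""
    sims = cosine_sim(X[masked], X_hat[masked])
    return np.mean((1.0 - sims) ** gamma)
```

The effect of $\gamma$ is easy to see numerically: a well-reconstructed spot with cosine similarity 0.9 contributes 0.1 when $\gamma = 1$ but only 0.01 when $\gamma = 2$, so harder spots dominate the gradient.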

Latent space guidance via CMLC

Complementary graph encoding

Unlabeled self-supervised models, such as those built on autoencoder architectures, risk overfitting to the training data. To address this limitation, we design a Cross-Masked Latent Consistency (CMLC) module that delivers persistent supervisory signals for every masked spot in the latent space. CMLC implements the complementary masking strategy through the graph encoder $F_g$, which processes the complementary masked feature matrix $X_{cm}$ and the adjacency matrix $A$ to generate the complementary graph embedding $Z_{cg} \in \mathbb{R}^{N \times d}$, i.e., $Z_{cg} = F_g(A, X_{cm})$. This embedding provides persistent supervisory signals that guide the predicted representation $Z_p$ in the latent space.

Consistency loss in the latent space

To enforce semantic consistency between the predicted representation $Z_p$ (obtained from the masked input $X_m$) and the complementary graph embedding $Z_{cg}$ (derived from $X_{cm}$), we use the InfoNCE loss, based on noise contrastive estimation (NCE), as the learning objective of the CMLC module. The loss is computed over the masked node set $V_m$ and aligns the two latent views by contrasting each node's agreement with its own counterpart against negative pairs.

Formally, for each masked spot $v_i \in V_m$, we treat $(z_{p,i}, z_{cg,i})$ as a positive pair, where $z_{p,i}$ and $z_{cg,i}$ are the latent vectors of spot $i$ in $Z_p$ and $Z_{cg}$, respectively. Negative pairs are formed by pairing $z_{p,i}$ with embeddings of unrelated nodes $\{z_{cg,j}\}_{j \in \mathcal{N}_i}$, where $\mathcal{N}_i$ denotes a set of randomly sampled non-masked nodes. The NCE loss is defined as:

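$$\mathcal{L}_{NCE} = -\frac{1}{|V_m|} \sum_{v_i \in V_m} \log \frac{\exp\left(\mathrm{sim}(z_{p,i}, z_{cg,i})/\tau\right)}{\exp\left(\mathrm{sim}(z_{p,i}, z_{cg,i})/\tau\right) + \sum_{j \in \mathcal{N}_i} \exp\left(\mathrm{sim}(z_{p,i}, z_{cg,j})/\tau\right)} \tag{7}$$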

where τ = 0.5 is a temperature hyperparameter that sharpens the similarity distribution.
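A loop-based NumPy sketch of Eq. (7) follows. The number of negatives per spot (`n_neg`) and uniform sampling with replacement from the non-masked spots are assumptions made for illustration, as is the function name.

```python
import numpy as np

def info_nce(Z_p, Z_cg, masked, n_neg=5, tau=0.5, rng=None):
    """Hypothetical sketch of Eq. (7).

    Positive pair for masked spot i: (z_p,i, z_cg,i). Negatives pair z_p,i
    with z_cg,j for j drawn uniformly with replacement (an assumption) from
    the non-masked spots.
    """
    rng = np.random.default_rng(rng)

    def normalize(M):
        return M / np.linalg.norm(M, axis=1, keepdims=True)

    P, C = normalize(Z_p), normalize(Z_cg)  # dot product = cosine similarity
    pool = np.flatnonzero(~masked)
    losses = []
    for i in np.flatnonzero(masked):
        neg = rng.choice(pool, size=n_neg, replace=True)
        pos = np.exp(P[i] @ C[i] / tau)
        negs = np.exp(C[neg] @ P[i] / tau).sum()
        losses.append(-np.log(pos / (pos + negs)))
    return float(np.mean(losses))
```

When the two views agree perfectly and every negative is orthogonal to the anchor, each term equals $\log(1 + n_{neg} e^{-1/\tau})$, which for $\tau = 0.5$ and five negatives is $\log(1 + 5e^{-2})$.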

Discriminative representation learning in AHSG

Adaptive hybrid spatial-semantic graph (AHSG) construction

For any target spot $v_i \in V$, to construct a hybrid spatial-semantic neighborhood, we first form a candidate nearest-neighbor set $B_i$ that consists of the intra-slice candidate set $B_i^{intra}$ and the inter-slice candidate set $B_i^{inter}$:

$$B_i = B_i^{intra} \cup B_i^{inter} \tag{8}$$

Let $T(i) = t$ denote the slice to which spot $v_i$ belongs, and let $z_i$ denote its latent representation. The latent graph embedding matrix $Z \in \mathbb{R}^{N \times d}$ is obtained in a detached inference pass by encoding the features $X$ with the graph encoder $F_g$ and the adjacency matrix $A$, i.e., $Z = F_g(A, X)$. To construct the intra-slice candidate set, we compute the cosine similarity $\mathrm{sim}(z_i, z_j)$ between $v_i$ and every other spot $v_j \in V \setminus \{v_i\}$ in the same slice (i.e., $T(j) = t$) and select the $K_{intra}$ spots with the highest similarity (for a single slice, the candidate set is simply the intra-slice candidate set):

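$$B_i^{intra} = \left\{ v_j \mid \mathrm{Rank}\left(\mathrm{sim}(z_i, z_j)\right) \le K_{intra},\ v_j \in V \setminus \{v_i\},\ T(j) = t \right\} \tag{9}$$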

For the inter-slice candidate set, we consider all spots in slices other than $T(i)$: for every spot $v_j$ with $T(j) \neq t$, we compute the cosine similarity between $v_i$ and $v_j$ and select the $K_{inter}$ spots with the highest similarity:

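$$B_i^{inter} = \left\{ v_j \mid \mathrm{Rank}\left(\mathrm{sim}(z_i, z_j)\right) \le K_{inter},\ v_j \in V \setminus \{v_i\},\ T(j) \neq t \right\} \tag{10}$$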
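Eqs. (8) to (10) can be sketched with a dense cosine-similarity matrix, which is adequate for small examples; a large dataset would more likely use an approximate nearest-neighbor index. The function name is hypothetical, and the default $K_{intra} = K_{inter} = 15$ follows the experimental settings.

```python
import numpy as np

def candidate_sets(Z, slice_ids, k_intra=15, k_inter=15):
    """Hypothetical sketch of Eqs. (8)-(10): top-K cosine neighbours
    within the spot's own slice and across all other slices.

    Returns a list of Python sets B_i holding candidate spot indices.
    """
    slice_ids = np.asarray(slice_ids)
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    S = Zn @ Zn.T                       # S[i, j] = sim(z_i, z_j)
    n = Z.shape[0]
    B = []
    for i in range(n):
        others = np.arange(n) != i      # exclude v_i itself
        same = slice_ids == slice_ids[i]
        intra = np.flatnonzero(same & others)
        inter = np.flatnonzero(~same)   # empty for a single slice
        # Sorting by -S gives descending similarity; keep the top K of each.
        top_intra = intra[np.argsort(-S[i, intra], kind="stable")[:k_intra]]
        top_inter = inter[np.argsort(-S[i, inter], kind="stable")[:k_inter]]
        B.append(set(top_intra.tolist()) | set(top_inter.tolist()))
    return B
```

With a single slice the inter-slice array is empty, so $B_i$ reduces to the intra-slice candidates as described above.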

To ensure that candidate spots are not only similar in the latent space but also spatially continuous, we introduce a spatial constraint and define the spatially constrained neighborhood set $N_i^S$ as:

$$N_i^S = B_i \cap A_i \tag{11}$$

where $A_i = \{v_j \in V \mid A_{ij} = 1\}$ is the local spatial neighborhood of spot $v_i$.

To capture global semantic consistency, we apply k-means clustering to the latent representation $Z$, which partitions all spots into clusters in the semantic space, and define the semantically similar neighborhood set $N_i^G$ as:

$$N_i^G = B_i \cap C_i \tag{12}$$

where $C_i$ denotes the set of spots in the same cluster as $v_i$: if $v_i$ is assigned to cluster $c$, then $C_i = \{v_j \in V \mid v_j \text{ is assigned to cluster } c\}$.

Finally, we fuse the spatially constrained neighborhood $N_i^S$ with the semantically similar neighborhood $N_i^G$ to form the hybrid spatial-semantic neighborhood $N_i^F$:

$$N_i^F = N_i^S \cup N_i^G \tag{13}$$

The hybrid nearest neighbor NF not only preserves local spatial continuity but also emphasizes global semantic consistency, thereby providing richer and more refined neighborhood information for the clustering task.
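Given candidate sets $B_i$ (Eq. 8), a binary spatial adjacency matrix, and k-means labels of $Z$, Eqs. (11) to (13) reduce to set operations. The minimal sketch below takes the cluster labels as an input rather than running k-means itself; the function name is hypothetical.

```python
import numpy as np

def hybrid_neighbourhoods(B, A, labels):
    """Hypothetical sketch of Eqs. (11)-(13):
    N_i^F = (B_i & A_i) | (B_i & C_i).

    B      : list of candidate sets B_i as defined in Eq. (8)
    A      : (N, N) binary spatial adjacency matrix
    labels : (N,) k-means cluster labels of the latent embedding Z
    """
    labels = np.asarray(labels)
    N_F = []
    for i, B_i in enumerate(B):
        A_i = set(np.flatnonzero(A[i]).tolist())                # spatial neighbours
        C_i = set(np.flatnonzero(labels == labels[i]).tolist())  # same cluster
        N_F.append((B_i & A_i) | (B_i & C_i))                    # N^S | N^G
    return N_F
```

A candidate survives if it is either a spatial neighbor or a member of the same semantic cluster, so spatially distant but semantically consistent spots (for example, in another slice) can still enter the neighborhood.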

Hybrid feature aggregation

Using the adaptive hybrid spatial-semantic neighborhoods, we extract an integrated node embedding matrix $S \in \mathbb{R}^{N \times d}$ that captures both spatial and semantic features. Specifically, for each spot $v_i$, we compute its summary vector $s_i$ by averaging the embeddings over its fused neighborhood:

$$s_i = \mathrm{Sigmoid}\left(\frac{1}{|N_i^F|} \sum_{v_j \in N_i^F} z_{g,j}\right) \tag{14}$$

where $|N_i^F|$ denotes the number of neighbors in the hybrid set.
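Eq. (14) is a mean over each hybrid neighborhood followed by a sigmoid. The source does not say what happens when $N_i^F$ is empty; the sketch below assumes a fallback to the spot's own embedding, and the function name is hypothetical.

```python
import numpy as np

def summary_vectors(Z_g, N_F):
    """Hypothetical sketch of Eq. (14): s_i = sigmoid(mean of z_g,j over N_i^F).

    ASSUMPTION: a spot with an empty hybrid neighbourhood falls back to its
    own embedding; the source does not specify this case.
    """
    S = np.empty_like(Z_g, dtype=float)
    for i, nbrs in enumerate(N_F):
        idx = sorted(nbrs) if nbrs else [i]
        S[i] = 1.0 / (1.0 + np.exp(-Z_g[idx].mean(axis=0)))
    return S
```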

Contrastive loss

The summary vector $s_i$ serves as an anchor for aligning spot embeddings. The positive pair $(z_{g,i}, s_i)$ consists of the spot's latent embedding $z_{g,i}$ and its summary vector $s_i$. To generate negative samples, we perturb the original embeddings with a corruption function to obtain $\tilde{Z}_g$; the pair $(\tilde{z}_{g,i}, s_i)$ then serves as a negative example.

By maximizing the mutual information between node embeddings and their corresponding summary vectors, we enhance their alignment in the embedding space while simultaneously mitigating the collapse phenomenon. This alignment is enforced via a contrastive objective formulated with the Binary Cross-Entropy (BCE) loss:

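$$\mathcal{L}_{BCE} = -\frac{1}{|V_m|} \sum_{v_i \in V_m} \left[\log D\left(z_{g,i}, s_i\right) + \log\left(1 - D\left(\tilde{z}_{g,i}, s_i\right)\right)\right] \tag{15}$$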

where the discriminator $D(\cdot, \cdot)$ is implemented as a bilinear scoring function:

$$D(z_g, s) = \mathrm{Sigmoid}\left(z_g^{\top} W s\right) \tag{16}$$

This objective encourages each spot embedding $z_{g,i}$ to be informative about its aggregated summary $s_i$ while discriminating against corrupted embeddings, which in turn improves the model's clustering performance.
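The sketch below implements Eqs. (15) and (16) with a bilinear discriminator. The source does not name the corruption function; a random row permutation of $Z_g$ is assumed here, similar in spirit to the feature shuffling used by Deep Graph Infomax. The function name is hypothetical.

```python
import numpy as np

def bce_contrastive(Z_g, S, W, masked, rng=None):
    """Hypothetical sketch of Eqs. (15)-(16), D(z, s) = sigmoid(z^T W s).

    ASSUMPTION: corruption = random row permutation of Z_g.
    """
    rng = np.random.default_rng(rng)
    Z_tilde = Z_g[rng.permutation(Z_g.shape[0])]

    def D(z, s):
        # Row-wise bilinear score z_n^T W s_n for every spot n.
        return 1.0 / (1.0 + np.exp(-np.einsum("nd,de,ne->n", z, W, s)))

    eps = 1e-12  # guards against log(0)
    pos = D(Z_g[masked], S[masked])
    neg = D(Z_tilde[masked], S[masked])
    return float(-np.mean(np.log(pos + eps) + np.log(1.0 - neg + eps)))
```

As a sanity check, an uninformative discriminator ($W = 0$) scores every pair 0.5, giving a loss of $2\log 2$.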

Comprehensive loss function

The overall loss, weighted by the factors $\lambda_1$, $\lambda_2$, and $\lambda_3$, comprises three components: the reconstruction loss $\mathcal{L}_{SCE}$ in the raw space, which measures the reconstruction error of masked features; the matching loss $\mathcal{L}_{NCE}$ in the latent space, which enforces consistency of the latent representations; and the contrastive loss $\mathcal{L}_{BCE}$, which separates positive from negative pairs. The total loss is:

$$\mathcal{L} = \lambda_1 \mathcal{L}_{SCE} + \lambda_2 \mathcal{L}_{NCE} + \lambda_3 \mathcal{L}_{BCE} \tag{17}$$

Spatial clustering and visualization

After training, we obtain the graph embedding Z. We then apply spatial clustering using the Mclust algorithm, which fits a mixture of Gaussian distributions via Expectation-Maximization to automatically determine optimal clusters. Each cluster represents a distinct spatial domain.

To compute UMAP, we first build a neighbor graph with the scanpy.pp.neighbors function to capture local structural relationships among spots, setting the number of neighbors to 12 and using the top 16 principal components of the graph embedding $Z$. Next, scanpy.tl.umap performs UMAP dimensionality reduction to visualize similarities and differences among spots. Finally, we run scanpy.tl.paga to uncover potential relationships between spatial domains and visualize the results with scanpy.pl.paga_compare.

Identification of SVGs and meta-genes

We identify SVGs and meta-genes using the SpaGCN detection framework19. For SVGs, a Wilcoxon rank-sum test is performed to compare the target domain against its adjacent domains, selecting genes with adjusted P-values below 0.05. Additionally, genes are required to meet three criteria: (1) more than 80% of spots in the target domain express the gene, (2) the in/out score ratio (percentage of expressing spots in the target domain versus each adjacent domain) exceeds 1, and (3) the expression fold change is greater than 1.5. For meta-gene construction, the SVG filtering threshold is relaxed by lowering the minimum fold change to 1.2. Among these modestly enriched genes, one is randomly chosen as a base gene (gene0). Its mean expression (e0) in the target domain is calculated, and non-target spots with expression above e0 are defined as the control group. A subsequent differential test then identifies additional genes with significant expression differences, which are aggregated to form meta-genes.
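The three SVG criteria can be expressed as vectorized filters once the rank-sum test has been run. The sketch below takes adjusted P-values as an input (the Wilcoxon test itself is not reproduced), pools all adjacent domains into one comparison group (an assumption; the text compares against each adjacent domain), and uses a hypothetical function name.

```python
import numpy as np

def filter_svgs(expr, domains, target, adjacent, padj,
                min_in_frac=0.8, min_ratio=1.0, min_fc=1.5, alpha=0.05):
    """Hypothetical sketch of the SVG criteria applied to precomputed tests.

    expr     : (spots, genes) non-negative expression matrix
    domains  : (spots,) domain label per spot
    target   : label of the target domain
    adjacent : labels of the adjacent domains
    padj     : (genes,) adjusted P-values from the rank-sum test
    ASSUMPTION: adjacent domains are pooled into one comparison group.
    """
    domains = np.asarray(domains)
    inside = expr[domains == target]
    outside = expr[np.isin(domains, adjacent)]
    eps = 1e-12
    frac_in = (inside > 0).mean(axis=0)                   # criterion (1)
    frac_out = (outside > 0).mean(axis=0)
    in_out_ratio = frac_in / (frac_out + eps)             # criterion (2)
    fold_change = inside.mean(axis=0) / (outside.mean(axis=0) + eps)  # (3)
    keep = ((padj < alpha) & (frac_in > min_in_frac)
            & (in_out_ratio > min_ratio) & (fold_change > min_fc))
    return np.flatnonzero(keep)
```

Relaxing `min_fc` to 1.2 gives the looser filter used for meta-gene construction.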

Experimental details

In the data preprocessing stage, we first extract the top 200 principal components from 2000 highly variable genes using PCA and use them as input features. For datasets with fewer than 2000 but more than 200 genes, PCA is applied directly to the available genes; if the number of genes is below 200, all normalized gene expressions are used without PCA. For spatial neighborhood graph construction, we set K = 12 for datasets generated on the 10x Visium, Stereo-seq, and STARmap platforms, and select K = 6–8 for other platforms based on performance. The encoder $F_g$ comprises two FNN layers (dimensions 64 and 32) followed by two GCN layers (dimensions 64 and 16), producing a 16-dimensional latent representation. The graph predictor $F_p$ uses a GCN with output dimension 16, while the decoder $F_d$ includes one GCN layer with 200 output dimensions. The discriminator $D$ operates on a latent dimension of 16.

The masking rate is fixed at $\rho = 0.5$, with scaling factor $\gamma = 2$ and temperature $\tau = 0.5$. For hybrid neighbor computation, the $K_{intra} = K_{inter} = 15$ most similar candidates are selected, and the hybrid neighborhoods are updated every 50 steps. The loss weights are set empirically to $\lambda_1 = 0.6$, $\lambda_2 = 0.3$, and $\lambda_3 = 0.7$ for the scaled cosine error (SCE), noise contrastive estimation (NCE), and binary cross-entropy (BCE) losses, respectively. The model is trained for 300 epochs with the Adam optimizer, an initial learning rate of 0.001, and a weight decay of 0.0003.
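For reference, the reported settings can be gathered into a single configuration object, together with the gene-count rule that determines the input dimension $N_{pc}$ and the weighted sum of Eq. (17). The class and field names are hypothetical and do not mirror the released code; the text leaves the case of exactly 200 genes unspecified, and this sketch treats it as "no PCA".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpaCrossSettings:
    """Hyperparameters reported in the text (field names are hypothetical)."""
    n_pcs: int = 200
    n_hvgs: int = 2000
    fnn_dims: tuple = (64, 32)
    gcn_dims: tuple = (64, 16)
    mask_rate: float = 0.5             # rho
    sce_gamma: float = 2.0             # gamma
    nce_tau: float = 0.5               # tau
    k_intra: int = 15
    k_inter: int = 15
    neighbour_update_every: int = 50   # steps between hybrid-graph updates
    lambdas: tuple = (0.6, 0.3, 0.7)   # SCE, NCE, BCE
    epochs: int = 300
    lr: float = 1e-3
    weight_decay: float = 3e-4

def input_features(n_genes, cfg=SpaCrossSettings()):
    """Return (genes fed to PCA, input dimension N_pc) per the preprocessing rule."""
    if n_genes >= cfg.n_hvgs:
        return cfg.n_hvgs, cfg.n_pcs    # 2000 HVGs -> 200 PCs
    if n_genes > cfg.n_pcs:
        return n_genes, cfg.n_pcs       # PCA on all available genes
    return n_genes, n_genes             # too few genes: no PCA

def total_loss(l_sce, l_nce, l_bce, cfg=SpaCrossSettings()):
    """Eq. (17): weighted sum of the three objectives."""
    w1, w2, w3 = cfg.lambdas
    return w1 * l_sce + w2 * l_nce + w3 * l_bce
```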

A comprehensive investigation of hyperparameter selection, including neighborhood size, PCA dimensionality, and loss weighting, is provided in Supplementary Text 7 and Supplementary Figs. 19 and 20. The algorithm flow is given in Supplementary Algorithm 1.

Evaluated metrics and criteria

We evaluate the performance of our spatial domain identification model using a combination of metrics that assess both clustering accuracy and spatial continuity. First, to quantify clustering accuracy, we employ the Adjusted Rand Index (ARI)48, which measures the similarity between the predicted clusters and the manually annotated labels. The ARI is defined as

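$$\mathrm{ARI} = \frac{\sum_{ij} \binom{N_{ij}}{2} - \left[\sum_i \binom{N_i}{2} \sum_j \binom{N_j}{2}\right] \Big/ \binom{N}{2}}{\frac{1}{2}\left[\sum_i \binom{N_i}{2} + \sum_j \binom{N_j}{2}\right] - \left[\sum_i \binom{N_i}{2} \sum_j \binom{N_j}{2}\right] \Big/ \binom{N}{2}} \tag{18}$$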

where N is the total number of spots, Nij denotes the number of spots shared between the i-th predicted cluster (Ci ∈ C) and the j-th true cluster (Yj ∈ Y), and Ni (or Nj) is the number of spots in cluster Ci (or Yj).
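Eq. (18) can be computed from the contingency table $N_{ij}$ with a few lines of NumPy. This is an illustrative sketch with a hypothetical function name; `sklearn.metrics.adjusted_rand_score` returns the same value and is the usual choice in practice.

```python
import numpy as np
from math import comb

def adjusted_rand_index(pred, truth):
    """Eq. (18) computed from the contingency table N_ij."""
    _, p = np.unique(pred, return_inverse=True)
    _, t = np.unique(truth, return_inverse=True)
    table = np.zeros((p.max() + 1, t.max() + 1), dtype=int)
    np.add.at(table, (p, t), 1)                 # N_ij counts
    n = int(table.sum())
    sum_ij = sum(comb(int(x), 2) for x in table.ravel())
    sum_i = sum(comb(int(x), 2) for x in table.sum(axis=1))  # predicted sizes N_i
    sum_j = sum(comb(int(x), 2) for x in table.sum(axis=0))  # true sizes N_j
    expected = sum_i * sum_j / comb(n, 2)
    max_index = 0.5 * (sum_i + sum_j)
    return (sum_ij - expected) / (max_index - expected)
```

Because ARI is invariant to label permutations, a prediction that matches the annotation up to renaming scores exactly 1, while a partition no better than chance scores about 0.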

In addition, we use the Normalized Mutual Information (NMI)49 to quantify the information shared between the clustering results and the ground truth. For labeled data, clustering performance is evaluated with NMI together with the Homogeneity (HOM) and Completeness (COM) metrics40. NMI is computed as

$$\mathrm{NMI}(Y, C) = \frac{2\left[H(Y) - H(Y \mid C)\right]}{H(Y) + H(C)} \tag{19}$$

where the entropy $H(\cdot)$ is given by

$$H(X) = -\sum_i p(x_i) \log p(x_i) \tag{20}$$

Similarly, the HOM score measures whether each cluster contains data points from only one class, while the COM score evaluates whether all data points of a given class are assigned to the same cluster. They are defined as:

$$\mathrm{HOM} = 1 - \frac{H(Y \mid C)}{H(Y)}, \qquad \mathrm{COM} = 1 - \frac{H(C \mid Y)}{H(C)} \tag{21}$$

Both metrics range from 0 to 1, where higher values indicate better clustering quality in terms of purity (homogeneity) and completeness. We then define the overall accuracy score (ACC) as the average of NMI, HOM, and COM:

$$\mathrm{ACC} = \frac{1}{3}\left(\mathrm{NMI} + \mathrm{HOM} + \mathrm{COM}\right) \tag{22}$$

Higher ARI and ACC values (closer to 1) indicate better clustering precision.
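The entropy-based scores in Eqs. (19) to (22) can be sketched directly from label counts; the helper names are hypothetical. The resulting NMI equals `sklearn.metrics.normalized_mutual_info_score` with its default arithmetic normalization, and HOM and COM match `sklearn.metrics.homogeneity_completeness_v_measure`.

```python
import numpy as np

def _entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def _conditional_entropy(a, b):
    """H(a | b): entropy of a within each group of b, weighted by p(b)."""
    a, b = np.asarray(a), np.asarray(b)
    return sum((b == v).mean() * _entropy(a[b == v]) for v in np.unique(b))

def acc_score(pred, truth):
    """Eqs. (19)-(22): NMI, HOM, COM and their mean ACC."""
    h_y, h_c = _entropy(truth), _entropy(pred)
    h_y_given_c = _conditional_entropy(truth, pred)
    h_c_given_y = _conditional_entropy(pred, truth)
    nmi = 2 * (h_y - h_y_given_c) / (h_y + h_c)
    hom = 1 - h_y_given_c / h_y
    com = 1 - h_c_given_y / h_c
    return nmi, hom, com, (nmi + hom + com) / 3
```

Splitting one true class into two pure clusters illustrates the trade-off: homogeneity stays at 1 while completeness drops (to 2/3 for `pred=[0, 1, 2, 2]`, `truth=[0, 0, 1, 1]`).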

To assess spatial continuity, we use two metrics: the Spatial Chaos Score (CHAOS) and the Percentage of Anomalous Points (PAS). A lower CHAOS value indicates more coherent spatial domains, while a lower PAS indicates fewer isolated or anomalous points within the spatial domains40. To compute CHAOS, we first construct a 1-nearest-neighbor (1-NN) graph for each dataset by connecting each spot to its closest neighbor in Euclidean space. Let $d_{ij}$ denote the Euclidean distance between spots $i$ and $j$; then we define

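$$w_k(i, j) = \begin{cases} d_{ij}, & \text{if spots } i \text{ and } j \text{ are connected in the 1-NN graph within cluster } k \\ 0, & \text{otherwise} \end{cases} \tag{23}$$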

If $n_k$ is the number of spots in the $k$-th spatial domain, $N$ is the total number of spots, and $K$ is the total number of unique spatial domains, the CHAOS score is calculated as

$$\mathrm{CHAOS} = \frac{1}{N} \sum_{k=1}^{K} \sum_{i,j=1}^{n_k} w_k(i, j) \tag{24}$$

The PAS score is defined as the percentage of spots whose spatial domain label differs from that of at least six out of their ten nearest neighbors. Lower PAS values correspond to higher spatial homogeneity within domains.

Finally, we define the overall discreteness score (DIS) as the average of the CHAOS and PAS scores:

$$\mathrm{DIS} = \frac{1}{2}\left(\mathrm{CHAOS} + \mathrm{PAS}\right) \tag{25}$$
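A minimal sketch of CHAOS, PAS, and DIS follows. It interprets Eq. (23) as "each spot contributes the distance to its nearest neighbor within its own domain", treats single-spot domains as contributing zero (an assumption), uses the coordinates as given without rescaling, and returns PAS as a fraction. Function names are hypothetical, and dense distance matrices limit it to small inputs.

```python
import numpy as np

def _pairwise(coords):
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def chaos_score(coords, labels):
    """Eqs. (23)-(24): sum of within-domain 1-NN distances divided by N.
    ASSUMPTION: domains with a single spot contribute 0."""
    coords, labels = np.asarray(coords, float), np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        pts = coords[labels == k]
        if len(pts) < 2:
            continue
        d = _pairwise(pts)
        np.fill_diagonal(d, np.inf)
        total += d.min(axis=1).sum()
    return total / len(labels)

def pas_score(coords, labels, k=10, min_diff=6):
    """Fraction of spots whose label differs from >= 6 of their 10 nearest neighbours."""
    coords, labels = np.asarray(coords, float), np.asarray(labels)
    d = _pairwise(coords)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1, kind="stable")[:, :k]
    n_diff = (labels[nn] != labels[:, None]).sum(axis=1)
    return float((n_diff >= min_diff).mean())

def dis_score(coords, labels):
    """Eq. (25): mean of CHAOS and PAS (lower means more spatially coherent)."""
    return 0.5 * (chaos_score(coords, labels) + pas_score(coords, labels))
```

For 22 spots on a line with one isolated spot labeled differently, only that spot is anomalous, so PAS is 1/22.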

In summary, higher ARI and ACC scores indicate better clustering accuracy, while lower DIS scores reflect improved spatial domain continuity. These metrics together provide a comprehensive evaluation of the spatial domain identification performance.

To quantify the balance between batch-effect correction and preservation of spatial domain structure in multi-slice integration, we use the F1LISI metric. It builds on the Local Inverse Simpson's Index (LISI), which separately measures the mixing of batches within the same spatial domain ($\mathrm{LISI}_{batch}$) and the separation of distinct spatial domains ($1 - \mathrm{LISI}_{domain}$). By combining these two components with a dynamic weighting factor $\alpha = N_{domain} / (N_{batch} + N_{domain})$, F1LISI unifies batch mixing and domain separation into a single score through a harmonic-mean formulation. F1LISI is defined as follows44:

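$$F1_{LISI} = \frac{(1 + \alpha^2)\left(1 - \mathrm{LISI}_{domain}^{norm}\right)\mathrm{LISI}_{batch}^{norm}}{\alpha^2\left(1 - \mathrm{LISI}_{domain}^{norm}\right) + \mathrm{LISI}_{batch}^{norm}} \tag{26}$$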

The factor α adaptively adjusts the evaluation focus based on the ratio of spatial domains to batches, prioritizing domain separation when domains are abundant and emphasizing batch mixing when batches dominate. A higher F1LISI score indicates superior performance in simultaneously removing technical batch noise and preserving biologically meaningful spatial patterns.
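Eq. (26) can be evaluated with a small helper once the normalized LISI values are available. How the LISI scores are computed and normalized is outside the scope of this sketch: both inputs are assumed to already lie in [0, 1], and the function name is hypothetical.

```python
def f1_lisi(lisi_domain_norm, lisi_batch_norm, n_domains, n_batches):
    """Eq. (26): weighted harmonic combination of domain separation
    (1 - normalized domain LISI) and batch mixing (normalized batch LISI).
    Both inputs are assumed to be normalized to [0, 1] already."""
    alpha = n_domains / (n_batches + n_domains)
    sep = 1.0 - lisi_domain_norm
    mix = lisi_batch_norm
    denom = alpha ** 2 * sep + mix
    return 0.0 if denom == 0 else (1 + alpha ** 2) * sep * mix / denom
```

For example, 7 spatial domains across 2 batches give $\alpha = 7/9 \approx 0.78$. As with any weighted harmonic mean, the score equals the common value when separation and mixing are equal, and it falls to 0 if either component is 0.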

Comparison with other methods

To benchmark SpaCross, we compared it with a diverse set of methods spanning five categories: non-GNN-based (CellCharter17, MENDER18), GNN-based (SpaGCN19, SEDR20, DeepST37, STAGATE3), generative (STMGAC25, SpaMask26, DiffusionST38), contrastive learning-based (CCST28, GraphST30, GAAEST5, stDCL9), and multi-slice integration methods (SPIRAL43, STitch3D33, Splane35, STAligner32). These methods cover a range of modeling strategies, including spatial proximity, graph convolution, masked prediction, contrastive objectives, and slice alignment via registration or structural priors. The implementation procedures and parameter settings for all comparison methods are described in Supplementary Text 8.

Statistics and reproducibility

All analyses were performed in Python using non-parametric tests (Wilcoxon signed-rank, Mann-Whitney U, Wilcoxon rank-sum), with p < 0.05 considered significant. Here, n denotes biologically independent tissue slices. Reproducibility was assessed by repeating analyses with different random seeds across datasets and averaging results across independent runs. A replicate is defined as an independent run under identical settings, and all baseline models used their default recommended parameters to ensure fairness.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Supplementary information

42003_2025_8810_MOESM2_ESM.docx (21.2KB, docx)

Description of Additional Supplementary Files

Supplementary Data 1 (9MB, xlsx)
Reporting Summary (1.6MB, pdf)

Acknowledgements

This work was supported by the National Natural Science Foundation of China (62262069) and the Young Talent Program of Yunnan Province (C619300A067).

Author contributions

W.M. conceived and supervised the project. D.F. and W.M. developed and implemented the SpaCross algorithm. D.F. and W.M. validated the methods and wrote the manuscript. All authors read and approved the final manuscript.

Peer review

Peer review information

Communications Biology thanks Noah Cohen Kalafut and the other, anonymous, reviewers for their contribution to the peer review of this work. Primary Handling Editors: Aylin Bircan, Ophelia Bu. A peer review file is available.

Data availability

All datasets used in this article are publicly available: (1) Human dorsolateral prefrontal cortex data39 captured using 10x Visium technology can be downloaded from http://research.libd.org/spatialLIBD/. (2) Human breast cancer data obtained with 10x Visium technology can be downloaded from https://www.10xgenomics.com/datasets/human-breast-cancer-block-a-section-1-1-standard-1-1-0. (3) Mouse olfactory bulb tissue data generated by the Stereo-seq and Slide-seqV2 platforms can be accessed from https://github.com/JinmiaoChenLab/SEDR_analyses/tree/master/data and https://singlecell.broadinstitute.org/single_cell/study/SCP815, respectively. (4) The spatial transcriptomic data of the mouse embryo obtained with Stereo-seq technology8 can be downloaded from https://db.cngb.org/stomics/mosta/. (5) The mouse primary visual cortex (V1) STARmap dataset41 is available at https://www.starmapresources.com/data. (6) The mouse brain somatosensory cortex osmFISH dataset42 can be downloaded from http://linnarssonlab.org/osmFISH. (7) The mouse hypothalamus dataset from MERFISH50 can be downloaded from Dryad (DOI: 10.5061/dryad.8t8s248). (8) The mouse whole brain dataset46 profiled by the ST platform can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE147747. Detailed descriptions of the datasets can be found in Supplementary Table 3. The data used in this study have been uploaded to Zenodo and are freely available at https://zenodo.org/records/15090086.

Code availability

An open-source implementation of the SpaCross algorithm can be downloaded from https://github.com/wenwenmin/SpaCross.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Donghai Fang, Wenwen Min.

Supplementary information

The online version contains supplementary material available at 10.1038/s42003-025-08810-5.

References

1. Rao, A., Barkley, D., França, G. S. & Yanai, I. Exploring tissue architecture using spatial transcriptomics. Nature 596, 211–220 (2021).
2. Ståhl, P. L. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).
3. Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
4. Min, W., Shi, Z., Zhang, J., Wan, J. & Wang, C. Multimodal contrastive learning for spatial gene expression prediction using histology images. Brief. Bioinform. 25, bbae551 (2024).
5. Wang, T. et al. Graph attention automatic encoder based on contrastive learning for domain recognition of spatial transcriptomics. Commun. Biol. 7, 1351 (2024).
6. Rao, N., Clark, S. & Habern, O. Bridging genomics and tissue pathology: 10x Genomics explores new frontiers with the Visium spatial gene expression solution. Genet. Eng. Biotechnol. News 40, 50–51 (2020).
7. Rodriques, S. G. et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 363, 1463–1467 (2019).
8. Chen, A. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 185, 1777–1792 (2022).
9. Yu, Z. et al. Accurate spatial heterogeneity dissection and gene regulation interpretation for spatial transcriptomics using dual graph contrastive learning. Adv. Sci. 12, 2410081 (2025).
10. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
11. Shah, S. et al. Dynamics and spatial genomics of the nascent transcriptome by intron seqFISH. Cell 174, 363–376 (2018).
12. Zhang, C., Dong, K., Aihara, K., Chen, L. & Zhang, S. STAMarker: determining spatial domain-specific variable genes with saliency maps in deep learning. Nucleic Acids Res. 51, e103 (2023).
13. Li, X., Zhu, F. & Min, W. SpaDiT: diffusion transformer for spatial gene expression prediction using scRNA-seq. Brief. Bioinform. 25, bbae571 (2024).
14. Li, H.-S., Tan, Y.-T. & Zhang, X.-F. Enhancing spatial domain detection in spatial transcriptomics with EnSDD. Commun. Biol. 7, 1358 (2024).
15. Blondel, V. D., Guillaume, J.-L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, P10008 (2008).
16. Wang, H., Zhao, J., Nie, Q., Zheng, C. & Sun, X. Dissecting spatiotemporal structures in spatial transcriptomics via diffusion-based adversarial learning. Research 7, 0390 (2024).
17. Varrone, M., Tavernari, D., Santamaria-Martínez, A., Walsh, L. A. & Ciriello, G. CellCharter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat. Genet. 56, 74–84 (2024).
18. Yuan, Z. MENDER: fast and scalable tissue structure identification in spatial omics data. Nat. Commun. 15, 207 (2024).
19. Hu, J. et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat. Methods 18, 1342–1351 (2021).
20. Xu, H. et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med. 16, 12 (2024).
21. Hou, Z. et al. GraphMAE: self-supervised masked graph autoencoders. In Proc. 28th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD), 594–604 (ACM, 2022).
22. Tu, W. et al. RARE: robust masked graph autoencoder. IEEE Trans. Knowl. Data Eng. 36, 5340–5353 (2023).
23. Chen, Y., Zhen, C., Mo, Y., Liu, J. & Zhang, L. Multiscale dissection of spatial heterogeneity by integrating multi-slice spatial and single-cell transcriptomics. Adv. Sci. 12, 2413124 (2025).
24. Lin, L. et al. STMGraph: spatial-context-aware of transcriptomes via a dual-remasked dynamic graph attention model. Brief. Bioinform. 26, bbae685 (2025).
25. Fang, D., Zhu, F., Xie, D. & Min, W. Masked graph autoencoders with contrastive augmentation for spatially resolved transcriptomics data. In Proc. 2024 IEEE Int. Conf. Bioinform. Biomed. (BIBM), 515–520 (IEEE, 2024).
26. Min, W., Fang, D., Chen, J. & Zhang, S. SpaMask: dual masking graph autoencoder with contrastive learning for spatial transcriptomics. PLoS Comput. Biol. 21, e1012881 (2025).
27. Lee, N., Lee, J. & Park, C. Augmentation-free self-supervised learning on graphs. In Proc. AAAI Conf. Artif. Intell., 7372–7380 (AAAI Press, 2022).
28. Li, J., Chen, S., Pan, X., Yuan, Y. & Shen, H.-B. Cell clustering for spatial transcriptomics data with graph neural networks. Nat. Comput. Sci. 2, 399–408 (2022).
29. Nie, W., Yu, Y., Wang, X., Wang, R. & Li, S. C. Spatially informed graph structure learning extracts insights from spatial transcriptomics. Adv. Sci. 11, 2403572 (2024).
30. Long, Y. et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 14, 1155 (2023).
31. Zeng, Y. et al. Identifying spatial domain by adapting transcriptomics with histology through contrastive learning. Brief. Bioinform. 24, bbad048 (2023).
32. Zhou, X., Dong, K. & Zhang, S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat. Comput. Sci. 3, 894–906 (2023).
33. Wang, G. et al. Construction of a 3D whole organism spatial atlas by joint modelling of multiple slices with deep neural networks. Nat. Mach. Intell. 5, 1200–1213 (2023).
34. Zeira, R., Land, M., Strzalkowski, A. & Raphael, B. J. Alignment and integration of spatial transcriptomics data. Nat. Methods 19, 567–575 (2022).
35. Xu, H. et al. SPACEL: deep learning-based characterization of spatial transcriptome architectures. Nat. Commun. 14, 7603 (2023).
36. Arun, K. S., Huang, T. S. & Blostein, S. D. Least-squares fitting of two 3-D point sets. IEEE Trans. Pattern Anal. Mach. Intell. 1, 698–700 (1987).
37. Xu, C. et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res. 50, e131 (2022).
38. Cui, Y. et al. DiffusionST: a deep generative diffusion model-based framework for enhancing spatial transcriptomics data quality and identifying spatial domains. Brief. Bioinform. 26, bbaf390 (2025).
39. Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
40. Yuan, Z. et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat. Methods 21, 712–722 (2024).
41. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, eaat5691 (2018).
42. Codeluppi, S. et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat. Methods 15, 932–935 (2018).
43. Guo, T. et al. SPIRAL: integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biol. 24, 241 (2023).
44. Tran, H. T. N. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12 (2020).
45. Richardson, L. et al. EMAGE mouse embryo spatial gene expression database: 2014 update. Nucleic Acids Res. 42, D835–D844 (2014).
46. Ortiz, C. et al. Molecular atlas of the adult mouse brain. Sci. Adv. 6, eabb3446 (2020).
47. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 1–5 (2018).
48. Rand, W. M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66, 846–850 (1971).
49. Amelio, A. & Pizzuti, C. Correction for closeness: adjusting normalized mutual information measure for clustering comparison. Comput. Intell. 33, 579–601 (2017).
50. Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).
