Abstract
Spatial domain identification is an essential task for revealing spatial heterogeneity within tissues, providing insights into disease mechanisms, tissue development, and the cellular microenvironment. In recent years, spatial multi-omics has emerged as the new frontier in spatial domain identification that offers deeper insights into the complex interplay and functional dynamics of heterogeneous cell communities within their native tissue context. Most existing methods rely on static graph structures that treat all neighboring cells uniformly, failing to capture the nuanced cellular interactions within the microenvironment and thus blurring functional boundaries. Furthermore, cross-modal reconstruction performance is often degraded by overfitting to modality-specific noise, which may impair the precise delineation of spatial domains. Therefore, we present GATCL, a novel deep learning framework that integrates a graph attention network with contrastive learning (CL) for robust spatial domain identification. First, GATCL leverages the graph attention mechanism to dynamically assign weights to neighboring spots, adaptively modeling the complex cellular architecture. Second, it implements a cross-modal CL strategy that forces representations from the same spatial location to be similar while pushing those from different locations apart, thereby achieving robust alignment between modalities. Comprehensive experiments across six distinct datasets (spanning transcriptome, proteome, and chromatin) reveal that GATCL is superior to seven representative methods across six key evaluation metrics.
Keywords: spatial domain identification, graph attention network, contrastive learning, spatial multi-omics
Introduction
The emergence of single-cell technologies has revolutionized the understanding of cellular heterogeneity and dynamic changes within complex biological systems [1–3]. Following this, spatially resolved omics technologies have emerged as the next major frontier for preserving the native spatial context of cells within tissues [4, 5]. By integrating sequencing data with spatial coordinates, these technologies provide crucial insights into molecular interactions within the tissue’s native microenvironment [6–8]. Spatial transcriptomics technologies are mainly categorized into two types [9–11]: imaging-based technologies (e.g. MERFISH [12], seqFISH+ [13], and osmFISH [14]) and sequencing-based technologies (e.g. 10x Visium [15], Slide-seq [16], and Stereo-seq [17]). However, due to the limited information a single modality can capture in complex tissues, spatial transcriptomics alone restricts comprehensive analysis. To address this, the field has transitioned to spatial multi-omics, such as Stereo-CITE-seq [18], SPOTS [19], spatial ATAC–RNA-seq, and CUT&Tag–RNA-seq [20].
Spatial domain identification is facilitated by these technologies and aims to identify functional regions within the tissue [21–23]. In recent years, numerous computational methods have been developed for this task. BayesSpace [24] leverages spatial neighborhood information to improve the resolution of spatial transcriptomic data with a statistical approach, but it faces restrictions when processing large-scale datasets. Deep learning, as a powerful computational tool, is well suited to modeling complex biological data [25, 26]. In particular, graph-based models convert spatial data into a node-edge topology to directly explore spot correlations for domain identification [27]. SpaGCN [28] combines neighboring gene expression to identify spatial domains characterized by consistent expression patterns and histological features. DeepST [29] then addresses SpaGCN's limitations, including its failure to model nonlinear interactions and its poor handling of multi-source data, by integrating multi-source data with dual encoders. However, both SpaGCN and DeepST lack the capability to integrate serial section data. STAGATE [30] builds a cross-section spatial network via an adaptive graph attention autoencoder to enable joint analysis of multiple sections. Because STAGATE cannot remove batch effects, GraphST [5] implicitly corrects them via graph-based self-supervised contrastive learning (CL). While most of the aforementioned methods construct adjacency matrices via predefined similarity metrics or simply fuse multiple data sources additively, STMGCN [31] adopts multiple neighborhood graphs with independent encoders for view-specific representations and an attention mechanism for adaptive fusion. In addition, SpaNCMG [11] constructs a complementary neighborhood graph that fuses local and global structural information to enhance spatial transcriptome data.
While spatial transcriptomics has been pivotal in spatial domain identification, the information from multi-omics modalities can help to further enrich the functional characterization. However, the inherent heterogeneity across different omics poses significant computational challenges for integration. Recently, SpatialGlue [32] proposes a graph neural network architecture equipped with a dual-attention mechanism to synergistically integrate spatial multi-omics data. Meanwhile, it leverages cross-modal decoding as an auxiliary task to align features from different modalities. However, conventional GNNs assume uniform neighbor influence, overlooking microenvironmental heterogeneity, which might blur functional boundaries. Furthermore, cross-modal reconstruction may inadvertently focus the model on modality-specific details. Extensive research has shown that augmenting GNNs with attention mechanisms allows capturing more complex relationships by assigning distinct importance weights to neighbors [33]. In addition, CL extracts discriminative embeddings by aligning positive pairs and segregating negatives [34, 35]. Leveraging these established advantages, we propose GATCL: first, its graph attention network preserves clear domain boundaries by assigning higher weights to functionally similar neighbors; second, it implements a CL strategy that maximizes concordance for co-located spots, yielding more discriminative representations for precise spatial domain delineation. Extensive experiments confirm that GATCL consistently surpasses seven representative methods across several datasets and platforms.
Materials and methods
Overview of the method
The architecture of GATCL is visualized in Fig. 1. As shown in Fig. 1a, GATCL constructs two complementary graphs: a spatial graph based on spatial coordinates and a feature similarity graph derived from molecular profiles. These graphs are independently processed by multi-layer graph attention networks to extract latent embeddings. As depicted in Fig. 1b, an attention fusion module inspired by SpatialGlue [32], comprising intra-modality and cross-modality attention, adaptively enables joint modeling of the spatial multi-omics data. Corresponding to Fig. 1c, GATCL then leverages CL to reinforce cross-modal feature consistency, aligning features at spatially corresponding locations and separating those at non-corresponding ones. Finally, the model is trained with a joint objective that combines modality-specific reconstruction losses (Fig. 1d) and the CL loss (Fig. 1c).
Figure 1.
The overall architecture of GATCL.
Graph construction
Expression-based graph construction
Given the expression matrix $X \in \mathbb{R}^{N \times d}$, where $N$ is the number of spots and $d$ is the feature dimension, we construct a $k$-nearest neighbor (KNN) graph based on feature similarity. The adjacency matrix $A_f$ is defined as:

$$A_f[i,j] = \begin{cases} 1, & j \in \mathcal{N}_k(i) \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where $\mathcal{N}_k(i)$ denotes the $k$ most correlated spots to spot $i$. This process is applied independently to the different modalities, resulting in two distinct graphs: $A_f^{\mathrm{RNA}}$ and $A_f^{\mathrm{Pro}}$.
Spatial-based graph construction
To capture the spatial relationships between different spots, we construct a spatial proximity graph. Based on Euclidean distances between the physical coordinates of the spots, the adjacency matrix $A_s$ is defined as:

$$A_s[i,j] = \begin{cases} 1, & j \in \mathcal{N}_k^{s}(i) \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

Here, $\mathcal{N}_k^{s}(i)$ denotes the KNN of node $i$ based on the Euclidean distance:

$$d(i,j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{3}$$

where $(x_i, y_i)$ and $(x_j, y_j)$ are the spatial coordinates of spot $i$ and spot $j$, respectively. This process efficiently identifies nearest neighbors in physical space, also resulting in two graphs: $A_s^{\mathrm{RNA}}$ and $A_s^{\mathrm{Pro}}$.
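The two graph constructions above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: both graphs here use Euclidean distance via scikit-learn's `NearestNeighbors`, whereas the paper's feature graph selects the most *correlated* spots; the function name and toy data are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_adjacency(data, k):
    # Binary KNN adjacency (Eqs 1-2): A[i, j] = 1 iff j is among the
    # k nearest neighbors of spot i; the spot itself is excluded.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    _, idx = nn.kneighbors(data)          # idx[:, 0] is the query spot itself
    n = data.shape[0]
    A = np.zeros((n, n), dtype=np.int8)
    rows = np.repeat(np.arange(n), k)
    A[rows, idx[:, 1:].ravel()] = 1
    return A

rng = np.random.default_rng(0)
coords = rng.random((50, 2))              # toy spot coordinates
expr = rng.random((50, 200))              # toy expression matrix
A_s = knn_adjacency(coords, k=6)          # spatial graph, per Eq. (2)
A_f = knn_adjacency(expr, k=6)            # feature graph, per Eq. (1)
```

Note that the resulting adjacency is directed (row $i$ marks the neighbors of spot $i$); symmetrization, if desired, is a separate design choice.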
Graph attention network
To encode the graph-structured data, we employ an $L$-layer GAT encoder. By dynamically weighting neighbors, the GAT focuses on the more important nodes for effective representation learning. The same design is also applied to the decoder, differing only in its inputs and outputs. In short, the process can be described as follows:

$$Z_f^{\mathrm{RNA}} = \mathrm{GATConv}\big(X^{\mathrm{RNA}}, A_f^{\mathrm{RNA}}\big) \tag{4}$$

$$Z_s^{\mathrm{RNA}} = \mathrm{GATConv}\big(X^{\mathrm{RNA}}, A_s^{\mathrm{RNA}}\big) \tag{5}$$

$$Z_f^{\mathrm{Pro}} = \mathrm{GATConv}\big(X^{\mathrm{Pro}}, A_f^{\mathrm{Pro}}\big) \tag{6}$$

$$Z_s^{\mathrm{Pro}} = \mathrm{GATConv}\big(X^{\mathrm{Pro}}, A_s^{\mathrm{Pro}}\big) \tag{7}$$

where $X^{\mathrm{RNA}}$ and $X^{\mathrm{Pro}}$ are the expression matrices of the transcriptome and proteome, respectively. Here, GATConv represents the multi-layer graph attention network encoder, which is formally defined as follows. Let $G = (V, E)$ denote a graph with $N$ nodes and edge set $E$. The initial node features are represented as $H^{(0)} \in \mathbb{R}^{N \times d_0}$, where $d_0$ is the feature dimension. The $l$th layer ($l = 1, \dots, L$) computes the hidden representation via:

$$\hat{h}_i^{(l)} = \Big\Vert_{k=1}^{K} \sum_{j \in \mathcal{N}(i) \cup \{i\}} \alpha_{ij}^{(l,k)} W^{(l,k)} h_j^{(l-1)} \tag{8}$$

where $K$ is the number of attention heads and $\Vert$ indicates concatenation across the $K$ attention heads. The attention coefficients are computed as:

$$\alpha_{ij}^{(l,k)} = \frac{\exp\Big(\mathrm{LeakyReLU}\big(a^{(l,k)\top}\big[W^{(l,k)} h_i^{(l-1)} \,\Vert\, W^{(l,k)} h_j^{(l-1)}\big]\big)\Big)}{\sum_{j' \in \mathcal{N}(i) \cup \{i\}} \exp\Big(\mathrm{LeakyReLU}\big(a^{(l,k)\top}\big[W^{(l,k)} h_i^{(l-1)} \,\Vert\, W^{(l,k)} h_{j'}^{(l-1)}\big]\big)\Big)} \tag{9}$$

The output of each layer is then updated with a residual connection:

$$h_i^{(l)} = \sigma\Big(\hat{h}_i^{(l)} + W_{\mathrm{res}}^{(l)} h_i^{(l-1)}\Big) \tag{10}$$

Here, $\sigma$ denotes an activation function. The residual projection $W_{\mathrm{res}}^{(l)}$ is the identity when the input and output dimensions match, and a learnable linear projection otherwise:

$$W_{\mathrm{res}}^{(l)} = \begin{cases} I, & d_{l-1} = d_l \\ W_r^{(l)}, & \text{otherwise} \end{cases}$$

After $L$ such layers, the encoder outputs four embeddings: the transcriptome feature embedding $Z_f^{\mathrm{RNA}}$, the transcriptome spatial embedding $Z_s^{\mathrm{RNA}}$, the proteome feature embedding $Z_f^{\mathrm{Pro}}$, and the proteome spatial embedding $Z_s^{\mathrm{Pro}}$.
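A single attention head of Eqs (8)–(9) can be written compactly in numpy. This is a didactic sketch under simplifying assumptions (one head, no residual term, dense adjacency, toy weights); a practical implementation would use a library layer such as PyTorch Geometric's `GATConv`.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(H, A, W, a):
    # One single-head GAT layer: e_ij = LeakyReLU(a^T [W h_i || W h_j]),
    # softmax-normalized over each node's neighborhood (self-loop included),
    # then an attention-weighted sum of transformed neighbor features.
    n = A.shape[0]
    mask = (A + np.eye(n)) > 0                # neighborhood N(i) ∪ {i}
    Wh = H @ W                                # (n, d_out)
    d_out = W.shape[1]
    # a^T [Wh_i || Wh_j] splits into a left part on i and a right part on j
    e = leaky_relu((Wh @ a[:d_out])[:, None] + (Wh @ a[d_out:])[None, :])
    e = np.where(mask, e, -1e9)               # restrict softmax to neighbors
    e = e - e.max(axis=1, keepdims=True)      # numerical stability
    alpha = np.exp(e) * mask
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ Wh                         # new node representations

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 5))               # toy node features
A = (rng.random((8, 8)) < 0.3).astype(float)  # toy adjacency
out = gat_layer(H, A, W=rng.standard_normal((5, 4)), a=rng.standard_normal(8))
```

The multi-head version of Eq. (8) would run this per head with separate $(W, a)$ and concatenate the outputs.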
Modality-aware attention fusion
To adaptively fuse representations from different modalities, we employ a self-learned attention mechanism from SpatialGlue that computes weighted combinations of input embeddings based on their importance:

$$Z^{\mathrm{RNA}} = \mathrm{att}\big(Z_f^{\mathrm{RNA}}, Z_s^{\mathrm{RNA}}\big) \tag{11}$$

$$Z^{\mathrm{Pro}} = \mathrm{att}\big(Z_f^{\mathrm{Pro}}, Z_s^{\mathrm{Pro}}\big) \tag{12}$$

$$Z = \mathrm{att}\big(Z^{\mathrm{RNA}}, Z^{\mathrm{Pro}}\big) \tag{13}$$

In detail, given $M$ modality-specific embeddings $Z^{1}, Z^{2}, \dots, Z^{M} \in \mathbb{R}^{N \times d}$, we first concatenate them into a unified tensor:

$$\mathcal{Z} = \big[Z^{1}; Z^{2}; \dots; Z^{M}\big] \in \mathbb{R}^{N \times M \times d} \tag{14}$$

Then we apply a two-layer feed-forward attention network to compute attention weights $\alpha$ over the modality inputs:

$$e = \tanh(\mathcal{Z} W_1 + b_1)\, W_2 + b_2 \tag{15}$$

$$\alpha = \mathrm{softmax}(e) \tag{16}$$

The attention scores $\alpha \in \mathbb{R}^{N \times M}$ reflect the relative importance of each modality. The final fused embedding is computed as the weighted sum:

$$Z = \sum_{m=1}^{M} \alpha_m \odot Z^{m} \tag{17}$$
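The fusion of Eqs (14)–(17) is a small feed-forward computation. The sketch below is an illustration with assumed weight shapes and toy inputs, not the released implementation:

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fusion(Z_list, W1, b1, W2, b2):
    # Two-layer feed-forward attention over modalities (Eqs 14-17):
    # stack -> score each modality -> softmax over the modality axis ->
    # weighted sum of the modality embeddings.
    Z = np.stack(Z_list, axis=1)                 # (n, M, d), Eq. (14)
    e = np.tanh(Z @ W1 + b1) @ W2 + b2           # (n, M, 1), Eq. (15)
    alpha = softmax(e[..., 0], axis=1)           # (n, M),    Eq. (16)
    fused = (alpha[..., None] * Z).sum(axis=1)   # (n, d),    Eq. (17)
    return fused, alpha

rng = np.random.default_rng(2)
n, d, h = 30, 16, 8
Z_rna, Z_pro = rng.standard_normal((n, d)), rng.standard_normal((n, d))
fused, alpha = attention_fusion(
    [Z_rna, Z_pro],
    W1=rng.standard_normal((d, h)), b1=np.zeros(h),
    W2=rng.standard_normal((h, 1)), b2=np.zeros(1),
)
```

The same routine serves both fusion stages: within a modality (feature vs. spatial graph embeddings) and across modalities (RNA vs. protein).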
Cross-modality contrastive learning
To align the representations of spatial transcriptomics and proteomics at each spatial location (spot), we adopt a CL framework. We define a positive pair as the transcriptomic and proteomic embeddings originating from the same location; a negative pair is formed by embeddings from two different locations. Given $Z^{\mathrm{RNA}}, Z^{\mathrm{Pro}} \in \mathbb{R}^{N \times d}$, where $N$ represents the number of spots, the goal is to bring matched pairs from the same location closer while pushing unmatched pairs apart in the latent space.

Firstly, the representations are L2-normalized:

$$\tilde{z}_i = \frac{z_i}{\Vert z_i \Vert_2} \tag{18}$$

The similarity between positive pairs (same spot) is computed via dot product:

$$s_i^{+} = \tilde{z}_i^{\mathrm{RNA}} \cdot \tilde{z}_i^{\mathrm{Pro}} \tag{19}$$

Then the sum of similarities between negative sample pairs $S_i^{-}$ can be represented as:

$$S_i^{-} = \sum_{j \neq i} \exp\big(\tilde{z}_i^{\mathrm{RNA}} \cdot \tilde{z}_j^{\mathrm{Pro}} / \tau\big) \tag{20}$$

The contrastive loss is defined as:

$$\mathcal{L}_{\mathrm{CL}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp\big(s_i^{+}/\tau\big)}{\exp\big(s_i^{+}/\tau\big) + S_i^{-}} \tag{21}$$

where $\tau$ is a learnable temperature parameter that is dynamically annealed during training:

$$\tau^{(t)} = \tau^{(0)} \cdot \gamma^{t} \tag{22}$$

where the annealing factor $\gamma \in (0, 1)$. This encourages the encoder to produce modality-invariant representations for each spatial spot while maintaining inter-spot discriminability.
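The loss of Eqs (18)–(21) is the familiar InfoNCE objective with matched spots on the diagonal of the cross-modal similarity matrix. The sketch below treats only RNA as the anchor and fixes the temperature; a symmetric (both-direction) variant and the annealing schedule of Eq. (22) are straightforward extensions:

```python
import numpy as np

def cross_modal_cl_loss(Z_rna, Z_pro, tau=0.07):
    # Contrastive loss (Eqs 18-21): L2-normalize each embedding, score
    # every cross-modal pair, then pull each spot's matched pair (the
    # diagonal) together while pushing mismatched pairs apart.
    A = Z_rna / np.linalg.norm(Z_rna, axis=1, keepdims=True)   # Eq. (18)
    B = Z_pro / np.linalg.norm(Z_pro, axis=1, keepdims=True)
    logits = (A @ B.T) / tau                                   # (n, n)
    logits = logits - logits.max(axis=1, keepdims=True)        # stability
    log_soft = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = logits.shape[0]
    return -log_soft[np.arange(n), np.arange(n)].mean()        # Eq. (21)

rng = np.random.default_rng(3)
Z1 = rng.standard_normal((40, 32))
aligned = cross_modal_cl_loss(Z1, Z1)                  # modalities agree
mismatch = cross_modal_cl_loss(Z1, rng.standard_normal((40, 32)))
# aligned is much smaller than mismatch: matched pairs dominate the softmax
```

With Eq. (22), `tau` would simply be updated each epoch as `tau = tau0 * gamma ** t` rather than held fixed.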
Training objective
To jointly model the spatial transcriptome and proteome, we design a multi-task loss that combines modality-specific reconstruction and cross-modality alignment via CL. The overall objective encourages the model to preserve the original omics information while enforcing consistency between modalities in a shared embedding space. The decoder adopts a symmetric structure similar to the encoder, decoding $Z$ separately back into the original transcriptomic and proteomic spaces to obtain the reconstructed representations $\hat{X}^{\mathrm{RNA}}$ and $\hat{X}^{\mathrm{Pro}}$. The reconstruction losses for transcriptomics and proteomics are computed as

$$\mathcal{L}_{\mathrm{rec}}^{\mathrm{RNA}} = \big\Vert X^{\mathrm{RNA}} - \hat{X}^{\mathrm{RNA}} \big\Vert_F^2 \tag{23}$$

$$\mathcal{L}_{\mathrm{rec}}^{\mathrm{Pro}} = \big\Vert X^{\mathrm{Pro}} - \hat{X}^{\mathrm{Pro}} \big\Vert_F^2 \tag{24}$$

The total loss can be defined as

$$\mathcal{L}_{\mathrm{total}} = \lambda_1 \mathcal{L}_{\mathrm{rec}}^{\mathrm{RNA}} + \lambda_2 \mathcal{L}_{\mathrm{rec}}^{\mathrm{Pro}} + \lambda_3 \mathcal{L}_{\mathrm{CL}} \tag{25}$$

where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters.
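The combination of Eqs (23)–(25) reduces to a few lines; the lambda values below are illustrative placeholders, not the paper's settings:

```python
import numpy as np

def total_loss(X_rna, Xh_rna, X_pro, Xh_pro, l_cl, lambdas=(1.0, 1.0, 0.5)):
    # Joint objective (Eqs 23-25): squared-Frobenius reconstruction error
    # per modality plus the weighted contrastive term.
    l1, l2, l3 = lambdas
    rec_rna = np.sum((X_rna - Xh_rna) ** 2)   # Eq. (23)
    rec_pro = np.sum((X_pro - Xh_pro) ** 2)   # Eq. (24)
    return l1 * rec_rna + l2 * rec_pro + l3 * l_cl  # Eq. (25)

X = np.ones((4, 3))
loss = total_loss(X, X, X, X, l_cl=0.2)   # perfect reconstruction: 0.5 * 0.2
```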
Results
Application to human lymph node A1 dataset
We first apply GATCL to the human lymph node A1 dataset [32] downloaded from https://zenodo.org/records/10362607, and compare it with several mainstream methods, including Seurat [36], totalVI [37], MultiVI [38], MOFA+ [39], MEFISTO [40], scMM [41], and SpatialGlue [32]. The ground truth, annotated by experts from SpatialGlue [32], is shown in Fig. 2a. Referring to the ground truth, GATCL exhibits stronger structural consistency and clearer spatial boundaries than the other methods, as presented in Fig. 2b. In particular, GATCL delineates the pericapsular adipose tissue and the cortex more accurately than the alternatives.
Figure 2.
GATCL identifies spatial domains in human lymph node A1.
In addition, to compare the performance of different methods more intuitively, we select six supervised metrics (homogeneity, mutual_info, V_measure, AMI, NMI, and ARI) to quantify the results, where higher values indicate better performance. Figure 2c presents the performance comparison when the number of clusters is set to 10, matching the number of ground-truth categories in Fig. 2a. The results indicate that GATCL either outperforms or performs comparably to the competing methods. For instance, GATCL's mutual_info score is 0.490 points higher than MultiVI's, and its homogeneity score is 0.290 points higher than MultiVI's. Moreover, for mutual_info, GATCL scores 0.090 higher than the strongest competitor, SpatialGlue. To rigorously assess clustering performance, we conduct a comprehensive comparison in which the number of clusters varies from 4 to 11. From the results summarized in Fig. 2d, it is evident that GATCL performs best: on all six metrics, it consistently achieves the highest median scores, significantly outperforming all other baseline models. These findings collectively validate the effectiveness and robustness of the GATCL framework for identifying accurate spatial domains.
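The six supervised metrics used throughout the benchmarks are all available in scikit-learn; a minimal helper might look as follows (function name and example labels are illustrative):

```python
from sklearn import metrics

def clustering_scores(y_true, y_pred):
    # The six supervised clustering metrics used in the comparisons;
    # all except mutual_info are bounded above by 1.
    return {
        "homogeneity": metrics.homogeneity_score(y_true, y_pred),
        "mutual_info": metrics.mutual_info_score(y_true, y_pred),
        "v_measure": metrics.v_measure_score(y_true, y_pred),
        "AMI": metrics.adjusted_mutual_info_score(y_true, y_pred),
        "NMI": metrics.normalized_mutual_info_score(y_true, y_pred),
        "ARI": metrics.adjusted_rand_score(y_true, y_pred),
    }

# A relabeled but otherwise perfect clustering scores 1.0 on the
# permutation-invariant metrics (ARI, NMI, AMI, homogeneity, V-measure).
scores = clustering_scores([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
```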
Application to human lymph node D1 dataset
We replicate the analysis on another human lymph node dataset, which can be downloaded from https://zenodo.org/records/10362607. The ground-truth labels shown in Fig. 3a are annotated by experts from SpatialGlue [32]. As shown in Fig. 3b, GATCL and SpatialGlue show high concordance with key anatomical regions. In contrast, many mainstream methods (such as Seurat and totalVI) resolve only indistinct macroscopic structures, while scMM and MultiVI appear severely fragmented.
Figure 3.
GATCL identifies spatial domains in human lymph node D1.
Then, to ensure the results are not tissue-specific, we compare the supervised metrics (homogeneity, mutual_info, V_measure, AMI, NMI, and ARI) with the number of clusters set to 10, identical to the analysis of A1. As is evident from Fig. 3, GATCL continues to perform excellently. Compared with SpatialGlue, GATCL improves mutual_info by 0.0597 and homogeneity by 0.0356. Furthermore, GATCL substantially outperforms the Seurat baseline, achieving an ARI score 0.0436 points higher and a mutual_info score 0.0724 points higher. Although its ARI value is slightly lower than that of MEFISTO, GATCL is superior on all other metrics, including relative to SpatialGlue, the strongest of the compared methods for spatial domain identification.
Next, we compare GATCL against seven other methods with the number of clusters ranging from 4 to 11 to quantitatively evaluate its robustness and stability. The results shown in Fig. 3d demonstrate the superiority of GATCL, as it consistently achieves the highest median values across all metrics. These quantitative results provide robust support for the qualitative observations, confirming that GATCL identifies biological domains more accurately and reliably.
To further investigate the modality-specific contributions within GATCL, we visualize the learned modality weights. As shown in Fig. 3e–f, spatial information receives a higher importance weighting than feature information for both RNA and protein across almost all clusters, suggesting that spatial proximity offers valuable complementary insights for representation learning. Besides, a significantly higher weight is assigned to RNA than to protein, as illustrated in Fig. 3g, indicating that the model relies primarily on RNA for its analysis and treats protein information as secondary and supplementary.
Application to mouse spleen dataset
We next extend the testing to a mouse spleen dataset [19, 32] downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198353. Firstly, we visualize the results of each method in Fig. 4a, from which GATCL and SpatialGlue demonstrate superior performance over the others, yielding relatively clear clustering results with distinct boundaries. In stark contrast, most competing methods (such as scMM, totalVI, and MultiVI) yield results characterized by noise and blurred boundaries. For a more targeted quantitative assessment, we benchmark GATCL against the well-performing SpatialGlue using the same evaluation metrics (homogeneity, mutual_info, V_measure, AMI, NMI, and ARI). As illustrated in Fig. 4b, GATCL is consistently superior to SpatialGlue across all metrics, underscoring the closer alignment between its predicted clusters and the ground-truth biological domains. Together, these quantitative findings and the spatial visualizations confirm that GATCL enables more accurate and interpretable spatial domain identification than the compared methods.
Figure 4.
GATCL identifies spatial domains in mouse spleen.
Secondly, an ablation experiment confirms the necessity of both the GAT and CL modules (Fig. 4c): the full GATCL model consistently outperforms variants lacking either component. Removing the CL module (w/o CL) significantly degrades performance (e.g. a lower mutual_info score), demonstrating its importance for feature representation. Likewise, removing the GAT (w/o GAT) yields the lowest ARI score, highlighting its critical role in capturing spatial proximity and cell-to-cell relationships. These findings confirm that both components are essential for superior spatial domain identification.
Thirdly, we perform differential gene expression analysis across the five predicted domains to evaluate the functional relevance of the spatial clusters identified by GATCL. The resulting heatmap (Fig. 4d) reveals distinct transcriptional profiles that correlate well with the spatial context and known immune cell distributions in the spleen. In Cluster 0, the high expression of Hbb-bs, Hba-a1/a2, Slc4a1, and Gm42418, markers of erythrocytes, maps this cluster to the red pulp region of the mouse spleen. Cluster 1 is characterized by upregulation of mitochondrial genes (e.g. mt-Co1, mt-Co2, and mt-Nd4) and ribosomal protein genes (e.g. Rpl13 and Rpl19), a hallmark of highly activated immune cells. Cluster 2 highly expresses H2-DMb2, B2m, and other genes related to major histocompatibility complex (MHC) molecules, which are abundant in antigen-presenting cell regions, and thus represents the marginal zone.
Finally, we analyze the modality weights in Fig. 4e–g. From Fig. 4e and f, the model consistently assigns greater importance to cellular spatial relationships for both RNA and protein, indicating a high dependency on the local microenvironment for the predictions. Furthermore, as shown in Fig. 4g, the fused modality weights reveal a shift from RNA-dominant to protein-dominant contributions in certain regions, underscoring the spatially heterogeneous nature of multimodal information.
Application to mouse thymus dataset
We conduct experiments on the mouse thymus dataset [32] downloaded from https://zenodo.org/records/10362607, which contains highly structured and functionally diverse immune microenvironments. Owing to the lack of established annotations, quantitative evaluation is infeasible, so the performance evaluation is instead based on the biological relevance of the identified clusters. To assess GATCL's performance in spatial domain identification, we first benchmark it against seven mainstream methods in Fig. 5a. The results indicate that GATCL, like top-performing methods such as Seurat and SpatialGlue, successfully identifies spatially coherent domains. These domains clearly mirror the thymus's core anatomy: an outer cortex (Cluster 0) enveloping a central medulla composed of several clusters (Clusters 1, 5, and 7). Conversely, totalVI and MEFISTO yield more fragmented results, and MultiVI fails to resolve any meaningful spatial organization. Overall, GATCL shows better performance stability when partitioning biologically complex tissues.
Figure 5.
GATCL identifies spatial domains in mouse thymus.
Secondly, GO analysis further validates the identified spatial domains (Fig. 5), revealing a strong correspondence between their spatial positioning and functional enrichments, which accurately recapitulates the known thymic microarchitecture. Specifically, peripheral and capsular regions (Clusters 0 and 1) are enriched for metabolic pathways. The cluster at the corticomedullary junction (Cluster 2) is enriched for cell cycle progression, indicative of a proliferative zone. Deeper cortical zones (Clusters 3 and 4) show enrichment for lymphocyte differentiation and T cell maturation. Finally, medullary clusters (Clusters 6 and 7) are highly enriched for GO terms related to antigen presentation and MHC protein complex assembly, consistent with the role of antigen-presenting cells (APCs) in negative selection.
Thirdly, to evaluate the biological validity of GATCL-derived spatial representations, we perform PAGA trajectory inference. As depicted in Fig. 5c, the result reveals a clear developmental pathway from the outer (cortical) to the inner (medullary) regions [42], in strong agreement with the established process of directed cell migration and differentiation within the mouse thymus. These findings provide compelling evidence that GATCL, operating without supervision, effectively clusters cells, preserves their spatial coherence, and identifies spatial domains of high biological significance.
Finally, we leverage the learned attention weights across clusters to investigate GATCL’s internal preferences regarding modality and graph structure. For the RNA modality, shown in Fig. 5, spatial graphs consistently receive higher weights compared with feature graphs across most clusters, particularly in Clusters 0, 3, 4, and 5. In contrast, from Fig. 5e, protein modality demonstrates more balanced reliance on spatial and feature graphs. In addition, according to Fig. 5f, when examining the fusion between RNA and protein modalities, GATCL exhibits strong adaptivity: RNA dominates in Clusters 2–4, whereas protein contributes more in Clusters 0, 1, and 5.
Application to spatial RNA-ATAC datasets
We extend GATCL to the joint analysis of the spatial transcriptome and spatial epigenome, a more challenging area of spatial multi-omics. Specifically, we conduct experiments on a mouse brain dataset downloaded from https://zenodo.org/records/7480069 and a human placental dataset downloaded from https://singlecell.broadinstitute.org/single_cell/study/SCP2601. We follow [43] to preprocess the mouse brain dataset. For the human placental dataset, we apply latent semantic indexing (LSI) to the original peak count data to reduce its dimensionality to 200. Subsequently, genes expressed in fewer than 10 spots are filtered out, the data are log-normalized with SCANPY, and the top 3000 highly variable genes (HVGs) are selected.
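The LSI step for the peak counts can be sketched with scikit-learn as TF-IDF weighting followed by truncated SVD. This is a generic illustration of the technique, not the exact preprocessing script; toy data and the function name are assumptions, and some ATAC pipelines additionally drop the first LSI component, which correlates with sequencing depth.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.decomposition import TruncatedSVD

def lsi_embed(peak_counts, n_components=200, random_state=0):
    # LSI for a spot-by-peak count matrix: TF-IDF weighting followed by
    # truncated SVD, a standard dimensionality reduction for ATAC peaks.
    tfidf = TfidfTransformer().fit_transform(peak_counts)
    svd = TruncatedSVD(n_components=n_components, random_state=random_state)
    return svd.fit_transform(tfidf)

rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(60, 500))   # toy spot-by-peak counts
emb = lsi_embed(counts, n_components=20)    # paper uses 200 components
```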
With reference to the ground-truth regional annotation shown in Fig. 6a, the spatial domain identification result generated by GATCL (Fig. 6c) exhibits more reasonable regional partitioning than that generated by SpatialGlue (Fig. 6b). GATCL better captures the overall spatial structure and boundary characteristics of mouse brain regions, while SpatialGlue shows more fragmented and less coherent domain partitioning. For quantitative evaluation, we employ six supervised metrics: homogeneity, mutual_info, V_measure, AMI, NMI, and ARI. As shown in Fig. 6d, GATCL consistently achieves higher scores than SpatialGlue across all of them. The most significant gain is in mutual_info, where GATCL outperforms SpatialGlue by 0.242; similarly, GATCL's ARI is 0.144 points higher. Substantial advantages are also recorded in homogeneity (+0.143), AMI (+0.123), and V_measure (+0.122), confirming the superior accuracy and robustness of our method. Besides, Fig. 7a shows the ground-truth annotations of the human placental dataset, while Fig. 7b and c display the spatial domain identification results of SpatialGlue and GATCL, respectively. Quantitatively (Fig. 7d), GATCL consistently outperforms SpatialGlue across all six supervised metrics. The gap is largest in mutual_info, where GATCL surpasses SpatialGlue by 0.3380 points, and the lead is consistent across other key metrics, including homogeneity (+0.1797), AMI (+0.1702), and ARI (+0.0923), highlighting the enhanced robustness of our method. In summary, both the qualitative visualizations and the quantitative evaluations confirm that GATCL is more effective at identifying spatial domains.
Figure 6.
GATCL identifies spatial domains in mouse brain.
Figure 7.
GATCL identifies spatial domains in the human placenta.
Conclusion
In this study, we propose GATCL to identify spatial domains more accurately from spatial multi-omics data. GATCL adopts attention-weighted aggregation to selectively prioritize functionally relevant neighbors, preserving precise domain boundaries, while CL achieves robust alignment by bypassing the noise-prone cross-modal reconstruction inherent in existing methods. GATCL has been fully validated on a wide range of datasets encompassing diverse species, tissue types, and multiple spatial omics, including transcriptomics, proteomics, and chromatin accessibility. Complementing the direct performance comparison, the ablation analysis confirms that both the graph attention mechanism and the cross-modal CL make substantial contributions to precise spatial domain depiction. Furthermore, a parameter sensitivity analysis (Supplementary Materials) assesses the impact of hyperparameter choices on model performance. Finally, both training and inference are efficient and can be completed within tens of seconds on an Intel(R) Xeon(R) Silver 4316 CPU and an NVIDIA A40 GPU.
By reliably identifying biologically meaningful spatial domains, GATCL might reveal tumor microenvironment interactions, providing critical insights for targeted therapeutic strategies. Looking ahead, we aim to extend the GATCL framework to further incorporate Hematoxylin and Eosin (H&E)-stained tissue images, thereby unlocking deeper insights into the interplay between spatial architecture and molecular expression.
Key Points
We propose GATCL which is based on graph attention network and contrastive learning (CL) for spatial domain identification.
GATCL leverages a graph attention network to overcome the limitations of static graph models, dynamically weighing cellular neighbors to achieve precise delineation of functional domains.
A CL framework is employed to align multi-modal representations, bypassing noise-prone cross-modal reconstruction and learning a robust latent space for accurate domain identification.
Experimental results across six datasets, including spatial transcriptomics, proteomics, and chromatin from different platforms, demonstrate that GATCL outperforms seven representative methods across six evaluation metrics.
Supplementary Material
Contributor Information
Jichong Mu, School of Computer Science and Technology, Harbin Institute of Technology, Xidazhi St 90, 150000, Harbin, Heilongjiang, China; Zhengzhou Research Institute, Harbin Institute of Technology, Longyuan East Seventh Street, 450000, Zhengzhou, Henan, China.
Yachen Yao, School of Computer Science and Technology, Harbin Institute of Technology, Xidazhi St 90, 150000, Harbin, Heilongjiang, China.
Qiuhao Chen, School of Computer Science and Technology, Harbin Institute of Technology, Xidazhi St 90, 150000, Harbin, Heilongjiang, China; Zhengzhou Research Institute, Harbin Institute of Technology, Longyuan East Seventh Street, 450000, Zhengzhou, Henan, China.
Jiqiu Sun, Harbin Institute of Technology Hospital, Xiaowai St, 150000, Harbin, Heilongjiang, China.
Tianyi Zhao, School of Medicine and Health, Harbin Institute of Technology, Xidazhi St 90, 150000, Harbin, Heilongjiang, China.
Conflicts of interest
None declared.
Funding
This work is supported by the National Natural Science Foundation of China (grant no. 62572148).
Data availability
The source code is available at https://github.com/1999mjc/GATCL.
References
- 1. Stuart T, Satija R. Integrative single-cell analysis. Nat Rev Genet 2019;20:257–72. 10.1038/s41576-019-0093-7 [DOI] [PubMed] [Google Scholar]
- 2. Xiangyu W, Yang X, Dai Y et al. Single-cell sequencing to multi-omics: technologies and applications. Biomarker Res 2024;12:100. 10.1186/s40364-024-00643-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Rebuffet L, Melsen JE, Escaliere B et al. High-dimensional single-cell analysis of human natural killer cell heterogeneity. Nat Immunol 2024;25:1212–3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Hang X, Huazhu F, Long Y et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med 2024;16:12. 10.1186/s13073-024-01283-x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Long Y, Ang KS, Li M et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat Commun 2023;14:1155. 10.1038/s41467-023-36796-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Crosetto N, Bienko M, van Oudenaarden A. Spatially resolved transcriptomics and beyond. Nat Rev Genet 2015;16:57–66. 10.1038/nrg3832 [DOI] [PubMed] [Google Scholar]
- 7. Baysoy A, Bai Z, Satija R et al. The technological landscape and applications of single-cell multi-omics. Nat Rev Mol Cell Biol 2023;24:695–713. 10.1038/s41580-023-00615-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Chen Y, Qian W, Lin L et al. Mapping gene expression in the spatial dimension. Small Methods 2021;5:2100722. 10.1002/smtd.202100722 [DOI] [PubMed] [Google Scholar]
- 9. Zhuang X. Spatially resolved single-cell genomics and transcriptomics by imaging. Nat Methods 2021;18:18–22. 10.1038/s41592-020-01037-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Larsson L, Frisen J, Lundeberg J. Spatially resolved transcriptomics adds a new dimension to genomics. Nat Methods 2021;18:15–8. 10.1038/s41592-020-01038-7 [DOI] [PubMed] [Google Scholar]
- 11. Si Z, Li H, Shang W et al. SpaNCMG: improving spatial domains identification of spatial transcriptomics using neighborhood-complementary mixed-view graph convolutional network. Brief Bioinform 2024;25:bbae259. 10.1093/bib/bbae259 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Chen KH, Boettiger AN, Moffitt JR et al. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 2015;348:aaa6090. 10.1126/science.aaa6090 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Eng C-HL, Lawson M, Zhu Q et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature 2019;568:235–9. 10.1038/s41586-019-1049-y
- 14. Codeluppi S, Borm LE, Zeisel A et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat Methods 2018;15:932–5. 10.1038/s41592-018-0175-z
- 15. Ståhl PL, Salmén F, Vickovic S et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 2016;353:78–82. 10.1126/science.aaf2403
- 16. Rodriques SG, Stickels RR, Goeva A et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science 2019;363:1463–7.
- 17. Chen A, Liao S, Cheng M et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 2022;185:1777–1792.e21.
- 18. Yang H, Liao S, Liu W et al. Integrated spatial transcriptomic and proteomic analysis of fresh frozen tissue with Stereo-CITE-seq. Eur J Hum Genet 2024;32:1205.
- 19. Ben-Chetrit N, Niu X, Swett ADD et al. Integration of whole transcriptome spatial profiling with protein markers. Nat Biotechnol 2023;41:788–93. 10.1038/s41587-022-01536-3
- 20. Zhang D, Deng Y, Kukanja P et al. Spatial epigenome-transcriptome co-profiling of mammalian tissues. Nature 2023;616:113–22. 10.1038/s41586-023-05795-1
- 21. Sun S, Liu J, Li G et al. DeepGFT: identifying spatial domains in spatial transcriptomics of complex and 3D tissue using deep learning and graph Fourier transform. Genome Biol 2025;26:153. 10.1186/s13059-025-03631-5
- 22. Jiang R, Li Z, Jia Y et al. SINFONIA: scalable identification of spatially variable genes for deciphering spatial domains. Cells 2023;12:604. 10.3390/cells12040604
- 23. Wang Z, Geng A, Duan H et al. A comprehensive review of approaches for spatial domain recognition of spatial transcriptomes. Brief Funct Genomics 2024;23:702–12. 10.1093/bfgp/elae040
- 24. Zhao E, Stone MR, Ren X et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol 2021;39:1375–84. 10.1038/s41587-021-00935-2
- 25. Abbasi K, Razzaghi P. Incorporating part-whole hierarchies into fully convolutional network for scene parsing. Expert Syst Appl 2020;160:113662. 10.1016/j.eswa.2020.113662
- 26. Norouzi R, Norouzi R, Abbasi K et al. DFT_ANPD: a dual-feature two-sided attention network for anticancer natural products detection. Comput Biol Med 2025;194:110442.
- 27. Wu S, Sun F, Zhang W et al. Graph neural networks in recommender systems: a survey. ACM Comput Surv 2023;55:1–37. 10.1145/3535101
- 28. Hu J, Li X, Coleman K et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods 2021;18:1342–51.
- 29. Xu C, Jin X, Wei S et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res 2022;50:e131. 10.1093/nar/gkac901
- 30. Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun 2022;13:1739. 10.1038/s41467-022-29439-6
- 31. Shi X, Zhu J, Long Y et al. Identifying spatial domains of spatially resolved transcriptomics via multi-view graph convolutional networks. Brief Bioinform 2023;24:1–14. 10.1093/bib/bbad278
- 32. Long Y, Ang KS, Sethi R et al. Deciphering spatial domains from spatial multi-omics with SpatialGlue. Nat Methods 2024;21:1658–67. 10.1038/s41592-024-02316-4
- 33. Abir AR, Nayeem MA, Sohel Rahman M et al. GTG-ACO: graph transformer guided ant colony optimization for learning heuristics and pheromone dynamics for combinatorial optimization. Swarm Evol Comput 2025;99:102147. 10.1016/j.swevo.2025.102147
- 34. Abir AR, Tahmid MT, Saifur Rahman M. LOCAS: multilabel mRNA localization with supervised contrastive learning. Brief Bioinform 2025;26:bbaf441. 10.1093/bib/bbaf441
- 35. Abir AR, Dip SA, Zhang L. UnCOT-AD: unpaired cross-omics translation enables multi-omics integration for Alzheimer's disease prediction. Brief Bioinform 2025;26:bbaf438. 10.1093/bib/bbaf438
- 36. Hao Y, Hao S, Andersen-Nissen E et al. Integrated analysis of multimodal single-cell data. Cell 2021;184:3573–3587.e29. 10.1016/j.cell.2021.04.048
- 37. Ghazanfar S, Guibentif C, Marioni JC. Stabilized mosaic single-cell data integration using unshared features. Nat Biotechnol 2024;42:284–92. 10.1038/s41587-023-01766-z
- 38. Ashuach T, Gabitto MI, Koodli RV et al. MultiVI: deep generative model for the integration of multimodal data. Nat Methods 2023;20:1222–31.
- 39. Argelaguet R, Arnol D, Bredikhin D et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol 2020;21:111. 10.1186/s13059-020-02015-1
- 40. Velten B, Braunger JM, Argelaguet R et al. Identifying temporal and spatial patterns of variation from multimodal data using MEFISTO. Nat Methods 2022;19:179–86. 10.1038/s41592-021-01343-9
- 41. Minoura K, Abe K, Nam H et al. A mixture-of-experts deep generative model for integrated analysis of single-cell multiomics data. Cell Rep Methods 2021;1:100071. 10.1016/j.crmeth.2021.100071
- 42. Stankiewicz LN, Salim K, Flaschner EA et al. Sex-biased human thymic architecture guides T cell development through spatially defined niches. Dev Cell 2025;60:152–169.e8. 10.1016/j.devcel.2024.09.011
- 43. Tian T, Zhang J, Lin X et al. Dependency-aware deep generative models for multitasking analysis of spatial omics data. Nat Methods 2024;21:1501–13. 10.1038/s41592-024-02257-y
Data Availability Statement
The source code is available at https://github.com/1999mjc/GATCL.