Nucleic Acids Research. 2025 Jun 26;53(12):gkaf536. doi: 10.1093/nar/gkaf536

A comprehensive review of spatial transcriptomics data alignment and integration

Muiz Khan 1, Suzan Arslanturk 2, Sorin Draghici 3,4
PMCID: PMC12199153  PMID: 40568931

Abstract

Spatial data acquisition technologies enable high-throughput quantification of molecular expression in tissue sections maintaining spatial context information. However, performing downstream analysis on a whole tissue section requires the alignment and integration of multiple tissue slices. This is a nontrivial task due to tissue heterogeneity and plasticity. Although manual solutions exist, they are time-consuming and require technical expertise. Hence, automated and robust alignment and integration of multiple slices within and across datasets, individuals, and experiments becomes essential. This study aims to (i) present a comprehensive review of methodologies for spatial transcriptomics (ST) data alignment and integration, (ii) explain the problem, its scope and challenges, and (iii) propose a general pipeline. We review 24 tools addressing multi-slice ST alignment and integration, and tackling key challenges through downstream validation. Tools focusing solely on single-slice ST analyses or multi-omics integration are excluded. We categorize these approaches by methodology (statistical mapping, image processing and registration, and graph-based) in accordance with the generalized pipeline. We evaluate their strengths, limitations, and real-world applications based on task scope and their potential to advance biological insights. Despite improved spatial resolution and 3D tissue reconstruction, significant challenges persist in achieving robust alignment and integration across heterogeneous tissue slices.

Graphical Abstract

Introduction

High-throughput data acquisition techniques in biological experiments have revolutionized the analysis, interpretation, and visualization of biological data at various levels. However, traditional methods such as bulk “omics” approaches offer only a general perspective of cell populations within tissues, primarily emphasizing the dominant population. As a result, rare sub-populations can be masked by the more abundant populations. Furthermore, spatial information is completely disregarded. In contrast, single-cell “omics” methods provide higher resolution and capture rare cell areas, but they require cell isolation, which also disrupts spatial context information. Spatial transcriptomics (ST) technologies have significantly advanced our capacity to quantify gene expression within tissue sections while preserving crucial spatial context information. This is achieved by slicing each tissue section into multiple thin slices. These thin slices are spatially represented in a two-dimensional (2D) coordinate space, with each data point representing a spot consisting of one to 100 cells and their corresponding messenger ribonucleic acid (mRNA) expression values. However, it is important to note that a single 2D coordinate space only represents a single slice of the tissue section, limiting the comprehensive analysis of the entire tissue context. Several studies have demonstrated better biological insights derived from downstream analyses of single ST tissue slices compared to single-cell-RNA (scRNA) and bulk-RNA analyses, such as in cell-type identification and spatial clustering analysis [1–6]. Still, there is a theoretical concern regarding the potential reduction in statistical power due to low gene expression coverage and the neglect of spatial relationships in the 3D tissue context. 
To ensure robust statistical power and maximize the utility of ST data, downstream analyses should be performed on multiple tissue slices, thus capturing the complete tissue context and enhancing the understanding of underlying biological mechanisms.

Before conducting downstream analyses on multiple tissue slices, it is important to address two key aspects. First, aligning and integrating tissue slices across the same or different datasets, experiments, or individuals can provide higher resolution and, consequently, improved expression coverage. Second, multiple consecutive tissue slices from the same or different dataset or experiment can be aligned and integrated to achieve a comprehensive 3D holistic view of the entire tissue section. This 3D reconstruction preserves the spatial relationships between tissue structures across slices, offering insights into cellular organization, interactions, and spatial gradients of gene expression that cannot be captured in isolated 2D slices. Such a holistic perspective is critical for studying complex tissue architectures and biological processes in both healthy and diseased states.

Advanced ST technologies such as Visium from 10× Genomics allow expression measurement of up to 5000 spots per slice, with each spot in the 2D space capturing between 1 and 30 cells [7]. Although this is a significant improvement over bulk RNA expression because it captures spatial information, the gene expression resolution and scale are lower than in scRNA expression. To achieve better resolution in ST data, we can leverage the expression coverage of multiple datasets, experiments, or individuals. Consolidating gene expression data across spots from these sources can provide a richer and more comprehensive understanding of cellular interactions and functions within the tissue. This task is nontrivial, as it requires the consideration of spatial relationships and entails inherent challenges such as biological and technical variability, spatial warping, and differences in experimental protocols across ST platforms. In addition, the 2D nature of the sliced tissue sections leads to the loss of z-axis information, making it challenging to accurately reconstruct the 3D structure of the tissue. Addressing these challenges manually would require technical expertise and a significant amount of time. It is crucial to solve this problem computationally in order to fully utilize the potential of ST technologies and advance our understanding of complex biological processes. Therefore, the development of a robust automated methodology for ST data alignment and integration is paramount.

The ST alignment and integration task has emerged as a prominent area of study within the field of ST, gaining significant traction lately. Currently, there exist at least 24 different methodologies (as shown in Table 2) proposed to address the specific challenge of aligning and integrating multiple tissue slices in ST (a few of them are still in preprint). These methods aim to tackle different versions of the problem, such as aligning and integrating full or partial consecutive or nonconsecutive tissue slices and different tissue regions from different datasets, experiments, or individuals and so on. To demonstrate their effectiveness in better understanding biological phenomena through downstream analyses, these methods provided results and comparisons on real-life and simulated datasets.

Table 2.

List of approaches in different categories (total 24 tools) along with datasets, evaluation measures, and alignment scope

| Category | Model | Tool | Data references | Downstream analysis | Evaluation measures | Scope |
| --- | --- | --- | --- | --- | --- | --- |
| Statistical mapping: 10 tools | Bayesian inference (BI) | Splotch [28] | Mouse Lumbar Spinal Cord [29], Olfactory Bulb [30] | Spatial Profiling | Gene Expression Coverage | Homogeneous |
| | | GPSA [16] | Breast Cancer [30], Mouse Brain Serial Section [31], Mouse Hippocampus [32] | Spatial Differential Expression Analysis, 3D Mapping | Alignment Error & Accuracy | Homogeneous |
| | | Eggplant [33] | Human Developmental Heart, Breast Cancer & Mouse Olfactory Bulb [30], Mouse Hippocampus [32] | Spatial Differential Expression Analysis | Alignment Accuracy, Precision-Recall | Homogeneous |
| | Cluster-aware | PRECAST [20] | Human Brain [30], Mouse Olfactory Bulb [32] | Spatial Clustering, Spatial Profiling, Spatial Trajectory | Clustering Accuracy | Homogeneous, Heterogeneous |
| | Optimal transport | PASTE [7] | Human Brain [30], squamous cell carcinoma [34] | Spatial Clustering, Spatial Differential Expression Analysis, 3D Mapping | Alignment Accuracy, Spatial Coherence Score | Homogeneous |
| | | PASTE2 [14] | Human Brain [30, 35] | Identification of Spatial Domains, 3D Mapping | Alignment Accuracy | Homogeneous, Heterogeneous |
| | | OTVI [27] | Human Brain [36] | Spatial Clustering | Alignment Accuracy | Homogeneous, Heterogeneous |
| | | DeST-OT [18] | Axolotl Brain Sections [37] | Cell Growth Analysis | Growth Distortion | Homogeneous, Heterogeneous |
| | | ST-GEARS [38] | Human Brain [30, 35], Mouse Hippocampus [32] | Spatial Domain Identification, 3D Mapping | Alignment Accuracy | Homogeneous, Heterogeneous |
| | | GraphST [39] | Mouse Breast Cancer & Brain Tissue [30] | Spatial Domain Identification | Cluster accuracy | Homogeneous |
| Image processing & registration: 4 tools | Landmark-free | STIM [22] | Mouse Brain [40] | Spatial Profiling, Cell-type Identification | Gene Expression Coverage, Clustering Accuracy | Homogeneous |
| | | STaCker [23] | Mouse Brain & Olfactory Bulb [30] | Spatial Clustering | Alignment Error, Label Mapping Dice Score | Homogeneous, Heterogeneous |
| | | STalign [12] | Human [30] & Mouse [41] Brain, Breast Cancer [42] | Cell-type Identification, 3D Mapping | Gene Enrichment Scores, Clustering Accuracy | Homogeneous, Heterogeneous |
| | Landmark-based | STUtility [21] | Breast Cancer, Mouse Brain [30, 35] | Spatial Clustering, Spatial Profiling, 3D Mapping | Clustering Accuracy, Gene Expression Coverage | Homogeneous |
| Graph-based: 10 tools | Contrastive learning | SpatiAlign [17] | Human Brain [30], Mouse Hippocampus [32], Olfactory Bulb [30] | Spatial Profiling, Cell-type Identification, Cellular Trajectory | Gene Expression Coverage, Clustering Accuracy | Homogeneous, Heterogeneous |
| | | STAligner [10] | Human Brain [30], Mouse Olfactory Bulb [32, 43], Embryo [43] | Spatial Profiling, Cell-type Identification, Cellular Trajectory | Gene Expression Coverage, Clustering Accuracy 3D Mapping | Homogeneous, Heterogeneous |
| | | Graspot [19] | Human Brain & Human Developmental Heart [30] | Spatial Clustering, Cell Growth | Alignment & Clustering Accuracy | Homogeneous, Heterogeneous |
| | | ATAT [13] | Human Brain [30], Colon & Stomach Samples [13] | Spatial Domain Identification, Gene Expression Trajectories | Gene Expression Coverage | Homogeneous, Heterogeneous |
| | | MaskGraphene [44] | Human Brain [30], Mouse Hypothalamus [45] | Spatial Domain Identification | Alignment & Clustering Accuracy | Homogeneous, Heterogeneous |
| | | STAIR [46] | Human Brain [30], Mouse Olfactory Bulb [32, 43] | Spatial Clustering, 3D Mapping | Gene Expression Coverage, Alignment Accuracy | Homogeneous, Heterogeneous |
| | Graph matching | SLAT [15] | Human Brain [30, 32, 43] | Spatial Clustering, Cellular Growth Analysis | Clustering Accuracy, Gene Expression Coverage | Homogeneous, Heterogeneous |
| | | SPIRAL [24] | Human Brain [30, 32, 43, 47] | Spatial Clustering, Spatial Autocorrelation | Clustering Accuracy, Gene Expression Coverage | Homogeneous, Heterogeneous |
| | | BiGATAE [48] | Human Brain [30], squamous cell carcinoma [34] | Spatial Clustering, Spatial Domain Identification | Clustering Accuracy | Homogeneous |
| | Adversarial learning | SPACEL [25] | Human Brain [30, 32, 43] | Cell-type Identification, 3D Mapping | Clustering Accuracy | Homogeneous, Heterogeneous |

Due to the increasing importance of ST alignment and integration, there is a need for comprehensive reviews that address the problem itself, along with its scope, significance and challenges. To date, only a handful of review papers exist; notably, one by Liu et al. [8], which compares a mere five alignment and integration tools, and another by Hu et al. [9], which benchmarks five alignment tools and five integration tools separately. There is a need for a formal description of this problem, its significance, and the challenges it presents. Additionally, there is a crucial demand for outlining the scope of the task and exploring potential solutions that could enhance integrative downstream analyses in spatial transcriptomics (ST).

In this study, we conduct a comprehensive review to explore the current landscape of methodologies for ST data alignment and integration. The primary objective of this review is to explain the problem, its scope, and its underlying challenges, and then to propose a generalized pipeline to address these challenges. For this purpose, we critically evaluate various tools and their methodologies to ascertain their respective strengths, limitations, and applicability. Through this review, we aim to provide researchers with a comprehensive overview of the available methodologies, enabling them to tackle the challenges of spatial data analysis more effectively. Ultimately, this review seeks to facilitate the construction of comprehensive ST atlases and promote a deeper understanding of the molecular mechanisms underlying development, physiology, and disease.

Review selection criteria

We searched for articles describing ST alignment and integration in PubMed and Scopus. Our focus was on studies that include the following: (i) identification of the problem and its significance, (ii) proposals for solutions addressing multi-slice ST alignment and integration, (iii) demonstration of downstream analyses to validate results on integrated slices, and (iv) explanations of how integrated slices generated through ST alignment and integration provide enhanced biological insights and information about the spatial context. Tools focusing solely on single-slice ST analysis or multi-omics data integration tools, such as those integrating ST data with scRNA-seq data, were excluded from this review.

Spatial data alignment and the integration problem

In this section, we present a detailed discussion of the problem itself. First, we describe and formulate the problem using a high-level definition of the key components of the alignment and integration process. Then, we discuss in depth the inherent challenges, scope, and significance of this particular problem in order to elucidate the necessity for a robust solution. Subsequently, we explore a potential general approach to solve the problem. This generalized pipeline serves as a framework for categorizing different methodologies based on their implementation of each component within the pipeline. In the later sections, we introduce several tools that offer solutions to the problem by implementing these components with different approaches. We then discuss their advantages and disadvantages in terms of their capabilities to address the existing challenges.

Problem description

The input data consists of spatially-resolved transcriptomics data obtained from several tissue slices. Each slice typically contains gene expression measurements at distinct spatial locations. This results in a high-dimensional representation of an individual slice, where each spatial location (a data point in 2D space) is defined by x and y coordinates and associated with a vector of gene expression values, and additional features or metadata often provided in the dataset, including cell-type annotations or morphological markers. In addition to gene expression data, histology images are often used as input, which are microscopic images of tissue sections stained to reveal cellular and structural features, e.g. hematoxylin and eosin (H&E)-stained slides. These images are often used to derive structural attributes, including cell density, tissue boundaries, and histological patterns, contributing to the multi-modal nature of ST data.

The alignment and integration problem can be stated as the task of identifying a mapping between the high-dimensional ST data matrices obtained from multiple tissue slices and integrating them into a common coordinate system (CCS). This integration process must maintain the spatial relationships and gene expression patterns across all tissue slices.

More precisely (as shown in Fig. 1), the input consists of high-dimensional ST data matrices, denoted as X = {X_1, X_2, ..., X_n}, where each X_i represents the data from the i-th tissue slice, and n is the number of slices. Each X_i is a matrix of size m_i × (k_i + 2), where m_i is the number of spatial locations and k_i is the number of gene expression features for slice i. The two extra columns store the x and y coordinates of each spatial location. The desired output is denoted as Y, which encapsulates the gene expression from each input tissue slice while accurately representing spatial locations by embedding gene expression values. Y is essentially another matrix containing the common genes and common spots obtained after aligning and integrating the input slices. The size of Y is s × (l + 2), where s is the number of common spots and l denotes the number of consolidated gene expression features. These common spots can be thought of as “pixels” in the CCS. At each spatial location (pixel) (x, y) in Y, the gene expression values from all slices X_i that map to this location are integrated using a weighted averaging approach:

Figure 1.

Problem description: (A) Raw data for a single input slice X_i, featuring a histology image alongside its spatially resolved data that displays the gene expression measured at each spot (spatial location) in this slice. (B) The problem is to take the high-dimensional ST data matrices provided as input and represented as X = {X_1, X_2, …, X_n} (where each X_i corresponds to the data from the i-th tissue slice and n is the total number of slices), and produce the output Y, which uses a CCS to integrate gene expression data from the input tissue slices by a weighted average (Equation 1) and preserves the spatial locations. (C) The matrix representation of the raw data: the inputs are systematically transformed into a structured matrix to facilitate the analysis process. Each matrix X_i includes m_i spatial locations and k_i gene expression features for slice i, plus two additional columns for the x and y coordinates. (D) The matrix representation of the output. This uses the CCS to represent all spots mapped to the same spatial location in the CCS (s) and the consolidated expression values of all genes (l) from all the input slices.

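$$Y(x, y) = \frac{\sum_{i} w_i \, X_i(x, y)}{\sum_{i} w_i} \tag{1}$$

where the sums run over the slices X_i that map to location (x, y), X_i(x, y) denotes the gene expression values of slice X_i at that location, and w_i represents a weight based on the quality or relevance of slice X_i. This ensures that contributions from different slices are proportionally represented while maintaining consistency in spatial and expression information.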
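
The weighted averaging in Equation (1) can be illustrated with a minimal Python sketch. It assumes that the spots of every slice have already been mapped to CCS coordinates and that all slices share the same l genes in the same column order. The function name `integrate_slices`, the toy matrices, and the weights are hypothetical and are not taken from any of the reviewed tools. For simplicity, the sketch keeps every pixel that receives at least one spot, whereas a given tool may retain only the pixels shared across slices.

```python
import numpy as np

def integrate_slices(slices, weights):
    """Weighted average of expression values per CCS pixel (illustrative only).

    slices  : list of arrays shaped (m_i, l + 2); columns 0-1 hold the (x, y)
              CCS coordinates, the remaining l columns hold gene expression.
    weights : one positive weight w_i per slice (assumed to be given).
    Returns Y shaped (s, l + 2), one row per distinct CCS pixel.
    """
    sums = {}   # (x, y) -> weighted sum of expression vectors
    norm = {}   # (x, y) -> sum of the weights that contributed
    for X_i, w_i in zip(slices, weights):
        for row in X_i:
            key = (float(row[0]), float(row[1]))
            sums[key] = sums.get(key, 0.0) + w_i * row[2:]
            norm[key] = norm.get(key, 0.0) + w_i
    rows = [np.concatenate([key, sums[key] / norm[key]]) for key in sorted(sums)]
    return np.vstack(rows)

# Two toy slices with l = 2 genes; both contain a spot at pixel (0, 0).
X1 = np.array([[0.0, 0.0, 2.0, 4.0],
               [1.0, 0.0, 6.0, 0.0]])
X2 = np.array([[0.0, 0.0, 4.0, 0.0],
               [2.0, 0.0, 1.0, 1.0]])
Y = integrate_slices([X1, X2], weights=[1.0, 3.0])
print(Y)
```

At pixel (0, 0) the two slices are blended with weights 1 and 3, so the first gene becomes (1·2 + 3·4)/4 = 3.5. Pixels covered by a single slice keep that slice's values unchanged.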

The actual mapping between these inputs and the output is modeled as a transformation function, f_i: X_i → Y. To implement this transformation effectively across each sample slice, it is imperative to address the computational cost associated with mapping each data point to the CCS, while preserving the spatial relationships and gene expression patterns. A loss function quantifies and penalizes any discrepancies in the transformation process, with the ultimate aim of minimizing the total transformation cost, thereby achieving an efficient and accurate embedding of all tissue slices into the CCS. In other words, the goal of the optimization is to compute the function f_θ that maps each input X_i to the output Y. The optimization problem can be formulated as:

$$\theta^{*} = \arg\min_{\theta} \sum_{i=1}^{n} \mathcal{L}\big(f_{\theta}(X_i),\, Y\big) \tag{2}$$

where \mathcal{L} is the loss function, θ represents the shared parameters of the function f_θ, X_i is the data corresponding to one slice, and Y is the mapped data in the CCS. This formulation ensures a consistent transformation for all slices X_i into the CCS Y. In the context of ST data analysis, the loss function is designed to capture spatial relationships between samples. For example, it penalizes large differences in gene expression between spatially close samples more heavily than differences between spatially distant samples. This way, the model is encouraged to learn spatially smooth representations that respect the underlying spatial structure of the tissue.
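
One possible form of such a spatially aware penalty is sketched below. This is an illustration under stated assumptions and not the loss of any specific tool. The Gaussian distance kernel, the bandwidth `sigma`, and the function name `spatial_smoothness_loss` are hypothetical choices made only to show how expression differences between nearby spots can be weighted more heavily than those between distant spots.

```python
import numpy as np

def spatial_smoothness_loss(coords, expr, sigma=1.0):
    """Proximity-weighted mean squared expression difference over spot pairs."""
    coords = np.asarray(coords, dtype=float)
    expr = np.asarray(expr, dtype=float)
    # Squared spatial distance between every pair of spots, shape (m, m).
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    # Squared expression difference between every pair of spots, shape (m, m).
    e2 = ((expr[:, None, :] - expr[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian kernel (an assumption): close pairs weigh ~1, distant pairs ~0.
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # ignore self-pairs
    return float((w * e2).sum() / w.sum())

# Spots 0 and 1 are neighbors; spot 2 is far away from both.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
neighbors_agree = np.array([[1.0], [1.0], [5.0]])   # only the distant spot differs
neighbors_differ = np.array([[1.0], [5.0], [1.0]])  # the two close spots differ
loss_agree = spatial_smoothness_loss(coords, neighbors_agree)
loss_differ = spatial_smoothness_loss(coords, neighbors_differ)
print(loss_agree, loss_differ)
```

Both toy expression profiles contain the same values, but the penalty is much larger when the disagreement occurs between neighboring spots, which is the behavior the paragraph above describes.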

Various methods outlined in subsequent sections utilize distinctive approaches to tackle the stated problem, each employing strategies specifically designed to address different aspects of it. The details of these strategies, crucial to resolving the problem, are not included in the problem description section but are thoroughly explored in later sections. For example, these methodologies differ in their techniques for merging gene expression values across multiple slices, consolidating multiple spots into a single coordinate within the CCS, and in the precise formulation of the loss function.

Problem scope, significance, and inherent challenges

In this subsection, we outline the scope, discuss the significance of solving the problem, and address a few of the inherent challenges associated with the problem.

Scope

To outline the scope of tasks involved in aligning and integrating ST data, it is essential to consider two distinct types: aligning and integrating homogeneous tissue slices, and aligning and integrating heterogeneous tissue slices. Table 1 summarizes the foundational aspects of these two types.

Table 1.

Spatial alignment & integration of multiple tissue slices task scope

| Aspect | Homogeneous alignment: within dataset | Homogeneous alignment: across dataset | Heterogeneous alignment: within dataset | Heterogeneous alignment: across dataset |
| --- | --- | --- | --- | --- |
| Input source | Same tissue section, spatial region(s), consecutive slices | Same or similar regions, different samples, same tissue section, may not be consecutive slices | Same tissue section, similar or different region(s), nonconsecutive tissue slices | Different or similar sections, spatial regions, or time points. Nonconsecutive tissue slices |
| Overlap between slices | Full overlap | Maximum overlap | Full or partial overlap | Partial or no overlap |
| Protocols and configurations used for data collection | Same | Different | Same | Different |
| Mapping | Pairwise spot-to-spot mapping usually is sufficient | Pairwise mapping is sufficient, may need batch effect removal | Requires identifying spatial relationships, domains, or clusters | Requires spatial domain recovery, just pairwise mapping is not sufficient |
| Additional adjustments before alignment | Rarely needs explicit recovery of spatial regions or batch effect removal | Minor adjustments, batch effect removal needed to address small variability | Often requires explicit recovery of spatial regions to find overlaps | Explicit recovery of spatial regions and structural differences are needed to find overlap |

Homogeneous alignment within the same dataset or experiment involves aligning and integrating tissue slices that share similar characteristics, such as consecutive slices from the same spatial region, collected using the same protocol, and with full spatial overlap. When aligning and integrating data across different datasets, tissue samples may differ because they originate from various experiments or individuals. Despite these differences, the samples often exhibit maximum overlap between similar spatial regions. Maximum overlap refers to regions with high structural or morphological similarity, including cellular organization, tissue boundaries, or gene expression patterns, which serve as a foundation for effective homogeneous alignment across diverse datasets. For example, slices from different datasets representing adjacent sections of the same organ, such as the liver or brain, may share similar structural features, allowing them to be aligned and integrated into a CCS for downstream analysis. Pairwise mapping between spots from multiple slices is typically sufficient in homogeneous alignment, and explicit recovery of spatial regions is not needed, since the regions are identical and overlap almost fully.

Heterogeneous alignment, in contrast, deals with aligning and integrating tissue slices that most likely differ in spatial regions, protocols or technical configurations, and samples. This includes aligning nonconsecutive slices from different locations or time-points, where spatial regions may partially or fully overlap. For example, aligning heart tissue slices taken at different stages of development, such as from early embryonic stages to mature adult tissue, requires handling variations in structure, cell composition, and spatial organization as the heart grows and changes over time, but these slices still exhibit overlap or share similar spatial regions [10]. When samples are collected from across datasets or experiments, they often share similar spatial regions with partial overlap. So, explicit recovery of spatial regions is necessary. This can involve identifying domains, clusters, or cell types where there are similarities between regions or domains. Only performing pairwise mapping between spots is usually insufficient, and this type of task requires prior identification of similarities between regions or domains. Additionally, batch effect removal is needed to ensure the accuracy of the alignment and integration process [10, 11].

Significance

Both types of tasks outlined in Table 1 aim to improve resolution and enhance gene expression coverage in ST data. This process helps eliminate redundancies and noise in the raw data, making it more suitable for downstream analysis. Additionally, aligning and integrating the data preserves the spatial context, which is crucial for gaining deeper insights into biological processes. The alignment and integration of ST data can utilize the full potential of the dataset, improving spatial domain identification and enabling a comprehensive 3D view of the tissue section, which can potentially lead to the visualization of a complete tissue atlas. This process greatly benefits downstream analyses such as tissue profiling [7, 12], cellular trajectory analysis [13], and cell-type identification [7, 14, 15]. Furthermore, as ST data can be collected at different time-points or stages, it allows for the observation and analysis of tissue development or growth, providing insights into cell progression, disease dynamics, and the impact of marker genes [10, 13, 16–19].

Inherent challenges

ST data have some inherent characteristics, arising from the nature of the data acquisition techniques, that increase computational complexity. One key characteristic is the high dimensionality and multi-modality of ST data, which reflects the diverse gene expression profiles and spatial information captured within tissues [7]. ST datasets also exhibit variability, noise, and uncertainty, which can arise from various factors such as experimental conditions, tissue preparation, and sequencing artifacts [16]. They may be collected at different time-points and represent varying tissue architectures or conditions [10]. Spatial warping, which refers to distortions or deformations in the spatial coordinates of the data points, is often caused by tissue processing or imaging techniques [16]. Moreover, ST data have spatial dependency and spatial heterogeneity [2, 20], meaning that neighboring regions in the tissue may have similar gene expression profiles, and spatially distinct regions may exhibit different expression patterns. That is, different modalities are inherently correlated within a tissue context. ST data may also contain canonical structures, such as tumors or specific tissue architectures [13]. All of these factors should be considered when addressing the alignment and integration task. Figure 2 summarizes the inherent challenges of ST data.

Figure 2.

Figure 2.

An overview of inherent challenges in ST data, including data complexities (high dimensionality and multi-modality), structural complexities (variability in spatial patterns due to biological and environmental factors), spatial factors (warping, heterogeneity, and dependencies), and technical variabilities (configuration issues, imaging artifacts, noise, and resolution).

A general pipeline

To formulate the computational solution for aligning and integrating multiple tissue slices, we define a series of steps to find a transformation or mapping that aligns spatial locations across multiple slices. Figure 3 shows the basic components of a general pipeline to solve the problem. The input data consist of a 2D spatial grid or coordinate space representing each tissue slice and the gene expression level at each spot. The input may contain several tissue slices. It is also important to note that various methods use histology images [14, 21–23] as additional input data. Histology images provide complementary spatial information that can enhance alignment and integration. Dealing with a 2D spatial grid, gene expression, and histology images as input data increases the computational complexity and challenges of the alignment and integration task.

Figure 3.

Figure 3.

A general pipeline to solve the multi-slice alignment and integration problem with sequential steps involved in processing and analyzing ST data from raw inputs to downstream applications. The workflow highlights the required and optional components at each step, along with the commonly used methods for each component. It begins with (A) the input phase, where gene expressions, spatial coordinates, and histology images are gathered. (B) In the pre-processing stage, these inputs undergo normalization and data representation to standardize the data and make it amenable for further analysis. (C) The preparation stage focuses on reducing dimensionality and extracting relevant features, while identifying spatial dependencies critical for subsequent alignment. (D) During the alignment phase, low-dimensional embeddings are mapped to each other, optimized via a cost function to ensure accurate spatial registration. (E) Integration leads to the creation of a CCS and shared embeddings, which are crucial for accurate multi-slice analysis. (F) The output then undergoes visualization, including 3D reconstruction to contextualize the data spatially and visually. Finally, (G) the downstream phase leverages the processed data for various applications such as analyzing developmental stages, profiling tissues, exploring disease dynamics, and identifying spatial domains.

The pipeline starts with representing the raw input data. In recent studies, data matrices were used to represent the spatial grid or 2D coordinate system [7]. Another approach is to use a graph-based representation [10, 13, 15, 17, 19, 24, 25], where spots are represented as nodes in a graph and edges represent spatial relationships between spots. This is also called the neighborhood graph. For the image-based methods, the extracted features from the histology images are often represented as binary masks or feature vectors, which are then used as input to the alignment and integration task.
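
The neighborhood graph described above can be sketched with a k-nearest-neighbor rule on the spot coordinates. This is only one common way to define spatial edges, chosen here for illustration. The function name `knn_graph`, the value of `k`, and the toy coordinates are hypothetical, and the reviewed tools may instead use radius-based or platform-specific neighborhoods.

```python
import numpy as np

def knn_graph(coords, k=2):
    """Boolean (m, m) adjacency matrix linking each spot to its k nearest spots."""
    coords = np.asarray(coords, dtype=float)
    m = coords.shape[0]
    # Squared Euclidean distance between every pair of spots.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                # a spot is not its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]     # indices of the k closest spots
    adj = np.zeros((m, m), dtype=bool)
    adj[np.repeat(np.arange(m), k), nearest.ravel()] = True
    return adj | adj.T                          # symmetrize: undirected edges

# Four spots on a line; with k = 1 each spot links to its single closest spot.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
adj = knn_graph(coords, k=1)
print(adj.astype(int))
```

The dense pairwise distance matrix keeps the sketch short but scales quadratically with the number of spots, so a spatial index would be preferable for full-size slices.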

These high-dimensional data matrices can be challenging to analyze and visualize. The following step in the pipeline is to reduce the complexity of the high-dimensional input data. In order to extract only the most informative features and make best use of the input data, some pre-processing should be taken place. Normalization and standardization can ensure that gene expression levels are comparable across different spots. Common normalization methods include total count normalization, where the total number of counts in each spot is equalized, and scaling normalization, where the counts are scaled to a common total count or to the total count of a reference spot. Standardization methods, such as z-score normalization, are used to scale the expression values to have a mean of zero and a standard deviation of one. Another important step is dimensionality reduction. Dimensionality reduction techniques, such as principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), or Uniform Manifold Approximation and Projection (UMAP) are used to reduce the data to a lower-dimensional space while retaining key features. PCA is widely used for dimensionality reduction for its ability to capture global variance with relatively little distortion [26]. t-SNE and UMAP, although primarily designed for visualization, can still serve as valuable tools for uncovering complex, nonlinear patterns within the data. These methods prioritize local neighborhood structures, making them highly effective at revealing latent clusters, manifold structures, and spatial relationships that may not be immediately apparent in a higher-dimensional space. This ability to highlight nonlinear structures makes t-SNE and UMAP particularly useful for understanding underlying biological variations. However, it is important to note that these methods are more sensitive to hyperparameters and may introduce greater distortions than PCA [26]. 
Graph-representation-based methods use encoders to reduce the dimensionality of the input data by learning a compressed representation that captures the essential features of the data in a lower-dimensional latent space [15, 24]. Low-dimensional data also improve computational efficiency in downstream analyses.
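The pre-processing steps described above can be sketched on a toy spots-by-genes matrix. This is a minimal illustration, not the procedure of any reviewed tool: the function names, the target sum of 10,000, the log transform, and the choice of 10 components are all assumptions made for the example.

```python
import numpy as np

def normalize_total(counts, target_sum=1e4):
    """Total count normalization: scale each spot (row) to the same total."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid dividing by zero for empty spots
    return counts / totals * target_sum

def zscore(x):
    """Standardize each gene (column) to zero mean and unit standard deviation."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant genes unchanged
    return (x - mu) / sd

def pca(x, n_components):
    """Project centered data onto its top principal components via SVD."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T

# Toy data (assumption): 200 spots, 50 genes, Poisson-distributed counts.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(200, 50)).astype(float)

norm = normalize_total(counts)
scaled = zscore(np.log1p(norm))
embedding = pca(scaled, n_components=10)
```

Each row of `embedding` is a low-dimensional representation of one spot that later alignment steps could take as input.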

Once the data has been reduced to a lower-dimensional space, embeddings can be generated for each tissue slice. These embeddings capture both the spatial relationships between spots and the gene expression patterns within each spot. By aligning the embeddings of different tissue slices, an optimal transformation can be found that preserves spatial relationships [7, 14, 27]. Finally, the aligned embeddings can be integrated into a CCS while maintaining spatial relationships and gene expression patterns. The CCS represents shared low-dimensional embeddings between the aligned tissue slices. Based on the data representations, this can be data matrices or neighborhood graphs. CCS can be leveraged for further downstream analyses.

Tools

In this section, we present several computational tools developed to tackle the alignment and integration of multiple tissue slices. These tools address various aspects of the defined problem scope, as outlined in Table 1. We categorize them into three main groups based on their underlying methodologies (shown in Table 2): statistical mapping (SM)-based methods, image processing and registration (IPR)-based methods, and graph-based (GB) methods. Figure 4 gives a schematic overview of the problem and demonstrates how each of these groups tackles it. The first group, SM-based methods, relies solely on statistical models to capture low-dimensional embeddings from the input data and to align and integrate them into a CCS. IPR-based methods use images either as an input or as a supporting aid to find the proper location for alignment. The last group, GB methods, represents the input data as mathematical graphs and tries to find the optimal mapping between the nodes based on their spatial relationships. Although a few models overlap between the groups defined here, we categorize them based on their dominant computational methods. We evaluate each method’s strengths and weaknesses, considering its effectiveness in handling the complexities of ST data.

Figure 4.


A comprehensive overview of the alignment and integration problem using various methodological approaches, which we have categorized into three distinct groups: SM-based methods, IPR-based methods, and GB methods. (A) A visual representation of tissue sections segmented into multiple slices. (B) Diverse input data representations employed by the SM, IPR, and GB groups, highlighting their unique approaches. (C) Visualization of the mapping techniques applied to the input representations. (D) Various forms of the outputs in the CCS, showcasing how each methodology consolidates and visualizes the integrated spatial and gene expression data.

Statistical mapping

The first group, SM-based methods, relies on statistical models in each step of the alignment and integration process: capturing low-dimensional embeddings from input data, establishing relationships among these embeddings, and finally integrating them into a unified space. These methods typically consider input data (tissue slices) as samples from a distribution and utilize statistical techniques such as dimensionality reduction, clustering, and probabilistic modeling to extract meaningful features. Moreover, they leverage spatial regression (SR) methods to find the relationship between gene expression and spatial location. Using these relationships, they optimize a cost function to align one distribution to the other, or apply statistical inference to project them onto a shared space. It is crucial for these methods to ensure that the aligned and integrated output preserves both the spatial relationships and the gene expression patterns of the input.

The high-dimensional and multi-modal nature of ST data increases computational complexity. Additionally, the hierarchical spatial structure introduces heterogeneity and variations in the distributions. The pre-processing step of the ST data alignment and integration task requires data preparation and representation that preserve the spatial relationships and expression counts. Normalization steps such as log transformation and normalization by the total number of expression counts may also be needed, as done by [10, 16]. This process helps to stabilize the variance across genes and tissue regions. Other variations or distortions can occur in tissue slices during data collection and imaging. These slice-specific deformations can be caused by factors such as tissue processing, slicing thickness, and imaging conditions [16]. Adjustments made during the pre-processing stage to account for these slice-specific deformations can be helpful for nonlinear and partial alignment between slices.

SM-based methods are particularly useful for dealing with these complex data representations or high-dimensional gene expression profiles as they can effectively reduce the dimensionality of the data while preserving its spatial information. Various methods use techniques like PCA [20], t-SNE, or UMAP. These methods can efficiently reduce the dimensionality of the data and extract key features that capture the variability in gene expression across the spatial locations.

To extract the spatial heterogeneity and spatial dependence between neighboring locations, spatial autocorrelation analysis can be performed. Splotch [28] applies a conditional autoregressive (CAR) model in its hierarchical model. PRECAST [20] also uses an intrinsic CAR model to capture the dependence of each spot on its immediate neighbors. Spatial clustering can help to identify spatially coherent regions in the data. Based on the spatial dependencies, identifying spatial clusters, or groups of spatial locations that exhibit similar gene expression patterns, can be helpful in the alignment and integration task. It can also provide insights into different microenvironments, cell types, or regions. PRECAST [20] uses a Potts model in the spatial clustering step to promote spatial smoothness in the cluster label space. This model encourages neighboring spots to share the same cluster label, which is biologically intuitive as adjacent tissue spots are more likely to belong to the same cellular or tissue types. It uses a Gaussian mixture model for clustering. SR can be used to model how gene expression levels vary across different locations (2D spots) within the same tissue slice. Understanding spatial variation within a tissue slice can reveal patterns and relationships in the gene expression data from each spot, which can then be leveraged to find similarities with other slices.
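The smoothing effect of a Potts prior can be illustrated with its energy term, which counts neighboring spot pairs that carry different cluster labels. The sketch below is a generic Potts penalty under assumed names (`potts_energy`, `beta`) and a toy chain of spots. It is not PRECAST's implementation, whose exact formulation may differ.

```python
import numpy as np

def potts_energy(labels, edges, beta=1.0):
    """Potts penalty: beta times the number of neighboring pairs with different labels."""
    labels = np.asarray(labels)
    edges = np.asarray(edges)
    disagree = labels[edges[:, 0]] != labels[edges[:, 1]]
    return beta * int(disagree.sum())

# Toy neighborhood graph (assumption): a chain of five spots 0-1-2-3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
smooth = [0, 0, 0, 1, 1]  # a single domain boundary
noisy = [0, 1, 0, 1, 0]   # the label flips at every edge

print(potts_energy(smooth, edges))  # 1.0
print(potts_energy(noisy, edges))   # 4.0
```

A prior proportional to exp(-energy) therefore favors the spatially coherent labeling over the noisy one.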

Various methods employ Gaussian process (GP) regression. One advantage of GP regression models is that they can incorporate prior knowledge or assumptions about the underlying relationship through their mean and covariance (kernel) functions [49]. The mean function captures expected trends, such as linear growth, while the covariance function defines relationships between data points, enabling smoothness, periodicity, or discontinuities. The covariance matrix, derived from the kernel function, encodes pairwise similarities between data points and forms the core structure of the GP. Through the choice of kernel, the covariance matrix encodes assumptions about the data’s behavior, such as smoothness, periodicity, or spatial continuity, before any observations are made. When combined with observed data, the prior distribution is updated to form the posterior distribution, enabling the model to refine predictions while maintaining biologically plausible structures and providing uncertainty estimates. GPSA [16] incorporates a mean function that accounts for spatial distortions, such as spatial warping and misalignments between tissue slices. This ensures that the GP not only models the underlying gene expression trends but also corrects for distortions within slices, improving the accuracy of alignment and integration while effectively capturing complex spatial dependencies in ST data. Using landmarks as prior information, Eggplant [33] learns the spatial relationship between feature values and landmarks. In this case, landmarks are manually selected data points that have spatial coherence. GPs are also useful for complex, nonlinear, noisy, and sparse data because they can provide uncertainty measures. As mentioned earlier, ST data is inherently nonlinear and may exhibit deformities due to its high-dimensional nature and collection from various tissue positions.
GP regression can capture variations and irregularities in gene expression across different tissue regions. The uncertainty measures offered by GPs highlight areas with high variability or sparse data. For example, GP regression facilitates a more accurate and nuanced understanding of the spatial relationships and potential deformities, as demonstrated by GPSA [16], which learns the spatial pattern and estimates slice-specific deformation using warping functions.
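How a GP turns a kernel prior and observed expression values into a posterior mean with uncertainty can be sketched as follows. This is a minimal zero-mean GP with a squared-exponential kernel; the kernel choice, length scale, noise level, and toy data are assumptions for illustration and do not reproduce the models of GPSA or Eggplant.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 2D coordinates."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the test locations."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(k_ss) - (v**2).sum(axis=0)
    return mean, var

# Toy slice (assumption): 40 spots with a smooth expression gradient plus noise.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(40, 2))
expr = np.sin(4 * coords[:, 0]) + 0.05 * rng.normal(size=40)

# Query one location on an observed spot and one far outside the tissue.
queries = np.vstack([coords[:1], [[5.0, 5.0]]])
mean, var = gp_posterior(coords, expr, queries, length_scale=0.3)
```

The posterior variance is small at the observed spot and returns to the prior variance far from any data, which is the behavior that lets GP-based methods flag sparsely sampled regions.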

The mapping estimation of two different low-dimensional embeddings can be achieved by several probabilistic models. Bayesian inference models are useful for handling the uncertainty in the mapping process [16, 28, 33]. These models rely on a prior distribution, often defined by GPs, to capture assumptions about spatial patterns, such as smoothness or continuity. By incorporating this prior knowledge, Bayesian models can align multiple tissue slices while accounting for variability and distortions that may arise between slices. For instance, GPSA [16] uses GP-based warping functions to infer transformations that preserve spatial relationships across slices. Bayesian inference further refines these mappings by updating the prior with observed data to form a posterior distribution, which provides estimates of the most likely alignments along with measures of uncertainty. Methods often employ sampling techniques to explore the posterior distribution and identify the optimal mapping parameters by generating a range of plausible solutions based on the observed data and prior assumptions. This iterative sampling approach is particularly useful when the posterior distribution is complex or high-dimensional, ensuring that the model can capture variability and provide robust predictions. For example, Splotch [28] performs Bayesian inference using the adaptive Hamiltonian Monte Carlo sampler, which efficiently generates samples from the posterior to improve alignment accuracy.

Another way to find mappings between two tissue slices (distributions) is optimal transport (OT), whose minimal cost defines the Wasserstein distance [50]. OT compares two distributions by measuring the minimum cost needed to transform one distribution into the other. A gradient-based optimizer is often used to iteratively improve this mapping and converge to the optimal solution. During this optimization process, a regularization step adjusts the parameters involved in OT so that the OT plan is adapted and smoothed. This makes it more stable and better able to generalize to unseen data. The PASTE [7] algorithm performs pairwise slice alignment using fused Gromov–Wasserstein OT, with a hyperparameter that balances transcriptional and spatial information, to find the mapping between spots in different slices that minimizes both transcriptional dissimilarity and spatial distance. In center slice integration, PASTE [7] combines multiple slices into a single center slice with a low-rank transcript count matrix and high similarity to the individual slices by using a fused Gromov–Wasserstein barycenter with non-negative matrix factorization (NMF) to find a consensus gene expression matrix. PASTE2 [14], an improvement over PASTE [7], allows for partial alignment by introducing a parameter in OT that controls the fraction of spots to align, providing more flexibility in aligning slices. PASTE [7] plays an important role in addressing the problem of alignment and integration of multiple slices as well as in inspiring other researchers to investigate this problem. Many other tools borrow its ideas and methodologies to solve their own versions of the problem. For example, GraphST [39] uses PASTE’s [7] center alignment algorithm to align and integrate multiple tissue slices into a single consensus slice. DeST-OT [18] also uses OT to align multiple slices from different time points.
OTVI [27] uses OT by minimizing a cost function that balances the dissimilarity between feature vectors in the CCS and each omics slice with the Euclidean distance between spot coordinates. This is subject to constraints that each mapping is a probabilistic coupling between spots. The optimization is performed using a Block Coordinate Descent algorithm that alternates between optimizing the representation matrix and optimizing the mappings. ST-GEARS [38] also utilizes the OT framework with a self-adaptive regularization strategy and the Conditional Gradient optimizer for optimization.
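The idea of a probabilistic coupling between spots that trades off expression dissimilarity against spatial distance can be sketched with entropy-regularized OT solved by Sinkhorn iterations. This is a deliberately simplified stand-in: it is not the fused Gromov–Wasserstein formulation of PASTE, and it assumes both slices already share a coordinate frame so that cross-slice spatial distances are meaningful. The balance parameter `alpha`, the regularization strength `reg`, and the toy data are assumptions.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iter=500):
    """Entropy-regularized OT plan between marginals a and b via Sinkhorn scaling."""
    k = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (k.T @ u)
        u = a / (k @ v)
    return u[:, None] * k * v[None, :]

# Toy data (assumption): slice B is a shuffled copy of slice A.
rng = np.random.default_rng(0)
n = 30
coords_a = rng.uniform(size=(n, 2))
expr_a = rng.normal(size=(n, 5))
perm = rng.permutation(n)
coords_b, expr_b = coords_a[perm], expr_a[perm]

# Cost mixes expression dissimilarity and spatial distance, weighted by alpha.
alpha = 0.5
d_expr = ((expr_a[:, None, :] - expr_b[None, :, :]) ** 2).sum(axis=-1)
d_space = ((coords_a[:, None, :] - coords_b[None, :, :]) ** 2).sum(axis=-1)
cost = (1 - alpha) * d_expr / d_expr.max() + alpha * d_space / d_space.max()

a = np.full(n, 1.0 / n)  # uniform mass on the spots of slice A
b = np.full(n, 1.0 / n)  # uniform mass on the spots of slice B
plan = sinkhorn(cost, a, b)
```

Entry `plan[i, j]` is the mass moved from spot `i` of slice A to spot `j` of slice B. Its rows and columns sum approximately to the prescribed marginals, which is the coupling constraint mentioned above.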

This mapping specifies which spots in one slice correspond to which spots in another slice, based on their transcriptional profiles and spatial locations. After finding the mapping, models use different mechanisms to integrate or register spatially resolved gene expression measurements into a CCS. The integration step ensures that the gene expression data from different slices are aligned properly, allowing for accurate comparison and analysis of gene expression patterns across the tissue section. Splotch [28] uses specific anatomical features as landmarks for registration instead of directly registering histological images, which can introduce more variability. The second layer of GPSA [16] utilizes GP functions to model phenotypic readouts at each location in the CCS. GPSA’s approach allows it to capture complex relationships between spatial coordinates and phenotypic readouts, enabling accurate alignment and integration of spatial data. It models readouts as a weighted linear combination of GPs through a linear model of coregionalization, allowing joint modeling of multiple types of readout modalities across multiple slices. PRECAST [20] projects batch effects and biological effects onto the space of biological effects between cell/domain types to align cell/domain clusters across multiple tissue slides. Eggplant [33] focuses on transferring spatial features to a defined reference system by using a GP to learn the relationships between feature values and distances to shared spatial landmarks, which are features consistently located across individuals. ST-GEARS [38] uses Procrustes analysis to align two sets of points in Euclidean space, which involves scaling, rotating, and translating one set of points to minimize the distance between corresponding points in the two sets. This method also introduces an elastic field to represent the deformation of the image or dataset, allowing for flexible alignment that can account for local variations in shape or intensity.
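The Procrustes step can be sketched for the rigid case, where a rotation and a translation are estimated from already matched spot pairs using a singular value decomposition. The function name and toy data are assumptions, scaling is omitted for brevity, and the sketch does not reproduce ST-GEARS's implementation or its elastic field.

```python
import numpy as np

def procrustes_rigid(source, target):
    """Rotation r and translation t minimizing ||source @ r.T + t - target||."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    h = (source - mu_s).T @ (target - mu_t)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0] * (source.shape[1] - 1) + [d])
    r = vt.T @ correction @ u.T
    t = mu_t - mu_s @ r.T
    return r, t

# Toy data (assumption): the target slice is a rotated and shifted copy of the source.
rng = np.random.default_rng(0)
source = rng.uniform(size=(50, 2))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
shift = np.array([2.0, -1.0])
target = source @ rot.T + shift

r, t = procrustes_rigid(source, target)
aligned = source @ r.T + t
```

With exact correspondences and no noise, the estimated transform recovers the simulated rotation and shift. With a soft OT mapping, each pair would instead be weighted by its transported mass.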

The methods discussed in this context primarily rely on statistical modeling for the alignment and integration task. While some methods, such as Eggplant [33], ST-GEARS [38], and PASTE2 [14], utilize histology images as supplementary information, the core task of alignment and integration is predominantly driven by statistical models rather than relying heavily on histology images as the primary input.

Image processing and registration

Methods in the second group take image-based inputs and employ fundamental IPR techniques for feature extraction in alignment and integration tasks. These methodologies apply a sequence of pre-processing and registration procedures. The general approach involves extracting and processing spatial information from histological images to establish correspondences between different slices, facilitating the alignment of spatial features and structures across multiple slices. STIM [22] adapted an alignment strategy originally designed for registering large electron and light microscopy datasets to achieve 3D alignment [51–53]. It was built upon the ImgLib2 [54] framework to provide comprehensive support for various ST data processing tasks. ImgLib2 [54] defines images as functions mapping coordinates in n-dimensional space to values, accommodating irregularly spaced datasets without imposing constraints on size, dimensionality, or data type, making it apt for large biological image datasets. STIM [22] extends ImgLib2 [54] functionalities to enable efficient loading, processing, and visualization of ST data. STUtility [21] processes H&E images by downsizing, thresholding, blurring, and clustering them into superpixels. For normalization, it initially filters out genes with low abundance and spots lacking sufficient information. Subsequently, it applies normalization techniques and decomposes the data into factor activities using NMF. STUtility [21] identifies gene drivers based on the top-ranked features and utilizes the resulting low-dimensional representation for clustering and spatial autocorrelation analysis.

Image registration is a commonly used technique in biomedical imaging, extensively used to align two or more images of the same subject captured at different times, from varying viewpoints, or by distinct sensors [55]. This technique plays a pivotal role in aligning histology images with ST data, facilitating the interpretation and analysis of the spatial organization of gene expression within tissues. Landmark-based (LB) image registration techniques require pre-identified landmarks or markers in both images or datasets being aligned [56]. These landmarks serve as reference points to guide the alignment process, ensuring that corresponding spatial features across datasets are matched correctly. While LB alignment can enhance alignment precision, it requires additional pre-processing to identify landmarks, which can be time-consuming and may introduce bias if landmarks are chosen incorrectly. STUtility [21] uses pre-defined landmarks for image registration. It aligns tissue sections using the Iterative Closest Point (ICP) algorithm and maps pixel coordinates between samples and a reference image. The aligned images are transformed for smooth alignment. This comprehensive approach allows for a holistic analysis of gene expression patterns within tissue. However, the availability of landmarks for ST datasets is not guaranteed.
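The ICP idea, alternating between matching each point to its nearest neighbor and re-estimating a rigid transform, can be sketched in a few lines. This is a generic textbook ICP on toy 2D points with brute-force nearest-neighbor search. The function names and iteration count are assumptions, and it is not the STUtility implementation.

```python
import numpy as np

def best_rigid(src, dst):
    """Least-squares rotation and translation mapping paired 2D points src onto dst."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    u, _, vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against reflections
    r = vt.T @ np.diag([1.0, d]) @ u.T
    return r, mu_d - mu_s @ r.T

def icp(source, target, n_iter=20):
    """Iterative Closest Point: match nearest neighbors, refit, and repeat."""
    moved = source.copy()
    for _ in range(n_iter):
        d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)
        nearest = target[d2.argmin(axis=1)]  # current correspondence guess
        r, t = best_rigid(moved, nearest)
        moved = moved @ r.T + t
    return moved

# Toy data (assumption): a 6 x 6 grid of spots, slightly rotated and shifted.
g = np.arange(6) - 2.5
target = np.array([[x, y] for x in g for y in g])
theta = np.deg2rad(3.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
source = target @ rot.T + np.array([0.1, -0.1])

aligned = icp(source, target)
```

ICP only converges to the correct alignment when the initial misalignment is small relative to the spot spacing, as in this toy case, which is one reason landmarks or a coarse pre-alignment are typically needed.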

Landmark-free (LF) image registration techniques do not rely on predefined landmarks [55]. Instead, they use algorithmic methods to automatically identify correspondences between datasets based on features. LF methods are more flexible and less labor-intensive but may face challenges in accurately aligning datasets with subtle or complex spatial variations or when integrating gene expression data. For example, CODA [57] employs nonlinear image registration to align slices and reconstruct tissue micro-environments at subcellular resolution, but does not incorporate gene expression data, which excludes it as an ST tool. Nonetheless, it lays the foundation for 3D reconstruction of large tissues. STIM’s [22] spatial alignment tasks rely on linear transformations and RANSAC for robust model estimation. It employs GP regression to transfer observed spatial features to a defined reference system without relying on predefined landmarks. STaCker (Spatial Transcriptomics Common coordinate builder) [23] is another landmark-free method, which employs a deep learning architecture. It uses a U-Net backbone with skip connections trained in a tissue-agnostic manner using synthetic data. The loss function, based on the Dice score between reference and moved label maps, prioritizes aligning cell regions over achieving precise cell-level alignment. It harnesses the spatial arrangement of tissue spots or cells derived from transcriptome data, represented as a contour map, to guide the alignment process. STalign [12] utilizes diffeomorphic metric mapping, a landmark-free image registration technique that finds a smooth mapping between all voxels in the images. The process begins with rasterization, converting vector graphics into a raster image, followed by representing the spatial locations of individual cells using varifold measures. Gaussian kernels are then applied to smooth out the spatial distribution of cells, reducing noise and refining spatial relationships.
STalign [12] then solves for the mapping that minimizes dissimilarity between images, guided by an objective function comprising regularization and matching terms. This objective function incorporates a Gaussian mixture model that groups similar data points based on their inherent structure to decompose the spatial data into matching components, background noise, and artifacts of the image. Structural dissimilarities are further accounted for through an image contrast function, which considers cell density variations and imaging modality differences while reinforcing the matching terms. The parameters of the objective function are optimized using steepest gradient descent over multiple epochs. Finally, STalign [12] applies the computed transformation to the source’s original cell positions to generate aligned coordinates.
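The Dice score used in STaCker's loss measures the overlap between a reference label map and a moved label map. A minimal sketch on binary masks follows. The function name, the smoothing constant `eps`, and the toy masks are assumptions, and a training loss would typically use a differentiable version such as one minus the Dice score.

```python
import numpy as np

def dice_score(mask_a, mask_b, eps=1e-8):
    """Dice overlap between two binary masks: 2|A and B| / (|A| + |B|)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    inter = np.logical_and(a, b).sum()
    return (2.0 * inter + eps) / (a.sum() + b.sum() + eps)

# Toy label maps (assumption): a 4 x 4 tissue region, and the same region shifted by two columns.
reference = np.zeros((8, 8), dtype=bool)
reference[2:6, 2:6] = True
moved = np.zeros((8, 8), dtype=bool)
moved[2:6, 4:8] = True

print(round(dice_score(reference, reference), 3))  # 1.0
print(round(dice_score(reference, moved), 3))      # 0.5
```

Because the score depends only on region overlap, it rewards bringing tissue regions into register without requiring exact cell-to-cell correspondence.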

Graph-based

Various methods define the input ST data as a mathematical graph, where each spot in a tissue slice corresponds to a node, and the spatial relationships between these spots are represented as edges. These methods first need to construct the neighborhood graphs. For example, GraphST [39] constructs an undirected neighborhood graph G = (V, E), where V represents the set of spots and E is the set of edges connecting these spots. The neighbors of a spot are defined based on its proximity to other spots computed from the spatial location information using Euclidean distance.
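Constructing such a neighborhood graph from spot coordinates can be sketched as follows. The radius-based rule, the function name, and the toy 3 x 3 grid are assumptions for illustration. Individual tools may instead connect each spot to a fixed number of nearest neighbors.

```python
import numpy as np

def radius_graph(coords, radius):
    """Boolean adjacency matrix linking spots within a Euclidean radius of each other."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    adj = d <= radius
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

# Toy slice (assumption): spots on a 3 x 3 grid with unit spacing.
coords = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
adj = radius_graph(coords, radius=1.1)

# Undirected edge list E, with each edge listed once.
edges = np.argwhere(np.triu(adj))
print(len(edges))  # 12
```

The adjacency matrix (or the edge list) together with the per-spot expression vectors forms the input to the graph encoders discussed next.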

GB algorithms offer several advantages for handling complex ST data. First, they are useful for high-dimensional and multi-modal ST data. GB encoders can effectively capture and preserve the inherent features of ST data while transforming it into a latent representation. Autoencoders, which consist of an encoder and a decoder, can be employed for generating low-dimensional gene representations, as demonstrated by SpatiAlign [17]. In addition, attention mechanisms can enhance latent representation learning by focusing on relevant information, so that the feature extraction process disregards irrelevant or noisy data and emphasizes the spatial relationships. STAligner [10] and BiGATAE [48] utilize attention autoencoders to extract spatially aware embeddings, capturing the spatial relationships between spots. Similarly, Graspot [19] employs a graph attention network (GAT) to learn embeddings of spots from the spatial neighbor network using a self-attention mechanism.

Second, from these autoencoder-based architectures, it is possible to reconstruct the original input data from the latent space using a decoder. The primary purpose of this reconstruction is to ensure that the autoencoder is able to learn a meaningful representation of the input data in the latent space. By comparing the reconstructed output with the original input, the autoencoder learns to minimize the reconstruction error, which forces it to capture the most important features of the data in the latent space. SpatiAlign [17] and Graspot [19] use the decoder network for gene expression reconstruction. SpatiAlign’s [17] decoder reverses aligned representations into raw gene expression space, using gene expression profiles and spatial neighboring graphs. Graspot [19] employs a GAT module to reconstruct original gene expression profiles, minimizing reconstruction loss. Autoencoders have also proven to be effective in single-slice ST data analysis. For instance, STAGATE [2] can extract features from low-dimensional embeddings generated by the encoder, reconstruct the original input, and optimize performance through comparison. These low-dimensional embeddings were later utilized to identify spatial domains within a single slice. Building on this approach, tools like STAligner [10] and Graspot [19] adopted the same autoencoder-based architecture to align and integrate multiple input slices. Figure 5 illustrates a general workflow for applying autoencoders in ST alignment and integration tasks.

Figure 5.


A general workflow for using autoencoders in ST alignment and integration with multiple slices. The process begins by modeling each input slice as a spatial network based on an adjacency matrix derived from spatial coordinates and Euclidean distances within a pre-defined radius. The encoder extracts features by iteratively refining edge weights and consolidating spatial and gene expression information, while optionally applying attention mechanisms to prioritize important spots. This produces low-dimensional embeddings, which are passed to two components: (i) a decoder that reconstructs input slices and calculates the reconstruction cost, and (ii) a transport cost module that evaluates the cost of mapping spots across embeddings. Both costs are minimized together to optimize the embeddings and mapping plan. Finally, aligned spots are integrated into a CCS, preserving spatial relationships and gene expression patterns for downstream analyses.

Third, this latent space can be used for feature learning using contrastive learning with spatially aware loss functions. For instance, SpatiAlign [17] incorporates augmentation-based contrastive learning to exploit potential information in ST datasets, using a memory bank to store final latent representations for each dataset and measuring similarity between spots/cells for self-batch/across-batch contrastive learning. STAligner [10] and ATAT [13] employ triplet loss functions. STAligner [10] constructs spot triplets based on these embeddings, consisting of anchor-positive and anchor-negative spot pairs. Anchor-positive pairs are mutual nearest neighbors with similar gene expressions from different slices, while anchor-negative pairs are from the same slice with different spatial positions and dissimilar expressions. The triplet loss reduces the distance between anchor and positive spots while increasing the distance between anchor and negative spots. By iteratively optimizing the graph attention autoencoder training and triplet construction, STAligner [10] generates batch-corrected embeddings that align and integrate the ST datasets to identify spatial domains across diverse tissue slices. ATAT [13] learns a spatially aware representation of each image tile centered on the ST spots using a convolutional neural network (CNN) with triplet loss. This representation enables the algorithm to identify the shortest path between user-specified start and end points on the slide based on similarity scores for adjacent tiles.
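The triplet loss described above can be written compactly. The sketch below uses the standard margin-based form on Euclidean distances between embeddings. The margin value and the toy embeddings are assumptions, and the triplet mining strategies of STAligner and ATAT are not reproduced.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean of max(d(anchor, positive) - d(anchor, negative) + margin, 0) over triplets."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Toy embeddings (assumption): two triplets in a 2D latent space.
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.0, 1.0], [1.0, 1.0]])  # distances to anchor: 1.0 and 0.0
negative = np.array([[3.0, 4.0], [1.0, 1.5]])  # distances to anchor: 5.0 and 0.5

print(triplet_loss(anchor, positive, negative))  # 0.25
```

The first triplet already satisfies the margin and contributes zero, while the second contributes 0.5 because its negative lies too close to the anchor. Minimizing the loss pushes such negatives away and pulls positives closer.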

Fourth, GB algorithms are particularly suitable for OT cost minimization. The objective is to identify an alignment that minimizes a defined cost function, which often represents the spatial distance between corresponding nodes (spots) in different slices. This approach leverages the inherent spatial relationships within the data to establish meaningful correspondences between spatial locations, facilitating the integration and analysis of gene expression patterns across multiple tissue slices. Graph matching algorithms can reveal these correspondences more clearly because they preserve the spatial relationships between spots. SLAT [15] and BiGATAE [48] focus on aligning datasets by minimizing the cost of a bipartite matching problem on their spatial graphs. SLAT [15] utilizes a lightweight graph-convolutional network to integrate molecular and spatial information, and adversarial alignment to learn cell embeddings that minimize the Wasserstein distance for graph matching. It also provides coordinate matching options for aligning spatial coordinates, with quality assessment and probabilistic matching to evaluate cell matches. SPIRAL [24] uses cluster-aware Gromov–Wasserstein OT for batch effect removal and coordinate alignment. It consists of two modules, SPIRAL-integration and SPIRAL-alignment, which work together to correct batch effects and align spatial relationships using a GraphSAGE network and domain adaptation. SPIRAL-alignment constructs Common Coordinate Matrices (CCMs) using cluster-aware Gromov–Wasserstein distance, assigning spots or cells to positions in the reference sample’s coordinate system based on shared cluster spots. The unbalanced OT (UOT) module of Graspot [19] computes the UOT cost between spots in different slices based on their latent vectors, aiming to align the slices in a common low-dimensional space.
The total loss minimized by Graspot [19] combines the reconstruction loss and the UOT loss, with a trade-off parameter controlling their relative importance.

Another approach to finding the correspondence between nodes in different graphs (i.e. different tissue slices) is graph adversarial learning. In graph adversarial learning, a discriminator is trained to distinguish between embeddings from the original graph and embeddings from a modified graph, which encourages the model to learn embeddings that preserve the graph structure. This approach can be applied to find correspondences between nodes in different graphs by using the discriminator to compare the embeddings of nodes in different graphs and identify nodes that are likely to correspond to each other. SPACEL [25] employs an adversarial learning algorithm to learn latent features shared across multiple ST slices. The Splane module of SPACEL [25] employs a graph convolutional network (GCN) and an adversarial learning algorithm to identify spatial domains by jointly analyzing multiple ST slices.

Experiments

To summarize the experimental designs demonstrated by various tools, we first highlight a selection of datasets that are frequently utilized in ST. Table 3 provides an overview of these datasets that use different technologies, spatial resolution, input materials, and available species. The chemistries used in ST data acquisition methods also differ based on input material compatibility. For example, fresh frozen tissues commonly utilize oligo-dT primed libraries for capturing polyadenylated transcripts, enabling whole-transcriptome analysis, whereas formalin-fixed paraffin-embedded (FFPE) samples typically rely on probe-based chemistries that are designed to detect targeted gene sets even in the presence of RNA degradation which happens in FFPE samples [58]. Fixed cells are often compatible with hybridization-based methods, which enable high-resolution spatial mapping of transcripts. It is also important to emphasize the critical role of pathology labs or individuals with expertise in tissue orientation and sectioning, as the success of these technologies depends heavily on proper sample preparation. For instance, some technologies require 8–10 μm tissue thickness, which may lead to challenges such as cells overlapping within sections. This overlap can obscure spatial relationships, necessitating the use of bioinformatics tools designed to deconvolute mixed signals and infer underlying cellular compositions. To the best of our knowledge, no computational tool currently exists to specifically address this challenge. This presents a potential opportunity for further research and development in this area.

Table 3.

Overview of spatial transcriptomics technologies

Technology | Resolution (μm) | Frequently used dataset(s) | Input material | Available species
10x Genomics Visium [30] | 55 | Human Brain [47], Breast Cancer & Development Heart, Mouse Olfactory Bulb [30] & Brain Serial Section [31] | Fresh Frozen | Human, Mouse
Slide-seq [40] | 10 | Mouse Hippocampus [40] | Fresh Frozen | Mouse
Slide-seqV2 [32] | 10 | Mouse Hippocampus [32] | Fresh Frozen | Mouse
MERFISH [41] | 0.1 | Mouse Brain [41] | Fixed Cells | Human, Mouse
SeqFISH+ [59] | 0.6 | Mouse Embryonic Development & Brain [59] | Fixed Cells | Mouse
Stereo-seq [43] | 0.5 | Mouse Embryo, Human Brain [43] | Fresh Frozen | Mouse
GeoMX DSP [58] | Varies | Human Breast Cancer & Lymph Node, Mouse Brain [58] | FFPE or Fresh Frozen | Human, Mouse
Xenium [42] | 0.2 | Human Breast Cancer [42] | FFPE or Fresh Frozen | Human, Mouse
Visium HD [60] | 2 | Human Breast Cancer [60] | FFPE or Fresh Frozen | Human, Mouse
HDST [61] | 2 | Mouse Brain and Breast Cancer [61] | Fresh Frozen | Human, Mouse
STARmap [62] | 2 | Mouse Brain [62, 63] | Fresh Frozen | Human, Mouse

In this section, we revisit the previously defined problem, its scope and inherent challenges to explore various experiments reported in recently proposed tools. We analyze how the implementation of different components within the general pipeline enhances the overall alignment and integration process by identifying the problem definition, scope and challenges. Subsequently, in the next section, we explore how authors have leveraged the outcomes of these alignment and integration tasks to enhance real-world applications and downstream analyses, thereby advancing biological insights.

Within dataset and technologies

Aligning and integrating multiple tissue slices within the same dataset or technology can significantly enhance spatial resolution and gene expression coverage. Such alignment and integration strategies can be used to stack tissue slices on top of each other, thereby creating a holistic 3D view of gene expression patterns within the tissue [7]. Homogeneous alignment within a dataset or technology involves consecutive slices originating from the same tissue section, ensuring a consistent spatial context for analysis. These slices are expected to correspond to identical spatial regions within the tissue, which facilitates alignment and integration and enables the formation of a coherent 3D view. Since the slices are from the same dataset or technology, they are likely to have been processed using the same experimental protocols and technical configurations, minimizing methodological discrepancies and reducing the inherent variability of the ST data. The possibility of a full overlap between slices further simplifies the alignment process. One commonly used dataset, the human dorsolateral prefrontal cortex (DLPFC) dataset [47], used the 10× Genomics Visium platform to produce ST data from the six-layered dorsolateral prefrontal cortex of the adult human brain. This dataset has been used by various alignment and integration tools [7, 10, 17] to demonstrate their effectiveness in homogeneous alignment within a dataset. For example, PASTE [7] considered all 12 tissue slices from three adult samples, with four adjacent slices per sample, and computed pairwise alignments for each pair of consecutive slices. It labeled each spot as white matter or one of six neocortical layers and compared these labels with the ground-truth annotation provided by DLPFC [47]. PASTE [7] also stacked the slices on top of each other to reconstruct a holistic 3D view of the adjacent slices.
SpatiAlign [17] and STAligner [10] considered a similar configuration of the DLPFC [47] dataset. However, they clustered similar spots into different regions of the cortical layers and then measured the similarity between the predicted clusters and the ground-truth clusters with the adjusted Rand index (ARI). SpatiAlign [17] also considered the mean weighted F1 score of the local inverse Simpson’s index (LISI). Higher ARI and mean weighted F1 scores suggest that the clustering method is more effective in producing clusters that align well with the ground-truth clusters and preserve the spatial relationships between data points.
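To make the ARI comparison concrete, the following is a minimal pure-Python sketch of the standard adjusted Rand index between a ground-truth annotation and a predicted clustering. The function name and the six-spot toy labels are hypothetical and are not taken from any of the reviewed tools, which typically rely on library implementations.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same spots (illustrative sketch)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must cover the same spots")
    n = len(labels_true)
    if n < 2:
        return 1.0  # a single spot admits only one partition

    # Contingency counts: spots sharing a (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical toy annotation: six spots in two cortical layers.
truth = ["L1", "L1", "L1", "L2", "L2", "L2"]
same_partition = ["a", "a", "a", "b", "b", "b"]  # relabelled but identical
one_error = ["a", "a", "b", "b", "b", "b"]       # one spot misassigned

ari_same = adjusted_rand_index(truth, same_partition)
ari_error = adjusted_rand_index(truth, one_error)
```

Because the ARI is invariant to how clusters are named, the relabelled partition scores 1, while a single misassigned spot in this small example already lowers the score considerably.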

Heterogeneous alignment within a dataset or technology allows for the inclusion of tissue slices from the same dataset that differ in spatial regions within the tissue and may partially or fully overlap. To find the mappings between slices from different spatial regions, pre-processing steps such as translation and rotation of the original slices are often performed, manually or automatically, to initially match the source and target slices. Such alignment enables the examination of gene expression across diverse anatomical structures or micro-environments. Unlike in homogeneous alignment, the samples in heterogeneous alignment might be collected using different experimental protocols and technical configurations. This flexibility accommodates the inherent heterogeneity of the dataset but requires the alignment and integration process to be more robust. Moreover, while tissue slices in heterogeneous alignment may be sourced from the same tissue section, they are not limited to consecutive slices. This means that nonconsecutive tissue slices, obtained from different locations or time-points, can be included in the analysis. STalign [12] aligned nine coronal slices, representing three biological replicates and spanning three locations with respect to bregma, assayed using MERFISH [41] technology. It effectively reduced spatial dissimilarities between slices caused by biological and technical variations. The alignment accuracy was evaluated by comparing the positions of manually identified structural landmarks before and after alignment. STalign [12] consistently reduced the root-mean-square error (RMSE) between landmarks compared to a supervised affine transformation, indicating higher alignment accuracy.
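The landmark-based evaluation described above can be illustrated with a small sketch. It computes the RMSE between paired landmarks before and after a known rigid transform (rotation plus translation) is undone. The helper names and landmark coordinates are hypothetical, and the sketch only illustrates the metric; it does not reproduce STalign's alignment procedure.

```python
import math


def rigid_transform(points, angle_rad, shift):
    """Rotate 2D points about the origin, then translate them."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return [
        (cos_a * x - sin_a * y + shift[0], sin_a * x + cos_a * y + shift[1])
        for x, y in points
    ]


def landmark_rmse(source, target):
    """Root-mean-square Euclidean distance between paired landmarks."""
    if len(source) != len(target) or not source:
        raise ValueError("need equally sized, non-empty landmark lists")
    squared = [
        (sx - tx) ** 2 + (sy - ty) ** 2
        for (sx, sy), (tx, ty) in zip(source, target)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical landmarks: the source slice is the target rotated by
# 90 degrees and shifted by (5, 0).
target_landmarks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
source_landmarks = rigid_transform(target_landmarks, math.pi / 2, (5.0, 0.0))

rmse_before = landmark_rmse(source_landmarks, target_landmarks)

# Undo the known transform: subtract the shift, then rotate back.
unshifted = [(x - 5.0, y) for x, y in source_landmarks]
aligned = rigid_transform(unshifted, -math.pi / 2, (0.0, 0.0))
rmse_after = landmark_rmse(aligned, target_landmarks)
```

In practice the transform is estimated rather than known, so the RMSE after alignment measures how well the estimated mapping brings corresponding structures together.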

The Stereo-seq [43] dataset sampled mouse embryo slices at different time stages. This makes it possible to align and integrate these slices in order to investigate spatiotemporal heterogeneity during mouse development. STAligner [10] and SpatiAlign [17] performed alignment and integration on the Stereo-seq [43] dataset by considering four mouse embryo slices sampled at different time stages to examine development during mouse organogenesis. This is challenging because the alignment and integration has to provide a common embedding space even though the slice sizes differ and other batch effects are present, and it has to align spatial domains consistently across the different time-points or stages. STAligner [10] and SpatiAlign [17] effectively aligned and integrated the slices from different time-points, which opened the door to further downstream analysis using the developmental trajectory. The Slide-seq [40] dataset provides mouse brain tissue slices with heterogeneous characteristics. They are collected from different regions of the mouse hippocampus [40] and exhibit partial overlap between slices and low transcript expression [10], which makes the alignment and integration task more challenging. STAligner [10] also demonstrated the automatic registration of seven consecutive mouse hippocampus slices from Slide-seq [40], aligning adjacent slices sequentially with translation and rotation about the z-axis and reconstructing the 3D shape of the mouse hippocampus tissue section. SpatiAlign [17] also utilized three mouse hippocampal slices from the same dataset to evaluate integration performance using LISI scores and clustering structural heterogeneity. Hierarchical clustering validated SpatiAlign’s [17] effectiveness in identifying brain regions, showing strong spatial aggregation with clear boundaries and consistency with anatomical structures.
SpatiAlign [17] was used to identify substructures of the hippocampus, including CA1, CA2, and dentate gyrus, across all three slices.
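As a rough illustration of what a LISI-style score measures, the sketch below computes, for each spot, the inverse Simpson's index of slice (batch) labels among its k nearest neighbours in an embedding. This is a deliberate simplification: neighbours are weighted equally, whereas the published LISI uses perplexity-based Gaussian weights. The function name and toy coordinates are hypothetical.

```python
from collections import Counter
import math


def knn_inverse_simpson(coords, batch_labels, k):
    """Per-spot inverse Simpson's index of batch labels among k neighbours.

    Simplified, unweighted variant of LISI for illustration only.
    """
    if not 0 < k < len(coords):
        raise ValueError("k must be between 1 and len(coords) - 1")
    scores = []
    for i, point in enumerate(coords):
        # Rank all other spots by Euclidean distance in the embedding.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(point, coords[j]),
        )
        counts = Counter(batch_labels[j] for j in others[:k])
        simpson = sum((c / k) ** 2 for c in counts.values())
        scores.append(1.0 / simpson)
    return scores


# Hypothetical embedding in which two slices (A and B) are well mixed.
mixed_coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
mixed_labels = ["A", "B", "A", "B"]
mixed_scores = knn_inverse_simpson(mixed_coords, mixed_labels, k=2)

# Hypothetical embedding in which the two slices remain separated.
split_coords = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0),
                (10.0, 0.0), (10.0, 1.0), (10.0, 2.0)]
split_labels = ["A", "A", "A", "B", "B", "B"]
split_scores = knn_inverse_simpson(split_coords, split_labels, k=2)
```

With two slices, scores near 2 indicate neighbourhoods that mix both slices (batch effects removed), while scores near 1 indicate neighbourhoods dominated by a single slice.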

Across dataset & technologies

When aligning tissue slices from different datasets or technologies, it is important to consider that, even in homogeneous alignment, slices from the same tissue section may exhibit slight differences in spatial regions and technical configurations due to variations in protocols or other factors. The goal is therefore to achieve maximum overlap between the datasets, ensuring that corresponding features in the tissue are aligned accurately. SLAT [15] reported benchmarking results against PASTE [7] and STAGATE [2] to show its performance on homogeneous alignment across datasets. The benchmark used consecutive slices from the same tissue generated by three representative technologies: 10× Visium [30], MERFISH [41], and Stereo-seq [43]. In these results, SLAT [15] exhibited higher alignment accuracy in recovering correct cell matching in expert-curated cell types and spatial regions. STAligner [10] and SpatiAlign [17] considered mouse olfactory bulb slices from different datasets. STAligner [10] was applied to integrate two mouse olfactory bulb slices produced by the Slide-seqV2 [32] and Stereo-seq [43] platforms, which show clear tissue structures but substantial batch effects. Results showed that STAligner [10] successfully deciphered the known tissue structures according to annotated laminar structures and the Allen Mouse Brain Atlas. SpatiAlign [17] aligned three mouse olfactory bulb datasets from 10× Genomics Visium [30] and Stereo-seq [43]. The authors identified spatial domains by clustering, reported the F1-score, and found that SpatiAlign [17] outperformed PRECAST [20] in their study.

Heterogeneous alignment across datasets involves tissue slices that originate from the same or different tissue sections and usually exhibit variations in spatial regions. Such datasets are typically derived using different protocols or technical configurations, leading to partial rather than complete overlap. This challenge is particularly evident when working with nonconsecutive tissue slices, which could be from different locations, tissues, or even different time-points. The technologies also differ in resolution. For example, 10× Visium [30] technology achieves a spatial resolution of 55 μm, with each capture site (spot) containing ∼1–10 cells. Slide-seq [40] enhances the spatial resolution to nearly cellular level (10 μm), while Stereo-seq [43] achieves sub-cellular resolution (0.22 μm) [10]. Various tools have demonstrated alignment across different scales or resolutions. SLAT [15] reported such an experiment using seqFISH [59] and Stereo-seq [43] datasets that differ in scale, detectability, and annotation resolution. SLAT [15] also reported cross-scale alignment on 10× Visium [30] and Xenium [42] slices, in which rare cell groups were identified. STalign [12] was also applied to a single-cell resolution ST dataset assayed by MERFISH [41] and a multi-cellular pixel resolution ST dataset assayed by Visium [30]. These datasets represented partially matched tissue sections. Using manually placed landmarks to initialize the alignment, STalign [12] aligned the datasets, resulting in high spatial gene expression correspondence despite differences in resolution and detection efficiency between the technologies. The study also assessed cell-type spatial correspondence, identifying putative cell types and matching them based on transcriptional similarity.
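One simple way to reason about such resolution gaps is to aggregate single-cell measurements into coarser pseudo-spots before comparing slices. The sketch below sums per-cell counts into square bins. It is only an assumption-laden illustration: the function name, cell coordinates, and counts are hypothetical, square bins are a simplification of real capture-spot layouts, and none of the reviewed tools is claimed to work this way.

```python
from collections import defaultdict


def bin_cells(cell_xy, cell_counts, bin_size_um):
    """Sum per-cell gene counts into square bins of side bin_size_um."""
    if bin_size_um <= 0:
        raise ValueError("bin_size_um must be positive")
    bins = defaultdict(lambda: defaultdict(int))
    for (x, y), counts in zip(cell_xy, cell_counts):
        # Integer grid index of the bin that contains this cell.
        key = (int(x // bin_size_um), int(y // bin_size_um))
        for gene, value in counts.items():
            bins[key][gene] += value
    return {key: dict(genes) for key, genes in bins.items()}


# Hypothetical single-cell measurements (coordinates in micrometres).
cell_xy = [(10.0, 10.0), (40.0, 20.0), (60.0, 5.0)]
cell_counts = [
    {"Gad1": 2},
    {"Gad1": 1, "Slc17a7": 3},
    {"Slc17a7": 4},
]

# Aggregate to a 55 um grid, the spot size quoted above for Visium.
pseudo_spots = bin_cells(cell_xy, cell_counts, bin_size_um=55.0)
```

The first two cells fall into the same bin and their counts are pooled, which mimics how a multi-cellular spot mixes signals from neighbouring cells.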

Applications

ST data alignment and integration allows precise tissue profiling with extensive gene expression coverage, which plays a crucial role in identifying distinct cell types and their interactions within tissues. Such detailed profiling enhances the understanding of tissue dynamics in developmental biology. These insights enable researchers to track developmental changes accurately and support drug development processes by targeting specific molecular pathways within exact tissue locations. Moreover, the identification of spatial biomarkers through this detailed profiling supports early diagnosis and effective disease monitoring. This section elaborates on the applications as discussed in the reviewed literature.

Tissue profiling

Integrative analysis provided significant improvements in downstream analysis. Tissue profiling could benefit from the large gene expression coverage of lowly expressed regions. A holistic 3D view could capture spatial relationships well, which helps in understanding biological systems. On the DLPFC [47] dataset, PASTE’s [7] integrated slice demonstrates its ability to recover known marker genes in an unsupervised manner. This highlights PASTE’s [7] capability to identify biologically relevant gene expression patterns without the need for prior knowledge, further emphasizing its utility in ST data analysis. STalign [12] exploits variations in cell densities that form visible structures, enabling alignment across samples and animals, particularly for tissues with highly prototypic structures like the brain. STalign’s [12] applicability extends to diverse ST technologies, as demonstrated by achieving structural correspondence for partially matched slices of the adult mouse brain assayed by different single-cell resolution ST technologies (such as Xenium [42]). For tissues with more inter-sample and inter-animal variation, alignment across serial sections is still achievable. For instance, STalign [12] can align serial sections of the developing human heart assayed at single-cell resolution with in situ sequencing (ISS). ATAT [13] leveraged ST data from 12 colon and 4 stomach samples and aligned tissue tiles based on visual similarities to study gene expression in relation to tissue landmarks. This method effectively distinguished healthy areas from those affected by disease, enhancing understanding of tissue architecture. The results highlighted distinct gene expression patterns aligned with specific tissue layers and conditions, affirming its capability to capture critical spatial gene expression dynamics and providing insights into the molecular mechanisms governing tissue health and disease.

Cell-type clustering

PASTE [7], PASTE2 [14], and SLAT [15] showed improved identification of cell types or regions from the integrated tissue section. PASTE [7] shows promising results in clustering analysis. It produces more spatially coherent gene expression clusters on the SCC dataset, indicating its ability to group genes based on their spatial relationships. Similarly, on the DLPFC [47] dataset, PASTE [7] generates more accurate clustering results compared to scRNA-seq integration methods that do not utilize spatial information. This suggests that PASTE [7] is effective in leveraging spatial information to improve clustering accuracy. BiGATAE [48] outperformed various single-slice clustering methods on the DLPFC dataset [47]. Splotch [28] uses posterior estimates obtained from a Bayesian model to infer gene expression levels. In its analysis of the spinal cord dataset, each co-expression module was found to be composed of multiple cell types. To further investigate the cell-type components in the modules, published cell-type level expression data were used. By detecting distinct expression patterns of genes in each co-expression module based on cell-type level data, submodules were identified. It was then determined which submodules show cell-type-specific expression, indicating genes that are specifically expressed in certain cell types, such as astrocytes.

Developmental stages

The tissues annotated by STAligner [10] in its experiment with mouse embryo slices from Stereo-seq [43] were well characterized by known marker genes, confirming the accurate identification of major tissues and organs. STAligner [10] also detected changes in tissue proportions over the developmental period, such as the increasing size of the liver and proportions of hindbrain and muscle. SpatiAlign [17] also effectively aligned three mouse hippocampal slices from Slide-seq [40] by removing batch effects and enabling clustering and annotation of cell types. It identified marker genes for specific cell types. Differential gene expression analysis revealed stage-specific genes involved in neuronal differentiation and maturation. GO enrichment analysis highlighted developmental events at different stages, such as neurogenesis and synaptic plasticity. Trajectory analysis showed a linear developmental trajectory and clear transition paths across cell types. DeST-OT [18] demonstrated its application in studying telencephalon development in axolotl (Ambystoma mexicanum). The authors measured the accuracy of ST data alignment across multiple developmental stages and compared its performance with PASTE [7] in capturing transitions from progenitor to mature cell types with high fidelity. Graspot [19] aligned slices from human embryonic hearts at 4.5 to 9 post-conception weeks and highlighted transformative changes in spot distribution and heart structure.

Disease progression

Identifying spatial variability and patterns within tissue profiles, both in healthy and diseased conditions, through enhanced integrative tissue analysis can provide critical insights into the progression of diseases across biological systems. ATAT [13] analyzed ST data from patients with ulcerative colitis (UC) and Crohn’s disease (CD), aligning tiles using visual cues from H&E-stained images to map gene expression patterns. This alignment facilitated a precise molecular understanding of disease progression. The findings provided insights into the differential inflammatory processes and cellular dynamics characterizing each condition. GPSA [16] was applied to integrate four slices of a breast cancer tumor from a dataset provided by 10× Visium [30]. The analysis revealed substantial variability in the expression of genes such as PRSS23 and CST4, which are associated with tumor progression.

Drug development

By accurately aligning tissue tiles from healthy and diseased conditions, ATAT [13] effectively mapped the progression of gastrointestinal diseases and uncovered key molecular mechanisms and inflammatory processes, which can also facilitate drug discovery. The analysis of variance demonstrated by GPSA [16] identified genes that are involved in key oncogenic pathways, notably MYC targets and KRAS signaling. This enables the identification of specific targets for developing more effective drugs.

Biomarker discovery

The analysis conducted by ATAT [13] demonstrated the potential to facilitate the identification of distinctive biomarkers that distinguish UC from CD. UC primarily affects the superficial layers of the colon, whereas CD involves all tissue layers. In addition, identifying marker genes, as demonstrated by PASTE [7], PASTE2 [14], and GPSA [16], can help identify different biomarkers.

Discussion

Alignment and integration tasks play an important role in improving the quality of ST data for downstream analysis, and hence in improving our ability to gain biological insights. The alignment of multiple tissue slices within and across datasets can enable accurate comparison of gene expression patterns across spatial locations. With the aligned tissue slices, we can obtain a 3D view of the gene expression patterns, which also makes it possible to explore different mechanisms. Downstream analyses such as cell-type clustering, cellular trajectory analysis and developmental growth analysis can largely benefit from this holistic view of the whole tissue section. This can also be used to build a tissue atlas that enables analysis of the whole organ. The possibilities are broad because spatial organization and biological systems are interconnected and work together to maintain the function of the entire ecosystem within an organism’s body.

However, the ST alignment and integration task is not as simple as stacking one slice over another. Rather, it requires significant computational effort and optimization to find the correct correspondence between each spot from one slice to another. This requires high-dimensional data handling and pre-processing, which is a very important step in the general pipeline for solving this problem. We have seen that SM-based tools use fundamental statistical methodologies such as PCA and UMAP to obtain low-dimensional embeddings from the data. PCA [20] can be challenging to apply to complex multi-modal data such as ST and may struggle with nonlinear relationships. On the other hand, IPR-based tools use image processing techniques (such as masking and super-pixels [21]) to reduce the dimensionality and extract features from the ST data. While pre-processing the image for feature extraction, important information can sometimes be lost. For example, the super-pixel technique groups similar pixels, which might not be as effective for highly heterogeneous regions if it combines similar but distinct cell types or gene expression patterns. GB approaches provide encoders to learn the latent features from raw ST data [10, 17]. They can preserve spatial relationships and deal with nonlinear relationships, yet require extensive computation and training data. They also rely on fine-tuned hyper-parameters.
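Graph-based tools start from a graph whose nodes are spots and whose edges connect spatial neighbours. As a minimal sketch, assuming a k-nearest-neighbour rule (one common choice; individual tools may instead use a distance radius or other criteria), such a graph could be built as follows. The function name and coordinates are hypothetical.

```python
import math


def knn_edges(coords, k):
    """Undirected k-nearest-neighbour edges between spots of one slice."""
    if not 0 < k < len(coords):
        raise ValueError("k must be between 1 and len(coords) - 1")
    edges = set()
    for i, point in enumerate(coords):
        # Indices of the k spots closest to spot i.
        nearest = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(point, coords[j]),
        )[:k]
        for j in nearest:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return sorted(edges)


# Hypothetical spot coordinates along one row of a slice.
spot_coords = [(0.0, 0.0), (1.0, 0.0), (2.5, 0.0), (10.0, 0.0)]
spatial_edges = knn_edges(spot_coords, k=1)
```

The resulting edge list, together with per-spot expression vectors as node features, is the kind of input that a graph encoder would consume.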

ST data is inherently heterogeneous: each slice may contain spots from different spatial regions, and the slices also have some correspondence between them. It is therefore important to understand the relationships between spots. Autocorrelation in SM-based tools plays an important part in learning this correspondence between spots. These tools have utilized a CAR model to extract the spatial dependencies and cluster the related regions accordingly. We refer to this step as SR. Some SM-based methods have dealt with SR by incorporating landmarks, specifying the important features and then finding the relationship between them. GP regression was used by some SM-based methods to incorporate the landmark information as prior knowledge of the distribution. LB IPR-based tools have also performed similar LB SR tasks with image processing techniques such as SIFT and Gaussian smoothing. However, identifying landmarks requires manual input. GB approaches have their own advantages in learning spatial relationships and heterogeneity, as they can effectively preserve the actual relationships in graph representations with nodes and edges. They therefore do not require any explicit step for SR. Autoencoder-based GB approaches [10, 15, 17, 19] can also utilize the decoder to reconstruct the original input data from the latent space and minimize the cost of finding the correspondence between spots from different slices. Learning about spatial heterogeneity and dependencies plays an important role, especially in heterogeneous alignment within or across datasets, as it reveals inherent batch effects. Batch effects can occur due to technical and biological variations. First, the tissue slices can represent different regions of the tissue section. We therefore need to identify similar regions by analysis of variance within the tissue slice.
Second, the input tissue slices may only partially overlap in the regions identified, which makes it more challenging to align and integrate them. For instance, adjacent tissue slices often exhibit better overlap than tissue slices that are farther apart [7]. Third, across-dataset alignment is likely to involve data collected at different time points, using different protocols and technical configurations. These are essential factors to consider when dealing with batch effects.
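Spatial autocorrelation can be made concrete with Moran's I, a standard statistic that is used here purely as an illustration; it is not the CAR model mentioned above. The sketch assumes binary weights (spots within a given radius are neighbours), and the function name, coordinates, and expression values are hypothetical.

```python
import math


def morans_i(coords, values, radius):
    """Moran's I with binary weights: spots within `radius` are neighbours."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)

    num = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                weight_sum += 1.0
    if denom == 0 or weight_sum == 0:
        raise ValueError("constant values or no neighbours within radius")
    return (n / weight_sum) * (num / denom)


# Hypothetical row of four spots, one unit apart.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

# Expression that forms two contiguous domains versus alternating values.
i_clustered = morans_i(line, [1.0, 1.0, 0.0, 0.0], radius=1.5)
i_alternating = morans_i(line, [1.0, 0.0, 1.0, 0.0], radius=1.5)
```

Positive values indicate that neighbouring spots tend to have similar expression (spatial domains), while negative values indicate that neighbours tend to differ.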

We have observed that finding mappings between low-dimensional representations of different tissue slices can be addressed using only statistical methods [7], image processing techniques [12], or graph algorithms [15], and in some cases with a mixture of these [12]. SM-based tools generally estimate the correspondence between distributions (tissue slices) or infer one distribution from another using prior knowledge. This approach is particularly useful for removing outliers and managing the inherent noise and sparsity of ST data. It also helps preserve spatial relationships and addresses heterogeneity within the data. In addition, this approach can be applied for homogeneous or heterogeneous alignment within or across datasets. However, distribution-based approaches require prior assumptions to ensure correctness, and incorrect assumptions can lead to inaccurate mappings. This is also related to hyperparameter selection, as the results can be sensitive to changes in these hyperparameters, which may lead to over- or under-smoothing of the data and result in an incorrect distribution of cell types or regions. Although tools like GPSA [16] deal with nonlinear mappings by using GP, SM-based tools can be sensitive to deformation and nonlinearity, which may make them less robust in heterogeneous alignment across datasets, where batch effects are crucial. IPR-based methods need to adjust the target image (tissue slice) according to the reference image. This involves additional pre-processing steps such as rotations, warping and other adjustments, and may also require manual adjustments. For example, STalign [12] demonstrates rotating images prior to starting the alignment process.
IPR-based tools use image processing techniques to align the slices; however, they still need to employ statistical methods such as Gaussian mixture modeling to achieve coherent and smooth alignment and integration that preserves spatial relationships and gene expression. GB tools [10, 15, 17, 19] use graph matching and adversarial algorithms, which are more robust as they can effectively update the cost functions and optimize iteratively by constructing and reconstructing the shared latent space. However, these methods are computationally expensive due to the large number of parameters within their architectures, which require significant resources for processing and optimization. Training can also demand substantial data and computational power, and hyper-parameter fine-tuning is required to ensure efficiency. GB approaches preserve the spatial context well and can deal with the nonlinear partial alignment problem. In addition, GB algorithms can retain prior knowledge, which can be highly beneficial for preserving the spatial context. Attention-based algorithms [19] are highly efficient at learning latent features from the spatial context, and this can be leveraged to robustly align and integrate slices within or across datasets, individuals, or modalities.
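The idea of estimating a correspondence between two slices treated as distributions can be sketched with entropic optimal transport solved by Sinkhorn iterations. This is a simplified illustration under stated assumptions: it uses only an expression-based cost with uniform spot masses and omits the spatial (Gromov-Wasserstein) term that PASTE [7] combines with it. The function names, the regularization value, and the toy expression profiles are hypothetical.

```python
import math


def sinkhorn_plan(cost, epsilon=0.1, n_iter=200):
    """Entropic optimal-transport plan between uniform row/column masses."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on source spots
    b = [1.0 / m] * m  # uniform mass on target spots
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def expression_cost(source_expr, target_expr):
    """Squared Euclidean distance between expression vectors of two slices."""
    return [
        [sum((s - t) ** 2 for s, t in zip(src, tgt)) for tgt in target_expr]
        for src in source_expr
    ]


# Hypothetical two-gene expression profiles; the target spots are the
# source spots listed in a different order.
source_expr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_expr = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]

plan = sinkhorn_plan(expression_cost(source_expr, target_expr))

# Most likely target spot for each source spot.
matches = [max(range(len(row)), key=row.__getitem__) for row in plan]
```

Each row of the plan is a soft assignment of one source spot to all target spots. The regularization strength `epsilon` is exactly the kind of hyperparameter to which such mappings can be sensitive: larger values smooth the plan, smaller values sharpen it.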

The papers reviewed included robust applications of ST data alignment and integration such as improved tissue profiling, including gene expression analysis, differential gene analysis, identification of cellular trajectories, and identification of gene expression patterns and biological processes at different time points. Integrated ST data have also been used to identify regions and cell types. These types of data can be useful for identifying the growth of different tissue sections or organs at different developmental stages. Furthermore, by leveraging enhanced tissue profiling, cell-type clustering and developmental biology insights, integrative analysis of multiple tissue slices can enhance our understanding of disease progression and biomarker discovery. This comprehensive approach not only informs the development of more effective drugs but also supports the advancement of precision medicine, tailoring treatments to individual patient needs based on specific molecular profiles.

Despite their robustness and promising applications, ST data alignment and integration have yet to reach their full potential. While progress has been made in aligning and integrating heterogeneous tissue slices across datasets, challenges remain in achieving automatic alignment that is independent of dataset origins, experimental conditions, or technical configurations, relying solely on tissue context. Such advancements could enable the construction of spatial atlases for entire tissues, organs, or organisms. However, heterogeneity, technical variability, and biological differences arising from developmental stages or disease progression often lead to misalignment in soft tissues. Additionally, scalability poses a major hurdle, as existing methods struggle with large datasets and lack efficient strategies for accurate volumetric reconstruction.

Furthermore, while histology images are often included as auxiliary input in ST alignment and integration, very few existing tools effectively leverage their full potential. Histological images provide valuable spatial context, capturing fine-grained tissue morphology, cellular architecture, and structural landmarks that can aid in multi-slice alignment. Despite their widespread use in computational biomedical imaging [64, 65], such as tomography, microscopy, and MRI, their application in ST remains underutilized. Advanced 3D image reconstruction techniques [55, 66–70], including tomographic reconstruction, morphological interpolation, and deep learning-based generative models, have the potential to enhance spatial coherence across tissue slices by inferring missing spatial relationships and correcting distortions caused during tissue sectioning. Integrating histology-based 3D reconstruction into ST pipelines could enable more precise spatial transcriptomic mapping, facilitate cross-slice alignment, and improve biological interpretability by correlating morphological features with gene expression patterns. However, this remains an open challenge, as current ST tools primarily focus on spot-based gene expression alignment, without fully capitalizing on the rich spatial and structural information embedded in histology images. Addressing this gap could significantly advance ST integration by creating more accurate, high-resolution spatial atlases of tissues, organs, and even whole organisms.

Based on our extensive evaluation of architecture, robustness and scope, we recommend the class of autoencoder-based frameworks such as STAligner [10], SLAT [15], SpatiAlign [17], Graspot [19], and BiGATAE [48] for ST alignment and integration tasks because of their ability to handle heterogeneity, technical variability, and biological differences. This architecture is well-suited for modeling nonlinear relationships and handling high-dimensional data. Autoencoders effectively learn low-dimensional embeddings that preserve both spatial organization and gene expression patterns, enabling seamless alignment across slices while accounting for variability in protocols, resolutions, and tissue morphology. For instance, heterogeneous alignment and integration across multiple slices from different time points using the same mouse embryo dataset [43] were successfully demonstrated by STAligner [10] and SpatiAlign [17], showcasing their ability to extract biologically meaningful features while managing batch effects and other sources of variability. SLAT [15] further demonstrated robustness and flexibility by aligning and integrating multiple slices from two different datasets (seqFISH [59] and Stereo-seq [43]) collected at different time points. These tools also demonstrated their effectiveness through downstream analyses; for instance, SLAT [15] performed cross-scale alignment on 10× Visium [30] and Xenium [42] slices, successfully identifying rare cell groups. Additionally, their scalable design and capacity for processing large datasets and 3D tissue reconstructions position them as robust and versatile solutions for advancing ST alignment and integration, ultimately contributing to the construction of spatial atlases of tissues, organs, and organisms.

Conclusion

In conclusion, ST data acquisition techniques have unlocked vast possibilities in biological research, yet their full potential remains untapped. ST alignment and integration plays a crucial role in realizing this potential. In this review, we have emphasized the importance of this task by discussing the problem at its core and its significance in real-life applications. We have reviewed recently proposed tools to solve this problem and presented a comprehensive overview of the approaches that outlines the most essential steps within the general pipeline. We discussed their general methodologies and outlined challenges, most importantly their limitations. While this review serves as a detailed guide and highlights the potential for robust ST alignment and integration solutions, it remains theoretical in nature and does not provide a proof of concept. Instead, it lays the groundwork for future methodological advancements and practical implementations. Moving forward, an ideal solution should be a versatile tool capable of aligning and integrating tissue slices with shared gene expression patterns, regardless of data collection techniques, datasets, or tissue regions. Integrating multi-omics spatial data, including spatial proteomics and metabolomics, alongside ST could provide more insightful biological context for understanding complex molecular interactions. Another key direction is improving cross-species ST alignment and integration techniques, allowing the comparison of gene expression patterns across model organisms and human tissues for better translational research. Additionally, developing methods that incorporate single-cell resolution data while preserving spatial context will further improve alignment precision. Moreover, ensuring standardization and reproducibility in ST data alignment through benchmark datasets and evaluation frameworks is crucial for advancing the field.

Acknowledgements

Author contributions: M.K.: Conceptualization, Investigation, Writing – Original Draft, Writing – Review & Editing, Visualization. S.A. and S.D.: Supervision, Writing – Review & Editing, Project Administration, Conceptualization.

Contributor Information

Muiz Khan, Department of Computer Science, Wayne State University, Detroit, 48202 Michigan, United States.

Suzan Arslanturk, Department of Computer Science, Wayne State University, Detroit, 48202 Michigan, United States.

Sorin Draghici, Department of Computer Science, Wayne State University, Detroit, 48202 Michigan, United States; Advaita Bioinformatics, Ann Arbor, 48105 Michigan, United States.

Conflict of interest

None declared.

Funding

No funding was received for this study. Funding to pay the Open Access publication charges for this article was provided by the National Science Foundation (NSF).

Data availability

No new data were generated or analyzed in support of this research.

References

  • 1. Elosua-Bayes M, Nieto P, Mereu E et al. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res. 2021; 49:e50. 10.1093/nar/gkab043.
  • 2. Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun. 2022; 13:1739. 10.1038/s41467-022-29439-6.
  • 3. Dries R, Zhu Q, Dong R et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 2021; 22:1–31. 10.1186/s13059-021-02286-2.
  • 4. Pham D, Tan X, Xu J et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat Commun. 2023; 14:7739. 10.1038/s41467-023-43120-6.
  • 5. Hu J, Li X, Coleman K et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. 2021; 18:1342–51. 10.1038/s41592-021-01255-8.
  • 6. Zhao E, Stone MR, Ren X et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol. 2021; 39:1375–84. 10.1038/s41587-021-00935-2.
  • 7. Zeira R, Land M, Strzalkowski A et al. Alignment and integration of spatial transcriptomics data. Nat Methods. 2022; 19:567–75. 10.1038/s41592-022-01459-6.
  • 8. Liu Y, Yang C. Computational methods for alignment and integration of spatially resolved transcriptomics data. Comput Struct Biotechnol J. 2024; 1094–105.
  • 9. Hu Y, Xie M, Li Y et al. Benchmarking clustering, alignment, and integration methods for spatial transcriptomics. Genome Biol. 2024; 25:212. 10.1186/s13059-024-03361-0.
  • 10. Zhou X, Dong K, Zhang S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat Comput Sci. 2023; 3:894–906. 10.1038/s43588-023-00528-w.
  • 11. Xu H, Fu H, Long Y et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med. 2024; 16:12. 10.1186/s13073-024-01283-x.
  • 12. Clifton K, Anant M, Aihara G et al. STalign: alignment of spatial transcriptomics data using diffeomorphic metric mapping. Nat Commun. 2023; 14:8123. 10.1038/s41467-023-43915-7.
  • 13. Song S, Mohsin E, Zhang R et al. ATAT: automated tissue alignment and traversal in spatial transcriptomics with self-supervised learning. bioRxiv, 10 December 2023, preprint: not peer reviewed. 10.1101/2023.12.08.570839.
  • 14. Liu X, Zeira R, Raphael BJ. Partial alignment of multislice spatially resolved transcriptomics data. Genome Res. 2023; 33:1124–32. 10.1101/gr.277670.123.
  • 15. Xia CR, Cao ZJ, Tu XM et al. Spatial-linked alignment tool (SLAT) for aligning heterogenous slices. Nat Commun. 2023; 14:7236. 10.1038/s41467-023-43105-5.
  • 16. Jones A, Townes FW, Li D et al. Alignment of spatial genomics data using deep Gaussian processes. Nat Methods. 2023; 20:1379–87. 10.1038/s41592-023-01972-2.
  • 17. Zhang C, Liu L, Zhang Y et al. spatiAlign: an unsupervised contrastive learning model for data integration of spatially resolved transcriptomics. GigaScience. 2024; 13:42. 10.1093/gigascience/giae042.
  • 18. Halmos P, Liu X, Gold J et al. DeST-OT: alignment of spatiotemporal transcriptomics data. International Conference on Research in Computational Molecular Biology. 2024; Cambridge, MA, USA: Springer; 434–7.
  • 19. Gao Z, Cao K, Wan L. Graspot: a graph attention network for spatial transcriptomics data integration with optimal transport. Bioinformatics. 2024; 40:137–45. 10.1093/bioinformatics/btae394.
  • 20. Liu W, Liao X, Luo Z et al. Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nat Commun. 2023; 14:296. 10.1038/s41467-023-35947-w.
  • 21. Bergenstråhle J, Larsson L, Lundeberg J. Seamless integration of image and molecular analysis for spatial transcriptomics workflows. BMC Genom. 2020; 21:1–7. 10.1186/s12864-020-06832-3.
  • 22. Preibisch S, Karaiskos N, Rajewsky N. Image-based representation of massive spatial transcriptomics datasets. Cell Syst. 2025; 16:101264.
  • 23. Lais P, Mishra S, Xiong K et al. Image guided construction of a common coordinate framework for spatial transcriptome data. Sci Rep. 2025; 15:18074. 10.1038/s41598-025-01862-x.
  • 24. Guo T, Yuan Z, Pan Y et al. SPIRAL: integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biol. 2023; 24:241. 10.1186/s13059-023-03078-6.
  • 25. Xu H, Wang S, Fang M et al. SPACEL: deep learning-based characterization of spatial transcriptome architectures. Nat Commun. 2023; 14:7603. 10.1038/s41467-023-43220-3.
  • 26. Chari T, Pachter L. The specious art of single-cell genomics. PLoS Comput Biol. 2023; 19:e1011288. 10.1371/journal.pcbi.1011288.
  • 27. Liu X, Raphael B. Representation learning for spatial multimodal data integration with optimal transport. NeurIPS 2023 AI for Science Workshop. 2023; New Orleans, LA, USA: OpenReview.
  • 28. Äijö T, Maniatis S, Vickovic S et al. Splotch: robust estimation of aligned spatial temporal gene expression data. 05 September 2019, preprint: not peer reviewed. 10.1101/757096.
  • 29. Maniatis S, Äijö T, Vickovic S et al. Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis. Science. 2019; 364:89–93. 10.1126/science.aav9776.
  • 30. Stahl PL, Salmén F, Vickovic S et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016; 353:78–82. 10.1126/science.aaf2403.
  • 31. 10x Genomics. Mouse Brain Serial Section 2 Sagittal Posterior Dataset. https://www.10xgenomics.com/datasets/mouse-brain-serial-section-2-sagittal-posterior-1-standard Accessed: December 21, 2024.
  • 32. Stickels RR, Murray E, Kumar P et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat Biotechnol. 2021; 39:313–9. 10.1038/s41587-020-0739-1.
  • 33. Andersson A, Andrusivová Ž, Czarnewski P et al. A landmark-based common coordinate framework for spatial transcriptomics data. bioRxiv, 13 November 2021, preprint: not peer reviewed. 10.1101/2021.11.11.468178.
  • 34. Ji AL, Rubin AJ, Thrane K et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell. 2020; 182:497–514. 10.1016/j.cell.2020.05.039.
  • 35. Wang M, Hu Q, Lv T et al. High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae. Dev Cell. 2022; 57:1271–83. 10.1016/j.devcel.2022.04.006.
  • 36. Zhang D, Deng Y, Kukanja P et al. Spatial epigenome–transcriptome co-profiling of mammalian tissues. Nature. 2023; 616:113–22. 10.1038/s41586-023-05795-1.
  • 37. Wei X, Fu S, Li H et al. Single-cell Stereo-seq reveals induced progenitor cells involved in axolotl brain regeneration. Science. 2022; 377:eabp9444. 10.1126/science.abp9444.
  • 38. Xia T, Hu L, Zuo L et al. ST-GEARS: advancing 3D downstream research through accurate spatial information recovery. Nat Commun. 2024; 15:7806. 10.1038/s41467-024-51935-0.
  • 39. Long Y, Ang KS, Li M et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat Commun. 2023; 14:1155. 10.1038/s41467-023-36796-3.
  • 40. Rodriques SG, Stickels RR, Goeva A et al. Slide-seq: a scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019; 363:1463–7. 10.1126/science.aaw1219.
  • 41. Chen KH, Boettiger AN, Moffitt JR et al. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015; 348:aaa6090. 10.1126/science.aaa6090.
  • 42. Janesick A, Shelansky R, Gottscho A et al. High resolution mapping of the breast cancer tumor microenvironment using integrated single cell, spatial and in situ analysis of FFPE tissue. Nat Commun. 2023; 14:7374. 10.1038/s41467-023-43458-x.
  • 43. Chen A, Liao S, Cheng M et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022; 185:1777–92. 10.1016/j.cell.2022.04.003.
  • 44. Hu Y, Li Y, Xie M et al. MaskGraphene: advancing joint embedding, clustering, and batch correction for spatial transcriptomics using graph-based self-supervised learning. 25 February 2024, preprint: not peer reviewed. 10.1101/2024.02.21.581387.
  • 45. Li Z, Zhou X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 2022; 23:168. 10.1186/s13059-022-02734-7.
  • 46. Yu Y, Xie Z. Spatial transcriptomic alignment, integration, and de novo 3D reconstruction by STAIR. 9 February 2024, preprint: not peer reviewed. 10.21203/rs.3.rs-3939678/v1.
  • 47. Maynard KR, Collado-Torres L, Weber LM et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat Neurosci. 2021; 24:425–36.
  • 48. Tao Y, Sun X, Wang F. BiGATAE: a bipartite graph attention auto-encoder enhancing spatial domain identification from single-slice to multi-slices. Brief Bioinform. 2024; 25:bbae045.
  • 49. Williams C, Rasmussen C. Gaussian processes for machine learning. Adaptive computation and machine learning series. 2006; Cambridge, MA, USA: MIT Press.
  • 50. Titouan V, Courty N, Tavenard R et al. Optimal transport for structured data with application on graphs. International Conference on Machine Learning. 2019; Long Beach, CA, USA: PMLR; 6275–84.
  • 51. Preibisch S, Saalfeld S, Tomancak P. Globally optimal stitching of tiled 3D microscopic image acquisitions. Bioinformatics. 2009; 25:1463–5. 10.1093/bioinformatics/btp184.
  • 52. Saalfeld S, Fetter R, Cardona A et al. Elastic volume reconstruction from series of ultra-thin microscopy sections. Nat Methods. 2012; 9:717–20. 10.1038/nmeth.2072.
  • 53. Hörl D, Rojas Rusak F, Preusser F et al. BigStitcher: reconstructing high-resolution image datasets of cleared and expanded samples. Nat Methods. 2019; 16:870–4. 10.1038/s41592-019-0501-0.
  • 54. Pietzsch T, Preibisch S, Tomančák P et al. ImgLib2—generic image processing in Java. Bioinformatics. 2012; 28:3009–11. 10.1093/bioinformatics/bts543.
  • 55. Zheng G, Li S, Szekely G. Statistical shape and deformation analysis: methods, implementation and applications, 1st ed. 2017; Cambridge, MA, USA: Academic Press.
  • 56. Voulodimos A, Doulamis A. Recent advances in 3D imaging, modeling, and reconstruction. 2020; Hershey, PA: IGI Global; 10.4018/978-1-5225-5294-9.
  • 57. Kiemen AL, Braxton AM, Grahn MP et al. CODA: quantitative 3D reconstruction of large tissues at cellular resolution. Nat Methods. 2022; 19:1490–9. 10.1038/s41592-022-01650-9.
  • 58. Merritt CR, Ong GT, Church SE et al. Multiplex digital spatial profiling of proteins and RNA in fixed tissue. Nat Biotechnol. 2020; 38:586–99. 10.1038/s41587-020-0472-9.
  • 59. Eng CHL, Lawson M, Zhu Q et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature. 2019; 568:235–9. 10.1038/s41586-019-1049-y.
  • 60. Nagendran M, Sapida J, Arthur J et al. 1457 Visium HD enables spatially resolved, single-cell scale resolution mapping of FFPE human breast cancer tissue. J Immunother Cancer. 2023; 11:A1620. 10.1136/jitc-2023-SITC2023.1457.
  • 61. Vickovic S, Eraslan G, Salmén F et al. High-definition spatial transcriptomics for in situ tissue profiling. Nat Methods. 2019; 16:987–90. 10.1038/s41592-019-0548-y.
  • 62. Wang X, Allen WE, Wright MA et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science. 2018; 361:eaat5691. 10.1126/science.aat5691.
  • 63. Shi H, He Y, Zhou Y et al. Spatial atlas of the mouse central nervous system at molecular resolution. Nature. 2023; 622:552–61. 10.1038/s41586-023-06569-5.
  • 64. McCann MT, Unser M et al. Biomedical image reconstruction: from the foundations to deep neural networks. Found Trends Signal Process. 2019; 13:283–359. 10.1561/2000000101.
  • 65. Wang G, Ye JC, Mueller K et al. Image reconstruction is a new frontier of machine learning. IEEE T Med Imaging. 2018; 37:1289–96. 10.1109/TMI.2018.2833635.
  • 66. Zeng GL. Medical image reconstruction. 2010; 530: Berlin, Heidelberg: Springer.
  • 67. Herman GT. Fundamentals of computerized tomography: image reconstruction from projections. 2009; London: Springer.
  • 68. Natterer F, Wübbeling F. Mathematical methods in image reconstruction. 2001; Philadelphia: Society for Industrial and Applied Mathematics; 10.1137/1.9780898718324.
  • 69. Korostelev AP, Tsybakov AB. Minimax theory of image reconstruction. 2012; 82: New York, NY: Springer.
  • 70. Ye JC, Unser M, Eldar YC. Deep learning for biomedical image reconstruction. 2023; Cambridge: Cambridge University Press.


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
