Nucleic Acids Research. 2025 Feb 19;53(4):gkaf087. doi: 10.1093/nar/gkaf087

Precise gene expression deconvolution in spatial transcriptomics with STged

Jia-Juan Tu 1,2, Hong Yan 3,4, Xiao-Fei Zhang 5,6, Zhixiang Lin 7
PMCID: PMC11838043  PMID: 39970279

Abstract

Spatially resolved transcriptomics (SRT) has transformed tissue biology by linking gene expression profiles with spatial information. However, sequencing-based SRT methods aggregate signals from multiple cell types within capture locations (“spots”), masking cell-type-specific gene expression patterns. Traditional cell-type deconvolution methods estimate cell compositions within spots but fail to resolve cell-type-specific gene expression, limiting their ability to uncover critical biological processes such as cellular interactions and microenvironmental dynamics. Here, we present STged (spatial transcriptomic gene expression deconvolution), a novel computational framework that goes beyond traditional deconvolution by reconstructing cell-type-specific gene expression profiles from mixed spots. STged integrates graph-based spatial correlations and reference-derived gene signatures using a non-negative least-squares regression framework, achieving precise and biologically meaningful deconvolution. Comprehensive simulations show that STged consistently outperforms existing methods in accuracy and robustness. Applications to human pancreatic ductal adenocarcinoma and human squamous cell carcinoma datasets reveal its capacity to identify microenvironment-specific highly variable genes, reconstruct spatial cell–cell communication networks, and resolve tissue architecture at near-single-cell resolution. In mouse kidney tissues, STged uncovers dynamic spatial gene expression patterns and distinct gene programs, advancing our understanding of tissue heterogeneity and cellular dynamics.

Graphical Abstract

Introduction

Spatially resolved transcriptomics (SRT) techniques capture gene expression profiles at capture locations while also recording their spatial context [1, 2]. Current protocols can be broadly categorized into two groups: imaging-based methods (e.g. seqFISH+ and MERFISH) and sequencing-based methods [e.g. spatial transcriptomics (ST) and commercial 10x Genomics Visium] [3–6]. Imaging-based approaches provide high gene-detection sensitivity at the single-cell level but are often limited in simultaneously measuring a large number of genes [2, 6, 7]. Sequencing-based methods can capture the whole transcriptome at capture locations (“spots”) but lack single-cell resolution, resulting in combined gene transcripts from diverse cell types [1, 2, 5, 8–10]. The combined transcripts from different cell types impede the direct comparison of spots, possibly leading to the spurious or missed inference of biologically relevant signals when attempting to characterize the gene expression heterogeneity [11, 12].

To characterize the spatial distribution of different cell types within complex tissues, various cell-type deconvolution methods have been proposed to estimate cell-type composition within spots [8–10, 13–16]. These methods offer valuable insights into dissecting the spatial distribution of different cell types, uncovering patterns of cell localization, and identifying potential interactions between cell types. However, despite their utility, cell-type deconvolution approaches often encounter challenges in capturing the nuanced gene expression profiles within individual cell types. Factors such as cell-to-cell expression variations and heterogeneous cellular states present obstacles to accurately deciphering the intricate gene expression landscape [7, 17]. Consequently, there is an urgent need for more sophisticated gene expression deconvolution methods aimed at extracting cell-type-specific gene expression information from mixed spots [18]. Such advancements are crucial for achieving a deeper understanding of gene expression heterogeneity and its implications for tissue function.

Several gene expression deconvolution methods [19–22] have been proposed to infer cell-type-specific gene expression profiles from bulk samples. These methods typically operate within a linear framework, treating gene expression profiles in bulk samples as weighted averages of expression values from different cell types. The weights are determined based on the proportional composition of cell types in the mixture, which can be inferred using cell-type deconvolution methods. While these deconvolution methods can theoretically be applied directly to SRT data by treating each spot as a bulk sample [10], it is important to note that they may not fully exploit the rich spatial information available in SRT data. Indeed, neighboring spots often exhibit similar gene expression levels [8, 23–25], providing valuable insights for inferring cell-type-specific gene expression. Incorporating models of neighborhood similarity and spatial structure correlation can enhance deconvolution accuracy. Moreover, cells of the same type typically display similar gene expression levels [17]. Single-cell RNA sequencing (scRNA-seq) offers valuable insights into cellular homogeneity and heterogeneity within tissues. scRNA-seq data highlight the advantages of leveraging cell-type-specific gene expression information for spatial analysis of transcriptomics data [8, 10, 16, 25, 26]. Integrating gene expression information from scRNA-seq data can further enhance the robustness of gene expression deconvolution for each cell type in SRT data [11].

In this study, we present a novel method, termed spatial transcriptomic gene expression deconvolution (STged), designed for inferring cell-type-specific gene expression within individual spots. STged integrates spatial correlation patterns of gene expression and intra-cell-type expression similarity to achieve precise and robust deconvolution results. Implemented within a non-negative least-squares regression framework, STged models gene expression levels at each spot as a weighted linear combination of cell-type-specific gene expression, with the weights corresponding to the respective cell-type proportions. By incorporating a spatial neighborhood graph prior, STged captures spatial correlation structures in cell-type expressions across spots. Moreover, it integrates a cell-type-specific gene expression prior derived from scRNA-seq data to enhance accuracy. Through extensive downstream experiments, STged yields valuable biological insights into the influence of neighboring cell types on gene expression levels, cellular interactions within the microenvironment, and topological heterogeneity at both gene expression and cell levels. These insights include the identification of microenvironment-associated highly variable genes (mHVGs) within cell types, mapping of cell–cell communication (CCC) patterns, and exploration of the spatial organization of cells.

Materials and methods

STged model

We consider an SRT dataset comprising p genes, n spots, and K cell types, where genes, spots, and cell types are indexed by i, j, and k, respectively. The gene expression matrix for spots is denoted as Y, with rows representing genes and columns representing spots, while the spot location matrix is represented by Z, where rows represent spots and columns represent spatial coordinates. We assume the availability of the cell-type proportion matrix V for all spots, where rows represent spots and columns represent cell types. If this matrix is not available, it can be estimated using established cell-type deconvolution methods [16]. Additionally, we assume a gene signature matrix U, derived from scRNA-seq reference data, is given. This matrix has rows representing genes and columns representing cell types, providing mean gene expression profiles for each cell type. Given the observed spot gene expression matrix Y, spot location matrix Z, cell-type proportion matrix V, and gene signature matrix U, our objective is to infer a 3D tensor $F \in \mathbb{R}^{p \times n \times K}$ representing cell-type-specific gene expression profiles. Each element $f_{ijk}$ of F denotes the expression level of gene i in cells of type k within spot j.

The STged model describes the observed gene expression levels, $y_{ij}$, of gene i at each spot j as a weighted linear combination of cell-type-specific gene expressions $f_{ijk}$, with the weights corresponding to the respective cell-type proportions $v_{jk}$:

$$y_{ij} = \sum_{k=1}^{K} v_{jk} f_{ijk} + e_{ij}, \tag{1}$$

where $e_{ij}$ is a random error. Our goal is to estimate the cell-type-specific gene expression tensor F given the gene expression matrix Y and cell-type proportion matrix V. A straightforward way to estimate the parameters is least squares. However, because a 3D tensor F must be estimated from only two matrices, Y and V, the problem is ill-posed, and the least-squares method may not produce reliable estimates.
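As a minimal sketch of the mixing model in Equation (1), the NumPy snippet below builds a toy tensor F and proportion matrix V and forms Y by contracting over cell types. The array names, sizes, and random values are illustrative assumptions and are not taken from the STged software.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, K = 5, 4, 3  # toy sizes: genes, spots, cell types (assumed)

# F[i, j, k]: expression of gene i in cell type k at spot j (non-negative).
F = rng.gamma(shape=2.0, scale=1.0, size=(p, n, K))

# V[j, k]: proportion of cell type k in spot j; each row sums to 1.
V = rng.dirichlet(np.ones(K), size=n)

# Equation (1) without the error term: y_ij = sum_k v_jk * f_ijk.
Y = np.einsum("jk,ijk->ij", V, F)

# The unknowns (p * n * K) outnumber the observations (p * n) whenever K > 1,
# which is why plain least squares is ill-posed here.
n_unknowns, n_observations = F.size, Y.size
```

Counting entries this way makes the ill-posedness concrete: every observed value of Y has K unknown entries of F behind it.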

To address this issue, we incorporate two sources of prior knowledge about F. First, we leverage the spatial relationship between spots. Previous studies have shown a monotonic relationship between cellular pairwise distances in gene expression and physical space [8, 23–25], indicating that cells of the same type in neighboring spots tend to exhibit similar gene expression profiles. Based on this observation, we construct a binary spatial graph G using Squidpy [27], with spatial locations Z as input. For 10x Visium data, we adopt a hexagonal grid structure, while for ST data, a square grid is used. Each spot is connected to up to eight nearest neighbors, forming an adjacency matrix W, where $w_{jj'} = 1$ if spot j and spot j′ are connected, and $w_{jj'} = 0$ otherwise. The resulting matrix W is symmetric and captures local spatial relationships, which are incorporated as a prior to enforce spatial coherence during the estimation of F.
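The graph construction can be illustrated with a small NumPy sketch. It does not call Squidpy; instead it assumes a simple radius rule (connect spots closer than 1.5 times the smallest inter-spot distance), which gives up to eight neighbors on a square grid and up to six on a hexagonal grid. The function name `grid_adjacency` and the `radius_factor` parameter are hypothetical.

```python
import numpy as np

def grid_adjacency(Z, radius_factor=1.5):
    """Binary, symmetric adjacency matrix from spot coordinates Z (n x 2).

    Assumption: spots lie on a regular grid, so two spots are neighbors
    when their distance is at most radius_factor times the grid spacing.
    """
    Z = np.asarray(Z, dtype=float)
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)   # a spot is never its own neighbor
    spacing = dist.min()             # smallest inter-spot distance
    return (dist <= radius_factor * spacing).astype(int)

# Toy 4 x 4 square grid of spots.
xs, ys = np.meshgrid(np.arange(4), np.arange(4))
Z = np.column_stack([xs.ravel(), ys.ravel()])
W = grid_adjacency(Z)
```

Because the rule depends only on pairwise distance, W is symmetric by construction, and border spots simply end up with fewer neighbors.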

Furthermore, although cells of the same type vary across cell states and spatial environments, their expression profiles remain close to the mean profile of that type [8, 11]. Therefore, we assume that the spot-specific gene expression profile of each cell type will be similar to the corresponding profile in the gene signature matrix U derived from scRNA-seq data, and we impose this prior through a regularization term. Taking these two aspects into account, we propose a new optimization model to estimate cell-type-specific gene expression profiles F:

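$$\min_{F} \; \sum_{i=1}^{p} \sum_{j=1}^{n} \Big( y_{ij} - \sum_{k=1}^{K} v_{jk} f_{ijk} \Big)^{2} + \lambda_1 \sum_{k=1}^{K} \sum_{j=1}^{n} \sum_{j'=1}^{n} w_{jj'} \sum_{i=1}^{p} \big( f_{ijk} - f_{ij'k} \big)^{2} + \lambda_2 \sum_{i=1}^{p} \sum_{j=1}^{n} \sum_{k=1}^{K} \big( f_{ijk} - u_{ik} \big)^{2} \quad \text{s.t. } f_{ijk} \ge 0, \tag{2}$$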

Here, the non-negative constraint fijk ≥ 0 ensures that the estimated expression profile is non-negative. The first term is the least-squares loss, ensuring that the gene expression levels for each spot are a weighted sum of estimated cell-type-specific gene expressions. The second term is a graph regularization term that encourages cells belonging to the same types in neighboring spots to have similar gene expression profiles. The third term ensures that the estimated cell-type-specific gene expression profiles are similar to the gene signature matrix U derived from scRNA-seq reference data. λ1 and λ2 are two tuning parameters that balance the three terms. For detailed insights into the algorithm and the selection of the two tuning parameters, please refer to the Supplementary text (Sections 3.1 and 3.2).
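To make the three-term objective concrete, the sketch below minimizes a least-squares loss plus a graph smoothness penalty plus a reference penalty under a non-negativity constraint, using projected gradient descent. This is an illustrative stand-in only: the authors' actual algorithm is described in their Supplementary text, and the scaling of each term, the step-size rule, the function names, and the default values of `lam1` and `lam2` are all assumptions.

```python
import numpy as np

def objective(F, Y, V, W, U, lam1, lam2):
    """Least-squares fit + graph smoothness + closeness to the reference U."""
    fit = np.einsum("jk,ijk->ij", V, F) - Y
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian of symmetric W
    # sum_{j,j'} w_jj' (f_ijk - f_ij'k)^2 equals 2 * f^T L f for symmetric W.
    smooth = 2.0 * np.einsum("ijk,jl,ilk->", F, L, F)
    ref = ((F - U[:, None, :]) ** 2).sum()
    return (fit ** 2).sum() + lam1 * smooth + lam2 * ref

def fit_projected_gradient(Y, V, W, U, lam1=0.1, lam2=0.1, n_iter=500):
    """Projected gradient descent on the objective above, keeping F >= 0."""
    n = Y.shape[1]
    L = np.diag(W.sum(axis=1)) - W
    F = np.repeat(U[:, None, :], n, axis=1).astype(float)  # start at reference
    # Upper bound on the gradient's Lipschitz constant gives a safe step size.
    lipschitz = (2.0 * (V ** 2).sum(axis=1).max()
                 + 8.0 * lam1 * W.sum(axis=1).max() + 2.0 * lam2)
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        resid = np.einsum("jk,ijk->ij", V, F) - Y
        grad = (2.0 * resid[:, :, None] * V[None, :, :]
                + 4.0 * lam1 * np.einsum("jl,ilk->ijk", L, F)
                + 2.0 * lam2 * (F - U[:, None, :]))
        F = np.maximum(F - step * grad, 0.0)  # project onto F >= 0
    return F

# Toy problem: 6 genes, 8 spots on a chain graph, 2 cell types.
rng = np.random.default_rng(1)
p, n, K = 6, 8, 2
U = rng.gamma(2.0, 1.0, size=(p, K))
V = rng.dirichlet(np.ones(K), size=n)
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
F_true = np.maximum(U[:, None, :] + 0.3 * rng.normal(size=(p, n, K)), 0.0)
Y = np.einsum("jk,ijk->ij", V, F_true)
F_hat = fit_projected_gradient(Y, V, W, U)
```

Starting from the reference signatures means the first iterate already satisfies the third term exactly, so the iterations mainly trade reference fidelity against data fit and spatial smoothness.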

Cell-type subpopulation analysis

In biological systems, cells of the same type can exhibit substantial heterogeneity in gene expression, driven by variations in cellular states, microenvironmental influences, or developmental stages. These intra-cell-type differences are crucial for understanding cellular functions, disease mechanisms, and therapeutic responses. We address the intra-cell-type heterogeneity by analyzing cell-type-specific gene expression profiles at the spot level, which are inferred through the STged method. These profiles capture the collective gene expression of a specific cell type within each measured spot, enabling us to explore the diversity of gene expression within a single cell type. To investigate this heterogeneity, we apply two complementary analytical strategies: categorical analysis, which classifies cells into distinct subpopulations based on their cell-type-specific gene expression profiles, and continuous analysis, which models the transitions between different cellular states along a developmental trajectory.

For categorical analysis, we use the Seurat toolkit [28] to cluster spots based on their cell-type-specific gene expression profiles derived from STged. Using a shared nearest neighbor (SNN) graph built from the top 30 principal components, we identify distinct subpopulations of cells. Differential expression analysis for each subpopulation is performed using the “FindMarkers” function from the Seurat package. We apply the Wilcoxon rank-sum test, comparing gene expression in each cluster to all other clusters. A threshold of log10(fold change) > 0.25 and an adjusted P-value < 0.05 are used to identify overexpressed genes for each subpopulation.

For continuous analysis, we perform trajectory analysis using Monocle3 [29], applying pseudotime analysis to model the developmental pathways of specific cell types identified within the spatial spots. By utilizing the spot-level, cell-type-specific gene expression data from STged, we map the dynamic transitions and developmental states within each cell type, offering insights into their spatially influenced progression.

CCC analysis

CCC is essential for understanding the signaling pathways that mediate interactions between different cell types. In this study, we infer CCC using two complementary approaches: CellChat [30] to broadly explore ligand–receptor (L–R) interactions across all cell types, and NicheNet [31] to investigate how specific genes in receiver cells are influenced by ligands from sender cells of specific cell types.

For L–R interaction networks, we apply CellChat to STged-derived spot- and cell-type-specific gene expression data. A gene is considered expressed in a given cell type if its expression exceeds predefined thresholds (e.g. 2). Using the CellChatDB interaction database, we identify overexpressed ligands and receptors involved in CCC. The CellChat [30] algorithm calculates communication probabilities for L–R pairs, highlighting active signaling pathways with robust statistical significance (adjusted P-value < 0.05). This analysis provides an overview of active intercellular signaling networks across the tissue microenvironment.

To investigate upstream regulatory mechanisms of CCC, we use NicheNet to identify ligands expressed by sender cells that regulate a predefined set of target genes in receiver cells. Target genes are selected based on their expression levels and potential functional relevance to CCC. A gene is considered expressed in a given cell type if its log2-transformed mean TPM (transcripts per million) exceeds 4, calculated from spots where the cell-type proportion exceeds 0.05. NicheNet [31] employs a prior knowledge-based model to link ligands from sender cells to downstream target genes in receiver cells. Using the “predict_ligand_activities” function, we calculate ligand regulatory activity scores and prioritize ligands with the highest potential to drive target gene expression changes. To ensure result reliability, we retain the top 100 target genes and focus on ligands with the highest regulatory scores, refining connections through the ligand–target matrix.
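The expression filter described above can be sketched as follows. The snippet assumes that the cell-type-specific profiles are already on a TPM-like scale and that "log2-transformed mean" means log2(mean + 1); the pseudocount and the function name `expressed_genes` are assumptions rather than details given in the text.

```python
import numpy as np

def expressed_genes(F_k, v_k, min_prop=0.05, min_log2_mean=4.0):
    """Boolean mask of genes called expressed in one cell type.

    F_k: genes x spots expression for cell type k (TPM-like scale, assumed).
    v_k: proportion of cell type k in each spot.
    """
    keep = v_k > min_prop                 # spots that contain the cell type
    if not keep.any():
        return np.zeros(F_k.shape[0], dtype=bool)
    mean_expr = F_k[:, keep].mean(axis=1)
    return np.log2(mean_expr + 1.0) > min_log2_mean  # +1 is an assumption

F_k = np.array([[100.0, 100.0, 0.0],
                [1.0, 1.0, 500.0]])
v_k = np.array([0.5, 0.2, 0.01])
mask = expressed_genes(F_k, v_k)
```

In this toy case the third spot is ignored because the cell type occupies only 1% of it, so the second gene's large value there does not make it count as expressed.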

By integrating CellChat [30] and NicheNet [31] analyses, we provide a comprehensive characterization of CCC within the tissue microenvironment. CellChat identifies broad L–R interaction networks and active signaling pathways across all cell types, offering a global perspective on intercellular communication. In contrast, NicheNet pinpoints specific ligands that drive target gene expression in receiver cells, revealing the regulatory mechanisms underpinning CCC. Together, these complementary approaches elucidate the dynamic and complex cellular communication processes in the studied tissues, providing mechanistic insights into tissue microenvironment interactions.

mHVGs analysis

Spatial heterogeneity in gene expression often arises within the same cell type due to interactions with the local microenvironment, such as variations in the composition of neighboring cell types. To systematically characterize this phenomenon, we design a computational framework to identify mHVGs. These genes are defined as those whose expression levels in a specific cell type are significantly modulated by the proportions of neighboring cell types.

For each cell type k and gene i, we hypothesize that gene expression levels at a given spatial spot j depend on the proportions of surrounding cell types k′ (≠ k). This relationship is modeled as

$$f_{ijk} = \beta_{ik0} + \sum_{k' \neq k} \beta_{ikk'} v_{jk'} + \epsilon_{ijk},$$

where $f_{ijk}$ represents the expression of gene i in cell type k at spot j, $v_{jk'}$ is the proportion of cell type k′ in spot j, and $\epsilon_{ijk}$ accounts for residual variability. The intercept $\beta_{ik0}$ represents the baseline expression of gene i in cell type k, while $\beta_{ikk'}$ quantifies the contribution of neighboring cell type k′ to the expression of gene i. If $\beta_{ikk'} \neq 0$, the expression of gene i in cell type k is considered influenced by the microenvironment, and k′ is identified as a contributing cell type.

To address multicollinearity among neighboring cell-type proportions and ensure robust feature selection, we apply elastic net regularization to estimate the model parameters. The optimization objective is defined as

$$\min_{\beta_{ik0},\, \{\beta_{ikk'}\}} \; \sum_{j} \Big( f_{ijk} - \beta_{ik0} - \sum_{k' \neq k} \beta_{ikk'} v_{jk'} \Big)^{2} + \gamma \Big[ \alpha \sum_{k' \neq k} |\beta_{ikk'}| + \frac{1 - \alpha}{2} \sum_{k' \neq k} \beta_{ikk'}^{2} \Big]. \tag{3}$$

The first term minimizes the residual sum of squares, while the second term introduces elastic net regularization, combining the sparsity-inducing ℓ1 norm (lasso) and the shrinkage effect of the ℓ2 norm (ridge). The tuning parameter γ controls the overall penalty strength, and α determines the balance between the two penalties. Elastic net enables robust identification of relevant neighboring cell types while mitigating overfitting and multicollinearity effects.

After estimating coefficients $\hat{\beta}_{ikk'}$, we classify genes as follows: (i) mHVGs: A gene i in cell type k is identified as an mHVG if at least one $\hat{\beta}_{ikk'} \neq 0$, indicating modulation by neighboring cell types. (ii) ctHVGs: To pinpoint specific influences, we identify cell-type-specific HVGs (ctHVGs), where $\hat{\beta}_{ikk'} \neq 0$ highlights the effect of cell type k′ on gene i.

We implement this framework using the glmnet package in R. By default, we set α = 0.5 to balance lasso and ridge penalties. The regularization parameter γ is tuned via five-fold cross-validation, using the “lambda.1se” criterion to ensure a trade-off between model simplicity and predictive stability.
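For readers who want to see the mechanics outside R, the sketch below fits an elastic net by coordinate descent for a single gene and cell type, then flags the gene as an mHVG when any coefficient is non-zero. It is not the glmnet implementation: it uses glmnet's documented objective scaling (residual sum of squares divided by 2n), skips feature standardization and cross-validation, and takes a fixed penalty `gamma` chosen by hand. The function names and toy data are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Soft-thresholding operator used by the lasso part of the penalty."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def elastic_net(X, y, gamma, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1 / 2n) * RSS + gamma * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2)."""
    n, m = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean       # centering removes the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(m)
    for _ in range(n_iter):
        for q in range(m):
            # Partial residual that leaves out feature q.
            r = yc - Xc @ beta + Xc[:, q] * beta[q]
            rho = Xc[:, q] @ r / n
            beta[q] = (soft_threshold(rho, gamma * alpha)
                       / (col_sq[q] + gamma * (1.0 - alpha)))
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Toy data: proportions of two neighboring cell types across 20 spots.
base = np.array([[0.0, 0.0], [0.4, 0.0], [0.0, 0.4], [0.4, 0.4]])
X = np.tile(base, (5, 1))
y = 2.0 + 3.0 * X[:, 0]                   # only the first type has an effect

intercept, beta = elastic_net(X, y, gamma=0.02, alpha=0.5)
is_mhvg = bool(np.any(beta != 0))         # at least one non-zero coefficient
contributing = np.flatnonzero(beta != 0)  # indices of contributing cell types
```

The lasso part sets the coefficient of the uninformative cell type exactly to zero, which is what makes the non-zero test a usable selection rule.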

Gene expression programs analysis

Gene expression within a given cell type often reflects subtle variations linked to continuous cell states. Deciphering these patterns is essential for understanding the mechanisms underlying cellular behaviors and phenotypes. To address this, we develop a spatial gene coexpression analysis framework that identifies spatially coordinated gene expression programs.

The analysis begins with the identification of HVGs using the “modelGeneVar” and “getTopHVGs” functions from the scran package, setting a false discovery rate threshold of 0.05 to select genes with significant variability. For a given cell type k, a spatially weighted correlation matrix (WCor) is constructed to capture the spatial organization of gene coexpression [24]. The process begins with the computation of a weighted covariance matrix: $\mathrm{WCov}_k = \tilde{F}_k W \tilde{F}_k^{\top}$, where $\tilde{F}_k$ is the mean-centered gene expression matrix (genes × spots) for cell type k, and W is a spatial weight matrix encoding relationships between spots containing cells of type k. The weights in W are based on spatial proximities or connectivity measures among the spots. The weighted covariance matrix $\mathrm{WCov}_k$ is normalized to calculate the correlation matrix:

$$\mathrm{WCor}_k(i, i') = \frac{\mathrm{WCov}_k(i, i')}{\sqrt{\mathrm{WCov}_k(i, i)\, \mathrm{WCov}_k(i', i')}},$$

where $\mathrm{WCov}_k(i, i')$ represents the weighted covariance between genes i and i′, and $\mathrm{WCov}_k(i, i)$ and $\mathrm{WCov}_k(i', i')$ are the variances of genes i and i′, respectively. This normalization ensures that WCor values are bounded between −1 and 1, allowing for meaningful comparisons of gene coexpression while accounting for spatial proximity.
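A minimal NumPy sketch of the spatially weighted correlation follows. It assumes a genes × spots expression matrix for one cell type and, for the toy example, a Gaussian kernel on spot coordinates as the weight matrix; the kernel choice, its bandwidth, and the function name are assumptions, since the text only states that the weights reflect spatial proximity or connectivity.

```python
import numpy as np

def weighted_correlation(X, W):
    """Spatially weighted gene-gene correlation.

    X: genes x spots expression for one cell type.
    W: spots x spots spatial weight matrix.
    """
    Xc = X - X.mean(axis=1, keepdims=True)  # mean-center each gene
    wcov = Xc @ W @ Xc.T                    # weighted covariance
    sd = np.sqrt(np.diag(wcov))             # weighted standard deviations
    return wcov / np.outer(sd, sd)

rng = np.random.default_rng(2)
n_genes, n_spots = 4, 30
X = rng.gamma(2.0, 1.0, size=(n_genes, n_spots))
coords = rng.uniform(0.0, 10.0, size=(n_spots, 2))

# Gaussian kernel weights (an assumed choice; bandwidth 1.0 is arbitrary).
d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-d2 / 2.0)
R = weighted_correlation(X, W)
```

With an identity weight matrix this reduces to the ordinary Pearson correlation between genes, which is a convenient sanity check. The bound between −1 and 1 relies on W being positive semidefinite, as the Gaussian kernel is.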

To identify spatial coexpression modules, WCor is preprocessed through min–max normalization and symmetrization (averaging WCor with its transpose). The symmetrized correlation matrix is then converted into a distance matrix for hierarchical clustering. Genes are grouped into modules based on their spatial coexpression patterns, with the number of clusters determined through visual inspection of the resulting dendrogram to ensure biologically meaningful groupings. The biological relevance of each module is assessed using the “AddModuleScore” function from the Seurat package [28]. This function calculates module activity scores, which are subsequently mapped onto the spatial tissue framework. These mappings provide insights into the organization of cellular states and reveal the roles of spatially coexpressed gene modules in driving cellular functions and phenotypes.

Gene Ontology enrichment analysis

To functionally annotate gene sets, we perform gene set enrichment analysis using Gene Ontology (GO) biological process (BP) terms. All enrichment analyses are performed using the “enrichGO” function from the clusterProfiler package [32], with the default “BH” method for multiple-testing correction of P-values and the default significance level of 0.05 [33].

Datasets

Human pancreatic ductal adenocarcinoma

We analyze a comprehensive dataset from a pancreatic ductal adenocarcinoma (PDAC) tissue, including matched SRT and scRNA-seq data obtained from the same individual, as described in [34]. This publicly available dataset, accessible through the Gene Expression Omnibus (GEO) under accession number GSE111672, provides a valuable resource for studying PDAC at both spatial and single-cell resolutions. The SRT dataset contains 428 spatial spots profiling 19 738 genes (denoted as PDAC-A), while the scRNA-seq dataset includes 1926 cells. The scRNA-seq data are annotated into 20 distinct cell types, organized into four main categories: tumor cells, stromal cells, immune cells, and pancreatic duct-associated cells. Tumor cells are further classified into cancer clones A and B. Stromal cells include endocrine cells, endothelial cells, fibroblasts, mast cells, and tuft cells. Immune cells consist of macrophages A and B, myeloid dendritic cells A and B, plasmacytoid dendritic cells, monocytes, T cells and NK cells, and red blood cells. Pancreatic duct-associated cell types include ductal high hypoxic cells, ductal terminal cells, ductal centroacinar cells, and ductal antigen-presenting cells.

Human squamous cell carcinoma

We analyze a dataset of squamous cell carcinoma (SCC) obtained using the ST platform, focusing on sample replicate 2 from patient 2 (GSE144240), as described in [35]. This dataset includes matched scRNA-seq data generated from the same tissue using the 10x Genomics Chromium platform, providing a shared gene set for integrated analysis. The SRT dataset contains 646 spots, while the scRNA-seq dataset consists of 2683 cells, both profiling 12 005 genes. The scRNA-seq dataset identifies 11 distinct cell types, categorized into tumoral and nontumoral cell types. The nontumoral cell types include myeloid, endothelial, epithelial, fibroblast, macrophages, T cells and NK cells, and melanocytes. The tumoral cell types comprise four specific keratinocyte subtypes: tumor-specific keratinocytes (TSKs), basal keratinocytes (KC-Basal), cycling keratinocytes (KC-Cyc), and differentiating keratinocytes (KC-Diff). Among these, TSK cells are unique to tumor tissues and are initially identified through comparative analysis of tumor cell subpopulations from normal and SCC tissues using scRNA-seq data, as detailed in [35].

Mouse kidney 10x SRT

We analyze a dataset of the mouse kidney, a heterogeneous tissue composed of multiple cell types across anatomically distinct zones [36]. The dataset includes SRT data generated using Visium v1 chemistry, containing 1430 spots, and scRNA-seq data retrieved from the GEO database (GSE129798). The scRNA-seq dataset comprises 3790 cells, selected through stratified downsampling to ensure proportional representation of diverse cell types, with 8357 shared genes profiled across both datasets. The scRNA-seq data are annotated with 10 distinct cell types, including interstitial cells, macrophages, principal cells, renal corpuscular cells, T cells, thick limb loop of Henle cells, vascular cells, vascular smooth muscle cells, distal tubule cells, and proximal tubule cells. These annotations are based on the marker genes and classification provided in [36], ensuring accurate representation of diverse kidney cell populations.

Data preprocessing

Correcting for sequencing depth and other technical variations is essential in sequencing data analysis. To ensure data quality and consistency, we apply the following filtering criteria: genes expressed in <1% of cells (or spots) are removed, and cells with <100 total read counts are excluded. To address disparities in sequencing depth and other technical factors, we first normalize the raw count data by library size, scaling by a factor of 1e6. To avoid undefined results from zero counts and enhance numerical stability and interpretability, we add one to each scaled value before applying a logarithmic transformation. This two-step normalization process effectively corrects for variations in sequencing depth across all SRT and scRNA-seq datasets. Additionally, to mitigate the impact of outliers in gene expression, we cap expression values at the 99.5th percentile for each gene across all cells (or spots).
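The preprocessing steps can be sketched in NumPy as below. Several details are assumptions because the text leaves them open: filters are applied in the order listed (genes, then cells), the logarithm is natural (`np.log1p`), and capping is done after the log transform. The function name and toy counts are illustrative.

```python
import numpy as np

def preprocess(counts, min_frac=0.01, min_total=100, scale=1e6, cap_pct=99.5):
    """Filter, normalize, log-transform, and cap a genes x cells count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Remove genes expressed in fewer than min_frac of cells (or spots).
    keep_genes = (counts > 0).mean(axis=1) >= min_frac
    counts = counts[keep_genes]
    # Remove cells (or spots) with fewer than min_total read counts.
    keep_cells = counts.sum(axis=0) >= min_total
    counts = counts[:, keep_cells]
    # Library-size normalization scaled by 1e6, then log(1 + x).
    norm = np.log1p(counts / counts.sum(axis=0) * scale)
    # Cap each gene at its 99.5th percentile across cells to limit outliers.
    caps = np.percentile(norm, cap_pct, axis=1, keepdims=True)
    return np.minimum(norm, caps), keep_genes, keep_cells

raw = np.array([[60, 100, 200, 30],
                [40, 100, 0, 20],
                [0, 0, 0, 0]])
expr, keep_genes, keep_cells = preprocess(raw)
```

In the toy matrix the all-zero gene and the cell with only 50 counts are dropped before normalization, so the library sizes used for scaling are computed on retained entries only.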

Cell-type deconvolution

We utilize EnDecon [16] to estimate cell-type proportions in real SRT data, integrating six base deconvolution methods (CARD [8], SpatialDWLS [9], RCTD [10], Cell2location [13], SPOTlight [14], and Stereoscope [15]) tailored for SRT data and using raw scRNA-seq data from STged as reference. To mitigate the impact of limited cells in capture spots, we filter out cell types with proportions below 0.05 in each spot.
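The proportion filter in the last step is simple to express in code. The sketch below zeroes out proportions under 0.05; the text does not say whether the remaining proportions are rescaled to sum to one, so rescaling is offered only as an optional, assumed behavior behind a hypothetical `renormalize` flag.

```python
import numpy as np

def filter_proportions(V, min_prop=0.05, renormalize=False):
    """Set cell-type proportions below min_prop to zero in each spot."""
    V = np.where(np.asarray(V, dtype=float) < min_prop, 0.0, V)
    if renormalize:  # assumption: not stated in the text
        totals = V.sum(axis=1, keepdims=True)
        V = np.divide(V, totals, out=np.zeros_like(V), where=totals > 0)
    return V

V = np.array([[0.90, 0.07, 0.03],
              [0.50, 0.50, 0.00]])
V_filtered = filter_proportions(V)
V_rescaled = filter_proportions(V, renormalize=True)
```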

Results

Overview of STged

STged is a computational framework designed to deconvolute SRT data into cell-type-specific gene expression by integrating it with annotated scRNA-seq data. The method requires three primary inputs: (i) SRT data, including a gene expression matrix Y and a location matrix Z, (ii) a cell-type proportion matrix V, and (iii) a scRNA-seq reference dataset (Fig. 1A). If cell-type proportions are not provided, STged employs EnDecon [16] to estimate them through ensemble learning.

Figure 1.

STged is specifically designed for deconvolving gene expression from low-resolution SRT data. (A) Inputs for STged include an SRT gene expression matrix with coordinate information (top panel), corresponding cell-type proportion information (middle panel), and annotated scRNA-seq data as a reference for cell-type-specific gene expression (bottom panel). (B) STged’s computational model utilizes graph-based and reference gene signature-guided approaches, integrating cell-type-specific gene expression data from both spatial neighbor data and matched tissue scRNA-seq data. A spatial neighbor graph is constructed using spot location information, and cell-type-specific gene expression is derived from the annotated scRNA-seq data. (C) Outputs from STged include spot- and cell-type-specific gene expression matrices, alongside detailed gene expression profiles for each spot. (D) STged reconstructs spot- and cell-type-specific gene expression data for various downstream analyses, using cell- and gene-level approaches to thoroughly understand spatial cellular heterogeneity. At the cell level, clustering identifies distinct cell populations and continuous trajectories, and CCC analysis helps elucidate signaling interactions among cell types. At the gene level, mHVGs analysis identifies gene expression levels that vary across spatial microenvironments, and gene expression program analysis investigates the coordinated roles of gene sets in cellular regulation.

STged incorporates spatial context by constructing a spatial weight matrix W based on the spatial coordinates of spots, enabling graph-based regularization to enforce smoothness of gene expression profiles among neighboring cells of the same type. From the scRNA-seq dataset, a reference cell-type signature matrix U is generated by averaging the expression profiles of cells within each type. The core optimization model estimates cell-type-specific gene expression profiles by minimizing a loss function that includes three components: (i) a least-squares loss to match observed spot-level gene expression with the weighted sum of cell-type-specific profiles, where weights are cell-type proportions; (ii) a graph-based regularization term to promote spatial smoothness of gene expression; and (iii) a reference-guided regularization term to align estimated gene expression profiles with scRNA-seq-derived signatures (Fig. 1B). Solving this optimization problem yields a third-order tensor $F \in \mathbb{R}^{p \times n \times K}$, where $f_{ijk}$ represents the expression of gene i in cells of type k at spatial spot j (Fig. 1C). This tensor serves as the foundation for downstream analyses at both the cell and gene levels (Fig. 1D).

At the cell level, STged enables the identification of subpopulations within the same cell type by clustering cell-type-specific gene expression profiles, offering a detailed perspective on cellular diversity. For cells exhibiting continuous states, it facilitates the inference of developmental trajectories, providing insights into dynamic cellular processes and transitional states. Furthermore, STged supports CCC analysis using CellChat, which constructs global L–R interaction networks to identify active signaling pathways and key intercellular communication events. Additionally, it integrates NicheNet, mapping ligands expressed by sender cells to downstream target genes in receiver cells, thereby uncovering mechanisms of transcriptional regulation. These diverse approaches provide a detailed view of intercellular signaling pathways and regulatory networks.

At the gene level, STged identifies mHVGs using a regression framework that quantifies the impact of surrounding cell-type compositions on the expression of target genes within a specific cell type. These mHVGs are further refined into ctHVGs by determining the specific neighboring cell types contributing to their variability. Additionally, STged analyzes gene expression programs by identifying coexpressed gene modules, which are then used to compute activity scores reflecting the functional roles of these modules. This approach provides insights into how gene programs drive continuous cell states and coordinate regulatory mechanisms. Collectively, these analyses offer a comprehensive understanding of the spatial and functional organization of tissues, bridging cellular heterogeneity and gene-level regulatory processes.

Performance evaluation using simulated SRT data

Generation of simulated SRT data

To evaluate gene expression deconvolution methods, we simulate low-resolution SRT data by aggregating single-cell resolution SRT data into grid-based squares (spots). Gene expression levels within each spot are calculated by summing the expression values from all cells within its boundaries, creating a spatial structure that mirrors real-world spot-level SRT data. Cell-type proportions are determined using cell label information, with transcriptional profiles derived by averaging expression across cells of the same type. Spot coordinates are set as the mean x- and y-coordinates of each grid square, ensuring spatial consistency across datasets.
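The aggregation procedure can be sketched as follows. The snippet bins cells into square grid cells of a given side length, sums expression per spot, and derives proportions and per-type mean profiles from the cell labels. The function name, the integer label encoding, and the use of the square's center as the spot coordinate are assumptions made for illustration.

```python
import numpy as np

def aggregate_to_spots(expr, coords, labels, n_types, spot_size):
    """Aggregate single-cell data into grid spots.

    expr: genes x cells; coords: cells x 2; labels: integer type per cell.
    Returns spot expression Y, proportions V, per-type mean profiles F_true,
    and spot centers.
    """
    grid = np.floor(np.asarray(coords, dtype=float) / spot_size).astype(int)
    keys, spot_of_cell = np.unique(grid, axis=0, return_inverse=True)
    spot_of_cell = spot_of_cell.ravel()
    p, n_spots = expr.shape[0], keys.shape[0]
    Y = np.zeros((p, n_spots))
    F_sum = np.zeros((p, n_spots, n_types))
    counts = np.zeros((n_spots, n_types))
    for c in range(expr.shape[1]):
        j, k = spot_of_cell[c], labels[c]
        Y[:, j] += expr[:, c]            # spot expression: sum over its cells
        F_sum[:, j, k] += expr[:, c]
        counts[j, k] += 1
    V = counts / counts.sum(axis=1, keepdims=True)       # true proportions
    F_true = F_sum / np.maximum(counts, 1)[None, :, :]   # per-type means
    centers = (keys + 0.5) * spot_size   # assumed: center of each square
    return Y, V, F_true, centers

expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [0.0, 1.0, 0.0, 2.0]])
coords = np.array([[1, 1], [2, 3], [12, 1], [13, 2]])
labels = np.array([0, 1, 0, 0])
Y, V, F_true, centers = aggregate_to_spots(expr, coords, labels, 2, 10.0)
```

Only grid squares that contain at least one cell become spots, so empty regions of the tissue produce no zero-count spots in this sketch.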

This simulation approach is applied to single-cell resolution SRT data from both the seqFISH+ [4, 9] and MERFISH [6] platforms. For seqFISH+, we use a publicly available single-cell resolution dataset containing 523 cells and 10 000 genes (available at https://github.com/CaiGroup/seqFISH-PLUS). Following the methodology outlined in [9], we define spot regions of ∼51.5 μm to generate spot SRT data (Fig. 2A), resulting in 69 spots comprising six main cell types: excitatory neurons (eNeurons), inhibitory neurons (iNeurons), astrocytes, oligodendrocytes (Olig), microglia cells, and endothelial-mural cells (endo-mural). Aggregated gene expression values and cell-type proportions in each spot serve as ground truth for deconvolution analysis. Additionally, a publicly available scRNA-seq dataset from mouse cortex tissue (Fig. 2B) [37] is used to compute the gene signature matrix, guiding gene expression deconvolution in STged and supporting cell-type deconvolution as needed. To ensure consistency, we retain genes expressed in at least 5% of cells or spots, resulting in 6540 shared genes between the simulated SRT and reference scRNA-seq data.

Figure 2.

Analysis of simulated SRT data generated from seqFISH+ data. (A) Visualization of the spatial distribution of cell types in the seqFISH+ mouse cortex. Each square grid represents a simulated spot containing multiple cells, with each color indicating a different cell type. The six cell types are: eNeuron, iNeuron, astrocytes, Olig, microglia, and endo-mural cells. (B) UMAP (Uniform Manifold Approximation and Projection) plot for the scRNA-seq data from mouse cortex tissue. This dataset contains 8091 genes and 1691 cells annotated into six cell types. (C–D) Visualization of the spatial distribution of cell-type proportions. (C) True cell-type proportions derived from panel (A). (D) Cell-type proportions inferred by the deconvolution method EnDecon [16]. Each pie chart represents a spot in the SRT data and shows its cell-type composition. (E–F) Evaluation of the performance of the gene expression deconvolution methods in two different scenarios, based on four evaluation metrics. (E) Scenario 1: methods evaluated with true cell-type proportions from the simulation. (F) Scenario 2: methods evaluated using cell-type proportions estimated by original or recommended deconvolution methods. Each method is represented by a different color.

To assess deconvolution performance, we design two experimental scenarios reflecting different levels of information availability. In scenario 1, where cell-type proportions are known, these values are directly provided to the deconvolution methods (Fig. 2C), allowing assessment under ideal conditions. In scenario 2, simulating real-world conditions where cell-type proportions are unknown, we estimate proportions using the EnDecon method [16] (Fig. 2D). MERFISH data from the mouse medial preoptic area is analyzed as described in detail in Supplementary Section 3.4 and illustrated in Supplementary Fig. S25, due to space limitations.

Performance assessment of different deconvolution methods

To evaluate the empirical performance of STged, we benchmark it against several deconvolution methods, including RCTD [10], TCA [38], ENIGMA [39], LSR [20], and a baseline method we term Spotdecon, which estimates gene expression by multiplying total expression in a spot by its cell-type proportions. We also include a comparison with the cell-type signature matrix derived from scRNA-seq reference data, referred to as ref-mean. Each method utilizes distinct analytical approaches, enabling a comprehensive comparison of their relative strengths and limitations in SRT data analysis. To quantify the accuracy and robustness of each method, we employ four evaluation metrics: Pearson correlation coefficient (PCC), cosine similarity, root mean square error (RMSE), and Jensen–Shannon divergence (JSD) (details provided in Supplementary Section 3.3).
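For orientation, the four metrics can be sketched in plain Python as below. The exact definitions used in the benchmark are given in Supplementary Section 3.3, which is not reproduced here, so the choices in this sketch are assumptions: in particular, normalizing each vector to sum to one before computing JSD and using a base-2 logarithm.

```python
import math


def pcc(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rmse(a, b):
    """Root mean square error between estimate and truth."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def jsd(a, b):
    """Jensen-Shannon divergence between two nonnegative vectors.

    Assumption: both vectors are rescaled to sum to one and the
    logarithm is base 2, so the result lies in [0, 1].
    """
    total_a, total_b = sum(a), sum(b)
    p = [x / total_a for x in a]
    q = [y / total_b for y in b]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, v):
        # Terms with u_i = 0 contribute nothing to the divergence.
        return sum(x * math.log2(x / y) for x, y in zip(u, v) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Higher PCC and cosine similarity indicate better agreement with the ground truth, whereas lower RMSE and JSD indicate better agreement.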

RCTD, designed specifically for SRT, estimates cell-type-specific gene expression based on the conditional expectation of gene expression at each spot. LSR, originally developed for bulk RNA-seq data, estimates cell-type-specific profiles by treating each SRT spot as a bulk sample and applying least-squares regression. TCA [38] employs tensor decomposition and robust regression techniques to identify cell-type-specific DNA methylation signals, and it is also applicable for estimating cell-type-specific gene expression. ENIGMA [39], which employs weighted matrix completion with trace-norm regularization, is particularly suited for SRT data. We run ENIGMA using cell-type proportions predicted via robust linear regression, as suggested by [39].

We assess the performance of each method under two scenarios. In scenario 1, we use the true cell-type proportions as input for all methods (Fig. 2C). In scenario 2, EnDecon [16] is used to estimate cell-type proportions for STged (Fig. 2D). For other methods, we apply their recommended deconvolution techniques as outlined in their respective studies to ensure a fair comparison. Specifically, for RCTD, we employ its original method to calculate cell-type proportions [10], and for ENIGMA, we use cell-type proportions derived via robust linear regression, as advised by [39]. EnDecon is also used consistently for TCA [38], LSR, and Spotdecon.
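The Spotdecon baseline is simple enough to state in a few lines. The sketch below follows its description in the text (total spot expression multiplied by the cell-type proportions); the function name and the dictionary-based data layout are illustrative assumptions.

```python
def spotdecon(spot_expression, proportions):
    """Baseline split of one spot's expression across cell types.

    spot_expression: dict mapping gene -> total expression in the spot.
    proportions: dict mapping cell type -> proportion in the spot.
    Returns a dict: cell type -> {gene -> apportioned expression}.
    """
    return {
        cell_type: {gene: value * fraction
                    for gene, value in spot_expression.items()}
        for cell_type, fraction in proportions.items()
    }
```

Because every gene is split by the same proportions, this baseline cannot represent genes that are specific to one cell type within a mixed spot, which is the situation the regression-based methods are designed to address.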

In scenario 1, SRT-specific methods like STged and RCTD outperform those initially developed for bulk RNA-seq data. STged achieves the highest PCC at 0.344, representing a 19.0% improvement over RCTD (PCC = 0.289, P-value = 0.030, Kolmogorov–Smirnov test). Similarly, STged leads in cosine similarity with a score of 0.675, a 9.8% improvement over RCTD (cosine similarity = 0.615, P-value = 0.030) (Fig. 2E). To further assess the statistical significance of RMSE differences, we use the Diebold–Mariano (DM) test. STged achieves the lowest RMSE at 6.41e−5, significantly lower than RCTD’s RMSE of 6.78e−5 (P-value = 2.44e−10, two-sided DM test), confirming its superior predictive accuracy. Additionally, STged records the lowest JSD score of 0.278, indicating better alignment with true gene expression profiles compared with RCTD (JSD = 0.291, P-value = 2.58e−11). Spotdecon, while performing comparably to RCTD in some evaluations (e.g. PCC = 0.322, cosine similarity = 0.667), fails to match STged due to its limited ability to model complex spatial patterns (e.g. RMSE = 6.38e−5, JSD = 0.283). Other methods, such as ENIGMA and LSR, show lower overall metrics, emphasizing the importance of tailoring methods to SRT data.

In scenario 2, STged maintains superior results across all evaluation metrics. It achieves a PCC of 0.343 and a cosine similarity of 0.671, reflecting minimal loss compared with its performance in scenario 1. In contrast, LSR and Spotdecon show significant declines, with LSR dropping to a PCC of 0.156 and an RMSE of 8.46e−5. Spotdecon performs better than LSR but still falls short of STged, with a PCC of 0.322 and an RMSE of 6.38e−5 (Fig. 2F). The advantage of STged is further validated by testing on simulated data from the MERFISH dataset, where STged outperforms other methods across multiple evaluation metrics (Supplementary Section 3.4 and Supplementary Fig. S26). Its ability to leverage spatial structure correlations and cell-type-specific prior information ensures consistently accurate gene expression estimates.

STged reveals distinct fibroblast subpopulations with unique spatial distributions in PDAC tissue

Fibroblasts are pivotal in tumor progression, shaping the tumor microenvironment and displaying significant functional heterogeneity [40]. To explore the spatial and functional diversity of fibroblasts in PDAC tissue, we employ STged to analyze SRT data obtained via the ST platform [34]. Histological analysis annotates the tissue slice into four main regions—cancerous, pancreatic, ductal, and stromal—based on H & E staining (Fig. 3A) [34]. Due to the low resolution of the SRT data, we utilize EnDecon [16] to estimate cell-type proportions, referencing a matched scRNA-seq dataset (Fig. 3B and Supplementary Fig. S1). Using STged with tuning parameters (λ1 = 0.011, λ2 = 0.114), we deconvolve the data and identify gene expression profiles of 20 distinct cell types, including fibroblasts.

Figure 3.

Analysis of PDAC SRT data. (A) Histologist annotation of PDAC tissue sections divided into four distinct regions (right panel) based on H & E staining images (left panel) from the original study [34]. (B) Spatial scatter pie chart representation of cell-type composition, where each pie represents a spot in the SRT data with cell-type proportion inferred by EnDecon [16]. (C) Spatial distribution of fibroblast subpopulations, with different colors indicating distinct subgroups and gray representing spots without fibroblasts. (D) Bubble plot showing significant L-R pairs between malignant cells and macrophages, as inferred by CellChat [30]. (E) Venn diagrams illustrating the overlap of significantly detected L-R pairs across four different datasets. (F) Distribution of PECAM1 expression across various cell types in different datasets. (G) The spatial distribution of ctHVGs in fibroblasts with the highest PCC scores for each cell type is shown, with the top panel for cancer clone A-specific ctHVGs and the bottom panel for endothelial-specific ctHVGs. (H) Schematic of biological insights into the communication underlying fibroblast-cancer clone A/endothelial cell interactions.

To assess fibroblast transcriptomic heterogeneity, we apply Seurat’s SNN clustering at a resolution of 0.15, identifying three subpopulations (Supplementary Fig. S2). Spatial mapping reveals distinct distribution patterns: subcluster 0 is enriched in stromal areas, subcluster 1 in cancerous regions, and subcluster 2 in pancreatic regions (Fig. 3C). Cell-type enrichment analysis confirms these patterns by comparing subgroup distributions across tissue regions using a permutation test with 1000 iterations (Supplementary Fig. S3). Differential expression analysis identifies 1752, 637, and 2394 genes significantly overexpressed in subclusters 0, 1, and 2, respectively. The mean expression levels of these overexpressed genes correspond with the spatial distribution of their respective enriched regions (Supplementary Fig. S4).
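A permutation test of this kind can be sketched as follows. The test statistic used in the study is described in the supplementary material and is not reproduced here; this sketch assumes a simple statistic (the number of spots that belong to both a given subcluster and a given region) and shuffles region labels to build the null distribution. The function name and arguments are hypothetical.

```python
import random


def enrichment_pvalue(clusters, regions, cluster, region,
                      n_perm=1000, seed=0):
    """One-sided permutation p-value for enrichment of a subcluster in a region.

    clusters: per-spot subcluster labels.
    regions: per-spot tissue-region labels (same order as clusters).
    The statistic is the count of spots with the given cluster AND region.
    """
    rng = random.Random(seed)

    def overlap(region_labels):
        return sum(1 for c, r in zip(clusters, region_labels)
                   if c == cluster and r == region)

    observed = overlap(regions)
    shuffled = list(regions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the cluster-region link
        if overlap(shuffled) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With 1000 permutations, the smallest attainable p-value under this correction is 1/1001.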

We characterize three functionally heterogeneous fibroblast subclusters based on GO analysis of their overexpressed genes (Supplementary Fig. S5). Subcluster 0 exhibits increased gene expression regulation and activation of the Wnt signaling pathway, facilitating extracellular matrix (ECM) remodeling and maintaining tissue stability (Supplementary Fig. S5A). This suggests a significant role in supporting the structural integrity and dynamic remodeling of the stromal environment. Subcluster 1, localized in cancerous regions, demonstrates increased cell-matrix adhesion and tissue remodeling, thereby promoting the tumor microenvironment and facilitating tumor cell invasion and migration (Supplementary Fig. S5B). These adhesive properties indicate their role in supporting tumor growth and progression within the cancerous niche. Additionally, subcluster 2, found in pancreatic regions, plays significant roles in metabolic regulation and immune responses, supporting pancreatic cell functions and maintaining local immune homeostasis (Supplementary Fig. S5C). The metabolic and immunological functions of subcluster 2 fibroblasts demonstrate their role in sustaining pancreatic tissue functionality and mediating immune surveillance within the organ. These findings underscore the spatial and transcriptional heterogeneity of fibroblast populations in PDAC, highlighting their distinct functional roles in tumor biology and tissue homeostasis.

STged improves cellular communication inference over raw SRT data in PDAC data

Building on our understanding of fibroblast diversity in the PDAC microenvironment, we investigate the interactions between cancer and stromal cells that influence tumor progression and therapy response. Leveraging STged’s capabilities to deconvolve mixed expression data into specific cell-type contributions, we enhance our analysis of CCC from SRT data. By integrating STged with CellChat [30], we aim to elucidate the network of L–R interactions governing cellular dynamics in PDAC tissue. As a comparison, we also analyze L–R interactions directly on the raw SRT data, where each spot is treated as a single cell and cell types are defined by the dominant type, as determined by the deconvolution results (Supplementary Fig. S6). For CCC analysis, we include cancer clone A and stromal cells (fibroblasts and endothelial cells) identified in both the raw SRT data and the STged-deconvoluted data.
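The dominant-type labeling used for the raw SRT comparison amounts to taking, for each spot, the cell type with the largest estimated proportion. A minimal sketch, with hypothetical function names and a dictionary-based layout assumed for illustration:

```python
def dominant_cell_type(proportions):
    """Return the cell type with the largest proportion in one spot.

    Ties are broken by dictionary order (first maximum wins).
    """
    return max(proportions, key=proportions.get)


def label_spots(spot_proportions):
    """Map each spot ID to its dominant cell type."""
    return {spot: dominant_cell_type(props)
            for spot, props in spot_proportions.items()}
```

A spot labeled this way still carries expression from its minority cell types, which is the source of the mixing effects discussed in the following paragraphs.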

We identify 95 significant L–R pairs in the STged-deconvoluted data (adjusted P-value < 0.05, BH method), compared with 53 pairs in the raw SRT data (Fig. 3D and Supplementary Figs S7 and S8). To evaluate the accuracy of the STged deconvolution, we benchmark the predicted L–R pairs against CellChat analysis of scRNA-seq data from the same patient tissue. The STged-deconvoluted data demonstrate a high degree of concordance with the scRNA-seq benchmark, accurately predicting 48 L–R pairs (Fig. 3E). To assess the results of CCC inference, we use precision (the proportion of correctly identified L–R pairs out of all predicted pairs), recall (the proportion of correctly identified L–R pairs out of all actual pairs), and the F1 score, which balances precision and recall. STged achieves a precision of 0.505, recall of 0.187, and F1 score of 0.273, reducing false positives while maintaining recall. In contrast, raw SRT data correctly predict 16 L–R pairs, with a precision of 0.302, recall of 0.062, and F1 score of 0.103 (Supplementary Table S1). This discrepancy indicates the limitations of using raw SRT data directly for CCC inference, where the mixing of diverse cell types within the spots likely increases false positives. Such false signals likely arise from the presence of both cell types in the same spot, leading to coexpression.
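The three scores follow directly from set overlaps between predicted and reference L–R pairs. The sketch below is illustrative; the function name and the representation of pairs as tuples are assumptions. As a consistency check on the reported values, 48 correct pairs out of 95 predicted gives a precision of 48/95 ≈ 0.505, and combining the reported precision and recall gives 2 × 0.505 × 0.187 / (0.505 + 0.187) ≈ 0.273.

```python
def lr_pair_scores(predicted, reference):
    """Precision, recall, and F1 for predicted ligand-receptor pairs.

    predicted, reference: iterables of hashable pair identifiers,
    for example (ligand, receptor) tuples.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```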

Notably, in the raw SRT data, the apparent coexpression of the PECAM1-PECAM1 L-R pair between fibroblasts and endothelial cells suggests a potential bidirectional interaction, with PECAM1 on endothelial cells regulating its expression on fibroblasts, and vice versa (Fig. 3D). However, this result is misleading because PECAM1 is an endothelial cell marker involved in cell adhesion, migration, and junction formation within the tumor microenvironment, and it is not a fibroblast marker [41]. PECAM1 expression significantly differs between endothelial and fibroblast cells in scRNA-seq/STged-deconvoluted data (Student’s t-test: P-value < 1e−10), whereas raw SRT data show no significant difference (Student’s t-test: P-value > 0.1) (Fig. 3F). Furthermore, fibroblast-dominant spots in raw SRT data contain a higher proportion of endothelial cells than expected (mean proportions: fibroblasts 0.245, endothelial cells 0.142). The misleading interaction observed in the raw SRT data is therefore likely due to the co-localization of fibroblasts and endothelial cells within the same spots, an issue that STged deconvolution can resolve.

Additionally, when comparing inferred significant L–R pairs from STged with those from RCTD, which does not account for spatial correlations in gene expression, we observe that CellChat identifies fewer L–R pairs in RCTD-deconvoluted data (49 L–R pairs) and reports lower precision (0.490), recall (0.093), and F1 score (0.157) compared with STged-deconvoluted data (Fig. 3E and Supplementary Table S1). These results indicate the effectiveness of STged deconvolution in detecting CCC patterns.

STged uncovers cancer clone A- and endothelial cell-mediated regulation of fibroblast gene expression within the PDAC tumor microenvironment

Having demonstrated the utility of STged in uncovering fibroblast subpopulation diversity and CCC networks between cell types, we now extend our analysis to explore gene-level spatial dependencies within the PDAC microenvironment. STged enables the identification of cell-type-specific spatial relationships, linking variations in gene expression to the spatial distribution of cell types across tissue spots [42, 43].

Using STged-deconvoluted data, we focus on how the proportions of surrounding cell types within each spot influence gene expression in fibroblast cells. From this analysis, we identify 915 fibroblast-associated mHVGs, characterized by expression variability that correlates with the spatial distribution of cell types. We also employ distance correlation (dCor) [44] to evaluate the relationships between mHVG expression levels in fibroblasts and the proportions of cell types neighboring fibroblasts (Supplementary Section 3.5). This analysis demonstrates that the spatial expression patterns of the identified mHVGs are correlated with local cell-type abundance (Supplementary Fig. S9).
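Distance correlation measures both linear and nonlinear dependence between two variables. The sketch below implements the standard sample (V-statistic) estimator for two one-dimensional vectors, for example one gene's expression across spots and one cell type's proportion across the same spots. The exact procedure used in the study is described in Supplementary Section 3.5 and may differ from this sketch; the function names are illustrative.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered by row/column/grand means."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in dist]   # equals column means (symmetry)
    grand_mean = sum(row_mean) / n
    return [[dist[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length 1-D vectors."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0                      # a constant vector has no dependence
    return math.sqrt(max(dcov2, 0.0) / denom)
```

Unlike PCC, this quantity is nonzero for a symmetric nonlinear relationship such as y = x² on values centered at zero, which is why it is a useful complement when the link between expression and local cell-type abundance need not be linear.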

To further evaluate the impact of cell-type distributions on the expression levels of mHVGs within the spatial microenvironment, we categorize fibroblast-associated mHVGs into groups termed ctHVGs by evaluating the effect of each specific cell type on mHVG expression. The numbers of ctHVGs whose expression in fibroblasts is associated with the proportions of specific cell types are summarized in Supplementary Table S2. To further quantify these relationships, we also calculate PCC scores (Supplementary Section 3.6), indicating the strength of associations between ctHVG expression levels and the proportions of specific cell types (Supplementary Fig. S10).

To better understand the subtype-specific influences of cancer and stromal cells on the microenvironment, we examine interactions between fibroblasts and one cancer subtype, cancer clone A, as well as one stromal cell type, endothelial cells. This analysis reveals how cancer clone A and endothelial cells uniquely shape the microenvironment, affecting fibroblast gene expression and influencing tumor progression and therapy response. The four ctHVGs with the highest PCC scores for cancer clone A and endothelial cells exhibit distinct, subtype-specific expression patterns (Fig. 3G). GO analysis based on these ctHVGs indicates distinct intercellular communication mechanisms (Supplementary Fig. S11). The ctHVGs for cancer clone A are enriched in pathways related to ECM organization, wound healing, and apoptotic signaling, reflecting roles in tissue remodeling and apoptosis regulation (Supplementary Fig. S11A). The ctHVGs for endothelial cells are enriched in BPs such as angiogenesis, migration, and vascular permeability regulation, contributing to blood vessel formation and maintenance in the tumor microenvironment (Supplementary Fig. S11B). These results underscore the complexity of fibroblast–cancer and fibroblast–stromal cell interactions, highlighting how specific cancer clones and stromal cells, such as endothelial cells, modulate fibroblast gene expression and contribute to the dynamic tumor microenvironment.

To further investigate the regulatory mechanisms underlying fibroblast-cancer clone A/endothelial cell interactions, we construct ligand–target signaling networks that link ligands secreted by cancer clone A and endothelial cells (sender cells) to their target genes (ctHVGs) in fibroblasts. This analysis reveals the upstream mechanisms regulating the identified ctHVGs. Ligand–target analysis reveals distinct strategies by cancer clone A and endothelial cells in shaping fibroblast gene regulation (Fig. 3H). Cancer clone A regulates fibroblast activity through pathways associated with ECM stabilization, immune modulation, and tumor growth, while endothelial cells regulate fibroblasts via pathways that enhance blood vessel formation, immune cell recruitment, and fibroblast activation. These findings highlight the complexity of regulation on fibroblasts within the tumor microenvironment and underscore the unique roles of cancer clone A and endothelial cells. Detailed results are provided in Supplementary Section 3.7 and illustrated in Supplementary Figs S27 and S28.

This gene-level analysis, combined with the study of cellular interactions, offers a detailed view of the molecular and cellular dynamics in PDAC tissue. It also demonstrates STged’s ability to decode complex gene expression patterns and uncover key insights into the regulatory mechanisms within the tumor microenvironment.

STged reveals spatially distinct TSK subpopulations in human SCC SRT data

We apply STged to SCC SRT data to investigate the cellular composition and spatial organization of SCC tissues, building on the work by Ji et al. [35]. Using EnDecon [16] with tissue-matched scRNA-seq data, we predict cell-type proportions within spots, encompassing 11 cell types, including four tumor cell types (i.e. TSK, KC-Basal, KC-Cyc, and KC-Diff) and seven nontumor cell types (Fig. 4A). We sum the proportions of tumor and nontumor cells individually, with the spatial distribution of the dominant cell types displaying distinct patterns (Fig. 4B). STged deconvolution, optimized with λ1 = 0.008 and λ2 = 0.151, generates gene expression profiles that enable analysis of spatial and functional heterogeneity in SCC tissues.

Figure 4.

Analysis of the SCC SRT data. (A) Representation of cell-type composition using a spatial scatter pie chart, where each pie represents a spot in the SRT data, with cell-type proportion estimated by EnDecon [16]. (B) Visualization of the spatial distribution of dominant cell types based on tumor and nontumor classifications. Tumor cell proportions are calculated by summing the proportions of four tumor cell types (i.e. TSK, KC-Basal, KC-Cyc, and KC-Diff), while nontumor cell proportions are determined by summing the proportions of the seven nontumor cell types. (C) Visualization of the spatial distribution of TSK subpopulations, with different colors indicating distinct subgroups and gray representing spots without TSKs. (D) Presentation of Venn diagrams to show the overlap of significantly detected L-R pairs across four different datasets. (E) Bubble plot displays inferred cell-type interaction pairs between different cell types using CellChat [30] based on STged-deconvoluted and raw SRT data. (F) Examination of CDH1 expression distribution between TSK and fibroblast cells. (G) Schematic illustration of CDH1 gene self-regulation dynamics between fibroblast-dominant and TSK-dominant spots derived from raw SRT data. (H) Left panel shows cell-type distribution and right panel displays the spatial distribution of TSK-associated ctHVGs with the highest PCC scores. (I) Schematic illustration to elucidate the biological interactions and communication between TSK and macrophage cells and fibroblast cells.

We focus on TSK cells, which are epithelial components linked to epithelial-to-mesenchymal transition (EMT), a process that promotes tumor progression and metastasis [35, 45, 46]. Using Seurat clustering at a resolution of 0.4, we identify two TSK subpopulations (Supplementary Fig. S12). Spatial mapping reveals that subcluster 0 predominantly resides in tumor cell-dominant regions (i.e. TSK, KC-Basal, KC-Cyc, and KC-Diff), whereas subcluster 1 is localized in nontumor cell-dominant regions (Fig. 4C). Cell-type enrichment analysis using permutation testing (1000 permutations) statistically validates these spatial patterns, demonstrating significant enrichment of subcluster 0 in tumor regions and subcluster 1 in nontumor regions (Supplementary Fig. S13). Differential expression analysis identifies 475 and 363 overexpressed genes in subclusters 0 and 1, respectively. The mean expression levels of these overexpressed genes align with the spatial localization patterns of their respective enriched regions (Supplementary Fig. S14). GO enrichment analysis of these overexpressed genes reveals functional heterogeneity between the subclusters (Supplementary Fig. S15). Subcluster 0 is enriched in BPs such as epithelial cell migration, ECM organization, cell-substrate adhesion, and wound healing (Supplementary Fig. S15A). These processes reflect hallmark EMT features, including enhanced migration, ECM remodeling, and cell-matrix interactions, facilitating tumor invasion and microenvironment remodeling. In contrast, subcluster 1 is enriched in translation, ribosome biogenesis, and ribosomal RNA metabolic processes, indicating a focus on metabolic activity and protein synthesis (Supplementary Fig. S15B). Pathways regulating intrinsic apoptosis and p53-mediated signal transduction further suggest that subcluster 1 adapts to stress in nontumor regions, maintaining epithelial-like characteristics and contributing to local homeostasis.

These findings reveal spatially distinct functional roles of TSK subpopulations. Subcluster 0 promotes EMT-related processes, including tumor remodeling and invasion, consistent with its localization in tumor-dominant regions. Subcluster 1 supports metabolic and regulatory functions, preserving epithelial traits and maintaining cellular homeostasis in nontumor regions. These results highlight the role of spatial heterogeneity in shaping TSK functions and their contributions to tumor progression.

STged reconstructs TSK-stroma communications in SCC SRT data

Building on our detailed characterization of TSK subpopulations, we extend our focus to understand how these cells communicate with stromal components, which play a pivotal role in modulating tumor behavior and therapeutic outcomes. Utilizing STged, we dissect the intricate L–R communication networks between TSKs and stromal cells, particularly fibroblasts and endothelial cells, essential for maintaining the structural and functional dynamics of SCC [47].

Using CellChat [30], we analyze cell-type-specific expression data refined by STged to explore the signaling pathways mediating interactions between TSKs and stromal cells. This enhanced analysis facilitates a comprehensive understanding of the complex signaling networks contributing to the tumor microenvironment and functional heterogeneity. To evaluate TSK-stroma communication, we analyze the performance of raw SRT data, which is annotated by the dominant cell types at each spot (Supplementary Fig. S16). Benchmarking against matched scRNA-seq data shows that STged identifies a higher count of correct L–R pairs (214 out of 376) compared with the raw SRT data (204 out of 568) (Fig. 4D and Supplementary Figs S17 and S18). Additionally, STged achieves a precision of 0.569 compared with 0.359, a recall of 0.586 versus 0.559, and an F1 score of 0.578 versus 0.434 for raw SRT data (Supplementary Table S3). These results suggest that STged more effectively deciphers L–R pairs, likely due to its precise extraction of cell-type-specific gene expression from mixed signals.

Furthermore, analysis of incorrect L–R pairs indicates that raw SRT data frequently produce more common L–R pairs, affecting both communication directions, whether TSK cells regulate the stroma or vice versa (Supplementary Fig. S18). Specifically, CDH1, an oncogene known to promote tumor invasiveness [48], acts as both a ligand and receptor between fibroblasts and TSK cells, facilitating a feedback loop where CDH1 from TSK cells acts on fibroblasts and vice versa (Fig. 4E). Visualization of CDH1 expression in raw SRT data reveals overexpression in tumor cell-dominant regions (Supplementary Fig. S19) and a significant difference between TSK-dominated and fibroblast-dominated spots (Student’s t-test, P-value < 1e−16) (Fig. 4F). This result is corroborated by STged-deconvoluted and scRNA-seq data, which also indicate CDH1 overexpression in TSK cells (Fig. 4F). In fibroblast-dominated spots, a positive correlation is evident between CDH1 expression levels and the proportions of the four tumor cell types (dCor = 0.312). We categorize these spots into two groups: cluster 1, with a tumor cell proportion exceeding 0.1, and cluster 2, devoid of tumor cells. Using CellChat to assess intercellular communication in these clusters, we observe CDH1 self-regulation in cluster 1 but not in cluster 2 (Fig. 4G), suggesting that the presence of tumor cells in fibroblast-dominated spots can lead to misleading L–R pair interpretations. Overall, these results demonstrate that STged effectively captures cell-type-specific gene expression from raw SRT data, enabling more accurate CCC analysis.
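The grouping of fibroblast-dominated spots described above can be sketched as follows. The four tumor cell-type names and the 0.1 threshold come from the text; the function names, the dictionary layout, and the assumption that the input already contains only fibroblast-dominated spots are illustrative.

```python
TUMOR_TYPES = ("TSK", "KC-Basal", "KC-Cyc", "KC-Diff")


def tumor_fraction(proportions):
    """Summed proportion of the four tumor cell types in one spot."""
    return sum(proportions.get(cell_type, 0.0) for cell_type in TUMOR_TYPES)


def split_by_tumor_content(spot_proportions, threshold=0.1):
    """Split fibroblast-dominated spots into the two groups used in the text.

    Returns (cluster_1, cluster_2): spots whose tumor fraction exceeds the
    threshold, and spots with no tumor cells at all. Spots with a small but
    nonzero tumor fraction fall into neither group.
    """
    cluster_1, cluster_2 = [], []
    for spot, proportions in spot_proportions.items():
        fraction = tumor_fraction(proportions)
        if fraction > threshold:
            cluster_1.append(spot)
        elif fraction == 0:
            cluster_2.append(spot)
    return cluster_1, cluster_2
```

Leaving out the intermediate spots keeps the two groups clearly separated, so that any difference in inferred CDH1 signaling can be attributed to the presence or absence of tumor cells.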

After establishing the effectiveness of STged against raw SRT data, we further assess its performance relative to RCTD, which overlooks spatial correlations in gene expression deconvolution. Our comparison shows that STged identifies more L–R pairs (364 versus 331) and achieves higher precision (0.561 compared with RCTD’s 0.435), recall (0.586 compared with RCTD’s 0.356), and F1 score (0.578 compared with RCTD’s 0.392), as validated by matching-tissue scRNA-seq data (Fig. 4D; Supplementary Table S3). The higher precision, recall, and F1 score further underscore STged’s superior capability in accurately elucidating CCCs relative to RCTD, emphasizing the importance of spatial data in gene expression analyses. These results indicate that STged effectively captures cell-type-specific gene expression and reveals intercellular communications obscured by mixed signals, providing key insights into the regulatory mechanisms of the SCC microenvironment.

STged unveils TSKs mediate chemokine signaling with macrophages in SCC tissue

Building on the insights gained from the analysis of TSK subpopulations and cell-type interactions mediated by L–R pairs within the SCC microenvironment, we now extend our exploration to gene-level dynamics. Given its demonstrated performance at the cellular scale, STged is now employed to gain deeper insights into how gene expression variations linked to the spatial distribution of cell types influence tumor behavior. The objective of this analysis is to ascertain the impact of neighboring cells on TSK cells, a cell type of particular importance within the tumor microenvironment.

We identify 5344 TSK-associated mHVGs, whose spatial expression patterns correlate with the proportions of neighboring cell types (Supplementary Fig. S20). To explore how neighboring cells influence TSKs, these mHVGs are classified based on neighboring cell types, forming distinct groups of ctHVGs (Supplementary Table S4). By combining STged deconvolution data with cell-type proportions across spatial spots, we calculate PCCs, revealing both positive and negative correlations between gene expression and neighboring cell-type abundances (Supplementary Fig. S21). The analysis focuses on macrophage- and fibroblast-specific ctHVGs due to their roles in immune regulation and tissue remodeling. The four ctHVGs with the highest PCC scores for each cell type exhibit distinct, cell-type-specific expression patterns (Fig. 4H).

GO enrichment analysis of macrophage-associated ctHVGs shows their involvement in cytokine-mediated signaling, response to biotic stimuli, defense against viral infections, and regulation of leukocyte proliferation (Supplementary Fig. S22A). These processes shape immune responses and contribute to an inflammatory microenvironment that supports tumor progression [49]. GO enrichment analysis of fibroblast-associated ctHVGs reveals their involvement in pathways related to protein turnover (proteasome-mediated protein catabolic process), RNA metabolism (RNA splicing), and intracellular trafficking (macroautophagy and Golgi vesicle transport) (Supplementary Fig. S22B). These functions are critical for ECM stabilization, stromal organization, and vascular remodeling in SCC [50]. Together, these analyses suggest that macrophages and fibroblasts contribute to distinct processes within the tumor microenvironment. Macrophages are primarily involved in immune regulation and inflammation, while fibroblasts are associated with structural and vascular support, shaping the SCC microenvironment.

To further investigate the regulatory mechanisms underlying these interactions, we perform ligand–target analysis using ctHVGs from macrophages and fibroblasts as the target gene set in TSK cells. This analysis identifies distinct pathways through which macrophages and fibroblasts regulate TSK activity (Fig. 4I). Macrophages primarily act through pathways involved in immune response, inflammation, and ECM remodeling, whereas fibroblasts regulate TSKs via ECM stabilization, angiogenesis, and stromal remodeling. Detailed results, including specific ligand–target interactions and associated pathways, are provided in Supplementary Section 3.7 and illustrated in Supplementary Figs S29 and S30.

This gene-level analysis, integrated with cellular interaction studies, enhances our understanding of the molecular and cellular dynamics in SCC tissue. These findings demonstrate the utility of STged in decoding complex gene expression patterns and uncovering the regulatory mechanisms driving tumor microenvironment adaptation and progression.

STged effectively captures the continuous expression program topology within spatial organizations in mouse kidney data

To elucidate the complex spatial heterogeneity of mouse kidney tissue, we aim to map the spatial distribution and expression profiles of key kidney cell types, particularly focusing on ProxTub and DistTub. These cell types are critical for kidney function: ProxTub cells are essential for reabsorbing vital substances, while DistTub cells regulate electrolyte balance and pH, as detailed in previous studies [51, 52]. Using the STged framework, optimized with λ1 = 0.017 and λ2 = 0.139, we deconvolve the lower resolution SRT data to reconstruct high-resolution gene expression profiles based on predicted cell-type proportions (Supplementary Fig. S23). We employ a module to analyze spatial topology heterogeneity and spatial correlations in gene expression programs using STged-deconvoluted data, enabling a detailed exploration of spatial organization in kidney tissues.

We first use Monocle [29] to infer the trajectories of cell types of interest and spatially map their pseudotimes onto spatial tissues. Consistent with previous studies [51–53], we observe that ProxTub cells exhibit a continuous spatial trajectory starting from the outer cortex to the inner region (Fig. 5A). Similarly, DistTub cells also show a continuous trajectory with a clear spatial pattern (Fig. 5B) [53]. These observations confirm that STged can effectively capture the topological arrangements of continuous expression programs within spatial organizations.

Figure 5.


(A, B) Pseudotime trajectory analysis and spatial mapping of pseudotime values in ProxTub and DistTub cells. Left panels display inferred cellular trajectories using Monocle [29], and right panels show the spatial distribution of these pseudotime values across kidney tissue. (C) Heatmap of gene modules in DistTub cells, derived from hierarchical clustering on the normalized WCor matrix, representing spatial co-expression patterns within each module. (D) Spatial distribution of representative gene expression from modules 1 and 2 in ProxTub cells, showing the localized expression patterns of these genes across the kidney tissue. (E, F) UMAP visualization of DistTub cells showing activity scores for gene module 1 (E) and gene module 2 (F). Each UMAP plot displays cell clustering by gene module activity scores, with corresponding spatial maps on the right indicating module 1 activity primarily in the medulla and module 2 activity in the cortex.

We introduce a downstream analysis module to investigate the spatial distribution of distinct gene expression programs across topographic regions. Using DistTub-specific gene expression data derived from STged, we compute a gene correlation matrix based on 156 HVGs identified with the scran package. Hierarchical clustering of this matrix identifies two distinct gene modules: gene module 1, comprising 70 genes, and gene module 2, comprising 86 genes (Fig. 5C). Spatial mapping of the activity scores for these two gene modules reveals that gene module 1 is predominantly active in the medulla (Fig. 5D), aligning with the anatomical localization of distal straight tubule (DST) cells, while gene module 2 shows higher activity in the cortex (Fig. 5E), consistent with distal convoluted tubule (DCT) cells [51, 52]. Genes in gene module 1 are enriched in pathways related to cell–matrix interaction and purine metabolism (Supplementary Fig. S24), reflecting the structural and metabolic functions of DST cells. Conversely, genes in gene module 2 are enriched in metabolic and catabolic pathways, corresponding to the ion transport and fluid balance roles of DCT cells. Additionally, the canonical marker genes Slc12a1 (WCor score: 0.753) for DST cells in gene module 1 and Pgam2 (WCor score: 0.525) for DCT cells in gene module 2 exhibit spatial expression patterns that align with their respective gene modules (Fig. 5F). These findings demonstrate that the identified gene modules capture biologically relevant processes and reflect the topological heterogeneity of transcriptional programs across kidney tissue. This analysis highlights STged’s capacity to resolve subtle transcriptional variations within the same cell type, offering valuable insights into kidney function and potential therapeutic targets.
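The module-detection workflow (gene correlation matrix, hierarchical clustering into two modules, per-spot activity scores) can be sketched as below. Several simplifications are assumed: plain Pearson correlation stands in for the normalized WCor matrix, a naive average-linkage routine replaces whatever clustering implementation was actually used, the activity score is taken to be the mean z-scored expression of a module's genes, and the data are simulated along a hypothetical cortex-to-medulla axis `depth`.

```python
import numpy as np

def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering (average linkage) on a distance matrix."""
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)                  # b > a, so index a is unchanged
        clusters[a] = clusters[a] + merged
    labels = np.empty(dist.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

rng = np.random.default_rng(1)
n_spots, n_per_module = 150, 6
depth = rng.uniform(0, 1, n_spots)                # hypothetical spatial axis
signal = np.concatenate(
    [np.repeat(2 * depth[:, None], n_per_module, axis=1),         # program rising with depth
     np.repeat(2 * (1 - depth)[:, None], n_per_module, axis=1)],  # program falling with depth
    axis=1)
expr = signal + rng.normal(scale=0.2, size=signal.shape)          # spots x genes

corr = np.corrcoef(expr, rowvar=False)            # genes x genes correlation
labels = average_linkage(1.0 - corr, n_clusters=2)

# Module activity per spot: mean z-scored expression of the module's genes.
z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
scores = np.column_stack([z[:, labels == k].mean(axis=1) for k in range(2)])
```

Mapping each column of `scores` back onto spot coordinates would give the kind of spatial activity map described for the two modules. The quadratic-time loop is only suitable for a few hundred genes.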

Discussion

SRT offers an approach to map gene expression within its spatial context, addressing a limitation of scRNA-seq, which loses spatial information due to cell dissociation [12, 24]. However, existing computational methods for gene expression deconvolution are primarily designed for bulk RNA-seq data and do not adequately account for the inherent spatial structures in SRT data, where neighboring spots often exhibit correlated gene expression patterns [20–22]. Additionally, these methods typically lack the capability to link gene expression to specific cell types in a spatially resolved manner. To address these issues, we develop STged, a unified framework that integrates SRT data with scRNA-seq-derived cell-type-specific expression profiles, enabling the spatially resolved characterization of cell types and their molecular functions.

STged utilizes intrinsic spatial correlations and scRNA-seq-based cell-type information to improve the deconvolution of low-resolution SRT data. Instead of using a Gaussian kernel and adjusting its scale parameter, we construct a binary spatial neighbor graph from spot location data, connecting each spot to up to eight nearest neighbors (using a hexagonal grid for 10x Visium and a square grid for ST data). This resolution-flexible and scalable approach captures spatial dependencies without the need for kernel-based weighting or scale parameter adjustments, enhancing the model’s applicability across diverse SRT platforms. A gene signature matrix derived from scRNA-seq data ensures cell-type specificity. Validations through simulations and applications to three real datasets demonstrate STged’s functionality. For example, in the PDAC dataset, STged identifies spatial dependencies of fibroblasts associated with distinct cancer clones. In the SCC dataset, it reconstructs tumor-stroma communication mediated by TSKs. When applied to mouse kidney tissue, STged identifies spatial heterogeneity and distinct gene expression programs across kidney zones, providing information on spatially organized tissue functions. Additionally, STged processes low-resolution SRT data from both ST and 10x Visium within minutes (Supplementary Section 3.8). We evaluate the performance of STged for signaling genes (ligands and receptors) and nonsignaling genes using simulated data (Supplementary Section 3.9 and Supplementary Fig. S31).
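A binary neighbor graph of the kind described above can be sketched in a few lines. The function name `binary_knn_graph`, the optional `max_dist` cutoff, and the toy 5 × 5 square grid are assumptions for illustration; the package's own graph construction may differ in detail.

```python
import numpy as np

def binary_knn_graph(coords, k=8, max_dist=None):
    """Symmetric 0/1 adjacency: spots i and j are linked if either is among
    the other's k nearest spots (optionally restricted to max_dist)."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                # a spot is not its own neighbor
    adj = np.zeros((n, n), dtype=int)
    k_eff = min(k, n - 1)
    nearest = np.argsort(dist, axis=1)[:, :k_eff]
    for i in range(n):
        for j in nearest[i]:
            if max_dist is None or dist[i, j] <= max_dist:
                adj[i, j] = adj[j, i] = 1         # unweighted, symmetric edge
    return adj

# Toy square grid (ST-like layout) with unit spacing.
coords = np.array([(r, c) for r in range(5) for c in range(5)], dtype=float)
adj = binary_knn_graph(coords, k=8, max_dist=1.5)

# Graph Laplacian, the usual ingredient of a graph smoothness penalty.
laplacian = np.diag(adj.sum(axis=1)) - adj
```

On this grid an interior spot is linked to its eight surrounding spots, while border spots have fewer neighbors, which is one reading of "up to eight nearest neighbors". No kernel bandwidth has to be tuned because every edge has weight one.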

STged dissects cell-type-specific gene expression within spots by integrating scRNA-seq data. Researchers can use cell label information from scRNA-seq data to map labeled cells to spots in SRT data, infer cell-type compositions within the tissue, and extract cell-type-specific gene expression within spots. This mapping can be facilitated by methods such as Seurat [54] and Tangram [55]. After mapping, the gene expression data for specific cell types is computed using a linear weighted sum approach. This involves aggregating the gene expression data of cells that share the same label in a spot with probability weights. However, directly mapping the single cells in scRNA-seq data to the SRT data may lead to unknown bias and inaccurate gene expression decomposition for the SRT data, because the samples/cells in scRNA-seq data and the SRT data are usually different. STged incorporates the information of reference scRNA-seq data through a prior regularization term, which is more flexible and adaptive to the data distribution observed in SRT data.
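The mapping-based alternative described above (a probability-weighted aggregation of mapped cells that share a label) can be illustrated as follows. The function and argument names are hypothetical, and normalizing the weights within each cell type is an assumption of this sketch; it is not the Seurat or Tangram API.

```python
import numpy as np

def mapped_celltype_expression(cell_expr, cell_labels, map_prob, n_types):
    """Cell-type-specific expression per spot from a cell-to-spot mapping.

    cell_expr   : cells x genes expression matrix (scRNA-seq)
    cell_labels : integer cell-type label per cell
    map_prob    : spots x cells mapping probabilities
    Returns an array of shape (n_types, spots, genes)."""
    n_spots, n_genes = map_prob.shape[0], cell_expr.shape[1]
    out = np.zeros((n_types, n_spots, n_genes))
    for k in range(n_types):
        mask = cell_labels == k
        w = map_prob[:, mask]                     # spots x cells of type k
        total = w.sum(axis=1, keepdims=True)
        # Normalize weights per spot; spots with no mapped cells stay at zero.
        w_norm = np.divide(w, total, out=np.zeros_like(w), where=total > 0)
        out[k] = w_norm @ cell_expr[mask]
    return out

cell_expr = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
cell_labels = np.array([0, 0, 1, 1])
map_prob = np.array([[0.5, 0.5, 1.0, 0.0],
                     [0.0, 1.0, 0.25, 0.75]])
profiles = mapped_celltype_expression(cell_expr, cell_labels, map_prob, n_types=2)
```

Because every value in `profiles` is a convex combination of reference cells, the result can only reproduce expression states already present in the scRNA-seq sample, which is the source of the bias discussed above when reference and SRT samples differ.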

STged relies on a cell-type proportion matrix derived from a cell-type deconvolution method. In this study, we select EnDecon [16] for its accuracy and consistency. EnDecon’s weighted ensemble approach offers flexibility across various technical conditions, making it well-suited for different SRT datasets. To investigate how different cell-type deconvolution methods influence the results of STged, we also run STged with cell-type proportion matrices generated by several other methods, including Cell2location [13], DWLS [56], RCTD [10], and SONAR [57]. We then compare the results obtained using these methods with those produced by EnDecon. The results show that STged performs well regardless of the deconvolution method used, with EnDecon providing the most accurate estimates (Supplementary Section 3.10 and Supplementary Fig. S32).

STged incorporates two regularization terms: graph-based regularization with tuning parameter λ1 to enforce spatial smoothness, and gene signature regularization with tuning parameter λ2 to align gene expressions with biological signatures derived from scRNA-seq data. The selection of λ1 and λ2 follows the criteria outlined in Supplementary Section 3.2. Specifically, λ1 controls the balance between spatial coherence and accurate gene expression reconstruction by modulating graph regularization, ensuring that neighboring spots exhibit consistent gene expression patterns without excessive smoothing. Meanwhile, λ2 ensures that gene expressions align with biological patterns derived from scRNA-seq data. Sensitivity analyses demonstrate that optimal tuning of these parameters is crucial for maintaining the balance between smoothing gene expression profiles and preserving biological variability, thus preventing both over-smoothing and under-smoothing (Supplementary Section 3.11 and Supplementary Figs S33 and S34). Ablation studies confirm that both regularization terms are essential: gene signature regularization enhances biological accuracy, while graph-based regularization preserves spatial consistency (Supplementary Section 3.12, Supplementary Figs S35 and S36, and Supplementary Tables S5 and S6).
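For orientation, an objective with these two penalties can be written schematically as below. This is a generic sketch consistent with the description above, not necessarily the exact formulation used by STged. The notation is assumed here: $Y$ is the spots × genes SRT matrix, $p_k$ the proportions of cell type $k$ across spots, $X_k$ the spot-level expression of cell type $k$, $L$ the Laplacian of the binary spatial neighbor graph, and $\mu_k$ the scRNA-seq-derived signature of cell type $k$.

```latex
\min_{X_1,\dots,X_K \,\ge\, 0}\;
\Big\| Y - \sum_{k=1}^{K} \operatorname{diag}(p_k)\, X_k \Big\|_F^2
\;+\; \lambda_1 \sum_{k=1}^{K} \operatorname{tr}\!\big( X_k^{\top} L\, X_k \big)
\;+\; \lambda_2 \sum_{k=1}^{K} \big\| X_k - \mathbf{1}\,\mu_k^{\top} \big\|_F^2
```

In this form, a larger $\lambda_1$ pulls the profiles of neighboring spots together, and a larger $\lambda_2$ pulls each $X_k$ toward the reference signature. Setting either parameter to zero drops the corresponding term.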

STged employs elastic net regression to identify mHVGs/ctHVGs based on neighboring cell-type proportions. By applying elastic net regularization, the method addresses multicollinearity arising from correlated cell-type proportions. Elastic net combines the advantages of Ridge and Lasso penalties: the Ridge component shrinks coefficients of correlated predictors toward one another, ensuring that multiple highly correlated cell types share their influence more evenly rather than allowing any single predictor to dominate. Simultaneously, the Lasso component induces sparsity by setting the coefficients of less influential predictors to zero. This data-driven selection ensures that only cell types significantly contributing to gene expression variability are retained. Together, these penalties produce a stable model with interpretable coefficients, suitable for biological data with correlated features, and address two key objectives: (i) identifying genes influenced by the spatial microenvironment and (ii) pinpointing key neighboring cell types that modulate gene expression. The current mHVGs/ctHVGs module focuses on capturing the influence of the spatial microenvironment on gene expression. Future extensions could integrate gene expression profiles of neighboring cell types to provide deeper insights into cell–cell interactions. This would enable a more comprehensive assessment of both structural and functional contributions, including the roles of signaling molecules and ligands in mediating intercellular communication [30, 58]. However, such extensions present challenges, including increased model complexity and the need for high-quality SRT data that capture both cell-type proportions and gene expression across neighboring cells. Future work will address these challenges to enhance the biological relevance and interpretability of the model.
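The behavior of the two penalties can be seen in a small numerical sketch. The solver below is a generic textbook coordinate-descent routine written for this illustration, not STged's implementation; the function name `elastic_net_cd`, the penalty parameterization, and the simulated neighbor proportions are assumptions.

```python
import numpy as np

def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for
        (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
                           + 0.5*alpha*(1 - l1_ratio)*||w||_2^2
    assuming centered y and standardized columns of X (no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                              # equals y - X @ w for w = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * w[j]               # remove feature j's contribution
            rho = X[:, j] @ resid / n
            # Lasso part: soft-threshold; Ridge part: extra shrinkage in the denominator.
            shrunk = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0)
            w[j] = shrunk / (col_sq[j] + alpha * (1 - l1_ratio))
            resid -= X[:, j] * w[j]               # add the updated contribution back
    return w

rng = np.random.default_rng(2)
# Proportions of 4 hypothetical neighboring cell types (correlated by construction).
P = rng.dirichlet(np.ones(5), size=300)[:, :4]
Xs = (P - P.mean(axis=0)) / P.std(axis=0)
y = 3.0 * Xs[:, 0] + rng.normal(scale=0.5, size=300)  # gene driven by type 0 only
y -= y.mean()

w = elastic_net_cd(Xs, y, alpha=0.1, l1_ratio=0.5)
w_heavy = elastic_net_cd(Xs, y, alpha=100.0, l1_ratio=0.5)
```

With a moderate penalty the coefficient of the driving cell type dominates and the remaining coefficients are shrunk toward zero, whereas a very large `alpha` removes every predictor. A gene would be flagged as microenvironment-associated only if at least one neighboring cell type keeps a non-zero coefficient, under the assumption that selection is based on non-zero coefficients.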

STged has several areas for improvement. One limitation is the uniform treatment of neighbors in the binary spatial neighbor graph, which assumes equal interaction strength. This assumption may not fully capture biological heterogeneity or spatial variability, especially in datasets with uneven spot distributions or irregular resolutions. To address this, future work could explore adaptive graph construction strategies that adjust the number of neighbors or incorporate weighted edges based on spatial proximity or biological relevance. These modifications would improve the model’s sensitivity and applicability across different SRT contexts, allowing for a more accurate representation of complex tissue architectures. STged also provides an option for users to directly supply their own custom graphs, offering more flexibility in graph construction. Furthermore, while STged effectively models normalized SRT data, extending it to directly handle raw count data using over-dispersed Poisson [10, 11, 57] or negative binomial models [15, 59] could capture more detailed gene expression dynamics. Integrating matched histology images could provide supplementary spatial information, further enhancing gene expression deconvolution [60]. These enhancements would improve the biological relevance and interpretability of STged, supporting more comprehensive ST studies. Overall, we expect that STged will be widely used in ST studies, contributing to a deeper understanding of fundamental biological processes and human diseases within their spatial contexts.

Supplementary Material

gkaf087_Supplemental_File

Acknowledgements

Author contributions: Z.X.L., X.F.Z., and H.Y. conceived the project and provided funding support. J.J.T., Z.X.L., and X.F.Z. developed the method. J.J.T., Z.X.L., and X.F.Z. designed the experiments. J.J.T. implemented the software and performed the data analyses. J.J.T., Z.X.L., and X.F.Z. wrote the manuscript. H.Y. revised the manuscript. All authors read and approved the final manuscript.

Contributor Information

Jia-Juan Tu, School of Science, Hubei University of Technology, Wuhan 430079, China; Department of Statistics, The Chinese University of Hong Kong, Hong Kong 999077, China.

Hong Yan, Centre for Intelligent Multidimensional Data Analysis, Hong Kong 999077, China; Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China.

Xiao-Fei Zhang, School of Mathematics and Statistics, and Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan 430079, China; Key Laboratory of Nonlinear Analysis & Applications (Ministry of Education), Central China Normal University, Wuhan 430079, China.

Zhixiang Lin, Department of Statistics, The Chinese University of Hong Kong, Hong Kong 999077, China.

Supplementary data

Supplementary data is available at NAR online.

Conflict of interest

The authors declare that they have no conflict of interest.

Funding

This work is supported by the National Natural Science Foundation of China (12271198 and 11871026), the self-determined research funds of Central China Normal University (CCNU) from the colleges’ basic research and operation of MOE (CCNU24AI001 and CCNU24JC004), The Chinese University of Hong Kong startup grant (4930181), The Chinese University of Hong Kong Science Faculty’s Collaborative Research Impact Matching Scheme (CRIMS 4620033), The Chinese University of Hong Kong direct grants (4053540 and 4053586), and the Research Grants Council, University Grants Committee (GRF 14301120 and 14300923). This work is also supported by the Innovation and Technology Commission - Hong Kong (ITC) to the State Key Laboratory of Agrobiotechnology (CUHK) and to InnoHK Center CIMDA. Any opinions, findings, conclusions, or recommendations expressed in this publication do not reflect the views of ITC. Funding to pay the Open Access publication charges for this article was provided by research grants.

Data availability

The datasets analyzed in this study are publicly available from the GEO repository under the following accession numbers: GSE111672, GSE144240, and GSE129798. The SRT data of the mouse kidney coronal section were downloaded from the 10x Genomics website (https://www.10xgenomics.com/cn/resources/datasets/mouse-kidney-section-coronal-1-standard-1-1-0).

Code availability

A user-friendly R package implementing STged is available at https://github.com/TJJjiajuan/STged and https://zenodo.org/records/14379873.

References

  • 1. Burgess DJ. Spatial transcriptomics coming of age. Nat Rev Genet. 2019; 20:317. 10.1038/s41576-019-0129-z.
  • 2. Moses L, Pachter L. Museum of spatial transcriptomics. Nat Methods. 2022; 19:534–46. 10.1038/s41592-022-01409-2.
  • 3. Chen KH, Boettiger AN, Moffitt JR et al. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015; 348:aaa6090. 10.1126/science.aaa6090.
  • 4. Eng CHL, Lawson M, Zhu Q et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature. 2019; 568:235–9. 10.1038/s41586-019-1049-y.
  • 5. Ståhl PL, Salmén F, Vickovic S et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016; 353:78–82. 10.1126/science.aaf2403.
  • 6. Moffitt JR, Bambah-Mukku D, Eichhorn SW et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science. 2018; 362:eaau5324. 10.1126/science.aau5324.
  • 7. Zhang M, Eichhorn SW, Zingg B et al. Spatially resolved cell atlas of the mouse primary motor cortex by MERFISH. Nature. 2021; 598:137–43. 10.1038/s41586-021-03705-x.
  • 8. Ma Y, Zhou X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat Biotechnol. 2022; 40:1349–59. 10.1038/s41587-022-01273-7.
  • 9. Dong R, Yuan GC. SpatialDWLS: accurate deconvolution of spatial transcriptomic data. Genome Biol. 2021; 22:145. 10.1186/s13059-021-02362-7.
  • 10. Cable DM, Murray E, Zou LS et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol. 2022; 40:517–26. 10.1038/s41587-021-00830-w.
  • 11. Cable DM, Murray E, Shanmugam V et al. Cell type-specific inference of differential expression in spatial transcriptomics. Nat Methods. 2022; 19:1076–87. 10.1038/s41592-022-01575-3.
  • 12. Abdelaal T, Mourragui S, Mahfouz A et al. SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res. 2020; 48:e107. 10.1093/nar/gkaa740.
  • 13. Kleshchevnikov V, Shmatko A, Dann E et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol. 2022; 40:661–71. 10.1038/s41587-021-01139-4.
  • 14. Elosua-Bayes M, Nieto P, Mereu E et al. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res. 2021; 49:e50. 10.1093/nar/gkab043.
  • 15. Andersson A, Bergenstråhle J, Asp M et al. Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography. Commun Biol. 2020; 3:565. 10.1038/s42003-020-01247-y.
  • 16. Tu JJ, Li HS, Yan H et al. EnDecon: cell type deconvolution of spatially resolved transcriptomics data via ensemble learning. Bioinformatics. 2023; 39:btac825. 10.1093/bioinformatics/btac825.
  • 17. Jovic D, Liang X, Zeng H et al. Single-cell RNA sequencing technologies and applications: a brief overview. Clin Transl Med. 2022; 12:e694. 10.1002/ctm2.694.
  • 18. Altschuler SJ, Wu LF. Cellular heterogeneity: do differences make a difference? Cell. 2010; 141:559–63. 10.1016/j.cell.2010.04.033.
  • 19. Venet D, Pecasse F, Maenhaut C et al. Separation of samples into their constituents using gene expression data. Bioinformatics. 2001; 17:S279–87. 10.1093/bioinformatics/17.suppl_1.S279.
  • 20. Shen-Orr SS, Tibshirani R, Khatri P et al. Cell type-specific gene expression differences in complex tissues. Nat Methods. 2010; 7:287–9.
  • 21. Zhao Y, Simon R. Gene expression deconvolution in clinical samples. Genome Med. 2010; 2:93. 10.1186/gm214.
  • 22. Marquez-Galera A, de la Prida LM, Lopez-Atalaya JP. A protocol to extract cell-type-specific signatures from differentially expressed genes in bulk-tissue RNA-seq. STAR Protoc. 2022; 3:101121. 10.1016/j.xpro.2022.101121.
  • 23. Hu J, Li X, Coleman K et al. SpaGCN: integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. 2021; 18:1342–51. 10.1038/s41592-021-01255-8.
  • 24. Miller BF, Bambah-Mukku D, Dulac C et al. Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Res. 2021; 31:1843–55. 10.1101/gr.271288.120.
  • 25. Dries R, Zhu Q, Dong R et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 2021; 22:78. 10.1186/s13059-021-02286-2.
  • 26. Li H, Zhou J, Li Z et al. A comprehensive benchmarking with practical guidelines for cellular deconvolution of spatial transcriptomics. Nat Commun. 2023; 14:1548. 10.1038/s41467-023-37168-7.
  • 27. Palla G, Spitzer H, Klein M et al. Squidpy: a scalable framework for spatial omics analysis. Nat Methods. 2022; 19:171–8. 10.1038/s41592-021-01358-2.
  • 28. Butler A, Hoffman P, Smibert P et al. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018; 36:411–20. 10.1038/nbt.4096.
  • 29. Trapnell C, Cacchiarelli D, Grimsby J et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014; 32:381–6. 10.1038/nbt.2859.
  • 30. Jin S, Guerrero-Juarez CF, Zhang L et al. Inference and analysis of cell–cell communication using CellChat. Nat Commun. 2021; 12:1088. 10.1038/s41467-021-21246-9.
  • 31. Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods. 2020; 17:159–62. 10.1038/s41592-019-0667-5.
  • 32. Yu G, Wang LG, Han Y et al. clusterProfiler: an R package for comparing biological themes among gene clusters. Omics. 2012; 16:284–7. 10.1089/omi.2011.0118.
  • 33. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995; 57:289–300. 10.1111/j.2517-6161.1995.tb02031.x.
  • 34. Moncada R, Barkley D, Wagner F et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat Biotechnol. 2020; 38:333–42. 10.1038/s41587-019-0392-8.
  • 35. Ji AL, Rubin AJ, Thrane K et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell. 2020; 182:497–514. 10.1016/j.cell.2020.05.039.
  • 36. Ransick A, Lindström NO, Liu J et al. Single-cell profiling reveals sex, lineage, and regional diversity in the mouse kidney. Dev Cell. 2019; 51:399–413. 10.1016/j.devcel.2019.10.005.
  • 37. Zeisel A, Muñoz-Manchado AB, Codeluppi S et al. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science. 2015; 347:1138–42. 10.1126/science.aaa1934.
  • 38. Rahmani E, Schweiger R, Rhead B et al. Cell-type-specific resolution epigenetics without the need for cell sorting or single-cell biology. Nat Commun. 2019; 10:3417. 10.1038/s41467-019-11052-9.
  • 39. Wang W, Zhou X, Wang J et al. Approximate estimation of cell-type resolution transcriptome in bulk tissue through matrix completion. Brief Bioinform. 2023; 24:bbad273. 10.1093/bib/bbad273.
  • 40. Kalluri R. The biology and function of fibroblasts in cancer. Nat Rev Cancer. 2016; 16:582–98. 10.1038/nrc.2016.73.
  • 41. Woodfin A, Voisin MB, Nourshargh S. PECAM-1: a multi-functional molecule in inflammation and vascular biology. Arterioscler Thromb Vasc Biol. 2007; 27:2514–23. 10.1161/ATVBAHA.107.151456.
  • 42. Tsuchiya T, Hori H, Ozaki H. CCPLS reveals cell-type-specific spatial dependence of transcriptomes in single cells. Bioinformatics. 2022; 38:4868–77. 10.1093/bioinformatics/btac599.
  • 43. Mason K, Sathe A, Hess PR et al. Niche-DE: niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell–cell interactions. Genome Biol. 2024; 25:14. 10.1186/s13059-023-03159-6.
  • 44. Székely GJ, Rizzo ML, Bakirov NK. Measuring and testing dependence by correlation of distances. Ann Statist. 2007; 35:2769–94. 10.1214/009053607000000505.
  • 45. Mandal M, Ghosh B, Anura A et al. Modeling continuum of epithelial–mesenchymal transition plasticity. Integr Biol. 2016; 8:167–76. 10.1039/C5IB00219B.
  • 46. Cloutier G, Sallenbach-Morrissette A, Beaulieu JF. Non-integrin laminin receptors in epithelia. Tissue Cell. 2019; 56:71–8. 10.1016/j.tice.2018.12.005.
  • 47. Shao X, Li C, Yang H et al. Knowledge-graph-based cell–cell communication inference for spatially resolved transcriptomic data with SpaTalk. Nat Commun. 2022; 13:4429. 10.1038/s41467-022-32111-8.
  • 48. Gall TMH, Frampton AE. Gene of the month: E-cadherin (CDH1). J Clin Pathol. 2013; 66:928–32. 10.1136/jclinpath-2013-201768.
  • 49. Ghebremedhin A, Athavale D, Zhang Y et al. Tumor-associated macrophages as major immunosuppressive cells in the tumor microenvironment. Cancers. 2024; 16:3410. 10.3390/cancers16193410.
  • 50. Kalluri R. The biology and function of fibroblasts in cancer. Nat Rev Cancer. 2016; 16:582–98. 10.1038/nrc.2016.73.
  • 51. Lake BB, Chen S, Hoshi M et al. A single-nucleus RNA-sequencing pipeline to decipher the molecular anatomy and pathophysiology of human kidneys. Nat Commun. 2019; 10:2832. 10.1038/s41467-019-10861-2.
  • 52. Janosevic D, Myslinski J, McCarthy TW et al. The orchestrated cellular and molecular responses of the kidney to endotoxin define a precise sepsis timeline. Elife. 2021; 10:e62270. 10.7554/eLife.62270.
  • 53. Wei R, He S, Bai S et al. Spatial charting of single-cell transcriptomes in tissues. Nat Biotechnol. 2022; 40:1190–9. 10.1038/s41587-022-01233-1.
  • 54. Hao Y, Stuart T, Kowalski MH et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol. 2024; 42:293–304. 10.1038/s41587-023-01767-y.
  • 55. Biancalani T, Scalia G, Buffoni L et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat Methods. 2021; 18:1352–62. 10.1038/s41592-021-01264-7.
  • 56. Tsoucas D, Dong R, Chen H et al. Accurate estimation of cell-type composition from gene expression data. Nat Commun. 2019; 10:2975. 10.1038/s41467-019-10802-z.
  • 57. Liu Z, Wu D, Zhai W et al. SONAR enables cell type deconvolution with spatially weighted Poisson-Gamma model for spatial transcriptomics. Nat Commun. 2023; 14:4727. 10.1038/s41467-023-40458-9.
  • 58. Efremova M, Vento-Tormo M, Teichmann SA et al. Inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes with CellPhoneDB. Nat Protoc. 2020; 15:1484–506. 10.1038/s41596-020-0292-x.
  • 59. Zhao P, Zhu J, Ma Y et al. Modeling zero inflation is not necessary for spatial transcriptomics. Genome Biol. 2022; 23:118. 10.1186/s13059-022-02684-0.
  • 60. Wan X, Xiao J, Tam SST et al. Integrating spatial and single-cell transcriptomics data using deep generative models with SpatialScope. Nat Commun. 2023; 14:7848. 10.1038/s41467-023-43629-w.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
