Abstract
Spatial transcriptomics has enabled the study of mRNA distributions within cells, a key aspect of cellular function. However, there is a dearth of tools that can identify and interpret functionally relevant spatial patterns of subcellular transcript distribution. To address this, we present CellSP, a computational framework for identifying, visualizing, and characterizing consistent subcellular spatial patterns of mRNA. CellSP introduces the concept of “gene-cell modules”, which are gene sets with coordinated subcellular transcript distributions in many cells. It provides intuitive visualizations of the captured patterns and offers functional insights into each discovered module. We demonstrate that CellSP reliably identifies functionally significant modules across diverse tissues and technologies. We use the tool to discover subcellular spatial phenomena related to myelination, axonogenesis, and synapse formation in the mouse brain. We find immune response-related modules that change between kidney cancer and healthy samples, and myelination-related modules specific to mouse models of Alzheimer’s Disease.
Subject terms: Statistical methods, Machine learning
CellSP is a computational framework for identifying and visualizing subcellular spatial patterns of mRNA. It uncovers gene-cell modules that capture consistent transcript distributions linked to key biological processes in various tissues.
Introduction
Spatial transcriptomics (ST) technologies and analytical tools available today offer unprecedented views of gene expression patterns within the intricate tapestry of tissues1–3. At the cutting edge of this technology are single-molecule resolution assays4–6, which provide a detailed view of transcript distributions within individual cells, promising to transform our understanding of mRNA localization and its relationship with cellular functions. This relationship has been documented in the literature through anecdotal examples showing that RNA localization to specific subcellular regions may underlie efficient spatial organization of cellular processes7, rapid local translation in response to external stimuli8, maintenance of cell polarity9,10, facilitation of cell migration11, coordination of developmental patterning12, etc. Thus, there is growing recognition of the need to expand our understanding of subcellular RNA localization13, and the time is ripe for this pursuit, supported by the state-of-the-art subcellular spatial transcriptomics tools.
While experimental techniques have advanced rapidly – enabling, for instance, the mapping of individual transcripts of hundreds to thousands of genes in thousands of cells14–16 – analytical methods have lagged behind. Available ST-related tools are mostly designed for analysis at the single-cell resolution17. More recently, methods targeting subcellular phenomena have emerged13,18–20 that can highlight subcellular localization patterns involving single genes or gene pairs, in individual cells. For example, a gene may be annotated as having transcripts localized to cell edges by the BENTO tool13 or having a “radial” distribution in a cell by the SPRAWL software19. Similarly, a gene pair may be found to be significantly colocalized by the InSTAnT toolkit18, in an individual cell or across many cells. However, when applied to a typical subcellular ST dataset, these tools identify tens of thousands of statistically significant subcellular spatial patterns involving individual genes or gene pairs in individual cells. This overwhelming volume of results imposes a significant interpretive burden, complicating the transition from statistical findings to meaningful and actionable biological insights. Moreover, existing methods are fundamentally limited in scope: they operate at the level of single genes or gene pairs, lacking the ability to capture broader patterns of spatial organization of mRNA. The goal of this work is to bridge this gap with an analytical method that identifies the most salient subcellular spatial patterns in an ST data set, along with biological interpretations of those patterns.
An important lesson from decades of transcriptomics data analysis is the concept of gene “module” – a set of genes that share some common pattern of expression. Systematic identification of gene modules21 followed by statistical enrichment tests22 is a popular approach to distil a large number of statistical observations into a more compact set of systems-level insights. Inspired by this paradigm, we sought a method to define gene modules with shared subcellular spatial patterns. One simple strategy is to determine if a gene exhibits a specific subcellular localization (e.g., “cell edge” according to BENTO) in many cells, and group together all genes exhibiting the same frequent localization into a module. This strategy is overly permissive, since two genes annotated with the same frequent localization pattern may not exhibit that common pattern in the same cells. We therefore conceptualized a “gene-cell” module as a set of genes that exhibit the same subcellular pattern in the same cells. However, requiring all genes of a module to exhibit the same subcellular pattern in all module cells (as in ref. 18) is overly restrictive, and will result in highly fragmented modules. To address this, we introduce a biclustering approach that searches for genes co-exhibiting the same subcellular pattern in a substantial subset of module cells rather than requiring their presence in all module cells, enabling the discovery of more meaningful and coherent gene-cell modules (see Supplementary Fig. 1 for more details). This algorithm forms the core of the software presented here, called “CellSP”.
CellSP analyzes single-molecule resolution ST data, identifies significant subcellular spatial distribution patterns, and distills them into a compact list of gene-cell modules that typically comprise tens of genes and hundreds of cells. It provides specialized techniques for visualizing such modules. It uses gene set enrichment analysis to describe the genes comprising the module. Additionally, it uses machine learning to distinguish module-associated from other cells in the tissue based on their transcriptomic profiles, identifying genes and biological properties that characterize module cells.
We used CellSP to analyze ST data sets from four different studies generated using two different technologies. We present the results of these analyses and provide guidance on interpreting the discovered modules. We identify modules associated with myelination in mouse brain tissues across various technologies and find these modules to exhibit differences linked to Alzheimer’s disease. The reported differences are not necessarily detectable through traditional gene expression analyses. We also find modules related to cell-cell adhesion and axonogenesis, and a closer examination of these modules sheds light on interactions among neighboring cells. Use of CellSP on a cancer data set reveals immune-response related modules that are specific to cancer versus healthy tissue. In summary, CellSP establishes a robust framework for the systematic detection, visualization, and comparison of subcellular transcriptomic patterns, coupled with a statistical characterization of their associated biological functions.
Results
Overview of CellSP
CellSP is a tool for analysis of spatial transcriptomics (ST) data at single-molecule resolution (Fig. 1a). It identifies “gene-cell modules”, where a module is defined as a set of genes whose transcripts exhibit a specific subcellular spatial distribution pattern in a set of cells. (We use the terms “gene-cell module” and “module” interchangeably.) Modules are recovered based on statistical criteria and represent the occurrence of interesting, persistent subcellular spatial phenomena in the tissue. There are three main steps involved:
Fig. 1. Schematic of CellSP.
a CellSP analyzes single-molecule resolution spatial transcriptomics data from a tissue; such data provides subcellular locations of individual transcripts of a set of genes within each delineated cell. b Subcellular Pattern Discovery. CellSP utilizes existing tools InSTAnT and SPRAWL for the discovery of subcellular spatial patterns of genes within each cell. Types of patterns detected include gene-gene colocalization reported by InSTAnT and four types of subcellular localization preferences (peripheral, punctate, central, radial) reported by SPRAWL. c Module Discovery. The next step identifies spatial patterns involving multiple genes and cells. Patterns identified in the previous step (represented as matrices of cells x genes or gene-pairs) are subjected to a biclustering analysis that identifies a subset of rows and columns with a large average value, representing a “gene-cell module”: a set of genes or gene-pairs exhibiting the same subcellular pattern within the same cells. Gene-cell modules that overlap in a significant number of genes (or gene pairs) are coalesced to reduce redundancy. This step outputs a set of gene-cell modules with a varying number of cells and genes. d CellSP provides intuitive visualizations of gene-cell modules, which often span tens to hundreds of cells, making direct inspection impractical. CellSP visualizations are tailored to the type of pattern defining the module and summarize the strength of subcellular spatial patterns involving module-associated genes and cells while contrasting them with background genes and cells. e Module Characterization. Module-associated genes are subjected to Gene Ontology (GO) analysis to gain functional insights into detected gene-cell modules. Alternatively, a Random Forest classifier is trained to differentiate module cells from non-module cells, and the most predictive genes, called cell-marker genes, are subjected to GO analysis. 
To address the limited size of gene panels in subcellular spatial transcriptomic datasets, CellSP employs Tangram to impute gene expression from single-cell RNA-seq datasets of the same tissue, enabling more comprehensive analysis. Results of GO analysis are visualized using REVIGO plots. The Gene Ontology logo is reproduced from the Gene Ontology Consortium (CC BY 4.0).
Step 1 – Subcellular pattern discovery
Here, the statistical tools SPRAWL19 and InSTAnT18 are used to rigorously identify subcellular spatial patterns involving individual genes (SPRAWL) or gene pairs (InSTAnT), in each cell (Fig. 1b). SPRAWL identifies four types of subcellular patterns – peripheral, radial, punctate and central – describing the distribution of a gene’s transcripts within the cell, while InSTAnT tests if transcripts of a gene pair tend to be proximal to each other more often than expected by chance. Patterns of each type are stored in a matrix (Fig. 1c, left) whose entries indicate if a gene or gene pair shows that pattern in a cell. Five such matrices are produced, one for each of the four spatial patterns from SPRAWL analysis and one from InSTAnT analysis.
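As an illustration of the pattern annotation matrices described above, the following minimal sketch builds one cells x genes matrix from per-cell pattern calls. It assumes binary entries (1 if the gene shows the pattern in the cell, 0 otherwise); the function name `build_pattern_matrix`, the input layout, and the toy calls are hypothetical and are not part of the SPRAWL, InSTAnT, or CellSP interfaces.

```python
from typing import Dict, List, Set, Tuple


def build_pattern_matrix(
    calls: Dict[str, Set[str]],
    genes: List[str],
) -> Tuple[List[str], List[List[int]]]:
    """Return (cell_ids, matrix); matrix[i][j] is 1 if gene j shows the
    pattern in cell i, else 0. Binary entries are an assumption here."""
    cell_ids = sorted(calls)
    matrix = [[1 if g in calls[c] else 0 for g in genes] for c in cell_ids]
    return cell_ids, matrix


# Hypothetical per-cell calls for a single pattern type (e.g. "peripheral").
peripheral_calls = {
    "cell_1": {"Fn1", "Rgs5"},
    "cell_2": {"Fn1"},
    "cell_3": set(),
}
genes = ["Fn1", "Rgs5", "Sema3c"]

cell_ids, peripheral = build_pattern_matrix(peripheral_calls, genes)
for cell, row in zip(cell_ids, peripheral):
    print(cell, row)
```

In this sketch, one such matrix would be built per pattern type; for colocalization the columns would index gene pairs rather than single genes.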
Step 2 – Module discovery
Next, a statistical tool called LAS (Large Average Submatrices)23 is used to analyze each pattern annotation matrix and identify “biclusters”, i.e., a subset of rows and columns with a large average value (Fig. 1c, middle). Each bicluster represents a set of genes or gene pairs that exhibit the same type of subcellular pattern in the same set of cells, with statistical significance estimated by a Bonferroni-based score (Methods). Biclusters identified in this manner may overlap in terms of rows (cells) and columns (genes/gene pairs), yielding many redundant modules. To address this, CellSP deploys an iterative module-coalescing process where pairs of modules comprising similar sets of cells and genes are combined into larger modules (Methods, Fig. 1c, right). The module discovery step utilizes a parallel computing implementation to accelerate runtime and facilitate the analysis of large datasets.
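The idea of searching for a submatrix with a large average can be sketched as an alternating row/column selection for a fixed bicluster size. This is a simplified illustration in the spirit of LAS, not the implementation used by CellSP: the function name `las_style_search`, the fixed sizes `k` and `l`, the deterministic tie-breaking, and the toy matrix are all assumptions, and the significance score and module-coalescing steps are omitted.

```python
from typing import List, Tuple


def las_style_search(
    matrix: List[List[float]],
    k: int,
    l: int,
    init_cols: List[int],
    max_iter: int = 100,
) -> Tuple[List[int], List[int], float]:
    """Alternately pick the k rows and l columns with the largest sums
    until the selection stops changing. Returns (rows, cols, average)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows: List[int] = []
    cols = sorted(init_cols)
    for _ in range(max_iter):
        # Best k rows (cells) given the current columns (genes).
        row_sums = [sum(matrix[i][j] for j in cols) for i in range(n_rows)]
        new_rows = sorted(
            sorted(range(n_rows), key=lambda i: (-row_sums[i], i))[:k]
        )
        # Best l columns given the rows just selected.
        col_sums = [sum(matrix[i][j] for i in new_rows) for j in range(n_cols)]
        new_cols = sorted(
            sorted(range(n_cols), key=lambda j: (-col_sums[j], j))[:l]
        )
        if new_rows == rows and new_cols == cols:
            break  # converged to a local optimum
        rows, cols = new_rows, new_cols
    average = sum(matrix[i][j] for i in rows for j in cols) / (k * l)
    return rows, cols, average


# Toy pattern matrix: 5 cells (rows) x 4 genes (columns).
toy = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
rows, cols, average = las_style_search(toy, k=3, l=2, init_cols=[0, 3])
print(rows, cols, average)
```

Because the search only reaches a local optimum, a practical procedure would typically repeat it from several starting column sets and over several bicluster sizes, keeping the best-scoring result.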
Step 3 – Module characterization
To aid biological interpretation, CellSP reports shared properties of the genes and cells of each discovered module (Fig. 1e). Genes are characterized using Gene Ontology (GO) enrichment tests, while cells are characterized by their cell type composition if such information is available. Additionally, CellSP trains a machine learning classifier to discriminate module cells from all other cells, using the expression levels of all genes other than the module genes. (Here, it uses a reference scRNA-seq data set, if available, to impute levels of genes missing in the ST data set; see Methods.) Genes that are highly predictive in this task are then subjected to GO enrichment tests, furnishing hypotheses about biological processes and pathways that are active specifically in the module cells.
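A gene set enrichment test of the kind mentioned above can be sketched with a one-sided hypergeometric test. The text does not specify which test CellSP uses, so this choice is an assumption; the function name `hypergeom_enrichment_p` and the counts in the example are hypothetical, and multiple-testing correction across GO terms is omitted.

```python
from math import comb


def hypergeom_enrichment_p(
    n_universe: int,
    n_annotated: int,
    n_module: int,
    n_overlap: int,
) -> float:
    """P(X >= n_overlap) when n_module genes are drawn without replacement
    from n_universe genes, of which n_annotated carry the GO term."""
    max_overlap = min(n_annotated, n_module)
    total = comb(n_universe, n_module)
    tail = sum(
        comb(n_annotated, x) * comb(n_universe - n_annotated, n_module - x)
        for x in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: a 100-gene panel, 10 genes annotated with the term,
# a 4-gene module, 3 of whose genes carry the annotation.
p_value = hypergeom_enrichment_p(100, 10, 4, 3)
print(f"enrichment p-value = {p_value:.4g}")
```

The gene universe in such a test would normally be the set of genes that could have entered a module (for example, the assay's gene panel) rather than the whole genome.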
User experience
The CellSP tool ingests a single-molecule resolution ST data set with cell delineations (Methods) and returns a listing of gene-cell modules identified for all five pattern types (peripheral, radial, punctate, central, colocalization). The detailed report of each module includes the type of subcellular pattern, identities of genes, number of cells, GO terms characterizing the module genes, cell type composition, and GO terms associated with module cells. The latter two are provided depending on the availability of cell type annotations and reference scRNA-seq data (Methods), both of which are optional.
Visualization
CellSP introduces new techniques for intuitive visualization of subcellular phenomena represented by modules. Here, we outline these visualization methods in general terms; specific examples are provided in subsequent sections. For modules defined by gene pair colocalization, it creates a heatmap showing that colocalization frequencies are higher for module genes versus other genes, and in module cells versus other cells (Fig. 1d, right). Modules of “central” or “peripheral” pattern are depicted using a density plot of transcript distribution at varying radial locations in an idealized circular cell, averaged over all module cells (Fig. 1d, middle). Subcellular patterns of types “radial” and “punctate” are depicted using a density plot binned by sectors of an idealized circular cell, aggregated over all module cells (Fig. 1d, left). Additional visual aids provided in the report include sample cells illustrating the subcellular phenomenon directly, “REVIGO”24 plots of GO term enrichments, and Uniform Manifold Approximation and Projection (UMAP)25 as well as spatial plots of module cells.
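The ring-based density plot for "central" and "peripheral" modules can be approximated as follows. This is a minimal sketch under simplifying assumptions: each cell is summarized by a center point and a single radius, transcripts are assigned to equal-width concentric rings by normalized distance from the center, and per-cell ring fractions are averaged with equal weight. The function names and coordinates are hypothetical, and the definition of the cell "center" used by CellSP (see Methods) may differ.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def ring_fractions(
    transcripts: List[Point],
    center: Point,
    cell_radius: float,
    n_rings: int = 4,
) -> List[float]:
    """Fraction of a cell's transcripts falling in each concentric ring
    (ring 0 is innermost) of an idealized circular cell."""
    counts = [0] * n_rings
    for x, y in transcripts:
        r = hypot(x - center[0], y - center[1]) / cell_radius
        # Transcripts at or beyond the nominal radius go to the outer ring.
        counts[min(int(r * n_rings), n_rings - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_rings


def average_profile(profiles: List[List[float]]) -> List[float]:
    """Average ring fractions over cells, weighting each cell equally."""
    n = len(profiles)
    return [sum(p[i] for p in profiles) / n for i in range(len(profiles[0]))]


# Two hypothetical module cells, both centered at the origin with radius 10.
cell_a = ring_fractions([(9, 0), (0, 8), (1, 1), (-9.5, 0)], (0, 0), 10.0)
cell_b = ring_fractions([(5, 5)], (0, 0), 10.0)
module_profile = average_profile([cell_a, cell_b])
print(cell_a, module_profile)
```

Computing the same profile for non-module genes in the same cells would give the background against which the module's pattern is contrasted.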
CellSP reveals myelination- and axonogenesis-related subcellular spatial phenomena in preoptic area of mouse hypothalamus
In its first demonstrative application, we used CellSP to analyze a MERFISH data set comprising 5,149 cells in the hypothalamic preoptic area (hPOA) of a mouse brain26, identifying 38 modules spanning the five pattern types, with each module comprising 9 genes and 138 cells on average (Supplementary Fig. 2a, Supplementary Data 1). The modules are distinct from each other, with only a small degree of overlap (Supplementary Figs. 2b, 2c). Furthermore, we observed no correlation between a gene’s average expression level and its frequency of inclusion in modules (Supplementary Fig. 3). The modules comprise genes enriched in a range of biological processes and cellular components, especially cytoskeleton and cellular structure regulation, neuronal function and development, and metabolic processes (Supplementary Data 1). Note that the underlying subcellular pattern discovery tool InSTAnT reports 358 significant gene pairs (p-value < 1e−3) that colocalize in 35 cells on average, while SPRAWL reports 134 genes having a significant pattern (score threshold = 0.5) in 351 cells on average. The resulting compendium of significant statistical findings reported by these tools, tens of thousands in number, creates a substantial interpretive burden. CellSP effectively distills these patterns into a manageable set of modules, streamlining the crucial step of biological interpretation.
We next illustrate the format in which CellSP reports its discovered modules (Supplementary Fig. 4) and how specific biological insights can be gleaned from these. The module “M0_I” consists of the four genes Sgk1 (serum/glucocorticoid regulated kinase 1), Ttyh2 (tweety family member 2), Ermn (ermin, ERM-like protein) and Ndrg1 (N-myc downstream regulated 1), and a set of 332 cells (Fig. 2a). It was identified using InSTAnT in the pattern discovery step, which means that these genes exhibit statistically significant colocalization in the module cells, a selection of which are shown in Fig. 2b. Notably, the colocalization is observed in varying subcellular regions such as the nucleus, nuclear periphery, cytoplasm and cell membrane. (See Supplementary Fig. 2e for more examples.) Fig. 2c illustrates how CellSP helps visualize a colocalization module, highlighting the specificity of the phenomenon to the module’s genes and cells. The module cells are mostly mature oligodendrocytes (67%, Fig. 2d) and the module genes are enriched for GO terms such as the molecular function “cytoskeletal protein binding” (p-value = 0.001), the cellular component “myelin sheath” (p-value = 0.004) and the biological process “supramolecular fiber organization” (p-value = 0.02) (Fig. 2e). (Note that cytoskeletal protein binding and supramolecular fiber organization play important roles in formation and maintenance of myelin sheath27,28.) Moreover, according to CellSP’s classification-based characterization, the module cells are marked by genes enriched for “ensheathment of neurons” (p-value = E−12) (Fig. 2f). Thus, CellSP analysis suggests a connection between subcellular colocalization of the module and the myelination process in mature oligodendrocytes. 
A possible interpretation is that the mRNAs of these four genes are captured in the process of being transported to the myelin sheath for local translation, as has been recorded for at least two other myelin biogenesis-related genes, viz., Mbp (Myelin Basic Protein)29 and Mobp (Myelin Oligodendrocyte Basic Protein)30. Interestingly, another CellSP-reported module – “M11_S” – represents a punctate subcellular pattern (Supplementary Fig. 2d) involving three of the four above genes, viz., Sgk1, Ttyh2, Ermn, and another myelination-related gene, Gjc3 (gap junction protein gamma 3) that has been shown to localize in myelin sheaths31.
Fig. 2. CellSP reveals myelination-related subcellular spatial phenomena in mouse brain.
a Summary table of an example gene-cell module, “M0_I”, discovered by CellSP from MERFISH data on hypothalamic preoptic area in mouse brain. (The suffix “_I” indicates that it was discovered by aggregating single-cell level colocalization patterns reported by InSTAnT.) “GO Genes” shows the Gene Ontology (GO)-based characterization of the four module genes. “GO Cells” shows the GO-based characterization of the genes whose expression levels are most discriminative of the 332 module cells versus remaining cells. b Direct visualization of six of the 332 module cells, with blue boundaries indicating automatically delineated cell periphery, red boundaries indicating nuclear periphery, each dot representing a transcript, colored in green for module genes and in gray for other genes. c CellSP visualization of module M0_I. The heatmap shows pairwise colocalization propensity of gene pairs in module cells compared to non-module cells, as a log ratio, with warmer colors (more positive values) indicating specificity of the colocalization to the module cells. The first half of rows and columns corresponds to module genes while the second half corresponds to a random subset of non-module genes, so the contrast between the upper left quadrant (more warm-colored cells) and other quadrants underscores that the phenomenon is specific to the module gene pairs. d UMAP visualization of all cells in the sample, with colors indicating cell types and module cells shown in black. The module mostly comprises mature oligodendrocytes (ODM). e REVIGO plot of biological processes enriched in module genes, showing a prominent association with myelination-related terms. f REVIGO plot of biological processes associated with gene markers of module cells, showing that these cells are also characterized by genes enriched for myelination-related terms.
An example of a module discovered using SPRAWL in the pattern discovery step is “M0_S” (Fig. 3a), consisting of four genes – Fn1 (fibronectin 1), Slco1a4 (solute carrier organic anion transporter family member 1a4), Rgs5 (regulator of G protein signaling 5), Sema3c (semaphorin 3C) – distributed in a peripheral pattern (Fig. 3b) in 221 cells. Figure 3c illustrates how CellSP helps visualize such a module, revealing the greater tendency of this module’s genes to have peripheral localization, compared with background genes. The module genes are enriched in regulation of axonogenesis (Fig. 3e)32–34, while the module cells, mostly neurons (Fig. 3d), are marked by genes associated with cell adhesion (Fig. 3f), a key process during axonogenesis. The peripheral localization pattern defining the module suggests local translation of proteins that localize in neurites (cell delineations mostly capture somata35) or extracellular matrix (ECM). Consistent with this speculation, RGS5 protein is known to exhibit strong synaptic localization36,37, while FN1 protein is an extracellular matrix component38, supporting the possibility of local translation. Considering the module’s statistical association with cell-adhesion functions, we examined the transcript distribution in adjacent pairs of cells and observed that the peripheral localization tends to be at the facing boundaries (Fig. 3g–i, Supplementary Fig. 5), a phenomenon not seen in non-module cells. Putting together the different facets of this module’s characterization by CellSP, we hypothesize that the four genes are locally translated into proteins that localize at soma boundaries and perform adhesion-related functions. This may be the case if the module cells, mostly neurons, are actively forming synapses, undergoing neurite outgrowth, etc.39, which is plausible considering that the preoptic region profiled through these data is an important site of developing neurites40.
Fig. 3. CellSP reveals axonogenesis-related subcellular spatial phenomena in the mouse brain.
a Summary table of an example gene-cell module, “M0_S”, discovered by CellSP from MERFISH data on hypothalamic preoptic area in mouse brain. (The suffix “_S” indicates that it was discovered by aggregating single-cell level localization patterns reported by SPRAWL.) b Direct visualization of six of the 221 module cells, with each dot representing a transcript, colored in green for module genes and in gray for other genes. c CellSP visualization of module M0_S. The subcellular space is idealized as a circle with concentric rings representing varying distances from the “center” (see Methods) and color intensity representing transcript abundance of a gene set in a ring, averaged over the module cells. The upper circle depicts this information for module genes, and the lower circle represents all other genes for contrast, which in this case highlights that the module gene transcripts are enriched in peripheral regions (outermost ring). This type of visualization is used for any gene-cell module with a peripheral or central pattern. d UMAP visualization of all cells in the tissue, with colors indicating cell types and module cells shown in black. The module mostly comprises neurons. e REVIGO plot of biological processes associated with module genes, highlighting enrichment for axonogenesis-related terms. f REVIGO plot of biological processes associated with marker genes of module cells, showing that these cells are characterized by genes enriched for cell adhesion-related terms. g–i Spatial plot of all cells in tissue (h) with module cells highlighted in red, along with detailed visualization of a subset of module cells in close spatial proximity (g, i). In these detailed views, module cells are depicted with a stronger boundary, while their fill color corresponds to their cell type. Transcripts of module genes are highlighted in red and are localized at the contact boundaries between module cells.
Subcellular spatial patterns recur across cell types and regions of brain
We next analyzed a large ST data set that profiles a whole mouse brain using Xenium technology41. This data set profiles a panel of 248 genes in 162,033 cells, grouped into 50 clusters based on their transcriptomes and a graph-based clustering algorithm (Supplementary Fig. 6). We analyzed each cell cluster separately to avoid confounding effects of inter-cluster expression differences and to test module reproducibility across varying cellular contexts (see Supplementary Note 1, Supplementary Data 16–28). CellSP identified ~22 modules per cell cluster, each module comprising 8 genes and 68 cells on average (Supplementary Data 2). Some of the prominent biological characterizations (GO terms) reported for these modules included synaptic processes, myelination, hormone receptor activity, cellular responses to external stimuli, extracellular components, secretory processes, and inter-cellular communication.
Closer inspection of these results revealed that modules with certain biological characterizations recur in multiple cell clusters (cell types and/or regions) of the brain. For instance, we observed 36 modules, across 18 different clusters (Supplementary Data 3, Supplementary Fig. 8), to be associated with the myelination process. There is substantial overlap in their constituent genes (Fig. 4c), with the three most frequently recurring constituents being the myelin-associated genes Gjc3, Sox10 (SRY-box transcription factor 10) and Opalin (oligodendrocytic myelin paranodal and inner loop protein)42–44. Gjc3 is known to be expressed in myelinating glial cells31, Sox10 is a transcription factor responsible for the activation of several myelin-specific genes45 and Opalin is known to promote oligodendrocyte differentiation and axon myelination46. We confirmed that the inclusion of a gene in a module is not a mere reflection of its expression levels in module cells (Supplementary Fig. 7a), though higher expression does play a role (Supplementary Fig. 7b). These myelination-associated modules comprise 3493 cells overall, which are observed across clusters with diverse transcriptomic profiles (Fig. 4a) and in different brain regions (Fig. 4b). The shared biological properties of these modules are further elucidated by the numerous other GO term enrichments common to them (Fig. 4d,e), including mesenchymal stem cell (MSC) differentiation and oligodendrocyte differentiation. MSCs have the potential to differentiate into or stimulate the maturation of oligodendrocytes47, the primary myelinating cells.
Fig. 4. CellSP uncovers myelination and GABA-ergic synapse related modules across cell types and regions of mouse brain.
a UMAP visualization of all cells in a Xenium data set of whole mouse brain, with cells of 36 myelination-related gene-cell modules detected by CellSP shown in color. (Cells of each module are shown in a different color.) This demonstrates that CellSP captures myelination-related subcellular patterns across cells with diverse gene expression profiles, indicating that these patterns are not merely a reflection of overall gene expression levels. b Spatial plot of all cells in the whole brain data set, with cells of myelination-related modules (same as in a) shown in different colors, while cells not in any of these modules are shown in grey. c Matrix showing gene composition of each of the 36 myelination-related modules (rows); blue indicates a gene’s inclusion in a module. Genes Sox10, Gjc3, and Opalin appear in all or most modules, while four others appear in a majority of modules. d REVIGO plot of biological processes associated with module genes, aggregated over all 36 modules. Color intensity of a bin in this plot represents the number of significant module associations of one or more GO terms that map to that bin in the semantic space computed by REVIGO. e REVIGO plot of biological processes associated with gene markers of module cells, aggregated over all the 36 modules. f UMAP visualization of all cells in data set, with cells in any of the 20 discovered GABA-ergic synapse-related modules shown in color. (Cells of each module are shown in a different color.) g Spatial plot of all cells in the sample, with cells from all 20 GABA-ergic synapse-related modules shown in different colors, while cells not in any of these modules are shown in grey.
Another prominent theme in the whole brain CellSP results was the repeated finding of modules characterized by “GABA-ergic synapse”, a GO subcellular localization term. Twenty modules across 14 different cell clusters were reported to have this annotation enriched (p-value < 0.0001) in their constituent genes as well as in expression markers of their member cells (Supplementary Fig. 9a, b). These modules were observed in different expression contexts (Fig. 4f) and physical regions of the brain (Fig. 4g), and to exhibit different types of subcellular patterns (Supplementary Fig. 11, Supplementary Data 4). Several genes appear repeatedly as members of these modules, with the most frequent ones being Gad1 (glutamate decarboxylase 1), Gad2 (glutamate decarboxylase 2) and Rab3b (RAB3B, member RAS oncogene family) (Supplementary Fig. 9c, Supplementary Fig. 10), which play critical roles in maintaining basal levels of GABA (Gamma-aminobutyric acid), as well as synthesis and vesicle release of GABA in an activity-dependent manner at inhibitory synapses48–50. There is evidence suggesting that GAD2 and RAB3B proteins are localized in presynaptic terminals49,51, hinting at an explanation for the subcellular colocalization of their transcripts.
CellSP-detected modules can be specific to biological conditions
The analyses above show us that gene-cell modules recovered by CellSP reveal subcellular transcriptomic phenomena related to biological processes such as myelination and axonogenesis. This raises the natural question: do these modules also reveal subcellular phenomena that vary under different tissue conditions? To investigate this, we used CellSP on data sets comprising case-control pairs of tissues. The first such analysis involved Xenium data on a kidney cancer (papillary renal cell carcinoma) FFPE sample (Supplementary Fig. 12a) and a healthy kidney sample52. As in the whole-brain analysis above, we first clustered the cells of each sample based on their transcriptomes and analyzed each cell cluster with CellSP, identifying 11 modules in the cancer sample and 15 modules in the healthy sample (Supplementary Data 5, 6). The cancer modules were mostly (10/11) found in Cluster 6 and Cluster 9 (Fig. 5a, Supplementary Fig. 12b). Six of these cancer modules comprise genes and cells characterized by CellSP as being immune response-related, with enrichment of GO terms such as “immune response”, “defense response”, “T cell activation”, “positive regulation of leukocyte proliferation”, etc. (Supplementary Data 5).
Fig. 5. CellSP detects differences in subcellular spatial patterns between tissue conditions.
a Spatial plot of a Kidney PRCC (kidney cancer) tissue (Xenium data), highlighting cluster 6 cells in blue, cluster 9 cells in green, and cells of any of the 6 CellSP-detected immune-response related gene-cell modules in red. b–d CellSP visualization of three selected immune system related gene-cell modules discovered in the cancer sample. b shows module R6_M5_S (central pattern) in left panel while right panel is a visualization of the subcellular distribution of the module genes in expression-matched cells from the healthy kidney sample (“control”). Module genes are more uniformly localized towards the cell’s “center” in the cancer sample than in the healthy sample. c shows module R6_M0_S (radial pattern) in left panel and the right panel depicts the subcellular distribution of module genes in cells from the healthy sample. The cancer sample (left panel) shows a strong, clear difference between module genes and non-module genes in terms of radial spread of transcripts, while this contrast is insignificant in the healthy sample (right panel) (highlighted by the visual overlay). d shows module R6_M3_I, defined by a colocalization pattern. The colocalization visualization has been modified to provide contrast with control cells rather than non-module cancer cells. The module genes are much more proximal to each other in cancer module cells than in random cells from the healthy sample. e Gene-cell modules detected in mouse hemibrain Xenium samples from three TgCRND8 mice (AD model) at different time points and three wildtype (WT) mice at similar time points were examined for inclusion of six AD-related genes (columns). Shown are the number of detected modules in each of the six samples (rows) that include a gene. Notably, Picalm and Trem2 have more modules detected in AD samples than in WT, consistent with their known roles in Alzheimer’s Disease-related processes. 
f Chord diagram depicting the number of modules detected in AD samples (bottom) or WT samples (top) that include a pair of genes, shown for genes annotated with function “ensheathment of neurons”. The gene pairs “Olig2-Plp1” and “Olig2-Mobp” (shown in black) exhibit a significant difference between AD and WT samples. g Blue bars show z-scores computed from a proportion test comparing AD and WT samples in terms of the fraction of CellSP modules that include Olig2 and another gene (x-axis labels), shown only for partner genes where the proportion test yields a significant p-value. Orange bars show similarly calculated z-scores for modules defined by cell-level co-expression, as computed by WGCNA.
Examination of the 15 modules in the healthy sample (Supplementary Data 6) suggested that the immune response-related modules found in the cancer sample were specific to it and were not present in the healthy sample. To pursue this observation objectively, we used CellSP visualization routines to contrast a module’s spatial pattern in the module cells (from the cancer sample) versus cells in the other (healthy) tissue sample. As condition-specific gene expression can be a confounder for such analysis, this analysis used a subset of healthy tissue cells that matched the cancer module cells in the expression levels of module genes (Methods). Figure 5b-d show the cancer-versus-healthy tissue comparison for three of the six cancer-associated immune-response modules. For instance, module C6_M5_S (Fig. 5b) comprises a set of 31 genes that are localized centrally in a subset of 22 cells (Supplementary Fig. 12c) and was detected in Cluster 6 of the cancer sample (Fig. 5a). The genes are enriched for the GO term “defense response” (p-value 8.9E−08). As shown in Fig. 5b (right), the same set of genes lacks this centrally localized distribution of transcripts in expression-matched cells from the healthy sample. A second example is the module C6_M0_S (Fig. 5c): this comprises 24 genes, also enriched in “defense response”, that exhibit radial subcellular localization in a set of 18 cells of Cluster 6 (Supplementary Fig. 12d). CellSP visualization (Fig. 5c, left) confirms that the spatial pattern is significantly different between module genes and other genes when examining the module cells. The adjacent panel (Fig. 5c, right) shows that this prominent contrast is not seen in expression-matched cells from healthy tissue, supporting the cancer-specificity of this module. The third cancer-specific module we highlight here is C6_M3_I (Fig. 5d), a set of seven genes enriched in the term “T cell activation”, that exhibit significant subcellular colocalization in 66 cells of Cluster 6 (Supplementary Fig. 12e). The strength of its spatial pattern in cancer versus healthy tissue is visualized in Fig. 5d, where the top left quadrant of the heatmap indicates that the module genes have a stronger tendency for pairwise colocalization in the module (cancer) cells compared to expression-matched non-module cells from the healthy sample. (The other quadrants, especially the bottom-right, show that this contrast is not observed for a random subset of non-module genes.) In summary, CellSP discovers immune-response related gene-cell modules representing significant subcellular patterns that are specific to cancer versus healthy tissue.
CellSP detects subcellular patterns associated with Alzheimer’s Disease in a mouse model
We next used CellSP to identify gene-cell modules in brains of mouse models of Alzheimer’s Disease (AD). For this, we analyzed Xenium data on a coronal section of one hemisphere of TgCRND8 transgenic male mouse brain at three time points (pathological progression stages), as well as wild-type (WT) mouse brain at similar time points53. Following the same procedure as above, we clustered cells in each brain sample and used CellSP to find significant modules in each cell cluster, recovering ~11 gene-cell modules per cluster on average (Supplementary Fig. 13, Supplementary Data 7).
To contextualize these modules with the AD genetics literature, we focused on six genes from the Xenium gene panel implicated in AD risk54 (Fig. 5e). We observed the gene Picalm (phosphatidylinositol binding clathrin assembly protein) to appear in many modules in the AD mice, especially in the early and middle time points, significantly more frequently than in WT mice (t-test p-value 0.047) (Supplementary Fig. 14). The Picalm protein is responsible for recruiting clathrin and adaptor protein-2 (AP-2) to the plasma membrane55 as part of clathrin-mediated endocytosis, and contributes to clearance of amyloid-β (Aβ) at the blood brain barrier56. It is linked to processes that are disrupted in AD and is also genetically associated with the disease57. We thus speculate that the observed subcellular patterns involving this gene are reflections of the gene’s dysfunction in AD mice. We observed another AD risk gene – Trem2 (triggering receptor expressed on myeloid cells 2) – to feature in many CellSP modules (12) (Supplementary Fig. 15) in the late-stage AD mouse brain, significantly more than in the other five samples from WT or AD mice. This late stage is associated with an increase in AD-associated microglial population58, and Trem2 is primarily expressed in microglia, where it functions as a transmembrane receptor59 enabling the progression to a mature disease-associated microglia phenotype58 and is involved in the response to Aβ plaques60. Variants in Trem2 gene have been identified as a significant risk factor for late-onset AD61, suggesting again that the observed gene-cell modules reveal subcellular phenomena reflecting its dysfunction. The four other AD risk genes were found in few modules overall (Fig. 5e), and these modules were of similar counts in AD vs. WT mice. 
We also note that the above AD-associated changes involving Picalm and Trem2 expression, observed at the subcellular spatial level, would not have been detected at the level of overall cellular expression (Supplementary Fig. 16).
We next examined the CellSP-detected modules in AD and WT brains with a focus on myelination. Given the reported connection between myelin damage and AD pathology62, and building on our earlier identification of myelination-related gene-cell modules in the mouse brain, we investigated whether such modules have any unique characteristics in one phenotypic group compared to the other. We concentrated on all identified modules associated with the GO term “ensheathment of neurons” (nominal p-value < 0.01) and, to identify any characteristics that differentiate these modules between AD and WT, we examined the frequency of myelination-related genes co-occurring in the same module (Fig. 5f). While most gene pairs show similar co-occurrence statistics between the two groups, the pairs Olig2 (oligodendrocyte transcription factor 2)-Plp1 (proteolipid protein 1) and Olig2-Mobp have a significantly higher co-occurrence in WT brain compared to AD brain (proportion test p-value < 0.05) (also see Supplementary Data 8). To follow up on these intriguing observations involving Olig2, a gene essential to maturation of oligodendrocytes (the cells responsible for producing myelin), we next identified all genes that co-occur with Olig2 in a group-specific manner (proportion test p-value < 0.05). (The previous analysis was limited to eight select myelination-related genes, including Olig2.) We found 10 such genes (Fig. 5g, blue bars), nine of which co-occur with Olig2 preferentially in gene-cell modules of the WT brains; these include the above-noted genes Plp1 and Mobp as well as other genes linked to myelination, including Opalin and Cnp (2’,3’-cyclic nucleotide 3’ phosphodiesterase)46,63–65. This inter-group difference in module composition is most pronounced in time points 1 and 2 (Supplementary Data 9), which correspond to periods of continued myelination in the mouse brain66. 
These observations suggest that the integrity and functionality of subcellular modules found in WT brains may be impaired in AD brains, reflecting a broader dysfunction in myelination processes in AD67. In summary, our myelination-focused examination of CellSP modules points to a potential significance of Olig2 in Alzheimer’s disease pathology, highlighting a shift from normal myelination functions in healthy brains.
The specific relationships and insights uncovered here are a glimpse into the dynamics of the subcellular transcriptome during AD progression, not merely a reflection of differential co-expression of genes at the whole cell level. To substantiate this, we repeated the module discovery using co-expression analysis with the popular WGCNA tool21. Modules were detected for each cell cluster in each of the six brain samples, as above, except that these were gene co-expression modules rather than CellSP-identified gene-cell modules. These co-expression modules do not show any preferential co-occurrence of Olig2 with any of the genes identified above except one – Olig2-Gatm (glycine amidinotransferase) (p-value < 0.05, Fig. 5g, orange bars) (Supplementary Data 10). This analysis underscores the ability of CellSP modules to reveal subcellular pattern changes between phenotypic conditions that may not be discernible at the cellular level.
Discussion
With the increasing popularity of spatial transcriptomics (ST) techniques of single-molecule resolution68–70, new tools have been proposed to identify genes and gene pairs that exhibit interesting transcript distribution patterns within cells13,18,19. However, there remain fundamental and practical challenges not addressed by these analysis tools. A practical problem is that they annotate subcellular patterns such as gene-gene colocalization or gene localization preferences in individual cells, leading to very large compendia of significant patterns that are difficult to sift through for actionable insights. A conceptual limitation is that while they can effectively shortlist genes that merit further exploration on account of their subcellular distributions, they do not highlight subsets of cells where interesting spatial phenomena manifest and could point to special functional properties of those cells. The simple approach of examining every cell that exhibits a statistically significant spatial pattern is impractical, as a very large number of cells harboring diverse patterns of varying functional origins are enumerated in this way. On the other hand, examining all cells where a particular gene (or pair) has the same subcellular distribution pattern will demand that we repeat such examination for a large number of genes, again leading to a substantial interpretive burden.
The key innovation of CellSP is the idea of a “gene-cell module”, which is one solution to the above shortcomings of existing tools. Such a module, if found to be statistically significant, draws our attention to a subset of cells that share a subcellular spatial pattern involving the same subset of genes, suggesting a functional commonality among those cells. Furthermore, CellSP automatically searches for that functional characterization by using machine learning to identify gene markers of the member cells of the module, and reporting biological properties enriched in the marker genes. While module genes often show high expression, focusing on non-module genes helps reveal additional markers and pathways associated with the module cells, providing complementary biological insight beyond the module’s defining genes. This is how our analysis of MERFISH data on hypothalamic preoptic area uncovered a collection of 332 cells, mostly mature oligodendrocytes, that appear to be involved in myelination, and another subset of 221 cells, mostly neurons, that are characterized by axonogenesis-related functions. These discoveries would not have been possible with existing tools. In identifying a gene-cell module, CellSP not only highlights an interesting subset of cells, it simultaneously draws our attention to a set of genes that share an interesting transcript distribution pattern in those cells, prompting functional hypotheses involving those spatial patterns. Such gene set discovery is a standard technique of transcriptomics analysis, e.g., as co-expression module identification21, and a CellSP module extends this time-tested concept to subcellular colocalization of genes. 
It was this functionality that led to the identification of the four-gene module comprising myelination-related genes (Sgk1, Ttyh2, Ermn and Ndrg1) that colocalize with each other in many mature oligodendrocytes, and another module comprising axonogenesis-related genes (Fn1, Slco1a4, Rgs5, Sema3c) whose transcripts localize at the periphery in the same collection of cells, raising the possibility of localized translation of these genes as part of cell adhesion processes.
We note that the InSTAnT toolkit18 provides module discovery functionality, where a module has a different statistical interpretation from that adopted in CellSP – it is a subset of genes that colocalize with each other significantly frequently across the entire population of cells. We compared CellSP to the gene module discovery methods implemented in InSTAnT (Supplementary Note 2, Supplementary Data 13–15) and found that the former offers significant advantages in terms of biological interpretation and ease of use. Importantly, CellSP-reported modules include four additional types of subcellular spatial patterns (radial, peripheral, punctate, and central patterns discovered by SPRAWL19) and are accompanied by innovative visualizations and machine learning-based characterization of module cells, features that are crucial to the scientific discovery process involving functional subcellular patterns. To reduce interpretive burden, CellSP includes a module coalescing step that merges highly overlapping modules while retaining statistical significance. Although this improves clarity and reduces redundancy, we acknowledge that, in some cases, individual modules may have offered additional biological insight if reported separately.
When interpreting CellSP modules, it is important to keep in mind that total gene expression can be a confounder – cells with low expression of a gene are unlikely to reveal statistically significant subcellular patterns involving that gene. For this reason, we used a clustering approach on the larger, more heterogeneous data sets, with CellSP being run separately on each cluster of cells that are relatively less heterogeneous in their expression profiles. Such clustering is not performed by CellSP, rather it has to be performed by the user, and each cluster is provided as a separate data set to the tool. We have shown that CellSP is robust to cluster granularity, provided each cluster has ≥1000 cells (with ~250 genes); overly fine clustering may fragment spatial modules (Supplementary Note 1).
By analyzing data sets from diverse organs and tissues, such as the whole brain, specific brain regions, the kidney, etc., under various biological conditions such as Alzheimer’s disease models or kidney cancer, and different technologies such as MERFISH and Xenium, we demonstrated how CellSP can be used to explore large and high-dimensional ST data and extract new insights into subcellular transcript distribution. The discovered patterns are suggestive of co-transportation or co-localization of RNAs that participate in similar biological functions71, and also support the phenomenon of local translation72, which requires the mRNA to be transported to precise locations within the cell where they are translated into proteins. Furthermore, we showed how gene-cell modules can be compared across conditions to discover subcellular changes associated with disease states, which are not recoverable using traditional differential expression analysis. Overall, CellSP offers a powerful new approach to subcellular spatial transcriptomics data analysis.
Limitations
We recognize that certain aspects of CellSP can be further improved in future versions. For example, the heuristic nature of the LAS search algorithm and our module merging step may result in variability across different runs, and alternative search algorithms should be explored to address this. As another example, the visualization scheme represents cells as idealized unit circles, which may not accurately depict elongated or irregularly shaped cells; techniques for average shape construction73 may be fruitful for future work in this direction. Additionally, while CellSP can be computationally intensive for very large datasets, its use on clusters of relatively homogeneous cells, as we recommend, keeps run times tractable. Furthermore, sampling strategies applied to large datasets can provide a reasonable approximation of full-run results, given that both InSTAnT and SPRAWL operate on a per-cell basis and CellSP analyzes cell-level patterns (see Supplementary Note 1).
CellSP, along with its underlying methods InSTAnT and SPRAWL, operates on transcript-level data for each cell. While this granularity enables high-resolution spatial analysis, it also imposes substantial computational demands—challenges that will intensify as spatial technologies scale up in both gene coverage and tissue size. To address this, we show that CellSP can produce reliable approximations when applied to clustered or subsampled datasets (see Supplementary Note 1), though excessively fine clustering may reduce statistical power. We have also implemented parallelization of core components to improve performance. However, further optimization will be necessary to ensure scalability to future, larger datasets.
The selection of the LAS algorithm was guided by its strong performance in a benchmarking study by Padilha et al.74 and in our own limited evaluations, where it consistently outperformed other biclustering methods. Nevertheless, we acknowledge that a more comprehensive benchmarking effort is needed to fully evaluate alternative approaches. Although CellSP is extensible in principle, incorporating other biclustering algorithms currently requires manual edits to the source code and is not yet available as a streamlined, user-accessible feature.
The GO enrichment analysis used to characterize modules may, in some cases, be underpowered if the gene set comprising the module is small (e.g., 5 or fewer). We note that even in these cases, any reported association (nominal p-value < 0.05) is reliable, though true associations may not always be recovered. Generally, the characterization of module cells (via GO enrichment of module marker genes) does not suffer from this shortcoming. Using imputed gene expression improves the classification of module cells compared to relying on measured genes alone (Supplementary Data 1–7). However, the extent to which this improvement depends on the quality or depth of the reference scRNA-seq dataset used for Tangram imputation has not yet been systematically evaluated and remains an important direction for future work.
Finally, accurate cell segmentation remains a major challenge for subcellular transcriptomic data analysis. Limitations in segmentation accuracy, especially for irregular or complex cell shapes, can affect the spatial pattern scores assigned by SPRAWL and propagate to downstream analyses. Addressing these issues will be essential to improving the biological resolution and reliability of CellSP findings.
Methods
CellSP module discovery algorithm
CellSP identifies persistent subcellular spatial phenomena in the tissue by defining “gene-cell modules” as sets of genes with specific subcellular transcript spatial patterns across cells. This process is performed in two steps -
Subcellular pattern discovery
We utilize the existing tools InSTAnT18 and SPRAWL19 for subcellular pattern detection in individual cells. InSTAnT detects gene pair colocalization in single cells, employing the “Proximal Pairs” (PP) test to calculate colocalization p-values for each gene pair in every cell. It also assigns a global colocalization p-value to each gene pair, via the “Conditional Poisson Binomial” (CPB) test. We use a user-tunable threshold of 1e−3 on this p-value to limit the set of candidate gene pairs. A matrix MI is constructed to store the PP test p-values with dimensions ncells × ngenepairs where ngenepairs is the number of candidate gene pairs. We then perform a negative log transformation on this matrix.
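The construction of the matrix MI can be illustrated with a minimal sketch. It assumes the per-cell PP p-values and the global CPB p-values are already available as plain dictionaries; the function name, the input layout, the base-10 logarithm, the strict inequality on the threshold, and the p-value floor are all illustrative assumptions and are not taken from the InSTAnT or CellSP code.

```python
import math

def build_colocalization_matrix(pp_pvalues, cpb_pvalues, threshold=1e-3, floor=1e-300):
    """Hypothetical helper: build a cells x gene-pairs matrix of -log10 p-values.

    pp_pvalues : dict cell_id -> dict gene_pair -> per-cell PP-test p-value
    cpb_pvalues: dict gene_pair -> global CPB-test p-value
    """
    # Candidate gene pairs are those whose global p-value passes the threshold.
    pairs = sorted(p for p, pv in cpb_pvalues.items() if pv < threshold)
    cells = sorted(pp_pvalues)
    matrix = []
    for cell in cells:
        row = []
        for pair in pairs:
            # Assumption: a gene pair not tested in a cell carries no evidence (p = 1).
            p = pp_pvalues[cell].get(pair, 1.0)
            # The floor avoids log(0); base 10 is an assumption of this sketch.
            row.append(-math.log10(max(p, floor)))
        matrix.append(row)
    return cells, pairs, matrix
```

Larger entries then correspond to stronger per-cell colocalization evidence, which is the orientation a search for high-average submatrices requires.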
SPRAWL detects subcellular mRNA localization patterns and classifies them into four categories - peripheral, central, radial, and punctate. Each gene receives a statistical significance score (on a scale of -1 to 1) representing the strength of the spatial pattern in each cell. These scores are stored in four separate matrices Mperipheral, Mcentral, Mradial, Mpunctate each with dimensions ncells × ngenes.
Module discovery
Each entry of the matrices obtained from InSTAnT and SPRAWL represents the strength of pattern occurrence for each gene/gene-pair in each cell. To aggregate these patterns across genes and cells, we use the LAS23 algorithm to perform biclustering on each of these matrices. The LAS algorithm iteratively searches for a submatrix (bicluster) with significantly high average value, removes the influence of the selected submatrix from the original data matrix, and repeats the process until the desired number of submatrices are retrieved or no significant submatrices are found. The LAS significance score for a k × l submatrix U within a data matrix X of dimensions m × n is defined as
$$S(U) = -\log\left[\binom{m}{k}\binom{n}{l}\,\Phi\left(-\tau\sqrt{kl}\right)\right]$$
where Φ(·) denotes the cumulative distribution function (CDF) of the standard normal distribution, and τ denotes the average value of the submatrix U.
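As an illustration, the score can be evaluated numerically under the assumption that it takes the form given in the original LAS publication, S(U) = −log[C(m,k) · C(n,l) · Φ(−τ√(kl))]. The sketch below uses only the Python standard library; the function names are hypothetical, and the large-argument fallback is a standard asymptotic approximation of the normal tail, not part of CellSP.

```python
import math

def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def las_score(tau, k, l, m, n):
    """Assumed LAS score of a k x l submatrix with mean tau in an m x n matrix."""
    x = tau * math.sqrt(k * l)
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))  # Phi(-x), the upper normal tail
    if tail > 0.0:
        log_tail = math.log(tail)
    else:
        # erfc underflows for large x; use the leading-order asymptotic of log Phi(-x).
        log_tail = -0.5 * x * x - math.log(x) - 0.5 * math.log(2.0 * math.pi)
    return -(log_binom(m, k) + log_binom(n, l) + log_tail)
```

The binomial terms act as a multiple-testing penalty for the number of possible k × l submatrices, so a larger submatrix must have a correspondingly more extreme average to score well.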
The reported submatrices are what we call “gene-cell modules” or “modules”. Due to the heuristic nature of the search, it is possible that a reported module is part of a larger module with a similarly high average value, but this larger module is not discovered. To mitigate this issue across reported modules, we implement a two-step module expansion process:
Cell Expansion – For each module, we iterate through all cells and add a cell to the module if the cell’s average value across the module’s genes exceeds the original module’s average value.
Module Mergers – Iterating over modules in descending order of significance score, we calculate the overlap coefficient for the genes of each module pair. The overlap coefficient is defined as
$$\mathrm{overlap}(A, B) = \frac{|A \cap B|}{\min(|A|, |B|)}$$
where A and B denote the gene sets of the two modules. If the overlap coefficient exceeds 0.667, then we consider merging the module pair by creating a new module that comprises the union of genes/gene pairs of the original pair and the union of cells of the original pair. If the new module’s significance score surpasses the module significance threshold, the merged module is retained, and the original pair of modules is removed. This process is repeated iteratively, prioritizing the merger of highly significant modules first, until no further mergers are possible. This merger step is useful to reduce the number of modules detected and combine modules of similar composition (high overlap in constituent genes) into one. This step is optional.
The module expansion process allows closely related modules to be merged while reducing fragmentation and improving interpretation of the modules. The default value of 0.667 denotes that two-thirds of a smaller module’s genes are part of another module. This parameter remains adjustable by users to allow for varying levels of module fragmentation. The module discovery process (including module expansion) is conducted independently for modules derived from each of the five matrices MI, Mperipheral, Mcentral, Mradial, Mpunctate that represent different kinds of subcellular patterns.
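The two expansion steps can be sketched as follows. This is a simplified illustration, not the CellSP implementation: the data layout (nested dictionaries and sets) and function names are assumptions, the overlap coefficient is taken to be |A ∩ B| / min(|A|, |B|), and the re-check of the merged module's significance score is omitted.

```python
def overlap_coefficient(a, b):
    """Assumed definition: shared genes divided by the size of the smaller gene set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def expand_cells(matrix, module_cells, module_genes):
    """Add every cell whose mean over the module genes exceeds the module mean.

    matrix: dict cell -> dict gene -> pattern score (hypothetical layout).
    """
    def cell_mean(cell):
        return sum(matrix[cell][g] for g in module_genes) / len(module_genes)

    # Every cell contributes the same number of genes, so the submatrix mean
    # equals the mean of the per-cell means.
    module_mean = sum(cell_mean(c) for c in module_cells) / len(module_cells)
    added = {c for c in matrix if c not in module_cells and cell_mean(c) > module_mean}
    return set(module_cells) | added

def merge_candidate(mod_a, mod_b, threshold=0.667):
    """Return the union module if the gene overlap exceeds the threshold, else None.

    The caller would still need to re-score the union before accepting it.
    """
    if overlap_coefficient(mod_a["genes"], mod_b["genes"]) > threshold:
        return {"genes": set(mod_a["genes"]) | set(mod_b["genes"]),
                "cells": set(mod_a["cells"]) | set(mod_b["cells"])}
    return None
```

Note that with a strict "exceeds" comparison, an overlap of exactly two-thirds (0.6666…) falls just below the 0.667 default and would not trigger a merger in this sketch.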
In the final reported list of gene-cell modules, two modules may comprise overlapping sets of cells and genes, allowing the same cells and genes to be part of multiple modules (Fig. 1). Modules detected from the SPRAWL-derived matrices Mperipheral, Mcentral, Mradial, Mpunctate are reported as one group (their module identifiers have the suffix “_S” and a unique numeric prefix), while modules detected from the InSTAnT-derived matrix MI are reported as another group (identifiers with suffix “_I” and a unique numeric prefix).
Module Characterization
To investigate factors (other than cell types) that underlie or are associated with subcellular spatial phenomena, CellSP performs a two-level characterization of the detected modules using gene set enrichment analysis and predictive modeling.
Module Gene Characterization
The biological functions associated with module genes are characterized using PantherDB74 for gene enrichment analysis. For each module, the module genes are used as the query gene set, while the gene panel of the ST assay serves as the background set. This analysis identifies pathways, biological processes, and molecular functions enriched in module genes.
Module cell characterization
To characterize the cells in each module, CellSP trains a classifier to distinguish between module cells (positive set) and all other cells (negative set). Each cell’s gene expression profile (total transcript count of each gene in that cell) is used as the cell’s feature vector, i.e., this is not a spatially resolved featurization. Since an ST assay of single-molecule resolution typically profiles a limited number of genes, the number of features is relatively modest. To address this, CellSP uses the Tangram tool75 to impute the expression of additional genes in each cell of the ST data using scRNA-seq data of the same tissue, if available. The imputation process expands the gene set significantly, increasing the number of features from a few hundred (in the original ST data) to several thousand (as available in scRNA-seq data). We construct an extended gene expression matrix by retaining the original expression values for genes present in the ST panel and incorporating the imputed expression values for genes absent from the panel. The extended gene expression matrix is used to train a Random Forest classifier of module cells. The top 20 most informative genes are identified based on their feature importance, which is quantified using SHAP76,77. These genes, along with any other genes that exhibit a high Pearson correlation (r > 0.98) with them, are designated as “marker genes” of the module cells. These genes are then subjected to gene set characterization using PantherDB to identify pathways, biological processes, and molecular functions that characterize the module’s cells.
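The final marker-gene selection step can be sketched in isolation. The example assumes that per-gene importance scores (for instance, mean absolute SHAP values from the trained classifier) have already been computed; the classifier training and SHAP computation are not shown, and the function names and input layout are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def marker_genes(importance, expression, top_k=20, r_min=0.98):
    """Top-k genes by importance, plus genes highly correlated with any of them.

    importance: dict gene -> importance score (assumed precomputed)
    expression: dict gene -> list of per-cell expression values
    """
    top = sorted(importance, key=importance.get, reverse=True)[:top_k]
    markers = set(top)
    for gene, values in expression.items():
        if gene in markers:
            continue
        # A gene joins the marker set if it tracks any top gene almost perfectly.
        if any(pearson(values, expression[t]) > r_min for t in top):
            markers.add(gene)
    return markers
```

Including near-perfectly correlated genes guards against a classifier arbitrarily preferring one of several interchangeable features, which would otherwise hide genes relevant to the downstream enrichment analysis.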
Module pattern visualization
To help visualize modules defined by the five types of subcellular spatial patterns (four types identified by SPRAWL and the colocalization pattern identified by InSTAnT), we developed three complementary plotting techniques.
SPRAWL detects localization patterns (peripheral, central, radial or punctate) for each gene in each cell. To aggregate these spatial localization patterns across many cells, CellSP transforms each cell into a uniform representation within a unit circle. This transformation is achieved using the smallest enclosing circle algorithm to identify the smallest circle enclosing all transcripts in the cell. The circle is then centered at the origin and scaled to have a unit radius. For “central” and “peripheral” patterns, the unit circle is divided into C (default value of 5) concentric rings, and the proportion of module gene transcripts within each ring is calculated. The transcript abundance in a ring is then averaged across all cells of the module, and the ring is displayed with color shade depicting the average abundance. A similar visualization is constructed for all non-module cells and the two plots are placed side-by-side. For modules displaying “central” patterns, we expect higher abundance in the innermost rings, while “peripheral” patterns are anticipated to exhibit higher abundance in the outermost rings.
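The ring-based aggregation can be sketched as below. The sketch assumes that the smallest enclosing circle (center and radius) of each cell has already been computed, and that rings have equal radial width; the latter is an assumption of this example, as is every function name.

```python
import math

def ring_proportions(points, center, radius, n_rings=5):
    """Fraction of transcripts falling in each concentric ring of one cell.

    points: (x, y) coordinates of module-gene transcripts in the cell
    center, radius: enclosing circle of the cell (assumed precomputed)
    """
    counts = [0] * n_rings
    for x, y in points:
        # Normalized distance from the center: 0 at the center, 1 on the circle.
        r = math.hypot(x - center[0], y - center[1]) / radius
        idx = min(int(r * n_rings), n_rings - 1)  # r == 1 goes to the outermost ring
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_rings

def average_over_cells(per_cell):
    """Average the per-ring proportions across cells (ring by ring)."""
    n = len(per_cell)
    return [sum(col) / n for col in zip(*per_cell)]
```

With equal-width rings the outer rings cover more area than the inner ones, so a uniform transcript distribution would not produce equal proportions; equal-area rings would be an alternative design choice.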
For modules defined by “punctate” or “radial” patterns, the unit circle representing an idealized view of a cell is divided into S (defaults to 10) sectors. Gene transcript density is computed in each sector for module genes. To account for the directional variability of these patterns, each cell’s idealized circle view is rotated such that the sector with the maximum density is aligned with 0°. Following this alignment, densities are aggregated across cells by calculating mean densities for each sector. This results in a density distribution across sectors, with the highest density in the first sector by construction. This calculation is then repeated for non-module genes, and the density distributions (across sectors) for module genes and non-module genes are plotted overlaid on each other, in different colors. In this framework, module genes are expected to concentrate primarily in the first sector, while non-module genes are anticipated to exhibit more uniform distributions across sectors, despite having the highest density in the first sector.
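A minimal sketch of the sector alignment is given below. It approximates "density" by the fraction of a cell's transcripts in each sector, which is an assumption of this example; function names are hypothetical, and ties for the densest sector are broken by taking the first one.

```python
import math

def sector_counts(points, center, n_sectors=10):
    """Count transcripts per angular sector around the cell center."""
    counts = [0] * n_sectors
    width = 2 * math.pi / n_sectors
    for x, y in points:
        # Angle in [0, 2*pi), measured counter-clockwise from the positive x-axis.
        angle = math.atan2(y - center[1], x - center[0]) % (2 * math.pi)
        counts[min(int(angle / width), n_sectors - 1)] += 1
    return counts

def align_to_peak(counts):
    """Rotate the sector list so that the densest sector comes first."""
    peak = counts.index(max(counts))
    return counts[peak:] + counts[:peak]

def mean_aligned_density(cells_counts):
    """Align each cell to its densest sector, normalize, and average across cells."""
    aligned = []
    for counts in cells_counts:
        total = sum(counts)
        if total == 0:
            continue  # cells without transcripts of the gene set are skipped
        aligned.append([c / total for c in align_to_peak(counts)])
    n = len(aligned)
    return [sum(col) / n for col in zip(*aligned)] if n else []
```

Because the rotation is applied to module and non-module genes alike, both curves peak in the first sector; the comparison of interest is how sharply each curve falls off afterwards.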
InSTAnT detects colocalized gene pairs as those whose transcripts are in close proximity (within distance d) of each other. To visualize such patterns aggregated over cells, CellSP first constructs a d-radius neighborhood graph of transcripts for each module cell (where d equals the distance threshold parameter used in InSTAnT runs), calculates the number of neighboring transcript pairs for each pair of module genes and averages these counts across all module cells, thus obtaining a “proximity score” for that gene pair aggregating information across module cells. This process is then repeated over non-module cells to obtain a proximity score of the same gene pair, but now aggregating information from non-module cells. The proximity enrichment score of the gene pair is then defined as the log ratio of proximity scores from module cells and non-module cells. High positive values of the proximity enrichment score indicate that the gene pair exhibits a greater degree of colocalization in module cells compared to non-module cells. To add further contrast, a randomly selected set of non-module (“control”) genes, equal in count to the module genes, is included in the visualization: a heatmap is constructed whose rows and columns represent module genes (top half of rows and left half of columns) and the selected control genes (bottom half of rows and right half of columns), and values depict proximity enrichment scores of gene pairs. The upper-left quadrant corresponds to pairs of module genes, the lower-right quadrant to pairs of control genes, and the remaining two quadrants correspond to a module gene paired with a non-module gene.
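The proximity enrichment score can be illustrated with a brute-force sketch. Instead of building a neighborhood graph, it simply enumerates transcript pairs within distance d, which is equivalent for small cells. The base-2 logarithm and the pseudocount (added to avoid division by zero) are assumptions of this example, as are the function names; the two genes are assumed to be distinct.

```python
import math

def neighbor_pair_count(transcripts, gene_a, gene_b, d):
    """Number of (gene_a, gene_b) transcript pairs within distance d in one cell.

    transcripts: list of (gene, x, y) tuples for the cell.
    """
    pts_a = [(x, y) for g, x, y in transcripts if g == gene_a]
    pts_b = [(x, y) for g, x, y in transcripts if g == gene_b]
    return sum(1 for ax, ay in pts_a for bx, by in pts_b
               if math.hypot(ax - bx, ay - by) <= d)

def proximity_score(cells, gene_a, gene_b, d):
    """Average neighbor-pair count of the gene pair across a set of cells."""
    return sum(neighbor_pair_count(c, gene_a, gene_b, d) for c in cells) / len(cells)

def proximity_enrichment(module_cells, other_cells, gene_a, gene_b, d, pseudocount=1.0):
    """Log ratio of proximity scores in module cells versus comparison cells."""
    s_mod = proximity_score(module_cells, gene_a, gene_b, d)
    s_other = proximity_score(other_cells, gene_a, gene_b, d)
    return math.log2((s_mod + pseudocount) / (s_other + pseudocount))
```

Evaluating this function for every pair drawn from the module genes and the control genes would fill the four quadrants of the heatmap described above.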
Gene set enrichment visualization
CellSP provides the ability to visualize the gene set enrichment reports generated by PantherDB using the Revigo tool24. Revigo summarizes lists of GO terms by clustering them based on their semantic similarity, identifying representative terms. These terms are visualized as circles in a scatterplot, where the circle size reflects the gene set size and the color intensity indicates statistical significance, with darker colors representing higher significance. This visualization highlights the importance, similarity, and uniqueness of terms, making it easier to interpret enrichment results.
We used a special type of Revigo plots to aggregate gene set enrichment reports across multiple detected modules (Fig. 4d,e; this is not a standard functionality of CellSP). We first collected all GO terms with a module association at a p-value below 0.01, across the modules, along with their p-values. This aggregated list of significant terms was then processed through Revigo, which calculated the principal component analysis (PCA) values for each term. Using these PCA coordinates, we generated a 2D histogram to visualize the aggregated GO terms across modules. The PCA values determined the positions of the bins in the plot, and the number of modules each term was significantly associated with was represented by the color intensity of the corresponding bin. This visualization highlights the most commonly enriched biological themes across modules while preserving the semantic relationships among GO terms.
Cross-condition module comparison
Human kidney dataset
To compare the detected modules between conditions in the Human Kidney dataset, we assessed whether the subcellular patterns of modules from the cancer condition were present in the control tissue. We first performed min-max normalization of gene expression values for the control and cancer datasets independently. For a gene-cell module detected in the cancer dataset, we calculated the total expression of module genes in each module cell and expressed the minimum and maximum of these totals as percentiles, Pmin and Pmax, of total module-gene expression in the population of all cells in the cancer dataset. Using these percentiles, we identified cells from the control dataset whose total expression of module genes lies between the Pmin-th and Pmax-th percentiles of the population of control cells. This yields a subset of control cells whose total expression is similar to that of the module cells (which belong to the cancer dataset). Subcellular patterns of module genes were then examined in these control cells and compared to the patterns in the module cells from the cancer dataset.
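A minimal sketch of this percentile-matching step is given below. It assumes dense cells-by-genes expression matrices and applies min-max normalization per gene, which is one possible reading of the text. All function and argument names are hypothetical.

```python
import numpy as np

def minmax(x):
    """Min-max normalize each gene (column) to [0, 1]; constant genes map to 0."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

def percentile_rank(values, population):
    """Percentage of the population that is <= each value."""
    pop = np.sort(population)
    return 100.0 * np.searchsorted(pop, values, side="right") / len(pop)

def matched_control_cells(cancer_expr, control_expr, module_gene_idx, module_cell_idx):
    """Indices of control cells whose total module-gene expression lies between
    the Pmin-th and Pmax-th percentiles of the control population."""
    cancer_tot = minmax(cancer_expr)[:, module_gene_idx].sum(axis=1)
    control_tot = minmax(control_expr)[:, module_gene_idx].sum(axis=1)
    # Percentile ranks of the module cells within the whole cancer population.
    ranks = percentile_rank(cancer_tot[module_cell_idx], cancer_tot)
    p_min, p_max = ranks.min(), ranks.max()
    # Translate those percentiles into expression bounds in the control population.
    lo, hi = np.percentile(control_tot, [p_min, p_max])
    return np.flatnonzero((control_tot >= lo) & (control_tot <= hi))
```

Matching on percentiles rather than raw totals keeps the comparison meaningful when the two tissues differ in overall expression scale.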
Mouse Alzheimer’s disease dataset
To identify changes in functionally similar (myelination-related) modules across conditions (AD model mice and WT mice), we asked whether two sets of modules (one set from each condition) differ in their gene composition. One way to answer this is based on how many modules in each condition include a specific gene, and whether the gene’s frequency of inclusion is different between conditions. This approach was used for Fig. 5e. A complementary approach is based on counting how many modules in each condition include a specific gene pair and comparing the proportion of such modules between conditions using a one-sided proportion test78. The test statistic for the proportion test is given by

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $\hat{p}_1$ is the proportion for group 1, $\hat{p}_2$ is the proportion for group 2, $\hat{p}$ is the pooled proportion, and $n_1$, $n_2$ are the sample sizes for groups 1 and 2, respectively.
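As a worked illustration, the pooled two-proportion z statistic and its one-sided p-value can be computed with the standard library alone. This is a hand-written sketch rather than the statsmodels routine cited above. The function name is hypothetical, and the direction of the alternative (group 1 proportion larger than group 2) is an assumption.

```python
import math

def one_sided_proportion_test(count1, n1, count2, n2):
    """Pooled two-proportion z-test; alternative hypothesis: p1 > p2 (assumed).

    count1, count2: number of modules containing the gene (or gene pair).
    n1, n2: number of modules considered in each condition.
    Returns (z, p_value).
    """
    p1, p2 = count1 / n1, count2 / n2
    pooled = (count1 + count2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        # Degenerate case: all modules agree, so there is no evidence either way.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value
```

For example, a gene pair present in 30 of 50 modules in one condition and 10 of 50 in the other gives a pooled proportion of 0.4 and z ≈ 4.08.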
Comparison with traditional analysis methods
To compare insights generated by our approach against traditional methods, we performed differential expression analysis and co-expression network analysis on the datasets. For differential expression analysis, we utilized Scanpy79 to apply the Wilcoxon rank-sum test, comparing gene expression between the two conditions under investigation. This provided insights into genes with significantly altered expression levels across conditions.
For co-expression network analysis, we employed WGCNA21 using the PyWGCNA framework80. Similar to our approach using CellSP, WGCNA was run independently for each cell cluster in the dataset. WGCNA assigns a module identity (class label) to each gene within a cluster based on its co-expression patterns. We used these module identities to calculate the frequency with which a gene pair is assigned to the same WGCNA module across the dataset. These frequencies were then compared between the conditions using a one-sided proportion test to assess condition-specific co-expression relationships.
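The co-assignment frequencies used in this comparison can be sketched as below. The input layout (one gene-to-module-label dictionary per cell cluster) and the function name are assumptions made for illustration and do not reflect the PyWGCNA output format.

```python
from collections import Counter
from itertools import combinations

def coassignment_counts(cluster_labels):
    """Count, for each gene pair, the clusters in which both genes share a module.

    cluster_labels: list of dicts, one per cell cluster, mapping
    gene -> WGCNA module label (hypothetical layout).
    """
    counts = Counter()
    for labels in cluster_labels:
        # Group genes by the module label they received in this cluster.
        by_module = {}
        for gene, module in labels.items():
            by_module.setdefault(module, []).append(gene)
        # Every pair of genes within the same module is co-assigned once.
        for genes in by_module.values():
            for pair in combinations(sorted(genes), 2):
                counts[pair] += 1
    return counts
```

Dividing each count by the number of clusters analyzed in a condition gives the proportion that enters the one-sided proportion test.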
CellSP user guide
CellSP provides flexible and tunable parameters that allow users to adapt analyses to their specific datasets and objectives. For InSTAnT, the distance threshold (d) is typically set to approximately 5% of the average cell diameter. The significance threshold (α) for the CPB p-values is set to 1e−5 by default. If this results in too few significant gene pairs (<250), we recommend relaxing the threshold to 1e−3 or lower. To manage computational complexity, an additional parameter, K, allows users to select the top K gene pairs for further analysis. For SPRAWL, we adapted the original scripts into CellSP to enable parallelization and use the default parameters. Similarly, LAS scripts from the implementation available in biclustlib81 were parallelized and integrated into CellSP. LAS includes two primary adjustable parameters: N and RS. The parameter N determines the number of modules to search and is set to a default value of ‘auto’, which uses an adaptive thresholding strategy based on an empirical null distribution of LAS scores. Specifically, we shuffle the score matrix obtained for each pattern multiple times and use the average of the top bicluster scores from these permutations as a threshold; only biclusters exceeding this threshold are retained. This data-driven approach allows CellSP to automatically determine the number of meaningful modules. Alternatively, users may manually set N, and we recommend choosing a value between 5 and 50 for most applications. The parameter RS specifies the number of randomized searches performed to enhance the quality of the biclusters. By default, RS is set to 50,000; however, for larger datasets (e.g., >200 genes or >10,000 cells), we recommend reducing this value to balance computational demands while maintaining reasonable bicluster quality.
Visualization in CellSP is also highly customizable. Users can fine-tune the representation of subcellular patterns using parameters such as the number of concentric circles (C), the number of sectors (S), and the distance threshold (d). These settings provide flexibility in highlighting key spatial patterns. Details for the specific parameters used in each analysis can be found in Supplementary Data 11. On a dataset containing 10,000 cells and 248 genes, CellSP completes its analysis in approximately 3 hours. Additional runtime details can be found in Supplementary Note 3 and Supplementary Data 12. CellSP requires input in the AnnData format and includes usage documentation on GitHub.
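The adaptive choice of N (the ‘auto’ setting of LAS) can be illustrated with a small sketch. It is not the CellSP or biclustlib code: `best_block_mean` is a hypothetical greedy stand-in for the LAS search, and the number of shuffles and the choice to permute all entries of the score matrix are assumptions.

```python
import numpy as np

def best_block_mean(scores, k=5):
    """Hypothetical stand-in for a bicluster search (NOT the LAS algorithm).

    Greedily picks the k rows with the largest sums, then the k columns with
    the largest sums within those rows, and returns the mean of that block.
    """
    rows = np.argsort(scores.sum(axis=1))[-k:]
    cols = np.argsort(scores[rows].sum(axis=0))[-k:]
    return float(scores[np.ix_(rows, cols)].mean())

def shuffled_threshold(scores, top_score_fn=best_block_mean, n_shuffles=20, seed=0):
    """Average top bicluster score over shuffled copies of the score matrix."""
    rng = np.random.default_rng(seed)
    null_scores = []
    for _ in range(n_shuffles):
        # Permute all entries to destroy any gene-cell structure (assumed scheme).
        shuffled = rng.permutation(scores.ravel()).reshape(scores.shape)
        null_scores.append(top_score_fn(shuffled))
    return float(np.mean(null_scores))

def keep_biclusters(bicluster_scores, threshold):
    """Retain only the biclusters whose score exceeds the empirical threshold."""
    return [s for s in bicluster_scores if s > threshold]
```

Under this scheme the number of retained modules follows from the data, so the user does not need to set N in advance.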
Statistics and reproducibility
CellSP performs gene set enrichment analysis for module characterization using the PantherDB API. Statistical comparisons of module composition were conducted using one-sided proportion tests. Specifically, to assess whether the detection of a given gene Gi differs between samples S1 and S2, the test compares the proportion of modules in S1 that include Gi to the corresponding proportion in S2. The sample size for this analysis was defined as the total number of modules identified in S1 and S2. For gene co-occurrence analysis, where the relationship between Gj and Gi across the two samples was evaluated, the sample size was defined as the number of modules containing Gi in S1 and S2. Lastly, variability between CellSP runs arises from the number of random searches performed by the biclustering algorithm. Increasing the number of searches reduces stochastic variation but increases computational cost.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Acknowledgements
We thank Abhishek Ojha for his help in designing the statistical tests. This research utilized resources from PACE at the Georgia Institute of Technology, Atlanta, USA. Funding: This work was supported by the National Institutes of Health (R35GM131819 to S.S.) and Georgia Institute of Technology (Wallace H. Coulter Distinguished Faculty Chair: S.S.).
Author contributions
B.A. and S.S. jointly designed the study. B.A. developed the computational methods and performed the analyses. S.S. supervised the project. Both authors wrote the manuscript and contributed equally to this work.
Peer review
Peer review information
Communications Biology thanks Jingyang Qian, Zhiyuan Yuan, and Madhavi Tippani for their contribution to the peer review of this work. Primary Handling Editors: Kuangyu Yen and Aylin Bircan.
Data availability
The MERFISH dataset for the Mouse Preoptic Hypothalamus region26 was obtained through direct communication with Dr. Jeffrey Moffitt. After excluding cells labeled with ambiguous cell types, the dataset comprises 5,149 cells distributed across 9 distinct cell types, with a gene panel of 135 genes. The Xenium datasets are publicly available at 10x Genomics Datasets. We used the “Fresh Frozen Mouse Brain for Xenium Explorer Demo”41, “Human Kidney Preview Data”1,52, and the “Xenium In Situ Analysis of Alzheimer’s Disease Mouse Model Brain Coronal Sections from One Hemisphere Over a Time Course” 53 datasets for the mouse brain, human kidney, and Alzheimer’s disease experiments, respectively. The Mouse Brain dataset contains 162,033 cells grouped into 50 clusters based on the expression profiles of 248 genes. The Human Kidney dataset comprises 56,509 cells grouped into 19 clusters in the cancer tissue and 97,546 cells grouped into 21 clusters in the control tissue. The gene panel includes 377 genes. The Mouse Alzheimer’s dataset includes six tissue samples from two conditions—Wild Type and Alzheimer’s (TgCRND8 mouse model)—at three timepoints (2.5 months, 5.7 months, and 13+ months). The Alzheimer’s samples contain 53,908, 58,681, and 61,435 cells across the timepoints, while the Wild Type samples contain 58,230, 58,685, and 59,933 cells, with a gene panel of 347 genes. Source data for the figures has been provided in Supplementary Data 29.
Code availability
CellSP is open source and available at https://github.com/bhavaygg/CellSP.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
The online version contains supplementary material available at 10.1038/s42003-025-08891-2.
References
- 1. Maynard, K. R. et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 24, 425–436 (2021).
- 2. Watson, B. et al. Spatial transcriptomics of healthy and fibrotic human liver at single-cell resolution. bioRxiv 16, 319 (2024).
- 3. Cadinu, P. et al. Charting the cellular biogeography in colitis reveals fibroblast trajectories and coordinated spatial remodeling. Cell 187, 2010–2028 (2024).
- 4. Shi, H. et al. Spatial atlas of the mouse central nervous system at molecular resolution. Nature 622, 552–561 (2023).
- 5. Farah, E. N. et al. Spatially organized cellular communities form the developing human heart. Nature 627, 854–864 (2024).
- 6. Zhang, M. et al. Molecularly defined and spatially resolved cell atlas of the whole mouse brain. Nature 624, 343–354 (2023).
- 7. Holt, C. E. & Bullock, S. L. Subcellular mRNA localization in animal cells and why it matters. Science 326, 1212–1216 (2009).
- 8. Wang, D. et al. DM3Loc: multi-label mRNA subcellular localization prediction and analysis based on multi-head self-attention mechanism. Nucleic Acids Res. 49, e46–e46 (2021).
- 9. Schuster, T. et al. Quantitative determination of the spatial distribution of components in single cells with CellDetail. Nat. Commun. 15, 10250 (2024).
- 10. Barr, J., Yakovlev, K. V., Shidlovskii, Y. & Schedl, P. Establishing and maintaining cell polarity with mRNA localization in Drosophila. BioEssays: N. Rev. Mol., Cell. Dev. Biol. 38, 244–253 (2016).
- 11. Dermit, M. et al. Subcellular mRNA localization regulates ribosome biogenesis in migrating cells. Dev. Cell 55, 298–313.e10 (2020).
- 12. Parton, R. M., Davidson, A., Davis, I. & Weil, T. T. Subcellular mRNA localisation at a glance. J. Cell Sci. 127, 2127–2133 (2014).
- 13. Mah, C. K. et al. Bento: a toolkit for subcellular analysis of spatial transcriptomics data. Genome Biol. 25, 82 (2024).
- 14. Chen, K. H., Boettiger, A. N., Moffitt, J. R., Wang, S. & Zhuang, X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science 348, aaa6090 (2015).
- 15. Eng, C.-H. L. et al. Transcriptome-scale super-resolved imaging in tissues by RNA seqFISH+. Nature 568, 235–239 (2019).
- 16. Wang, X. et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science 361, 6400 (2018).
- 17. Dong, K. & Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat. Commun. 13, 1739 (2022).
- 18. Kumar, A. et al. Intracellular spatial transcriptomic analysis toolkit (InSTAnT). Nat. Commun. 15, 7794 (2024).
- 19. Bierman, R., Dave, J. M., Greif, D. M. & Salzman, J. Statistical analysis supports pervasive RNA subcellular localization and alternative 3’UTR regulation. eLife 12, RP87517 (2024).
- 20. Walter, F. C., Stegle, O. & Velten, B. FISHFactor: a probabilistic factor model for spatial transcriptomics data with subcellular resolution. Bioinformatics 39, btad183 (2023).
- 21. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 9, 1–13 (2008).
- 22. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102, 15545–15550 (2005).
- 23. Shabalin, A. A., Weigman, V. J., Perou, C. M. & Nobel, A. B. Finding large average submatrices in high dimensional data. Ann. Appl. Stat., 985–1012 (2009).
- 24. Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO summarizes and visualizes long lists of gene ontology terms. PLoS One 6, e21800 (2011).
- 25. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. J. Open Source Softw. 3, 861 (2018).
- 26. Moffitt, J. R. et al. Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science 362, eaau5324 (2018).
- 27. Wang, H., Tewari, A., Einheber, S., Salzer, J. L. & Melendez-Vasquez, C. V. Myosin II has distinct functions in PNS and CNS myelin sheath formation. J. Cell Biol. 182, 1171–1184 (2008).
- 28. Inouye, H. et al. Myelin organization in the nodal, paranodal, and juxtaparanodal regions revealed by scanning x-ray microdiffraction. PLoS One 9, e100592 (2014).
- 29. Laursen, L. S., Chan, C. W. & ffrench-Constant, C. Translation of myelin basic protein mRNA in oligodendrocytes is regulated by integrin activation and hnRNP-K. J. Cell Biol. 192, 797–811 (2011).
- 30. Gould, R. & Brady, S. Identifying mRNAs residing in myelinating oligodendrocyte processes as a basis for understanding internode autonomy. Life 13 (2023).
- 31. Altevogt, B. M., Kleopa, K. A., Postma, F. R., Scherer, S. S. & Paul, D. L. Connexin29 is uniquely distributed within myelinating glial cells of the central and peripheral nervous systems. J. Neurosci. 22, 6458–6470 (2002).
- 32. Koncina, E., Roth, L., Gonthier, B. & Bagnard, D. Role of semaphorins during axon growth and guidance. Axon Growth Guid. 621, 50–64 (2007).
- 33. Liu, C. et al. Regulator of G protein signaling 5 (RGS5) inhibits sonic hedgehog function in mouse cortical neurons. Mol. Cell. Neurosci. 83, 65–73 (2017).
- 34. Tonge, D. A. et al. Fibronectin supports neurite outgrowth and axonal regeneration of adult brain neurons in vitro. Brain Res. 1453, 8–16 (2012).
- 35. Petukhov, V. et al. Cell segmentation in imaging-based spatial transcriptomics. Nat. Biotechnol. 40, 345–354 (2021).
- 36. Thul, P. J. et al. A subcellular map of the human proteome. Science 356, 6340 (2017).
- 37. The Human Protein Atlas, RGS5.
- 38. Wang, J., Yin, L. & Chen, Z. Neuroprotective role of fibronectin in neuron-glial extrasynaptic transmission. Neural Regen. Res. 8, 376–382 (2013).
- 39. Togashi, H., Sakisaka, T. & Takai, Y. Cell adhesion molecules in the central nervous system. Cell Adhes. Migr. 3, 29–35 (2009).
- 40. Biran, J., Tahor, M., Wircer, E. & Levkowitz, G. Role of developmental factors in hypothalamic function. Front. Neuroanat. 9, 47 (2015).
- 41. 10x Genomics. Fresh Frozen Mouse Brain Replicates. In Situ Gene Expression dataset analyzed using Xenium Onboard Analysis 1.0.2 (2023).
- 42. Abrams, C. K. Diseases of connexins expressed in myelinating glia. Neurosci. Lett. 695, 91–99 (2019).
- 43. Turnescu, T. et al. Sox8 and Sox10 jointly maintain myelin gene expression in oligodendrocytes. Glia 66, 279–294 (2018).
- 44. Tang, R., Vargas-Medrano, J., Ramos, E., Thompson, P. & Gadad, B. Gene Expression Analysis of CCL2, MOBP and OPALIN in Major Depressive Disorder and Suicidality (P5-6.005). Neurology 98, 3266 (2022).
- 45. Hornig, J. et al. The transcription factors Sox10 and Myrf define an essential regulatory network module in differentiating oligodendrocytes. PLoS Genet. 9, e1003907 (2013).
- 46. de Faria, O. Jr et al. TMEM10 promotes oligodendrocyte differentiation and is expressed by oligodendrocytes in human remyelinating multiple sclerosis plaques. Sci. Rep. 9, 3606 (2019).
- 47. George, S., Hamblin, M. R. & Abrahamse, H. Differentiation of mesenchymal stem cells to neuroglia: in the context of cell signalling. Stem Cell Rev. Rep. 15, 814–826 (2019).
- 48. Dicken, M. S., Hughes, A. R. & Hentges, S. T. Gad1 mRNA as a reliable indicator of altered GABA release from orexigenic neurons in the hypothalamus. Eur. J. Neurosci. 42, 2644–2653 (2015).
- 49. Pan, Z. Z. Transcriptional control of Gad2. Transcription 3, 68–72 (2012).
- 50. Schlüter, O. M., Basu, J., Südhof, T. C. & Rosenmund, C. Rab3 superprimes synaptic vesicles for release: implications for short-term synaptic plasticity. J. Neurosci. 26, 1239–1246 (2006).
- 51. Tsetsenis, T. et al. Rab3B protein is required for long-term depression of hippocampal inhibitory synapses and for normal reversal learning. Proc. Natl. Acad. Sci. 108, 14300–14305 (2011).
- 52. 10x Genomics. Human Kidney Preview Data (Xenium Human Multi-Tissue and Cancer Panel). In Situ Gene Expression dataset analyzed using Xenium Onboard Analysis 1.5.0 (2023).
- 53. 10x Genomics. Xenium In Situ Analysis of Alzheimer’s Disease Mouse Model Brain Coronal Sections from One Hemisphere Over a Time Course. In Situ Gene Expression dataset analyzed using Xenium Onboard Analysis 1.4.0 (2023).
- 54. Novikova, G. et al. Integration of Alzheimer’s disease genetics and myeloid genomics identifies disease risk regulatory elements and genes. Nat. Commun. 12, 1610 (2021).
- 55. Baig, S. et al. Distribution and expression of picalm in Alzheimer disease. J. Neuropathol. Exp. Neurol. 69, 1071–1077 (2010).
- 56. Xu, W., Tan, L. & Yu, J.-T. The Role of PICALM in Alzheimer’s disease. Mol. Neurobiol. 52, 399–413 (2015).
- 57. Xu, W. et al. The impact of PICALM genetic variations on reserve capacity of posterior cingulate in AD continuum. Sci. Rep. 6, 24480 (2016).
- 58. 10x Genomics. Exploring Alzheimer’s-like pathology at subcellular resolution using Xenium In Situ.
- 59. Yang, J. et al. TREM2 ectodomain and its soluble form in Alzheimer’s disease. J. Neuroinflamm. 17, 204 (2020).
- 60. Lue, L.-F. et al. TREM2 Protein Expression changes correlate with Alzheimer’s disease neurodegenerative pathologies in post-mortem temporal cortices. Brain Pathol. 25, 469–480 (2015).
- 61. Qin, Q. et al. TREM2, microglia, and Alzheimer’s disease. Mech. Ageing Dev. 195, 111438 (2021).
- 62. Papuć, E. & Rejdak, K. The role of myelin damage in Alzheimer’s disease pathology. Arch. Med. Sci. 16, 345–351 (2020).
- 63. Ruskamo, S. et al. Human myelin proteolipid protein structure and lipid bilayer stacking. Cell. Mol. Life Sci. 79, 419 (2022).
- 64. Montague, P., McCallion, A. S., Davies, R. W. & Griffiths, I. R. Myelin-associated oligodendrocytic basic protein: a family of abundant CNS myelin proteins in search of a function. Dev. Neurosci. 28, 479–487 (2006).
- 65. Gravel, M., Trapp, B., Peterson, J. & Braun, P. E. CNP in myelination: Overexpression alters oligodendrocyte morphogenesis. In Cell Biology and Pathology of Myelin: Evolving Biological Concepts and Therapeutic Approaches, 75–82 (1997).
- 66. Sturrock, R. R. Myelination of the mouse corpus callosum. Neuropathol. Appl. Neurobiol. 6, 415–420 (1980).
- 67. Ota, M. et al. Changes of myelin organization in patients with Alzheimer’s disease shown by q-space myelin map imaging. Dement. Geriatr. Cogn. Disord. Extra 9, 24–33 (2019).
- 68. Stickels, R. R. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat. Biotechnol. 39, 313–319 (2021).
- 69. Zeng, H. et al. Integrative in situ mapping of single-cell transcriptional states and tissue histopathology in a mouse model of Alzheimer’s disease. Nat. Neurosci. 26, 430–446 (2023).
- 70. Kim, Y. et al. Seq-Scope: repurposing Illumina sequencing flow cells for high-resolution spatial transcriptomics. Nat. Protocols, 1–47 (2024).
- 71. Santangelo, P. J., Nitin, N. & Bao, G. Direct visualization of mRNA colocalization with mitochondria in living cells using molecular beacons. J. Biomed. Opt. 10, 044025–044025 (2005).
- 72. Das, S., Vera, M., Gandin, V., Singer, R. H. & Tutucci, E. Intracellular mRNA transport and localized translation. Nat. Rev. Mol. Cell Biol. 22, 483–504 (2021).
- 73. Dryden, I. L. & Mardia, K. V. Statistical Shape Analysis (Wiley, New York, NY, 1998).
- 74. Mi, H. et al. PANTHER version 16: a revised family classification, tree-based classification tool, enhancer regions and extensive API. Nucleic Acids Res. 49, D394–D403 (2021).
- 75. Biancalani, T. et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat. Methods 18, 1352–1362 (2021).
- 76. Lundberg, S. M. & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30 (eds Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S. & Garnett, R.) 4765–4774 (Curran Associates, Inc., 2017).
- 77. Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 2522–5839 (2020).
- 78. Seabold, S. & Perktold, J. statsmodels: Econometric and statistical modeling with Python. In 9th Python in Science Conference (2010).
- 79. Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 1–5 (2018).
- 80. Rezaie, R. N., Reese, F. & Mortazavi, A. PyWGCNA: a Python package for weighted gene co-expression network analysis. Bioinformatics 39, btad415 (2023).
- 81. Padilha, V. A. & Campello, R. J. G. B. A systematic comparative evaluation of biclustering techniques. BMC Bioinforma. 18, 1–25 (2017).