Abstract
Advances in spatially-resolved transcriptomics (SRT) technologies have propelled the development of new computational analysis methods to unlock biological insights. As the cost of generating these data decreases, these technologies provide an exciting opportunity to create large-scale atlases that integrate SRT data across multiple tissues, individuals, species, or phenotypes to perform population-level analyses. Here, we describe unique challenges of varying spatial resolutions in SRT data, as well as highlight the opportunities for standardized preprocessing methods along with computational algorithms amenable to atlas-scale datasets leading to improved sensitivity and reproducibility in the future.
Keywords: Spatially-resolved transcriptomics, multi-sample, integrative analysis, spatial registration, spatial alignment, population-level
Comprehensive atlases of molecular profiles with spatial resolution have the power to provide new insights into human health and disease, which can transform the future of medicine via improved diagnostics and targeted therapies [1,2]. Recent commercialization has led to broad accessibility and hence collection of substantial amounts of spatially-resolved transcriptomics (SRT) data, signifying a new era for spatial cellular atlases and charting the unknown territory of life science [3]. These technologies enable mapping of heterogeneous cell populations in situ to tissue architectures, equipping investigators to study the relationships between structure and biological activities [4]. Computational tools and analytic strategies that can fully exploit the atlas-scale SRT data and increase the power to detect small biological signals are critically needed [5–7]. However, integrating multiple tissues [8], developmental stages [9,10], species [11,12], or phenotypes [13] to perform population-level analyses faces new and unique challenges.
In contrast to single-sample analyses [14], analyzing an integrated set of SRT samples has the potential to identify spatially-dependent commonalities and differences at the population level across disease states or conditions such as Alzheimer’s disease [15,16], schizophrenia [17], and cancer [18,19]. Here, we discuss the computational challenges involved in analyzing an integrative spatial atlas across tissues and individuals, with a focus on the computational strategies currently available as well as future opportunities for development. We focus on the challenge of integrating SRT samples where observations are measured at different levels of spatial resolution due to inherent capabilities and limitations of the employed technologies. We illustrate that varying levels of resolution combined with differences in the features captured can lead to spurious findings in downstream analyses, such as dimensionality reduction. These problems are exacerbated by challenges also faced in bulk and single-cell/nucleus RNA-sequencing (sc/snRNA-seq) data, such as sparsity and noise [20]. Finally, we summarize the state-of-the-art methods for integrating multiple SRT samples to perform population-level analyses.
From bulk to single-cell and spatial resolution
As the number of SRT datasets has grown, integration has become commonplace, and the value of reliably identifying shared or distinctive cellular features across these datasets has been demonstrated. Unwanted variation across samples or datasets, which is ubiquitous across most sequencing modalities ranging from bulk to single-cell data [21,22] and is routinely referred to as batch effects, is an inevitable challenge faced during integration [20,23]. This undesired heterogeneity usually comes from artifacts such as differences in handling protocols, library preparation, and sequencing platforms. Therefore, correcting for batch effects is a major goal of integration. Examples of how to correct for batch effects in bulk RNA-seq include the use of statistical modeling to adjust for sample-level differences [24–26] along with the use of control genes [27].
In contrast to bulk RNA-seq, which measures gene expression in one sample that is averaged across measured cells, scRNA-seq measures gene expression across thousands to millions of cells and introduces more heterogeneity in the gene expression space. Therefore, as we moved from bulk to single-cell resolution, one type of integration strategy that was developed for scRNA-seq experiments was to identify groups of cells that share similar expression patterns across batches (called anchors). Broadly, these approaches use similarity-based methods in a reduced dimension space, such as mutual nearest neighbors (MNN) [28], Harmony [29], and canonical correlation analysis [30]. The key idea is that similar cells should follow a common distribution in the latent space regardless of batch. As an extension of dimension reduction methods, generative models help capture nonlinear characteristics of both batch effects and systematic biological signals, which can make the removal of technical artifacts more complete [31].
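The anchor idea behind mutual nearest neighbors can be sketched in a few lines. The example below is only an illustration under simple assumptions (Euclidean distance in a toy two-dimensional latent space, a hypothetical helper named `mutual_nearest_neighbors`); it is not the implementation of the MNN method cited in [28].

```python
import numpy as np

def mutual_nearest_neighbors(x, y, k=1):
    """Return (i, j) pairs where x[i] and y[j] are in each other's k nearest neighbors."""
    # Pairwise Euclidean distances between every cell in batch x and batch y.
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # For each x cell, the k closest y cells; for each y cell, the k closest x cells.
    nn_xy = np.argsort(d, axis=1)[:, :k]
    nn_yx = np.argsort(d, axis=0)[:k, :].T
    pairs = []
    for i in range(x.shape[0]):
        for j in nn_xy[i]:
            if i in nn_yx[j]:
                pairs.append((i, int(j)))
    return pairs

# Toy latent coordinates (hypothetical): batch 2 contains one cell from each
# of the two populations in batch 1, shifted by a constant batch offset.
batch1 = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
batch2 = batch1[[0, 2]] + np.array([0.5, 0.5])
anchors = mutual_nearest_neighbors(batch1, batch2, k=1)
```

Each returned pair links a cell in one batch to a cell in the other, and the displacement between paired cells gives a local estimate of the batch offset. The sketch assumes that every observation is a single cell, which is the assumption that SRT data can violate.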
However, a prominent feature of scRNA-seq data is that the measured observation, namely gene expression in one cell, is the same, in principle, across all observations measured in multiple scRNA-seq experiments. With SRT, the observations that we measure within a tissue may be the same, but the resolution of observations across multiple samples may not be the same (Figure 1). Therefore, while the integrative methods developed for bulk and scRNA-seq experiments demonstrate significant success when integrating bulk and single-cell data, it remains unclear how well these methods will work for SRT data due to intrinsic differences in experimental protocols and the biological context of the generated data. This motivates the use of alternative pieces of information, such as anatomical landmarks [32,33], to assist in the construction of population-level spatial atlases, but these are not always relevant, for example with cancer tissue.
Figure 1. Schematic of experimental platforms and cellular resolutions across transcriptomics technologies.
Considering three different experimental platforms, (i) bulk RNA-sequencing, (ii) single-cell/nucleus RNA-sequencing, and (iii) spatially-resolved transcriptomics, each of these can profile gene expression at different cellular resolutions, including cellular, near-cellular, and sub-cellular. Experimental platforms also differ in the units being measured, including observational units and biological units, where observational units describe the observations that we measure and biological units describe the cellular structure that the observational unit captures.
Inconsistent spatial and biological resolutions challenge cross-technology integration
‘Spatially-resolved transcriptomics’ [34] is used as an umbrella term for multiple distinct technologies that can measure spatial gene expression [35]. However, due to intrinsic differences in how they measure spatial gene expression, these data have unique computational and biological properties that make it challenging to use integration strategies developed for scRNA-seq data analysis.
For example, the units in which we measure observations, namely individual cells or groups of cells, referred to as observational units, vary substantially across the SRT platforms. In image-based, targeted, in situ transcriptomic profiling, such as MERFISH [36] or Xenium [37], gene expression is captured from a targeted subset of genes at molecule-level resolution, where the molecules are aggregated together to computationally infer the “cellular” observational unit using cell segmentation algorithms. In contrast, non-targeted RNA capture and sequencing profiling, such as Slide-seq [38] or Visium [39], captures RNA on an array-based platform at different resolutions, including “near-cellular” (such as 55 µm spots on the Visium platform) or “sub-cellular” (such as 2 µm grids on the VisiumHD platform [40]). Integration of data generated across technologies with different observational units requires special attention.
Unlike the concept of the observational unit, whose distinction across SRT technologies is well acknowledged [3], the heterogeneity in biological content being profiled across observations generated from the same SRT technology is often overlooked. Sc/snRNA-seq protocols often employ cell dissociation techniques to isolate individual cells or nuclei. When processed correctly, data observations have a uniform and biologically meaningful unit (referred to as the biological unit hereafter), a cell or nucleus, across samples and studies. However, because the profiling happens in situ, this property is often missing in SRT data. For example, with sequencing-based SRT technologies, the profiling within the observational unit is constrained by physical size, e.g. spots or grids, so the generated data observation frequently does not maintain a uniform biological content, and hence the biological unit of data observations varies widely. Specifically, the cellular structures being captured across observations could include both the soma of cells and the extracellular space between multiple cell bodies (Figure 1). Inconsistency in biological units in SRT datasets greatly challenges the fundamental assumption that many integration methods for sc/snRNA-seq data depend on, namely that each observation is an individual cell. This can lead to spurious results or biases in fundamental data preprocessing steps, such as data normalization [41] and quality control [42], which in turn propagate through downstream integration steps.
Even single-cell resolution image-based SRT technologies may suffer from inconsistency of biological units across data observations. Despite individual SRT tissue sections being conceptually treated as 2D objects, each tissue section has a 3D structure, meaning that the tissue section has some depth along the Z-axis. Depending on their orientation, it is possible for cells to be bisected during tissue sectioning. In this scenario, a cell will not maintain full integrity since only a portion of the cell structure is captured (Figure 1).
Moreover, many image-based SRT technologies require iterative imaging of small regions of a tissue section. This iterative imaging procedure creates cells that are located at the boundary of images and hence only partially profiled, resulting in variation in captured genes [41,43]. Although the degree of variation in biological units is smaller than sequencing-based SRT data, further research is necessary to understand the downstream impact of these confounders in data analysis and integration.
In addition to addressing the inconsistency in observational and biological units when integrating across SRT datasets, another significant challenge is to handle the divergent sets of assayed genes across platforms or studies. While targeted profiling technologies provide better spatial resolution, they are often limited by the number of genes profiled. Specifically, targeted panels focus on measuring pre-selected sets of genes that are often tissue- or disease-specific, occasionally with some additional genes that are customized to individual studies. Unlike transcriptome-wide sequencing technologies that capture all genes and hence have a consistent set of genes across studies, the divergent sets of assayed genes from targeted panels lead to missing gene features when integrating data collected from different studies. In addition, the mismatch in gene profiles caused by genes that are frequently missing from targeted technologies prevents the direct adoption of scRNA-seq methods to integrate data generated from targeted and non-targeted platforms.
Case study: cross-platform integration using cell type-based anchors
In the following section, we highlight a few examples of how the unique properties of these data generate computational challenges for integrating multiple SRT samples.
Normalization is a critical step in processing transcriptomics data to remove variation due to technical noise. Current normalization practices for SRT data, regardless of platform, are directly adopted from scRNA-seq pipelines. However, whether this practice is uniformly appropriate for the diverse types of SRT data remains unclear. A common practice is to normalize the expression of each gene according to the total number of transcripts detected, often referred to as library size normalization. Library size normalization is based on the assumption that the variation in library size across samples is due to technical reasons. However, due to the inconsistent biological unit across samples, library size could reflect variation attributed to differences in the underlying biology. Hence, library size normalization can overcorrect for technical variation and potentially reduce the biological signal. As a result, downstream tasks, such as spatial clustering to establish functional regions, are significantly impacted [44]. Moreover, Atta et al. recently demonstrated that applying library size normalization to targeted SRT data could result in false positive and false negative findings in differential expression testing and spatially variable gene detection [41]. Relatedly, many QC methods rely on descriptive metrics, such as library size and the total number of genes detected, which are not robust to the inconsistent biological unit unique to SRT data. Totty et al. [42] recently showed that scRNA-seq-inspired quality control methods could result in the differential removal of data observations across multiple biological functional regions in an undesirable way.
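The concern about library size normalization can be made concrete with a small numerical sketch. The counts below are hypothetical, and the function name `library_size_normalize` and the scaling factor of 10,000 are illustrative assumptions rather than the settings of any particular pipeline.

```python
import numpy as np

def library_size_normalize(counts, scale=1e4):
    """Divide each observation (row) by its total count, then rescale."""
    library_size = counts.sum(axis=1, keepdims=True)
    return counts / library_size * scale

# Rows are spots, columns are genes (hypothetical counts).
# Spot 0 sits in a cell-dense region; spot 1 covers mostly extracellular space,
# so its lower total reflects biology rather than a technical artifact.
counts = np.array([[100.0, 300.0, 600.0],
                   [10.0, 30.0, 60.0]])
normalized = library_size_normalize(counts)

raw_ratio = counts[0, 0] / counts[1, 0]            # tenfold difference before scaling
norm_ratio = normalized[0, 0] / normalized[1, 0]   # difference removed after scaling
```

In this toy case the tenfold difference in the first gene between the two spots disappears after scaling. That is the desired behavior if the difference in totals is technical, and an overcorrection if it reflects the amount of tissue each spot captures.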
Additionally, cell types, often used as anchors to harmonize multiple datasets in sc/snRNA-seq, can be substantially challenging to define properly from both a computational and a philosophical perspective. While the main intuition for cell type annotation is that the difference in gene signature between data observations, i.e. cells, is driven by the difference in cell types, the implicit assumption here is that the data observations are single cells. Nevertheless, for near-cellular and sub-cellular resolution SRT data, such an assumption is often violated. Mapping observations with different biological units to a common latent space can create dubious clusters that lack biological meaning and confound cell type-driven anchors for cross-study integration. For example, in near-cellular resolution platforms, each observation could contain a homogeneous or heterogeneous cell population, resulting in distinctions in biological units across observations beyond simply capturing different numbers of cells. This creates challenges in defining cell type-driven anchors and leads to extra cell type clusters, which, in fact, should be merged with existing clearly-defined cell types (Figure 2B). In another case, targeted platforms can miss important marker genes. Thus, when integrated with data generated from transcriptome-wide platforms that have a full spectrum of genes, the anchors cannot be accurately established such that some cell types cannot be successfully differentiated (Figure 2C). Analytically, SRT technologies provide an unprecedented opportunity to study the molecular mechanisms underpinning the heterogeneity in function across tissue regions. Many research questions of interest, investigated using SRT platforms, focus on heterogeneity in gene expression associated with functional regions instead of cell types, requiring a shift in thinking from cell-type-centric to tissue-centric [45].
As a result, integration tools and strategies that account for both gene expression space and physical space are needed to establish a common coordinate framework [33].
Figure 2. Schematic of cell-type driven integration in gene expression space across multiple SRT technologies.
(A) With accurate normalization removing technical variation, integration of image-based SRT follows single-cell practice. (B) Mapping observations that have different biological units to a common latent space results in dubious clusters that lack biological meanings and confound cell-type driven anchors for cross-study integration. (C) Integrating SRT datasets generated with different gene panels (targeted vs non-targeted) creates challenges to computationally define gene expression space where cell-type based anchors cannot be clearly defined due to missing marker genes in targeted platforms.
State-of-the-art methods
Broadly, methods developed for bulk or scRNA-seq are being widely applied to spatial data, despite the problems outlined above. However, new methods to integrate multiple samples for spatial transcriptomics data have recently been developed. In this section, we outline the modern methods specifically designed for spatial data and give recommendations to data analysts and users of these methods.
Integration in a physical space
The first category that we consider is to integrate multiple samples in a physical space. Within this category, we further distinguish approaches based on the type of data being integrated including (i) the alignment of two tissue slices from the same tissue block or from different tissue blocks, but both profiling the transcriptome in a 2D space and (ii) the registration of a set of dissociated single cells to one tissue slice profiling the transcriptome in a 2D space.
Early approaches to spatial alignment were computer-assisted, requiring human input such as manually defined anatomical landmarks, and computationally relied on affine transformations (e.g. using the iterative closest point algorithm [46]) of high-resolution images of samples, e.g. hematoxylin and eosin (H&E) or immunofluorescent images, to address rotations and shifts. Subsequently, various methods were developed to address possible nonlinear distortion, leveraging thin plate splines [47], Gaussian processes [48], and diffeomorphic metric mapping [49]. Because spatial alignment of tissue images normally requires some degree of manual labor, a significant challenge is how to scale it to atlas-scale datasets that contain hundreds of samples. Many approaches have been proposed to address this challenge, most of which model the entire gene expression profile while accounting for the global structure of the spatial unit arrangement, including a two-layer Gaussian process model [48], diffeomorphic metric mapping [49], optimal transport [50], and a graph adversarial matching strategy [51]. These methods seem to be well motivated for the alignment of (i) samples with partial matching, also referred to as spatially heterogeneous samples, (ii) samples to a reference or template, such as a predefined brain atlas [52,53], or (iii) samples across different SRT technologies, possibly with different phenotype readouts, such as gene expression and protein expression.
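As a minimal sketch of the landmark-based affine step described above, the example below fits a 2D affine map by least squares. It assumes that landmark correspondences between the two sections are already known (for instance, from manual annotation), so it covers only the fitting step and not the correspondence search of the iterative closest point algorithm [46]. The helper name `fit_affine` and the landmark coordinates are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine map (A, t) such that dst is close to src @ A.T + t."""
    # Append a column of ones so the translation is estimated jointly.
    design = np.hstack([src, np.ones((src.shape[0], 1))])
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    A = params[:2].T   # 2 x 2 linear part (rotation, scaling, shear)
    t = params[2]      # translation
    return A, t

# Hypothetical landmarks on section A, and the same landmarks on section B
# after a 30 degree rotation and a shift.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
shift = np.array([2.0, -1.0])
landmarks_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
landmarks_b = landmarks_a @ R.T + shift

A, t = fit_affine(landmarks_a, landmarks_b)
aligned = landmarks_a @ A.T + t   # section A landmarks in section B coordinates
```

The fitted map can then be applied to every spot or cell coordinate in the section. An affine map of this kind cannot represent the nonlinear distortions that motivate the spline-based, Gaussian process, and diffeomorphic approaches.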
In addition, the spatial registration of single cells to a 2D tissue section provides another avenue for spatial integration that mitigates analytic challenges due to morphological variation. Specifically, isolated single cells, or possibly spots, are computationally mapped to a spatial common coordinate system or spatial template based on their molecular signature. By mapping all cells isolated from a tissue (either experimentally through scRNA-seq or computationally with SRT data) to a reference or template tissue with uniform morphology, the tissue slices are aligned in the physical space with conformable shapes. Developed for spatial references assayed with low-throughput technologies, early methods used a set of pre-specified marker genes to anchor cells to a limited number of spatial positions, such as the tessellation of a 2D surface, in Gaussian mixture models [54] or Monte Carlo simulations [55]. To allow non-informative mapping without an external reference and to improve the spatial resolution of the mapping, advanced machine learning frameworks have been adapted, including multiple variations of optimal transport algorithms [56,57], convex optimization using the Jonker–Volgenant shortest augmenting path algorithm [58], and deep neural networks [59,60]. Some spatially-aware deconvolution methods can also be used for the spatial registration of cells [61].
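To illustrate how optimal transport can map dissociated cells to spatial positions, the sketch below runs generic Sinkhorn iterations for entropy-regularized transport. It is not the algorithm of any method cited in [56,57]; the uniform marginals, the squared Euclidean cost on marker expression, the regularization strength, and the toy reference profiles are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(cost, epsilon=0.05, n_iter=500):
    """Entropy-regularized optimal transport plan with uniform marginals."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)            # mass carried by each cell
    b = np.full(m, 1.0 / m)            # mass accepted by each spatial position
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (kernel @ v)           # rescale rows toward marginal a
        v = b / (kernel.T @ u)         # rescale columns toward marginal b
    return u[:, None] * kernel * v[None, :]

# Hypothetical reference: three spatial positions, each marked by one gene.
reference = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
true_position = np.array([0, 0, 1, 1, 2, 2])
rng = np.random.default_rng(0)
cells = reference[true_position] + 0.05 * rng.normal(size=(6, 3))

# Cost: squared distance between each cell and each position's profile.
cost = ((cells[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
cost /= cost.max()
plan = sinkhorn_plan(cost)
assigned = plan.argmax(axis=1)         # most likely position for each cell
```

Each row of `plan` distributes a cell's mass over positions, so the mapping is probabilistic rather than a hard assignment. Taking the row-wise maximum is one simple way to summarize it.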
While these methods are promising, it is important to be cautious in the interpretation because it is possible that spurious gene expression correlations are found due to known problems such as ‘double dipping’ [62]. Given the fast-paced nature of this research area, multi-modal integration approaches might also be useful to avoid the circular problem of double dipping and directly use different data modalities.
Integration in a latent space
The second approach ignores (for the most part) the physical space, and focuses on integrating the samples in the latent space. Recent examples are the use of deep learning models [63–65], which combine spatial neighbor networks and graph auto-encoders to learn latent embeddings. In contrast to integration in the physical space, integration in the latent space is anchored in the feature (gene expression) space, while also borrowing information from nearby, physically adjacent cells/spots. Feature space integration has a long history in scRNA-seq data analysis, where gene expression data are projected into the same latent space accounting for batch effects and the downstream investigation is completed in the shared latent space. However, integration that ignores the physical relationships between cells is vulnerable to noise in the data, particularly when the biological signal is small. Spatial information is therefore introduced into latent space integration to reduce this noise, for example during dimension reduction or clustering. Distinctly, PRECAST [66] models any arbitrary tissue architecture across multiple tissue samples by factorizing the input into a latent factor with a shared distribution for cell/domain and then using an intrinsic conditional autoregressive component to capture the spatial dependence for spatial clustering. In addition, the spatial information can also be used to smooth out possible noise. For example, BayesSpace [67] uses majority voting to accomplish spatial smoothing of cluster membership. Once these spatial domains are identified, which normally align with anatomical definitions, cells/spots are pseudobulked across individual samples, followed by conventional analyses, such as differential expression analysis. However, this pseudobulking analysis normally lacks sensitivity for local spatial signals and hence would not be helpful for granular analyses of the microenvironment.
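The idea of smoothing cluster membership by majority voting among spatial neighbors can be sketched as follows. This is a generic illustration and not the implementation used by any cited tool; the helper name `smooth_labels`, the choice of k nearest neighbors, and the toy labels are assumptions.

```python
import numpy as np

def smooth_labels(coords, labels, k=2):
    """Relabel each spot by majority vote over itself and its k nearest spatial neighbors."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Each spot is its own nearest neighbor (distance 0), so keep k + 1 columns.
    neighbors = np.argsort(d, axis=1)[:, :k + 1]
    smoothed = np.empty_like(labels)
    for i, idx in enumerate(neighbors):
        # Votes are read from the original labels, not the partially smoothed ones.
        values, counts = np.unique(labels[idx], return_counts=True)
        smoothed[i] = values[np.argmax(counts)]
    return smoothed

# Hypothetical row of ten spots: domain 0 on the left, domain 1 on the right,
# with one noisy spot (index 3) mislabeled inside domain 0.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
labels = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1, 1])
smoothed = smooth_labels(coords, labels, k=2)
```

The isolated label at index 3 is voted away while the boundary between the two domains stays in place. The same mechanism would also erase a genuine single-spot signal, which is the over-smoothing risk for analyses of small spatial features.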
Integration using pseudobulking approaches
Given the flourishing development of pseudobulking approaches in the integration of scRNA-seq data [68,69], it is natural to extend this to SRT data, where gene expression is aggregated across spots within a spatial domain and a tissue section. As the analysis is conducted at the observational unit of a spatial domain and tissue section, there is no need to address morphological variation. Furthermore, this approach enables existing methods designed for bulk RNA-sequencing to be used in this setting. For example, pseudobulking SRT data can be used to identify differentially expressed genes (DEGs) across spatial domains with multiple tissue blocks or individuals, which has been successfully applied in human brain tissue including the dorsolateral prefrontal cortex (DLPFC) [70,71], locus coeruleus (LC) [72], and hippocampus (HPC) [73]. In addition, pseudobulking provides a scalable solution to analyzing atlas-scale SRT data, motivated by the underlying “divide-and-conquer” or “map-and-reduce” philosophy.
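The aggregation step of pseudobulking can be sketched as a grouped sum over spots. The example assumes that spatial domain labels have already been assigned and harmonized across samples; the helper name `pseudobulk`, the sample identifiers, and the domain labels are hypothetical.

```python
import numpy as np

def pseudobulk(counts, sample_ids, domain_ids):
    """Sum spot-level counts within each (sample, domain) pair."""
    keys = np.array([f"{s}|{d}" for s, d in zip(sample_ids, domain_ids)])
    groups, inverse = np.unique(keys, return_inverse=True)
    summed = np.zeros((groups.size, counts.shape[1]), dtype=counts.dtype)
    # Unbuffered add so that repeated group indices accumulate correctly.
    np.add.at(summed, inverse, counts)
    return groups, summed

# Hypothetical spot-by-gene counts for two samples and two spatial domains.
counts = np.array([[1, 0],
                   [2, 1],
                   [0, 3],
                   [4, 4],
                   [1, 1]])
sample_ids = ["s1", "s1", "s1", "s2", "s2"]
domain_ids = ["L1", "L1", "WM", "L1", "WM"]
groups, summed = pseudobulk(counts, sample_ids, domain_ids)
```

Each row of `summed` is one pseudobulk profile that can be passed to methods designed for bulk RNA-sequencing. Any variation among spots within the same sample and domain is lost at this step.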
However, this approach is not necessarily appropriate for all research questions and methods for integrative analysis using SRT data. While this approach sidesteps the challenges due to morphological variation, it also ignores variation due to gene expression patterns varying within a spatial domain as that information is aggregated together into one summary statistic. This loss of information might be particularly important for some downstream analyses, such as identifying spatially variable genes, cellular deconvolution, and cell-cell communication. Pseudobulking or other approaches that summarize features at the sample-level [74] might not be appropriate in these cases.
Furthermore, there are open opportunities to improve the sensitivity and robustness of the statistical models used to perform integrative analyses using pseudobulk data. Current approaches use linear models with empirical Bayes techniques to identify DEGs where the spatial domains are assumed to be discrete. However, more modern models could be considered where the spatial domains vary more continuously across a 2D space. In addition, this approach requires labor-intensive human intervention, for example, the need to harmonize the labels corresponding to the spatial domains from unsupervised clustering algorithms. Specifically, the spatial alignment can be addressed via a manual operation, but this can be time-consuming and is prone to human error, leading to potentially unreliable and irreproducible results.
Discussion
The aim of this work is to both describe the historical context and summarize state-of-the-art strategies to perform population-level analyses that can overcome potential systematic biases in SRT data. The three approaches can broadly be summarized as (i) integration in a physical space, (ii) integration in a latent space, and (iii) integration using pseudobulking or similar approaches. Despite this, approaches developed for scRNA-seq remain widely used in practice. While it remains unclear which type of approach is best to integrate SRT data, there are many ongoing and active efforts to begin comparing these approaches through robust benchmark evaluations.
Furthermore, while much progress has been made with these early strategies, there remain important open challenges to be addressed. For example, when using approaches to integrate tissue sections in a physical space, it remains unclear how to identify the partial overlap and quantify how much area is needed for successful alignment. In addition, smoothing of gene expression is often used implicitly or explicitly as a step in the alignment process before passing to downstream analysis. Further validation is encouraged to avoid over-smoothing, which can eliminate nuanced biological signals in microenvironments and introduce computational artifacts. These challenges also relate to potential differences in the amount of biological tissue measured, where one can imagine new spatial platforms capturing a larger tissue area compared to older spatial platforms.
There also remain challenges with the current state-of-the-art strategies, such as the accuracy of cell segmentation, which remains one of the largest challenges with SRT data. Also, while pseudobulking enables the integration across datasets with different observational units, this approach also potentially masks important spatial variation within a given spatial domain. Therefore, we imagine new computational tools being developed that can integrate multiple samples measured with different observational units to take advantage of the full rich information provided by multi-sample SRT datasets.
Acknowledgements:
We would like to thank Kasper Hansen and members of the Hicks Lab for their feedback on this commentary. We would also like to thank our collaborators at the Lieber Institute for Brain Development for input and feedback.
Funding:
This project was supported by the National Institute of Mental Health [R01MH126393 to B.G., S.H.K., K.M., S.C.H.], the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation [DAF2023–323340 to S.C.H., P.P., S.G.], the National Institute of General Medical Sciences [R01GM151301 to W.L.], an Australian Research Council Discovery Early Career Researcher Award (DE220100964) funded by the Australian Government to S.G., and a Chan Zuckerberg Initiative Single Cell Biology Data Insights grant (2022–249319) to S.G. and P.P. The funding bodies had no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.
Abbreviations:
- sc/snRNA-seq: single-cell/nucleus RNA-sequencing
- SRT: spatially-resolved transcriptomics
- H&E: hematoxylin and eosin
- DLPFC: dorsolateral prefrontal cortex
- HPC: hippocampus
- LC: locus coeruleus
- DEGs: differentially expressed genes
Footnotes
Conflict of Interest:
The authors declare no conflicts of interest.
Data availability:
We used previously published data, which we referenced in the body of the text.
Bibliography
- 1.Rood JE, Maartens A, Hupalowska A, Teichmann SA, Regev A. Impact of the Human Cell Atlas on medicine. Nat Med. 2022;28: 2486–2496. doi: 10.1038/s41591-022-02104-7 [DOI] [PubMed] [Google Scholar]
- 2.Park J, Gregorio R, Hissong E, Patel S, Robinson B, Socciarelli F, et al. Abstract 4900: Spatial Atlas of Human Anatomy (SAHA): A subcellular, multiscale spatial odyssey of immune and gastrointestinal tissues and tumors. Cancer Res. 2024;84: 4900–4900. doi: 10.1158/1538-7445.AM2024-4900 [DOI] [Google Scholar]
- 3.Moses L, Pachter L. Museum of spatial transcriptomics. Nat Methods. 2022;19: 534–546. doi: 10.1038/s41592-022-01409-2 [DOI] [PubMed] [Google Scholar]
- 4.Rao A, Barkley D, França GS, Yanai I. Exploring tissue architecture using spatial transcriptomics. Nature. 2021;596: 211–220. doi: 10.1038/s41586-021-03634-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Cheng M, Jiang Y, Xu J, Mentis A-FA, Wang S, Zheng H, et al. Spatially resolved transcriptomics: a comprehensive review of their technological advances, applications, and challenges. J Genet Genomics. 2023;50: 625–640. doi: 10.1016/j.jgg.2023.03.011 [DOI] [PubMed] [Google Scholar]
- 6.Piwecka M, Rajewsky N, Rybak-Wolf A. Single-cell and spatial transcriptomics: deciphering brain complexity in health and disease. Nat Rev Neurol. 2023;19: 346–362. doi: 10.1038/s41582-023-00809-y [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Asp M, Bergenstråhle J, Lundeberg J. Spatially Resolved Transcriptomes-Next Generation Tools for Tissue Exploration. Bioessays. 2020;42: e1900221. doi: 10.1002/bies.201900221 [DOI] [PubMed] [Google Scholar]
- 8.Jain S, Pei L, Spraggins JM, Angelo M, Carson JP, Gehlenborg N, et al. Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nat Cell Biol. 2023;25: 1089–1100. doi: 10.1038/s41556-023-01194-w [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Ardini-Poleske ME, Clark RF, Ansong C, Carson JP, Corley RA, Deutsch GH, et al. Lungmap: the molecular atlas of lung development program. Am J Physiol Lung Cell Mol Physiol. 2017;313: L733–L740. doi: 10.1152/ajplung.00139.2017 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Lohoff T, Ghazanfar S, Missarova A, Koulena N, Pierson N, Griffiths JA, et al. Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nat Biotechnol. 2022;40: 74–85. doi: 10.1038/s41587-021-01006-2 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Lotfollahi M, Yuhan Hao, Theis FJ, Satija R. The future of rapid and automated single-cell data analysis using reference mapping. Cell. 2024;187: 2343–2358. doi: 10.1016/j.cell.2024.03.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Rosen Y, Brbić M, Roohani Y, Swanson K, Li Z, Leskovec J. Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN. Nat Methods. 2024. doi: 10.1038/s41592-024-02191-z
- 13.Kumar T, Nee K, Wei R, He S, Nguyen QH, Bai S, et al. A spatially resolved single-cell genomic atlas of the adult human breast. Nature. 2023;620: 181–191. doi: 10.1038/s41586-023-06252-9
- 14.Atta L, Fan J. Computational challenges and opportunities in spatially resolved transcriptomic data analysis. Nat Commun. 2021;12: 5283. doi: 10.1038/s41467-021-25557-9
- 15.Miyoshi E, Morabito S, Henningfield CM, Rahimzadeh N, Kiani Shabestari S, Das S, et al. Spatial and single-nucleus transcriptomic analysis of genetic and sporadic forms of Alzheimer’s Disease. BioRxiv. 2023. doi: 10.1101/2023.07.24.550282
- 16.Wang C, Acosta D, McNutt M, Bian J, Ma A, Fu H, et al. A single-cell and spatial RNA-seq database for Alzheimer’s disease (ssREAD). Nat Commun. 2024;15: 4710. doi: 10.1038/s41467-024-49133-z
- 17.Leon J, Yoshinaga S, Hino M, Nagaoka A, Ando Y, Moody J, et al. Integrative transcriptomics reveals layer 1 astrocytes altered in schizophrenia. BioRxiv. 2024. doi: 10.1101/2024.06.27.601103
- 18.Arora R, Cao C, Kumar M, Sinha S, Chanda A, McNeil R, et al. Spatial transcriptomics reveals distinct and conserved tumor core and edge architectures that predict survival and targeted therapy response. Nat Commun. 2023;14: 5029. doi: 10.1038/s41467-023-40271-4
- 19.Denisenko E, de Kock L, Tan A, Beasley AB, Beilin M, Jones ME, et al. Spatial transcriptomics reveals discrete tumour microenvironments and autocrine loops within ovarian cancer subclones. Nat Commun. 2024;15: 2860. doi: 10.1038/s41467-024-47271-y
- 20.Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, et al. Eleven grand challenges in single-cell data science. Genome Biol. 2020;21: 31. doi: 10.1186/s13059-020-1926-6
- 21.Hicks SC, Townes FW, Teng M, Irizarry RA. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics. 2018;19: 562–578. doi: 10.1093/biostatistics/kxx053
- 22.Luecken MD, Büttner M, Chaichoompu K, Danese A, Interlandi M, Mueller MF, et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods. 2022;19: 41–50. doi: 10.1038/s41592-021-01336-8
- 23.Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet. 2010;11: 733–739. doi: 10.1038/nrg2825
- 24.Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15: R29. doi: 10.1186/gb-2014-15-2-r29
- 25.Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26: 139–140. doi: 10.1093/bioinformatics/btp616
- 26.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15: 550. doi: 10.1186/s13059-014-0550-8
- 27.Risso D, Ngai J, Speed TP, Dudoit S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat Biotechnol. 2014;32: 896–902. doi: 10.1038/nbt.2931
- 28.Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat Biotechnol. 2018;36: 421–427. doi: 10.1038/nbt.4091
- 29.Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods. 2019;16: 1289–1296. doi: 10.1038/s41592-019-0619-0
- 30.Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36: 411–420. doi: 10.1038/nbt.4096
- 31.Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15: 1053–1058. doi: 10.1038/s41592-018-0229-2
- 32.Ekvall M, Bergenstråhle L, Andersson A, Czarnewski P, Olegård J, Käll L, et al. Spatial landmark detection and tissue registration with deep learning. Nat Methods. 2024;21: 673–679. doi: 10.1038/s41592-024-02199-5
- 33.Rood JE, Stuart T, Ghazanfar S, Biancalani T, Fisher E, Butler A, et al. Toward a common coordinate framework for the human body. Cell. 2019;179: 1455–1467. doi: 10.1016/j.cell.2019.11.019
- 34.Marx V. Method of the Year: spatially resolved transcriptomics. Nat Methods. 2021;18: 9–14. doi: 10.1038/s41592-020-01033-y
- 35.Alexandrov T, Saez-Rodriguez J, Saka SK. Enablers and challenges of spatial omics, a melting pot of technologies. Mol Syst Biol. 2023;19: e10571. doi: 10.15252/msb.202110571
- 36.Chen KH, Boettiger AN, Moffitt JR, Wang S, Zhuang X. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348: aaa6090. doi: 10.1126/science.aaa6090
- 37.Janesick A, Shelansky R, Gottscho AD, Wagner F, Williams SR, Rouault M, et al. High resolution mapping of the tumor microenvironment using integrated single-cell, spatial and in situ analysis. Nat Commun. 2023;14: 8353. doi: 10.1038/s41467-023-43458-x
- 38.Rodriques SG, Stickels RR, Goeva A, Martin CA, Murray E, Vanderburg CR, et al. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019;363: 1463–1467. doi: 10.1126/science.aaw1219
- 39.Ståhl PL, Salmén F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 2016;353: 78–82. doi: 10.1126/science.aaf2403
- 40.Oliveira MF, Romero JP, Chung M, Williams S, Gottscho AD, Gupta A, et al. Characterization of immune cell populations in the tumor microenvironment of colorectal cancer using high definition spatial profiling. BioRxiv. 2024. doi: 10.1101/2024.06.04.597233
- 41.Atta L, Clifton K, Anant M, Aihara G, Fan J. Gene count normalization in single-cell imaging-based spatially resolved transcriptomics. Genome Biol. 2024;25: 153. doi: 10.1186/s13059-024-03303-w
- 42.Totty M, Hicks SC, Guo B. SpotSweeper: spatially-aware quality control for spatial transcriptomics. BioRxiv. 2024. doi: 10.1101/2024.06.06.597765
- 43.Cervilla S, Grases D, Perez E, Real FX, Musulen E, Esteller M, et al. Comparison of spatial transcriptomics technologies across six cancer types. BioRxiv. 2024. doi: 10.1101/2024.05.21.593407
- 44.Bhuva DD, Tan CW, Salim A, Marceaux C, Pickering MA, Chen J, et al. Library size confounds biology in spatial transcriptomics data. Genome Biol. 2024;25: 99. doi: 10.1186/s13059-024-03241-7
- 45.Ramirez Flores RO, Lanzer JD, Dimitrov D, Velten B, Saez-Rodriguez J. Multicellular factor analysis of single-cell data for a tissue-centric understanding of disease. eLife. 2023;12. doi: 10.7554/eLife.93161
- 46.Bergenstråhle J, Larsson L, Lundeberg J. Seamless integration of image and molecular analysis for spatial transcriptomics workflows. BMC Genomics. 2020;21: 482. doi: 10.1186/s12864-020-06832-3
- 47.Andersson A, Andrusivová Ž, Czarnewski P, Li X, Sundström E, Lundeberg J. A Landmark-based Common Coordinate Framework for Spatial Transcriptomics Data. BioRxiv. 2021. doi: 10.1101/2021.11.11.468178
- 48.Jones A, Townes FW, Li D, Engelhardt BE. Alignment of spatial genomics and histology data using deep Gaussian processes. BioRxiv. 2022. doi: 10.1101/2022.01.10.475692
- 49.Clifton K, Anant M, Aihara G, Atta L, Aimiuwu OK, Kebschull JM, et al. Alignment of spatial transcriptomics data using diffeomorphic metric mapping. BioRxiv. 2023. doi: 10.1101/2023.04.11.534630
- 50.Zeira R, Land M, Strzalkowski A, Raphael BJ. Alignment and integration of spatial transcriptomics data. Nat Methods. 2022;19: 567–575. doi: 10.1038/s41592-022-01459-6
- 51.Xia C-R, Cao Z-J, Tu X-M, Gao G. Spatial-linked alignment tool (SLAT) for aligning heterogenous slices properly. BioRxiv. 2023. doi: 10.1101/2023.04.07.535976
- 52.Allen Reference Atlases :: Atlas Viewer. [cited 31 Jul 2024]. Available: http://atlas.brain-map.org/
- 53.ISH Data :: Allen Brain Atlas: Mouse Brain. [cited 31 Jul 2024]. Available: http://mouse.brain-map.org/
- 54.Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33: 495–502. doi: 10.1038/nbt.3192
- 55.Achim K, Pettit J-B, Saraiva LR, Gavriouchkina D, Larsson T, Arendt D, et al. High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat Biotechnol. 2015;33: 503–509. doi: 10.1038/nbt.3209
- 56.Cang Z, Nie Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat Commun. 2020;11: 2084. doi: 10.1038/s41467-020-15968-5
- 57.Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576: 132–137. doi: 10.1038/s41586-019-1773-3
- 58.Vahid MR, Brown EL, Steen CB, Zhang W, Jeon HS, Kang M, et al. High-resolution alignment of single-cell and spatial transcriptomes with CytoSPACE. Nat Biotechnol. 2023;41: 1543–1548. doi: 10.1038/s41587-023-01697-9
- 59.Biancalani T, Scalia G, Buffoni L, Avasthi R, Lu Z, Sanger A, et al. Deep learning and alignment of spatially resolved single-cell transcriptomes with Tangram. Nat Methods. 2021;18: 1352–1362. doi: 10.1038/s41592-021-01264-7
- 60.Zhang Q, Jiang S, Schroeder A, Hu J, Li K, Zhang B, et al. Leveraging spatial transcriptomics data to recover cell locations in single-cell RNA-seq with CeLEry. Nat Commun. 2023;14: 4050. doi: 10.1038/s41467-023-39895-3
- 61.Kleshchevnikov V, Shmatko A, Dann E, Aivazidis A, King HW, Li T, et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol. 2022;40: 661–671. doi: 10.1038/s41587-021-01139-4
- 62.Gao LL, Bien J, Witten D. Selective inference for hierarchical clustering. J Am Stat Assoc. 2022; 1–27. doi: 10.1080/01621459.2022.2116331
- 63.Long Y, Ang KS, Li M, Chong KLK, Sethi R, Zhong C, et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat Commun. 2023;14: 1155. doi: 10.1038/s41467-023-36796-3
- 64.Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun. 2022;13: 1739. doi: 10.1038/s41467-022-29439-6
- 65.Xu C, Jin X, Wei S, Wang P, Luo M, Xu Z, et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res. 2022;50: e131. doi: 10.1093/nar/gkac901
- 66.Liu W, Liao X, Luo Z, Yang Y, Lau MC, Jiao Y, et al. Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nat Commun. 2023;14: 296. doi: 10.1038/s41467-023-35947-w
- 67.Zhao E, Stone MR, Ren X, Guenthoer J, Smythe KS, Pulliam T, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol. 2021;39: 1375–1384. doi: 10.1038/s41587-021-00935-2
- 68.Crowell HL, Soneson C, Germain P-L, Calini D, Collin L, Raposo C, et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat Commun. 2020;11: 6077. doi: 10.1038/s41467-020-19894-4
- 69.Murphy AE, Skene NG. A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis. Nat Commun. 2022;13: 7851. doi: 10.1038/s41467-022-35519-4
- 70.Maynard KR, Collado-Torres L, Weber LM, Uytingco C, Barry BK, Williams SR, et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat Neurosci. 2021;24: 425–436. doi: 10.1038/s41593-020-00787-0
- 71.Huuki-Myers L, Spangler A, Eagles N, Montgomery KD, Kwon SH, Guo B, et al. Integrated single cell and unsupervised spatial transcriptomic analysis defines molecular anatomy of the human dorsolateral prefrontal cortex. BioRxiv. 2023. doi: 10.1101/2023.02.15.528722
- 72.Weber LM, Divecha HR, Tran MN, Kwon SH, Spangler A, Montgomery KD, et al. The gene expression landscape of the human locus coeruleus revealed by single-nucleus and spatially-resolved transcriptomics. eLife. 2024;12. doi: 10.7554/eLife.84628
- 73.Nelson ED, Tippani M, Ramnauth AD, Divecha HR, Miller RA, Eagles NJ, et al. An integrated single-nucleus and spatial transcriptomics atlas reveals the molecular landscape of the human hippocampus. BioRxiv. 2024. doi: 10.1101/2024.04.26.590643
- 74.Cao Y, Lin Y, Patrick E, Yang P, Yang JYH. scFeatures: multi-view representations of single-cell and spatial data for disease outcome prediction. Bioinformatics. 2022;38: 4745–4753. doi: 10.1093/bioinformatics/btac590
Data Availability Statement
We used previously published data, which are referenced in the body of the text.


