Abstract
Spatial transcriptomics combines gene expression data with spatial coordinates, enabling detailed mapping of RNA localization, the study of development, investigation of the tumor microenvironment, and the creation of tissue atlases. A large range of spatial transcriptomics software is available, but little information exists on which packages are best suited to particular datasets or computing environments. This review details the metrics that are useful when choosing software for spatial transcriptomics analysis. Specifically, the results of benchmarking studies that compared software across four key areas of spatial transcriptomics analysis (tissue architecture identification, spatially variable gene discovery, cell–cell communication analysis, and deconvolution) were assimilated into a single review that can serve as guidance when choosing spatial transcriptomics analysis software.
Keywords: spatial transcriptomics, benchmarking, tissue architecture identification, spatially variable gene, cell–cell communication, deconvolution
1. Introduction
Since the initial decoding of the human genome, the fields of next-generation sequencing (NGS) and bioinformatics have continued to expand rapidly in terms of both available technologies and methods of analysis. While the ability to read DNA and RNA emerged early, advances in sequencing depth and breadth are still being made. Bulk RNA-seq provides a measure of gene expression in a sample, revealing valuable information about cellular activities [1,2]. This measure, however, is an average across all cells, as bulk RNA-seq produces only sample-level information. The next iteration of sequencing, single-cell RNA-seq (scRNA-seq), provides gene expression at the individual cell level but obscures the location of cells in a sample [3,4]. Gene expression in relation to tissue structure has become an increasingly important factor in the study of many diseases and conditions, and so the field of spatial transcriptomics (ST) emerged [5].
ST employs various methods of cellular dissection to capture gene expression and create a two-dimensional expression map of a sample [6]. This map can reveal gene expression limited to a specific region of cells in the sample, place the function of a cell in the context of its location, and illuminate active cell communication across a region [7]. Because ST conserves the physical location of RNA expression, the field has unique clinical applications, such as precisely mapping tumor heterogeneity and immune infiltration to better inform treatment and track progression. Embryonic development can also be understood at a much deeper level than with bulk RNA-seq or scRNA-seq. This information is gathered at resolutions ranging from tens of cells down to the subcellular level [7,8].
ST analysis consists of many steps, four of which are frequently performed: tissue architecture identification, spatially variable gene (SVG) detection, cell–cell communication (CCC) analysis, and deconvolution. Tissue architecture identification combines gene expression profiles with spatial coordinates to group individual cells or spots, enabling the assignment of cell-type labels and other group analyses. SVG detection identifies genes whose expression varies across a tissue sample in a statistically significant way. In cell–cell communication analysis, gene expression is compared with a ligand–receptor database, taking into account the spatial location of cells expressing either component, to estimate the probability that two cells are interacting. Finally, as some spatial sequencing methods do not reach single-cell resolution, deconvolution is needed to predict the proportion of cell types present in each assayed spatial spot.
Recently, a large range of software has been developed for one or all of these steps. Although each software package is accompanied by a publication demonstrating its capabilities, the datasets analyzed, hardware used, and quality metrics vary widely across publications. Comparing software head-to-head under controlled conditions, as benchmarking studies do, provides a more accurate assessment, and such studies form the basis of the systematic review given here. Specifically, we considered the following factors for recommendations, although not with equal priority: accuracy; runtime; system requirements; programming language; and compatibility with the 10x Visium platform, a popular choice for ST sequencing.
2. Tissue Architecture Identification
Established gene expression profiles, combined with the physical location of cells that ST uniquely provides, allow tissue samples to be algorithmically “reassembled”. Benchmarking analyses compared software against real and simulated data with established ground truth and ranked the packages by accuracy, while also noting the consistency of performance across datasets, runtimes, and resource requirements (Table 1).
Table 1.
Summary of software in tissue architecture identification benchmark studies.
| Author | Software | Accuracy Metric | Dataset | Technology | Computer Environment |
|---|---|---|---|---|---|
| Cheng et al. | BayesSpace (v1.00), DR.SC (v2.9), Giotto-H (v1.0.3), Giotto-HM (v1.0.3), Giotto-KM (v1.0.3), Giotto-LD (v1.0.3), Seurat-LV (v4.0.5), Seurat-LVM (v4.0.5), Seurat-SLM (v4.0.5), SpaCell (v1.0.1), SpaCell-G (v1.0.1), SpaCell-I (v1.0.1), SpaGCN (v1.2.0), SpaGCN+ (v1.2.0), StLearn (v0.3.2) | ARI with annotated datasets as ground truth; also mean and AD of ARI across replicates | Mouse olfactory bulb | Spatial Transcriptomics | Information not provided |
| | | | Mouse kidney coronal | 10x Genomics Visium V1 | |
| | | | Mouse brain sagittal | 10x Genomics Visium V1 | |
| | | | Mouse hypothalamic preoptic | MERFISH | |
| | | | Mouse somatosensory cortex | osmFISH | |
| | | | Mouse olfactory bulb | Stereo-seq | |
| | | | Mouse brain cerebellum | Slide-seq | |
| Hu et al. | ADEPT [9], BANKSY [10], BASS [11], BayesSpace [12], CCST [13], ConGI [14], conST [15], DeepST [16], DR.SC [17], GPSA [18], GraphST [19], PASTE [20], PASTE2 [21], PRECAST [22], SEDR [23], SpaceFlow [24], SPACEL [25], SpaGCN [4], SpatialPCA [26], SPIRAL [27], STAGATE [28], STalign [29], STAligner [30] | ARI, NMI, AMI, and HOM, with annotated and simulated datasets as ground truth | DLPFC | 10x Genomics Visium V1 | Intel Xeon W-2195 CPU, 2.3 GHz, 36 CPU cores, 256 GB DDR4 RAM; four Quadro RTX A6000 GPUs, 48 GB RAM, 4608 CUDA cores |
| | | | HBCA1 | 10x Genomics Visium V1 | |
| | | | MB2SA | 10x Genomics Visium V1 | |
| | | | HER2BT | Spatial Transcriptomics | |
| | | | MHPC | Slide-seq V2 | |
| | | | Embryo | Stereo-seq | |
| | | | MVC | STARmap | |
| | | | MPFC | STARmap | |
| Yuan et al. | BASS [11], BayesSpace [12], CCST [13], conST [15], GraphST [19], Leiden [31,32], Louvain [31], SCAN-IT [33], SEDR [23], SpaceFlow [24], SpaGCN [4], STAGATE [28], StLearn [34] | NMI against annotated datasets as ground truth | DLPFC | 10x Visium | Intel Xeon E5-2683v3, 2.00 GHz, 14 cores, 128 GB RAM; NVIDIA TITAN Xp GPU, 12 GB RAM |
| | | | Mouse embryo | Stereo-seq | |
| | | | Mouse primary cortex | Barista-seq | |
| | | | Mouse hypothalamic preoptic | MERFISH | |
| | | | Mouse somatosensory cortex | osmFISH | |
| | | | Mouse medial prefrontal cortex | STARmap | |
| | | | Mouse visual cortex | STARmap* | |
| | | | Mouse somatosensory cortex with downsampling or noise addition | Simulated_1 | |
| | | | | Simulated_2 | |
| | | | | Simulated_3 | |
| | | | | Simulated_4 | |
Cheng et al. [35] explored 15 software packages using seven real ST datasets accompanied by simulated gene expression and hematoxylin and eosin (H&E) data. Notably, they found that options that did not use spatial coordinates or histological data (e.g., Seurat) were not necessarily disadvantaged, and that better histology did not correlate with more accurate clustering. They also found little difference in performance between algorithmic variants of the same software (e.g., Seurat-LV, -SLM, and -LVM; Giotto-H, -HM, and -LD). Another benchmark [36] assessed 16 software packages (along with other analysis software not discussed here) using 10 real ST datasets, based on accuracy and on how the methods handled spatial continuity. The authors tested the effect of random seeds where applicable and found that performance depended on the random seed value provided. A final study benchmarked 13 packages across seven datasets and notably compared accuracy in depth across not only datasets but also sample origin and spatial technology [37]. While some software performed well across all metrics, both the sample origin (tissue and patient) and the spatial technology used to image the sample played large roles in the accuracy of the results.
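Table 1 lists ARI, NMI, AMI, and homogeneity (HOM) as accuracy metrics. The sketch below, assuming scikit-learn is available, shows how such scores are typically computed for predicted domain labels against an annotation, together with the mean and spread of ARI across replicate runs (for example, the same method run with different random seeds). The labels are toy values, and reading “AD” as mean absolute deviation is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    adjusted_mutual_info_score,
    adjusted_rand_score,
    homogeneity_score,
    normalized_mutual_info_score,
)


def clustering_scores(truth, predicted):
    """Score predicted spatial-domain labels against annotated ground truth.

    All four metrics ignore how cluster IDs are named, so a predicted
    cluster "2" can match an annotated layer "L1".
    """
    return {
        "ARI": adjusted_rand_score(truth, predicted),
        "NMI": normalized_mutual_info_score(truth, predicted),
        "AMI": adjusted_mutual_info_score(truth, predicted),
        "HOM": homogeneity_score(truth, predicted),
    }


def replicate_summary(truth, replicate_predictions):
    """Mean ARI and mean absolute deviation (assumed meaning of "AD") of ARI
    across replicate runs, e.g., one method run with several random seeds."""
    aris = np.array([adjusted_rand_score(truth, p) for p in replicate_predictions])
    return aris.mean(), np.abs(aris - aris.mean()).mean()


# Toy example: six spots annotated with three hypothetical domain labels
truth = ["L1", "L1", "L2", "L2", "WM", "WM"]
same_partition = [2, 2, 0, 0, 1, 1]  # identical grouping, different IDs
one_error = [2, 2, 0, 1, 1, 1]       # one spot assigned to the wrong domain

print(clustering_scores(truth, same_partition))  # every score is 1.0 (up to rounding)
print(replicate_summary(truth, [same_partition, one_error]))  # mean < 1, AD > 0
```

Because the metrics are label-permutation invariant, no matching of cluster IDs to annotation names is needed before scoring.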
Consolidating the accuracy results from the three benchmarks, while also considering runtime, necessary computing resources, and compatibility with 10x Visium data, we suggest the following software. BASS [11] and BayesSpace [12] were consistently in the top five for accuracy in all three benchmarks, although both scaled poorly with increasing dataset size, and BayesSpace did not perform well when given imaging-based data (as opposed to sequencing-based data). SpaGCN [38] struggled when data lacked clear spatial patterns but was otherwise consistently fast and accurate. Seurat [39] also showed high accuracy and good runtime, with the advantage of not requiring histology data. A final option, STAGATE [28], was less accurate than those previously mentioned but is still a solid choice if none of the others fits the desired workflow.
Some universally applicable observations were made by one or more of the benchmarking groups. All noted to some degree that the pre-processing employed by the software had a major effect on accuracy, while post-processing almost always improved accuracy. For methods that require the user to specify the number of clusters, the chosen number could strongly influence accuracy if it did not match the ground-truth number of cell types in the sample. Each benchmark also noted that performance was highly dataset dependent: one software package could perform very well on brain tissue but struggle with spinal tissue, with no general pattern to predict which software was the best choice for a given sample. Several authors postulated that this fluctuation in performance could result from overfitting to the relatively small collection of ST datasets currently available.
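The benchmarks did not prescribe a single post-processing step. As a hedged illustration of the kind of spatial refinement some tools apply, the sketch below reassigns each spot to the majority label of its nearest spatial neighbors. It is a generic majority-vote filter written for this review, not the refinement procedure of any specific package, and the neighbor count `k` is an arbitrary assumption.

```python
from collections import Counter

import numpy as np
from scipy.spatial import cKDTree


def smooth_labels(coords, labels, k=8):
    """Majority-vote smoothing of cluster labels over spatial neighbors.

    Each spot is relabeled only when a strict majority of itself plus its k
    nearest neighbors carries a different label. Votes always use the
    original labels, so the result does not depend on processing order.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # k + 1 because each spot's closest point is itself (distance 0)
    _, idx = cKDTree(coords).query(coords, k=k + 1)
    smoothed = labels.copy()
    for i, neighbors in enumerate(idx):
        top_label, top_count = Counter(labels[neighbors]).most_common(1)[0]
        if top_label != labels[i] and top_count > (k + 1) / 2:
            smoothed[i] = top_label
    return smoothed


# 10 x 10 grid: left half is domain 0, right half is domain 1
coords = [(x, y) for x in range(10) for y in range(10)]
clean = np.array([0 if x < 5 else 1 for x, _ in coords])
noisy = clean.copy()
noisy[2 * 10 + 5] = 1  # flip one interior spot at (2, 5)

print((smooth_labels(coords, noisy) == clean).all())  # True: the isolated error is removed
```

A strict-majority rule leaves genuine domain boundaries intact while removing isolated mislabeled spots; larger `k` smooths more aggressively and can erase thin structures such as narrow cortical layers.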
Another general observation was that no single algorithm type outperformed the others in clustering. While BASS and BayesSpace are both Bayesian (statistical) methods, SpaGCN and Seurat are graph-based and performed nearly as well. The next-best options, STAGATE and CCST [13] (the latter not suggested here for 10x Visium data), are deep learning-based, so the top recommendations were an even mix of the available method types. This pattern does not hold in every area of ST analysis.
3. Spatially Variable Gene Discovery
Once counts are normalized, a popular next step is to find genes whose expression differs across physical space. Spatially variable genes (SVGs) can illuminate “regions” of cell activity that are not evident through histology or sequencing alone. While there are numerous options for SVG analysis, few benchmarks have been performed so far (Table 2). Still, some helpful notes and patterns can be gleaned from the available literature.
Table 2.
Summary of spatially variable gene identification benchmarking papers.
| Author | Software | Accuracy Metric | Dataset | Production Method/Technology | Computer Environment |
|---|---|---|---|---|---|
| Li et al. | BOOST-GP [40], GPcounts [41], Moran’s I (Squidpy v1.2.3), nnSVG (v1.2.0), scGCO (v1.1.0), Sepal (Squidpy v1.2.3), SOMDE (v0.1.7), SpaGCN (v1.2.5), SpaGFT (v0.1.1.4), Spanve (v0.1.0), SPARK (v1.1.1), SPARK-X (v1.1.1), SpatialDE (v1.1.3), SpatialDE2 [42] | Area under the precision–recall curve for calls against simulated-data ground truth | Simulated SVGs | Produced with normal and Gaussian distributions | AMD EPYC 7H12 CPU, 64 cores, 1 TB RAM; A100 GPU, 40 GB RAM |
| | | | Simulated non-SVGs | Identity matrix | |
| | | | Breast tumor with annotation | GP mixture model, log fold change | |
| | | | DLPFC | Manual annotation | |
| Chen et al. | Giotto k-means [43], Giotto rank [43], MERINGUE [44], Moran’s I [45], nnSVG [46], SOMDE [47], SPARK-X [48], SpatialDE [42] | Spearman’s correlation between SVG lists returned by software | Mouse embryo E12 | DBiT-seq, D1 | Standard virtual machine, 16 OCPUs, 256 GB RAM |
| | | | Mouse embryo E11 | DBiT-seq, D2 | |
| | | | Human osteosarcoma | MERFISH | |
| | | | Mouse brain cortex | seqFISH+ | |
| | | | Mouse cerebellum | Slide-seqV1 | |
| | | | Human kidney cortex | Slide-seqV2 | |
| | | | Mouse hippocampus | Slide-seqV2 | |
| | | | Mouse brain cortex | SM_Omics, D1 | |
| | | | Mouse brain cortex | SM_Omics, D2 | |
| | | | Human squamous carcinoma | ST | |
| | | | Mouse hippocampus | ST | |
| | | | Mouse primary motor cortex | Visium | |
| | | | Mouse kidney sham | Visium, D1 | |
| | | | Mouse kidney ischemia | Visium, D2 | |
| | | | Zebrafish melanoma | Visium | |
| | | | Mouse kidney sepsis | Visium | |
| | | | Mouse prefrontal cortex | Visium | |
| | | | Mouse lymph node | Visium, D1 | |
| | | | Mouse MCA205 tumor | Visium, D2 | |
| | | | Human prostate | Visium | |
| | | | Human breast cancer | Visium, D1 | |
| | | | Human breast cancer | Visium, D2 | |
Li et al. [49] used simulated ground-truth data with a variety of spatial patterns to mimic real-world scenarios. They found that noise played a large part in accurately determining SVGs, as the performance of all software packages decreased as noise in the data increased. Highly variable gene (HVG) information is sometimes used as a feature for tissue architecture identification but can become much more powerful when combined with SVGs. A second benchmark by Chen et al. [50] evaluated SVG software by measuring the correlation between the outputs of different packages and found that although the statistics were similar, the returned lists of SVGs had very little overlap. They also found that SVG ranking correlated positively with gene expression level and that most packages had a high false discovery rate (FDR) on simulated ground-truth data.
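Chen et al. compared SVG packages by correlating their outputs. A minimal sketch of that kind of comparison, assuming each method returns a per-gene score where higher means more spatially variable (the gene names and scores below are made up), computes Spearman’s correlation over the shared genes and the Jaccard overlap of the two top-k lists.

```python
import numpy as np
from scipy.stats import spearmanr


def compare_svg_outputs(scores_a, scores_b, top_k):
    """Compare two SVG methods run on the same dataset.

    scores_a, scores_b: dicts mapping gene -> score, higher meaning more
    spatially variable (e.g., -log10 p-value). Returns Spearman's rho over
    the shared genes and the Jaccard overlap of the two top-k gene lists.
    """
    genes = sorted(set(scores_a) & set(scores_b))
    a = np.array([scores_a[g] for g in genes], dtype=float)
    b = np.array([scores_b[g] for g in genes], dtype=float)
    rho, _ = spearmanr(a, b)
    top_a = set(sorted(genes, key=scores_a.get, reverse=True)[:top_k])
    top_b = set(sorted(genes, key=scores_b.get, reverse=True)[:top_k])
    jaccard = len(top_a & top_b) / len(top_a | top_b)
    return float(rho), jaccard


# Hypothetical scores from two methods for five genes
method_a = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 2.0, "g5": 1.0}
method_b = {"g1": 1.0, "g2": 5.0, "g3": 4.0, "g4": 3.0, "g5": 2.0}
print(compare_svg_outputs(method_a, method_b, top_k=2))  # rho = 0, Jaccard = 1/3
```

The two measures can disagree: methods may rank most genes similarly (high rho) yet still return different top-k lists, which is one way the “similar statistics, little overlap” pattern can arise.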
Of the software benchmarked, SpatialDE2 [42] often topped the accuracy results, although it struggled with downsampling or large numbers of spatial locations and was slow and resource-heavy. SPARK-X [48], on the other hand, was consistently in the top three for accuracy and was substantially faster and lighter on RAM than SpatialDE2. Likewise, SOMDE [47] was fast and light with a good FDR but suffered from low sensitivity. Finally, Moran’s I [51] was often second in the rankings and takes a distinctive, permutation-based approach. It is resource-light and performs well with sparse spatial locations; however, it suffers from a high FDR.
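A minimal NumPy/SciPy sketch of Moran’s I with a permutation test follows. The binary k-nearest-neighbor weight matrix and k = 4 are assumptions made for illustration; packages such as Squidpy implement their own weighting schemes and p-value options, and this is not their code.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_weights(coords, k=4):
    """Binary, symmetric k-nearest-neighbor spatial weight matrix."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    _, idx = cKDTree(coords).query(coords, k=k + 1)  # includes the spot itself
    w = np.zeros((n, n))
    for i, neighbors in enumerate(idx):
        w[i, neighbors] = 1.0
    np.fill_diagonal(w, 0.0)  # a spot is not its own neighbor
    return np.maximum(w, w.T)


def morans_i(values, w):
    """I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)
    and W the sum of all weights."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)


def moran_permutation_test(values, w, n_perm=999, seed=0):
    """One-sided p-value: how often shuffled expression is at least as
    spatially autocorrelated as the observed pattern."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = morans_i(values, w)
    null = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# 15 x 15 grid of spots with two hypothetical genes
xs, ys = np.meshgrid(np.arange(15), np.arange(15))
coords = np.column_stack([xs.ravel(), ys.ravel()])
w = knn_weights(coords)
gradient_gene = coords[:, 0].astype(float)  # expression rises left to right
random_gene = np.random.default_rng(1).normal(size=len(coords))

print(moran_permutation_test(gradient_gene, w))  # strongly positive I; p = 0.001 (the minimum with 999 permutations)
print(moran_permutation_test(random_gene, w))    # I near 0, typically non-significant
```

Shuffling expression across fixed locations keeps the gene’s value distribution but destroys its spatial arrangement, which is what the null hypothesis of “no spatial pattern” requires.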
Once again, SVG results varied by dataset; the available software seemed to be built for specific sample types, likely because this is a new area of research with relatively few datasets to test on. Across both benchmarks and all analysis methods, SVG rankings correlated with gene expression level, a potential source of systematic bias. Overall, the available software had either high sensitivity or high specificity, but not both. Finally, the two major method types, graph-based and kernel-based, performed equally well in all assessments.
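Li et al. scored calls against simulated ground truth with the area under the precision–recall curve. The hedged sketch below computes sensitivity, specificity, FDR, and auPRC for one method, using scikit-learn’s `average_precision_score` as the auPRC estimator (an assumption; a study may integrate the curve differently). The two toy p-value vectors are invented to illustrate the sensitivity–specificity trade-off described above.

```python
import numpy as np
from sklearn.metrics import average_precision_score


def svg_call_metrics(is_true_svg, p_values, alpha=0.05):
    """Evaluate SVG calls against simulated ground truth.

    Genes with p < alpha are 'called'. auPRC uses the full ranking, with a
    smaller p-value treated as stronger evidence for spatial variability.
    """
    truth = np.asarray(is_true_svg, dtype=bool)
    p = np.asarray(p_values, dtype=float)
    called = p < alpha
    tp = np.sum(called & truth)
    fp = np.sum(called & ~truth)
    fn = np.sum(~called & truth)
    tn = np.sum(~called & ~truth)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "FDR": fp / max(tp + fp, 1),  # 0 when nothing is called
        "auPRC": average_precision_score(truth, -p),
    }


truth = [True, True, True, False, False, False]
permissive = [0.01, 0.02, 0.03, 0.01, 0.04, 0.20]  # calls almost everything
strict = [0.01, 0.30, 0.40, 0.50, 0.60, 0.70]      # calls very little

print(svg_call_metrics(truth, permissive))  # sensitivity 1.0, specificity 1/3, FDR 0.4
print(svg_call_metrics(truth, strict))      # sensitivity 1/3, specificity 1.0, FDR 0.0
```

Note that the strict method still ranks all true SVGs first (auPRC of 1.0); its low sensitivity comes from the fixed cutoff, which is why threshold-free and threshold-based metrics can tell different stories.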
4. Cell–Cell Communication Analysis
Cells “talk” to one another by sending chemical ligands that bind to receptors on other cells, setting in motion various biological pathways [52]. Some of these signals pass between adjacent (touching) cells, while others are sent to cells further away [53]. After the spatial location of cells is determined and their gene expression evaluated, a further step can be to determine whether cells are communicating with one another. Typically, this is done by finding expression of a ligand in one group of cells and of the corresponding receptor in another [54,55,56]. Communication is inferred if a matching ligand–receptor pair is expressed by cells at a proximity befitting that pair. With ST data, this proximity can be calculated precisely, which bulk RNA-seq and scRNA-seq cannot provide. Unfortunately, there is currently no way to validate that co-expression of a ligand–receptor pair guarantees communication, which is a caveat of this analysis. However, CCC analysis could be paired with experimental methods that measure interactions directly, such as fluorescence resonance energy transfer (FRET) [57], which uses tagged donor and acceptor molecules to probe interactions, or supported planar lipid bilayers [58], which allow imaging of protein interaction and organization.
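A minimal sketch of the general idea, written for this review rather than taken from any benchmarked package: score a ligand–receptor pair by the mean ligand × receptor product over sender–receiver cell pairs within a chosen radius, then compare against a null built by shuffling cell-type labels. The radius, the scoring rule, and the toy expression values are all assumptions; tools such as CellPhoneDB and CellChat use their own statistics and curated databases.

```python
import numpy as np
from scipy.spatial import cKDTree


def neighbor_pairs(coords, radius):
    """Directed index pairs (i, j) for all cells within `radius` of each other."""
    pairs = cKDTree(coords).query_pairs(r=radius, output_type="ndarray")
    pairs = np.asarray(pairs, dtype=int).reshape(-1, 2)
    # query_pairs lists each unordered pair once; keep both directions
    return (np.concatenate([pairs[:, 0], pairs[:, 1]]),
            np.concatenate([pairs[:, 1], pairs[:, 0]]))


def lr_score(i, j, cell_types, ligand, receptor, sender, receiver):
    """Mean ligand(i) * receptor(j) over nearby sender -> receiver pairs."""
    keep = (cell_types[i] == sender) & (cell_types[j] == receiver)
    if not keep.any():
        return 0.0
    return float(np.mean(ligand[i[keep]] * receptor[j[keep]]))


def lr_permutation_test(coords, cell_types, ligand, receptor, sender, receiver,
                        radius, n_perm=500, seed=0):
    """Compare the observed score with scores after shuffling cell-type labels."""
    rng = np.random.default_rng(seed)
    cell_types = np.asarray(cell_types)
    i, j = neighbor_pairs(coords, radius)
    observed = lr_score(i, j, cell_types, ligand, receptor, sender, receiver)
    null = np.array([
        lr_score(i, j, rng.permutation(cell_types), ligand, receptor, sender, receiver)
        for _ in range(n_perm)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# 20 x 20 grid: type "A" on the left, type "B" on the right (hypothetical data)
xs, ys = np.meshgrid(np.arange(20), np.arange(20))
x = xs.ravel()
coords = np.column_stack([x, ys.ravel()]).astype(float)
cell_types = np.where(x < 10, "A", "B")
ligand = (x == 9).astype(float)     # ligand only in A cells at the border
receptor = (x == 10).astype(float)  # receptor only in B cells at the border

print(lr_permutation_test(coords, cell_types, ligand, receptor, "A", "B", radius=1.01))  # (1.0, ~0.002)
print(lr_permutation_test(coords, cell_types, ligand, receptor, "B", "A", radius=1.01))  # (0.0, 1.0)
```

The radius encodes the assumed signaling range: a small radius approximates contact-dependent signaling, while a larger one would be needed for secreted ligands that act at a distance.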
Liu et al. [59] compared the concordance of the ligand–receptor pairs returned by 16 CCC software packages, both against simulated data and between the packages themselves. They created a metric, the distance enrichment score (DES), that uses the difference between expected (from simulated datasets) and observed (from real datasets) spatial distance tendencies to assess the accuracy of CCC analysis software (Table 3). CellChat [60], ICELLNET [61], and CellPhoneDB [62] all performed very well with most datasets; all three were fast, light on resources, and consistent when given spatial data. CellChat also integrates regulatory information, which boosted its ability to identify ligand–receptor pairs. SingleCellSignalR [63] also performed well but was resource-heavy and had low precision. NicheNet [64] was consistently second with many datasets and uses a distinctive network-based method, whereas the other methods mentioned are statistics-based. NicheNet was very resource-light and handled sparse spatial data well but suffered from a high FDR.
Table 3.
Summary of cell–cell communication analysis benchmarking studies.
| Author | Software | Accuracy Metric | Dataset | Production Method/Technology | Computer Environment |
|---|---|---|---|---|---|
| Liu et al. | CellCall (v0.0.0.9000), CellChat (v1.0.0), CellPhoneDB (v2), CellPhoneDB (v3), Connectome (v1.0.1), CytoTalk (v4.0.11), Domino (v0.1.1), Giotto (v1.0.4), ICELLNET (v0.99.3), iTALK (v0.1.0), NATMI [65], NicheNet (v1.0.0), scMLnet (v0.1.0), SingleCellSignalR (v1.4.0), stLearn (v0.4.7) | Distance enrichment score (DES): a calculation that quantifies the consistency between the expected and observed distances of ligand–receptor pairs | Human pancreatic ductal adenocarcinoma | ST | AMD EPYC 7552, 48 cores, 566 GB RAM |
| | | | Human squamous cell carcinoma | Visium V1 | |
| | | | Mouse cortex | Visium | |
| | | | Human heart | Visium | |
| | | | Human intestine | Visium | |
In keeping with the previous sections, CCC software accuracy was heavily dependent on the dataset analyzed. In CCC, this was compounded by the fact that many packages draw interactions from entirely different ligand–receptor databases, so two options may return different results merely because of the database packaged with them. Finally, software that predicted more interactions had higher recall (sensitivity), while software that returned fewer interactions had higher precision (positive predictive value).
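The precision–recall trade-off and the effect of the bundled database can be made concrete with a small set-based sketch. The ligand–receptor pair names, the “reference” set of true interactions, and the database contents below are hypothetical placeholders, not drawn from any real resource.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted ligand-receptor pairs vs. a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


reference = {("L1", "R1"), ("L2", "R2"), ("L3", "R3")}
broad = reference | {("L4", "R4"), ("L5", "R5"), ("L6", "R6")}  # many calls
narrow = {("L1", "R1")}                                         # few calls

print(precision_recall(broad, reference))   # (0.5, 1.0): high recall
print(precision_recall(narrow, reference))  # (1.0, 0.333...): high precision

# A tool can only report pairs present in its bundled database; if that
# database lacks ("L3", "R3"), recall is capped no matter how good the method is.
database = {("L1", "R1"), ("L2", "R2"), ("L4", "R4")}
print(precision_recall(broad & database, reference))  # (0.666..., 0.666...)
```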
5. Deconvolution
As some ST technologies do not reach single-cell resolution, transcripts are captured from small sections of tissue called “spots”, each of which may contain several cells. Deconvolution predicts the proportion of each cell type within each spot. A number of software packages employing a variety of algorithms exist (Table 4); the benchmarks tested them for accuracy against real and simulated datasets while noting the runtime and resources required.
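To illustrate how reference-based deconvolution works in principle, the sketch below estimates one spot’s cell-type proportions by non-negative least squares (NNLS) against cell-type signature profiles averaged from an annotated scRNA-seq reference. This is a deliberately simple baseline assumed here for illustration; it is not the model used by Cell2location, Tangram, RCTD, or any other benchmarked tool, and the signature matrix is invented.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_spot(spot_expression, signatures):
    """Estimate cell-type proportions for one spot.

    spot_expression: (n_genes,) expression measured in the spot.
    signatures: (n_genes, n_cell_types) mean expression of each cell type,
        e.g., averaged per annotated cluster in an scRNA-seq reference.
    Solves min ||signatures @ w - spot_expression|| subject to w >= 0,
    then rescales w to sum to one.
    """
    weights, _residual = nnls(signatures, spot_expression)
    total = weights.sum()
    return weights / total if total > 0 else weights


# Hypothetical signatures: 4 genes x 3 cell types
signatures = np.array([
    [10.0, 0.0, 1.0],
    [0.0, 8.0, 1.0],
    [1.0, 1.0, 12.0],
    [2.0, 3.0, 0.0],
])
# A noise-free spot containing 5, 3, and 2 cells of each type
spot = signatures @ np.array([5.0, 3.0, 2.0])
print(deconvolve_spot(spot, signatures))  # approximately [0.5, 0.3, 0.2]
```

The non-negativity constraint is what keeps estimates interpretable as proportions; real methods add noise models, platform effects, and priors that this baseline ignores.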
Table 4.
Summary of deconvolution benchmarking studies.
| Author | Software | Accuracy Metric | Dataset | Production Method/Technology | Computer Environment |
|---|---|---|---|---|---|
| Li et al. | Berglund E et al. (v0.2.0), CARD (v1.0.0), Cell2location (v0.1), DestVI (scvi-tools 0.16.0), DSTG [66], NMFReg [67], NovoSpaRc (v0.4.4), RCTD (spacexr 2.0.0), SD2 [68], SpaOTsc [69], SpatialDecon [70], SpatialDWLS [71], Stdeconvolve (v1.0.0), stereoscope (v0.3), SpiceMix [72], SPOTlight (v0.99.0), STRIDE [73], Tangram (v1.0.3) | JSD score, RMSE, and PCC against annotated ground truth | Mouse brain medial pre-optic area | MERFISH | Intel Xeon E5-2680 v3, 2.50 GHz, 24 cores, 528 GB RAM; two Nvidia Quadro M6000 GPUs, 24 GB |
| | | | Mouse cortex | seqFISH+ | |
| | | | PDAC | ST | |
| | | | Mouse brain | Visium | |
| | | | Mouse hippocampus | Slide-seqV2 | |
| | | | Olfactory bulb | Stereo-seq | |
| | | | Zebrafish embryo | Stereo-seq | |
| Yan and Sun | Cell2location [74], DestVI [75], DSTG [66], Giotto/Hypergeometric [43], Giotto/PAGE [43], Giotto/rank [43], MIA [76], RCTD [77], Seurat [39], SpatialDecon [70], SpatialDWLS [71], Stdeconvolve [78], stereoscope [79], SPOTlight [80], STRIDE [73], Tangram [81] | RMSE, PCC, and JSD with synthetic datasets as ground truth | Mouse embryo | Sci-Space | 2.7 GHz, 112 cores |
| Li et al. | Cell2location [74], DestVI [75], DSTG [66], gimVI [82], LIGER [83], NovoSpaRc [84], RCTD [77], Seurat [39], SpaOTsc [69], SpatialDWLS [71], stereoscope [79], SPOTlight [80], StPlus [85], STRIDE [73], Tangram [81] | Pearson correlation coefficient between the expression vector in the ground-truth dataset and the expression vector predicted by each integration method | Mouse primary visual cortex (VISp) | BARISTAseq | CPU 2.2 GHz, 144 CPU cores; NVIDIA Tesla K80 GPU, 12 GB RAM |
| | | | Mouse primary visual cortex (VISp) | ExSeq | |
| | | | Drosophila embryo | FISH | |
| | | | Mouse olfactory bulb | HDST | |
| | | | Human MTG | ISS | |
| | | | Mouse primary visual cortex (VISp) | ISS | |
| | | | Human osteosarcoma | MERFISH | |
| | | | Mouse hypothalamic preoptic region | MERFISH | |
| | | | Mouse primary motor cortex | MERFISH | |
| | | | Mouse primary visual cortex (VISp) | MERFISH | |
| | | | Mouse somatosensory cortex | osmFISH | |
| | | | Mouse liver | Seq-scope | |
| | | | Mouse embryonic | seqFISH | |
| | | | Mouse gastrulation | seqFISH | |
| | | | Mouse hippocampus | seqFISH | |
| | | | Mouse cortex | seqFISH+ | |
| | | | Mouse olfactory bulb | seqFISH+ | |
| | | | Mouse primary motor cortex | Slide-seq | |
| | | | Mouse cerebellum | Slide-seqV2 | |
| | | | Mouse hippocampus | Slide-seqV2 | |
| | | | Human squamous carcinoma | ST | |
| | | | Mouse hippocampus | ST | |
| | | | Mouse prefrontal cortex | STARmap | |
| | | | Mouse visual cortex | STARmap | |
| | | | Human prostate | Visium | |
| | | | Mouse brain | Visium | |
| | | | Mouse breast cancer | Visium | |
| | | | Mouse embryo | Visium | |
| | | | Mouse hindlimb muscle | Visium | |
| | | | Mouse hippocampus | Visium | |
| | | | Mouse kidney | Visium | |
| | | | Mouse lymph node | Visium | |
| | | | Mouse MCA205 tumor | Visium | |
| | | | Mouse prefrontal cortex | Visium | |
| | | | Mouse primary motor cortex | Visium | |
| | | | Zebrafish melanoma | Visium | |
A first benchmark from Li et al. [86] showed that library preparation played a key role in the results: the choice of RNA library preparation and sequencing platform could affect deconvolution because of differences in gene expression profiles. In addition, differences between the scRNA-seq and ST datasets for the same sample presented a problem, as these methods assume that the cell populations in both are identical. Yan and Sun [87] explored sequencing depth in more detail than the other studies and found that most software held up well across sequencing depths but suffered when spot size became smaller. A final benchmark [88] found that data sparsity greatly affected the performance of integration methods that predict the spatial distribution of RNA transcripts.
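Table 4 lists RMSE, PCC, and JSD as the deconvolution accuracy metrics. A hedged sketch of how these are commonly computed on a spots × cell types proportion matrix follows; exactly how each benchmark aggregates them (per spot, per cell type, or over all entries) may differ, so the choices in the comments are assumptions, and the proportions are toy values.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon


def deconvolution_scores(true_props, pred_props):
    """Compare predicted and ground-truth cell-type proportions.

    Both inputs have shape (n_spots, n_cell_types), rows summing to one.
    RMSE and PCC are taken over all entries; JSD is averaged over spots.
    """
    t = np.asarray(true_props, dtype=float)
    p = np.asarray(pred_props, dtype=float)
    rmse = float(np.sqrt(np.mean((t - p) ** 2)))
    pcc = float(np.corrcoef(t.ravel(), p.ravel())[0, 1])
    # scipy returns the Jensen-Shannon *distance* (square root of the
    # divergence); squaring gives the divergence, here in bits (0 to 1)
    jsd = float(np.mean([jensenshannon(a, b, base=2) ** 2 for a, b in zip(t, p)]))
    return {"RMSE": rmse, "PCC": pcc, "JSD": jsd}


truth = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
prediction = np.array([[0.5, 0.4, 0.1], [0.3, 0.1, 0.6]])
print(deconvolution_scores(truth, prediction))
```

Lower RMSE and JSD and higher PCC indicate better agreement; reporting several metrics guards against a method that, for example, gets the ranking of cell types right but their magnitudes wrong.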
Cell2location [74] offered the highest accuracy of all the options, but at the cost of heavy resource requirements and long runtimes. Tangram [81] was nearly as accurate, with a much lighter resource load, shorter runtime, and high proficiency at predicting the spatial distribution of transcripts. RCTD [77] consistently ranked in the top five across all benchmarks and is offered as a solid alternative to the two methods above. Finally, one benchmark noted a method from Berglund et al. as a deconvolution option that does not require scRNA-seq data. However, its results compared unfavorably with those of other methods, and the authors recommended always using scRNA-seq data for deconvolution.
Unlike in tissue architecture identification, deconvolution’s best performers were heavily biased toward probabilistic and deep learning-based methods (NMF-based, graph-based, and optimal transport-based methods did comparatively poorly). All benchmarks noted that normalization greatly affected performance: raw counts were generally preferable to lognorm and scatternorm methods where available [86], the effect of normalization varied across sample types [87], and raw spatial data worked best with either raw or normalized scRNA-seq data [88]. One benchmark mentioned EnDecon [89], a software package that integrates multiple deconvolution results to improve accuracy, and concluded that the gain in accuracy was marginal compared with well-tuned individual methods. Once again, software accuracy varied widely by dataset, suggesting that algorithms are overfitted to specialized cases.
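To make the normalization choices concrete: a “lognorm”-style transform typically rescales each spot to a common total count and then applies log(1 + x). The sketch below is a generic version (the scale factor of 10,000 is a common convention, assumed here). It removes differences in sequencing depth between spots, whereas a method that models raw counts directly would instead receive the untransformed matrix.

```python
import numpy as np


def log_normalize(counts, scale=1e4):
    """Scale each spot (row) to `scale` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    library_size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / library_size * scale)


# Two hypothetical spots with the same composition but 10x different depth
raw = np.array([[1, 3, 0], [10, 30, 0]])
normalized = log_normalize(raw)
print(normalized)  # both rows are identical after normalization
```

The transform discards absolute depth and compresses large counts, which is one plausible reason raw counts can suit count-based deconvolution models better; spots with zero total counts would need to be filtered first to avoid division by zero.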
6. Computing Resource Requirements and Accuracy Metrics
Determining the best route for ST analysis requires considering several factors. First, fundamental factors, such as analysis accuracy and potential throughput, must be compared across software to determine the analysis timeframe and computing requirements. However, compatibility with the tissue sample under study and the choice of whether to normalize the data before analysis appear to be equally important in obtaining high-quality results.
To support decision-making for ST analysis software, we present a summary of multiple benchmarking results in a few concise formats below. First, the Figure 1 graphs provide a visual comparison of the accuracy and resource score of the top-performing software in each benchmarking paper. To calculate the resource score, runtime and RAM requirements were first normalized per study to a 0–1 scale. Then, the normalized runtime and RAM values were averaged per software package per study. Finally, this average was subtracted from 1 so that larger values correspond to shorter runtimes and less RAM usage (i.e., a “better” score). Hence, software closer to the top-right corner may be more desirable, as it provides higher accuracy while using fewer resources. Second, Table 5 shows more details from each benchmarking paper reviewed here so that our software comparison can be understood in context, e.g., the computing environment used to evaluate software in each benchmarking paper. Finally, based on these benchmarking results, Figure 2 shows a decision tree that provides an at-a-glance recommendation for an initial or exploratory analysis.
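The resource score described above can be written in a few lines. Min-max scaling is assumed as the way runtime and RAM were mapped to 0–1 (the text does not name the scaling). The example uses the three Yuan et al. rows from Table 5 that have exact numeric runtime and RAM values, so the scores are illustrative only and need not match Figure 1. The equal-values branch matters for rows such as the Li et al. SVG entries, where every runtime is 45 min.

```python
import numpy as np


def resource_scores(runtime_min, ram_gb):
    """Resource score = 1 - mean(normalized runtime, normalized RAM).

    Each metric is min-max scaled to 0-1 within one study (assumed scaling).
    A metric that is identical for every package carries no information
    and is scaled to 0 rather than dividing by zero.
    """
    def minmax(values):
        v = np.asarray(values, dtype=float)
        span = v.max() - v.min()
        return np.zeros_like(v) if span == 0 else (v - v.min()) / span

    return 1.0 - (minmax(runtime_min) + minmax(ram_gb)) / 2.0


# Yuan et al. rows from Table 5: BASS, BayesSpace, SpaGCN
runtime = [20.000, 41.667, 16.667]
ram = [2.5, 8.5, 1.5]
print(resource_scores(runtime, ram))  # approximately 0.86, 0.0, and 1.0
```

Because the scaling is per study, a score of 1.0 means “lightest within that benchmark”, not an absolute resource level, so scores should not be compared across studies run on different hardware.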
Figure 1.
Visual comparison of software by benchmarking study using the accuracy (y-axis) and resource score (x-axis). The accuracy metrics were provided by each paper. The resource score is the average normalized time and RAM requirements subtracted from one. Higher values are better in all cases. Cheng et al. [35], Hu et al. [36], Yuan et al. [37], Li et al. [49], Chen et al. [50], Liu et al. [59], Li et al. [86], Yan and Sun [87], Li et al. [88]. † RAM information not provided for resource score. * Cell2location ran out of RAM.
Table 5.
Summary of computing resource requirements and accuracy metrics for different software. Orange rows denote R coding language options. Blue rows denote Python coding language options. The best scoring option(s) for each metric by study are bolded.
Tissue Architecture Identification

Cheng et al.

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | ARI |
|---|---|---|---|---|---|---|
| BayesSpace (v1.00) | R | Visium with 2696–3353 cells and 31,053 genes | Information not provided | 31.623 | 5.495 | 0.820 |
| SpaGCN (v1.2.0) | Python | | | <1 | 1.000 | 0.990 |
| Seurat (v4.0.5) | R | | | <1 | 1.778 | 0.900 |

Hu et al.

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | Average ARI |
|---|---|---|---|---|---|---|
| BASS [11] | R | Visium DLPFC, HBCA1, and MB2SA datasets | Intel Xeon W-2195 CPU, 2.3 GHz, 36 CPU cores, 256 GB DDR4 RAM; four Quadro RTX A6000 GPUs, 48 GB RAM, 4608 CUDA cores | 316.228 | Data not provided | 0.450 |
| BayesSpace [12] | R | | | 630.957 | Data not provided | 0.400 |
| SpaGCN [4] | Python | | | 10.000 | Ran out of RAM | 0.420 |
| STAGATE [28] | Python | | | 19.953 | Data not provided | 0.500 |

Yuan et al.

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | NMI |
|---|---|---|---|---|---|---|
| BASS [11] | R | Visium DLPFC | Intel Xeon E5-2683v3, 2.00 GHz, 14 cores, 128 GB RAM; NVIDIA TITAN Xp GPU, 12 GB RAM | 20.000 | 2.5 | 0.800 |
| BayesSpace [12] | R | | | 41.667 | 8.5 | 0.750 |
| SpaGCN [4] | Python | | | 16.667 | 1.5 | 0.550 |
| STAGATE [28] | Python | | | 16.667 | <1 | 0.500 |

Spatially Variable Gene Discovery

Li et al.

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | auPRC |
|---|---|---|---|---|---|---|
| SpatialDE2 [42] | Python | Simulated, 100 genes and 40,000 spots | AMD EPYC 7H12 CPU, 64 cores, 1 TB RAM; A100 GPU, 40 GB RAM | 45 | 16 | 12.625 |
| SPARK-X (v1.1.1) | R | | | 45 | 6 | 11.875 |
| SOMDE (v0.1.7) | Python | | | 45 | 6 | 3.500 |
| Moran’s I (Squidpy v1.2.3) | Python | | | 45 | 6 | 11.000 |

Chen et al.

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | Ratio of Returned List to Ground-Truth List of SVGs |
|---|---|---|---|---|---|---|
| SPARK-X [48] | R | Combination of Visium datasets with ~12,000 genes and ~200 spots | Standard virtual machine, 16 OCPUs, 256 GB RAM | 10 | <1 | 0.990 |
| SOMDE [47] | Python | | | 10 | <1 | 0.950 |
| Moran’s I [45] | Python | | | 30 | 3 | 0.650 |

Cell–Cell Communication Analysis

Liu et al.

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | Median DES |
|---|---|---|---|---|---|---|
| CellChat (v1.0.0) | R | Aggregate of 15 simulated datasets | AMD EPYC 7552, 48 cores, 566 GB RAM | <1 | 4 | 0.082 |
| CellPhoneDB (v2) | Python | | | <1 | 4.5 | -0.037 |
| ICELLNET (v0.99.3) | R | | | <1 | 3.2 | 0.039 |
| NicheNet (v1.0.0) | R | | | 9.167 | 4 | -0.322 |
| SingleCellSignalR (v1.4.0) | R | | | 16.667 | 11 | -0.279 |

Deconvolution

Li et al.

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | PCC |
|---|---|---|---|---|---|---|
| Cell2location (v0.1) | Python | Time: MERFISH mouse brain, ~4750 cells, 135 genes; PCC: average across 5 real-world datasets | Intel Xeon E5-2680 v3, 2.50 GHz, 24 cores, 528 GB RAM; two Nvidia Quadro M6000 GPUs, 24 GB | 91.050 | Data not provided | 0.197 |
| Tangram (v1.0.3) | Python | | | 3.867 | Data not provided | 0.407 |
| CARD (v1.0.0) | R | | | 8.950 | Data not provided | 0.425 |
| RCTD (spacexr 2.0.0) | R | | | 102.117 | Data not provided | 0.386 |

Yan and Sun

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | PCC |
|---|---|---|---|---|---|---|
| Cell2location [74] | Python | Average of 3 real-world datasets | 2.7 GHz, 112 cores | 95.000 | 4.000 | 0.900 |
| Tangram [81] | Python | | | <1 | 1.000 | 0.800 |
| RCTD [77] | R | | | 4.333 | 2.667 | 0.850 |

Li et al.

| Software | Language | Dataset | Computer Environment | Time (min) | RAM (GB) | PCC |
|---|---|---|---|---|---|---|
| Cell2location [74] | Python | Time and RAM: simulated dataset with 20,000 spots and 10,000 cells; accuracy: average across 32 simulated datasets | CPU 2.2 GHz, 144 CPU cores; NVIDIA Tesla K80 GPU, 12 GB RAM | Out of RAM | Out of RAM | 0.897 |
| Tangram [81] | Python | | | 28.800 | 2.500 | 0.588 |
| RCTD [77] | R | | | 30.700 | 71.000 | 0.606 |
Figure 2.
A simple decision tree to assist with determining software choices for ST analysis.
7. Conclusions
The emerging field of spatial transcriptomics combines gene expression with spatial information to provide a detailed look into the workings of a given tissue sample. Within the ST analysis process, a wide variety of software is available, with each having unique traits that may affect the accuracy, time, and required computing resources. Multiple groups have performed benchmarking studies to tease out the differences between the available software packages to help researchers make informed choices for their analysis pipeline. Here, these benchmarking results were gathered and analyzed to further narrow the software choices to ones most likely to fit well into an ST analysis workflow.
Several key hurdles for the field of ST analysis became apparent while preparing this review. First, nearly all benchmarking studies noted that software performance depended strongly on the dataset under analysis. Because potential samples are highly diverse, software will need to provide accurate and consistent analyses across a wide breadth of tissues. As the benchmarking authors noted, this dataset dependence points to possible overfitting of algorithms, which can lead to unreliable results. Second, and perhaps related, some authors pointed to the lack of available annotated reference data. Some tissues have no matching reference (at least in the same species), some references lack cell type annotations, and some reference datasets are broken into smaller anatomical regions that do not cover the entire tissue under study. While reference databases such as Aquila [90], STOmicsDB [91,92], and the Spatial TranscriptOmics Analysis Resource (SOAR) [93] have emerged, they do not yet offer fully annotated data that is easily usable in an ST analysis pipeline. A catalog of well-annotated cell types with expression data, perhaps selectable by organ or tissue type, would be immensely helpful in furthering ST research.
While this review aimed to be thorough, it has several limitations. First, the ST field is young and constantly evolving. New methods, software, analysis types, and workflows are published monthly, so this review can offer only a snapshot of the resources available at the time of writing. As noted, even within the papers reviewed, results varied by software, particularly in the result lists returned by SVG discovery and CCC analysis tools. Further analyses and additional data may help refine these packages as the relationship between software output and biological reality becomes better understood.
Second, the available ST software is too extensive to cover comprehensively. To allow direct software comparisons, this review was limited to current benchmarking publications and therefore to the tools those studies tested. Tools that have not yet been included in a benchmarking study may have been overlooked. There is also potential for bias if the authors of a benchmarking study preferred certain software packages. This risk was partly mitigated by drawing on multiple benchmarking studies where available and recommending software that performed consistently well across them.
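The mitigation described above favors software that performed consistently well across multiple studies. As a hypothetical sketch of one way to quantify that consistency (not the procedure used in this review), the example below ranks tools within each benchmark, because PCC values from different datasets and hardware are not directly comparable, and then reports each tool's mean rank and rank range. It uses the PCC values from the deconvolution table for Cell2location, Tangram, and RCTD, the three tools that appear in all three blocks; the key "first block" is a placeholder label for the block whose study name appears earlier in the table.

```python
from typing import Dict, List, Tuple

# PCC values copied from the deconvolution table, restricted to the three
# tools present in every benchmark block. "first block" is a placeholder key.
PCC: Dict[str, Dict[str, float]] = {
    "first block": {"Cell2location": 0.197, "Tangram": 0.407, "RCTD": 0.386},
    "Yan and Sun": {"Cell2location": 0.900, "Tangram": 0.800, "RCTD": 0.850},
    "Li et al.": {"Cell2location": 0.897, "Tangram": 0.588, "RCTD": 0.606},
}


def within_study_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank tools inside one study, 1 = highest PCC (ties are not handled)."""
    ordered = sorted(scores, key=lambda tool: scores[tool], reverse=True)
    return {tool: position + 1 for position, tool in enumerate(ordered)}


def rank_summary(pcc_by_study: Dict[str, Dict[str, float]]) -> Dict[str, Tuple[float, int]]:
    """Return {tool: (mean rank, max rank - min rank)} across studies."""
    ranks: Dict[str, List[int]] = {}
    for scores in pcc_by_study.values():
        for tool, rank in within_study_ranks(scores).items():
            ranks.setdefault(tool, []).append(rank)
    return {tool: (sum(r) / len(r), max(r) - min(r)) for tool, r in ranks.items()}


if __name__ == "__main__":
    summary = rank_summary(PCC)
    for tool, (avg, spread) in sorted(summary.items(), key=lambda kv: kv[1][0]):
        print(f"{tool}: mean rank {avg:.2f}, rank range {spread}")
```

A small rank range signals consistency across benchmarks, while the mean rank captures overall standing; neither accounts for run time, memory, or the differing datasets behind each block.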
Finally, this review is a literature-based summary and proposal and includes no benchmarking of its own. For optimal recommendations, each proposed software package should be integrated into a pipeline and evaluated both as an individual step and as part of the whole process, because changing parameters or algorithms early in a pipeline may affect the results of later steps. Ideally, the proposed pipelines would be benchmarked against each other on several datasets and a variety of computing platforms before definitive conclusions are drawn; this remains future work.
Abbreviations
The following abbreviations are used in this manuscript:
NGS | Next-generation sequencing |
scRNA-seq | Single-cell RNA-seq |
ST | Spatial transcriptomics |
SVG | Spatially variable gene |
CCC | Cell–cell communication |
H&E | Hematoxylin and Eosin |
HVG | Highly variable gene |
FDR | False discovery rate |
DES | Distance enrichment score |
Author Contributions
Conceptualization, J.G. and D.C.; formal analysis, J.G.; investigation, J.G.; resources, D.C.; data curation, J.G.; writing—original draft preparation, J.G., D.C., M.P. and M.-A.S.; writing—review and editing, J.G., D.C., M.P. and M.-A.S.; supervision, D.C.; project administration, D.C.; funding acquisition, D.C. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Funding Statement
This research was funded by the National Human Genome Research Institute, grant number R21 HG012482; the National Institute of General Medical Sciences, grant number R01 GM152585; the National Institute on Aging, grant number U54 AG075931; and the Pelotonia Institute for Immuno-Oncology (PIIO). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
References
1. Emrich S.J., Barbazuk W.B., Li L., Schnable P.S. Gene Discovery and Annotation Using LCM-454 Transcriptome Sequencing. Genome Res. 2007;17:69–73. doi: 10.1101/gr.5145806.
2. Wang Z., Gerstein M., Snyder M. RNA-Seq: A Revolutionary Tool for Transcriptomics. Nat. Rev. Genet. 2009;10:57–63. doi: 10.1038/nrg2484.
3. Haque A., Engel J., Teichmann S.A., Lönnberg T. A Practical Guide to Single-Cell RNA-Sequencing for Biomedical Research and Clinical Applications. Genome Med. 2017;9:75. doi: 10.1186/s13073-017-0467-4.
4. Choi J.R., Yong K.W., Choi J.Y., Cowie A.C. Single-Cell RNA Sequencing and Its Combination with Protein and DNA Analyses. Cells. 2020;9:1130. doi: 10.3390/cells9051130.
5. Williams C.G., Lee H.J., Asatsuma T., Vento-Tormo R., Haque A. An Introduction to Spatial Transcriptomics for Biomedical Research. Genome Med. 2022;14:68. doi: 10.1186/s13073-022-01075-1.
6. Rao A., Barkley D., França G.S., Yanai I. Exploring Tissue Architecture Using Spatial Transcriptomics. Nature. 2021;596:211–220. doi: 10.1038/s41586-021-03634-9.
7. Liang G., Yin H., Ding F. Technical Advances and Applications of Spatial Transcriptomics. GEN Biotechnol. 2023;2:384–398. doi: 10.1089/genbio.2023.0032.
8. Tian L., Chen F., Macosko E.Z. The Expanding Vistas of Spatial Transcriptomics. Nat. Biotechnol. 2023;41:773–782. doi: 10.1038/s41587-022-01448-2.
9. Hu Y., Zhao Y., Schunk C.T., Ma Y., Derr T., Zhou X.M. ADEPT: Autoencoder with differentially expressed genes and imputation for robust spatial transcriptomics clustering. iScience. 2023;26:106792. doi: 10.1016/j.isci.2023.106792.
10. Singhal V., Chou N., Lee J., Yue Y., Liu J., Chock W.K., Lin L., Chang Y.-C., Teo E.M.L., Aow J., et al. BANKSY unifies cell typing and tissue domain segmentation for scalable spatial omics data analysis. Nat. Genet. 2024;56:431–441. doi: 10.1038/s41588-024-01664-3.
11. Li Z., Zhou X. BASS: Multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 2022;23:168. doi: 10.1186/s13059-022-02734-7.
12. Zhao E., Stone M.R., Ren X., Guenthoer J., Smythe K.S., Pulliam T., Williams S.R., Uytingco C.R., Taylor S.E.B., Nghiem P., et al. Spatial Transcriptomics at Subspot Resolution with BayesSpace. Nat. Biotechnol. 2021;39:1375–1384. doi: 10.1038/s41587-021-00935-2.
13. Li J., Chen S., Pan X., Yuan Y., Shen H.-B. Cell Clustering for Spatial Transcriptomics Data with Graph Neural Networks. Nat. Comput. Sci. 2022;2:399–408. doi: 10.1038/s43588-022-00266-5.
14. Zeng Y., Yin R., Luo M., Chen J., Pan Z., Lu Y., Yu W., Yang Y. Deciphering spatial domains by integrating histopathological image and transcriptomics via contrastive learning. bioRxiv. 2022. doi: 10.1101/2022.09.30.510297.
15. Zong Y., Yu T., Wang X., Wang Y., Hu Z., Li Y. conST: An interpretable multi-modal contrastive learning framework for spatial transcriptomics. bioRxiv. 2022. doi: 10.1101/2022.01.14.476408.
16. Xu C., Jin X., Wei S., Wang P., Luo M., Xu Z., Yang W., Cai Y., Xiao L., Lin X., et al. DeepST: Identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res. 2022;50:e131. doi: 10.1093/nar/gkac901.
17. Liu W., Liao X., Yang Y., Lin H., Yeong J., Zhou X., Shi X., Liu J. Joint dimension reduction and clustering analysis of single-cell RNA-seq and spatial transcriptomics data. Nucleic Acids Res. 2022;50:e72. doi: 10.1093/nar/gkac219.
18. Jones A., Townes F.W., Li D., Engelhardt B.E. Alignment of spatial genomics data using deep Gaussian processes. Nat. Methods. 2023;20:1379–1387. doi: 10.1038/s41592-023-01972-2.
19. Long Y., Ang K.S., Li M., Chong K.L.K., Sethi R., Zhong C., Xu H., Ong Z., Sachaphibulkij K., Chen A., et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat. Commun. 2023;14:1155. doi: 10.1038/s41467-023-36796-3.
20. Zeira R., Land M., Strzalkowski A., Raphael B.J. Alignment and integration of spatial transcriptomics data. Nat. Methods. 2022;19:567–575. doi: 10.1038/s41592-022-01459-6.
21. Liu X., Zeira R., Raphael B.J. Partial alignment of multislice spatially resolved transcriptomics data. Genome Res. 2023;33:1124–1132. doi: 10.1101/gr.277670.123.
22. Liu W., Liao X., Luo Z., Yang Y., Lau M.C., Jiao Y., Shi X., Zhai W., Ji H., Yeong J., et al. Probabilistic embedding, clustering, and alignment for integrating spatial transcriptomics data with PRECAST. Nat. Commun. 2023;14:296. doi: 10.1038/s41467-023-35947-w.
23. Xu H., Fu H., Long Y., Ang K.S., Sethi R., Chong K., Li M., Uddamvathanak R., Lee H.K., Ling J., et al. Unsupervised spatially embedded deep representation of spatial transcriptomics. Genome Med. 2024;16:12. doi: 10.1186/s13073-024-01283-x.
24. Ren H., Walker B.L., Cang Z., Nie Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat. Commun. 2022;13:4076. doi: 10.1038/s41467-022-31739-w.
25. Xu H., Wang S., Fang M., Luo S., Chen C., Wan S., Wang R., Tang M., Xue T., Li B., et al. SPACEL: Deep learning-based characterization of spatial transcriptome architectures. Nat. Commun. 2023;14:7603. doi: 10.1038/s41467-023-43220-3.
26. Shang L., Zhou X. Spatially aware dimension reduction for spatial transcriptomics. Nat. Commun. 2022;13:7203. doi: 10.1038/s41467-022-34879-1.
27. Guo T., Yuan Z., Pan Y., Wang J., Chen F., Zhang M.Q., Li X. SPIRAL: Integrating and aligning spatially resolved transcriptomics data across different experiments, conditions, and technologies. Genome Biol. 2023;24:241. doi: 10.1186/s13059-023-03078-6.
28. Dong K., Zhang S. Deciphering Spatial Domains from Spatially Resolved Transcriptomics with an Adaptive Graph Attention Auto-Encoder. Nat. Commun. 2022;13:1739. doi: 10.1038/s41467-022-29439-6.
29. Clifton K., Anant M., Aihara G., Atta L., Aimiuwu O.K., Kebschull J.M., Miller M.I., Tward D., Fan J. STalign: Alignment of spatial transcriptomics data using diffeomorphic metric mapping. Nat. Commun. 2023;14:8123. doi: 10.1038/s41467-023-43915-7.
30. Zhou X., Dong K., Zhang S. Integrating spatial transcriptomics data across different conditions, technologies and developmental stages. Nat. Comput. Sci. 2023;3:894–906. doi: 10.1038/s43588-023-00528-w.
31. Wolf F.A., Angerer P., Theis F.J. SCANPY: Large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. doi: 10.1186/s13059-017-1382-0.
32. Traag V.A., Waltman L., Van Eck N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019;9:5233. doi: 10.1038/s41598-019-41695-z.
33. Cang Z., Ning X., Nie A., Xu M., Zhang J. SCAN-IT: Domain Segmentation of Spatial Transcriptomics images by Graph Neural Network. BMVC. 2021;32:406. doi: 10.5244/c.35.320.
34. Pham D., Tan X., Balderson B., Xu J., Grice L.F., Yoon S., Willis E.F., Tran M., Lam P.Y., Raghubar A., et al. Robust mapping of spatiotemporal trajectories and cell–cell interactions in healthy and diseased tissues. Nat. Commun. 2023;14:7739. doi: 10.1038/s41467-023-43120-6.
35. Cheng A., Hu G., Li W.V. Benchmarking Cell-Type Clustering Methods for Spatially Resolved Transcriptomics Data. Brief. Bioinform. 2022;24:bbac475. doi: 10.1093/bib/bbac475.
36. Hu Y., Xie M., Li Y., Rao M., Shen W., Luo C., Qin H., Baek J., Zhou X.M. Benchmarking Clustering, Alignment, and Integration Methods for Spatial Transcriptomics. Genome Biol. 2024;25:212. doi: 10.1186/s13059-024-03361-0.
37. Yuan Z., Zhao F., Lin S., Zhao Y., Yao J., Cui Y., Zhang X.-Y., Zhao Y. Benchmarking Spatial Clustering Methods with Spatially Resolved Transcriptomics Data. Nat. Methods. 2024;21:712–722. doi: 10.1038/s41592-024-02215-8.
38. Hu J., Li X., Coleman K., Schroeder A., Ma N., Irwin D.J., Lee E.B., Shinohara R.T., Li M. SpaGCN: Integrating Gene Expression, Spatial Location and Histology to Identify Spatial Domains and Spatially Variable Genes by Graph Convolutional Network. Nat. Methods. 2021;18:1342–1351. doi: 10.1038/s41592-021-01255-8.
39. Hao Y., Stuart T., Kowalski M.H., Choudhary S., Hoffman P., Hartman A., Srivastava A., Molla G., Madad S., Fernandez-Granda C., et al. Dictionary Learning for Integrative, Multimodal and Scalable Single-Cell Analysis. Nat. Biotechnol. 2024;42:293–304. doi: 10.1038/s41587-023-01767-y.
40. Li Q., Zhang M., Xie Y., Xiao G. Bayesian modeling of spatial molecular profiling data via Gaussian process. Bioinformatics. 2021;37:4129–4136. doi: 10.1093/bioinformatics/btab455.
41. BinTayyash N., Georgaka S., John S.T., Ahmed S., Boukouvalas A., Hensman J., Rattray M. Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments. Bioinformatics. 2021;37:3788–3795. doi: 10.1093/bioinformatics/btab486.
42. Svensson V., Teichmann S.A., Stegle O. SpatialDE: Identification of Spatially Variable Genes. Nat. Methods. 2018;15:343–346. doi: 10.1038/nmeth.4636.
43. Dries R., Zhu Q., Dong R., Eng C.-H.L., Li H., Liu K., Fu Y., Zhao T., Sarkar A., Bao F., et al. Giotto: A toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. 2021;22:78. doi: 10.1186/s13059-021-02286-2.
44. Miller B.F., Bambah-Mukku D., Dulac C., Zhuang X., Fan J. Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities. Genome Res. 2021;31:1843–1855. doi: 10.1101/gr.271288.120.
45. Hao Y., Hao S., Andersen-Nissen E., Mauck W.M., Zheng S., Butler A., Lee M.J., Wilk A.J., Darby C., Zager M., et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–3587.e29. doi: 10.1016/j.cell.2021.04.048.
46. Weber L.M., Saha A., Datta A., Hansen K.D., Hicks S.C. nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes. Nat. Commun. 2023;14:4059. doi: 10.1038/s41467-023-39748-z.
47. Hao M., Hua K., Zhang X. SOMDE: A Scalable Method for Identifying Spatially Variable Genes with Self-Organizing Map. Bioinformatics. 2021;37:4392–4398. doi: 10.1093/bioinformatics/btab471.
48. Zhu J., Sun S., Zhou X. SPARK-X: Non-Parametric Modeling Enables Scalable and Robust Detection of Spatial Expression Patterns for Large Spatial Transcriptomic Studies. Genome Biol. 2021;22:184. doi: 10.1186/s13059-021-02404-0.
49. Li Z., Patel Z.M., Song D., Yan G., Li J.J., Pinello L. Benchmarking Computational Methods to Identify Spatially Variable Genes and Peaks. bioRxiv. 2023. doi: 10.1101/2023.12.02.569717.
50. Chen C., Kim H.J., Yang P. Evaluating Spatially Variable Gene Detection Methods for Spatial Transcriptomics Data. Genome Biol. 2024;25:18. doi: 10.1186/s13059-023-03145-y.
51. Moran P.A.P. Notes on Continuous Stochastic Phenomena. Biometrika. 1950;37:17–23. doi: 10.1093/biomet/37.1-2.17.
52. Zhou X., Franklin R.A., Adler M., Jacox J.B., Bailis W., Shyer J.A., Flavell R.A., Mayo A., Alon U., Medzhitov R. Circuit Design Features of a Stable Two-Cell System. Cell. 2018;172:744–757.e17. doi: 10.1016/j.cell.2018.01.015.
53. Hu M., Polyak K. Microenvironmental Regulation of Cancer Development. Curr. Opin. Genet. Dev. 2008;18:27–34. doi: 10.1016/j.gde.2007.12.006.
54. Shao X., Lu X., Liao J., Chen H., Fan X. New Avenues for Systematically Inferring Cell-Cell Communication: Through Single-Cell Transcriptomics Data. Protein Cell. 2020;11:866–880. doi: 10.1007/s13238-020-00727-5.
55. Armingol E., Officer A., Harismendy O., Lewis N.E. Deciphering Cell–Cell Interactions and Communication from Gene Expression. Nat. Rev. Genet. 2021;22:71–88. doi: 10.1038/s41576-020-00292-x.
56. Ma F., Zhang S., Song L., Wang B., Wei L., Zhang F. Applications and Analytical Tools of Cell Communication Based on Ligand-Receptor Interactions at Single Cell Level. Cell Biosci. 2021;11:121. doi: 10.1186/s13578-021-00635-z.
57. Sekar R.B., Periasamy A. Fluorescence Resonance Energy Transfer (FRET) Microscopy Imaging of Live Cell Protein Localizations. J. Cell Biol. 2003;160:629–633. doi: 10.1083/jcb.200210140.
58. Jackman J.A., Cho N.-J. Supported Lipid Bilayer Formation: Beyond Vesicle Fusion. Langmuir. 2020;36:1387–1400. doi: 10.1021/acs.langmuir.9b03706.
59. Liu Z., Sun D., Wang C. Evaluation of Cell-Cell Interaction Methods by Integrating Single-Cell RNA Sequencing Data with Spatial Information. Genome Biol. 2022;23:218. doi: 10.1186/s13059-022-02783-y.
60. Jin S., Plikus M.V., Nie Q. CellChat for Systematic Analysis of Cell–Cell Communication from Single-Cell Transcriptomics. Nat. Protoc. 2025;20:180–219. doi: 10.1038/s41596-024-01045-4.
61. Noël F., Massenet-Regad L., Carmi-Levy I., Cappuccio A., Grandclaudon M., Trichot C., Kieffer Y., Mechta-Grigoriou F., Soumelis V. Dissection of Intercellular Communication Using the Transcriptome-Based Framework ICELLNET. Nat. Commun. 2021;12:1089. doi: 10.1038/s41467-021-21244-x.
62. Efremova M., Vento-Tormo M., Teichmann S.A., Vento-Tormo R. CellPhoneDB: Inferring Cell–Cell Communication from Combined Expression of Multi-Subunit Ligand–Receptor Complexes. Nat. Protoc. 2020;15:1484–1506. doi: 10.1038/s41596-020-0292-x.
63. Cabello-Aguilar S., Alame M., Kon-Sun-Tack F., Fau C., Lacroix M., Colinge J. SingleCellSignalR: Inference of Intercellular Networks from Single-Cell Transcriptomics. Nucleic Acids Res. 2020;48:e55. doi: 10.1093/nar/gkaa183.
64. Browaeys R., Saelens W., Saeys Y. NicheNet: Modeling Intercellular Communication by Linking Ligands to Target Genes. Nat. Methods. 2020;17:159–162. doi: 10.1038/s41592-019-0667-5.
65. Hou R., Denisenko E., Ong H.T., Ramilowski J.A., Forrest A.R.R. Predicting cell-to-cell communication networks using NATMI. Nat. Commun. 2020;11:5011. doi: 10.1038/s41467-020-18873-z.
66. Song Q., Su J. DSTG: Deconvoluting spatial transcriptomics data through graph-based artificial intelligence. Brief. Bioinform. 2020;22:bbaa414. doi: 10.1093/bib/bbaa414.
67. Rodriques S.G., Stickels R.R., Goeva A., Martin C.A., Murray E., Vanderburg C.R., Welch J., Chen L.M., Chen F., Macosko E.Z. Slide-seq: A scalable technology for measuring genome-wide expression at high spatial resolution. Science. 2019;363:1463–1467. doi: 10.1126/science.aaw1219.
68. Li H., Li H., Zhou J., Gao X. SD2: Spatially resolved transcriptomics deconvolution through integration of dropout and spatial information. Bioinformatics. 2022;38:4878–4884. doi: 10.1093/bioinformatics/btac605.
69. Cang Z., Nie Q. Inferring spatial and signaling relationships between cells from single cell transcriptomic data. Nat. Commun. 2020;11:2084. doi: 10.1038/s41467-020-15968-5.
70. Danaher P., Kim Y., Nelson B., Griswold M., Yang Z., Piazza E., Beechem J.M. Advances in mixed cell deconvolution enable quantification of cell types in spatial transcriptomic data. Nat. Commun. 2022;13:385. doi: 10.1038/s41467-022-28020-5.
71. Dong R., Yuan G.-C. SpatialDWLS: Accurate deconvolution of spatial transcriptomic data. Genome Biol. 2021;22:145. doi: 10.1186/s13059-021-02362-7.
72. Chidester B., Zhou T., Alam S., Ma J. SpiceMix enables integrative single-cell spatial modeling of cell identity. Nat. Genet. 2023;55:78–88. doi: 10.1038/s41588-022-01256-z.
73. Sun D., Liu Z., Li T., Wu Q., Wang C. STRIDE: Accurately decomposing and integrating spatial transcriptomics using single-cell RNA sequencing. Nucleic Acids Res. 2022;50:e42. doi: 10.1093/nar/gkac150.
74. Kleshchevnikov V., Shmatko A., Dann E., Aivazidis A., King H.W., Li T., Elmentaite R., Lomakin A., Kedlian V., Gayoso A., et al. Cell2location Maps Fine-Grained Cell Types in Spatial Transcriptomics. Nat. Biotechnol. 2022;40:661–671. doi: 10.1038/s41587-021-01139-4.
75. Lopez R., Li B., Keren-Shaul H., Boyeau P., Kedmi M., Pilzer D., Jelinski A., David E., Wagner A., Addad Y., et al. Multi-resolution deconvolution of spatial transcriptomics data reveals continuous patterns of inflammation. bioRxiv. 2021. doi: 10.1101/2021.05.10.443517.
76. Moncada R., Barkley D., Wagner F., Chiodin M., Devlin J.C., Baron M., Hajdu C.H., Simeone D.M., Yanai I. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 2020;38:333–342. doi: 10.1038/s41587-019-0392-8.
77. Cable D.M., Murray E., Zou L.S., Goeva A., Macosko E.Z., Chen F., Irizarry R.A. Robust Decomposition of Cell Type Mixtures in Spatial Transcriptomics. Nat. Biotechnol. 2022;40:517–526. doi: 10.1038/s41587-021-00830-w.
78. Miller B.F., Huang F., Atta L., Sahoo A., Fan J. Reference-free cell type deconvolution of multi-cellular pixel-resolution spatially resolved transcriptomics data. Nat. Commun. 2022;13:2339. doi: 10.1038/s41467-022-30033-z.
79. Andersson A., Bergenstråhle J., Asp M., Bergenstråhle L., Jurek A., Navarro J.F., Lundeberg J. Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography. Commun. Biol. 2020;3:565. doi: 10.1038/s42003-020-01247-y.
80. Elosua-Bayes M., Nieto P., Mereu E., Gut I., Heyn H. SPOTlight: Seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res. 2021;49:e50. doi: 10.1093/nar/gkab043.
81. Biancalani T., Scalia G., Buffoni L., Avasthi R., Lu Z., Sanger A., Tokcan N., Vanderburg C.R., Segerstolpe Å., Zhang M., et al. Deep Learning and Alignment of Spatially Resolved Single-Cell Transcriptomes with Tangram. Nat. Methods. 2021;18:1352–1362. doi: 10.1038/s41592-021-01264-7.
82. Lopez R., Nazaret A., Langevin M., Samaran J., Regier J., Jordan M.I., Yosef N. A joint model of unpaired data from scRNA-seq and spatial transcriptomics for imputing missing gene expression measurements. arXiv. 2019. doi: 10.48550/arXiv.1905.02269.
83. Welch J.D., Kozareva V., Ferreira A., Vanderburg C., Martin C., Macosko E.Z. Single-Cell multi-omic integration compares and contrasts features of brain cell identity. Cell. 2019;177:1873–1887.e17. doi: 10.1016/j.cell.2019.05.006.
84. Nitzan M., Karaiskos N., Friedman N., Rajewsky N. Gene expression cartography. Nature. 2019;576:132–137. doi: 10.1038/s41586-019-1773-3.
85. Chen S., Zhang B., Chen X., Zhang X., Jiang R. stPlus: A reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics. 2021;37(Suppl. 1):i299–i307. doi: 10.1093/bioinformatics/btab298.
86. Li H., Zhou J., Li Z., Chen S., Liao X., Zhang B., Zhang R., Wang Y., Sun S., Gao X. A Comprehensive Benchmarking with Practical Guidelines for Cellular Deconvolution of Spatial Transcriptomics. Nat. Commun. 2023;14:1548. doi: 10.1038/s41467-023-37168-7.
87. Yan L., Sun X. Benchmarking and Integration of Methods for Deconvoluting Spatial Transcriptomic Data. Bioinformatics. 2022;39:btac805. doi: 10.1093/bioinformatics/btac805.
88. Li B., Zhang W., Guo C., Xu H., Li L., Fang M., Hu Y., Zhang X., Yao X., Tang M., et al. Benchmarking Spatial and Single-Cell Transcriptomics Integration Methods for Transcript Distribution Prediction and Cell Type Deconvolution. Nat. Methods. 2022;19:662–670. doi: 10.1038/s41592-022-01480-9.
89. Tu J.-J., Li H.-S., Yan H., Zhang X.-F. EnDecon: Cell Type Deconvolution of Spatially Resolved Transcriptomics Data via Ensemble Learning. Bioinformatics. 2023;39:btac825. doi: 10.1093/bioinformatics/btac825.
90. Aquila. Cheunglab.org. 2024. Available online: https://aquila.cheunglab.org/ (accessed on 11 May 2025).
91. Xu Z., Wang W., Yang T., Li L., Ma X., Chen J., Wang J., Huang Y., Gould J., Lu H., et al. STOmicsDB: A comprehensive database for spatial transcriptomics data sharing, analysis and visualization. Nucleic Acids Res. 2024;52:D1053–D1061. doi: 10.1093/nar/gkad933.
92. STOmicsDB: Spatial TranscriptOmics DataBase. 2024. Available online: https://db.cngb.org/stomics/ (accessed on 11 May 2025).
93. SOAR: Spatial TranscriptOmics Analysis Resource. Northwestern.edu. 2018. Available online: https://soar.fsm.northwestern.edu/ (accessed on 11 May 2025).