Abstract
Motivation
Spatial transcriptomic techniques at cellular resolution have developed rapidly and attracted increasing attention. However, the unique characteristics of large-scale cellular resolution spatial transcriptomic datasets, such as the limited number of transcripts captured per spot and the vast number of spots, pose significant challenges to current cell-type deconvolution methods.
Results
In this study, we introduce stVAE, a method based on the variational autoencoder framework for deconvolving the cell-type composition of cellular resolution spatial transcriptomic datasets. To assess the performance of stVAE, we applied it to five datasets from three different biological tissues. In the Stereo-seq and Slide-seqV2 datasets of the mouse brain, stVAE accurately reconstructs the laminar structure of the pyramidal cell layers in the cortex, which are organized mainly by subtypes of telencephalon projecting excitatory neurons. In the Stereo-seq dataset of the E12.5 mouse embryo, stVAE resolves the complex spatial patterns of osteoblast subtypes, as supported by the expression of their marker genes. In the Stereo-seq and Pixel-seq datasets of the mouse olfactory bulb, stVAE accurately delineates the spatial distributions of known cell types. In summary, stVAE can accurately identify the spatial patterns of cell types and their relative proportions across spots in cellular resolution spatial transcriptomic data, making it a useful tool for understanding the heterogeneity of cell populations and their interactions within tissues.
Availability and implementation
stVAE is available in GitHub (https://github.com/lichen2018/stVAE) and Figshare (https://figshare.com/articles/software/stVAE/23254538).
1 Introduction
Spatial transcriptomics is a revolutionary molecular profiling method that measures the mRNA expression levels of genes in biological tissue while simultaneously providing spatial information. This technique presents a unique opportunity to identify spatial patterns of cell types and to uncover cellular heterogeneity within tissues (Michaela et al. 2020). Spatial transcriptomics technologies such as 10× Visium have been widely used for systematic profiling of spatially resolved transcriptomes (Stähl et al. 2016). With a spot diameter of 55 μm, 10× Visium captures about 1–10 cells per spot, so each spot may contain multiple cell types. To resolve the cell types present in each spot, several methods have been developed, including DestVI (Lopez et al. 2022), RCTD (Cable et al. 2021), Stereoscope (Andersson et al. 2020, Gayoso et al. 2022), and Spotlight (Elosua-Bayes et al. 2021). In this article, Stereoscope refers to its reimplementation in scvi-tools (Gayoso et al. 2022). These methods use single-cell RNA sequencing (scRNA-seq) data as a reference to infer the cell-type proportions of each spot.
Sequencing-based spatial transcriptomics technologies that achieve cellular resolution are emerging, including Slide-seqV2 (Stickels et al. 2021), Stereo-seq (Chen et al. 2022), and Pixel-seq (Xiaonan et al. 2022), among others. These technologies pose several unique challenges for method development. First, the datasets they generate tend to be much larger: the number of profiled spatial spots can range from 10 000 to 500 000 (Supplementary Table S1), so deconvolution can be extremely time-consuming and memory-intensive. Second, each spot contains only a few cells, so the cell-type composition of each spot is expected to be very sparse. Third, the datasets tend to be noisier: the mean total unique molecular identifier (UMI) count per spot is very low (Supplementary Table S2). Existing well-established methods (e.g. RCTD, DestVI, and cell2location) do not address all of these challenges, which restricts their application to large-scale cellular resolution spatial transcriptomic data. This highlights the need for a memory-efficient method that can accurately infer the sparse cell-type composition of spots in large-scale spatial transcriptomic datasets with cellular resolution.
We developed stVAE, which employs a variational autoencoder framework to decompose cell-type mixtures in cellular resolution spatial transcriptomic data. stVAE scales to large datasets and has a shorter running time (see Supplementary Material). For small spatial transcriptomic datasets (i.e. those with fewer spatial spots), which may not contain enough data to train stVAE, we construct a pseudo-spatial transcriptomic dataset to guide training. More importantly, stVAE accurately captures the sparsity of cell-type composition in the spots of cellular resolution spatial transcriptomic data. By applying stVAE to sequencing-based spatial transcriptomic data generated from different platforms and tissues, we demonstrate that it accurately decomposes cell-type mixtures in cellular resolution spatial transcriptomic data.
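Constructing a pseudo spot from scRNA-seq reference cells can be illustrated with a minimal sketch. It assumes that a pseudo spot is formed by summing the UMI counts of a fixed number of reference cells drawn uniformly at random without replacement, with the known proportions given by the cell-type frequencies among the drawn cells. The function name `make_pseudo_spot`, its arguments, and this sampling scheme are hypothetical; the scheme used by stVAE may differ.

```python
import random
from collections import Counter


def make_pseudo_spot(cells, labels, cell_types, n_cells, rng):
    """Build one hypothetical pseudo spot from scRNA-seq reference cells.

    cells      : list of per-cell UMI count vectors (equal-length lists of ints)
    labels     : cell-type label of each reference cell
    cell_types : ordered list of all cell types (fixes the order of the proportions)
    n_cells    : number of reference cells pooled into the pseudo spot (assumed)
    rng        : a random.Random instance, for reproducibility
    Returns (summed UMI counts, known cell-type proportion vector).
    """
    # Draw n_cells distinct reference cells uniformly at random.
    idx = rng.sample(range(len(cells)), n_cells)

    # Pseudo-spot expression: gene-wise sum of the drawn cells' counts.
    counts = [0] * len(cells[0])
    for i in idx:
        for g, c in enumerate(cells[i]):
            counts[g] += c

    # Ground truth: fraction of drawn cells belonging to each cell type.
    type_counts = Counter(labels[i] for i in idx)
    proportions = [type_counts[t] / n_cells for t in cell_types]
    return counts, proportions


# Toy reference: three cells, two genes, two cell types.
cells = [[1, 0], [2, 3], [0, 5]]
labels = ["A", "B", "B"]
counts, props = make_pseudo_spot(cells, labels, ["A", "B"], n_cells=2, rng=random.Random(0))
print(counts, props)
```

Because the proportions of such pseudo spots are known by construction, they can serve as supervised training examples alongside the unlabeled real spots.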
2 Materials and methods
2.1 Statistical model
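Our model consists of an encoder network $E$ and a decoder network $D_\omega$. The encoder takes the UMI count vector $X_i$ of spot $i$ as input and outputs the mean vector $\boldsymbol{\mu}^{z}_i$ and the vector $\boldsymbol{\sigma}^{2}_i$ of diagonal elements of the covariance matrix of the normal distribution $\mathcal{N}\big(\boldsymbol{\mu}^{z}_i, \operatorname{diag}(\boldsymbol{\sigma}^{2}_i)\big)$. We sample the latent feature $Z_i$ from this distribution using the reparameterization trick.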
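The reparameterization trick can be sketched in a few lines of plain Python. This is a minimal sketch, not the stVAE code (which is implemented in PyTorch): it assumes the encoder outputs the mean and the log of the diagonal variances, a common parameterization that keeps the variance positive, and `reparameterize` is a hypothetical name.

```python
import math
import random


def reparameterize(mean, log_var, rng):
    """Sample z ~ N(mean, diag(exp(log_var))) as z = mean + sigma * eps.

    All randomness is isolated in eps ~ N(0, 1), so z is a deterministic,
    differentiable function of the encoder outputs (mean, log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


rng = random.Random(0)
z = reparameterize([0.5, -1.0], [math.log(4.0), 0.0], rng)  # sigma = 2 and 1
print(z)
```

Because eps does not depend on the encoder, gradients of the training loss can flow back through `mean` and `log_var`.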
The decoder network $D_\omega$, with trainable parameters $\omega$, takes $Z_i$ as input and generates the cell-type proportion vector $Y_i = (y_{i1}, \dots, y_{iT})$, where $y_{it}$ represents the proportion of cell type $t$ at spot $i$ and $T$ is the number of cell types. We assume that the spatial expression data $X_i$ follow a negative binomial distribution.
Formally, the model of stVAE is as follows:
(1) |
(2) |
(3) |
(4) |
(5) |
where μig represents the mean expression level of gene g at spot i, βg represents the gene-specific dispersion parameter, utg represents the mean expression level of gene g in cell type t, sg represents the gene-specific scaling parameter, and γg denotes the gene-specific additive noise. In our model, utg and βg are estimated from the scRNA-seq reference data using the scvi-tools package. Combining Equations (4) and (5), we obtain the likelihood of Xi:
(6) |
where . Apart from stVAE, we also considered three alternative models. The first, denoted stVAE_Poisson, differs from stVAE only in that the negative binomial distribution is replaced with a Poisson distribution for modeling the spatial expression data $X_i$. The second models the spatial expression $X_{ig}$ with a zero-inflated negative binomial (ZINB) distribution, $X_{ig} \sim \mathrm{ZINB}(\mu_{ig}, \beta_g, \tau_g)$, where $\mu_{ig}$ represents the mean expression level of gene $g$ at spot $i$, $\beta_g$ represents the gene-specific dispersion parameter, and $\tau_g$ represents the gene-specific dropout parameter. The third replaces the variational autoencoder in stVAE with a deep neural network (DNN) (details in Supplementary Fig. S9) that has the same hidden and output layers as the decoder; the DNN takes $X_i$ as input and outputs $Y_i$. We compared these models with stVAE on the mouse brain Stereo-seq dataset (Chen et al. 2022) and on the mouse olfactory bulb (MOB) Stereo-seq (Chen et al. 2022) and Pixel-seq (Xiaonan et al. 2022) datasets (Supplementary Figs S10 and S11); none of them performed as well as stVAE.
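The count models compared in this section can be made concrete with a short plain-Python sketch. The mean μig = sg ∑t yit utg + γg used here is one plausible reading of the parameter definitions above and is an assumption, not a transcription of Equations (1)–(5). The negative binomial is written in its mean/dispersion form (variance μig + μig²/βg), which tends to the Poisson model of stVAE_Poisson as βg grows; the ZINB adds an extra zero with probability τg. All function names are illustrative.

```python
import math


def nb_log_pmf(x, mu, beta):
    """log NB(x | mean=mu, dispersion=beta); Var[x] = mu + mu**2 / beta (mu, beta > 0)."""
    return (math.lgamma(x + beta) - math.lgamma(beta) - math.lgamma(x + 1)
            + beta * math.log(beta / (beta + mu))
            + x * math.log(mu / (beta + mu)))


def poisson_log_pmf(x, mu):
    """log Poisson(x | mu): the count model of stVAE_Poisson."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)


def zinb_log_pmf(x, mu, beta, tau):
    """Zero-inflated NB with dropout probability tau (0 <= tau < 1)."""
    if x == 0:
        # A zero can arise from dropout or from the NB component itself.
        return math.log(tau + (1.0 - tau) * math.exp(nb_log_pmf(0, mu, beta)))
    return math.log(1.0 - tau) + nb_log_pmf(x, mu, beta)


def spot_mean(y_i, u, s, gamma):
    """ASSUMED mean: mu_ig = s_g * sum_t y_it * u_tg + gamma_g (illustrative form)."""
    return [s[g] * sum(y_i[t] * u[t][g] for t in range(len(y_i))) + gamma[g]
            for g in range(len(s))]


def spot_log_likelihood(x_i, y_i, u, s, gamma, beta):
    """NB log-likelihood of one spot, summed over genes (treated as independent)."""
    mu = spot_mean(y_i, u, s, gamma)
    return sum(nb_log_pmf(x, m, b) for x, m, b in zip(x_i, mu, beta))


# Toy example: 2 cell types x 2 genes.
u = [[2.0, 0.0], [0.0, 4.0]]    # u[t][g]: mean expression of gene g in cell type t
y = [0.5, 0.5]                  # cell-type proportions at the spot
s, gamma, beta = [1.0, 2.0], [0.1, 0.1], [2.0, 5.0]
print(spot_mean(y, u, s, gamma))                     # approximately [1.1, 4.1]
print(spot_log_likelihood([1, 3], y, u, s, gamma, beta))
```

In a deconvolution setting, `y` would come from the decoder, and the negative of this log-likelihood would enter the training loss.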
2.2 Variational inference
We implement the variational autoencoder (VAE) framework (Kingma and Welling 2013) for variational inference. The posterior distribution of the latent feature $Z_i$ is approximated by a tractable normal distribution $q(Z_i \mid X_i)$ output by the encoder. This approximation is obtained by minimizing the Kullback–Leibler (KL) divergence between $q(Z_i \mid X_i)$ and the true posterior $p(Z_i \mid X_i)$:
(7) |
where $p(Z_i)$ is the prior distribution of $Z_i$.
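When the prior p(Zi) is a standard normal distribution (the usual VAE choice, assumed here because the prior is not specified above) and the approximate posterior is a diagonal normal, the KL term that appears in the VAE objective has a closed form: KL = 1/2 ∑k (σk² + μk² − 1 − log σk²). A minimal sketch with a hypothetical function name, taking the encoder's variances on the log scale:

```python
import math


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ) in closed form (nats)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mean, log_var))


print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: q equals the prior
print(kl_to_standard_normal([1.0], [0.0]))            # 0.5: unit shift of the mean
```

The closed form avoids Monte Carlo noise in this term; only the reconstruction term then needs sampled latent features.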
Define the evidence lower bound (ELBO):
(8) |
Maximizing the ELBO is therefore equivalent to minimizing the following objective function:
(9) |
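For reference, the relation behind Equations (7)–(9) can be written in the standard VAE form (Kingma and Welling 2013). This is the generic identity rather than a transcription of the stVAE equations; in particular, it does not show how the pseudo spots enter the objective.

$$
\begin{aligned}
\mathrm{KL}\big(q(Z_i\mid X_i)\,\|\,p(Z_i\mid X_i)\big)
 &= \mathbb{E}_{q(Z_i\mid X_i)}\big[\log q(Z_i\mid X_i) - \log p(X_i\mid Z_i) - \log p(Z_i)\big] + \log p(X_i) \\
 &= \log p(X_i) - \mathrm{ELBO}(X_i), \\
\mathrm{ELBO}(X_i)
 &= \mathbb{E}_{q(Z_i\mid X_i)}\big[\log p(X_i\mid Z_i)\big] - \mathrm{KL}\big(q(Z_i\mid X_i)\,\|\,p(Z_i)\big).
\end{aligned}
$$

Because $\log p(X_i)$ does not depend on $q$, raising the ELBO lowers the KL divergence to the true posterior by the same amount. Summed over spots, the generic quantity to minimize is $-\sum_i \mathrm{ELBO}(X_i)$, whose expectation term is typically estimated with reparameterized samples of $Z_i$.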
3 Results
3.1 Overview of stVAE
The framework of stVAE is shown in Fig. 1. The network architecture of stVAE consists of encoder and decoder networks (Fig. 1a). The network takes the gene expression of both pseudo spots and real spatial spots as input and outputs their inferred cell-type proportions. The pseudo spots are generated from a reference scRNA-seq dataset. Because their true cell-type proportions are known, they serve as a supervised component that facilitates training of the neural networks in stVAE. To reduce noise in the raw data, stVAE encodes the gene expression of each spot into low-dimensional latent features. This dimension reduction preserves the essential information and helps remove some of the noise in the input data (Im et al. 2017, Nguyen and Holmes 2019). To capture the sparsity of cell-type composition in the spots of cellular resolution spatial transcriptomic data, we use a Sparsemax (Martins and Astudillo 2016) layer as the output layer. Through mini-batch training, stVAE scales to large datasets. To reduce processing time, stVAE is implemented in PyTorch and can be accelerated by GPUs.
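To see why a Sparsemax output layer yields sparse proportions, the sketch below implements the sparsemax projection of Martins and Astudillo (2016) for a single spot in plain Python. It sorts the scores, finds the support size k(z), computes the threshold τ, and returns max(zj − τ, 0), a probability vector that, unlike softmax, can contain exact zeros. This is an illustrative re-implementation, not the PyTorch layer used in stVAE.

```python
def sparsemax(z):
    """Euclidean projection of the score vector z onto the probability simplex
    (Martins and Astudillo 2016). Unlike softmax, entries can be exactly zero."""
    z_sorted = sorted(z, reverse=True)
    cumsum, k_z, cumsum_at_k = 0.0, 0, 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cumsum += zk
        # Support condition: the k-th largest score stays in the support.
        if 1.0 + k * zk > cumsum:
            k_z, cumsum_at_k = k, cumsum
    tau = (cumsum_at_k - 1.0) / k_z  # threshold subtracted from every score
    return [max(zi - tau, 0.0) for zi in z]


# Scores for three hypothetical cell types at one spot.
print(sparsemax([3.0, 1.0, 0.2]))   # [1.0, 0.0, 0.0]
print(sparsemax([1.0, 0.8, -1.0]))  # [0.6, 0.4, 0.0] up to floating-point rounding
```

Softmax applied to the same scores would assign every cell type a strictly positive proportion, which is unrealistic for spots containing only a few cells.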
The intuition behind stVAE is illustrated in Fig. 1b using the E12.5 mouse embryo Stereo-seq spatial transcriptomic dataset. The encoder network embeds the spots in a low-dimensional space Z in which different cell types are well separated. Taking advantage of this, the decoder network takes the latent feature Z as input and can readily estimate the cell-type composition of the spots. To validate the performance of stVAE, we also conducted a simulation study (see Supplementary Material).
3.2 Application of stVAE on a cellular resolution spatial transcriptomic data of mouse brain
To assess the effectiveness of stVAE on real tissues, we first applied it to a cellular resolution spatial transcriptomic dataset of the mouse brain generated with Stereo-seq (Chen et al. 2022). The mouse brain (Stereo-seq) dataset has a spatial resolution of 10 μm and comprises 251 760 spots (bin 20, i.e. 20 × 20 DNA nanoballs aggregated into each spot). We used a public mouse brain scRNA-seq dataset (Zeisel et al. 2018) as the reference.
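The "bin 20" aggregation can be sketched as follows, assuming each DNA nanoball record is given as integer coordinates plus a gene name and a UMI count. The record layout and the function name are hypothetical and do not reflect the actual Stereo-seq file format. Each nanoball is assigned to the spot (x // bin_size, y // bin_size), and counts are summed per spot and gene.

```python
from collections import defaultdict


def bin_counts(records, bin_size=20):
    """Aggregate nanoball-level counts into bin_size x bin_size spots.

    records: iterable of (x, y, gene, umi_count) with integer coordinates.
    Returns {(bin_x, bin_y): {gene: total_umi_count}}.
    """
    spots = defaultdict(lambda: defaultdict(int))
    for x, y, gene, count in records:
        spots[(x // bin_size, y // bin_size)][gene] += count
    return {spot: dict(genes) for spot, genes in spots.items()}


records = [(0, 0, "Ahcyl2", 1), (19, 19, "Ahcyl2", 2), (20, 0, "Ahcyl2", 1), (5, 25, "Sox11", 3)]
print(bin_counts(records))
# {(0, 0): {'Ahcyl2': 3}, (1, 0): {'Ahcyl2': 1}, (0, 1): {'Sox11': 3}}
```

Larger bins raise the UMI count per spot at the cost of spatial resolution; the same function with `bin_size=14` would mirror the bin 14 setting used for the MOB (Stereo-seq) data.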
We first assessed the ability of stVAE to identify regionally enriched cell types. For example, stVAE correctly localized dentate gyrus granule neurons (DGGRC2) to the dentate gyrus (Zeisel et al. 2018), which is strongly supported by the expression of its top-ranked marker gene Ahcyl2 (Supplementary Fig. S2a). When we zoomed in on the dentate gyrus (Supplementary Fig. S2b), the densely packed cells in the histology image matched well with both the distribution of DGGRC2 inferred by stVAE and the expression of Ahcyl2.
Next, we compared stVAE with DestVI and Stereoscope in mapping 23 subtypes of telencephalon projecting excitatory neurons (TEGLU) to the cortical pyramidal layers. RCTD and Spotlight were not included in this comparison because their high memory usage prevented them from running on this dataset. DestVI failed to infer the proportions of most TEGLU subtypes (Supplementary Fig. S3). We first visualized the inferred proportions of five TEGLU subtypes and observed that, compared with Stereoscope, stVAE identified more distinct patterns, with higher proportions of each cell type in its localized area. These patterns are supported by the high expression of the corresponding top two marker genes ranked by P-value in Zeisel et al. (2018) (Fig. 2a). stVAE thus reproduced the laminar structure of the pyramidal cell layers in the mouse cortex. To quantitatively assess the performance of stVAE on all 23 TEGLU subtypes, we selected the top two marker genes (ranked by P-value) for each subtype and calculated Spearman’s rank correlation between its inferred cell-type proportion and the marker gene expression across all spots. All marker gene–cell type pairs were pooled in the boxplots (Fig. 2b). The higher correlations obtained by stVAE support the accuracy of its deconvolution of TEGLU subtypes. We also computed Moran’s I score, which measures the spatial autocorrelation of the inferred cell-type proportions (Moran 1950). The higher Moran’s I scores (Fig. 2c) indicate that the TEGLU subtype distributions inferred by stVAE have stronger spatial patterns. Finally, to compare the overall performance of stVAE and Stereoscope on the mouse brain (Stereo-seq) dataset, we calculated Spearman’s rank correlation between the inferred proportions of the 224 cell types and the expression of their 446 top-ranked marker genes (Zeisel et al. 2018). The higher Spearman’s rank correlations (Fig. 2d) confirm the accuracy of stVAE deconvolution on the mouse brain (Stereo-seq) dataset.
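Both evaluation metrics are straightforward to compute. The sketch below assumes that Spearman’s correlation uses average ranks for ties (relevant because many spots have zero counts) and that Moran’s I uses binary weights over a user-supplied neighbor list, for example adjacent bins. The neighbor definition behind Fig. 2c is not specified here, so it is an assumption, as are the function names. Moran’s I is I = (N/W) ∑i∑j wij (xi − x̄)(xj − x̄) / ∑i (xi − x̄)², where W = ∑i∑j wij.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho = Pearson correlation of average ranks; NaN if either input is constant."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return float("nan")
    return cov / math.sqrt(va * vb)


def morans_i(values, neighbors):
    """Moran's I with binary weights: w_ij = 1 if j is listed in neighbors[i]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    w_total = sum(len(js) for js in neighbors.values())
    return (n / w_total) * num / sum(d * d for d in dev)


# 1-D strip of four spots, each adjacent to its immediate neighbors.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1.0, 1.0, 0.0, 0.0], neighbors))  # positive: similar values cluster
print(morans_i([1.0, 0.0, 1.0, 0.0], neighbors))  # negative: neighboring values alternate
print(spearman([0.0, 0.1, 0.5, 0.9], [0, 0, 2, 5]))
```

In the evaluation described above, `a` would be a cell type's inferred proportions and `b` a marker gene's counts over the same spots.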
3.3 stVAE identifies cell types in a large-scale cellular resolution spatial transcriptomic data of E12.5 mouse embryo
We next applied stVAE to identify cell types in a large-scale cellular resolution spatial transcriptomic dataset of the E12.5 mouse embryo generated with Stereo-seq (Chen et al. 2022). The dataset has a spatial resolution of 10 μm and comprises 318 364 spots (bin 20, i.e. 20 × 20 DNA nanoballs aggregated into each spot). The scRNA-seq reference dataset of the mouse embryo was obtained from the mouse organogenesis cell atlas (Cao et al. 2019).
To benchmark stVAE, DestVI, and Stereoscope in mapping cell subtypes with complex spatial patterns, we considered subtypes of osteoblasts, the cells responsible for synthesizing bone tissue, which play a crucial role in skeletal development and remodeling (Dirckx et al. 2019). We focused on five osteoblast subtypes with distinct spatial patterns and used their top-ranked marker genes (Cao et al. 2019) to evaluate their inferred proportions. stVAE identified these osteoblast subtypes more accurately than DestVI and Stereoscope (Fig. 3), as supported by the expression of their corresponding marker genes. For example, the spatial pattern of osteoblasts-15 is difficult to discern with DestVI and Stereoscope, but its marker gene Camk1d shows a clear spatial pattern that closely matches the proportion of osteoblasts-15 inferred by stVAE. stVAE can therefore help elucidate the intricate spatial patterns of osteoblast subtypes, as corroborated by the spatial patterns of their top-ranked marker genes.
3.4 stVAE accurately localized cell types in cellular resolution spatial transcriptomic data of MOB
Finally, we applied stVAE to localize cell types in two cellular resolution spatial transcriptomic datasets of the mouse olfactory bulb (MOB), generated with Stereo-seq and Pixel-seq, respectively. Both datasets have a spatial resolution of 10 μm. The MOB (Stereo-seq) dataset (Chen et al. 2022) comprises 107 416 spots (bin 14, i.e. 14 × 14 DNA nanoballs aggregated into each spot). The MOB (Pixel-seq) dataset (Xiaonan et al. 2022) comprises 115 590 spots (33 × 33 bin). We used a public MOB scRNA-seq dataset (Tepe et al. 2018) as the reference. The coronal MOB has a clear anatomical structure organized into six layers: the rostral migratory stream (RMS), granule cell layer (GCL), mitral cell layer (MCL), external plexiform layer (EPL), glomerular layer (GL), and olfactory nerve layer (ONL) (Fu et al. 2021) (Fig. 4a).
We first used the MOB (Stereo-seq) dataset to evaluate the overall performance of stVAE across all 40 cell types. Specifically, for each cell type, we calculated Spearman’s rank correlation coefficient and the Jensen–Shannon (JS) distance between the expression of its top two marker genes (Tepe et al. 2018) and its inferred proportion across all spots (Fig. 4b and c). stVAE tends to have higher correlations and lower JS distances than the other methods, suggesting that the cell-type proportions inferred by stVAE are more strongly supported by the observed marker gene expression. In addition, the cell-type proportions inferred by stVAE tend to have higher Moran’s I scores (Fig. 4d).
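One way to compute the JS distance between a marker gene’s expression and a cell type’s inferred proportion across spots is to rescale both non-negative vectors to sum to 1 over spots and take the square root of the Jensen–Shannon divergence. This normalization, the base-2 logarithm (which bounds the distance to [0, 1]), and the function name are assumptions for illustration; the text does not specify them.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (log base 2, so 0 <= d <= 1) between two
    non-negative vectors over the same spots, each rescaled to sum to 1."""
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_k = 0 contribute 0; b_k >= a_k / 2 > 0 whenever a_k > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    js_div = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(js_div, 0.0))  # clamp tiny negative rounding error


marker = [0, 2, 5, 1]           # hypothetical marker UMI counts over four spots
prop = [0.0, 0.3, 0.6, 0.1]     # hypothetical inferred proportions over the same spots
print(js_distance(marker, prop))
print(js_distance([1, 0], [0, 1]))  # 1.0: disjoint supports
```

Unlike Spearman’s correlation, this distance compares the shapes of the two spatial profiles directly, so a lower value means the inferred proportion is concentrated where the marker is expressed.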
Next, we focused on five cellular subtypes with distinct regional identities: astrocytes (Astro1), olfactory ensheathing cells (OEC4), developing immature neurons (n04-Immature), granule cells (n12-GC-6), and mitral and tufted (M/T) cells (n16-M/TC-2). Compared with other methods, the cell-type proportions inferred by stVAE are more distinct, with higher proportions in the areas where each cell type is localized, as supported by the expression of their marker genes (Fig. 4e). For example, Sox11, the top-ranked marker gene of n04-Immature, is highly expressed in the RMS, consistent with the higher proportion of n04-Immature inferred by stVAE in the same region (Kahle and Bix 2012). Furthermore, unlike the other methods, stVAE identified a distinct enrichment of n12-GC-6 in the superficial regions of the GCL, which is supported by the expression of the marker gene Cplx1 and by the literature (Tepe et al. 2018). These results suggest that stVAE better captures the complex spatial heterogeneity of cell types.
4 Discussion
The unique characteristics of large-scale cellular resolution spatial transcriptomic datasets, such as low UMI counts and sparse cell-type composition per spot, pose significant challenges to current cell-type deconvolution methods. To address these challenges, we developed stVAE. Compared with existing methods, stVAE encodes the gene expression of spots into low-dimensional latent features, a dimension-reduction step that helps reduce noise. In addition, the Sparsemax layer integrated into the model enables stVAE to accurately capture the sparsity of cell-type composition in real data.
The Spearman’s correlation values shown in Figs 2–4 are low. This is because the total UMI counts per spot tend to be low (shown in Supplementary Table S1), indicating a low capture rate and a high level of noise (Liu et al. 2023). As a result, the observed marker gene expression is highly sparse and noisy, which leads to low correlations with cell-type proportions across spots. Even so, the cell-type proportions inferred by stVAE correlate more strongly with marker gene expression than those inferred by the other methods.
To evaluate the design of stVAE, we compared it with the three alternative models described in Section 2.1. The results (Supplementary Fig. S10) indicate that the negative binomial distribution is sufficient for modeling these spatial transcriptomic data.
Integration of scRNA-seq and spatial transcriptomic data has been widely used in analyzing transcriptomic data. SpaGE (Abdelaal et al. 2020) and stPlus (Shengquan et al. 2021) perform joint embedding to identify a common latent space shared by scRNA-seq and spatial transcriptomic data. CARD (Ma and Zhou 2022) assumes a linear model between the mixed-cell expression matrix and the cell-type-specific expression matrix constructed from scRNA-seq data. PAST (Zhen et al. 2022) uses scRNA-seq data as one source for constructing a prior gene expression matrix, which provides reference information to the Bayesian neural network module in PAST during training. In contrast, stVAE uses maximum-likelihood estimation to derive the gene expression profile of each cell type from the scRNA-seq data and then incorporates these profiles into the VAE framework.
When high-quality reference scRNA-seq data are available, stVAE is well suited to processing cellular resolution spatial transcriptomic datasets, especially large-scale datasets containing more than 100 000 spots. As more cellular resolution spatial transcriptomic datasets become available, stVAE is poised to play an increasingly important role in their analysis.
Supplementary Material
Contributor Information
Chen Li, Department of Statistics, Chinese University of Hong Kong, Hong Kong 999077, China.
Ting-Fung Chan, School of Life Sciences, The Chinese University of Hong Kong, Hong Kong 999077, China; State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong 999077, China.
Can Yang, Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong 999077, China; Guangdong-Hong Kong-Macao Joint Laboratory for Data-Driven Fluid Mechanics and Engineering Applications, The Hong Kong University of Science and Technology, Hong Kong 999077, China.
Zhixiang Lin, Department of Statistics, Chinese University of Hong Kong, Hong Kong 999077, China.
Supplementary data
Supplementary data are available at Bioinformatics online.
Conflict of interest
None declared.
Funding
This work was supported by the Innovation and Technology Commission (ITC), Hong Kong Special Administrative Region Government (HKSAR) to the State Key Laboratory of Agrobiotechnology (CUHK). Any opinions, findings, conclusions, or recommendations expressed in this publication do not reflect the views of HKSAR or ITC. This work was supported by the Chinese University of Hong Kong startup grant (4930181), the Chinese University of Hong Kong Science Faculty’s Collaborative Research Impact Matching Scheme (CRIMS 4620033), Hong Kong University of Science and Technology Startup Grants (R9405, Z0428) from the Big Data Institute, and Hong Kong Research Grant Council (24301419, 14301120, 16301419, 16308120, 16307221, 16307322).
Data availability
References
- Abdelaal T, Mourragui S, Mahfouz A. et al. SpaGE: spatial gene enhancement using scRNA-seq. Nucleic Acids Res 2020;48:e107.
- Andersson A, Bergenstråhle J, Asp M. et al. Single-cell and spatial transcriptomics enables probabilistic inference of cell type topography. Commun Biol 2020;3:565.
- Cable DM, Murray E, Zou LS. et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 2022;40:517–26.
- Cao J, Spielmann M, Qiu X. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 2019;566:496–502.
- Chen A, Liao S, Cheng M. et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell 2022;185:1777–92.e21.
- Dirckx N, Moorer MC, Clemens TL. et al. The role of osteoblasts in energy homeostasis. Nat Rev Endocrinol 2019;15:651–65.
- Elosua-Bayes M, Nieto P, Mereu E. et al. SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes. Nucleic Acids Res 2021;49:50.
- Fu H, Xu H, Chong K. et al. Unsupervised Spatially Embedded Deep Representation of Spatial Transcriptomics. bioRxiv 2021, preprint: not peer reviewed.
- Gayoso A, Lopez R, Xing G. et al. A Python library for probabilistic analysis of single-cell omics data. Nat Biotechnol 2022;40:163–6.
- Im DJ, Ahn S, Memisevic R. et al. Denoising criterion for variational auto-encoding framework. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, San Francisco, CA, USA. Palo Alto, CA, USA: AAAI Press 2017, 2059–2065.
- Kahle MP, Bix GJ. Structural and chemical influences on neuronal migration in the adult rostral migratory stream. J Cell Sci Ther 2012;27:469–78.
- Kingma DP, Welling M. Auto-Encoding Variational Bayes. In: Bengio Y, LeCun Y (eds), Proceedings of the 2nd International Conference on Learning Representations, Banff, AB, Canada, 2014.
- Liu Y, Zhao J, Adams TS. et al. iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects. BMC Bioinformatics 2023;24:394–20.
- Lopez R, Li B, Keren-Shaul H. et al. DestVI identifies continuums of cell types in spatial transcriptomics data. Nat Biotechnol 2022;40:1360–9.
- Ma Y, Zhou X. Spatially informed cell-type deconvolution for spatial transcriptomics. Nat Biotechnol 2022;40:1349–59.
- Martins A, Astudillo R. From softmax to sparsemax: a sparse model of attention and Multi-Label classification. PMLR 2016;2016:1614–23.
- Michaela A, Joseph B, Joakim L. Spatially resolved transcriptomes-next generation tools for tissue exploration. BioEssays 2020;42:1900221.
- Moran PA. Notes on continuous stochastic phenomena. Biometrika 1950;37:17–23.
- Nguyen LH, Holmes S. Ten quick tips for effective dimensionality reduction. PLoS Comput Biol 2019;15:e1006907.
- Stähl PL, Salmén F, Vickovic S. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 2016;353:78–82.
- Shengquan C, Boheng Z, Xiaoyang C. et al. stPlus: a reference-based method for the accurate enhancement of spatial transcriptomics. Bioinformatics 2021;37:i299–307.
- Stickels RR, Murray E, Kumar P. et al. Highly sensitive spatial transcriptomics at near-cellular resolution with Slide-seqV2. Nat Biotechnol 2021;39:313–9.
- Tepe B, Hill MC, Pekarek BT. et al. Single-cell RNA-Seq of mouse olfactory bulb reveals cellular heterogeneity and activity-dependent molecular census of Adult-Born neurons. Cell Rep 2018;25:2689–703.e3.
- Xiaonan F, Li S, Runze D. et al. Polony gels enable amplifiable DNA stamping and spatial transcriptomics of chronic pain. Cell 2022;185:4621–33.
- Zeisel A, Hochgerner H, Lönnerberg P. et al. Molecular architecture of the mouse nervous system. Cell 2018;174:999–1014.e22.
- Zhen L, Xiaoyang C, Xuegong Z. et al. PAST: latent feature extraction with a prior-based self-attention framework for spatial transcriptomics. bioRxiv 2022.11.09.515447 2022, preprint: not peer reviewed.