Skip to main content
Cell Reports Methods logoLink to Cell Reports Methods
. 2022 Dec 13;2(12):100369. doi: 10.1016/j.crmeth.2022.100369

SpatialCorr identifies gene sets with spatially varying correlation structure

Matthew N Bernstein 1,, Zijian Ni 2, Aman Prasad 3, Jared Brown 2, Chitrasen Mohanty 4, Ron Stewart 1, Michael A Newton 2,4, Christina Kendziorski 4,5,∗∗
PMCID: PMC9795364  PMID: 36590683

Summary

Recent advances in spatially resolved transcriptomics technologies enable both the measurement of genome-wide gene expression profiles and their mapping to spatial locations within a tissue. A first step in spatial transcriptomics data analysis is identifying genes with expression that varies spatially, and robust statistical methods exist to address this challenge. While useful, these methods do not detect spatial changes in the coordinated expression within a group of genes. To this end, we present SpatialCorr, a method for identifying sets of genes with spatially varying correlation structure. Given a collection of gene sets pre-defined by a user, SpatialCorr tests for spatially induced differences in the correlation of each gene set within tissue regions, as well as between and among regions. An application to cutaneous squamous cell carcinoma demonstrates the power of the approach for revealing biological insights not identified using existing methods.

Keywords: spatial transcriptomics, differential correlation, statistical test

Graphical abstract

graphic file with name fx1.jpg

Highlights

  • SpatialCorr detects spatially varying correlation between genes across space

  • SpatialCorr tests for changes in correlation both within and between tissue regions

  • SpatialCorr tests for changes in correlation among sets of genes in addition to pairs

Motivation

A first step in spatial transcriptomics data analysis is identifying genes with expression that varies spatially, and robust statistical methods exist to address this challenge. While useful, these methods do not detect spatial changes in the coordinated expression within a group of genes. We present SpatialCorr, a method for identifying sets of genes with spatially varying correlation structure. Given a collection of gene sets pre-defined by a user, SpatialCorr tests for spatially induced differences in the correlation of each gene set within tissue regions, as well as between and among regions.


Bernstein et al. present SpatialCorr, a statistical method for finding groups of genes whose interactions change across space in tissue samples measured by spatially resolved transcriptomics technologies. SpatialCorr detects genes whose interactions change both within tissue regions as well as between regions.

Introduction

Spatial transcriptomics (ST) experiments provide spatially localized measurements of genome-wide gene expression, allowing investigators to address scientific questions that were elusive just a few years ago. Unlike single-cell RNA-sequencing (scRNA-seq) approaches in which a tissue sample is dissociated to produce a suspension of single cells, thereby losing information about each cell’s location within the tissue, ST experiments retain spatial information and are therefore essential for comprehensively addressing questions associated with cell state and function when cell position and neighbors are crucial. By allowing investigators to address questions related to how adjacent cells communicate, how functional specialization is influenced by spatial location within a tissue, and how proximity among varying cell types affects downstream phenotypes, the ST technology has already enabled key insights into embryonic development,1 nephrology,2 wound healing,3 brain function,4 and cancer.5,6,7

A first step in ST data analysis is identifying genes with expression that varies spatially, so-called spatially variable (SV) genes, and robust statistical methods exist to address this challenge.8,9,10,11,12 While useful, SV genes alone do not fully describe, and in many cases cannot capture, important signals present in ST data. Specifically, cellular phenotypes are largely determined, and downstream phenotypes are largely affected, by spatially coordinated regulation (and deregulation, or re-regulation) of expression among a group of genes. Canonical examples are found in cancer, where coordinated gene expression in immune cells changes based on spatial proximity to cancer cells, substantially affecting tumor progression and response to treatment13,14,15 and in spinal cord samples from ALS patients that show disease-relevant regional differences in coordinated expression among genes in microglia and astrocyte populations.16 These and numerous other studies17,18,19 demonstrate that spatial location can have a substantial effect on coordinated gene expression both within and among cell types, and that changes in coordinated expression have important implications for basic biology and medicine.

While SV methods provide information on changes in average expression, they do not specify how, or even if, genes within a group manifest coordinated expression. A group of SV genes may be independent, or dependent; or some subset may be dependent. In addition to identifying SV genes, knowledge of the dependence structure among a group of genes, and how it changes within and across tissue regions, is required to comprehensively describe expression dynamics underlying cell states in ST experiments.

Toward this end, we introduce SpatialCorr, a semiparametric approach for identifying spatial changes in the correlation structure of a group of genes. An overview is provided in Figure 1. Given a collection of gene sets and tissue regions pre-defined by a user (Figure 1A), SpatialCorr tests for spatially induced differences in the correlation structure of each gene set within tissue regions, as well as between regions. Specifically, for a pre-defined set of genes, SpatialCorr estimates spot-specific correlation matrices using a Gaussian kernel (Figure 1B); region-specific correlations are estimated using all spots in a region (Figure 1D). SpatialCorr tests for spatially varying correlation within each tissue region (the WR-test) using a multivariate normal (MVN) likelihood ratio test statistic that compares the MVN with spot-specific correlation estimates to an MVN with constant correlation estimated from all spots in the region (Figures 1C and 1F), and statistical significance is assessed nonparametrically. Specifically, p values are computed via sequential Monte Carlo (SMC) permutation20 and false discovery rate (FDR) is controlled across regions using Benjamini-Hochberg.21 For each set on test, region-specific p values are provided to highlight spatial locations showing the most significant internal variation (Figure 1C) and gene-pairs within the group are also ranked by pairwise p values so that those pairs contributing most to the differential correlation (DC) call can be easily identified. The test for identifying correlations that change between-regions (the BR-test) is not well defined for regions showing internal changes in correlation. Given this, we only consider those regions for which the correlation structure among genes shows no significant change according to the WR-test. Then, for each pair of regions, differences in correlations among the genes are identified using the BR-test (Figures 1E and 1F). As with the WR-test, FDR-adjusted p values for the BR-test are computed via permutation.

Figure 1.

Figure 1

SpatialCorr overview

A schematic workflow of the analyses performed by SpatialCorr.

(A) First, distinct tissue regions are identified (e.g., via clustering or manual annotation).

(B) For a pre-defined set of genes, since spots do not contain replicates, SpatialCorr estimates correlations at each spot using a Gaussian kernel. In this example, regions 1 and 2 have too few neighbors for the Gaussian kernel and, consequently, were removed by the effective-neighbors filter.

(C) SpatialCorr tests for spatially varying correlation within each tissue region (the WR-test); p values are estimated via permutation with FDR controlled across regions using Benjamini-Hochberg. The WR-test identifies region 10 as significant.

(D) Regions without spatially varying correlation are identified (here, regions 3–9) and region-specific correlations are estimated using all spots in those regions.

(E) The pre-defined set of genes is tested for differences in correlation structure between each pair of regions using the BR-test. As in the WR-test, p values for the BR-test are computed via permutation and FDR is controlled across pairs using Benjamini-Hochberg.

(F) An overview of the MVN likelihood ratio test statistic used by SpatialCorr. The WR-test compares spot-specific correlations with correlation calculated across the region; the BR-test compares region-specific correlations with correlation calculated across the pair of regions being compared (when testing for differences between two regions) or across the entire slide (when testing for differences among regions across the entire slide).

SpatialCorr relies on SMC permutation20 to efficiently compute p values and is implemented in an easy-to-use Python package. Consequently, it is computationally efficient and scalable to hundreds of gene groups; it also integrates directly into existing ST analysis pipelines implemented in Scanpy.22 Finally, the SpatialCorr software simulates realistic count data with spatially varying correlation structure. In addition to evaluating SpatialCorr, the simulation framework developed here is expected to prove useful in benchmarking studies as additional DC methods for spatial experiments are developed.

In contrast to scHOT,23 an approach for identifying changes in a gene’s variance or in gene-gene pairwise correlations across the entire tissue, SpatialCorr considers the full dependence structure among a group of genes, as represented by the gene groups’ correlation matrix. In addition, SpatialCorr conducts DC tests within a tissue region, or between tissue regions, resulting in improved localization of the DC signal. SpatialCorr also directly accounts for differences in latent expression levels across distinct tissue regions, avoiding false positives due to mean expression that varies across regions. Since spurious correlation can be induced by a latent factor, it is important to directly model spatial changes in mean expression to avoid false positives in DC tests.

Another related method, CSN,24 constructs cell-specific gene-gene association networks in scRNA-seq data where for each cell, a network among genes is constructed where edges indicate statistical dependences within that cell. While SpatialCorr provides an estimate of the correlation matrix at each spot, it goes a step further by testing whether the correlation matrix is changing across space. Furthermore, while both CSN and SpatialCorr pool information across cells (or spots in the case of ST), CSN does not consider the spatial proximity between cells since such information is not available in scRNA-seq data. These qualities also differentiate SpatialCorr from methods like Gaussian Graphical Models,25 which seek only to infer statistical dependencies among a set of random variables, but do not seek to determine how such dependencies are changing across space.

Results

SpatialCorr increases the power to identify groups of genes having spatially varying correlation within and between tissue regions

We conducted several simulations to evaluate the performance of SpatialCorr under a variety of conditions. We note that SpatialCorr’s WR-test and BR-test assume an MVN model of the data, which may not precisely describe the distribution of normalized UMI counts found in ST data, and thus, we sought to simulate ST data composed of normalized counts in order to evaluate SpatialCorr on realistic data. While methods are available to simulate Gaussian data having a pre-defined correlation, these methods are not easily extended to count data with correlation that varies spatially.26 To simulate realistic ST data with spatially varying dependencies between genes, we developed a simulation framework based on the Poisson-lognormal model (Figure 2A).

Figure 2.

Figure 2

Simulation framework and results

An overview of the simulation framework is shown in (A). Given an input ST sample, tissue regions are identified (e.g., via clustering or manual annotation). A user then specifies the number of genes to simulate and selects a corresponding set of target genes in the spatial transcriptomics sample upon which the simulations are based. For each target gene, the latent parameters in the Poisson-lognormal model are inferred on a per-region basis. A pattern of spatially varying latent correlation is then generated using a Gaussian process sampling procedure that is tuned to match the correlations observed in experimental data. Finally, counts are generated from the Poisson-lognormal model.

(B) In an effort to ensure that results provide relevant information, data are simulated at realistic gene expression levels. Specifically, genes are rank ordered by total UMI counts. Here, four genes were selected based on expression (rank 50, 100, 200, and 400); the 400th ranked gene is considered lowly expressed, rank 200 moderately low, rank 100 moderately high, and rank 50 highly expressed. Four correlation levels (low, moderately low, moderately high, high) were also considered based on those observed in real data.

(C) Heatmaps depicting four examples of the latent spatially varying correlation patterns generated for each of the four correlation levels.

(D) Histograms comparing the distribution of UMI counts in the experimental data (left) to the simulated data (right) for each of the four expression levels.

(E) The average power of the WR-test is shown for varying correlation and expression levels (averages taken over 30 simulated datasets).

(F) The average power of the WR-test is shown for varying expression levels of gene sets of size 2, 4, 6, and 8 (averages taken over 30 simulated datasets).

(G and H) Similar to (E) and (F), but for the BR-test. Instead of varying the amount of underlying correlation among genes, we vary the difference in correlation between regions.

Given a spatially varying covariance matrix, the framework provides spatially resolved unique molecular identifier (UMI) count data having marginal distributions and spot-specific size factors that match input experimental data while controlling the correlation between a group of genes at each spot. Pre-specifying a spatially varying covariance matrix is challenging given the requirement that it be positive semidefinite. To address this challenge, the simulation framework within SpatialCorr provides a function to generate smoothly varying covariance matrices using a Gaussian process model.

We carried out six sets of simulations, Sims I-VI, using as input a breast cancer sample assayed via the 10x Visium platform. To ensure that the simulations provide both realistic and comprehensive results, genes from the input dataset were rank ordered by median UMI counts and four genes were selected as shown in Figure 2B. Data were simulated for genes having low expression (matching the 400th ranked gene), moderately low (200th ranked gene), moderately high (100th ranked gene), and high (50th ranked gene). Four correlation levels (weak, moderately weak, moderately strong, strong) were also considered based on those observed in real data (Figures 2C, S4A, and S4B). In Sim I, we simulate a pair of lowly expressed genes at each of the four correlation levels (30 replicates per level); the simulation was repeated for gene pairs having expressions in the moderately low, moderately high, and high expression ranges. The marginal distributions for simulated genes at each of the expression levels are shown in Figure 2D. Sim II is similar, but considers groups of genes of size 4, 6, and 8.

Results demonstrate the increases in power observed with increasing expression and correlation levels. Specifically, Figure 2E shows that the WR-test in SpatialCorr has low power to identify weakly correlated gene pairs, even those that are highly expressed. However, there is reasonable power to identify gene pairs with stronger levels of correlation. Figure 2F shows that SpatialCorr’s power increases substantially when the correlation structure across multiple genes is considered jointly. When groups of four genes having weak correlation are considered, for example, power is increased from 47% to 90%; the power is further increased to nearly 100% when six genes are considered. False-positive rates are well-controlled across simulation settings; for each expression level, we simulated pairs of genes from the null scenario in which there is no spatially varying correlation in each region (200 samples for each expression level). The false positive rate (FPR) across all null samples was 0.034 when using a p value cutoff of 0.05 to make a positive call.

Sim III evaluates the BR-test. Specifically, we simulate a pair of lowly expressed genes with correlation that differs by 0.2 between regions but remains constant across spots within each region; the simulation was repeated for gene pairs having expression in the moderately low, moderately high, and high expression ranges. For each expression level, the simulation was repeated with increasing differences in correlation of 0.4, 0.6, and 0.8 between regions. Sim IV is similar, but considers groups of genes of size 4, 6, and 8.

Results demonstrate the increases in power observed with increasing expression and difference in correlation between regions. Specifically, Figure 2G shows that the BR-test in SpatialCorr has low power to identify correlated gene pairs whose correlation differs minimally between regions, even those that are highly expressed (Figure 2G). However, the power increases significantly when the difference in correlation between regions increases to 0.4 and above. Figure 2H shows that SpatialCorr’s power also increases substantially when the correlation structure across multiple genes is considered jointly. When groups of four genes with pairwise correlations that differ by 0.2 between regions are considered, for example, power is increased from 23% to 77%; the power is further increased to 93% when six genes are considered. As with the WR-test, the false-positive rate is well controlled (at 0.04 when using a p value cutoff of 0.05 to make a positive call).

Next, we compared SpatialCorr and scHOT on simulated data comprising two tissue regions with large differences in mean expression between the two regions, but no spatially varying correlation within each region (Sim V; Figures S1A–S1C). When not considering these differing regions, the difference in mean expression induces spuriously high correlation along the border between the regions, which can lead to false identification of spatially varying correlation when not taking into account the differing means between the regions (Figures S1D and S1E). SpatialCorr’s WR-test and scHOT were applied to 200 datasets from Sim V. By conditioning on tissue type, SpatialCorr avoids false positives and yields uniformly distributed p values under the null distribution (Figure S1F). In contrast, scHOT, which does not condition on tissue type, produces high rates of false positives (Figure S1F). To address this in practice, scHOT restricts analyses to gene pairs showing no change in average expression. Given that most interesting genes will show some change in mean expression across a heterogeneous tissue, limiting the search to non-SV genes is often restrictive.

In order to compare the performance of scHOT with SpatialCorr’s WR-test on data that satisfy scHOT’s requirement that input genes are not SV, we generated a final set of simulations, Sim VI, for which the mean expression of genes simulated do not vary from region to region. Three correlation levels (weak, moderate, strong) were considered, and for each level, 30 datasets were simulated and evaluated by both scHOT and SpatialCorr’s WR-test (Figure S1G). At the weakest level of correlation, we found scHOT was more powerful than SpatialCorr to detect spatially varying correlation (47% versus 30%); however, the two methods performed identically at the higher levels (Figure S1H). scHOT’s better power to detect weak correlation is likely due to its use of a Spearman correlation-based test statistic, which may be more robust than SpatialCorr’s MVN-based test statistic.

SpatialCorr reveals spatially varying correlation among genes within and between regions of cutaneous squamous cell carcinoma

To further assess the performance of SpatialCorr, we applied it to clinical ST data from human cutaneous squamous cell carcinoma (cSCC), the second most common cancer worldwide.5,27 In spite of its prevalence, the molecular basis of cSCC remains poorly understood and surgery is the main treatment.28 ST experiments analyzed with SpatialCorr offer the chance to reveal new therapeutic insight beyond traditional ST analysis, including previously unknown regulatory networks. In Ji et al., ST data were obtained from four patients.5 Three of the four patients (patients 2, 4, and 6) had the histopathologic subtype of well-differentiated cSCC whereas patient 10 had a more aggressive histopathologic subtype of moderately differentiated cSCC.29 SpatialCorr was applied to each dataset to identify spatially varying correlation among groups of genes within or between regions. We considered 836 groups of genes defined by GO categories involved in immune response, cell adhesion, proliferation, and others relevant to tumorigenesis.28

Results from the WR- and BR-tests are summarized in Table S1. Consistent with the nature of the cSCC tumors samples, the top GO term identified by either the WR- or BR-tests in all four patient samples was “keratinocyte differentiation.” This GO term is enriched for keratin genes, which are primarily structural proteins of keratinocytes, but which have emerging roles in tumor invasiveness and metastasis.30 Keratins are obligate heterodimers in which type 1 keratins (e.g., KRT1, KRT5, and KRT6) classically bind to type 2 keratins (e.g., KRT10, KRT14, KRT16, and KRT17) and are therefore expected to be highly correlated in skin.31 This presents a convenient test case for SpatialCorr and we indeed identified strong correlation between expected type 1 and type 2 pairs such as KRT1 and KRT10, as well as KRT5 and KRT14, throughout the epidermis and much of the tumor (Figures 3A–3C and S2A). Less well understood are changes in the strength of typical keratin correlation patterns, as well non-canonical pairings, that have been observed in skin disease.32,33 To examine the spatially varying patterns among these keratins more closely, we identified subgroups of keratins having similar correlation patterns by clustering the pairwise correlations estimated from SpatialCorr. Figure 3F shows five subgroups derived from the moderately differentiated cSCC sample. A focus on one of the subgroups shows strong positive correlation between keratin pairs in the tumor region that decreases in the non-tumor region; the strongest pairs in the tumor region include KRT17 and KRT16 as well as KRT17 and KRT5. Most pairwise correlations in this subgroup are highest in the tumor region defined by cluster 7 (Figures 3B, 3H, S2B, and S2C), reflecting coordinated regulation of these keratins within the region. This region corresponds to a novel cell subpopulation unique to cSCC made up of tumor-specific keratinocytes (TSKs) that was identified by Ji et al. as putatively involved in tumor progression, immunosuppression, and tumor invasiveness. While KRT17 and KRT16 have been independently associated with proliferation34 and aggressiveness in a number of cancers,35,36,37,38 their coordinated regulation has not been studied. The strong correlation identified by SpatialCorr suggests an increased level of coordinated regulation within the TSK region, providing further insights into this novel subpopulation.

Figure 3.

Figure 3

Analysis of cutaneous squamous cell carcinoma data

(A) H&E stain of moderately differentiated cSCC tumor (patient 10) from Ji et al. (2020).

(B) Spots colored by their assigned BayesSpace clusters; spots that were removed by the effective-neighbors filter (having too few neighbors to estimate a correlation via the Gaussian kernel) are colored in gray. Here, all spots from clusters 1 and 2 were removed by the filter.

(C) Spots colored according to whether they belong to a cluster that is mostly tumor.

(D) Spots colored according to their tumor-specific keratinocyte (TSK) score, which measures the enrichment of the tumor-specific keratinocyte population discovered by Ji et al. (2020).

(E) Expression of nine keratin genes in the keratinocyte differentiation GO category that is identified by both the WR and BR-test in SpatialCorr as having spatially varying correlations.

(F) Spot-specific kernel estimates of correlation between all pairs of keratin genes were calculated and the pairs of genes were clustered according to their correlation pattern across the slide.

(G) Subgroups were identified by cutting the dendogram shown in (F). Each subgroup is shown as a graph consisting of genes where an edge is drawn between genes if the corresponding gene-pair belongs to the subgroup and each graph is colored according to (F).

(H) A subgroup involving KRT17 is highlighted (left); kernel estimates of correlation between each pair in the subgroup (right). We note high correlation between KRT17 and genes KRT5, KRT14, KRT6, and KRT16 in the region of the tumor enriched for the TSK population.

SpatialCorr reveals temporally varying correlation among genes within differentiating hematopoietic cells

While we have focused on ST data, SpatialCorr can be applied to any gene expression dataset for which a distance metric is defined between samples. To demonstrate, we consider a study of hematopoiesis from39 where scRNA-seq data are profiled in differentiating mouse myeloid and erythroid hematopoietic cells (Figure 4A); distance between cells is defined by the shortest path distance along the nearest-neighbors graph (Figure 4B). SpatialCorr was applied to test for temporally varying correlation within 227 groups of genes defined by GO terms related to myeloid and erythroid cell function and differentiation. The top gene set identified was Antimicrobial Humoral Response, which includes the genes Ctsg, Elane, and Prtn3. SpatialCorr finds increased pairwise correlation along the monocyte and neutrophil trajectories, with lower correlation along the other branches (Figure 4C). These genes are thought to be regulated by a common promoter and were recently discovered to jointly catalyze histone H3 amino terminus proteolytic cleavage in a process that controls hematopoietic cell differentiation.41

Figure 4.

Figure 4

Analysis of differentiating hematopoietic cells

(A) Force-directed layout of the nearest-neighbors graph of cells colored by Louvain clusters.40 Cells are differentiating from hematopoietic stem cells (Stem) into neutrophils (Neu), monocytes (Mo), basophils (Baso), megakaryocytes (MK), and erythrocytes (Ery).

(B) Force-directed layout of cells colored according to their estimated pseudotime.

(C) Correlation analysis for the genes Prtn3, Elane, and Ctsg. Plots along the diagonal depict each gene’s expression. Off-diagonal plots depict the cell-wise kernel estimates of the correlation between the gene pairs. We note high correlation between these genes along the neutrophil and monocyte trajectories.

The SpatialCorr package is an efficient and comprehensive suite of tools for analyzing and visualizing spatial correlation patterns

SpatialCorr is implemented in a fast and user-friendly Python package that implements a variety of functions for studying spatially varying correlation structure. In addition to the WR- and BR-tests that facilitate identification of gene sets having significantly varying correlation within or between regions, additional functions within SpatialCorr enable a user to compute and visualize per-spot kernel estimates of correlation (as shown in Figure 3H, right panel), confidence intervals around these estimates (Figure S2B), regional estimates of correlation (Figure S2C), and clusters of correlation patterns within groups of gene pairs (Figure 3F). For further downstream analyses, the package can be easily integrated into existing analysis pipelines implemented with Scanpy. In addition to these analysis capabilities, SpatialCorr includes a simulation framework for simulating ST data which we expect will prove useful in the development and evaluation of future methods.

SpatialCorr uses Python’s efficient NumPy library42 that provides relatively fast performance. When computing an SMC p value, permutations are computed sequentially and terminated early if the p value is deemed to be high. In this way, fewer permutations are generated when the p value is high, thereby diverting computational resources toward those gene sets that produce low p values, which require precise estimates for downstream methods that account for multiple testing. To evaluate timing on analysis of pairs, we considered 20 highly expressed gene pairs from a large sample of brain tissue consisting of 4,226 spots from Maynard et al.4 The WR-test takes approximately 2.8 min per pair to estimate SMC p values using 1,000 permutations on a single CPU core; the BR-test takes approximately 3.8 min per pair (17 and 30 min per pair are required, respectively, to estimate exact p values). In contrast, scHOT takes over 9 h per pair running 1,000 permutations on one CPU core, indicating that scHOT does not easily scale to analyses that involve multiple gene pairs in ST samples with many spots. To further improve speed, the SpatialCorr permutation test can be parallelized to use multiple CPU cores. Last, we benchmarked SpatialCorr’s speed as a function of gene set size. As expected, we found that SpatialCorr’s runtime scales approximately with the square of the number of input genes since most computation involved in SpatialCorr’s test requires operations on correlation matrices (Figure S3A).

Discussion

A fundamental task in ST experiments is identifying genes with expression levels that vary across a tissue sample, and a number of differential expression (DE) methods have been developed toward this end.8,9,10,11,12 Although a critical first step, DE measures do not capture many important types of differential regulation. Specifically, complex phenotypes may arise from a de- or re-regulation that does not significantly affect a gene’s average expression level, but rather affects the ways in which a gene’s expression changes with respect to other genes. SpatialCorr provides an approach for identifying groups of genes with spatially varying correlation within or between tissue regions. SpatialCorr is provided as a Python package that, in addition to the statistical tests for DC, provides point estimates of spot-specific correlations and confidence intervals of these estimates. We expect it will be a useful complement to existing ST analysis methods.

Limitations of the study

We note a few areas that warrant further investigation. First, test statistics in SpatialCorr’s WR-test and BR-test are derived from normal distribution theory, and it is likely that normalized UMI counts do not always follow a normal model, even after suitable transformation. Nonetheless, the permutation analysis assures validity beyond the normal data model and this mismatch is only expected to result in loss of sensitivity. Future work will entail investigating new multivariate test statistics that directly model sparse count data such as those based on the lognormal-Poisson distribution or other multivariate distributions.26 Furthermore, SpatialCorr calculates permutation p values rather than analytic p values, which may increase both the speed and accuracy of the p values. We leave the derivation of parametric p values to future work. Last, SpatialCorr is designed to operate on one sample at a time and does not incorporate data from multiple samples such as those from multiple biological replicates. An area of future development will adapt SpatialCorr to operate on multiple biological replicates. For example, one may be able to examine spatially varying correlation within “consensus” slices of tissue produced by the integration of multiple adjacent tissue slices, such as those estimated by PASTE.43

STAR★Methods

Key resources table

REAGENT or RESOURCE SOURCE IDENTIFIER
Deposited data

10x Visium breast tumor sample https://www.10xgenomics.com V1_Breast_Cancer_Block_A_Section_1
10x Visium brain sample Maynard et al., 20214 Sample 151,507 (http://spatial.libd.org/spatialLIBD/)
10x Visium cSCC samples Ji et al., 20205 GEO: GSE144239

Software and algorithms

SpatialCorr This paper https://doi.org/10.5281/zenodo.7332905
SpatialCorr-sim This paper https://doi.org/10.5281/zenodo.7332911
BayesSpace Zhao et al., 202144 v1.2.0
Dino Brown et al., 202145 v0.99.10
Scanpy Wolf et al., 201822 v1.7.2

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the Lead Contact, Christina Kendziorski (kendzior@biostat.wisc.edu).

Materials availability

This study did not generate new unique reagents.

Method details

Test statistic

Test statistics in SpatialCorr’s WR-test and BR-test are derived from normal distribution theory, although the permutation analysis assures validity beyond the normal data model. Specifically, let Y be an n×m normalized expression matrix for a set of m genes with expression profiled at n spots. The alternative and null models are then

palt(Y):=i=1nMVN(yi;μi,SiRi(alt)Si)
pnull(Y):=i=1nMVN(yi;μi,SiRi(null)Si)

respectively, where yi is the m -dimensional vector of gene expression values at spot i, μi is the m -dimensional mean expression vector for the set of genes, Si is a diagonal matrix consisting of the genes’ standard deviations, and Ri(alt) and Ri(null) are the m×m correlation matrices at spot i under the alternative and null models, respectively. The test statistic, T(Y), is the log likelihood ratio between these models.

The means of both models are allowed to vary spot-to-spot to ensure that the test is testing for spatial differences in correlation rather than spatial differences in mean expression. The mean of each gene is estimated using kernel estimation46:

μi:=j=1nI(ci=cj)K(si,sj;γ)yjj=1nI(ci=cj)K(si,sj;γ)

where si and sj are the coordinates for spots i and j respectively, ci and cj are the tissue region id’s for spots i and j respectively, I is the identity function, K is the kernel function, and γ is the bandwidth parameter. By default, SpatialCorr uses a Gaussian kernel defined as

K(si,sj;γ):=exp(D(si,sj)γ)

where D(si,sj) is the Euclidean distance between spots i and j. We chose a default bandwidth parameter of five when operating on spot-coordinates from the 10x Visium platform. We found that as we increased the bandwidth parameter, statistical power ceased to improve at a value of five (Figure S3C) and that this minimum value was small enough to measure granular changes in correlation.

WR-test

Under the null model, the diagonal entries of the covariance matrix, SiRi(null)Si, are allowed to vary spot-to-spot to ensure that the test is testing for spatial differences in correlation rather than for spatial differences in variance. Like the means, the variance of each gene g is estimated using kernel estimation:

σnull,i,g2:=j=1nI(ci=cj)K(xi,xj;γ)(yj,gμj,g)2j=1nI(ci=cj)K(xi,xj;γ)

Under the null model, the off-diagonal entries of the covariance matrix are calculated as follows: First, for each tissue region C, the Pearson correlation matrix among the set of genes, RC, is estimated using the data for all spots in that region. Then, for each spot, the off-diagonal entries of the covariance matrix – that is, the covariance estimates between genes g and g – are calculated as σnull,i,g,g:=Rci,g,gσnull,i,g2σnull,i,g2 where ci is the region in which spot i belongs and Rci,g,g is the correlation between genes g and g in region ci.

For the alternative model, the entire covariance matrix, including the off-diagonal entries, is estimated via kernel estimation46:

SiRi(alt)Si:=j=1nI(ci=cj)K(si,sj;γ)(yjμj)(yjμj)Tj=1nI(ci=cj)K(si,sj;γ)

BR-test

Just as in the WR-test, the null and alternative models differ according to off-diagonal entries of the covariance matrix. In the alternative model, the off-diagonal entry representing the covariance between two genes g and g is given by σalt,i,g,g:=Rci,g,gσi,g2σi,g2 where ci is the tissue region of spot i and Rci,g,g is the Pearson correlation estimated in region ci. Under the null model, the Pearson correlation is calculated between the two genes using all the spots as σnull,i,g,g:=Rglobal,g,gσi,g2σi,g2 where Rglobal,g,g is the Pearson correlation estimated globally using all of the spots.

Filtering spots

Estimates of spot-specific parameters used in SpatialCorr’s test statistic will have high variance for spots with few neighbors due to a small number of spots contributing disproportionately to the estimates. For this reason, SpatialCorr filters spots with a small number of neighboring spots by setting a filter on the number of “effective neighbors” -- that is, the effective sample size used in the calculation. For spot i, effective_neighborsi:=j=1nI(ci=cj)K(si,sj;γ). The default effective-neighbors threshold is set to 10.

Assessing statistical significance

We calculate the significance of T(Y) via a permutation test so that the FPR is robustly controlled. The null hypothesis in the WR-test specifies that there is no spatially varying correlation within each region and thus spots (i.e., normalized expression profiles) are exchangeable over this region and may be permuted without affecting their joint null probability density. For the BR-test, each tissue regions' spots are first zero-mean centered and then the residual expression profiles are permuted between spatial coordinates across the entire slide (or across the pair of regions if we are testing for differing correlation structure between a region-pair).

In order to speed up the computation, SpatialCorr computes SMC p values. Behind the idea of SMC p values is the insight that one does not require as precise an estimate of a large p value as one does for a small p value. That is, if a p value is large, we need few samples from the null distribution to safely conclude that the null hypothesis cannot be rejected. In contrast, one requires precise estimates of small p values in order to perform more precise downstream analyses and correction for multiple hypothesis testing. To compute an SMC p value, one sets a pre-defined threshold t (by default, we set t=20) and generates samples from the null distribution by calculating the test statistic T(Y) on sequentially generated, random permutations until t samples from the null exceed T(Y). The p value is then computed as p:=t/l where l is the total number of permutations. For extremely small p values that may require an exorbitant number of permutations before termination, the algorithm is terminated at a pre-defined g1 permutations (by default, we set g=10,000), and estimates the p value as p:=(u+1)/g where u is the number of null samples that exceed T(Y).

Computing confidence intervals around correlation estimates

We calculate a confidence interval around the spot-specific kernel estimates of the correlation at each spot using a non-parametric bootstrap. Specifically, a fixed radius is considered around each spot and a set of bootstrap samples of spots are taken from this neighborhood. For each bootstrap sample, the kernel estimate of the correlation is calculated as described above; and a confidence interval is computed using the bootstrapped correlation estimates.

Simulation framework

Our goal is to simulate realistic ST data where the covariance for select sets of genes varies smoothly across the slide. Toward this end, each simulated dataset requires a case study input dataset from which parameters are estimated. The simulation framework is built upon two models. The first model generates smoothly varying latent correlations between genes at each spot. The second model is a lognormal-Poisson model that uses these correlations as parameters to generate UMI counts.

For spot i, counts for two genes are simulated as Poisson: yi,1 Poisson(siλ1) and yi,2 Poisson(siλ2) with logλ1,logλ2MVN(μ1,μ2,σ12,σ22,σ1,2,i). Here, N is the total number of spots, si is the total UMI count at spot i, and yi,j is the UMI count for gene j at spot i. Synthetic measurements are mutually independent among spots; the dependencies arise between genes within spots. The means, μ1 and μ2, and variances, σ12 and σ22, are estimated using the experimental data. Specifically, for each gene, we use the posterior means from a simple hierarchical Bayesian model that uses a normal prior for the means and a truncated Cauchy prior for the variances. When simulating UMI counts from distinct tissue regions, we perform the aforementioned process on each region independently using different parameters (based on different pairs of genes) for each region.

The covariance parameters, σ1,2,i, across all spots i[N], are generated according to a Gaussian-process model. First, we compute the kernel matrix between spots using a radial basis function kernel KRN×N where each element is computed as

Ki,j=exp(sisj2γ)

where si and sj are the spot-coordinates for spot i and spot j, and γ is the bandwidth parameter that sets the width of the radial basis function. Given the kernel, latent Fisher-transformed correlations are sampled from a Gaussian process. atanh(r)MVN(0,cK).

Note that c is referred to as the “covariance strength” parameter which increases or decreases the variance of the correlations across the slide. A high value for c will produce stronger correlations (more correlation values close to -1 and 1) whereas small values for c will produce weaker correlations (closer to 0). Fisher-transformed correlations are converted to Pearson correlations and the covariance between the two genes is then defined as

σ1,2,i:=riσ12,σ22

where ri is the Pearson correlation between the two genes at spot i (i.e., the i th element of r). Note that σ12 and σ22 are estimated from the experimental data as previously described.

To simulate UMI counts for sets of genes with spatially varying correlation, we use the same Poisson-lognormal-based model as was described previously for simulating pairs of genes. However, we augment the process for generating smoothly varying latent correlations due to the requirement that the covariance matrix must be positive semidefinite. For each gene g in the gene set, we generate a normally distributed vector of length N, vgMVN(0,cK), where again KRN×N is a kernel matrix, and c controls the variability of the correlation values across spots. For each spot i, we simulate latent expression profiles for all genes by sampling from an MVN distribution logλiMVN(0,Φ+ziziT) where Φ is a diagonal matrix that stores the variance of each simulated gene and zi:=[v1,iv2,ivG,i]T where G is the total number of genes in the target gene set. In short, for a given gene g, the random variables vg,i vary smoothly across the spatial coordinates and thus, for a given pair of genes, g1 and g2, the value vg1,ivg2,i (i.e., an off-diagonal entry of ziziT) varies smoothly. In this way, we can randomly generate a positive semidefinite matrix at each spot such that they vary smoothly, elementwise, across the spatial coordinates while controlling the resolution of these changes.47 Note that the covariance strength parameter, c, scales the variability in the entries of the Φ+ziziT matrices.

Setting correlation in simulated data to match experimental data

We sought to generate latent correlation patterns in simulated data (such as those shown in Figure 2C) that match the latent correlations between genes in experimental data. This task is challenging because the known latent correlations used as input to the Poisson-lognormal simulation model are not directly comparable to the estimated correlations from the experimental data. In short, the fewer the UMI counts, the greater the discrepancy between the latent and observed correlation values.48 Unfortunately, this issue is not circumvented using normalization methods that attempt to infer the latent expression values in the experimental data, such as Dino45 or SAVER,49 due to the uncertainty in estimates of the expression values.

To assess how well our simulated correlation patterns match those found in experimental data, we performed an empirical comparison of the spot-wise correlations estimated from the normalized counts (via the Gaussian kernel) between the simulated and experimental data. Specifically, we performed the following analysis:

  • 1.

    For each level of simulated latent correlation (i.e., weak, moderately weak, moderately strong, and strong), we generate 10 simulated datasets using experimental data for the Rank 50 gene to seed the simulation and normalize each simulated dataset.

  • 2.

    For each simulated dataset, we estimate the spot-wise correlations using Gaussian kernel estimation. We then compute the variance of these spot-wise correlation estimates across the slide.

  • 3.

    For all pairs of genes between ranks 40–60, we estimate the spot-wise correlations using the experimental data, and for each pair, compute the variance of these spot-wise correlation estimates.

Results are shown in Figure S4A. We found the estimated correlations calculated from the simulated data are similar to those observed in the experimental data at each of the four expression levels.

To more rigorously assess how the variability of simulated spot-wise correlation estimates compare to those in experimental data, we also plot the distribution of the variances of the spot-wise correlation values for all gene-pairs assessed in the experimental data (Figure S4B). For each simulated latent correlation level, the average variances of the correlation estimates across all simulated datasets generated with that latent correlation level are shown (average taken over 10 replicates). As seen in Figure S4B, we found that the average variance for the highest latent correlation level lay within the tail of the distribution of the variances calculated from the experimental data, suggesting that the highest latent correlation level used in the simulations corresponds to a typical strongly correlated gene-pair in real data.

Simulated data

Simulated data was generated based on Sample V1_Breast_Cancer_Block_A_Section_1, which was downloaded from 10x Genomics using the Scanpy Python library via the visium_sge function. This sample is also available to download from the 10x Genomics website: (https://www.10xgenomics.com). We generate simulated data under four simulation scenarios, described as follows.

SimI: Two genes and five tissue regions with spatially varying correlation within each region. We simulate datasets for gene pairs with four levels of expression corresponding to the 50th, 100th, 200th, and 400th gene in the experimental data ranked according to their median UMI counts. The strength of the correlation, as set by the correlation strength parameter, c, is defined to be zero (c = 0.0), weak (c=0.05), moderately weak (c=0.1), moderately strong (c=0.2), and strong (c=0.4). These five parameter values for c were found to generate data with realistic spatially varying correlation.

SimII: Multiple genes and five tissue regions with spatially varying correlation within each region. We simulate datasets for genes with four levels of expression as per SimI. A total of nine genes were simulated using a covariance strength parameter of c=0.1. We note that the effect of c on the variability of generated spot-wise correlations depends on the number of genes being simulated. We found a value of c=0.1 in the nine-gene setting generated a similar strength of correlation as was produced by c=0.05 (the “weak” setting) in the two-gene scenario of SimI. For an example of simulated pairwise correlation patterns for four genes, see Figure S4C.

SimIII: Two genes and five tissue regions with no varying correlation within each region but differing correlation between regions. We simulate datasets for gene-pairs with four levels of expression as per SimI. We grouped the regions into two groups where each group shared the same internal correlation, but the correlation between the groups of regions differed. Differences in the correlation between the regions was set to zero, 0.2, 0.4, and 0.8. For an example of simulated pairwise correlation patterns, see Figure S4D.

SimIV: Multiple genes and five tissue regions with no varying correlation within each region but differing correlation between regions. We simulate datasets for genes with four levels of expression as per SimI. A total of nine genes were simulated for each dataset. We grouped the regions into two groups where each group shared the same internal correlation structure, but the correlation between the groups of regions differed. Specifically, in one group, the correlation between all pairs of genes was zero, but in the second group, it was set to 0.2. For an example of simulated pairwise correlation patterns for four genes, see Figure S4E.

SimV: Two genes and two tissue regions with no varying correlation within each region nor between regions. The mean expression; however, differs between the two regions. The two regions divide the slide in half as seen in Figures S1A–S1C.

SimVI: Two genes and a single tissue region with spatially varying correlation. We simulate datasets for one pair of genes corresponding to the 200th gene in the experimental data ranked according to their median UMI counts. The strength of the correlation, as set by the correlation strength parameter, c, is defined to be weak (c=0.025), moderate (c=0.05), and strong (c=0.1).

Comparison of kernel functions

We tested three kernel functions for use in SpatialCorr’s WR-test: a Gaussian kernel, triangular kernel, and uniform kernel. Given spot coordinates si and sj for spots i and j respectively, let D(si,sj) be the Euclidean distance between them. Then the unnormalized Gaussian kernel is defined as

K(si,sj;γ):=exp(D(si,sj)γ)

where γ is the kernel bandwidth parameter. The unnormalized triangular kernel is defined as

K(si,sj;γ):=max(1D(si,sj)γ,0)

The unnormalized uniform kernel is defined as

K(si,sj;γ):={1:D(si,sj)<γ0:otherwise

Specifically, we applied SpatialCorr’s WR-test to data simulated under Sim I using all three aforementioned kernels. We used a kernel bandwidth of five for the Gaussian and uniform kernels and a bandwidth of eight for the triangular kernel as these bandwidth parameters produced similar distributions of spot-wise effective neighbors across the slide (data not shown). As seen in Figure S3B, the Gaussian and triangular kernels performed similarly; however, the uniform kernel yielded reduced power. This may be because, for a given target spot, the uniform kernel assigns all spots within a specific radius from the spot (determined by the bandwidth parameter) an equal weight and unlike the Gaussian or triangular kernels, does not down weight spots according to distance.

Normalization

It is often observed in spatial transcriptomics datasets that the total UMI counts detected at each spot (hereafter referred to as library size) vary significantly across the tissue slide. Moreover, this variation is not random as regions of high or low counts tend to cluster together. Moreover, because kernel estimation of the correlation is susceptible to changes in measured expression (Figures S1A–S1E), these changes in library size threaten to induce spurious correlation across the slide. For this reason, it is crucial to normalize the expression data to account for changes in library size across the slide. For all datasets used in this study, including simulations, we used the Dino algorithm to perform normalization because it was shown to better normalize for changes in library size than existing state-of-the-art methods.45

Calculation of TSK-scores in cSCC data

We used the TSK marker genes provided in the Supplementary materials of Ji et al. (2020). For each gene and each spot, we compute the Z score of the Dino-normalized gene expression values. We define the TSK-score at each spot to be the sum of z-scores across all TSK-markers.

Publicly available case study datasets

For the analysis of the cSCC data from Ji et al. (2020), we used samples from patients 2, 4, 6 and 10 (GEO accessions GEO:GSM4284316, GEO:GSM4565823, GEO:GSM4565826 and GEO:GSM4284326, respectively). Clustering was performed using BayesSpace44 (v1.2.0 for R v4.1). We followed the procedure outlined in the vignette for the BayesSpace R package. The number of clusters for each sample was selected by choosing the elbow of the curve plotting the log likelihood of the BayesSpace model versus the number of clusters as recommended by Zhao et al. Based on visual inspection, the cluster boundaries matched well with the leading edge of the tumor region as identified by Ji et al. (2020), and therefore, we considered a cluster to be tumor-enriched if its spots fell within the tumor region.

For the analysis of the hematopoiesis single cell data, cells were normalized, ordered by pseudotime, and annotated according to the Scanpy tutorial for trajectory inference (https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html). Specifically, normalization was performed using log counts per million followed by Z score normalization. Pseudotime was calculated using graph diffusion.50 Cells were also normalized separately via Dino and the Dino-normalized expression values were used to run SpatialCorr. Lastly, we used the shortest path distances between cells along the four-nearest-neighbors graph and used these pairwise distances to compute the Gaussian kernel used by SpatialCorr.

For the experiments that tested SpatialCorr’s execution time, we used sample 151507 from the SpatialLIBD project.4 We used the author-provided “layer_guess” annotations of the cortical layers to designate the regions.

Selection of gene sets for analysis

For each cSCC tumor sample, we applied SpatialCorr to 836 groups of genes defined by GO categories known to be relevant to cancer, the skin, and skin development, such as those related to cellular proliferation, cell-cell adhesion, cell death, DNA repair, immune response, and others. GO terms were retrieved in batches by first determining a set of target terms (e.g. “DNA repair”) and for each term, the set of GO terms were identified that contained the given term as a substring. For each GO term, the top 15 most highly expressed genes (measured by the fraction of spots on the slide with at least one UMI count) were used within each category after filtering genes not detected in at least 0.2 of spots. GO categories having fewer than 5 selected genes were not used in order to obtain gene sets of roughly equal size. From the 836 gene sets, this resulted in consideration of 134 gene sets for patient 6, 287 gene sets for patient 10, 539 for patient 4, and 555 for patient 2. We curated a collection of 227 GO categories related to hematopoiesis using the same workflow used in the cSCC analysis, but starting with key terms related to immune cell function and hematopoiesis (e.g., “hematopoietic stem”). From the 227 gene sets, this resulted in consideration of 64 gene sets.

Quantification and statistical analysis

All statistics were performed in Python and R and are described in the results section and figure legends.

Acknowledgments

M.N.B. acknowledges support of a postdoctoral fellowship provided by the Morgridge Institute for Research. C.K., M.A.N., and Z.N. were supported by GM102756. M.A.N. was also partially supported by NSF 2023239-DMS, P01CA250972, and P50CA278595.

Author contributions

M.N.B. and C.K. designed the research project and wrote the first version of the manuscript. M.A.N. proposed the initial idea to use a log likelihood ratio for SpatialCorr’s statistical test, and M.N.B. led the model development and validation. A.P. provided domain expertise for the squamous cell carcinoma study. M.N.B. wrote the SpatialCorr package. Z.N., C.M., and J.B. contributed to the code used to test SpatialCorr. All authors contributed to the conceptual development of the paper and to the writing.

Declaration of interests

The authors declare no competing interests.

Published: December 13, 2022

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.crmeth.2022.100369.

Contributor Information

Matthew N. Bernstein, Email: matthew.nathan.bernstein@gmail.com.

Christina Kendziorski, Email: kendzior@biostat.wisc.edu.

Supplemental information

Document S1. Figures S1–S4
mmc1.pdf (3.4MB, pdf)
Table S1. Top GO terms identified by the WR and BR-tests in the cSCC data, related to Figure 3
mmc2.xlsx (16.9KB, xlsx)
Document S2. Article plus supplemental information
mmc3.pdf (7.3MB, pdf)

Data and code availability

References

  • 1.Srivatsan S.R., Regier M.C., Barkan E., Franks J.M., Packer J.S., Grosjean P., Duran M., Saxton S., Ladd J.J., Spielmann M., et al. Embryo-scale, single-cell spatial transcriptomics. Science. 2021;373:111–117. doi: 10.1126/science.abb9536. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Melo Ferreira R., Sabo A.R., Winfree S., Collins K.S., Janosevic D., Gulbronson C.J., Cheng Y.-H., Casbon L., Barwinska D., Ferkowicz M.J., et al. Integration of spatial and single-cell transcriptomics localizes epithelial cell–immune cross-talk in kidney injury. Jci Insight. 2021;6 doi: 10.1172/jci.insight.147703. e147703. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Foster D.S., Januszyk M., Yost K.E., Chinta M.S., Gulati G.S., Nguyen A.T., Burcham A.R., Salhotra A., Ransom R.C., Henn D., et al. Integrated spatial multiomics reveals fibroblast fate during tissue repair. Proc. Natl. Acad. Sci. USA. 2021;118 doi: 10.1073/pnas.2110025118. e2110025118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Maynard K.R., Collado-Torres L., Weber L.M., Uytingco C., Barry B.K., Williams S.R., Catallini J.L., Tran M.N., Besich Z., Tippani M., et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat. Neurosci. 2021;24:425–436. doi: 10.1038/s41593-020-00787-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Ji A.L., Rubin A.J., Thrane K., Jiang S., Reynolds D.L., Meyers R.M., Guo M.G., George B.M., Mollbrink A., Bergenstråhle J., et al. Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma. Cell. 2020;182:497–514.e22. doi: 10.1016/j.cell.2020.05.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Moncada R., Barkley D., Wagner F., Chiodin M., Devlin J.C., Baron M., Hajdu C.H., Simeone D.M., Yanai I. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat. Biotechnol. 2020;38:333–342. doi: 10.1038/s41587-019-0392-8. [DOI] [PubMed] [Google Scholar]
  • 7.Hunter M.V., Moncada R., Weiss J.M., Yanai I., White R.M. Spatially resolved transcriptomics reveals the architecture of the tumor-microenvironment interface. Nat. Commun. 2021;12:6278. doi: 10.1038/s41467-021-26614-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Edsgärd D., Johnsson P., Sandberg R. Identification of spatial expression trends in single-cell gene expression data. Nat. Methods. 2018;15:339–342. doi: 10.1038/nmeth.4634. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Svensson V., Teichmann S.A., Stegle O. SpatialDE: identification of spatially variable genes. Nat. Methods. 2018;15:343–346. doi: 10.1038/nmeth.4636. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Sun S., Zhu J., Zhou X. Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies. Nat. Methods. 2020;17:193–200. doi: 10.1038/s41592-019-0701-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Andersson A., Lundeberg J. sepal: identifying transcript profiles with spatial patterns by diffusion-based modeling. Bioinformatics. 2021;37:2644–2650. doi: 10.1093/bioinformatics/btab164. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Li Q., Zhang M., Xie Y., Xiao G. Bayesian modeling of spatial molecular profiling data via Gaussian process. Bioinformatics. 2021;37:4129–4136. doi: 10.1093/bioinformatics/btab455. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Watson M.J., Vignali P.D.A., Mullett S.J., Overacre-Delgoffe A.E., Peralta R.M., Grebinoski S., Menk A.V., Rittenhouse N.L., DePeaux K., Whetstone R.D., et al. Metabolic support of tumor-infiltrating regulatory T cells by lactic acid. Nature. 2021;591:645–651. doi: 10.1038/s41586-020-03045-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Demaria O., Cornen S., Daëron M., Morel Y., Medzhitov R., Vivier E. Harnessing innate immunity in cancer therapy. Nature. 2019;574:45–56. doi: 10.1038/s41586-019-1593-5. [DOI] [PubMed] [Google Scholar]
  • 15.Maier B., Leader A.M., Chen S.T., Tung N., Chang C., LeBerichel J., Chudnovskiy A., Maskey S., Walker L., Finnigan J.P., et al. A conserved dendritic-cell regulatory program limits antitumour immunity. Nature. 2020;580:257–262. doi: 10.1038/s41586-020-2134-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Maniatis S., Äijö T., Vickovic S., Braine C., Kang K., Mollbrink A., Fagegaltier D., Andrusivová Ž., Saarenpää S., Saiz-Castro G., et al. Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis. Science. 2019;364:89–93. doi: 10.1126/science.aav9776. [DOI] [PubMed] [Google Scholar]
  • 17.Hashimshony T., Feder M., Levin M., Hall B.K., Yanai I. Spatiotemporal transcriptomics reveals the evolutionary history of the endoderm germ layer. Nature. 2015;519:219–222. doi: 10.1038/nature13996. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Li J., Luo H., Wang R., Lang J., Zhu S., Zhang Z., Fang J., Qu K., Lin Y., Long H., et al. Systematic reconstruction of molecular cascades regulating GP development using single-cell RNA-seq. Cell Rep. 2016;15:1467–1480. doi: 10.1016/j.celrep.2016.04.043. [DOI] [PubMed] [Google Scholar]
  • 19.Huisman S.M.H., van Lew B., Mahfouz A., Pezzotti N., Höllt T., Michielsen L., Vilanova A., Reinders M.J.T., Lelieveldt B.P.F. BrainScope: interactive visual exploration of the spatial and temporal human brain transcriptome. Nucleic Acids Res. 2017;45:e83. doi: 10.1093/nar/gkx046. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Besag J., Clifford P. Sequential Monte Carlo p-Values. Biometrika. 1991;78:301–304. doi: 10.1093/biomet/78.2.301. [DOI] [Google Scholar]
  • 21.Benjamini Y., Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. 1995;57:289–300. [Google Scholar]
  • 22.Wolf F.A., Angerer P., Theis F.J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19:15. doi: 10.1186/s13059-017-1382-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Ghazanfar S., Lin Y., Su X., Lin D.M., Patrick E., Han Z.G., Marioni J.C., Yang J.Y.H. Investigating higher order interactions in single cell data with scHOT. Nat. Methods. 2020;17:799–806. doi: 10.1038/s41592-020-0885-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Dai H., Li L., Zeng T., Chen L. Cell-specific network constructed by single-cell RNA sequencing data. Nucleic Acids Res. 2019;47:e62. doi: 10.1093/nar/gkz172. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Shojaie A. Differential network analysis: a statistical perspective. Wiley Interdiscip Rev Comput Stat. 2021;13 doi: 10.1002/wics.1508. e1508. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Inouye D.I., Yang E., Allen G.I., Ravikumar P. A review of multivariate distributions for count data derived from the Poisson distribution. Wiley Interdiscip Rev Comput Stat. 2017;9 doi: 10.1002/wics.1398. e1398. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Rubió-Casadevall J., Hernandez-Pujol A.M., Ferreira-Santos M.C., Morey-Esteve G., Vilardell L., Osca-Gelis G., Vilar-Coromina N., Marcos-Gragera R. Trends in incidence and survival analysis in non-melanoma skin cancer from 1994 to 2012 in Girona, Spain: a population-based study. Cancer Epidemiol. 2016;45:6–10. doi: 10.1016/j.canep.2016.09.001. [DOI] [PubMed] [Google Scholar]
  • 28.Dotto G.P., Rustgi A.K. Squamous cell cancers: a unified perspective on biology and genetics. Cancer Cell. 2016;29:622–637. doi: 10.1016/j.ccell.2016.04.004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Rowe D.E., Carroll R.J., Day C.L., Jr. Prognostic factors for local recurrence, metastasis, and survival rates in squamous cell carcinoma of the skin, ear, and lip. J. Am. Acad. Dermatol. 1992;26:976–990. doi: 10.1016/0190-9622(92)70144-5. [DOI] [PubMed] [Google Scholar]
  • 30.Karantza V. Keratins in health and cancer: more than mere epithelial cell markers. Oncogene. 2011;30:127–138. doi: 10.1038/onc.2010.456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Moll R., Divo M., Langbein L. The human keratins: biology and pathology. Histochem. Cell Biol. 2008;129:705–733. doi: 10.1007/s00418-008-0435-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Quigley D.A., Kandyba E., Huang P., Halliwill K.D., Sjölund J., Pelorosso F., Wong C.E., Hirst G.L., Wu D., Delrosario R., et al. Gene expression architecture of mouse dorsal and tail skin reveals functional differences in inflammation and cancer. Cell Rep. 2016;16:1153–1165. doi: 10.1016/j.celrep.2016.06.061. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Toivola D.M., Boor P., Alam C., Strnad P. Keratins in health and disease. Curr. Opin. Cell Biol. 2015;32:73–81. doi: 10.1016/j.ceb.2014.12.008. [DOI] [PubMed] [Google Scholar]
  • 34.Paramio J.M., Casanova M.L., Segrelles C., Mittnacht S., Lane E.B., Jorcano J.L. Modulation of cell proliferation by cytokeratins K10 and K16. Mol. Cell Biol. 1999;19:3086–3094. doi: 10.1128/mcb.19.4.3086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Han W., Hu C., Fan Z.-J., Shen G.-L. Transcript levels of keratin 1/5/6/14/15/16/17 as potential prognostic indicators in melanoma patients. Sci. Rep. 2021;11:1023. doi: 10.1038/s41598-020-80336-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Nair R.R., Hsu J., Jacob J.T., Pineda C.M., Hobbs R.P., Coulombe P.A. A role for keratin 17 during DNA damage response and tumor initiation. Proc. Natl. Acad. Sci. USA. 2021;118 doi: 10.1073/pnas.2020150118. e2020150118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Liu Z., Yu S., Ye S., Shen Z., Gao L., Han Z., Zhang P., Luo F., Chen S., Kang M. Keratin 17 activates AKT signalling and induces epithelial-mesenchymal transition in oesophageal squamous cell carcinoma. J. Proteomics. 2020;211 doi: 10.1016/j.jprot.2019.103557. 103557. [DOI] [PubMed] [Google Scholar]
  • 38.Huang W.-C., Jang T.-H., Tung S.-L., Yen T.-C., Chan S.-H., Wang L.-H. A novel miR-365-3p/EHF/keratin 16 axis promotes oral squamous cell carcinoma metastasis, cancer stemness and drug resistance via enhancing β5-integrin/c-met signaling pathway. J. Exp. Clin. Cancer Res. 2019;38:89. doi: 10.1186/s13046-019-1091-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Paul F., Arkin Y., Giladi A., Jaitin D.A., Kenigsberg E., Keren-Shaul H., Winter D., Lara-Astiaso D., Gury M., Weiner A., et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell. 2015;163:1663–1677. doi: 10.1016/j.cell.2015.11.013. [DOI] [PubMed] [Google Scholar]
  • 40.Blondel V.D., Guillaume J.-L., Lambiotte R., Lefebvre E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008;2008 doi: 10.1088/1742-5468/2008/10/p10008. P10008. [DOI] [Google Scholar]
  • 41.Cheung P., Schaffert S., Chang S.E., Dvorak M., Donato M., Macaubas C., Foecke M.H., Li T.-M., Zhang L., Coan J.P., et al. Repression of CTSG, ELANE and PRTN3-mediated histone H3 proteolytic cleavage promotes monocyte-to-macrophage differentiation. Nat. Immunol. 2021;22:711–722. doi: 10.1038/s41590-021-00928-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Harris C.R., Millman K.J., van der Walt S.J., Gommers R., Virtanen P., Cournapeau D., Wieser E., Taylor J., Berg S., Smith N.J., et al. Array programming with NumPy. Nature. 2020;585:357–362. doi: 10.1038/s41586-020-2649-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Zeira R., Land M., Strzalkowski A., Raphael B.J. Alignment and integration of spatial transcriptomics data. Nat. Methods. 2022;19:567–575. doi: 10.1038/s41592-022-01459-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Zhao E., Stone M.R., Ren X., Guenthoer J., Smythe K.S., Pulliam T., Williams S.R., Uytingco C.R., Taylor S.E.B., Nghiem P., et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat. Biotechnol. 2021;39:1375–1384. doi: 10.1038/s41587-021-00935-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Brown J., Ni Z., Mohanty C., Bacher R., Kendziorski C. Normalization by distributional resampling of high throughput single-cell RNA-sequencing data. Bioinformatics. 2021;37:4123–4128. doi: 10.1093/bioinformatics/btab450. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Yin J., Geng Z., Li R., Wang H. Nonparametric covariance model. Stat. Sin. 2010;20:469–479. [PMC free article] [PubMed] [Google Scholar]
  • 47.Hoff P.D., Niu X. A covariance regression model. Stat. Sin. 2012;22:729–753. doi: 10.5705/ss.2010.051. [DOI] [Google Scholar]
  • 48.Wang Y., Hicks S.C., Hansen K.D. Co-expression analysis is biased by a mean-correlation relationship. bioRxiv. 2020 doi: 10.1101/2020.02.13.944777. Preprint at. [DOI] [Google Scholar]
  • 49.Huang M., Wang J., Torre E., Dueck H., Shaffer S., Bonasio R., Murray J.I., Raj A., Li M., Zhang N.R. SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods. 2018;15:539–542. doi: 10.1038/s41592-018-0033-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Haghverdi L., Büttner M., Wolf F.A., Buettner F., Theis F.J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods. 2016;13:845–848. doi: 10.1038/nmeth.3971. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Document S1. Figures S1–S4
mmc1.pdf (3.4MB, pdf)
Table S1. Top GO terms identified by the WR and BR-tests in the cSCC data, related to Figure 3
mmc2.xlsx (16.9KB, xlsx)
Document S2. Article plus supplemental information
mmc3.pdf (7.3MB, pdf)

Data Availability Statement


Articles from Cell Reports Methods are provided here courtesy of Elsevier

RESOURCES